Adding a Format to a 'directory'

I'm using an Azure Storage source here, though I doubt that is related.

I have a ‘directory’ in it that I want to apply the Parquet format to.

Under this ‘directory’ are numerous ‘directories’ that contain ONLY Parquet files.

After a while, I get the ‘too many splits’ error while the format is being applied.

So, while I could potentially apply the format to each individual directory, I don’t want to, as there is one ‘directory’ per day, for every year (each is named in the format yyyy-mm-dd 03:00:00).
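If applying the format per directory ever becomes the fallback, the daily folder names are at least predictable. A minimal sketch, assuming every folder follows the yyyy-mm-dd 03:00:00 pattern with a fixed 03:00:00 suffix; `daily_directory_names` is a hypothetical helper, and actually applying the format to each name is left to whatever tooling the source offers:

```python
from datetime import date, timedelta


def daily_directory_names(start: date, end: date) -> list[str]:
    """Return one folder name per day from start to end (inclusive).

    Assumes every folder is named 'yyyy-mm-dd 03:00:00', i.e. the ISO date
    followed by a fixed 03:00:00 suffix (hypothetical helper).
    """
    names = []
    current = start
    while current <= end:
        names.append(f"{current.isoformat()} 03:00:00")
        current += timedelta(days=1)
    return names


# One year of daily folders (2020 is a leap year, so 366 names).
names = daily_directory_names(date(2020, 1, 1), date(2020, 12, 31))
print(len(names), names[0], names[-1])
```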

Are there any suggestions for how I could achieve this, or a good alternative?

@surreynorthern,

Our current limit on the maximum number of splits for file system sources is 60,000. What row group size is defined in the ETL job? Can it be changed to a larger size so that the total number of splits comes down?
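To see why a larger row group size helps, here is a rough back-of-the-envelope estimate. It is a sketch under stated assumptions, not how the engine actually plans splits: it assumes one split per row group and that each file is cut into ceil(file size / row group size) row groups. The 2,000 files of 1 GiB each are hypothetical numbers chosen for illustration.

```python
import math

SPLIT_LIMIT = 60_000  # limit for file system sources, as quoted in the reply
MiB = 1024 * 1024
GiB = 1024 * MiB


def estimate_splits(file_sizes_bytes, row_group_size_bytes):
    """Rough split count, ASSUMING one split per row group and
    ceil(size / row_group_size) row groups per file."""
    return sum(math.ceil(size / row_group_size_bytes) for size in file_sizes_bytes)


# Hypothetical dataset: 2,000 Parquet files of 1 GiB each.
files = [GiB] * 2000
small_groups = estimate_splits(files, 32 * MiB)   # 32 row groups per file
large_groups = estimate_splits(files, 128 * MiB)  # 8 row groups per file
print(small_groups, small_groups > SPLIT_LIMIT)
print(large_groups, large_groups > SPLIT_LIMIT)
```

Under these assumptions, quadrupling the row group size cuts the estimate by the same factor, from 64,000 (over the limit) to 16,000.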

Thanks
Bali

Hi,

Like another poster who has had the same issue, I don’t have control over the Parquet files, and the ship has sailed on getting the third party to reconfigure and rerun the Parquet creation.