Adding a Format to a 'directory'

I'm using Azure Storage here, though I doubt that is relevant.

I have a ‘directory’ in this storage, which I want to apply the Parquet format to.

Under this ‘directory’ are numerous ‘directories’ which contain ONLY Parquet files.

After a while I get the ‘too many splits’ issue as the format is being applied.

So, while I could potentially apply the format to each individual directory, I don’t want to, as there is one ‘directory’ per day of every year (each is named in the format yyyy-mm-dd 03:00:00).

Are there any suggestions for how I could achieve this, or a good alternative?


Our current limit for the maximum number of splits for file system sources is 60,000. What row group size is defined in the ETL job? Can it be increased so that the total number of splits comes down?
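
To get a feel for how row group size relates to that limit, here is a rough back-of-the-envelope sketch in Python. It assumes one split per row group and that row groups are filled to the configured size, which may not match how splits are actually counted. The file counts and sizes are made up for illustration, and `estimate_splits` is a hypothetical helper, not part of any product API.

```python
import math

MAX_SPLITS = 60_000  # limit quoted above for file system sources
MIB = 1024 * 1024


def estimate_splits(file_sizes_bytes, row_group_bytes):
    """Rough estimate, ASSUMING one split per row group.

    Every file is counted as having at least one row group.
    """
    return sum(
        max(1, math.ceil(size / row_group_bytes)) for size in file_sizes_bytes
    )


# Hypothetical dataset: 365 daily directories, 50 files each, 512 MiB per file.
file_sizes = [512 * MIB] * (365 * 50)

for rg_mib in (64, 128, 256, 512):
    splits = estimate_splits(file_sizes, rg_mib * MIB)
    status = "over" if splits > MAX_SPLITS else "within"
    print(f"row group {rg_mib} MiB: {splits} estimated splits ({status} the limit)")
```

Under these made-up numbers, doubling the row group size roughly halves the estimated split count, which is why a larger row group size in the ETL job can bring a dataset back under the limit.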



Like another poster who has had the same issue, I don’t have control over the Parquet files, and the ship has sailed on getting the third party to reconfigure and rerun the Parquet creation.