I’m creating a table using CTAS and reading it, but I get the error below while refreshing metadata:
java.io.IOException: com.dremio.exec.store.parquet.Metadata$TooManySplitsException: Too many splits encountered when processing parquet metadata at file 1_4_0.parquet, maximum is 60000 but encountered 60002 splits thus far.
As I understand it, this limit cannot be changed. So, is it possible to reduce the Parquet row-group (split) size that the CTAS writes?
@Dalai It looks like one of the PDSes used in the CTAS query has hit the limit. With the current version of Dremio you should no longer hit this limit. What version of Dremio are you on?
My source data is CSV on MinIO, partitioned hourly, and I’m creating a summarized table on MinIO using CTAS; the output is in Parquet format by default. MinIO is configured as a data lake source in Dremio. But the target CTAS table’s partition is not refreshed because of this limit, and I can’t query the target dataset.
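For reference, the CTAS statement looks roughly like the sketch below. The target source name, table name, and columns are placeholders I’ve made up, not the actual query:

```sql
-- Hypothetical sketch of a partitioned CTAS in Dremio.
-- "s3-target", "prs_hourly_summary", and the columns are placeholder names.
CREATE TABLE "s3-target"."prs_hourly_summary"
  PARTITION BY (dir1, dir2)   -- keep the hourly partition layout in the Parquet output
AS SELECT dir1, dir2, COUNT(*) AS row_cnt
FROM "s3-source"."mydata_data"
GROUP BY dir1, dir2;
```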
Here is a sample query to refresh the target table's partition:

ALTER TABLE "s3-source"."mydata_data" REFRESH METADATA FOR PARTITIONS ("dir1" = '12M', "dir2" = '21D')
Error:
java.io.IOException: com.dremio.exec.store.parquet.Metadata$TooManySplitsException: Too many splits encountered when processing parquet metadata at file /prs/processed_data/prs_hourly/2022Y/12M/20D/03H/1_4_0.parquet, maximum is 60000 but encountered 60002 splits thus far.
@Dalai Since your source data is CSV, you are hitting the 60K split limit. In Dremio 24.0 the COPY INTO feature will help; until then, see if you can move the data to Parquet in batches or use bigger CSV files so you do not hit the limit.
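One way to "move to Parquet in batches" is to run a CTAS per partition, so each job reads only one day's worth of CSV splits and stays well under the limit. A sketch, with illustrative table names and partition values:

```sql
-- Illustrative only: write one source partition at a time to Parquet so each
-- CTAS job processes far fewer splits. "s3-target" and the directory column
-- values are placeholders; repeat per partition (or script the loop).
CREATE TABLE "s3-target"."prs_hourly_2022_12_20"
AS SELECT *
FROM "s3-source"."mydata_data"
WHERE dir0 = '2022Y' AND dir1 = '12M' AND dir2 = '20D';
```

Once all partitions are written as Parquet, you can query the Parquet copies instead of the raw CSV and avoid the split limit on metadata refresh.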