Failed to load a data file because of a split limit issue

I am getting the error below whenever I try to load one of my data files:

Number of splits (54708) in dataset S3Datalake.elastic.logstash exceeds dataset split limit of 50000

Data is synced into that dataset on a daily basis, and the daily volume ranges between 600 GB and 1.2 TB.

Hello @abhimanyu,

This is a “guardrail” limit that we introduced in Dremio. It restricts the number of files (splits) that Dremio will process for a table once filters have been applied.

See: https://docs.dremio.com/advanced-administration/limits.html
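
To give a rough sense of why a dataset of this size hits the limit: the split count is driven by the number of files (or blocks within files) rather than by raw volume, so the same terabyte can mean a few thousand splits or well over 50,000 depending on how the daily sync writes its output. Here is a back-of-the-envelope sketch; the average file sizes in it are hypothetical, just to show the scaling:

```python
# Back-of-the-envelope: estimated splits per day if each file becomes one split.
# The average file sizes below are hypothetical; real split counts depend on the
# file format, block size, and how the daily sync writes its output.
GB = 1024**3

daily_low, daily_high = 600 * GB, 1.2 * 1024 * GB  # 600 GB .. 1.2 TB per day

for avg_file_mb in (16, 64, 128):
    avg_file_bytes = avg_file_mb * 1024**2
    low = daily_low / avg_file_bytes
    high = daily_high / avg_file_bytes
    print(f"~{avg_file_mb:>3} MB files: {low:>9,.0f} - {high:>9,.0f} splits/day")
```

With small files (tens of MB each), a single day of data can already land in the 40,000–80,000 range, which is consistent with the 54,708 splits in your error.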

We will be increasing the default for this particular limit with our upcoming Dremio 3.3.x release. You should be able to access this file after upgrading.

For your current query, can you apply any additional filters to the table?

Hello @ben

I am not able to do anything with that table, or with any table curated from it, as it gives me the error even before loading. I would love to know if there is any workaround. I have three tables; two of them are working fine. It's just the third one causing the issue, but it's a giant one.

That’s interesting! Can you give me a rough idea of how much you are going to increase that limit by? Are you doubling it, or more than that? Also, are you only going to increase the per-dataset split limit, which is 50,000, or also the combined split limit, which is 250,000?

Thanks!

We are having the same split limit issue with 3.2.4. Is there a way to override that limit, or is it hard-set? This is a big problem for us, as most of our tables will fail almost every time.

@aalamir3, I am also going through the same pain; I wish I had known this earlier. Please post something if you find a solution.

@abhimanyu Not sure if you are using the community or enterprise edition; we use community. We are looking into removing that restriction, or increasing it directly in the code, since it is not a configuration parameter. I’ll share the result once we have a successful fix; we just started looking at the code in question a few hours ago.

Looks like release 3.2.8 made the split limits configurable parameters. Building now; I will test and report back.

To change the dataset and query split limits from 50,000 and 25,000 respectively, Dremio 3.2.8 introduced the following keys:

@aalamir3 What keys? You haven’t mentioned the keys. And if you were able to configure them successfully, what is the limit to which we can extend them?

@abhimanyu I put the keys in yesterday. I think someone from Dremio removed my comments, which goes against the policies of a community board.
planner.dataset_max_split_limit defaults to 50k; I set it to 100k and our queries worked fine.
planner.query_max_split_limit defaults to 25k; I set it to 50k and it worked.

@aalamir3 That’s really great! Can you please tell me where I have to make these changes? Someone set it up for my organization in AWS and then left, and now I am trying to fix it. If you can guide me a little, that would be a great help!

@abhimanyu I assume you have Dremio 3.2.8 running in your AWS environment, on Docker or an EC2 instance?
Once logged in, navigate to the Admin screen in the UI (top right).
On the left side, select Support. On the right side, type the support key above exactly and select Show. It will display the key with its default value. Change it if you see fit, then save.
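
If you would rather script it than click through the UI, something like the sketch below may work. This is only a sketch under assumptions: it assumes the coordinator exposes the standard REST login and SQL endpoints (/apiv2/login and /api/v3/sql) and that your build accepts ALTER SYSTEM SET for these support keys; the host, credentials, and new values are placeholders. Try it on a non-production coordinator first, and fall back to the Support screen above if the statements are rejected.

```python
# Sketch: set the split-limit support keys over Dremio's REST API instead of the UI.
# Assumptions: the /apiv2/login and /api/v3/sql endpoints are reachable on the
# coordinator, and this Dremio build accepts ALTER SYSTEM SET for support keys.
# The host, credentials, and limits below are placeholders.
import requests

DREMIO = "http://localhost:9047"          # coordinator URL (placeholder)
USER, PASSWORD = "admin", "changeme"      # placeholders

# Log in and grab the auth token.
login = requests.post(f"{DREMIO}/apiv2/login",
                      json={"userName": USER, "password": PASSWORD})
login.raise_for_status()
headers = {"Authorization": f"_dremio{login.json()['token']}"}

# Submit one ALTER SYSTEM statement per support key; each submission returns a job id.
statements = [
    'ALTER SYSTEM SET "planner.dataset_max_split_limit" = 100000',
    'ALTER SYSTEM SET "planner.query_max_split_limit" = 50000',
]
for sql in statements:
    job = requests.post(f"{DREMIO}/api/v3/sql", json={"sql": sql}, headers=headers)
    job.raise_for_status()
    print(sql, "->", job.json()["id"])    # confirm the job succeeded in the Jobs page
```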

Hello @aalamir3, thanks a lot for your help! I really appreciate it.
Do you have any idea by how much we should increase it? I mean, can we go beyond 100,000, and if so, will it break something?

I have not seen splits go over 80k yet in our queries; they still ran fine, though a bit slowly. I suppose the worst you will encounter with a high value is a failed query, a failed executor node, or a very slow query. The limits are set for a reason, I suppose, but it is not one-size-fits-all when it comes to big data.

Did you update Dremio?