What is the recommended way to avoid this error: “Number of splits (nn) in dataset xxx exceeds dataset split limit of 300000”?
I understand what’s causing it, but we have no control over the source data. We had a reflection on the PDS that failed during refresh because of this condition.
I raised this issue a while back. My post was automatically hidden by their Akismet spam filter, pending “review” by staff, and it has stayed hidden for over a month now. I suspect this is an embarrassing problem for Dremio and they want as little attention on it as possible.
This limit seems like a strange problem to have, given Dremio’s claims of being a data lake query engine. It claims to handle petabytes of data, yet this limit restricts it to data sources with fewer than 300k data files.
In my post I referred to Elasticsearch’s attitude towards its customers in its early days: always pushing the blame back onto the customer. Instead of simply letting a larger search take longer and eventually finish, Elasticsearch fervently held on to its claim of being fast, so a search that took too long would simply error out. Their support team would then ask the customer to restructure their Elasticsearch index and reload all their data from scratch.
It feels like Dremio is going down a similar path with this 300k limit. In other forum posts we are being asked to restructure our source data into fewer, larger files to stay under the limit.
One piece of advice that others might find useful: refrain from editing your post right after your initial submission. Even if the edit only corrects a simple typo, the overzealous Akismet spam filter will flag your post as spam and hide it.
Another tip: always copy your post’s contents into a text editor before you submit it. Once Akismet hides your post, you no longer have access to it, so if you don’t have a copy, the thing you just spent 2 hours typing up is gone forever. Thanks, Dremio.
Hopefully Dremio addresses this issue soon and drops its delusion that a “fast or nothing” approach is good for its brand image.