Dremio 800 column per table hard limit work around

While I was aware of this 800 column hard limit my understanding was that the SQL query had to refer all of those 800+ columns and only then it threw an error which is highly unlikely as very few monstrous queries may refer 800+ columns.
My altered understanding is that the dataset does not even get loaded and throws the error ‘Number of fields in dataset exceeded the maximum number of fields of 800’, if the SQL query has a table with 800+ columns even if the actual SQL query uses just 1-2 columns from that particular table.
In other words below query will fail in dremio

Select ColumnA from table ABC

if ABC is a table with 800+ columns

I get it that you guys need to load it and you have put a hard limit there but now is there any workaround this issue. Is there a way we can increase the hard limit to say 2000 instead of 800. If not please consider to make this configurable.

2 Likes

@gururajnayak That’s a great question ! The current model is to collect metadata for all the columns and hence you are getting this error, Dremio is looking at options where it will only collect metadata on the columns referenced in a VDS/query.

Ok thanks balaji. I can’t help but think why the hard limit at all ? Why can’t we set it to the system memory available instead of limiting it. I mean dremio can always throw a memory exception and handle it gracefully when the user say refers a table with 2000 columns or whatever which exceeds the memory available.
Limiting it makes it so hard. It is like, I have the memory but this hard limit stops me from going ahead with query even though i have the resources.
I would request the dev team to look…

hi @balaji.ramaswamy

is Dremio likely to be increasing the 800 limit, or exposing it as an option for v18? (querying ElasticSearch which frequently gets blocked by this)

Hi @johnshearer not in 18, 18 will have unlimited splits. Currently there is no ETA for the 800 column limit

1 Like

Looking into this a little bit from the source code here i see that the value is set as 800 but I can’t find a reasoning for this value? @balaji.ramaswamy, what are the implications of setting this value to 2400 or some other higher value?

@ankurcha You would have heap pressure during metadata collection / planning via Full GC pauses