Since upgrading to Community Edition v14.0, I’ve been encountering the error
List field exceeded the maximum number of elements 128, because some of my parquet files have lists with more than 128 items.
I did not get this error in the past (with v4.9).
Can you confirm this is a new limit? I do not see it in the documentation about limits: https://docs.dremio.com/advanced-administration/limits.html
Is there any way to change this limit?
Thanks
Here is the error log:
2021-03-29 09:50:56,145 [grpc-default-executor-16139] INFO c.d.service.jobs.JobResultsStore - User Error Occurred [ErrorId: 1e156f0f-2185-4028-b241-1e8b016b8327]
com.dremio.common.exceptions.UserException: List field ‘data_cart_items’ exceeded the maximum number of elements 128
at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:804)
at com.dremio.service.jobs.JobResultsStore.loadJobData(JobResultsStore.java:145)
at com.dremio.service.jobs.JobResultsStore$LateJobLoader.load(JobResultsStore.java:294)
at com.dremio.service.jobs.JobDataImpl.range(JobDataImpl.java:46)
at com.dremio.service.jobs.LocalJobsService.getJobData(LocalJobsService.java:906)
at com.dremio.service.jobs.JobsFlightProducer.getStream(JobsFlightProducer.java:76)
at org.apache.arrow.flight.FlightService.doGetCustom(FlightService.java:111)
at org.apache.arrow.flight.FlightBindingService$DoGetMethod.invoke(FlightBindingService.java:144)
at org.apache.arrow.flight.FlightBindingService$DoGetMethod.invoke(FlightBindingService.java:134)
at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:172)
at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
at io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
at io.grpc.util.TransmitStatusRuntimeExceptionInterceptor$1.onHalfClose(TransmitStatusRuntimeExceptionInterceptor.java:74)
at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)
at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:820)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Dremio did not support complex type resolution during planning in v4.9, so it treated the data type as a mixed type. In the current version, the planner correctly detects the data type as ARRAY and therefore hits the limit on the maximum number of elements per LIST.
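To make the check concrete, here is a minimal Python sketch that models it on rows already loaded as plain dictionaries. It is an illustration under assumptions, not Dremio code: the function name first_list_violation is hypothetical, only top-level list fields are inspected, and the default of 128 (with an error only when a list is longer than 128) is inferred from the error message in this thread.

```python
from typing import Any, Iterable, Mapping, Optional

# Assumed default, inferred from the error message quoted in this thread.
MAX_LIST_ELEMENTS = 128


def first_list_violation(
    records: Iterable[Mapping[str, Any]],
    max_elements: int = MAX_LIST_ELEMENTS,
) -> Optional[str]:
    """Return an error string for the first list field longer than max_elements.

    Hypothetical helper: it mimics the wording of the error seen here but is
    only a model of the check, not Dremio's implementation.
    """
    for record in records:
        for name, value in record.items():
            # Only top-level list values are inspected in this sketch;
            # nested lists inside structs are not examined.
            if isinstance(value, list) and len(value) > max_elements:
                return (
                    f"List field '{name}' exceeded the maximum number "
                    f"of elements {max_elements}"
                )
    # No list field exceeded the limit.
    return None
```

For example, first_list_violation([{"id": 1, "friends": list(range(200))}]) returns the message for friends, while a list of exactly 128 items passes. Running a check like this over sample rows before loading can show which fields will trip the limit.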
Thanks.
I found this did not fix the issue with the Dremio University lesson. I increased store.parquet.block-size incrementally up to 8GB, but the same message was reported:
List field ‘friends’ exceeded the maximum number of elements 128.
Uploading this file from the university fails. Does anybody have any ideas how to get past this error, please?
@irnerd Currently, no. These settings have to be modified carefully and under guidance, as some can cause cluster instability. The specific one you changed is generally safe, but these settings can also change across versions.
We ran into this issue too. I would expect any valid parquet file to be readable by Dremio. As it stands, users only discover these limitations when they first hit them: you send a parquet file with 1k columns, it fails; you send a parquet file using the legacy list structure, it fails; you send a parquet file with a list column of more than 128 entries, it fails. This list of limitations on top of the parquet format leaves the end user wondering when the next generated parquet file will break things.
This is super confusing for new users: people doing a tutorial from the vendor definitely expect it to work fully. Perhaps remove the friends field from the parquet file?
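If deleting friends outright loses too much information, one possible preprocessing step is to split each oversized list across several rows before the parquet file is written. This is a hypothetical sketch, not part of the course material or any Dremio tooling: the function split_list_field and the added "<field>_chunk" index column are illustrative names, and rows are assumed to be plain dictionaries.

```python
from typing import Any, Dict, List, Mapping


def split_list_field(
    records: List[Mapping[str, Any]],
    field: str,
    max_elements: int = 128,  # assumed limit, taken from the error in this thread
) -> List[Dict[str, Any]]:
    """Split `field` into chunks of at most max_elements, one row per chunk.

    Every output row copies the other columns and gets a hypothetical
    "<field>_chunk" index (0 for rows that needed no split) so the schema
    stays consistent and the original list order can be rebuilt.
    """
    if max_elements < 1:
        raise ValueError("max_elements must be positive")
    chunk_col = f"{field}_chunk"
    out: List[Dict[str, Any]] = []
    for record in records:
        items = record.get(field)
        if not isinstance(items, list) or len(items) <= max_elements:
            # Short or non-list values pass through unchanged as chunk 0.
            row = dict(record)
            row[chunk_col] = 0
            out.append(row)
            continue
        # Emit one row per slice of at most max_elements items.
        for chunk_no, start in enumerate(range(0, len(items), max_elements)):
            row = dict(record)
            row[field] = items[start:start + max_elements]
            row[chunk_col] = chunk_no
            out.append(row)
    return out
```

A row whose friends list has 300 entries becomes three rows (128, 128 and 44 entries, chunk indexes 0 to 2), which can be re-aggregated by id and chunk index after loading.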
I disagree. I think it is an excellent use case and people should be aware of it. Given the complexity of data structures, this was a really useful exercise in manipulating large parquet files. I vote for leaving it in, but maybe supplement the training session with a note about the configuration parameter?
@balaji.ramaswamy Is there a hard limit for the property ‘store.parquet.list_items.threshold’? We need to set it much higher (~20000) for our business use case. What kind of issues could we face in that case?
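One way to reason about the cost of a higher threshold is a back-of-envelope estimate of how much memory a single list column's values could need per batch. The sketch below is generic arithmetic, not Dremio's memory model: the function name list_value_bytes is hypothetical, elements are assumed to be fixed-width 8-byte values, and the rows-per-batch figure is whatever you plug in.

```python
def list_value_bytes(
    rows_per_batch: int,
    elements_per_list: int,
    bytes_per_element: int = 8,  # assumption: fixed-width 8-byte values such as BIGINT
) -> int:
    """Worst-case size of one list column's value buffer for a single batch.

    Hypothetical estimate: it assumes every row's list is at the threshold
    and ignores offset and validity buffers, so a real footprint is larger.
    """
    return rows_per_batch * elements_per_list * bytes_per_element
```

With a hypothetical batch of 4,000 rows, list_value_bytes(4000, 128) gives 4,096,000 bytes (about 4 MB), while list_value_bytes(4000, 20000) gives 640,000,000 bytes (about 640 MB). Raising the threshold about 156x scales this worst case by the same factor, which suggests why such settings are meant to be changed only under guidance.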