DATA_READ_ERROR while trying to accelerate reflections


I have a physical dataset (purple icon) in Dremio, backed by a connection to an S3 bucket. The data files are tab-separated .tab files. When I try to create a reflection on this dataset, I get this error:

DATA_READ ERROR: Error processing input: , line=263, char=105146. Content parsed:

Failure while reading file FILE Happened at or shortly before byte position 107970.

Caused By (org.apache.hadoop.fs.s3a.AWSClientIOException) read on FILE com.amazonaws.SdkClientException: Data read has a different length than the expected: dataLength=105146; expectedLength=152111;

Those are the relevant parts of the error. However, I found nothing unusual at those lines in the files named in the error.
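For anyone who wants to repeat that kind of check, here is a minimal sketch in Python. It assumes a local copy of one of the .tab files has been downloaded from the bucket. The function name `find_irregular_rows` and the sample data are hypothetical and not part of Dremio. It only flags rows whose field count differs from the most common one, so it would miss other kinds of corruption.

```python
import csv
import io
from collections import Counter


def find_irregular_rows(lines, delimiter="\t"):
    """Return (line_number, field_count) for rows whose field count
    differs from the most common field count in the input."""
    # QUOTE_NONE treats quote characters as plain data, so every physical
    # line is exactly one row and line numbers match the file.
    reader = csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    counts = [(number, len(row)) for number, row in enumerate(reader, start=1)]
    if not counts:
        return []
    # Assume the most frequent field count is the intended one.
    expected, _ = Counter(n for _, n in counts).most_common(1)[0]
    return [(number, n) for number, n in counts if n != expected]


# Hypothetical sample: line 3 is missing one field.
sample = io.StringIO("a\tb\tc\n1\t2\t3\n4\t5\n6\t7\t8\n")
print(find_irregular_rows(sample))  # [(3, 2)]
```

For a real file, pass an open handle instead of the sample, for example `open(path, newline="")`, where `path` is wherever the local copy was saved. An empty result for a file named in the error would suggest the rows themselves are well formed, at least as far as field counts go.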

What’s strange is that I have run the reflection job several times on the same dataset, and each time the error names a different file (the physical dataset is built on a folder containing several files). The line and character position reported in the error change as well.

Is there a reason for this behavior? And what can be done to fix it?


Hi Ritik, so you can query your dataset successfully without a reflection, but you get an error when you create one? Can you post both job profiles? Thanks.