Dremio S3 Compatibility Mode: Unable to read Parquet

We are not able to read Parquet data stored in S3-compatible storage.
We are able to read other file formats (CSV, Excel, …) stored on OVH OpenStack Swift storage.
We are using OVH OpenStack Swift with S3 compatibility mode enabled.

Here are the steps to reproduce the issue:

  1. In the Dremio UI, add a new Data Lake source, then choose Amazon S3.

  2. Provide the AWS Access Key and Secret, then switch to the Advanced tab, enable Compatibility mode, add an extra property under Connection Properties (name: "fs.s3a.endpoint", value: "s3.bhs.cloud.ovh.net"), and save.

  3. Set the format of a simple CSV file located under the S3-compatible Data Lake source and read it. Result: works.

  4. Set the format for your Parquet file. Result: works.

  5. After saving the Parquet format, try reading the formatted Parquet file. This is where the error is printed.

I downloaded the Dremio source code from Git and tried to get more logs. The following log file provides more details:
dremio-error-read-parquet-s3.zip (3.6 KB)

@asakmedops

Can you disable async on the source and try again?

Thanks @balaji.ramaswamy, it works.
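
For anyone who creates sources through a script rather than the UI, the settings that worked here could be written down roughly as below. This is only a sketch: the helper name is hypothetical, and the field names (`compatibilityMode`, `enableAsync`, `propertyList`) are my assumption of how the S3 source config is spelled in Dremio's source API, so check them against the documentation for your version before relying on them.

```python
import json


def s3_compatible_source_payload(name, access_key, access_secret, endpoint,
                                 enable_async=False):
    """Build a source definition for an S3-compatible endpoint.

    Hypothetical helper; the config field names are assumptions and
    should be verified against the Dremio docs for your version.
    """
    return {
        "entityType": "source",
        "name": name,
        "type": "S3",
        "config": {
            "accessKey": access_key,
            "accessSecret": access_secret,
            # "Enable compatibility mode" on the Advanced tab
            "compatibilityMode": True,
            # Async access turned off, which is what fixed Parquet reads here
            "enableAsync": enable_async,
            # Extra connection property from step 2
            "propertyList": [
                {"name": "fs.s3a.endpoint", "value": endpoint},
            ],
        },
    }


if __name__ == "__main__":
    payload = s3_compatible_source_payload(
        "ovh-lake", "ACCESS_KEY", "SECRET_KEY", "s3.bhs.cloud.ovh.net"
    )
    print(json.dumps(payload, indent=2))
```

The script only builds and prints the payload; it does not contact Dremio.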

I think this is the solution to all the S3-compatible storage issues on the forum.

Is there a config (XML) to disable async when adding S3-compatible object storage as distributed storage (something like the fs.s3a.endpoint config)?

@asakmedops

This is a source-level property, so I'm afraid not.

So the same issue on distributed storage can't be fixed at the moment, and that means ONLY AWS S3 and Azure Storage can be used as distributed storage?

@asakmedops To turn off async on reflections, add the line below to dremio.conf on the executors:
debug.dist.async.enabled: false
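
If you have several executors to update, a small script can apply that line consistently. The following is a minimal sketch, assuming the setting is written in flat `key: value` form on its own line (a nested HOCON block would not be detected); the function name is hypothetical and the path to dremio.conf depends on your install.

```python
from pathlib import Path

KEY = "debug.dist.async.enabled"


def disable_dist_async(conf_path):
    """Ensure `debug.dist.async.enabled: false` appears exactly once.

    Assumes the flat `key: value` form on its own line; commented-out
    lines are left untouched.
    """
    path = Path(conf_path)
    lines = path.read_text().splitlines() if path.exists() else []
    setting = f"{KEY}: false"
    out, found = [], False
    for line in lines:
        if line.strip().startswith(KEY):
            # Replace the first occurrence, drop any later duplicates
            if not found:
                out.append(setting)
                found = True
        else:
            out.append(line)
    if not found:
        # Append at the end of the file, i.e. at the top level
        out.append(setting)
    path.write_text("\n".join(out) + "\n")
    return out
```

Running it twice leaves the file unchanged the second time. The executors still need a restart to pick up the change.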

It finally works. I tested with the latest version, 15.0.0.

It's not usable in production as a distributed store with async off. We were getting a lot of timeouts and had to roll back to Amazon S3.