Data Format on Data Lake Storage

I want to know which data formats should be used in Data Lake Storage for Dremio to work with. I am aware of Parquet and JSON, but I am looking for the full list of data formats in Data Lake Storage that Dremio supports.

Any help or link is greatly appreciated.

Thanks in advance.


We support PARQUET, ORC, JSON, and CSV, but performance-wise PARQUET and ORC are best suited for wide tables and tables with many rows, because their partition pruning and filter pushdown capabilities are great and both formats store metadata for efficient query processing. Of the two, PARQUET is the better choice, since Dremio can read PARQUET via Hive or directly from S3/HDFS/Azure storage, while ORC needs a Hive external table or Glue catalog. Dremio has vectorized readers for both formats, but the Parquet reader is probably one of the fastest, so if there is a choice, PARQUET would be first, followed by ORC.
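
In case it helps to see why partition pruning matters, here is a minimal sketch of the idea in plain Python. It is not how Dremio implements it, and it uses CSV files only so that it runs with the standard library. The folder layout (`year=2023/part-0.csv`), the column names, and the helper names `write_partitioned`, `scan`, and `demo` are all assumptions made up for this illustration.

```python
import csv
import tempfile
from pathlib import Path


def write_partitioned(root: Path, rows: list[dict]) -> None:
    """Write rows into Hive-style folders such as root/year=2023/part-0.csv."""
    by_year: dict[str, list[dict]] = {}
    for row in rows:
        by_year.setdefault(str(row["year"]), []).append(row)
    for year, group in by_year.items():
        part_dir = root / f"year={year}"
        part_dir.mkdir(parents=True, exist_ok=True)
        with open(part_dir / "part-0.csv", "w", newline="") as f:
            # The partition value lives in the folder name, not in the file.
            writer = csv.DictWriter(f, fieldnames=["order_id", "amount"])
            writer.writeheader()
            for row in group:
                writer.writerow({"order_id": row["order_id"], "amount": row["amount"]})


def scan(root: Path, year: str) -> tuple[list[dict], int]:
    """Read only the folder whose name matches the filter.

    Returns the matching rows and the number of files actually opened.
    """
    rows: list[dict] = []
    files_opened = 0
    for part_dir in sorted(root.iterdir()):
        key, _, value = part_dir.name.partition("=")
        if key != "year" or value != year:
            continue  # pruned: this folder is skipped without opening any file
        for path in sorted(part_dir.glob("*.csv")):
            files_opened += 1
            with open(path, newline="") as f:
                rows.extend(csv.DictReader(f))
    return rows, files_opened


def demo() -> tuple[list[dict], int]:
    """Build a small three-partition dataset and query one year."""
    data = [
        {"order_id": 1, "year": 2022, "amount": 10.0},
        {"order_id": 2, "year": 2023, "amount": 25.5},
        {"order_id": 3, "year": 2023, "amount": 7.25},
        {"order_id": 4, "year": 2024, "amount": 3.0},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        write_partitioned(root, data)
        return scan(root, "2023")
```

Calling `demo()` filters on `year = 2023` and opens one file out of three; the other two folders are skipped based on their names alone. Filter pushdown with Parquet or ORC goes a step further by using the metadata stored inside each file to skip data within it, which this sketch does not attempt to show.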


Thank you, Balaji, for your replies. They show your expertise in Dremio, and I appreciate that.

Thank you