Delta Lake integration

Hello,

We are evaluating Dremio, and so far it looks quite promising compared to other distributed query engines. However, we have not been able to figure out how Dremio integrates with Delta Lake, specifically how it reads the Delta table manifest files that make ACID transactions on data files possible. To elaborate: Delta Lake maintains a transaction log, which it uses to determine which data files belong to a given table version. It can also produce a manifest file listing the files that make up a version. Can Dremio read the list of files for a table from the manifest file, rather than always discovering files through a directory listing? Could you help me understand whether this is feasible and point me to some relevant docs, please?
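To make the question concrete, here is a minimal Python sketch of what "reading the file list from the transaction log instead of the directory" means. It replays the JSON commit files under `_delta_log` and applies their `add` and `remove` actions in order. This is not how Dremio does it; it assumes a table on a local filesystem, ignores Parquet checkpoint files, and the names `active_files` and `write_demo_table` are made up for illustration.

```python
import json
import os
import tempfile


def active_files(table_path, version=None):
    """Replay JSON commits in _delta_log and return the set of live data file paths.

    Assumption: no Parquet checkpoints are needed, i.e. every commit from
    version 0 onward is still present as a JSON file.
    """
    log_dir = os.path.join(table_path, "_delta_log")
    # Commit files are named with a zero-padded version number, e.g.
    # 00000000000000000000.json, so a lexical sort is also a version sort.
    commits = sorted(
        name for name in os.listdir(log_dir)
        if name.endswith(".json") and name[:-5].isdigit()
    )
    live = set()
    for name in commits:
        if version is not None and int(name[:-5]) > version:
            break
        with open(os.path.join(log_dir, name)) as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                action = json.loads(line)  # one JSON action per line
                if "add" in action:
                    live.add(action["add"]["path"])
                elif "remove" in action:
                    live.discard(action["remove"]["path"])
    return live


def write_demo_table(root):
    """Hypothetical helper: write a tiny two-commit log for demonstration.

    Real add/remove actions carry more fields (size, timestamps, and so on);
    only "path" is needed for this sketch.
    """
    log_dir = os.path.join(root, "_delta_log")
    os.makedirs(log_dir)
    commits = [
        [{"add": {"path": "a.parquet"}}, {"add": {"path": "b.parquet"}}],
        [{"remove": {"path": "a.parquet"}}, {"add": {"path": "c.parquet"}}],
    ]
    for version, actions in enumerate(commits):
        with open(os.path.join(log_dir, f"{version:020d}.json"), "w") as fh:
            for action in actions:
                fh.write(json.dumps(action) + "\n")


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    write_demo_table(root)
    print(sorted(active_files(root)))             # ['b.parquet', 'c.parquet']
    print(sorted(active_files(root, version=0)))  # ['a.parquet', 'b.parquet']
```

A plain directory listing of the demo table would return all three Parquet files, including `a.parquet`, which was logically removed in version 1. That difference is why we would like the engine to go through the log or the manifest.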

Thank you,
Sandhya

Out-of-the-box Dremio integration with Delta Lake would be great. We have a use case where all data ingestion and processing happens through Spark and Delta Lake, and we want to leverage Dremio's interactive query performance.

So far, all the articles and presentations we have seen cover Dremio and Spark data source integration. Integration the other way around would be very helpful too.

Hello,

Is there any update on this, please? Is an out-of-the-box (OOB) Delta Lake + Dremio integration being planned? This is crucial for us in deciding whether to use Dremio as our interactive query engine. Looking forward to an update.

Thank you,
Sandhya

I asked about this about 8 months ago and was directed to the Hive connector.

Please refer to the Hive Delta Connector in the meantime; I hope a supported connector for Dremio will be available soon: https://github.com/delta-io/connectors

The link above covers how to connect Hive to Delta.

Closing the loop here: this was added in the February release. See "Announcing Dremio February 2021" on the Dremio site.

@sandhyaagarwal Please see the update from Harsh: we have released support for Delta Lake, and for Iceberg too.