Can Dremio be used alongside other compute engines to modify Iceberg tables?

Suppose a table is stored in Apache Iceberg format on S3, and for some reason we need to use both Dremio and Spark to modify it.

Question 1: On S3, concurrent writers need to synchronize their commits through the same lock, because S3 offers no atomic rename or compare-and-swap that Iceberg could use to publish new metadata safely. In Spark we can configure how the locking is done (see the sketch below). But what about Dremio? Can Dremio be configured so that all writers point at the same lock endpoint?
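
For concreteness, here is a minimal sketch of the Spark side only, assuming Iceberg's optional DynamoDB lock manager and assuming the iceberg-spark-runtime and AWS bundle jars are on the classpath; the catalog name, bucket, and lock table name are illustrative:

```python
# Minimal sketch (not a definitive setup): a Spark session whose Iceberg
# catalog commits through Iceberg's optional DynamoDB lock manager.
# Catalog name "lake", bucket, and lock table name are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-writer")
    # Iceberg catalog backed by AWS Glue, with data on S3
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.lake.warehouse", "s3://my-bucket/warehouse")
    .config("spark.sql.catalog.lake.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    # Every writer that should serialize against the others must point at
    # the same DynamoDB lock table.
    .config("spark.sql.catalog.lake.lock-impl", "org.apache.iceberg.aws.dynamodb.DynamoDbLockManager")
    .config("spark.sql.catalog.lake.lock.table", "iceberg_commit_lock")
    .getOrCreate()
)
```

Whether Dremio exposes an equivalent lock setting is exactly what I am asking.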

Question 2: If both Dremio and Spark modify the table schema, will the change be reflected in the other engine's metastore? In other words, will Spark's ALTER TABLE be applied to Dremio's metastore, and vice versa? Or can Dremio use an external Hive Metastore or AWS Glue?

These questions came up while I was imagining an open lakehouse architecture on AWS, where all data is stored in Apache Iceberg format to exploit benefits such as ACID transactions and time travel, and ideally any tool best suited to a particular workload can be used. But simply sharing an Iceberg table does not seem to be enough: at a minimum, metastore synchronization and a shared lock appear to be necessary.

Thanks in advance.

Yes. Both engines (Dremio and Spark) need to use the same Iceberg catalog when doing concurrent reads/writes to the Iceberg table. See Dremio's documentation for the current list of catalogs that Dremio supports.
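
As a sketch of what "same catalog" means in practice, here is the Spark side assuming AWS Glue as the shared catalog (the catalog, database, and table names are illustrative). Because the table's schema lives in the Iceberg metadata file that the catalog points to, a Dremio source configured against the same Glue catalog sees Spark's ALTER TABLE after the commit:

```python
# Sketch: Spark committing a schema change through a shared AWS Glue catalog.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.catalog.glue", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue.warehouse", "s3://my-bucket/warehouse")
    .getOrCreate()
)

# The ALTER TABLE writes a new Iceberg metadata file and atomically swaps
# the catalog's pointer to it; any engine resolving the table through the
# same Glue catalog (e.g., a Dremio source on that Glue account) reads the
# updated schema on its next metadata refresh.
spark.sql("ALTER TABLE glue.sales.orders ADD COLUMN discount DOUBLE")
```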