I’m currently learning Iceberg and Dremio (Dremio Software), and I’m deploying these services on our data lakehouse cluster (mostly on-prem, alongside other services like YARN, ZooKeeper, HDFS, Spark, etc., managed by Cloudera).
I’m a bit confused about terms like Table Format, Catalog, Lakehouse Engine, and so on.
As far as I understand, Iceberg is a Table Format, and it needs a Catalog service to manage its tables.
But can Dremio act as a Catalog service? In the Dremio web UI, Dremio can clearly see all my Iceberg tables and databases and manage them.
So can I use Dremio as the catalog instead of Hive for my Spark jobs?
from pyspark.sql import SparkSession

spark = (SparkSession.builder
    .appName(appName)
    .master("yarn")  # run on the YARN cluster manager
    # Enable Iceberg's Spark SQL extensions
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    # Session catalog backed by the Hive Metastore
    .config("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog")
    .config("spark.sql.catalog.spark_catalog.type", "hive")
    # Extra catalog named "local" that keeps Iceberg metadata directly on the filesystem
    # (a Hadoop catalog normally also needs spark.sql.catalog.local.warehouse set to a path)
    .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.local.type", "hadoop")
    .getOrCreate())
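For context, my understanding is that these two catalogs are addressed like this from that session (the database and table names below are just examples):

# Tables registered in the Hive-backed session catalog (spark_catalog) are addressed as usual
spark.sql("SELECT * FROM my_db.my_table").show()

# Tables in the path-based "local" catalog are addressed with the catalog name as a prefix
spark.sql("CREATE TABLE IF NOT EXISTS local.demo.events (id BIGINT, ts TIMESTAMP) USING iceberg")
spark.sql("INSERT INTO local.demo.events VALUES (1, current_timestamp())")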
The above code currently works fine for me, but I wonder if I can change
.config("spark.sql.catalog.spark_catalog.type", "hive")
to
.config("spark.sql.catalog.spark_catalog.type", "dremio")
?
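From the Iceberg docs, my understanding is that catalogs are pluggable, so pointing Spark at a different catalog should just be a different set of configs. Below is a minimal sketch of what I imagine this would look like using Iceberg's built-in REST catalog type; the catalog name, endpoint host, and port are placeholders I made up, and whether my Dremio version actually exposes a compatible endpoint is exactly what I'm unsure about:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
    .appName("iceberg-rest-catalog-sketch")
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    # Hypothetical catalog named "dremio_cat" served over Iceberg's REST catalog protocol
    .config("spark.sql.catalog.dremio_cat", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.dremio_cat.type", "rest")
    .config("spark.sql.catalog.dremio_cat.uri", "http://<catalog-host>:<port>/api/catalog")  # placeholder endpoint
    .getOrCreate())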
I’m still confused about what Dremio (Dremio Software) actually is. Does it act as an execution engine that helps me query data from Iceberg tables, like Spark, or does it also act as a Catalog service like Hive/Nessie?
Could you guys show me some other tools/services similar to Dremio, so that I can understand what Dremio is more easily by comparing it to them?