In my working scenario I have a client environment written in Java, and I have set up a development architecture using the tabular-io/docker-spark-iceberg Docker images from GitHub plus S3-compatible local storage (MinIO). This setup uses a REST Iceberg catalog backed by a SQLite database. I am able to create and manipulate Iceberg tables programmatically in Spark (Java, SQL, Python), but when I try to connect them to Dremio, this is the error I get:
This folder does not contain a filesystem-based Iceberg table. If the table in this folder is managed via a catalog such as Hive, Glue, or Nessie, please use a data source configured for that catalog to connect to this table.
On the other hand, I have successfully created an Iceberg table from Dremio in the same MinIO bucket.
I am wondering whether a JDBC catalog (I could use PostgreSQL, for example) would be recognised by Dremio… Or should I install Hive and connect it to MinIO?
I would like to avoid the need for a Spark environment if possible.
Same message, same problem.
@txalaparta When you created the Iceberg table, which catalog was set?
I created the Iceberg table with the REST catalog provided by the tabulario/iceberg-rest Docker image.
I also created another table in Dremio, and it saves its data in MinIO. However, I cannot connect to it from the Java API or from PyIceberg. What type of catalog does Dremio create? According to the Dremio documentation, MinIO should be a Hadoop Iceberg catalog, right?
We are using Dremio with MinIO and Apache Iceberg. So far the best results come from a Hadoop-type Iceberg catalog in the same bucket, because in Dremio you must configure the catalog, not the bucket on MinIO.
This is an example of creating the catalog on the MinIO bucket with Spark.
The table metadata and data are then written to this bucket, Dremio can read that bucket, and you can take each table folder and format it as Apache Iceberg.
We are waiting for the Nessie integration…
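For reference, a Hadoop-type catalog on MinIO can be configured in Spark roughly as follows. This is a minimal sketch, assuming Spark 3 with the iceberg-spark-runtime and hadoop-aws jars on the classpath. The catalog name `lake`, the warehouse path `s3a://warehouse/iceberg`, the endpoint `http://minio:9000`, the credentials and the helper class `HadoopCatalogConf` are hypothetical placeholders; only the property keys are standard Iceberg, Spark and Hadoop S3A settings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper: builds Spark properties for an Iceberg Hadoop-type catalog
 * whose warehouse lives in a MinIO bucket accessed through the s3a filesystem.
 */
public class HadoopCatalogConf {

    public static Map<String, String> build(String catalog, String warehouse,
                                            String endpoint, String accessKey, String secretKey) {
        Map<String, String> conf = new LinkedHashMap<>();
        // Enable Iceberg's Spark SQL extensions (e.g. CALL procedures)
        conf.put("spark.sql.extensions",
                "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions");
        // Register an Iceberg catalog backed only by files (no REST/JDBC service)
        String prefix = "spark.sql.catalog." + catalog;
        conf.put(prefix, "org.apache.iceberg.spark.SparkCatalog");
        conf.put(prefix + ".type", "hadoop");
        conf.put(prefix + ".warehouse", warehouse);
        // Point the s3a filesystem at MinIO instead of AWS S3
        conf.put("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
        conf.put("spark.hadoop.fs.s3a.endpoint", endpoint);
        // MinIO is usually addressed with path-style URLs (http://host/bucket/key)
        conf.put("spark.hadoop.fs.s3a.path.style.access", "true");
        // Only enable TLS when the endpoint is https
        conf.put("spark.hadoop.fs.s3a.connection.ssl.enabled",
                String.valueOf(endpoint.startsWith("https://")));
        conf.put("spark.hadoop.fs.s3a.access.key", accessKey);
        conf.put("spark.hadoop.fs.s3a.secret.key", secretKey);
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> conf = build("lake", "s3a://warehouse/iceberg",
                "http://minio:9000", "admin", "password");
        // Printed as spark-submit flags; with the Java API, pass each entry to
        // SparkSession.builder().config(key, value) instead.
        conf.forEach((k, v) -> System.out.println("--conf " + k + "=" + v));
    }
}
```

With these settings, a statement like `CREATE TABLE lake.db.events (id bigint) USING iceberg` should write `data/` and `metadata/` folders under `s3a://warehouse/iceberg/db/events/`, and that table folder is the one to format as Iceberg in Dremio. Keep in mind that the Iceberg docs warn the Hadoop catalog relies on atomic rename for commits, which S3-style stores do not guarantee, so it is safest with a single writer.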
Thanks for the info, Nicolas.
It is very helpful.
I will try to configure a Hadoop-type catalog.
I used the Trino Iceberg connector (Trino 418 documentation) to create the catalog, but it does not work: Dremio just can't read it (same message).
I tried the Tabular setup from "Docker, Spark, and Iceberg: The Fastest Way to Try Iceberg!" as the REST catalog for Trino (Iceberg connector, Trino 418 documentation), then created an Iceberg table in an S3 path using Trino.
But Dremio still can't read this folder, and I receive the same message, "This folder does not contain a filesystem-based Iceberg table…". I also tried a different S3 provider, with the same result.
I've left this issue aside for a while and am not planning to continue yet. In any case, I think the solution is to implement a Hadoop or Hive catalog instead of JDBC or REST. I am quite sure the latter two will not work in Dremio.
I saw some documentation on how to configure Hadoop to connect to MinIO, and also a GitHub repository for creating such a catalog in Trino. (Sorry, but I don't have the links right now.)
But yes, it seems that Spark (or Trino) is needed.
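A likely, though unconfirmed, reason why REST and JDBC tables are rejected is the metadata layout. A Hadoop-type catalog writes `metadata/v<N>.metadata.json` plus a `metadata/version-hint.text` file naming the current version, while REST and JDBC catalogs (such as the one in tabulario/iceberg-rest) write files like `00000-<uuid>.metadata.json` and keep the "current" pointer in the catalog database, so the folder alone does not say which metadata file is live. Assuming Dremio's folder detection depends on that version hint, the hypothetical helper below checks a local copy of a table folder (e.g. mirrored from MinIO) for the Hadoop layout.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical heuristic: does a table folder follow the Hadoop-catalog metadata layout? */
public class IcebergLayoutCheck {

    public static boolean looksLikeHadoopTable(Path tableDir) throws IOException {
        Path metadata = tableDir.resolve("metadata");
        Path hint = metadata.resolve("version-hint.text");
        if (!Files.isRegularFile(hint)) {
            // No version hint: typical of tables committed through REST, JDBC or Hive catalogs
            return false;
        }
        String version = Files.readString(hint, StandardCharsets.UTF_8).trim();
        if (!version.matches("\\d+")) {
            return false;
        }
        // The hint names the current metadata file: v<N>.metadata.json (or gzip variant v<N>.gz.metadata.json)
        return Files.isRegularFile(metadata.resolve("v" + version + ".metadata.json"))
                || Files.isRegularFile(metadata.resolve("v" + version + ".gz.metadata.json"));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        boolean ok = looksLikeHadoopTable(dir);
        System.out.println(dir + (ok ? " looks like" : " does not look like")
                + " a Hadoop-catalog Iceberg table");
    }
}
```

If the check fails for a table written through the REST catalog but passes for one written with a Hadoop-type catalog, that would be consistent with the conclusion above that only Hadoop-style (filesystem) tables can be formatted directly from the folder.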
I think the best way to work with MinIO and Iceberg is to use the Nessie connector; it's pretty useful.
Right now we are using an Airflow + Spark solution that saves raw data into Iceberg tables using a Hadoop catalog. Then, in Dremio, we read the data by formatting each folder that contains the data and metadata folders. For DDL tasks and data management we use Jupyter notebooks.