If one is starting from scratch, how do Iceberg and Nessie write to Postgres? I would expect to have to create the table(s) myself and communicate that in a config file.
In the case of Dremio, is Dremio writing that catalog information to Postgres, so that pointers to the Iceberg metadata files are stored in the Postgres tables?
It’s not clear to me who does what among Iceberg, Nessie, and Dremio. On the one hand, I get the impression that Nessie and Iceberg are code libraries whose functions Dremio can call. On the other hand, if Dremio has created its own SQL statement, Dremio can use that directly without going through those libraries. But then how does the Postgres catalog get updated? The answer could be “both”.
Thanks in advance for your patience and your kind response.
Dremio doesn’t write to Nessie’s persistent store directly (Postgres in your case, I think). Dremio talks to Nessie through Nessie’s APIs to retrieve Iceberg metadata locations and to manage branching and multi-table transaction features.
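To make that division of labor concrete, here is a minimal toy sketch in Python. It is not Nessie's actual API or storage schema: `ToyCatalog`, `get_metadata_location`, `commit`, and the `s3://bucket/...` paths are all hypothetical names made up for illustration. The only point is that the engine asks the catalog service for the current metadata pointer, and only the catalog service touches its own backing store.

```python
class ToyCatalog:
    """Hypothetical stand-in for a catalog service such as Nessie (not its real API)."""

    def __init__(self):
        # Stand-in for the catalog's private backing store (Postgres, for example).
        # Maps branch name -> {table name -> metadata file location}.
        self._store = {"main": {}}

    def get_metadata_location(self, branch, table):
        """Return the current metadata pointer for a table, or None if absent."""
        return self._store[branch].get(table)

    def commit(self, branch, table, expected, new_location):
        """Swap the pointer only if it still equals what the caller last saw."""
        current = self._store[branch].get(table)
        if current != expected:
            raise RuntimeError("conflict: table changed since it was read")
        self._store[branch][table] = new_location


# The "engine" side (Dremio's role in this sketch) only calls the two methods
# above; it never reads or writes the backing store itself.
catalog = ToyCatalog()
catalog.commit("main", "sales", None, "s3://bucket/sales/metadata/v1.metadata.json")
location = catalog.get_metadata_location("main", "sales")
print(location)
```

In this picture, the tables in Postgres belong to the catalog service, so a query engine never needs to know how they are laid out.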
Benny, thanks for your reply! Your answer assumes I already have Nessie and Iceberg deployed. I guess another way to ask my question is: if I don’t have a catalog configured, what would I have to do? How do I ‘install’ Iceberg? I am assuming that would involve Postgres, but it’s not clear to me how I would implement Iceberg and/or Nessie.
I think your main decision is which Iceberg catalog you want to use. Once you decide on the catalog and set it up as a source in Dremio, SELECT and DML queries on Iceberg tables just work. There’s nothing to ‘install’ for the Iceberg table format specifically.
If you are just trying out Dremio, you can start with a filesystem source like S3 or NAS, which uses the Hadoop Iceberg catalog behind the scenes. Once you are comfortable with Dremio and Iceberg tables, you can try out a more production-ready catalog like Nessie.
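For context on what a Hadoop-style catalog means in practice: there is no separate catalog service or database, and the table's current state is tracked by files inside the table directory itself. As I understand the Iceberg layout for filesystem tables, `metadata/version-hint.text` holds the current version number N, and the current metadata file is `metadata/vN.metadata.json`. Below is a minimal Python sketch of that lookup, assuming that layout. The function name `current_metadata_file` and the demo directory are made up for illustration, and a real implementation also has to cope with a missing hint file, which this sketch does not.

```python
import tempfile
from pathlib import Path


def current_metadata_file(table_dir):
    """Resolve the current metadata file of a Hadoop-catalog-style table directory."""
    metadata_dir = Path(table_dir) / "metadata"
    # version-hint.text contains just the current version number, e.g. "2".
    version = int((metadata_dir / "version-hint.text").read_text().strip())
    return metadata_dir / f"v{version}.metadata.json"


# Build a fake table directory to show the lookup (file contents are placeholders).
demo_table = Path(tempfile.mkdtemp()) / "sales"
(demo_table / "metadata").mkdir(parents=True)
(demo_table / "metadata" / "v1.metadata.json").write_text("{}")
(demo_table / "metadata" / "v2.metadata.json").write_text("{}")
(demo_table / "metadata" / "version-hint.text").write_text("2")

resolved = current_metadata_file(demo_table)
print(resolved.name)  # v2.metadata.json
```

This is why nothing needs to be installed or created up front for the filesystem case: the pointer to the current metadata lives next to the data.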