Automating Iceberg table update

Hi,
I have the original Parquet files residing in an S3 folder (let's call it S3-A).

I created a new Iceberg table in a different S3 folder (let's call it S3-B).
I used ‘COPY INTO’ to load the Parquet files from the S3-A table into the Iceberg table in S3-B.

When I add new Parquet files to the S3-A folder,
I can see the new records from those files once the metadata is refreshed.

However, the Iceberg table in S3-B is not updated.

Does the ‘COPY INTO’ command have to be run on the Iceberg table each time I add new Parquet files to the S3-A folder?

S3-B (Iceberg) has no link or reference to S3-A (Parquet). An ALTER PDS on S3-A only makes Dremio aware of the new files in that location (S3-A). To see the new data in S3-B, you have to copy the new files with COPY INTO or run a MERGE command.
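
As a rough sketch of those two steps, the statements might look like the following, composed here in Python so they can be handed to whatever client submits SQL to Dremio. The dataset path `s3a.landing`, the table `s3b.events`, and the key column `id` are hypothetical names, and the exact ALTER PDS and MERGE syntax should be checked against the documentation for your Dremio version.

```python
# Hypothetical names: replace with your own source, folder, table, and key.
PARQUET_DATASET = "s3a.landing"   # promoted Parquet folder in S3-A
ICEBERG_TABLE = "s3b.events"      # Iceberg table stored in S3-B
KEY_COLUMN = "id"                 # column that identifies a record


def refresh_sql(dataset: str) -> str:
    """Make Dremio aware of new files in the Parquet folder (affects S3-A only)."""
    return f"ALTER PDS {dataset} REFRESH METADATA"


def merge_sql(target: str, source: str, key: str) -> str:
    """Upsert rows from the Parquet dataset into the Iceberg table.

    Assumes source and target share the same columns, so the
    `UPDATE SET *` / `INSERT *` shorthand applies; verify for your version.
    """
    return (
        f"MERGE INTO {target} AS t "
        f"USING {source} AS s "
        f"ON (t.{key} = s.{key}) "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


# Order matters: refresh first so the MERGE can see the new files.
statements = [
    refresh_sql(PARQUET_DATASET),
    merge_sql(ICEBERG_TABLE, PARQUET_DATASET, KEY_COLUMN),
]
for stmt in statements:
    print(stmt)
```

A MERGE on a key column avoids loading the same records twice if the statement is re-run, which is the main reason to prefer it over repeating COPY INTO on the whole folder.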

Let me get back to you with the most efficient steps.

Is there a way to orchestrate / automate the MERGE command within Dremio, or is that something that must be done in an external tool?

No, not MERGE specifically. For that you will have to use something like dbt or a script that calls the REST API.
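
A script of that kind could be as small as the sketch below, which posts one SQL statement to Dremio's SQL endpoint and could be run from cron or any other scheduler. It assumes a Dremio Software coordinator at the hypothetical address `http://localhost:9047`, a personal access token, and the `/api/v3/sql` endpoint returning a job `id`; Dremio Cloud uses a different base URL and path. The MERGE text and table names are placeholders.

```python
import json
import urllib.request


def build_sql_request(base_url: str, token: str, sql: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that submits one SQL statement."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v3/sql",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumes a personal access token
            "Content-Type": "application/json",
        },
    )


def submit(base_url: str, token: str, sql: str) -> str:
    """Send the statement and return the job id reported by Dremio."""
    request = build_sql_request(base_url, token, sql)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["id"]


# Placeholder statement; table names and key column are hypothetical.
MERGE_SQL = (
    "MERGE INTO s3b.events AS t USING s3a.landing AS s "
    "ON (t.id = s.id) "
    "WHEN MATCHED THEN UPDATE SET * "
    "WHEN NOT MATCHED THEN INSERT *"
)

# To run it for real (requires a reachable Dremio instance and a valid token):
#   job_id = submit("http://localhost:9047", "<your-token>", MERGE_SQL)
```

The call only queues a job, so a production script would also poll the job status before treating the MERGE as finished.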

For ingestion you can use auto-ingest via pipes. It is now available on Dremio Cloud for AWS and will be available on Dremio Software soon. To find out more, read: Introducing Auto Ingest Pipes: Event-Driven ingestion made easy | Dremio