Based on the documentation, it is not clear to me how Dremio with reflections works on an updated dataset.
Let's say my reflection was refreshed at 12:00 and I updated (appended to) my underlying Delta Lake dataset at 12:15. When I run a query at 12:30, will it include the data from 12:15 as well, or just the reflection data from 12:00?
@andormarkus If the update was to the same file, then you just need to refresh the reflection, either via the API or by waiting for the background refresh. If the ETL adds new files to the lake, then you would first need to refresh the metadata, either via SQL or by waiting for the background metadata refresh to complete, before refreshing the reflection.
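For the "new files" case, the two manual steps could be sketched in Python roughly as below. This is a minimal sketch and not an official client: the function name `build_refresh_requests`, the base URL, the token, the table path, and the dataset id are all placeholders, and the endpoints (`POST /api/v3/sql` and `POST /api/v3/catalog/{id}/refresh`), the bearer-token header, and the `ALTER TABLE ... REFRESH METADATA` statement are assumptions to verify against the REST and SQL reference for your Dremio version and source type. The code only builds the requests and does not send them.

```python
import json
import urllib.request


def build_refresh_requests(base_url, token, table_path, dataset_id):
    """Build (but do not send) the two requests, in the order described above.

    All arguments are hypothetical placeholders supplied by the caller.
    """
    # Assumption: a personal access token passed as a bearer token.
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

    # Step 1: refresh the metadata so newly added files become known.
    # Assumption: the SQL endpoint accepts a JSON body of the form {"sql": ...}.
    sql = f"ALTER TABLE {table_path} REFRESH METADATA"
    metadata_request = urllib.request.Request(
        url=f"{base_url}/api/v3/sql",
        data=json.dumps({"sql": sql}).encode("utf-8"),
        headers=headers,
        method="POST",
    )

    # Step 2: ask for a refresh of the reflections that depend on the dataset.
    # Assumption: this endpoint takes the catalog id of the dataset.
    reflection_request = urllib.request.Request(
        url=f"{base_url}/api/v3/catalog/{dataset_id}/refresh",
        headers=headers,
        method="POST",
    )

    return [metadata_request, reflection_request]
```

To actually run the steps you would pass each request to `urllib.request.urlopen` in order. If the SQL call is submitted as an asynchronous job in your setup, the metadata job would need to finish before the reflection refresh is triggered.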
Delta Lake does not change files; it only adds or removes them.
Let's say I refresh my metadata every 10 minutes and I have the following situation:
12:00 - reflection was updated (next update 13:00)
12:15 - new files were added to the lake
12:20 - metadata was refreshed
12:30 - query runs
In this case, will Dremio pick up the data up to 12:00 from the reflection and the rest from S3, or will it pick up all of the data from S3?
Is a metadata refresh an “expensive” operation?
@andormarkus 100% of the data will come from the reflection, but it would be stale (as of 12:00). A metadata refresh can be expensive if you have too many small files.
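To make the timeline concrete, here is a toy Python model of the behavior described in this answer. It is only a sketch and not Dremio code: the function names, the event names, and the file names are hypothetical, and the two events before 12:00 are added as an assumption so that the reflection has something to hold.

```python
def to_minutes(hhmm):
    """Convert an 'HH:MM' string to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def files_served_by_reflection(events, query_time):
    """Replay events up to query_time and return the files the reflection holds.

    Toy model: a query accelerated by the reflection sees only what was
    materialized at the last reflection refresh.
    """
    lake_files = set()       # files physically present in the lake
    known_files = set()      # files known after the last metadata refresh
    reflection_files = set() # files materialized at the last reflection refresh

    for time, kind, payload in sorted(events, key=lambda e: to_minutes(e[0])):
        if to_minutes(time) > to_minutes(query_time):
            break
        if kind == "add_files":
            lake_files |= set(payload)
        elif kind == "metadata_refresh":
            known_files = set(lake_files)
        elif kind == "reflection_refresh":
            reflection_files = set(known_files)

    return sorted(reflection_files)


timeline = [
    # Assumed history before 12:00 (not part of the question).
    ("11:50", "add_files", ["part-0001.parquet", "part-0002.parquet"]),
    ("11:55", "metadata_refresh", None),
    # Timeline from the question.
    ("12:00", "reflection_refresh", None),
    ("12:15", "add_files", ["part-0003.parquet"]),
    ("12:20", "metadata_refresh", None),
]

print(files_served_by_reflection(timeline, "12:30"))
```

In this model the 12:30 query sees only the two files materialized at 12:00. The file added at 12:15 becomes visible through the reflection only after the next reflection refresh, which is 13:00 in the question.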