Reflection not using latest data

Hi!

I have a bucket where a new JSON file is added every 5 minutes. We need the data in near real time, so I enabled ‘accelerator.enable.subhour.policies’ and scheduled an incremental reflection refresh every 5 minutes.

The reflections do get refreshed, but the new records still don't show up; it still takes around an hour before they appear.

How do I fix this?

Thanks.

@emdomingo Reflection refreshes are based on the latest collected metadata, and by default that metadata is only refreshed every hour. As soon as new JSON files arrive, run “ALTER PDS <dataset> REFRESH METADATA”; the next reflection refresh should then pick up the new data. You can also trigger the reflection refresh through the API instead of relying on the background refresh schedule.
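A minimal sketch of that sequence, assuming Dremio's REST API v3 with a personal access token sent as a Bearer token: submit the metadata refresh through POST /api/v3/sql, wait for the job via GET /api/v3/job/{id} (checking its "jobState" field), then call POST /api/v3/catalog/{id}/refresh to refresh the reflections that depend on the table. The base URL, token, dataset path, dataset ID and the `on_new_file` hook are hypothetical placeholders; please verify the endpoints and response fields against the API docs for your Dremio version.

```python
import json
import time
import urllib.request


def sql_request(base_url: str, token: str, sql: str) -> urllib.request.Request:
    """Build a POST /api/v3/sql request that submits `sql` as a Dremio job."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v3/sql",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def reflection_refresh_request(base_url: str, token: str, dataset_id: str) -> urllib.request.Request:
    """Build a POST /api/v3/catalog/{id}/refresh request, which asks Dremio to
    refresh the reflections that depend on the given table."""
    return urllib.request.Request(
        f"{base_url}/api/v3/catalog/{dataset_id}/refresh",
        data=b"",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


def wait_for_job(base_url: str, token: str, job_id: str,
                 timeout_s: float = 60.0, interval_s: float = 1.0) -> None:
    """Poll GET /api/v3/job/{id} until the job completes, fails, or times out."""
    req = urllib.request.Request(
        f"{base_url}/api/v3/job/{job_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        with urllib.request.urlopen(req) as resp:
            state = json.load(resp)["jobState"]
        if state == "COMPLETED":
            return
        if state in ("FAILED", "CANCELED"):
            raise RuntimeError(f"job {job_id} ended in state {state}")
        time.sleep(interval_s)
    raise TimeoutError(f"job {job_id} did not finish within {timeout_s}s")


def on_new_file(base_url: str, token: str, dataset_path: str, dataset_id: str) -> None:
    """Hypothetical hook to call whenever a new JSON file lands in the bucket."""
    metadata_sql = f"ALTER PDS {dataset_path} REFRESH METADATA"
    # The SQL endpoint runs the statement asynchronously and returns a job id.
    with urllib.request.urlopen(sql_request(base_url, token, metadata_sql)) as resp:
        job_id = json.load(resp)["id"]
    # Only refresh reflections once the new metadata is actually in place.
    wait_for_job(base_url, token, job_id)
    with urllib.request.urlopen(reflection_refresh_request(base_url, token, dataset_id)) as resp:
        resp.read()
```

Calling something like `on_new_file("http://localhost:9047", token, "s3_source.bucket.events", "<dataset-id>")` from whatever process notices the new file (a bucket event or a 5-minute cron job) replaces the hourly background metadata refresh for that dataset. Waiting on the metadata job first matters, because a reflection refresh that starts before the metadata refresh finishes would still see the old file list.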

Another option is to load the JSON files into an Iceberg table using COPY INTO.
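A small sketch of the COPY INTO approach, assuming Dremio's `COPY INTO <table> FROM '@<storage_location>/<path>' FILES (...) FILE_FORMAT 'json'` form. Listing only the newly arrived file in `FILES` is one way to avoid reloading older files on each 5-minute run. The table name, storage location, path and file names are hypothetical, and the resulting statement can be submitted like any other SQL (for example through the REST API's SQL endpoint).

```python
from typing import Sequence


def quote_literal(value: str) -> str:
    """Quote a SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def copy_into_sql(table: str, storage_location: str, path: str,
                  new_files: Sequence[str]) -> str:
    """Build a COPY INTO statement that loads only the listed JSON files
    from '@<storage_location>/<path>' into an existing Iceberg table."""
    if not new_files:
        raise ValueError("new_files must not be empty")
    files = ", ".join(quote_literal(f) for f in new_files)
    source = quote_literal("@" + storage_location + "/" + path)
    return (
        f"COPY INTO {table} "
        f"FROM {source} "
        f"FILES ({files}) "
        f"FILE_FORMAT 'json'"
    )
```

For example, `copy_into_sql("nessie.events", "s3_source", "bucket/events", ["events_0005.json"])` produces `COPY INTO nessie.events FROM '@s3_source/bucket/events' FILES ('events_0005.json') FILE_FORMAT 'json'`.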


I like Bali’s suggestion. Ingesting into an Iceberg table is effectively like maintaining an incremental reflection, so you don't need reflections at all and can query the Iceberg table directly. If the Iceberg table is stored in a Hive catalog, there's at most a 60-second delay before the data is visible in Dremio. If you use Nessie, there is no delay.

Metadata refresh is a big pain, which is why we are working on eliminating it entirely for table formats such as Iceberg and Delta.
