Friends, good morning, how are you?
I have an extraction process in Talend that converts an Excel spreadsheet (.CSV), extracted hourly, into Parquet and sends it to S3. Still in Talend, I then run the command ALTER PDS <parquet_path> FORGET METADATA, followed by another Dremio command, ALTER PDS <parquet_path> REFRESH METADATA FORCE UPDATE.
In Dremio, my idea was to set the PDS to "never refresh" and "never expire" so that Talend controls when the PDS expires and is refreshed. This is not working reliably: sometimes it works, sometimes it doesn't.
Is this good practice? What would you recommend in this case?
Thank you very much in advance.
@Gerbasi You are mixing up metadata refresh and reflection refresh. "Never refresh" and "never expire" apply only to reflections. You cannot fully turn off the automatic background metadata refresh; you can only decrease its frequency. If you decide to do that, make sure to set the expiry to 3x the refresh interval as well. You configure this in the source settings, on the Metadata tab (not the Reflections tab).
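The "expiry at 3x the refresh interval" rule of thumb can be written as a small helper. This is a minimal sketch that only builds a dictionary; the function name metadata_policy is hypothetical, and the key names datasetRefreshAfterMs and datasetExpireAfterMs are assumed to match the metadataPolicy object of Dremio's v3 Source API, so check them against the documentation for your Dremio version before using them.

```python
from datetime import timedelta


def metadata_policy(refresh_every: timedelta, expiry_factor: int = 3) -> dict:
    """Hypothetical helper: build a metadata-policy fragment in which the
    dataset expiry is expiry_factor times the dataset refresh interval.

    Key names are ASSUMED to match Dremio's v3 Source 'metadataPolicy'.
    """
    refresh_ms = int(refresh_every.total_seconds() * 1000)
    return {
        "datasetRefreshAfterMs": refresh_ms,
        "datasetExpireAfterMs": refresh_ms * expiry_factor,
    }
```

For example, metadata_policy(timedelta(hours=8)) gives a refresh interval of 28,800,000 ms and an expiry of 86,400,000 ms (24 hours), which keeps the 3x ratio while reducing background refreshes to three per day.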
Also, you do not need to FORGET the metadata; just run REFRESH METADATA. FORCE UPDATE is not needed either.
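If the Talend job submits the statement over HTTP instead of a JDBC/ODBC connection, the simplified command can be sent as shown here. This is a minimal sketch using only the Python standard library; it assumes Dremio's v3 REST endpoint POST /api/v3/sql, which accepts {"sql": ...} and returns a job id. The helper names and the dataset path are hypothetical, and the Authorization header value depends on your Dremio version (for example "_dremio" plus a login token, or "Bearer" plus a personal access token).

```python
import json
import urllib.request


def refresh_metadata_sql(dataset_path: str) -> str:
    """Plain metadata refresh: no FORGET METADATA first, no FORCE UPDATE."""
    return f"ALTER PDS {dataset_path} REFRESH METADATA"


def build_sql_request(base_url: str, auth_header: str, sql: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the ASSUMED /api/v3/sql endpoint."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v3/sql",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": auth_header},
        method="POST",
    )


def submit(req: urllib.request.Request) -> str:
    """Send the request and return the job id from the JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["id"]
```

After the Parquet file lands in S3, the job would call submit(build_sql_request(base_url, auth_header, refresh_metadata_sql(path))), where base_url, auth_header and path are your own values. Building the request separately from sending it makes it easy to log or inspect the exact statement before it runs.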
@balaji.ramaswamy, thanks for the feedback.
Honestly, I still don't understand how this works in Dremio: updating the PDS metadata when new data is sent to S3, and then updating that data in the VDS via its reflection.
For example, I have a data update once a day at 7:00 AM, and I configured the metadata settings as shown in the image below:
What I would like is this: as soon as the new data is saved and the metadata is updated, I trigger the VDS reflection so it picks up the new information, and Dremio does not need to run any further reflection refreshes for the rest of the day.
If you have any documentation or videos that explain better how reflections work, I'd be grateful if you could share them.
Thanks.
@Gerbasi If you have a reflection on the VDS, you have to do the following, and only in this order:
- New data comes into the lake.
- Refresh the metadata (not the reflection: the screenshot you posted is of the reflection tab; there is a separate metadata page). You can use a dataset-level refresh if you want it done right away.
- Refresh the reflection. You can use the API refresh if you want it done right away.
- Refresh metadata via SQL: Refreshing Metadata | Dremio Documentation
- Refresh reflection via API: Table | Dremio Documentation. The ID is the PDS ID; if the VDS has multiple PDSs, you can use any one of them.
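The ordering above (metadata first, reflection only after the metadata refresh has finished) could be scripted as the last step of the Talend job. This is a minimal sketch under several assumptions: that Dremio's v3 REST API exposes POST /api/v3/sql (returns a job id), GET /api/v3/job/{id} (returns a jobState such as COMPLETED, FAILED or CANCELED), and POST /api/v3/catalog/{id}/refresh (refreshes the reflections that depend on the PDS with that ID). The names make_http_send, wait_for_job and refresh_after_load are hypothetical; verify the endpoints against the documentation for your Dremio version.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# Hypothetical transport: (method, path, json_body) -> parsed JSON response.
Send = Callable[[str, str, Optional[dict]], dict]

TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELED"}


def make_http_send(base_url: str, auth_header: str) -> Send:
    """Real transport using only the standard library."""
    def send(method: str, path: str, body: Optional[dict]) -> dict:
        data = json.dumps(body).encode("utf-8") if body is not None else None
        req = urllib.request.Request(
            f"{base_url.rstrip('/')}{path}",
            data=data,
            headers={"Content-Type": "application/json", "Authorization": auth_header},
            method=method,
        )
        with urllib.request.urlopen(req) as resp:
            raw = resp.read()
        # Some endpoints may answer with an empty body (e.g. 204 No Content).
        return json.loads(raw) if raw else {}
    return send


def wait_for_job(send: Send, job_id: str, poll_seconds: float = 2.0,
                 timeout: float = 600.0) -> str:
    """Poll the ASSUMED job endpoint until the job reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while True:
        state = send("GET", f"/api/v3/job/{job_id}", None).get("jobState", "")
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() > deadline:
            raise TimeoutError(f"job {job_id} still {state!r} after {timeout}s")
        time.sleep(poll_seconds)


def refresh_after_load(send: Send, dataset_path: str, pds_id: str,
                       **wait_kwargs) -> None:
    """Step 2 then step 3: refresh metadata, wait, then refresh reflections."""
    # Plain REFRESH METADATA: no FORGET METADATA, no FORCE UPDATE.
    job = send("POST", "/api/v3/sql",
               {"sql": f"ALTER PDS {dataset_path} REFRESH METADATA"})
    state = wait_for_job(send, job["id"], **wait_kwargs)
    if state != "COMPLETED":
        raise RuntimeError(f"metadata refresh ended in state {state}")
    # Only now refresh the reflections that depend on this PDS.
    send("POST", f"/api/v3/catalog/{pds_id}/refresh", None)
```

In production you would pass make_http_send("http://<dremio-host>:9047", "<auth header>") as send; keeping the transport injectable lets you check the call order with a fake before pointing it at a real cluster. Combined with a reduced background refresh frequency, this means the reflection is refreshed right after the daily 7:00 AM load instead of on a fixed schedule.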