Failure during setup error when querying from REST application

Hello, we are getting a "Failure during setup" error when running a query through our REST application, with an INVALID_DATASET_METADATA error in the profile.
Running the same query from the UI works fine, and afterwards the same query from the REST application also shows green in the logs.
Changing the WHERE clause (id in (a different value)) makes the REST application fail again.
Running REFRESH METADATA over the whole table resolves the issue for any WHERE clause, but the error returns after a while.

We are querying an AWS Glue table directly (simple SELECT):

select * from "Aws glue".xxxx
where id in (50372)
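
For context, this is roughly how our REST application submits the statement. This is only a simplified sketch, not our production code: the coordinator URL and token are placeholders, and it assumes Dremio's standard SQL REST endpoint (/api/v3/sql) with job polling.

```python
# Simplified sketch of the REST client; coordinator URL and token are placeholders.
import time
import requests

DREMIO = "https://dremio.example.com:9047"
HEADERS = {
    "Authorization": "Bearer <personal-access-token>",  # or the _dremio<token> scheme
    "Content-Type": "application/json",
}

SQL = 'select * from "Aws glue".xxxx where id in (50372)'

# Submit the statement; the SQL API responds with a job id.
job_id = requests.post(f"{DREMIO}/api/v3/sql", json={"sql": SQL}, headers=HEADERS).json()["id"]

# Poll the job until it reaches a terminal state.
while True:
    state = requests.get(f"{DREMIO}/api/v3/job/{job_id}", headers=HEADERS).json()["jobState"]
    if state in ("COMPLETED", "FAILED", "CANCELED"):
        break
    time.sleep(1)

print(job_id, state)  # with stale metadata this ends up FAILED (INVALID_DATASET_METADATA)
```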

Log attached: 63121f32-633a-4ffe-972a-96f4c7d1176e.zip (19.0 KB)

Build
25.0.7-202407181421320869-2632b04f
Edition
AWS Edition (activated)

@vladislav-stolyarov This is the actual error

Caused By (java.io.FileNotFoundException) Version of file changed data/media_segments/data.gz.parquet

The UI query may work because it might be truncating the query results. Can you please send us the UI profile too?

Any chance you are overwriting this file?

Yes, you are right - we overwrite this file occasionally.
Here is how it looks: the UI run was able to resolve the metadata change, and after that the API calls succeed.

This is the UI call profile:
d122ba14-9c6c-4ab0-92b8-fb2370558d03.zip (33.9 KB)

Then my question is: why was the UI call able to reset the error and start with the new metadata, but the API call was not? Is it possible to work around this in the REST API call?

Are there any other workarounds?

We have a single-file Glue table without partitions, so we cannot do a partition switch or anything like that; on previous versions of Dremio this issue did not seem so critical.
Should we call refresh metadata right after we change the file?

@vladislav-stolyarov This is expected behavior; if you look, the job has 2 attempts:

attempt 1 - File not found
attempt 2 - successful query

But if you also select the “internal” job type, you will see another “REFRESH DATASET” job that was triggered between the 2 attempts; that is why the second attempt started 20 seconds later. I do not see any issues with this. The other option is what you are suggesting, i.e. calling the refresh yourself right after you change the file (see the sketch after the list):

  • Delete file
  • Add file
  • Run “ALTER PDS REFRESH METADATA”
  • Run UI or REST query
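
Roughly, that manual flow can be scripted like this. It is only a sketch using the same SQL REST endpoint mentioned above; the coordinator URL, token, and table path are placeholders, not values taken from your environment.

```python
# Sketch of the manual workaround via the SQL REST API; URL, token and
# table path are placeholders.
import requests

DREMIO = "https://dremio.example.com:9047"
HEADERS = {"Authorization": "Bearer <personal-access-token>",
           "Content-Type": "application/json"}

def submit_sql(sql: str) -> str:
    """POST a statement to /api/v3/sql and return the job id."""
    resp = requests.post(f"{DREMIO}/api/v3/sql", json={"sql": sql}, headers=HEADERS)
    resp.raise_for_status()
    return resp.json()["id"]

# 1. Delete and re-add (or overwrite) data.gz.parquet in S3, outside of Dremio.
# 2. Ask Dremio to re-read the table metadata; wait for this job to reach
#    COMPLETED (poll /api/v3/job/<id> as in the client sketch earlier).
refresh_job = submit_sql('ALTER PDS "Aws glue".xxxx REFRESH METADATA')

# 3. Only then submit the normal query; it should succeed on the first attempt.
query_job = submit_sql('select * from "Aws glue".xxxx where id in (50372)')
```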

Yep, but why is this type of job not triggered when the query fails multiple times from REST, and only triggered after a UI query?

@vladislav-stolyarov Understood, it looks like the retry did not work via REST. Can you try via JDBC and see if the retry works? I just want to make sure it is indeed REST-specific.

For the ODBC client it does 2 attempts, with a metadata refresh behind the scenes.
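
(Something like the following is enough to reproduce the check from a script. It is just a sketch; the DSN name and credentials are placeholders for an already configured Dremio ODBC data source.)

```python
# Rough sketch of the ODBC-side check; DSN name and credentials are placeholders.
import pyodbc

conn = pyodbc.connect("DSN=Dremio;UID=<user>;PWD=<password>", autocommit=True)
cur = conn.cursor()

# Same statement the REST application runs; over ODBC the server retried
# internally (attempt 1: file not found, attempt 2: successful query).
cur.execute('select * from "Aws glue".xxxx where id in (50372)')
print(len(cur.fetchall()), "rows")
conn.close()
```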

82025f33-19d8-4531-b859-dc2d3842ba7b.zip (30.8 KB)

Thanks for testing this. It looks like REST is not automatically firing the refresh; we would need to prioritize and reproduce this internally. Meanwhile, can you do the below?

  • File removed
  • Run ALTER PDS <PDS_NAME> REFRESH METADATA
  • Run REST query

Sorry, I missed your message.
What do you mean by "File removed"? In our scenario the file is overwritten, not fully removed. Do you want me to remove it, refresh metadata, and then run the query?

@vladislav-stolyarov Sorry, I assumed the file was removed; can you please try the below:

  • File replaced
  • Run “ALTER PDS REFRESH METADATA”
  • Run REST query

Yes, it works fine. That's the original solution we applied in the end.

Another one. As I said, manually calling refresh metadata works fine.
But my problem is that it takes so long: 05m:06s to refresh. It's a Glue table with a single Parquet file (5.5 MB).
Can you help?
Here is a profile, if it helps:
c2d0446d-fb70-4fac-a483-beed85aa03cb.zip (7.4 KB)

@vladislav-stolyarov There will be an internal job called “REFRESH DATASET” for this table, "Aws glue"."fcp_live"."media_segments"; can you please send that profile?

  • Click on the Jobs page
  • For Job type, choose only “internal”
  • In the search bar, enter media_segments

3940a23a-d82f-4fd5-9c60-caf71a8d44c8.zip (354.8 KB)
Profile attached.

Meanwhile, in the last 3 days I can see the refresh works fine in just 1-5 seconds, so it looks like it was a temporary issue. The only change I made was adding a reflection over media_segments.

@vladislav-stolyarov All the time is spent in WRITER_COMMITTER; this is a known limitation with Parquet. Do you have plans to move to the Iceberg table format?

Can you please elaborate?
I do not quite understand what this means and why changing from Parquet to Iceberg would help here. media_segments is a single-file Parquet (5 MB) dataset; I guess reading and parsing it won't take more than a couple of seconds.
So what does Dremio wait on, and why wouldn't it do the same if media_segments were an Iceberg table instead of a small Parquet file?

Also interesting: from the execution log over a period of time I can see that the internal metadata refresh job takes either 1-3 seconds or 2:55-3:05 minutes, so this commit/write wait always has a pretty deterministic duration.

@vladislav-stolyarov There are known limitations in the code where WRITER_COMMITTER takes time; with Iceberg, the metadata refresh is eliminated completely.