Failure during setup error when querying from REST application

Hello, we are getting a "Failure during setup" error when running a query through our REST application, with an INVALID_DATASET_METADATA error in the profile.
Running the same query from the UI works fine, and afterwards the same query from the REST application also shows green in the logs.
Changing the WHERE clause (id in (a different value)) makes the REST application fail again.
Running REFRESH METADATA over the whole table resolves the issue for any WHERE clause, but the error returns after a while.

We are querying an AWS Glue table directly (simple SELECT):

select * from "Aws glue".xxxx
where id in (50372)
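
For context, this is roughly how our REST application submits the statement. This is only a simplified sketch, not our production code: the coordinator URL and token are placeholders, and it assumes Dremio's standard SQL REST endpoint (/api/v3/sql) with job polling.

```python
# Simplified sketch of the REST client; coordinator URL and token are placeholders.
import time
import requests

DREMIO = "https://dremio.example.com:9047"
HEADERS = {
    "Authorization": "Bearer <personal-access-token>",  # or the _dremio<token> scheme
    "Content-Type": "application/json",
}

SQL = 'select * from "Aws glue".xxxx where id in (50372)'

# Submit the statement; the SQL API responds with a job id.
job_id = requests.post(f"{DREMIO}/api/v3/sql", json={"sql": SQL}, headers=HEADERS).json()["id"]

# Poll the job until it reaches a terminal state.
while True:
    state = requests.get(f"{DREMIO}/api/v3/job/{job_id}", headers=HEADERS).json()["jobState"]
    if state in ("COMPLETED", "FAILED", "CANCELED"):
        break
    time.sleep(1)

print(job_id, state)  # with stale metadata this ends up FAILED (INVALID_DATASET_METADATA)
```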

Log attached: 63121f32-633a-4ffe-972a-96f4c7d1176e.zip (19.0 KB)

Build
25.0.7-202407181421320869-2632b04f
Edition
AWS Edition (activated)

@vladislav-stolyarov This is the actual error

Caused By (java.io.FileNotFoundException) Version of file changed data/media_segments/data.gz.parquet

The UI query may work because it might be truncating the query results. Can you please send us the UI profile too?

Any chance you are overwriting this file?

Yes, you are right - we overwrite this file occasionally.
Here is how it looks: the UI run was able to resolve the metadata change, and after that the API calls succeed.

This is the UI call profile:
d122ba14-9c6c-4ab0-92b8-fb2370558d03.zip (33.9 KB)

Then my question is: why was the UI call able to reset the error and start with the new metadata, but the API call was not? Is it possible to work around this in the REST API call?

Are there any other workarounds?

We have a single-file Glue table without partitions, so we cannot do a partition switch or anything like that; on previous versions of Dremio this issue did not seem so critical.
Should we call refresh metadata right after we change the file?

@vladislav-stolyarov This is expected behavior; if you look, the job has 2 attempts:

attempt 1 - File not found
attempt 2 - successful query

But if you also select the “internal” job type, you will see another “REFRESH DATASET” job that was triggered between the 2 attempts; that is why the second attempt started 20 seconds later. I do not see any issues with this. The other option is what you are suggesting, i.e. calling the refresh yourself right after you change the file (see the sketch after the list):

  • Delete file
  • Add file
  • Run “ALTER PDS REFRESH METADATA”
  • Run UI or REST query
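
Roughly, that manual flow can be scripted like this. It is only a sketch using the same SQL REST endpoint mentioned above; the coordinator URL, token, and table path are placeholders, not values taken from your environment.

```python
# Sketch of the manual workaround via the SQL REST API; URL, token and
# table path are placeholders.
import requests

DREMIO = "https://dremio.example.com:9047"
HEADERS = {"Authorization": "Bearer <personal-access-token>",
           "Content-Type": "application/json"}

def submit_sql(sql: str) -> str:
    """POST a statement to /api/v3/sql and return the job id."""
    resp = requests.post(f"{DREMIO}/api/v3/sql", json={"sql": sql}, headers=HEADERS)
    resp.raise_for_status()
    return resp.json()["id"]

# 1. Delete and re-add (or overwrite) data.gz.parquet in S3, outside of Dremio.
# 2. Ask Dremio to re-read the table metadata; wait for this job to reach
#    COMPLETED (poll /api/v3/job/<id> as in the client sketch earlier).
refresh_job = submit_sql('ALTER PDS "Aws glue".xxxx REFRESH METADATA')

# 3. Only then submit the normal query; it should succeed on the first attempt.
query_job = submit_sql('select * from "Aws glue".xxxx where id in (50372)')
```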

Yep, but why is this type of job not triggered when the query fails multiple times from REST, and only triggered after a UI query?

@vladislav-stolyarov Understood, it looks like the retry did not work via REST. Can you try via JDBC and see if the retry works? I just want to make sure it is indeed REST-specific.

For the ODBC client it does 2 attempts, with a metadata refresh behind the scenes.
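
(Something like the following is enough to reproduce the check from a script. It is just a sketch; the DSN name and credentials are placeholders for an already configured Dremio ODBC data source.)

```python
# Rough sketch of the ODBC-side check; DSN name and credentials are placeholders.
import pyodbc

conn = pyodbc.connect("DSN=Dremio;UID=<user>;PWD=<password>", autocommit=True)
cur = conn.cursor()

# Same statement the REST application runs; over ODBC the server retried
# internally (attempt 1: file not found, attempt 2: successful query).
cur.execute('select * from "Aws glue".xxxx where id in (50372)')
print(len(cur.fetchall()), "rows")
conn.close()
```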

82025f33-19d8-4531-b859-dc2d3842ba7b.zip (30.8 KB)

Thanks for testing this. It looks like REST is not automatically firing the refresh; we would need to prioritize and reproduce this internally. Meanwhile, can you do the below?

  • File removed
  • Run ALTER PDS <PDS_NAME> REFRESH METADATA
  • Run REST query

Sorry, I missed your message.
What do you mean by "File removed"? In our scenario the file is overwritten, not fully removed. Do you want me to remove it, refresh metadata, and then run the query?

@vladislav-stolyarov Sorry, I assumed the file was removed; can you please try the below:

  • File replaced
  • Run “ALTER PDS REFRESH METADATA”
  • Run REST query

Yes, it works fine. That's the original solution we applied in the end.

Another one. As I said, manually calling refresh metadata works fine.
But my problem is that it takes so long: 05m:06s to refresh. It's a Glue table with a single Parquet file (5.5 MB).
Can you help?
Here is a profile, if it helps:
c2d0446d-fb70-4fac-a483-beed85aa03cb.zip (7.4 KB)

@vladislav-stolyarov There will be an internal job called “REFRESH DATASET” for this table, "Aws glue"."fcp_live"."media_segments"; can you please send that profile?

  • Click on the Jobs page
  • For Job type, choose only “internal”
  • In the search bar, enter media_segments

3940a23a-d82f-4fd5-9c60-caf71a8d44c8.zip (354.8 KB)
Profile attached.

Meanwhile, in the last 3 days I can see the refresh works fine in just 1-5 seconds, so it looks like it was a temporary issue. The only change I made was adding a reflection over media_segments.

@vladislav-stolyarov All the time is spent in WRITER_COMMITTER; this is a known limitation with Parquet. Do you have plans to move to the Iceberg table format?

Can you please elaborate?
I do not quite understand what this means and why changing from Parquet to Iceberg would help here. media_segments is a single-file Parquet (5 MB) dataset; I guess reading and parsing it won't take more than a couple of seconds.
So what does Dremio wait on, and why wouldn't it do the same if media_segments were an Iceberg table instead of a small Parquet file?

Also interesting: from the execution log over a period of time I can see that the internal metadata refresh job takes either 1-3 seconds or 2:55-3:05 minutes, so this commit/write wait always has a pretty deterministic duration.

@vladislav-stolyarov There are known limitations in the code where WRITER_COMMITTER takes time; with Iceberg, the metadata refresh is eliminated completely.