REST results incomplete on 4.2.2 and 4.3

I am issuing a query to the REST API with /api/v3/sql. After the results are ready I am loading them with /api/v3/job/JOB_ID/results?offset=0&limit=500.

The results at 1772 but when I step through the results the 0 offset returns 500. The 500 offset only returns 139. The other two batches (1000 and 1500) returns 0. The total rows returned is 639.

This was working in our environment on version 4.1.3.

The cluster is deployed to AWS EKS. I have tested with our normal 4 executor nodes as well as running with just 1 executor node. There is 1 coordinator node.

Any ideas what might be going on?

1 Like

@mlilius, this is a known issue we will address in a future release. The current workaround is to adjust the offset according to how many records were last returned. This generally should not happen for most queries.

@ben, first thank you for taking the time.

I’m assuming you mean if I get 500 rows in the first batch then 215 in the second batch the third batch offset should be 715. Is that correct? If it is that did not work.

Interestingly I have a result set that should have 1398 rows. When I get offset 696 with a limit of 500 there are 9 rows. When I increase the offset by just one and the same 500 limit I get 0 rows.

@mlilius, can you provide the query profile for the job? The other instances where we have seen this problem, the full set of records is eventually available, even though you have to page through them in the way I described.

If you run the exact same query in the Dremio UI, does is it appear you are able to scroll through all the results? If you open your browsers developer tools and track the requests being made to the jobs endpoint, do you see similar behavior?

@ben

Here is the query profile:
4d9a187c-dc0a-4777-809b-1ed5f6f69f03.zip (55.1 KB)

When I run (not preview) it via the UI it loads the first 3 batches of 100. The forth returns 0.

When I use postman to load the data from /apiv2/job/JOBID/data?offset=#&limit=100 just like the UI does here are the row counts for each offset:
0: 100
100: 100
200: 100
300: 0
400: 66
every other step returns 0

With postman again and using /api/v3/job/JOBID/results?offset=#&limit=100 the number of rows returned matches the same pattern from above.

Thanks @mlilius, when you do a RUN (rather than a Preview) of the query in the UI, are you able to see all of the records? I see the query should return 1,772 records in total.

This is the first instance of this issue where I have seen the result set is not fully available. Can you share any additional details about the EKS cluster or the dataset?

@ben

I setup the new AWS Marketplace version which is build: 4.2.1-202004111451200819-0c3ecaea

I configured the same sources, PDSs, VDSs on this environment and ran the same query. There should be 1772 results. Here is what I got for each batch of 100:

offset : results
0 : 100
100 : 100
200 : 100
300 : 0
400 : 66
500* : 0
total = 366

When I get the records in batches of 500 using your recommended logic of using the result count to calculate the offset:
offset : results
0 : 500
500 : 139
639 : 229
868 : 15
883 : 1
884 : 0
total = 884

Using developer tools to see the data loaded as I scroll down through the results here is what I see.
/apiv2/job/213a3850-5166-8fbc-d0c9-b4aadfe67e00/data?offset=0&limit=100 - rows = 100
/apiv2/job/213a3850-5166-8fbc-d0c9-b4aadfe67e00/data?offset=100&limit=100 - rows = 100
/apiv2/job/213a3850-5166-8fbc-d0c9-b4aadfe67e00/data?offset=200&limit=100 - rows = 100
/apiv2/job/213a3850-5166-8fbc-d0c9-b4aadfe67e00/data?offset=300&limit=100 - rows = 0
/apiv2/job/213a3850-5166-8fbc-d0c9-b4aadfe67e00/data?offset=0&limit=5001 - rows = 1772

The last request loads all the rows.

Based on this I tried loading the data using apiv2/job/__/data with a limit of 50,000 and offset of 0. Surprisingly this worked and loaded all 1772 rows.

/api/v3/job/___/results?offset=0&limit=50000 does not work because it only allows a limit up to 500.

My guess would be that this is not an EKS problem since I am seeing similar results in a different environment.

My other guess would be that it is not dataset related since it works with apiv2 and not with api/v3.

Our EKS setup was originally 3.x (I don’t remember the exact version). It has been upgraded through versions. It is now running 4.3. While trying to diagnose this issue I rolled back from 4.3 to an earlier version by restoring from a dremio backup.

Normally it is running 4 executor nodes but I tested it with just 1 with the same results.

The dataset sources are S3 and MySQL. They are joined together. Then there is a VDS that uses ROW_NUMBER() OVER. The next VDS does a self join to do a moving window average. The final VDS does a GROUP BY and has a MAX() OVER in it.

I hope this helps. I also want to say how awesome I think Dremio is.

Matthew

Thanks Mathew, this helps to reproduce the issue.