@ben
I set up the new AWS Marketplace version, which is build 4.2.1-202004111451200819-0c3ecaea.
I configured the same sources, PDSs, and VDSs in this environment and ran the same query. There should be 1772 results. Here is what I got for each batch of 100:
offset : results
0 : 100
100 : 100
200 : 100
300 : 0
400 : 66
500* : 0
total = 366
When I fetch the records in batches of 500, using your recommended logic of advancing the offset by the number of rows each request returns, I get:
offset : results
0 : 500
500 : 139
639 : 229
868 : 15
883 : 1
884 : 0
total = 884
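For reference, here is a minimal sketch of the paging logic I mean by using the result count to calculate the offset. The names `fetch_all`, `fetch_page`, and `fake_fetch` are made up for illustration, and `fake_fetch` is only a stand-in for the real HTTP call so the loop can be run on its own:

```python
from typing import Callable, List, Sequence


def fetch_all(fetch_page: Callable[[int, int], Sequence], limit: int = 500) -> List:
    """Collect rows, advancing the offset by the number of rows each call returns."""
    rows: List = []
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        if not page:  # an empty page is treated as the end of the results
            break
        rows.extend(page)
        offset += len(page)  # next offset = rows received so far
    return rows


# Hypothetical stand-in for the real request: a well-behaved source with 1772 rows.
data = list(range(1772))


def fake_fetch(offset: int, limit: int) -> Sequence:
    return data[offset:offset + limit]


rows = fetch_all(fake_fetch, limit=500)
print(len(rows))  # 1772
```

Against a well-behaved source this loop returns every row. Against the job above it stops at 884, because the request at offset 884 comes back empty.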
Using the browser's developer tools to watch the data load as I scroll down through the results, here is what I see:
/apiv2/job/213a3850-5166-8fbc-d0c9-b4aadfe67e00/data?offset=0&limit=100 - rows = 100
/apiv2/job/213a3850-5166-8fbc-d0c9-b4aadfe67e00/data?offset=100&limit=100 - rows = 100
/apiv2/job/213a3850-5166-8fbc-d0c9-b4aadfe67e00/data?offset=200&limit=100 - rows = 100
/apiv2/job/213a3850-5166-8fbc-d0c9-b4aadfe67e00/data?offset=300&limit=100 - rows = 0
/apiv2/job/213a3850-5166-8fbc-d0c9-b4aadfe67e00/data?offset=0&limit=5001 - rows = 1772
The last request loads all the rows.
Based on this, I tried loading the data using apiv2/job/__/data with a limit of 50,000 and an offset of 0. Surprisingly, this worked and loaded all 1772 rows.
/api/v3/job/___/results?offset=0&limit=50000 does not work because it only allows a limit up to 500.
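To make the two requests easier to compare, here is a small sketch that builds both paths. The helper names are hypothetical, and the 500 cap on the v3 limit is only what I observed above, not something I took from the documentation:

```python
from urllib.parse import urlencode

V3_MAX_LIMIT = 500  # cap observed in this post; an assumption, not a documented constant


def v2_data_path(job_id: str, offset: int, limit: int) -> str:
    """Path the UI requests when scrolling through job results."""
    return f"/apiv2/job/{job_id}/data?" + urlencode({"offset": offset, "limit": limit})


def v3_results_path(job_id: str, offset: int, limit: int) -> str:
    """Path for the v3 job results endpoint, rejecting limits above the observed cap."""
    if limit > V3_MAX_LIMIT:
        raise ValueError(f"limit {limit} exceeds the observed v3 cap of {V3_MAX_LIMIT}")
    return f"/api/v3/job/{job_id}/results?" + urlencode({"offset": offset, "limit": limit})


job_id = "213a3850-5166-8fbc-d0c9-b4aadfe67e00"
print(v2_data_path(job_id, 0, 50000))
print(v3_results_path(job_id, 0, 500))
```

The first path is the single large request that returned all 1772 rows for me. The second is the largest request I could make through api/v3.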
My guess would be that this is not an EKS problem since I am seeing similar results in a different environment.
My other guess would be that it is not dataset-related, since the same job returns all rows through apiv2 but not through api/v3.
Our EKS setup was originally 3.x (I don’t remember the exact version). It has been upgraded through several versions and is now running 4.3. While trying to diagnose this issue, I rolled back from 4.3 to an earlier version by restoring from a Dremio backup.
Normally it runs 4 executor nodes, but I tested it with just 1 and got the same results.
The dataset sources are S3 and MySQL, which are joined together. On top of that join there is a VDS that uses ROW_NUMBER() OVER. The next VDS does a self-join to compute a moving-window average. The final VDS does a GROUP BY and has a MAX() OVER in it.
I hope this helps. I also want to say how awesome I think Dremio is.
Matthew