Pagination query does not return the requested records

The result has 2677 records, but when I try to get rows 1000 to 1500, only 91 records are returned.


@WJustHM this is a known issue with the API when loading results from some datasets. Using the developer tools in your browser, do you see the same behavior when you run the same query from the Dremio UI?

@ben The result is correct in the browser; the problem only occurs in code.

Hello @ben,

How should I solve this problem?

Best regards

@WJustHM, if it is a physical dataset, you might try forgetting the metadata and reformatting, but this will affect downstream virtual datasets.

Could you share the profile for the job you are returning the results from?

@ben Sorry, the problem has been there all along and I don’t know why. Please have a look.

Best regards

db2f5277-fda7-4c20-acf8-55bf8b407ff8.zip (41.4 KB)

This bug is still a problem. Running the SQL in the Dremio web UI returns 6700 rows, but running the same SQL through the REST API using row offsets only returns 4000 rows in total.

Is there any way to change the REST API limit from 500? The documentation describes the parameters as:

  • offset - Sets where you want to start from. Default: 0
  • limit - Determines how many rows are returned (maximum of 500). Default: 100
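
As documented, the limit maxes out at 500, so larger result sets have to be paged with an offset loop. A minimal Python sketch of that loop (the base URL, token placeholder, and helper are illustrative assumptions, not Dremio-provided code):

import requests

BASE = "http://localhost:9047/api/v3"
HEADERS = {"Authorization": "_dremio<your-token>"}  # placeholder personal access token

def fetch_all_rows(job_id, limit=500):
    """Page through job results in batches of `limit` (500 is the documented max)."""
    rows, offset = [], 0
    while True:
        resp = requests.get(
            f"{BASE}/job/{job_id}/results",
            params={"offset": offset, "limit": limit},
            headers=HEADERS,
        )
        resp.raise_for_status()
        body = resp.json()
        rows.extend(body["rows"])
        offset += limit
        if offset >= body["rowCount"]:
            return rows

Note that this naive loop is exactly the pattern the bug breaks: a batch can silently come back short even though rowCount reports more rows, so the loop finishes with rows missing.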

I am also facing the same issue.

I have a query like this:

select distinct columnA "value" from "spaceA"."datasetA" where columnA is not null and columnA not in (select distinct "value" from "spaceA"."datasetB" where "type" = 'someType')

This query returns 27,000+ results.

I am using the SQL REST API to execute this.

Then I use the jobs API to retrieve results in batches of 500 each (the maximum limit), explicitly setting offset and limit in each REST call like this:

I start from offset 0 and keep incrementing it with each call.

http://localhost:9047/api/v3/job/21365044-f881-d591-7e91-3dc1284fb400/results?offset=2000&limit=500

The rows list returned after offset 2000 is empty. In total, I don’t get more than about 2.5k results.

Response with empty result:

{
    "rowCount": 27527,
    "schema": [
        {
            "name": "value",
            "type": {
                "name": "VARCHAR"
            }
        }
    ],
    "rows": []
}

I expect to retrieve result sets approaching millions of records sequentially like this.

What should the resolution for this be? Please advise.

This was fixed in Dremio 4.5.0.

Hi, my team has recently upgraded our Dremio to version 4.7.2 (Community Edition).

However, we are still facing this issue when retrieving results using the REST API, even after clearing/refreshing the metadata of the physical datasets.

As shown in the screenshot below, although there are 2341 results, only 1871 are retrieved.

We hope that this bug can be resolved soon.

For those who are facing the same issue, the current workaround my team used is to check the number of results retrieved. If the number of results is less than expected, re-retrieve the results in that offset range using smaller limits. If the number of results retrieved in that offset range is still less than expected, re-retrieve the results using limit=1. So far, we have managed to retrieve all the results with this implementation (although it increases the number of Dremio API calls needed and slows down retrieval).

The screenshot below contains a snippet of the Python code and the results of the above-mentioned implementation.
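
Since the screenshot itself is not reproduced here, below is a rough Python reconstruction of the workaround as described. The endpoint details, the fetch_page helper, and the default 100/10/1 schedule (mentioned later in the thread) are illustrative assumptions, not the exact code from the screenshot:

import requests

BASE = "http://localhost:9047/api/v3"
HEADERS = {"Authorization": "_dremio<your-token>"}  # placeholder token

def fetch_page(job_id, offset, limit):
    """Fetch one batch of job results and return its rows."""
    resp = requests.get(
        f"{BASE}/job/{job_id}/results",
        params={"offset": offset, "limit": limit},
        headers=HEADERS,
    )
    resp.raise_for_status()
    return resp.json()["rows"]

def fetch_range_with_fallback(job_id, offset, size, limits=(100, 10, 1)):
    """Retrieve rows [offset, offset + size), re-retrieving any short batch
    with progressively smaller limits; limit=1 reportedly always succeeds."""
    rows, pos = [], offset
    while pos < offset + size:
        want = min(limits[0], offset + size - pos)
        batch = fetch_page(job_id, pos, want)
        if len(batch) < want and limits[1:]:
            # Short batch: redo this sub-range with the next, smaller limit.
            batch = fetch_range_with_fallback(job_id, pos, want, limits[1:])
        rows.extend(batch)
        pos += want
    return rows

# Usage: take rowCount from any results call, then walk the whole set:
# all_rows = fetch_range_with_fallback(job_id, 0, row_count)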

Hope this helps!


Thanks for pointing this out @edksk,

How many executors are you running, and where are you storing your results? The latter is configured in dremio.conf under results.

Hi @ben,

We are running Dremio in a cluster deployment (on AWS EC2) with a single coordinator node and a single executor node, and the results are stored locally (/var/lib/dremio/pdfs/results).

Below is the config we are using for the executor node:

paths: {
  # the local path for dremio to store data.
  local: "/var/lib/dremio",
  spilling: ["/var/ephemeral/dremio_spill"]

  # the distributed path Dremio data including job results, downloads, uploads, etc
  #dist: "pdfs://"${paths.local}"/pdfs"
}

Thank you for sharing the workaround.

I think the issue comes from storing the job results in the locally stored PDFS.

Essentially, your code searches for the boundary offset by increasing it one at a time. Maybe we can do better, with fewer API calls, by implementing a binary search for that offset.
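
To make that concrete, here is one possible sketch of such a binary search. It assumes, as the observations in this thread suggest, that completeness is monotone in the limit (a batch spanning the problem point comes back short, smaller batches before it come back complete) and that limit=1 always succeeds; the fetch_page helper and endpoint details are illustrative:

import requests

BASE = "http://localhost:9047/api/v3"
HEADERS = {"Authorization": "_dremio<your-token>"}  # placeholder token

def fetch_page(job_id, offset, limit):
    """Fetch one batch of job results and return its rows."""
    resp = requests.get(
        f"{BASE}/job/{job_id}/results",
        params={"offset": offset, "limit": limit},
        headers=HEADERS,
    )
    resp.raise_for_status()
    return resp.json()["rows"]

def largest_complete_batch(job_id, offset, hi):
    """Binary-search the largest limit that still returns a complete batch at `offset`."""
    batch = fetch_page(job_id, offset, hi)
    if len(batch) == hi:                          # common case: whole chunk is complete
        return batch
    lo, best = 1, fetch_page(job_id, offset, 1)   # limit=1 reportedly always succeeds
    hi -= 1                                       # `hi` itself came back short
    while lo < hi:
        mid = (lo + hi + 1) // 2
        batch = fetch_page(job_id, offset, mid)
        if len(batch) == mid:
            lo, best = mid, batch                 # complete: boundary is at or beyond mid
        else:
            hi = mid - 1                          # short: boundary is below mid
    return best

def fetch_all_binary(job_id, row_count, chunk=500):
    """Walk the result set, taking the largest complete batch at each step."""
    rows, offset = [], 0
    while offset < row_count:
        batch = largest_complete_batch(job_id, offset, min(chunk, row_count - offset))
        if not batch:                             # safety net against an empty batch
            break
        rows.extend(batch)
        offset += len(batch)
    return rows

This costs roughly log2(500) ≈ 9 calls around each problem boundary instead of one call per row, while a clean 500-row chunk still costs only a single call.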


@edksk, @dli16,

We have reproduced this issue internally and we are working on a fix that should be available in the near future.

Hi @dli16,

You are welcome. Thanks for your suggestion. Indeed, using a binary search method could likely reduce the number of API calls performed (e.g. setting the limits for retrieval/re-retrieval to be 100, 50, 25, 12, 6, 3, 1).

When we were debugging the issue on our end, we observed that setting a limit of 1 returns exactly 1 result (true in all our observations so far), while any limit > 1 may return fewer results than intended. However, it would not be practical to retrieve all the results one by one.

Thus, my team fixed the limits for retrieval/re-retrieval at 100, 10 and 1 for ease of implementation, as our actual data retrieval is not done in Python but through another software tool. The Python code snippet I shared previously was to illustrate the bug and the workaround.

For those facing the same issue, you can choose other values as your limits that would suit your requirements. Just note that your chosen limit for the final re-retrieval attempt should be 1 to ensure that all your results can be successfully retrieved.
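
As a side note, a halving schedule like the 100, 50, 25, 12, 6, 3, 1 example above is easy to generate; a tiny illustrative Python helper:

def halving_limits(start=100):
    """Yield a decreasing limit schedule ending in 1, e.g. 100, 50, 25, 12, 6, 3, 1."""
    n = start
    while n > 1:
        yield n
        n //= 2
    yield 1

print(list(halving_limits(100)))  # [100, 50, 25, 12, 6, 3, 1]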

Hi @ben,

Thank you for your prompt response. We look forward to the bug fix and other new features. Thank you!