After running a query, clicking on JSON or CSV download generates a dump of the table in a file but for some reason, the output file does not match the query result in the browser. Aren’t they supposed to match? To reproduce this problem, try adding “ORDER BY” to your sql statement.
Hi Tim,
Can you describe a bit more about how they are different (perhaps with screenshots)?
When we get questions like this it often has to do with the difference between Preview and Run in the web app.
To make it easier to quickly iterate on a query, Dremio works with a sampling of the data by default:
To see the results from the complete set of data, you can choose to Run your query:
Downloads always do a full Run (though are limited to the first million results).
This happens very nondeterministically so let me try to find a public dataset so that you can reproduce it.
I cannot share our own data but this happens with using SQL Server as the data source and executing a query with ORDER BY … e.g.
SELECT col1
FROM table
WHERE …
ORDER BY col1
If you add “limit” then CSV does seem to produce the same result set as what’s shown in the browser.
Uploading the screenshot below where the left side shows “run” query result in dremio and right side is csv output in excel. Results in CSV are not ordered.
Hi Tim, thanks for the additional info - I’m now able to reproduce this.
I’ve created an internal ticket to track this issue and will get back to you once it’s resolved.
I would like to add my vote for this functionality. Our Dremio deployment is growing and our users are become better at SQL. While not a show stopper, asking our users to resort the data once it opens in Excel is time they could be spending on data analysis, especially if there is a complicate ORDER BY statement.
Hi @chris, I also encounter this issue in 3.2.4-201906051751050278-1bcce62. Could you let me know if this issue fixed?
@popejune, I’m not seeing this on the latest version of the Dremio (4.0.5). Are you sure you are Running (rather than Previewing) the query after you have sorted on the desired column?
The CSV export should preserve the ordering…
@ben, I am using Running without any order.
I am not trying version 4 yet. Seems @chris reproduce this issue and have an internal ticket to trace. May I know which version this issue been fixed?
My apologies @popejune; I did not take the LIMIT into account.
If you use a LIMIT and you do not order the results on a unique column (or set of columns that produce a unique tuple), then Dremio does not guaranteed you will get the same records each time, only that the ordering is satisfied.
When doing the CSV export, Dremio re-runs the query and introduces a LIMIT 1000000. So if your query is:
SELECT col_a, col_b FROM "my-table" ORDER BY col_a LIMIT 42
… the SQL for the CSV export will be something like:
CREATE TABLE "__datasetDownload"."ed7c64a7-9564-498d-a8f8-8b2f306e2ae0" STORE AS (type => 'text', fieldDelimiter => ',', lineDelimiter => ' ') WITH SINGLE WRITER AS SELECT * FROM ( SELECT col_a, col_b FROM "my-table" ORDER BY col_a LIMIT 42 ) LIMIT 1000000
(you can see these jobs in the Jobs page if you filter for “Internal” jobs)
If you want the same results each time, you should order by some unique column, or set of columns that form a unique tuple.
Thanks @ben, I will do some testing.