Why Is Dremio Only Returning a Subset of Results?

yalmasri · August 11, 2022, 11:31am

Hi,

Is there a way to change this behavior?

[Screenshot: only a subset of the results is returned]

dacopan · August 11, 2022, 2:15pm

The UI is only for previewing or testing your queries. If you need to fetch all records, please use ODBC or JDBC, or an IDE such as DataGrip.
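Outside the UI, a client driver hands back the full result set rather than a capped preview. The following is a minimal sketch, assuming a Python DB-API 2.0 connection (for example one opened through an ODBC driver; the connection setup is not shown). It streams rows in batches with `cursor.fetchmany` so nothing is truncated and memory stays bounded. `sqlite3` stands in for Dremio so the example runs anywhere; the helper `fetch_all_rows` and the table `sales` are hypothetical.

```python
import sqlite3
from typing import Iterator, Sequence


def fetch_all_rows(conn, query: str, batch_size: int = 10_000) -> Iterator[Sequence]:
    """Yield every row of `query`, pulling `batch_size` rows at a time.

    Works with any DB-API 2.0 connection: the loop keeps calling fetchmany
    until the driver returns an empty batch, so no row limit is applied.
    """
    cur = conn.cursor()
    try:
        cur.execute(query)
        while True:
            batch = cur.fetchmany(batch_size)
            if not batch:
                break
            yield from batch
    finally:
        cur.close()


if __name__ == "__main__":
    # Hypothetical stand-in data; with Dremio this would be an ODBC/JDBC connection.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", [(i, i * 1.5) for i in range(25_000)])
    rows = list(fetch_all_rows(conn, "SELECT id, amount FROM sales", batch_size=4_000))
    print(len(rows))
```

Yielding rows instead of returning one big list lets the caller write them to a file or aggregate them without holding the whole result in memory.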

yalmasri · August 11, 2022, 2:48pm

Thanks, and I agree with you. Still, while building and curating my VDS, it seems natural to keep validating my work incrementally by observing the actual number of rows returned by queries, and returning only a subset can be misleading in this case.

Also, what if I want to export (download) the results locally? I want the whole set to work with, not just part of it. Why does Dremio offer this download feature in the first place if it expects you to use an external tool?

dacopan · August 11, 2022, 3:14pm

I understand your point. I suggest using IntelliJ, DataGrip, or DBeaver; you can connect directly to Dremio and run your queries there.

yalmasri · August 11, 2022, 4:41pm

Thank you, dacopan, for sharing that. I think the Dremio team will consider this at some point, because they know a well-designed semantic layer is a key entry point to the hearts and minds of their customers (especially those with low data maturity).

I also hope they give me strong grounds to promote this to my customers in turn.

balaji.ramaswamy · August 14, 2022, 5:46am

@yalmasri For data correctness from the UI, please use count instead of select *. The UI truncates results for a few reasons, and there are currently no plans to change this behavior:

- As @dacopan mentioned, populating over a million rows in the UI is not very useful.
- For data correctness, use count.
- UI queries generate Arrow result files, and as the number of rows increases these can fill up the local disk.
- If the local disk is slow, these writes can be slow too.
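Checking a row count can be scripted instead of relying on a truncated preview. A minimal sketch, assuming any DB-API 2.0 connection (`sqlite3` again stands in for Dremio); the function `row_count` and the view `curated_orders` are hypothetical. Wrapping the query in a subquery lets the same check work for any SELECT, and only a single integer comes back to the client.

```python
import sqlite3


def row_count(conn, query: str) -> int:
    """Return the number of rows `query` produces, counted by the engine.

    `query` must be a single SELECT without a trailing semicolon, because it
    is embedded as a subquery.
    """
    cur = conn.cursor()
    try:
        cur.execute(f"SELECT COUNT(*) FROM ({query}) AS q")
        (n,) = cur.fetchone()
        return n
    finally:
        cur.close()


if __name__ == "__main__":
    # Hypothetical base table plus a curated view on top of it.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(i, "open" if i % 3 else "closed") for i in range(3_000)],
    )
    conn.execute("CREATE VIEW curated_orders AS SELECT * FROM orders WHERE status = 'open'")
    print(row_count(conn, "SELECT * FROM orders"), row_count(conn, "SELECT * FROM curated_orders"))
```

Comparing counts before and after each curation step is a quick way to validate a view incrementally without pulling the rows themselves.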

yalmasri · August 14, 2022, 1:30pm

Thank you again, Balaji.

Then in this case, you might want to change the behavior in two ways:

- When you return truncated results, state how many rows that is out of the total, for example: "A subset (170,624) of total 1,000,000 rows has been…"
- When I want to download the results, it should download the whole set.
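The first suggestion amounts to running a count alongside the capped preview and reporting both numbers. The following is a minimal sketch of that idea from the client side, assuming a DB-API 2.0 connection (`sqlite3` as a stand-in). `preview_with_total`, its `limit` parameter, and the message wording are hypothetical and do not describe how the Dremio UI actually works.

```python
import sqlite3
from typing import List, Sequence, Tuple


def preview_with_total(conn, query: str, limit: int) -> Tuple[List[Sequence], str]:
    """Fetch at most `limit` rows for display and report how many rows exist in total."""
    cur = conn.cursor()
    try:
        cur.execute(query)
        preview = cur.fetchmany(limit)
        # Second statement on the same cursor: let the engine count the full result.
        cur.execute(f"SELECT COUNT(*) FROM ({query}) AS q")
        (total,) = cur.fetchone()
    finally:
        cur.close()
    if len(preview) < total:
        message = f"Showing a subset ({len(preview):,}) of {total:,} total rows"
    else:
        message = f"Showing all {total:,} rows"
    return preview, message


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1_000_000)])
    _, msg = preview_with_total(conn, "SELECT x FROM t", limit=170_624)
    print(msg)
```

The extra COUNT query costs another pass over the data, which is a trade-off an engine would have to weigh against the convenience of showing the total.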

MrJava · August 15, 2022, 10:50am

@yalmasri: Why would you like to download such a huge number of rows? Would you like to process them further using Excel, Python, or any other tooling? Then you can use one of the suggested connectors. Or are you checking query results in Notepad?

yalmasri · August 15, 2022, 11:22am

Thank you @MrJava.

We have a lot of legacy-minded customers (data/business analysts) who are familiar only with Excel and are very good at it. Until they are "modernized", they initially demand no change to their existing processes, which means my analytics engineers need to prepare data and push it to them in Excel format for analysis. I believe this is in line with the non-invasive approach Dremio is adopting.

Creating a VDS, versioning it, annotating it, and collaborating on it is a very advanced stage for them and won't happen overnight. They cannot, for example, connect from Excel to Dremio over views that lack context for them. Someone has to resolve that for them first.
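For analysts who work only in Excel, an analytics engineer can pull the complete result through a client connection and write it to a file that Excel opens directly. A minimal sketch, assuming a DB-API 2.0 connection (`sqlite3` stands in for Dremio); `export_to_csv` and the output file name are hypothetical, and CSV is used instead of a native .xlsx writer so the example needs only the standard library.

```python
import csv
import sqlite3
from pathlib import Path


def export_to_csv(conn, query: str, path: Path, batch_size: int = 10_000) -> int:
    """Write every row of `query`, with a header row, to `path`; return rows written."""
    cur = conn.cursor()
    written = 0
    try:
        cur.execute(query)
        header = [col[0] for col in cur.description]  # column names from the driver
        # utf-8-sig writes a BOM, which helps Excel recognize the file as UTF-8.
        with path.open("w", newline="", encoding="utf-8-sig") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            while True:
                batch = cur.fetchmany(batch_size)
                if not batch:
                    break
                writer.writerows(batch)
                written += len(batch)
    finally:
        cur.close()
    return written


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(i, f"customer {i}") for i in range(5_000)])
    print(export_to_csv(conn, "SELECT id, name FROM customers", Path("customers.csv")))
```

Writing batch by batch keeps memory flat even for large exports; the returned count can be checked against a COUNT(*) of the same query.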

I also want to add a third reason: query optimization. I'm not sure how business domain owners (we are talking about data mesh here) will meet their SLAs without actually measuring the total query time (as opposed to the subset query time); as you know, the relationship is not linear.
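The gap between timing a preview and timing the full result can be measured from a client by draining the cursor completely before stopping the clock. A minimal sketch, assuming a DB-API 2.0 connection (`sqlite3` as a stand-in); `time_query` is hypothetical. Client-side wall-clock time also includes data transfer and driver overhead, so it approximates end-to-end query time rather than pure engine time.

```python
import sqlite3
import time
from typing import Optional, Tuple


def time_query(conn, query: str, max_rows: Optional[int] = None,
               batch_size: int = 10_000) -> Tuple[int, float]:
    """Run `query`, fetch up to `max_rows` rows (all rows if None), and time it.

    Returns (rows_fetched, elapsed_seconds). Stopping after a preview-sized
    number of rows can finish much sooner than draining the full result,
    which is why preview timings can understate total query time.
    """
    cur = conn.cursor()
    fetched = 0
    start = time.perf_counter()
    try:
        cur.execute(query)
        while max_rows is None or fetched < max_rows:
            size = batch_size if max_rows is None else min(batch_size, max_rows - fetched)
            batch = cur.fetchmany(size)
            if not batch:
                break
            fetched += len(batch)
    finally:
        cur.close()
    return fetched, time.perf_counter() - start


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, v REAL)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", [(i, i / 7) for i in range(200_000)])
    q = "SELECT id, v FROM events"
    print("preview:", time_query(conn, q, max_rows=1_000))
    print("full:   ", time_query(conn, q))
```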

MrJava · August 15, 2022, 12:59pm

OK, I'm not sure I understood you correctly, but we use Excel too.
Our users use an Excel data connection (aka Power Query over ODBC) to retrieve the data coming from a VDS in Dremio, and we provide the VDSs as they request them.

BTW: be aware that Excel gets really slow when presenting more than 500 mil. rows.

Related topics

Topic		Replies	Views	Activity
How to increase download limit	19	5254	February 10, 2024
Dremio UI issue	4	1220	June 1, 2021
Cannot get all records in Dremio query on v19	6	1363	April 13, 2022
Unable to read data more than 10K when i upload in Dremio	2	1532	February 15, 2019
How to get large queries from Dremio	0	1389	May 3, 2021