How to increase download limit

sathakkathullah · August 16, 2017, 9:01am

I tried to export 1.5 million records from a CSV as JSON, but I am getting a pop-up like the one in the screenshot below.

[Screenshot: csv_ - Dremio]

doron · August 17, 2017, 4:43pm

Hi,

Dremio currently has a hardcoded limit of one million records, and there is no way to change it yet (it is on our list of things to do).

In the meantime you could create 2 virtual datasets and split the dataset between them.
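As a rough illustration of the split, here is a minimal Python sketch that works out how many parts are needed and builds one query per part. The dataset path `"my_space"."my_csv"` and the integer column `id` are hypothetical, and splitting on `MOD` assumes that column is non-negative and spread evenly enough that each part stays under the limit.

```python
import math

DOWNLOAD_LIMIT = 1_000_000  # browser download limit mentioned in this thread


def parts_needed(row_count: int, limit: int = DOWNLOAD_LIMIT) -> int:
    """Smallest number of parts so that an even split stays within the limit."""
    return max(1, math.ceil(row_count / limit))


def split_queries(source: str, key_column: str, parts: int) -> list:
    """Build one SELECT per part, bucketing rows by key_column modulo parts."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    return [
        f"SELECT * FROM {source} WHERE MOD({key_column}, {parts}) = {i}"
        for i in range(parts)
    ]


# Hypothetical names: each query could be saved as its own virtual dataset.
queries = split_queries('"my_space"."my_csv"', "id", parts_needed(1_500_000))
for query in queries:
    print(query)
```

Each generated query would then be saved as a separate virtual dataset and downloaded on its own. If the key is skewed, a range filter on the same column may split more evenly.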

kelly · August 17, 2017, 5:48pm

Could you say more about why you want to export data? One of our goals with Dremio is removing the need to make copies of data and instead to query any dataset through Dremio with any tool.

ashitabh_kumar · June 21, 2018, 4:06pm

Does the 1 million restriction also apply to JDBC-connected third-party tools like Talend, which other teams want to use to connect to this virtualization layer?

kelly · June 21, 2018, 4:11pm

The limit only applies to downloads via the browser.

Connections via ODBC, JDBC, and REST do not have this limit.
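For a client reading a large result over REST, the usual pattern is to request the rows page by page. The sketch below shows only that loop. `fetch_page` is a hypothetical callable standing in for whatever HTTP request returns one page of rows for a given offset and limit, and the page size of 500 is an assumption rather than a documented value.

```python
from typing import Callable, Iterator, List


def iter_rows(
    fetch_page: Callable[[int, int], List[dict]],
    page_size: int = 500,
) -> Iterator[dict]:
    """Yield every row by calling fetch_page(offset, limit) until it runs dry."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            # A short page means there is nothing left to fetch.
            return
        offset += len(page)


# Stand-in for a real HTTP call: serves slices of an in-memory list.
fake_result = [{"n": i} for i in range(1200)]


def fake_fetch(offset: int, limit: int) -> List[dict]:
    return fake_result[offset:offset + limit]


total = sum(1 for _ in iter_rows(fake_fetch))
print(total)
```

Because the rows are yielded one at a time, the caller can write them to a file without holding the whole result in memory.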

Giovanni_Tummarello · June 23, 2018, 2:07pm

Kelly, my specific need was to create a Parquet file where I would materialize some things, e.g. the “rowNumber” to create a primary key. I was wondering if you guys have it on your roadmap at some point to turn a classic Dremio query into an actual saved source table.

Other people we talk to need the full export for analysis in specialized tools (e.g. for advanced numerical analysis, machine learning, etc.). A valid use case, I guess? Cheers

anthony · June 23, 2018, 2:32pm

@Giovanni_Tummarello “Turn a classic Dremio query into an actual saved source table” = https://docs.dremio.com/sql-reference/sql-commands/create-table-as.html
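To make the row-number case concrete, here is a minimal Python sketch that assembles a CREATE TABLE AS statement around a query. The target `$scratch.staged_with_ids`, the source `"my_space"."my_dataset"`, and the `created_at` ordering column are all hypothetical; check the linked page for the exact syntax and the locations that accept writes.

```python
def ctas_statement(target: str, select_sql: str) -> str:
    """Wrap a SELECT in CREATE TABLE ... AS, dropping any trailing semicolon."""
    body = select_sql.strip().rstrip(";").strip()
    if not body.upper().startswith("SELECT"):
        raise ValueError("select_sql must be a SELECT statement")
    return f"CREATE TABLE {target} AS {body}"


# Hypothetical query: materialize a row number to act as a primary key.
select_sql = (
    "SELECT ROW_NUMBER() OVER (ORDER BY created_at) AS row_id, t.* "
    'FROM "my_space"."my_dataset" AS t;'
)
statement = ctas_statement("$scratch.staged_with_ids", select_sql)
print(statement)
```

The resulting statement would be submitted like any other query, for example from the SQL editor or over JDBC/ODBC.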

Another option I would actually recommend first is to create and use our Reflections (basically an accelerated physically optimized materialized representation of the data). More about that here - https://www.dremio.com/tutorials/getting-started-with-data-reflections/

Giovanni_Tummarello · June 23, 2018, 2:48pm

Ah right, yes, I saw that, but I had doubts given that it had no security etc. Is this going to become secure (e.g. per user)? If so, do you see it as a short/medium-term thing? Thanks!

kelly · June 23, 2018, 3:04pm

Users don’t access reflections directly. They would access the anchor dataset, which you can secure like any other dataset in Dremio Enterprise Edition. In the Community Edition everyone is an admin.

Regarding other tools, why wouldn’t they be able to access the data via ODBC/JDBC/REST?

Giovanni_Tummarello · June 23, 2018, 3:32pm

Kelly, I was referring to securing the $scratch space. I don’t feel reflections would work as a replacement for staging, as you would do after a geocoding or remote service enrichment (to have full control over how many times the remote service is invoked).

kelly · June 23, 2018, 3:35pm

Agreed. Reflections are not for staging. See the other thread for an example of using files for staging per the geocode example.

Giovanni_Tummarello · June 23, 2018, 4:39pm

Files… which are limited to 1M rows, so we have a great new use case here for unlimited row download.

kelly · June 23, 2018, 4:44pm

Files are not limited to 1M. Only downloads are.

You can write files of any size to S3, ADLS, HDFS, NAS, etc and then read them through Dremio.

Does that make sense? Maybe I don’t understand what you’re trying to do?

Giovanni_Tummarello · June 23, 2018, 5:11pm

For a moment here I thought Dremio could SAVE a materialized table of any size to S3, HDFS, NAS, etc. What you mean instead is that one can put a file of any size in there and read it through Dremio. OK, I see that.

Anyway, my case is still lingering, I guess.

If I could download a file of arbitrary size, I could put it BACK into Dremio (e.g. S3, HDFS, NAS), and this would effectively do the “staging” I need, e.g. when I want to do operations like “create a primary key from a row count”, do an expensive field computation (e.g. a UDF for NLP to extract entities), or enrich via a remote service lookup (via a UDF again, I guess).

At the moment the only way to do this staging would be via the $scratch space. I guess if this functionality gets powered up, e.g. with security, you could have the best of both late / last-mile ETL (or ELT) and classic pipelines. Thanks for the interaction.

kelly · June 23, 2018, 5:22pm

Maybe the confusion is regarding what is meant by download.

In Dremio you can only download via the browser. You have to click a button. This doesn’t work for any kind of automated processing.

I’m suggesting your script would save intermediate results back to a shared file system or object store, in a folder called staging if you like. You would then call the REST API to add the file as a new dataset that Dremio understands, and then you can access this new file as a dataset through Dremio.
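The save-to-staging step of such a script could look roughly like the Python sketch below. It assumes a DB-API style cursor (for example from an ODBC driver) and uses an in-memory SQLite database purely as a stand-in so the example runs on its own. The `staging` folder, the file name, and the added `row_id` column are illustrative choices, and the call that registers the file as a dataset is left out.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def stage_query(cursor, sql: str, out_path: Path, batch_size: int = 10_000) -> int:
    """Run sql and stream the rows to a CSV file, prefixing a 1-based row_id."""
    cursor.execute(sql)
    header = [col[0] for col in cursor.description]
    written = 0
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["row_id", *header])
        while True:
            # fetchmany keeps memory bounded regardless of result size.
            batch = cursor.fetchmany(batch_size)
            if not batch:
                break
            for row in batch:
                written += 1
                writer.writerow([written, *row])
    return written


# Stand-in source: SQLite instead of a real ODBC/JDBC connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [(f"name{i}",) for i in range(25)])

staged_file = Path(tempfile.mkdtemp()) / "staging" / "t.csv"
rows_written = stage_query(
    conn.cursor(), "SELECT name FROM t ORDER BY rowid", staged_file, batch_size=10
)
print(rows_written, staged_file)
```

In a real setup `out_path` would point at the shared file system or object store mount, and the script would finish by registering the file through the REST API.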

You don’t want to perform big, complex ETL through Dremio because the system is designed for low-latency workloads. Jobs that run for many hours, for example, may experience node failures or network partitions that cause the job to fail, and you would need to start over. Dremio doesn’t perform any checkpointing or other measures to recover from mid-job failures like this.

Giovanni_Tummarello · June 24, 2018, 10:50pm

Thanks Kelly, I get it: use a script + JDBC to save a file, and then use it as a staged file via the API. Could work.

I think a UI action for that would be useful in many cases, e.g. for analysts working on one-off operations who do not want a script. Something like an issue titled “ability to ‘save to’ HDFS/NAS directly in the UI”.

It would be cool if it were possible to open issues on GitHub for Dremio.
Cheers

crazyisjen · January 31, 2024, 2:05am

Hi all,
Is the download limit in version 19 still hardcoded to 1 million by default? How can I view this parameter, and can I increase it? Thanks.
Sometimes I would like to download via the UI as a one-off.

balaji.ramaswamy · January 31, 2024, 6:53am

@crazyisjen The download limit is 1 million. Can you use CREATE TABLE AS SELECT … instead?

crazyisjen · February 5, 2024, 4:27am

Is it possible to export to my PC’s local drive, instead of S3, by using CREATE TABLE AS SELECT?

balaji.ramaswamy · February 10, 2024, 7:45am

@crazyisjen Unfortunately no
