Feature requests + Bug reporting

Download File Formats:

  1. The JSON download produces invalid JSON. The output should be a single array: an opening bracket at the start, commas between records, and a closing bracket at the end.
  2. Can we add a TAB delimited text option for downloads? Saving to CSV is problematic if your data contains commas.
  3. Can we add a Parquet option for downloads as well?

SQL Bugs

  1. A dataset which contains a calculated column defaults to a CHAR datatype instead of VARCHAR.
    A. Create a text file test.txt with two lines: 'a' as the header and 'XYZ' as the row.
    B. Create a dataset with the following SQL statement and call it test_union_all:
    SELECT '123456789' as new_column, a FROM "test.txt"
    union all
    SELECT '987' as new_column, a FROM "test.txt"
    C. The following statements work:
    SELECT * FROM "@test_user".test_union_all where new_column = '123456789'
    SELECT * FROM "@test_user".test_union_all where new_column like '987%'
    D. The following statement fails:
    SELECT * FROM "@test_user".test_union_all where new_column = '987'
    E. Reworking the original SQL statement to introduce cast(... as varchar) fixes the dataset:
    SELECT cast('123456789' as varchar) as new_column, a FROM "Drill Examples"."test.txt"
    union all
    SELECT cast('987' as varchar) as new_column, a FROM "Drill Examples"."test.txt"

  2. The char vs varchar column definitions also impact Microsoft SQL Server tables and joins between char/varchar columns of varying length.
    A value of '12345' saved in a my_column_name char(6) SQL Server column can only be queried or joined by passing in my_column_name = '12345 ' with a space padded at the end.
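The padding behavior described above can be sketched in Python. This is an illustration of ANSI CHAR(n) semantics, not Dremio or SQL Server code; char_pad is an invented helper:

```python
# Illustration of CHAR(n) semantics: values are stored padded with trailing
# spaces to the declared width, so exact-match predicates must include the pad.
def char_pad(value, width):
    """Simulate storing a string in a CHAR(width) column."""
    return value.ljust(width)

stored = char_pad("12345", 6)   # stored as "12345 "
print(stored == "12345")        # False: the unpadded literal does not match
print(stored == "12345 ")       # True: only the space-padded literal matches
```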

Windows Installer

  1. Add the option to install in a directory other than C:\Program Files\Dremio… Not everyone on our team has administrative rights on their PC.

Hi David!

Thanks for sharing these. We will investigate and report back.

Regarding saving as tab delimited, are we not escaping commas correctly? Would you happen to have an example? Same question for JSON - your comments sound like we aren’t handling some very basic cases, and we certainly should.

Finally, could you say more about your use case for downloading Parquet? First, our hope is that you don’t need to export data and that you can simply query things via Dremio. Second, if you did need Parquet, then downloading might not be the right option, and storing to a server-side filesystem might be a better bet.

Thanks again!

Comma-delimited output works in principle, but if your data already contains commas, like “San Francisco, CA 94105”, then a comma-delimited file isn’t useful.

JSON is not handled correctly.
Three records with values of 1, 2, 3 are being saved as
1
2
3
when it should be
[
1,
2,
3
]
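The difference matters to any strict JSON parser. A small Python check (using in-memory strings standing in for the two outputs) shows the current format cannot be read as a single document:

```python
import json

current = "1\n2\n3"           # what the download currently produces
expected = "[\n1,\n2,\n3\n]"  # a single valid JSON array

try:
    json.loads(current)
except json.JSONDecodeError as err:
    # Fails with "Extra data": everything after the first value is rejected.
    print("current output fails:", err)

print(json.loads(expected))   # [1, 2, 3]
```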

Parquet downloads would let us compress our JSON files and permanently replace them with Parquet versions.

For JSON, the format we chose is not a single object per file, but one JSON object per record, one record per line. One advantage of this format is that it is easy to split a JSON file for parallel processing. This approach is not specific to Dremio; it is used by multiple data analytics frameworks.

For CSV files, I believe Dremio puts the field in double quotes (as the standard mandates) if the field contains a comma. Is that not the case, or do you prefer a TSV file format because you are not using a CSV parsing library to process the content?
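As a reference point for the standard behavior mentioned above, Python's built-in csv writer applies that quoting convention. A sketch (not Dremio's actual writer) using one of the fund names from this thread:

```python
import csv
import io

# RFC 4180-style quoting: a field containing the delimiter is wrapped in
# double quotes, so a standard CSV parser can recover it intact.
buf = io.StringIO()
csv.writer(buf).writerow(["US", "BlackRock Corporate High Yield Fund, Inc."])
print(buf.getvalue().strip())
# US,"BlackRock Corporate High Yield Fund, Inc."
```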

I can confirm that the CSV download is not quoting string fields… Adding TSV shouldn’t be that difficult.

SELECT domicile_country_cd,fund_nm FROM "@bfm_user".web_extract_funds where fund_nm like '%,%'

domicile_country_cd,fund_nm
US,BlackRock Corporate High Yield Fund, Inc.
US,BlackRock Floating Rate Income Strategies Fund, Inc.
US,BlackRock Debt Strategies Fund, Inc.
US,BlackRock Income Trust, Inc.
US,BlackRock MuniVest Fund, Inc.
US,BlackRock MuniYield California Fund, Inc.
US,BlackRock MuniHoldings Fund, Inc.
US,BlackRock Muni Intermediate Duration Fund, Inc.
US,BlackRock MuniYield Arizona Fund, Inc.
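To show why the unquoted output above is ambiguous, feeding one of those lines to a standard CSV parser splits the fund name at its embedded comma. A Python sketch:

```python
import csv
import io

# Without quoting, the parser cannot tell a delimiter from data:
# the two-column row comes back as three fields.
line = "US,BlackRock Corporate High Yield Fund, Inc."
row = next(csv.reader(io.StringIO(line)))
print(row)  # ['US', 'BlackRock Corporate High Yield Fund', ' Inc.']
```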

For JSON I’m trying to read the JSON file into Python using:

import json

with open('data.json') as data_file:
    data = json.load(data_file)

Our JSON records aren’t simple records, but contain multiple levels of nested keys and arrays.

For CSV files, the quoting issue will be fixed in the next version. And I will open an enhancement request for TSV support.

For JSON, even if the data is complex it shouldn’t matter; you should still be able to read each record by iterating over the lines in the file and decoding them individually (there are also Python libraries like jsonlines that do this for you).
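A minimal sketch of that line-by-line approach, using an in-memory sample in place of the downloaded file (the nested record shape here is invented for illustration):

```python
import io
import json

# One JSON object per line: decode each line individually instead of
# calling json.load on the whole file.
sample = io.StringIO(
    '{"fund": {"nm": "BlackRock Income Trust, Inc.", "countries": ["US"]}}\n'
    '{"fund": {"nm": "BlackRock MuniVest Fund, Inc.", "countries": ["US"]}}\n'
)
records = [json.loads(line) for line in sample if line.strip()]
print(records[0]["fund"]["nm"])  # BlackRock Income Trust, Inc.
```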