Skip first line does not seem to work for Text(delimited) format

dfleckinger · September 17, 2018, 12:41pm

I have a folder containing csv.gz files.
They have a title line, then a header line containing the field names.

However, I’m unable to use both options “Skip first line” and “Extract field names” at the same time.

When I have the “Extract field names” checkbox ticked, it seems that Dremio is ignoring the “Skip first line” option.

hugom · January 24, 2020, 1:13pm

I have a similar instance where the csv file has a title line, but no header row. So I want to skip the first row and have the columns identified from the second, but this does not seem to be the behavior when using the web interface.

ben · January 24, 2020, 2:59pm

@hugom, can you share an example file that shows this behavior?

hugom · January 27, 2020, 9:23am

Hi @ben

Thank you for the response, I created an example file with the below content:

TitleLine
data,entry,1
data,entry,2
another,data,entry

The title line should be ignored, and values read from line 2. However, when I try to set up the folder containing only this file as a comma-delimited physical dataset, it only extracts the first column’s values.

To see what it would do, I changed the delimiter to TAB and can see the values sitting there as expected, they are just not being broken up into csv columns.

tster.zip (202 Bytes)

ben · January 27, 2020, 3:48pm

I see. This is not a standard text format. For best results with Dremio, you can delete the first line, and “Extract Field Name” and other settings should then work as expected.
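
For compressed files like the csv.gz files mentioned at the top of the thread, deleting the first line could be scripted outside Dremio. The following is only a minimal sketch: `strip_first_line` and the file paths are hypothetical names, it writes a separate copy so the received file stays untouched, and it assumes the title occupies exactly one physical line.

```python
import gzip
import shutil


def strip_first_line(src_path: str, dst_path: str) -> None:
    """Copy a gzip-compressed text file to dst_path without its first line.

    Hypothetical helper: assumes the title is a single physical line
    (no quoted field containing an embedded newline).
    """
    with gzip.open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        src.readline()                 # read and discard the title line
        shutil.copyfileobj(src, dst)   # stream the remaining lines unchanged


if __name__ == "__main__":
    # Example paths are placeholders, not files from this thread.
    strip_first_line("supplier_file.csv.gz", "supplier_file_clean.csv.gz")
```

Working on the raw bytes avoids having to know the file's text encoding; only the first newline matters.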

hugom · January 28, 2020, 5:23am

Hi @ben

Thank you for the feedback.

For interest’s sake, how is the “skip first line” function expected to work?

ben · January 28, 2020, 7:45pm

The “Skip First Line” option still expects the formatting of the first line to match that of all the others. That is, it should have the same number of tab- or comma-separated fields (even if they are empty). So a file that looks like this:

TitleLine,,
data,entry,1
data,entry,2
another,data,entry

… would parse correctly.
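
If editing files by hand is not practical, the padding described above could be applied with a small script. This is a rough sketch rather than anything Dremio provides: `pad_title_line` is a hypothetical helper, and it assumes the second line carries the correct number of fields.

```python
import csv
import io


def pad_title_line(text: str, delimiter: str = ",") -> str:
    """Pad the first line with empty fields to match the second line's width.

    Hypothetical helper: assumes the second line has the correct field count.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if len(rows) < 2:
        return text  # nothing to compare the title line against
    width = len(rows[1])
    # Append empty fields; a non-positive difference adds nothing.
    rows[0] = rows[0] + [""] * (width - len(rows[0]))
    out = io.StringIO()
    csv.writer(out, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return out.getvalue()


sample = "TitleLine\ndata,entry,1\ndata,entry,2\nanother,data,entry\n"
print(pad_title_line(sample), end="")
```

For the example file from earlier in the thread, this turns the first line into `TitleLine,,` and leaves the data lines as they were.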

irnerd · August 10, 2022, 4:08pm

Shame this hasn’t progressed as a feature - using version 4.91 the behaviour is still the same

One idea might be to have an interface to a simple editor (such as sed) with regex support.
Most files from suppliers contain a file title, and the second row contains the column headers. Otherwise there has to be an interstitial step to remove the title record, which is bad practice as we only want to keep a single version of all received datasets. Don’t suppose there is a features and suggestions page?
