Skip first line does not seem to work for Text(delimited) format

I have a folder containing csv.gz files.
They have a title line, then a header line containing the field names.

However, I’m unable to use both options, “Skip first line” and “Extract field names”, at the same time.

When I have the “Extract field names” checkbox ticked, it seems that Dremio is ignoring the “Skip first line” option.

I have a similar instance where the csv file has a title line, but no header row. So I want to skip the first row and have the columns identified from the second, but this does not seem to be the behavior when using the web interface.

@hugom, can you share an example file that shows this behavior?

Hi @ben

Thank you for the response. I created an example file with the content below:


TitleLine
data,entry,1
data,entry,2
another,data,entry


The title line should be ignored, and values read from line 2. However, when I try to set up the folder containing only this file as a comma-delimited physical dataset, it only extracts the first column’s values.

To see what it would do, I changed the delimiter to TAB and can see the values sitting there as expected; they are just not being split into CSV columns.

tster.zip (202 Bytes)

I see. This is not a standard text format. For best results with Dremio, you can delete the first line and “Extract Field Name” and other settings should work as expected.

Hi @ben

Thank you for the feedback.

Out of interest, how is the “skip first line” function expected to work?

The “Skip First Line” option still expects the formatting of the first line to match that of all the others. That is, it should have the same number of tab- or comma-separated fields (even if they are empty). So a file that looks like this:

TitleLine,,
data,entry,1
data,entry,2
another,data,entry

… would parse correctly.
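To illustrate the rule described above, here is a minimal Python sketch that checks whether the first line of a delimited text has the same number of fields as the remaining lines. It is not Dremio code and says nothing about how Dremio parses files internally; the helper names `field_counts` and `first_line_matches` are made up for this example.

```python
import csv
import io


def field_counts(text, delimiter=","):
    """Return the number of delimited fields on each non-empty line."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [len(row) for row in reader if row]


def first_line_matches(text, delimiter=","):
    """True if the first line has as many fields as every later line."""
    counts = field_counts(text, delimiter)
    if not counts:
        return True  # nothing to compare in an empty file
    return all(count == counts[0] for count in counts[1:])


# The two sample files from this thread.
ORIGINAL = "TitleLine\ndata,entry,1\ndata,entry,2\nanother,data,entry\n"
PADDED = "TitleLine,,\ndata,entry,1\ndata,entry,2\nanother,data,entry\n"

print(first_line_matches(ORIGINAL))  # title line has 1 field, the rest have 3
print(first_line_matches(PADDED))    # every line has 3 fields
```

A check like this can flag supplier files whose title line would need padding before “Skip First Line” behaves as described.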


Shame this hasn’t progressed as a feature. Using version 4.91, the behaviour is still the same.

One idea might be to provide an interface to a simple editor (such as sed) with regex support.
Most files from suppliers contain a file title, and the second row contains the column headers. Otherwise there has to be an interstitial step to remove the title record, which is bad practice because we only want to keep a single version of each received dataset. I don’t suppose there is a features and suggestions page?
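For anyone who needs to read such files outside Dremio in the meantime, the following is a rough Python sketch of the desired behaviour: it streams a gzipped file, discards the title line in memory, and treats the second line as the header, so the received file is never rewritten. It assumes a title line followed by a header row, as in the original post; the function name `read_skipping_title` and the column names in the demo are hypothetical.

```python
import csv
import gzip
import os
import tempfile


def read_skipping_title(path, delimiter=","):
    """Read a gzipped delimited file, dropping line 1 and using line 2 as the header."""
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        next(handle, None)  # discard the title line; the file on disk is untouched
        return list(csv.DictReader(handle, delimiter=delimiter))


# Demo with a throwaway file; the header names are invented for illustration.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.csv.gz")
    with gzip.open(sample, mode="wt", newline="", encoding="utf-8") as out:
        out.write("TitleLine\nname,kind,value\ndata,entry,1\ndata,entry,2\n")
    rows = read_skipping_title(sample)

print(rows[0])
```

This does not change how Dremio handles the file; it only shows that the title line can be skipped at read time without keeping a second, edited copy of the dataset.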