I have a similar case where the CSV file has a title line but no header row. I want to skip the first row and have the columns identified from the second, but this does not seem to be what happens when using the web interface.
The title line should be ignored, and values read from line 2. However, when I set up the folder containing only this file as a comma-delimited physical dataset, it only extracts the first column's values:
To see what it would do, I changed the delimiter to TAB and can see the values sitting there as expected; they are just not being broken up into CSV columns.
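For anyone wanting to check this outside the web interface, here is a rough sketch that counts the fields on each line under a given delimiter. The sample content and the `field_counts` helper are made up for illustration; this only shows how the title line differs from the data lines, not how Dremio parses the file internally.

```python
import csv
import io


def field_counts(text, delimiter=","):
    """Return the number of fields found on each line for the given delimiter."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [len(row) for row in reader]


# Hypothetical file: a title line followed by data rows, no header row.
sample = "Monthly sales report\n1,apple,3\n2,pear,5\n"

comma_counts = field_counts(sample)        # title line has fewer fields than the data
tab_counts = field_counts(sample, "\t")    # with TAB, every line is one wide field
print(comma_counts, tab_counts)
```

A title line whose field count differs from the rest is the kind of mismatch that seems to cause the trouble here.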
I see. This is not a standard text format. For best results with Dremio, you can delete the first line and “Extract Field Name” and other settings should work as expected.
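If you want to script the removal rather than edit the file by hand, a minimal sketch could look like the following. The `drop_title_line` helper and the sample data are hypothetical and not part of Dremio; the sketch assumes the title occupies exactly one line.

```python
import io


def drop_title_line(src, dst):
    """Copy src to dst without its first line and return the skipped line."""
    title = src.readline()
    for line in src:
        dst.write(line)
    return title


# Hypothetical input: one title line, then the delimited rows.
raw = "Monthly sales report\nid,name,amount\n1,apple,3\n2,pear,5\n"

out = io.StringIO()
title = drop_title_line(io.StringIO(raw), out)
cleaned = out.getvalue()
print(cleaned)
```

With real files you would pass objects from `open(path, newline="")` instead of `io.StringIO`, and write to a new file so the original stays untouched.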
The “Skip First Line” option still expects the formatting of the first line to match that of all the others. That is, it should have the same number of tab or comma separated fields (even if they are empty). So a file that looks like this:
Shame this hasn’t progressed as a feature. Using version 4.91, the behaviour is still the same.
An idea might be to have some interface to a simple editor (sed) with regex.
Most files from suppliers contain a file title, with the column headers in the second row. Without such a feature there has to be an interstitial step to remove the title record, which is bad practice as we only want to keep a single version of all received datasets. Don’t suppose there is a features and suggestions page?