I’m looking to use Dremio to point to an ADLS-hosted file, which is tab-delimited text. One of the columns is a Remarks column, and its values sometimes run to more than 65535 characters.
Is there any way I can tell Dremio to ignore this column when creating the format?
I’ve taken a look and found the issue.
For my file format, I had the following settings:
Format: Text (delimited)
Field Delimiter: Tab
Quote: DoubleQuote (this was the root of my problem)
Extract Fieldnames: Ticked
Trim fieldnames: Ticked
After painfully looking through my large text file, I finally tracked it down to a field value that contained double quotes, e.g.
Johnny “Pirate” Depp
This meant that the remainder of the file was treated as a continuation of that line.
I then changed Quote from DoubleQuote to the ‘custom’ value ‘~’.
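To illustrate how a single stray quote can swallow the rest of a file, here is a minimal sketch using Python's standard csv module rather than Dremio's own reader, whose exact quote handling may differ. The sample data and the function name are made up for illustration, and the field deliberately opens a quote that is never closed.

```python
import csv
import io

def parse_tab_delimited(text, quotechar='"'):
    """Parse tab-delimited text with quote handling enabled (the csv default)."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t", quotechar=quotechar)
    return list(reader)

# Hypothetical data: the second line opens a quote that is never closed.
sample = 'name\trole\n"Pirate Johnny\tactor\nJane\tdirector\n'

rows = parse_tab_delimited(sample)
# Everything after the unclosed quote, tabs and newlines included,
# ends up in a single field of a single row.
print(len(rows))
print(repr(rows[-1]))
```

The header parses normally, but the two data lines come back as one row with one field.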
Is this the right way to go about it? My workaround feels hacky.
Looks like the field delimiter was the issue, and what you did was the right method. Unfortunately, Dremio today does not automatically suggest a delimiter.
As I’m working with my data and setting the formats, I’m finding that most of the characters I could use as a delimiter are actually used in the remarks, even the tilde in some of them.
Other ETL tools let you set the Quote equivalent to None, since the content of a tab-delimited field, or indeed a CSV field, may not contain encapsulated strings.
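For comparison, this is roughly what a "Quote: None" setting looks like in Python's standard csv module, which is one tool that does offer it. This is only a sketch of the idea and not a Dremio feature; the function name and sample data are hypothetical.

```python
import csv
import io

def read_tsv_without_quoting(text):
    """Split on tabs only; quote characters are kept as ordinary data."""
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # no special processing of quote characters
    )
    return list(reader)

# Hypothetical data with balanced and unbalanced double quotes in the fields.
sample = 'name\tremarks\nJohnny "Pirate" Depp\tsaid "hello\nJane\tfine\n'

for row in read_tsv_without_quoting(sample):
    print(row)
```

With quoting switched off, every line stays its own row and the quotes are preserved verbatim in the values.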
I’m concerned that whatever surrogate character I pick, e.g. a caret or a curly bracket, may end up being entered by users in the data.
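One way to reduce the guesswork is to scan the existing data and see which candidate characters never occur. A minimal sketch follows; the function name and the candidate list are assumptions, and a character that is unused today can still be typed by a user tomorrow.

```python
def unused_candidates(lines, candidates="~^|{}`\x1f"):
    """Return the candidate characters that appear in none of the given lines."""
    remaining = set(candidates)
    for line in lines:
        remaining -= set(line)  # drop every candidate seen on this line
        if not remaining:
            break  # every candidate is already in use, no need to read further
    return remaining

# Hypothetical remarks; for a real file, pass the open file object instead.
sample_lines = ["uses ~ and ^ here", "and {braces} there"]
print(sorted(unused_candidates(sample_lines)))
```

Non-printing characters such as the ASCII unit separator (`\x1f`) are much less likely to be typed by users, though whether Dremio's custom Quote setting accepts one is not something this sketch checks.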