Hi,
connecting to an hdfs source i notice that the varchar columns are interpreted as varbinary.
This not happend for columns with data format, like you can see from the picture.
Another strange thing, the column dir0 has the true format. it is the partition column of the Impala table.
Using the Hive connector this not happend.
All this cause problems to cast these varbinary columns into varchar.
With api requests i have extracted the infomation about the pds:
{
“entityType” : “dataset”,
“id” : “8c096312-4df7-4f16-abf7-54c0b870038b”,
“type” : “PHYSICAL_DATASET”,
“path” : [ “HDFS”, “test”, “sample_20200110” ],
“createdAt” : “2020-02-04T10:41:07.193Z”,
“tag” : “31”,
“format” : {
“type” : “Parquet”,
“ctime” : 0,
“isFolder” : true,
“location” : “/test/sample_20200110”,
“autoCorrectCorruptDates” : true
},
“approximateStatisticsAllowed” : false,
“fields” : [ {
“name” : “xxx”,
“type” : {
“name” : “VARBINARY”
}
}, {
“name” : “scheda”,
“type” : {
“name” : “VARBINARY”
}
}, {
“name” : “ts”,
“type” : {
“name” : “TIMESTAMP”
}
}, {
“name” : “transaction_id”,
“type” : {
“name” : “TIMESTAMP”
}
}, {
“name” : “dir0”,
“type” : {
“name” : “VARCHAR”
}
} ]
}
thanks!