Bug: Columns shifted in result set when the dataset includes an empty directory

I have my parquet files organized as a partitioned dataset under /mydata:

/mydata/year=2018/month=01/*.parquet
/mydata/year=2018/month=02/*.parquet
/mydata/year=2018/month=03/*.parquet
etc…

However if I create an empty directory:
/mydata/year=2017/month=12

Then the column headers in a Dremio result set are no longer aligned with the rows of data.

This is on HDFS…
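For anyone wanting to check their own layout for this, here is a minimal Python sketch that lists partition directories containing no .parquet files. It assumes the dataset is reachable through a local or mounted path (listing HDFS directly would need a different call), and the function name find_empty_partition_dirs is made up for illustration.

```python
import os

def find_empty_partition_dirs(root):
    """Return leaf directories under root that contain no .parquet files."""
    empty = []
    for dirpath, dirnames, filenames in os.walk(root):
        # In a year=/month= layout only leaf directories hold data files,
        # so skip any directory that still has subdirectories.
        if dirnames:
            continue
        if not any(name.endswith(".parquet") for name in filenames):
            empty.append(dirpath)
    return sorted(empty)
```

Calling it on the dataset root (for example a local mount of /mydata, which is an assumption here) would return directories such as year=2017/month=12 from the case above.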

Hi @david.lee

To get a better understanding of your problem, can you please describe what you are trying to achieve by creating the empty folder? Is it possible to send a screenshot of the output (in case it does not contain sensitive data)?

Thanks,
@balaji.ramaswamy

Something else is wrong… The dataset isn’t being created properly…

Here’s a screenshot of Dataset Setting.

before

However, after hitting Save, several columns end up typed as alphanumeric and are placed after the built-in directory columns.

Running queries against the saved dataset produces inconsistent results.

For some reason schema learning is also kicking in on a tabular dataset…
LessThanOneYearBond::int32
LessThanOneYearBond::union<int32, boolean>
LessThanOneYearBond::union<int32, boolean, boolean>

The parquet files were created using Apache Drill 1.12

I probably have a corrupted parquet file somewhere that is messing up the mydata dataset.

Working my way top down by applying formatting to the year folder and then to each month folder, to see at what point I start getting corrupted results.

My directories month=03, month=05, month=06 and month=07 are ending up with alphanumeric columns…

Going through each parquet file now…
"year=2017"."month=03"."1_0_0.parquet"
"year=2017"."month=03"."1_0_1.parquet"
through
"year=2017"."month=03"."1_5_1.parquet"

Ok I found the problem… Not sure how to fix it… Going to try to recreate the files from Drill using explicit CAST() functions to start.
It looks like in some of the parquet files the entire column is NULL and Dremio is showing it as a Numeric column.
In other parquet files the same column is partially populated and Dremio is showing it as a character column.
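A rough way to narrow this down without opening every file by hand is to compare the per-file column types and report the columns that disagree. The Python sketch below assumes the schemas have already been collected into a dictionary (for instance with a Parquet reader such as pyarrow's read_schema, which is not shown here). The function name and the per-file type assignments are hypothetical; only the column names, file names and type names come from this thread.

```python
from collections import defaultdict

def find_type_conflicts(schemas):
    """Given {file_name: {column_name: type_name}}, return the columns that
    appear with more than one type, as {column: {type_name: [file_names]}}."""
    seen = defaultdict(lambda: defaultdict(list))
    for file_name, columns in schemas.items():
        for column, type_name in columns.items():
            seen[column][type_name].append(file_name)
    return {
        column: {t: sorted(files) for t, files in by_type.items()}
        for column, by_type in seen.items()
        if len(by_type) > 1  # keep only columns with conflicting types
    }

# Hypothetical input: which file has which type is an assumption for illustration.
schemas = {
    "1_0_0.parquet": {"LessThanOneYearBond": "int32", "LocalMarketValue": "float"},
    "1_0_1.parquet": {"LessThanOneYearBond": "boolean", "LocalMarketValue": "float"},
}
conflicts = find_type_conflicts(schemas)
```

Here conflicts would contain only LessThanOneYearBond, with the files grouped by the type each one reports.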

Ok that seems to work… Not ideal, but I need to regenerate all my parquet files from Drill using explicit cast functions…

cast(c.HoldingDetail.LocalMarketValue as float) as LocalMarketValue,
cast(c.HoldingDetail.LocalMarketValue_CurrencyCode as char) as LocalMarketValue_CurrencyCode,
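Since every column needs the same treatment, the cast list can be generated rather than typed by hand. This is a small Python sketch, not part of the original workflow: the helper name build_cast_select_list is invented, and the column-to-type mapping has to be supplied by whoever knows the intended types.

```python
def build_cast_select_list(columns, prefix="c.HoldingDetail"):
    """Build the body of a SELECT list with one explicit cast per column.

    columns is a list of (column_name, sql_type) pairs; prefix is the
    record path the columns are read from.
    """
    return ",\n".join(
        f"cast({prefix}.{name} as {sql_type}) as {name}"
        for name, sql_type in columns
    )

# The two columns and types shown above.
select_list = build_cast_select_list([
    ("LocalMarketValue", "float"),
    ("LocalMarketValue_CurrencyCode", "char"),
])
print(select_list)
```

The output can be pasted into the Drill query that rewrites the parquet files.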