"Coercing" column types when there shouldn't be any need to?

Hi, all. Attempting to use a Hex notebook to extract records from a table, and write them to partitioned parquet files for heavier analysis work that doesn’t tax the database we’re extracting from. However, I keep getting errors about coercing Boolean columns to int, or vice versa, and I’m not sure what to do with this.

The first error I saw was when I attempted a metadata refresh after running an extract:

Exception: Database returned error while executing query 
exception while executing query: UNSUPPORTED_OPERATION ERROR: Unable to coerce from the file's data type "int32" to the column's data type "boolean", column "OnlineSelfSchedule"

The extract runs hourly, so there is a chance it may only pull records where the Boolean columns have null values. I wondered whether the data type is defaulting to int when the data in the column is all null.

I started explicitly casting all of the columns, in case something like that is happening. Unfortunately I just get a similar error:

We received the following error when executing this query:

Hide error details

Exception: Database returned error while executing query 
Failed to create prepared statement: error_id: "406d93d5-a77f-4619-a588-e4926a61ade6"
endpoint {
  address: "10.0.50.14"
  user_port: 31010
  fabric_port: 45678
  roles {
    sql_query: true
    java_executor: false
    master: false
  }
  start_time: 1668730221054
  max_direct_memory: 4294967296
  available_cores: 5
  node_tag: ""
  conduit_port: 9000
  dremio_version: "24.0.0-202211150536100932-39e3fe32"
}
error_type: PLAN
message: "PLAN ERROR: \nDremio does not support casting or coercing boolean to int.
[SELECT with 138 explicitly cast columns here]

I don’t know what to do from here. As far as I know, at no point is my processing casting or coercing anything to a different type, so these error messages are confusing.

@jeremybrezovan

What is the file format of the source files? Is it an RDBMS? Do you have the profile of the failed job?

This error typically occurs when you have multiple files in a folder (such as a folder in an S3 bucket) and attempt to “Format” that folder. Dremio will read the files (such as parquet) in that folder and determine the schema/data type. If the schema is different between files --“mycolumn” is an FLOAT in one parquet file but the same column “mycolumn” is a INT in another file then Dremio will use DOUBLE. However, there are a number of coercions that won’t work (such as BOOLEAN to INT). The fix is to group your files so that all files in the same folder have the same schema.

https://docs.dremio.com/software/sql-reference/data-types/coercions/

For example, the following parquet files in the same folder will result in the error that int32 cannot be converted to boolean. The column “in_stock” is an INT32 in one file and BOOLEAN in another.

% parquet-tools show DEBUGPARQUET5_mixed.parquet
±-------±----------±-----------±--------±---------------------------±-----------+
| name | my_code | quantity | price | date | in_stock |
|--------±----------±-----------±--------±---------------------------±-----------|
| apples | hello | 10 | 2.5 | 2022-12-06 19:34:08.261000 | 1 |
| apples | 1 | 10 | 2.5 | 2022-12-06 19:34:08.261000 | 1 |
| apples | public | 10 | 2.5 | 2022-12-06 19:34:08.261000 | 1 |
| apples | apples | 10 | 2.5 | 2022-12-06 19:34:08.262000 | 0 |
| apples | apples | 10 | 2.5 | 2022-12-06 19:34:08.262000 | 0 |
| apples | apples | 10 | 2.5 | 2022-12-06 19:34:08.262000 | 0 |
| apples | apples | 10 | 2.5 | 2022-12-06 19:34:08.262000 | 0 |
±-------±----------±-----------±--------±---------------------------±-----------+
% parquet-tools show DEBUGPARQUET5.parquet
±-------±----------±-----------±--------±---------------------------±-----------+
| name | my_code | quantity | price | date | in_stock |
|--------±----------±-----------±--------±---------------------------±-----------|
| apples | hello | 10 | 2.5 | 2022-10-21 23:44:56.978000 | True |
| apples | 1 | 10 | 2.5 | 2022-10-21 23:44:56.979000 | True |
| apples | public | 10 | 2.5 | 2022-10-21 23:44:56.979000 | True |
| apples | apples | 10 | 2.5 | 2022-10-21 23:44:56.979000 | True |
| apples | apples | 10 | 2.5 | 2022-10-21 23:44:56.979000 | True |
| apples | apples | 10 | 2.5 | 2022-10-21 23:44:56.979000 | True |
| apples | apples | 10 | 2.5 | 2022-10-21 23:44:56.979000 | True |
±-------±----------±-----------±--------±---------------------------±-----------+

% parquet-tools inspect DEBUGPARQUET5_mixed.parquet

############ Column(in_stock) ############
name: in_stock
path: in_stock
max_definition_level: 0
max_repetition_level: 0
physical_type: INT32
logical_type: None
converted_type (legacy): NONE

% parquet-tools inspect DEBUGPARQUET5.parquet

############ Column(in_stock) ############
name: in_stock
path: in_stock
max_definition_level: 0
max_repetition_level: 0
physical_type: BOOLEAN
logical_type: None
converted_type (legacy): NONE