How to create an array column in Dremio with CTAS, and how to run a SELECT query on it

CREATE TABLE sample_data.posts_array AS
select id,
    title,
    body,
    CONVERT_FROM(tags, 'json')
from DBShift.dbshift.sample_data.posts_val

When I run the SELECT by itself it works, but with CTAS it throws this error:

[Code: 0, SQL State: ] UNSUPPORTED_OPERATION ERROR: Type conversion error for column EXPR$3

[Error Id: a125c8e0-090b-43dc-8d91-43d4dfd0622a on localcent.ss.local:31010]

(java.lang.UnsupportedOperationException) Unsupported arrow type : Null
com.dremio.exec.store.iceberg.SchemaConverter$1.visit():292
com.dremio.exec.store.iceberg.SchemaConverter$1.visit():289
org.apache.arrow.vector.types.pojo.ArrowType$Null.accept():286
com.dremio.exec.store.iceberg.SchemaConverter.toIcebergType():289
com.dremio.exec.store.iceberg.SchemaConverter.toIcebergColumn():275
com.dremio.exec.store.iceberg.SchemaConverter.toIcebergColumn():266
com.dremio.exec.store.iceberg.SchemaConverter.lambda$toIcebergSchema$3():246
java.util.stream.ReferencePipeline$3$1.accept():193
java.util.stream.ReferencePipeline$2$1.accept():175
java.util.ArrayList$ArrayListSpliterator.forEachRemaining():1384
java.util.stream.AbstractPipeline.copyInto():482
java.util.stream.AbstractPipeline.wrapAndCopyInto():472
java.util.stream.ReduceOps$ReduceOp.evaluateSequential():708
java.util.stream.AbstractPipeline.evaluate():234
java.util.stream.ReferencePipeline.collect():566
com.dremio.exec.store.iceberg.SchemaConverter.toIcebergSchema():247
com.dremio.exec.store.iceberg.SchemaConverter.toIcebergSchema():238
com.dremio.exec.store.iceberg.IcebergUtils.getIcebergPartitionSpecFromTransforms():472
com.dremio.exec.planner.sql.handlers.query.DataAdditionCmdHandler.convertToDrel():310
com.dremio.exec.planner.sql.handlers.query.DataAdditionCmdHandler.getPlan():196
com.dremio.exec.planner.sql.handlers.query.CreateTableHandler.doCtas():100
com.dremio.exec.planner.sql.handlers.query.CreateTableHandler.getPlan():69
com.dremio.exec.planner.sql.handlers.EnterpriseCreateTableHandler.getPlan():38
com.dremio.exec.planner.sql.handlers.commands.HandlerToExec.plan():59
com.dremio.exec.work.foreman.AttemptManager.plan():502
com.dremio.exec.work.foreman.AttemptManager.lambda$run$4():400
com.dremio.service.commandpool.ReleasableBoundCommandPool.lambda$getWrappedCommand$3():137
com.dremio.service.commandpool.CommandWrapper.run():62
com.dremio.context.RequestContext.run():96
com.dremio.common.concurrent.ContextMigratingExecutorService.lambda$decorate$3():199
com.dremio.common.concurrent.ContextMigratingExecutorService$ComparableRunnable.run():180
java.util.concurrent.Executors$RunnableAdapter.call():511
java.util.concurrent.FutureTask.run():266
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():750

@vigneshpathapati

If you leave out the CONVERT_FROM, does the CTAS complete?
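
For reference, a minimal sketch of that test, reusing the table names from the original post (the raw tags VARCHAR is selected as-is, with no JSON conversion):

    CREATE TABLE sample_data.posts_array AS
    select id,
        title,
        body,
        tags  -- raw VARCHAR column, no CONVERT_FROM
    from DBShift.dbshift.sample_data.posts_val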

Yeah, it works.
For better understanding, the data in the array column looks like this:
example: '["water","drink","milk"]'
The actual datatype of that column is VARCHAR.
How can I convert it to an array?
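
For context, the conversion itself is just CONVERT_FROM with the 'json' encoding; here is a minimal sketch with an explicit alias, so the converted column gets a real name instead of EXPR$3 (though, per this thread, the CTAS can still fail with the same Null arrow type error):

    select id,
        title,
        body,
        CONVERT_FROM(tags, 'json') AS tags  -- VARCHAR like '["water","drink","milk"]' becomes a list
    from DBShift.dbshift.sample_data.posts_val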

I am experiencing the same issue. Dremio 24.0.0 CE on-premise.

I am parsing several million text strings into JSON lists and then flattening the list items for keyword analysis.

Given a text column keywords_json with data such as:

["essco", "distributors"]
["turf", "and", "ornamental", "sales"]
["envu", "environmental"]
["xtendimax", "herbicide", "label"]
["approach", "fungicide"]

I can run a query like this:

    select
        id,
        keywords_raw,
        flatten(convert_fromJSON(keywords_json)) as keyword
    from "local_nfs_storage_0"."gsc_data_mart_pds"."combined"."base_all"

and get back the expected results. BUT! If I wrap the expression like so:

create table "local_nfs_storage_0"."gsc_data_mart_pds"."combined"."keyword_parse_test1" as (
    select 
        id,
        keywords_raw,
        flatten(convert_fromJSON(keywords_json)) as keyword, 
    from "local_nfs_storage_0"."gsc_data_mart_pds"."combined"."base_all"
)

Then the job fails with "Type conversion error for column keyword" and the same error @vigneshpathapati is experiencing: Unsupported arrow type : Null in com.dremio.exec.store.iceberg.SchemaConverter.

Attached is the full query profile.
7dc692c7-b808-489b-86c1-26ebd73c71e8.zip (10.4 KB)

I am materializing these JSON values prior to execution, so nothing abstract is happening; it is literally a column of Unicode text formatted as a list, stored on my server's disk. How can I resolve this strange behavior?
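
A hypothetical variant, not confirmed to work: since the failure is Iceberg being unable to map a Null arrow type, forcing a concrete type with a CAST in an outer query might sidestep it. This is only a sketch under that assumption:

    create table "local_nfs_storage_0"."gsc_data_mart_pds"."combined"."keyword_parse_test1" as (
        select
            id,
            keywords_raw,
            -- cast gives the planner a concrete VARCHAR type for the late-bound JSON value
            cast(keyword as varchar) as keyword
        from (
            select
                id,
                keywords_raw,
                flatten(convert_fromJSON(keywords_json)) as keyword
            from "local_nfs_storage_0"."gsc_data_mart_pds"."combined"."base_all"
        )
    )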

@AWaschick Can you try something real quick? Can you make this a two-step approach and see if that works?

create vds flatten_vds as
select
    id,
    keywords_raw,
    flatten(convert_fromJSON(keywords_json)) as keyword
from "local_nfs_storage_0"."gsc_data_mart_pds"."combined"."base_all";

create table "local_nfs_storage_0"."gsc_data_mart_pds"."combined"."keyword_parse_test1" as
select * from flatten_vds;

@balaji.ramaswamy

Excellent suggestion; however, it produces the same outcome: the dreaded type conversion error message. Super weird!

@AWaschick Let me try to reproduce this issue and get back to you. Are you able to create a reflection, though?

@balaji.ramaswamy Now that is very interesting! Yes indeed, if I create a view and then create a reflection to materialize it, it works fine.
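
For anyone landing on this thread later, a minimal sketch of that workaround using Dremio's ALTER DATASET ... CREATE RAW REFLECTION DDL; the space, view, and reflection names below are placeholders:

    create vds "gsc_space"."keyword_flatten_view" as
    select
        id,
        keywords_raw,
        flatten(convert_fromJSON(keywords_json)) as keyword
    from "local_nfs_storage_0"."gsc_data_mart_pds"."combined"."base_all";

    -- materialize the view with a raw reflection instead of CTAS
    alter dataset "gsc_space"."keyword_flatten_view"
    create raw reflection "keyword_raw"
    using display (id, keywords_raw, keyword);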

@AWaschick Hope this is a practical workaround. Meanwhile, I was able to reproduce the issue and will get back to you if there are any other workarounds that don't require a reflection.
