Dremio does not support casting or coercing timestamp to intervalyear

Hello we have a strange error when try run a Merge sentence, when run only the select part works well
but when run merge we have the error “Dremio does not support casting or coercing timestamp to intervalyear”

MERGE INTO lake.scan.posts target
USING (
SELECT id_post as mergeKey, updates.*
       FROM scan.preparation.fb_posts updates
       WHERE updates.id_post ='x'
       
       ) as st_update

ON target.id_post = st_update.mergeKey
WHEN MATCHED THEN
    UPDATE
    SET SHARES = st_update.SHARES
WHEN NOT MATCHED THEN
    INSERT (ID_POST,
             ID_PAGE,
            TEXT,
            SHARES)
    VALUES (st_update.ID_POST,
            st_update.ID_PAGE,
            st_update.TEXT,
            st_update.SHARES)

The probematic part is with scan.preparation.fb_posts that is defined as VDS with a field

(CAST(REPLACE(REPLACE(a.created_at, 'T', ' '),'+0000', '') AS TIMESTAMP)) as created_date,

when change the query to use a subselect the error not occur

select *, TO_TIMESTAMP(created_date_str, 'YYYY-MM-DD HH24:MI:SS') as created_date
from (Select a._id                                                 as id,
             a._id                                                 as id_post,
             a.pageId                                              as id_page,
             a.engagement                                          as engagement,
             REPLACE(REPLACE(a.created_at, 'T', ' '), '+0000', '') as created_date_str,
             "timestamp"                                           as modified_date,
             a."date",
             --(CAST(REPLACE(REPLACE(a."date", 'T', ' '),'+0000', '') AS TIMESTAMP)) as modified_date,
             
             a.shares                                              as shares,
             
      from scan_db.qliksocial.pagePosts a) as xy

For some reason Dremio plan is wrong when planning the select statement when inside merge statement
This are the query profiles, with error and without error, I’m using last version of Dremio

ad51665b-51ba-42d3-8d23-928da546ff2d_ok.zip (38,1 KB)
4ede3a7f-7314-44f0-9bd6-8cd4f69783e5_fail.zip (30,1 KB)

helo @balaji.ramaswamy can you please help me?

@Benny_Chow can you help here? I think is a critical bug with merge statement, also fails when have a list

making some debugging in org.apache.calcite.rex.RexUtil#generateCastExpressions(RexBuilder rexBuilder, RelDataType lhsRowType, RelDataType rhsRowType)
in this method lhsRowType and rhsRowType not has same order I think that this cause that types not match and Dremio tries make a wrong cast

We have an internal tool that can reproduce queries using a query profile.

I think the issue is in the view scan.preparation.“fb_posts”. This expression is getting an invalid input:

TO_TIMESTAMP(created_date_str, 'YYYY-MM-DD HH24:MI:SS') as created_date

I would double check that all input rows contain a well formatted created_date_str that can be cast to timestamp.

no, the error is not by dates, even If I remove the field from select part, the issue persist
also please check this query profle the error is why dremio tries cast array to integer because wrong order and the issue fixes when put in order the columns in the select part of merge

for example in this profile we invert orders in select part “HASHTAGS, REDS” to “RED,HASHTAGS” and we have error of cast, in resumme I think that all this errors are by wrong order on how dremio parse columns ih the line codes reported above.

5b0cbfc2-6ec2-4fe9-b928-87d7c8a956b7.zip (54,0 KB)

6f6b73c3-896f-4a70-85d2-aa6c8ce3889d.zip (29,7 KB)

You are right… it’s not a data issue. It looks like physical planning for the MERGE statement is adding a bunch of CAST expressions underneath the join:

04-03   NestedLoopJoin(condition=[true], joinType=[left]) : rowType = RecordType(VARCHAR(65536) id_page, INTEGER shares, TIMESTAMP(3) id, VARCHAR(65536) engagement, INTEGER created_date, VARCHAR(65536) D_R_E_M_I_O_D_A_T_A_F_I_L_E_F_I_L_E_P_A_T_H, BIGINT D_R_E_M_I_O_D_A_T_A_F_I_L_E_R_O_W_I_N_D_E_X): rowcount = 1621818.0, cumulative cost = {4865458.0 rows, 1.8326547043638E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 11096
04-05      Project(id_page=[$1], shares=[$2], id=[CAST($0):TIMESTAMP(3)], engagement=[CAST($4):VARCHAR(65536)], created_date=[CAST(TO_TIMESTAMP(REPLACE(REPLACE($3, 'T':VARCHAR(1), ' ':VARCHAR(1)), '+0000':VARCHAR(5), '':VARCHAR(0)), 'YYYY-MM-DD HH24:MI:SS':VARCHAR(21))):INTEGER]) : rowType = RecordType(VARCHAR(65536) id_page, INTEGER shares, TIMESTAMP(3) id, VARCHAR(65536) engagement, INTEGER created_date): rowcount = 1621818.0, cumulative cost = {3243636.0 rows, 2.108366743636E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 11091

yes! and I think is cause by the line codes reported above, and caused by wrong order of columns

@Benny_Chow Hello, any news about this bug?
because I find another bug, with merge sentence, that generate too many table scans
And I don’t understand why, if is only a direct merge
please show this plan also

7864551a-6bbe-4089-833b-1a7690b7a61f.zip (175,0 KB)

Yes, I can confirm that the first issue that you reported is indeed a bug. We are fixing it. Thanks for reporting it.

Thank you, and please review the other merge strange plan reported.