Incorrect Query optimization for subselect

Hello, I’m using the latest community version of Dremio 24.0.0-202302100528110223-3a169b7c

We have a simple query with subselect in select clause, for a strange reason Dremio “optimice” this query to make 4 inner/left joins that produce a very slow query, instead of work the original query

I’ve tested with dremio db2 offical and with my custom connector (that works fine with 20.0 vesion) in muy custom connector subqueries are enabled

This are the querys, and attached query profile

Original:

SELECT T.ID,
       T.FECHA_TRANSACCION,
       T.SERVICIO_CONFIG_ID,
       T.VALOR,
       T.RESPUESTA_SERVICIO,
       (SELECT SUM(VAL_MOV_TRANSACC)
        FROM "mupi.MUPI".AH_TRANSACCION S
        WHERE S.NUMERO_TRANSACC = T.NUM_TRANS_COMISION
          AND S.CUENTA = T.CUENTA) AS VALOR_COMISION,

       T.ID_CANAL,
       T.PS_TIPO_TRANSACCION,
       T.CUENTA,
       T.RESPUESTA_SERVICIO        AS RESPUESTA_CONCILIACION,
       MONTH(T.FECHA_TRANSACCION)     MES,
       YEAR(T.FECHA_TRANSACCION)      ANIO

FROM "mupi.MUPI".PS_TRANSACCION T
where RESPUESTA_SERVICIO = 'OK'

Unoptimal Dremio final query:

SELECT
    "PS_TRANSACCION"."ID",
    "PS_TRANSACCION"."FECHA_TRANSACCION",
    "PS_TRANSACCION"."SERVICIO_CONFIG_ID",
    "PS_TRANSACCION"."VALOR",
    "PS_TRANSACCION"."RESPUESTA_SERVICIO",
    "t1"."EXPR$0" AS "VALOR_COMISION",
    "PS_TRANSACCION"."ID_CANAL",
    "PS_TRANSACCION"."PS_TIPO_TRANSACCION",
    "PS_TRANSACCION"."CUENTA",
    "PS_TRANSACCION"."RESPUESTA_SERVICIO" AS "RESPUESTA_CONCILIACION",
    MONTH("PS_TRANSACCION"."FECHA_TRANSACCION") AS "MES",
    YEAR("PS_TRANSACCION"."FECHA_TRANSACCION") AS "ANIO"
FROM
    (
    SELECT
        *
    FROM
        "MUPI"."PS_TRANSACCION" AS "PS_TRANSACCION"
    WHERE
        "PS_TRANSACCION"."RESPUESTA_SERVICIO" = 'OK') AS "PS_TRANSACCION"
LEFT JOIN (
    SELECT
        "PS_TRANSACCION0"."CUENTA" AS "CUENTA0",
        "PS_TRANSACCION0"."NUM_TRANS_COMISION",
        SUM("AH_TRANSACCION"."VAL_MOV_TRANSACC") AS "EXPR$0"
    FROM
        "MUPI"."AH_TRANSACCION" AS "AH_TRANSACCION"
    INNER JOIN (
        SELECT
            "PS_TRANSACCION0"."CUENTA",
            "PS_TRANSACCION0"."NUM_TRANS_COMISION"
        FROM
            "MUPI"."PS_TRANSACCION" AS "PS_TRANSACCION0"
        WHERE
            "PS_TRANSACCION0"."RESPUESTA_SERVICIO" = 'OK'
        GROUP BY
            "PS_TRANSACCION0"."CUENTA",
            "PS_TRANSACCION0"."NUM_TRANS_COMISION") AS "PS_TRANSACCION0" ON
        (CAST("AH_TRANSACCION"."NUMERO_TRANSACC" AS decimal(16,
        2)) = "PS_TRANSACCION0"."NUM_TRANS_COMISION"
            AND "AH_TRANSACCION"."CUENTA" = "PS_TRANSACCION0"."CUENTA")
    GROUP BY
        "PS_TRANSACCION0"."CUENTA",
        "PS_TRANSACCION0"."NUM_TRANS_COMISION") AS "t1" ON
    ("PS_TRANSACCION"."CUENTA" = "t1"."CUENTA0"
        AND "PS_TRANSACCION"."NUM_TRANS_COMISION" = "t1"."NUM_TRANS_COMISION"

Note.- All querys with subselect are transforming to unoptimal join querys
2fdf1c23-0142-4794-b775-6e28a468bb93.zip (17,1 KB)

We’re doing a lot of work with correlated sub-queries.

In order to get pushdown of the desired SQL, you could try re-writing this SQL to use a join instead. Another option is to use an external query: Dremio

1 Like

Hi, thank you for your reply. I think that is not ideal rewrite all queries to use joins, even in my use case “join” is more expensive that use a subquery, like Dremio in the past versions has done with subqueries.
Is there any way to achiviement the work of this functionality as in the past versions?

we have a lot of queries, and is not the best solution this current functionality, also I think that solution propused is not the best because in each version upgrade I should refactor, adicionatilly in the change log doesn’t notify of this breaking changes that impacts in the time of the queries

Can you provide a profile from the past version that shows correlated subquery pushdown to SQL?