The filter of function left is not pushed down

Env info:

Dremio dremio-community-15.5.0

DB type: mysql5.7

Table struct:

CREATE TABLE prj_working1 (
PRJ_W_DATE date NULL DEFAULT NULL,
EMP_ID varchar(36) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL
) ENGINE = InnoDB CHARACTER SET = latin1 COLLATE = latin1_swedish_ci ROW_FORMAT = Dynamic;


Used sql below to query the mysql table, check the physical plan found filter is not pushed down.

SELECT *
FROM prj_working1
WHERE LEFT(PRJ_W_DATE, 7) = LEFT(NOW(), 7)
    AND EMP_ID = '7bcfa8e5-7ca1-45df-b371-a7d06bf9b7a9'

00-00 Screen : rowType = RecordType(VARCHAR(65536) Fragment, BIGINT Records, VARCHAR(65536) Path, VARBINARY(65536) Metadata, INTEGER Partition, BIGINT FileSize, VARBINARY(65536) IcebergMetadata): rowcount = 1.0, cumulative cost = {9.1 rows, 14.790119999999998 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 6393
00-01 Project(Fragment=[$0], Records=[$1], Path=[$2], Metadata=[$3], Partition=[$4], FileSize=[$5], IcebergMetadata=[$6]) : rowType = RecordType(VARCHAR(65536) Fragment, BIGINT Records, VARCHAR(65536) Path, VARBINARY(65536) Metadata, INTEGER Partition, BIGINT FileSize, VARBINARY(65536) IcebergMetadata): rowcount = 1.0, cumulative cost = {9.0 rows, 14.690119999999999 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 6392
00-02 WriterCommitter(final=[/opt/dremio-community-15.5.0-202105040508160960-dddd4dca/data/pdfs/results/1f588834-866c-2fe0-7506-d0a22a5ed900]) : rowType = RecordType(VARCHAR(65536) Fragment, BIGINT Records, VARCHAR(65536) Path, VARBINARY(65536) Metadata, INTEGER Partition, BIGINT FileSize, VARBINARY(65536) IcebergMetadata): rowcount = 1.0, cumulative cost = {8.0 rows, 14.69005 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 6391
00-03 Writer : rowType = RecordType(VARCHAR(65536) Fragment, BIGINT Records, VARCHAR(65536) Path, VARBINARY(65536) Metadata, INTEGER Partition, BIGINT FileSize, VARBINARY(65536) IcebergMetadata): rowcount = 1.0, cumulative cost = {7.0 rows, 13.69005 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 6390
00-04 Project(PRJ_W_DATE=[$0], EMP_ID=[$1]) : rowType = RecordType(DATE PRJ_W_DATE, VARCHAR(65536) EMP_ID): rowcount = 1.0, cumulative cost = {6.0 rows, 12.69005 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 6389
00-05 Project(PRJ_W_DATE=[$0], EMP_ID=[CAST(‘7bcfa8e5-7ca1-45df-b371-a7d06bf9b7a9’):VARCHAR(65536) CHARACTER SET “ISO-8859-1” COLLATE “ISO-8859-1$en_US$primary”]) : rowType = RecordType(DATE PRJ_W_DATE, VARCHAR(65536) EMP_ID): rowcount = 1.0, cumulative cost = {5.0 rows, 12.69003 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 6388
00-06 SelectionVectorRemover : rowType = RecordType(DATE PRJ_W_DATE, VARCHAR(65536) EMP_ID): rowcount = 1.0, cumulative cost = {4.0 rows, 8.69002 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 6387
00-07 Filter(condition=[AND(=(LEFT($0, 7), ‘2021-05’), =($1, ‘7bcfa8e5-7ca1-45df-b371-a7d06bf9b7a9’))]) : rowType = RecordType(DATE PRJ_W_DATE, VARCHAR(65536) EMP_ID): rowcount = 1.0, cumulative cost = {3.0 rows, 7.69002 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 6386
00-08 Project(PRJ_W_DATE=[$0], EMP_ID=[$1]) : rowType = RecordType(DATE PRJ_W_DATE, VARCHAR(65536) EMP_ID): rowcount = 1.0, cumulative cost = {2.0 rows, 1.00002 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 6385
00-09 Jdbc(sql=[SELECT prj_working1.PRJ_W_DATE, prj_working1.EMP_ID
FROM test.prj_working1]) : rowType = RecordType(DATE PRJ_W_DATE, VARCHAR(65536) EMP_ID): rowcount = 1.0, cumulative cost = {tiny}, id = 6298


Change the function LEFT(PRJ_W_DATE, 7) to LEFT(CAST(PRJ_W_DATE AS CHAR), 7) , and run again, check the physical plan found this query can be pushed down.

SELECT *
FROM prj_working1
WHERE LEFT(CAST(PRJ_W_DATE AS CHAR), 7) = LEFT(NOW(), 7)
    AND EMP_ID = '7bcfa8e5-7ca1-45df-b371-a7d06bf9b7a9'

00-00 Screen : rowType = RecordType(VARCHAR(65536) Fragment, BIGINT Records, VARCHAR(65536) Path, VARBINARY(65536) Metadata, INTEGER Partition, BIGINT FileSize, VARBINARY(65536) IcebergMetadata): rowcount = 1.0, cumulative cost = {9.1 rows, 5.10013 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 6118
00-01 Project(Fragment=[$0], Records=[$1], Path=[$2], Metadata=[$3], Partition=[$4], FileSize=[$5], IcebergMetadata=[$6]) : rowType = RecordType(VARCHAR(65536) Fragment, BIGINT Records, VARCHAR(65536) Path, VARBINARY(65536) Metadata, INTEGER Partition, BIGINT FileSize, VARBINARY(65536) IcebergMetadata): rowcount = 1.0, cumulative cost = {9.0 rows, 5.00013 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 6117
00-02 WriterCommitter(final=[/opt/dremio-community-15.5.0-202105040508160960-dddd4dca/data/pdfs/results/1f588b63-bfcc-0bef-2155-05911c7a4800]) : rowType = RecordType(VARCHAR(65536) Fragment, BIGINT Records, VARCHAR(65536) Path, VARBINARY(65536) Metadata, INTEGER Partition, BIGINT FileSize, VARBINARY(65536) IcebergMetadata): rowcount = 1.0, cumulative cost = {8.0 rows, 5.00006 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 6116
00-03 Writer : rowType = RecordType(VARCHAR(65536) Fragment, BIGINT Records, VARCHAR(65536) Path, VARBINARY(65536) Metadata, INTEGER Partition, BIGINT FileSize, VARBINARY(65536) IcebergMetadata): rowcount = 1.0, cumulative cost = {7.0 rows, 4.00006 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 6115
00-04 Project(PRJ_W_DATE=[$0], EMP_ID=[$1]) : rowType = RecordType(DATE PRJ_W_DATE, VARCHAR(65536) EMP_ID): rowcount = 1.0, cumulative cost = {6.0 rows, 3.0000600000000004 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 6114
00-05 Project(PRJ_W_DATE=[$0], EMP_ID=[$1]) : rowType = RecordType(DATE PRJ_W_DATE, VARCHAR(65536) EMP_ID): rowcount = 1.0, cumulative cost = {5.0 rows, 3.0000400000000003 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 6113
00-06 Project(PRJ_W_DATE=[$0], EMP_ID=[$1]) : rowType = RecordType(DATE PRJ_W_DATE, VARCHAR(65536) EMP_ID): rowcount = 1.0, cumulative cost = {4.0 rows, 3.00002 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 6112
00-07 Jdbc(sql=[SELECT prj_working1.PRJ_W_DATE, CAST(‘7bcfa8e5-7ca1-45df-b371-a7d06bf9b7a9’ AS CHAR(65536)) AS EMP_ID
FROM test.prj_working1
WHERE LEFT(CAST(prj_working1.PRJ_W_DATE AS CHAR(65536)), 7) = ‘2021-05’ AND prj_working1.EMP_ID = ‘7bcfa8e5-7ca1-45df-b371-a7d06bf9b7a9’]) : rowType = RecordType(DATE PRJ_W_DATE, VARCHAR(65536) EMP_ID): rowcount = 1.0, cumulative cost = {3.0 rows, 3.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 6032

@wujx Able to reproduce the behavior, have asked about this internally if it is a bug, will keep you posted

@wujx :

I found that if I created a derived column, and then filter on that instead, then the filter is pushed down:
eg.

  • Create VDS:

    CREATE VDS prj_working1_vds
    AS SELECT PRJ_W_DATE
    , LEFT(CAST(PRJ_W_DATE AS CHAR), 7) AS PRJ_W_DATE_YYYY_MM
    , EMP_ID
    FROM prj_working1;

  • Use the VDS in the query:
    SELECT PRJ_W_DATE, EMP_ID
    FROM prj_working1_vds
    WHERE PRJ_W_DATE_YYYY_MM = LEFT(NOW(), 7)
    AND EMP_ID = ‘7bcfa8e5-7ca1-45df-b371-a7d06bf9b7a9’;

@Michael_Flower @balaji.ramaswamy
Thanks a lot for the solution and updates

Now we have workaround for this issue, and it’s an optimization issue for product. So I have two concerns:

  1. If this issue will be fixed in your plan? So we can run the sql more convenience without create vds or change sql.

  2. What’s your suggest usage of VDS? Can VDS resolve sql pushdown issue in many situation?

@wujx

#1 Is currently in Engineering backlog
#2 In general push down should happen, if not depending on the workaround, a VDS might help

Thank you for your helps~ Will try more about it.