Does Dremio 3.3.2 CE only support one pushdown filter on ParquetScan?

Today I ran a SQL query with two filters against Parquet data through Dremio 3.3.2. When I looked at the physical plan, only one filter had been pushed down into the Parquet scan. When I ran the same query on Apache Drill, both filters were pushed down.


@fengliu77

Please send us the job profile from the jobs page

Share a query profile

71add1dd-d654-4f90-9688-055b54f510e1.zip (10.9 KB)

Hi Balaji,

I have attached the job profile. I ran the following query in the Dremio 3.3.2 UI:
SELECT * FROM "yelp-business" WHERE ADDRESS = '3077 Mayfield Rd' AND CITY = 'Cleveland Heights'

yelp-business is the Yelp dataset in Parquet format.

In the physical plan, only one filter is pushed down into the Parquet scan:
ParquetScan(table=["@fliu".“yelp-business”], columns=[ADDRESS, ATTRIBUTES, BUSINESS_ID, CATEGORIES, CITY, HOURS, IS_OPEN, LATITUDE, LONGITUDE, NAME, POSTAL_CODE, REVIEW_COUNT, STARS, STATE], splits=[1], filters=[[Filter on CITY: equal(CITY, ‘Cleveland Heights’) ]]) : rowType = RecordType(VARCHAR(65536) ADDRESS, ANY ATTRIBUTES, VARCHAR(65536) BUSINESS_ID, VARCHAR(65536) CATEGORIES, VARCHAR(65536) CITY, ANY HOURS, BIGINT IS_OPEN, DOUBLE LATITUDE, DOUBLE LONGITUDE, VARCHAR(65536) NAME, VARCHAR(65536) POSTAL_CODE, BIGINT REVIEW_COUNT, DOUBLE STARS, VARCHAR(65536) STATE): rowcount = 28891.35, cumulative cost = {28891.35 rows, 1675698.2999999998 cpu, 1675698.2999999998 io, 1675698.2999999998 network, 0.0 memory}, id = 407

Today I ran the same query using Spark and Drill, and both pushed the two filters down into the Parquet scan. I am not sure why Dremio pushes only one filter into ParquetScan. With only one filter pushed down, much more data has to flow from the Parquet scan into the rest of the Dremio plan.
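To illustrate why this matters, here is a minimal sketch (plain Python, not Dremio's implementation) of a scan with pushed-down predicates followed by a residual filter. The rows are hypothetical stand-ins for yelp-business, and `scan`/`run` are illustrative helpers. The final result is identical either way; what changes is how many rows leave the scan.

```python
# Hypothetical rows standing in for the yelp-business Parquet data.
ROWS = [
    {"ADDRESS": "3077 Mayfield Rd", "CITY": "Cleveland Heights"},
    {"ADDRESS": "2101 Lee Rd", "CITY": "Cleveland Heights"},
    {"ADDRESS": "1 Main St", "CITY": "Cleveland Heights"},
    {"ADDRESS": "3077 Mayfield Rd", "CITY": "Cleveland"},
    {"ADDRESS": "500 Euclid Ave", "CITY": "Cleveland"},
]


def city_eq(row):
    return row["CITY"] == "Cleveland Heights"


def address_eq(row):
    return row["ADDRESS"] == "3077 Mayfield Rd"


def scan(rows, pushed):
    """Simulated scan: only rows matching every pushed-down predicate are emitted."""
    return [r for r in rows if all(p(r) for p in pushed)]


def run(rows, predicates, pushed_count):
    """Push the first `pushed_count` predicates into the scan; apply the rest in a Filter above it."""
    pushed, residual = predicates[:pushed_count], predicates[pushed_count:]
    scanned = scan(rows, pushed)
    result = [r for r in scanned if all(p(r) for p in residual)]
    return len(scanned), result


# One pushed filter (what the Dremio plan shows) vs. both filters pushed (Drill/Spark).
scanned_one, result_one = run(ROWS, [city_eq, address_eq], pushed_count=1)
scanned_two, result_two = run(ROWS, [city_eq, address_eq], pushed_count=2)
print(scanned_one, scanned_two)  # 3 1
```

The query answer is the same in both cases; only the number of rows the scan hands to the next operator differs, which is the extra data movement described above.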

I investigated this issue further and just want to check: does Dremio do row-group filtering in the Parquet scan?
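For context on what row-group filtering means, here is a minimal sketch of min/max pruning, assuming each row group carries per-column min/max statistics as Parquet footers can. The statistics values and the helpers `may_contain`/`row_groups_to_read` are hypothetical, and this is not how Dremio is confirmed to work. It shows that pushing both equality predicates can rule out more row groups than pushing CITY alone.

```python
# Hypothetical per-row-group (min, max) statistics for two columns.
ROW_GROUPS = [
    {"CITY": ("Akron", "Cleveland Heights"), "ADDRESS": ("1 Main St", "2500 Lee Rd")},
    {"CITY": ("Cleveland", "Toronto"), "ADDRESS": ("3000 Mayfield Rd", "4000 Euclid Ave")},
    {"CITY": ("Phoenix", "Toronto"), "ADDRESS": ("100 A St", "999 Z St")},
]


def may_contain(stats, column, value):
    """False only when the min/max stats prove no row in the group can equal `value`."""
    lo, hi = stats[column]
    # Plain string comparison; assumes ASCII values so it matches byte-wise ordering.
    return lo <= value <= hi


def row_groups_to_read(row_groups, predicates):
    """Keep a row group unless some pushed-down equality predicate rules it out."""
    return [
        i
        for i, stats in enumerate(row_groups)
        if all(may_contain(stats, col, val) for col, val in predicates)
    ]


city_only = row_groups_to_read(ROW_GROUPS, [("CITY", "Cleveland Heights")])
both = row_groups_to_read(
    ROW_GROUPS, [("CITY", "Cleveland Heights"), ("ADDRESS", "3077 Mayfield Rd")]
)
print(city_only, both)  # [0, 1] [1]
```

Row group 0 cannot be skipped on CITY alone, but its ADDRESS maximum ("2500 Lee Rd") sorts before "3077 Mayfield Rd", so the second predicate lets the scan skip it.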