Today I ran a SQL query with 2 filters on a Parquet file through Dremio 3.3.2. When I looked at the physical plan, only 1 filter was pushed down into the Parquet scan. I ran the same query on Apache Drill, and there both filters were pushed down.
71add1dd-d654-4f90-9688-055b54f510e1.zip (10.9 KB)
Hi Balaji,
I have attached the job profile. I ran the following query in the Dremio 3.3.2 UI:
SELECT * FROM "yelp-business" WHERE ADDRESS='3077 Mayfield Rd' AND CITY='Cleveland Heights'
yelp-business is a Yelp dataset in Parquet format.
In the physical plan, only 1 filter (the one on CITY) is pushed down into the Parquet scan:
ParquetScan(table=["@fliu"."yelp-business"], columns=[ADDRESS, ATTRIBUTES, BUSINESS_ID, CATEGORIES, CITY, HOURS, IS_OPEN, LATITUDE, LONGITUDE, NAME, POSTAL_CODE, REVIEW_COUNT, STARS, STATE], splits=[1], filters=[[Filter on CITY: equal(CITY, 'Cleveland Heights') ]]) : rowType = RecordType(VARCHAR(65536) ADDRESS, ANY ATTRIBUTES, VARCHAR(65536) BUSINESS_ID, VARCHAR(65536) CATEGORIES, VARCHAR(65536) CITY, ANY HOURS, BIGINT IS_OPEN, DOUBLE LATITUDE, DOUBLE LONGITUDE, VARCHAR(65536) NAME, VARCHAR(65536) POSTAL_CODE, BIGINT REVIEW_COUNT, DOUBLE STARS, VARCHAR(65536) STATE): rowcount = 28891.35, cumulative cost = {28891.35 rows, 1675698.2999999998 cpu, 1675698.2999999998 io, 1675698.2999999998 network, 0.0 memory}, id = 407
Today I ran the same query using Spark and Drill. Both push down both filters into the Parquet scan. I am not sure why Dremio supports only 1 filter in ParquetScan. With only the CITY filter applied at the scan, much more data has to be read from Parquet into Dremio before the ADDRESS filter is applied.
I investigated this issue further and just want to check: is Dremio doing row group filtering in the Parquet scan?
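To make the question concrete, here is a minimal Python sketch of what row group filtering with pushed-down equality predicates generally looks like: a row group is skipped when the filter value falls outside that column's min/max statistics. This is not Dremio's (or Drill's or Spark's) implementation. The RowGroup class, the prune helper, and the statistics values are all hypothetical and invented for illustration; only the two filter values come from the query above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class RowGroup:
    """Hypothetical stand-in for one Parquet row group's min/max column statistics."""
    name: str
    stats: Dict[str, Tuple[str, str]]  # column name -> (min value, max value)


def may_contain(group: RowGroup, column: str, value: str) -> bool:
    """An equality predicate can only match if min <= value <= max."""
    if column not in group.stats:
        return True  # no statistics for this column, so the group cannot be skipped
    lo, hi = group.stats[column]
    return lo <= value <= hi


def prune(groups: List[RowGroup], predicates: Dict[str, str]) -> List[str]:
    """Return the names of row groups where every pushed-down predicate may match."""
    return [
        g.name
        for g in groups
        if all(may_contain(g, col, val) for col, val in predicates.items())
    ]


# Invented statistics, for illustration only.
groups = [
    RowGroup("rg0", {"CITY": ("Akron", "Cleveland"),
                     "ADDRESS": ("1 Main St", "999 Oak Ave")}),
    RowGroup("rg1", {"CITY": ("Cleveland Heights", "Columbus"),
                     "ADDRESS": ("100 Elm St", "250 Pine St")}),
    RowGroup("rg2", {"CITY": ("Cleveland", "Dayton"),
                     "ADDRESS": ("2000 Lee Rd", "4100 Mayfield Rd")}),
]

# Only the CITY filter is pushed down (what the plan above shows).
city_only = prune(groups, {"CITY": "Cleveland Heights"})

# Both filters are pushed down (what Drill and Spark report).
both = prune(groups, {"CITY": "Cleveland Heights", "ADDRESS": "3077 Mayfield Rd"})

print(city_only)  # row groups read with one pushed-down filter
print(both)       # row groups read with two pushed-down filters
```

With this toy data, pushing down only the CITY filter still reads two row groups, while pushing down both filters reads one. That difference in scanned data is what the question is about.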