I am new to Dremio. After going through the documentation, I tried to evaluate the Community Edition by comparing its latency against Presto and Athena, but I am facing the following issue.
- My data is in Parquet format, partitioned by date.
The preview shows the partition column as dir0, with values such as:
dir0
eventdate=20200927
eventdate=20200930
eventdate=20200919
eventdate=20200918
eventdate=20200922
eventdate=20200929
eventdate=20200921
eventdate=20200920
.
.
.
I am trying to run a query with a left join over 3 such datasets, filtered on the date partition shown above for 10 dates (i.e. 10 partitions).
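To make the filter concrete: dir0 is a plain string column, so (assuming the predicate is written against dir0 directly) it has to match strings of the form eventdate=YYYYMMDD. Below is a minimal Python sketch of how a 10-day range maps to those strings. The helper names and the start date are only illustrative, they are not part of Dremio or of my actual query.

```python
from datetime import date, datetime, timedelta


def dir0_values(start: date, days: int) -> list[str]:
    """Build the dir0 strings for `days` consecutive dates starting at `start`.

    Hypothetical helper: it only mirrors the folder naming shown in the preview.
    """
    return [f"eventdate={start + timedelta(days=i):%Y%m%d}" for i in range(days)]


def eventdate_of(dir0: str) -> date:
    """Parse a dir0 string such as 'eventdate=20200927' back into a date."""
    key, _, value = dir0.partition("=")
    if key != "eventdate":
        raise ValueError(f"unexpected partition folder: {dir0!r}")
    return datetime.strptime(value, "%Y%m%d").date()


if __name__ == "__main__":
    # Example start date only; the real query uses its own 10 dates.
    for value in dir0_values(date(2020, 9, 18), 10):
        print(value)
```

The printed strings are the literal values a filter on dir0 would need to list (for example in an IN clause) for only those 10 folders to be read.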
I ran the same query on a Presto cluster with a similar configuration, where it takes around 7-8 minutes. On Dremio the same query ran for more than 30 minutes, at which point I cancelled it without waiting for the result.
A few things I noticed:
Presto scanned around 200 GB of data to produce the result,
while Dremio, at the point the query was cancelled, showed the following data scan:
Input bytes : 600GB +
I am having a hard time understanding why Dremio is not doing partition pruning/filtering and is instead scanning such a large amount of data.
Note: I have not tried using reflections