We are running Dremio 4.1.7 on a 16-core machine with 6 GB of heap memory and 8 GB of direct memory.
We are running a FULL OUTER JOIN query on two rather small S3 datasets, each consisting of several Parquet files, and sorting the joined result set by timestamp in ascending order:
WHEN tbl1."timestamp" IS NULL THEN tbl2."timestamp"
END AS ts,
FROM "s3"."src1" AS tbl1
FULL OUTER JOIN "s3"."src2" AS tbl2
ON tbl1."timestamp" * 10 = tbl2."timestamp"
ORDER BY ts ASC
LIMIT 500 OFFSET 0 ROWS
The bigger the OFFSET is (say, 50000), the longer the query takes to execute, with most of the time being spent in TOP_N_SORT.
While that makes sense given what the query is actually doing, queries that take 6 seconds with OFFSET 0 take 25 seconds with OFFSET 50000.
My question is: is there a way to speed up TOP_N_SORT? Is there something wrong with the way the query is written or executed?
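For context, a top-N sort with LIMIT 500 OFFSET 50000 generally has to keep the 50,500 smallest rows before discarding the first 50,000, so the work grows with the offset. One alternative that may avoid this is keyset (seek) pagination, where each page filters on the last ts value already returned instead of using OFFSET. The following is only a minimal sketch under assumptions: "myspace"."joined" is a hypothetical virtual dataset that wraps the FULL OUTER JOIN above and exposes the ts column, and 1577836800000 is a placeholder for the last ts of the previous page.

```sql
-- Keyset pagination sketch (assumption: "myspace"."joined" is a hypothetical
-- virtual dataset containing the joined result with its ts column).
SELECT *
FROM "myspace"."joined"
-- Placeholder value: the largest ts returned by the previous page.
WHERE ts > 1577836800000
ORDER BY ts ASC
-- The sort only ever needs to keep 500 rows, regardless of how deep the page is.
LIMIT 500
```

Note that this skips rows if several rows share the same ts at a page boundary; in that case a unique tiebreaker column would need to be added to both the ORDER BY and the filter.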
Thanks in advance.