I have one data source (Greenplum, accessed through the PostgreSQL connector) with multiple datasets defined over it.
I join these datasets, and the result contains only 8 rows. One of the datasets is a fact table with about 160k rows.
However, Dremio downloads all of the data before joining:
Input Bytes: 11.70 MB
Input Records: 168,280
Output Bytes: 164 B
Output Records: 8
Reading the data accounts for 97% of the query plan.
I wrote an equivalent pure PostgreSQL query (the join runs in the database and only the result is returned), and it runs 10x faster.
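To illustrate the difference between the two approaches, here is a minimal Python sketch. It is only an analogy: an in-memory SQLite database stands in for Greenplum, and the table and column names (`fact`, `dim`, `dim_id`, `amount`) are made up for illustration, not the real schema. It compares how many rows leave the database when the join runs inside it versus when both tables are fetched and joined on the client side.

```python
import sqlite3

# In-memory SQLite is a stand-in for the real source; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE fact (id INTEGER PRIMARY KEY, dim_id INTEGER, amount INTEGER)")

# 8 dimension rows and 160,000 fact rows; only 8 fact rows match a dimension row.
conn.executemany("INSERT INTO dim VALUES (?, ?)", [(i, f"dim{i}") for i in range(8)])
conn.executemany(
    "INSERT INTO fact VALUES (?, ?, ?)",
    [(i, i, i * 10) for i in range(160_000)],
)

# Approach 1: the join is pushed down, so only the joined result leaves the database.
pushed = conn.execute(
    "SELECT f.id, d.name, f.amount FROM fact f JOIN dim d ON f.dim_id = d.id"
).fetchall()
pushed_transferred = len(pushed)

# Approach 2: both tables are fetched in full and joined on the client side.
fact_rows = conn.execute("SELECT id, dim_id, amount FROM fact").fetchall()
dim_rows = conn.execute("SELECT id, name FROM dim").fetchall()
local_transferred = len(fact_rows) + len(dim_rows)

# Client-side hash join on dim.id.
dim_by_id = {dim_id: name for dim_id, name in dim_rows}
local = [
    (fact_id, dim_by_id[dim_id], amount)
    for fact_id, dim_id, amount in fact_rows
    if dim_id in dim_by_id
]

print("rows transferred with pushdown:", pushed_transferred)
print("rows transferred without pushdown:", local_transferred)
```

Both approaches produce the same 8 result rows, but the second one moves every fact row out of the database first, which is the behaviour I am seeing.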
So my question is: how can I push the join down to the source? (I can't use an external query because multiple virtual datasets are involved.)
57f2076e-f31b-4be4-bb7a-9f07814f7a02.zip (70.4 KB)