As background: I have an app that pulls large amounts of data from several different databases, and I’m planning to use Dremio to simplify connecting to all of them. The pulls will be big (1.5 million rows or more at a time), so timing is of utmost importance.
Currently, connecting to my Postgres database directly through Spark SQL (which I have to use because I plan to run computations on the datasets) is faster than connecting to the same database through Spark SQL and Dremio over JDBC: 9 s vs. 12 s of user time, so the Dremio path takes about a third longer. I’ve left Dremio’s configuration at its defaults; I’ve reviewed the options, and they seem well suited to my data. Is a slowdown of roughly this size expected when connecting through Dremio, or should I change the configuration to get faster queries?
[All numbers above are for querying 1.5 million rows in a PostgreSQL database]
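To make the comparison less sensitive to one-off noise, here is a minimal Python sketch for timing both access paths several times and expressing the gap. It assumes `run_query` is a callable you supply (a hypothetical name) that triggers a Spark action reading the full result, because Spark reads are lazy. Note that it measures wall-clock time, not the user time quoted above. The arithmetic also shows that the size of the gap depends on which path is taken as the baseline: with 9 s and 12 s, the Dremio path is 33% slower, while the direct path is 25% faster.

```python
import statistics
import time
from typing import Callable


def median_runtime(run_query: Callable[[], object], repeats: int = 5) -> float:
    """Call run_query `repeats` times and return the median wall-clock seconds.

    run_query is a hypothetical callable; for Spark it must trigger an action
    that materializes every row, otherwise nothing is actually read.
    A median over several runs is steadier than a single measurement,
    which can be skewed by caching or JVM warm-up.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def slowdown_vs_baseline(baseline_s: float, candidate_s: float) -> float:
    """Extra time taken by the candidate path, as a fraction of the baseline's time."""
    return (candidate_s - baseline_s) / baseline_s


def speedup_vs_candidate(baseline_s: float, candidate_s: float) -> float:
    """Time saved by the baseline path, as a fraction of the candidate's time."""
    return (candidate_s - baseline_s) / candidate_s


if __name__ == "__main__":
    direct_s, via_dremio_s = 9.0, 12.0  # user times reported above
    print(f"Dremio path is {slowdown_vs_baseline(direct_s, via_dremio_s):.0%} slower")
    print(f"Direct path is {speedup_vs_candidate(direct_s, via_dremio_s):.0%} faster")
```

Run with the numbers from the question, this prints a 33% slowdown and a 25% speedup, which are two views of the same 3-second gap.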