Difference in time between connecting through Dremio and connecting directly

supert165 · July 20, 2017, 11:22pm

Hello,

As a bit of background, I have an app that needs to pull massive amounts of data from different databases, and I’m planning to use Dremio to streamline connecting to all of them. The pulls will be large (1.5 million rows at a time or more), so timing is of utmost importance.

Currently, when I connect to my Postgres db directly through sparksql (the connector I have to use, since I’m planning to do computations on the data sets), it is about 30% faster than when I connect to the same Postgres db through sparksql and Dremio using JDBC (9 sec user time direct vs 12 sec user time through Dremio). I’ve left Dremio’s configuration options at their defaults; I have looked through them and they seem well suited to my data. Is a 30% slowdown something that should be expected when connecting through Dremio? Or should the configuration options be changed to support faster queries?

[All numbers above are for querying 1.5 million rows in a PostgreSQL database]
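For reference, the "30%" figure depends on which timing is used as the baseline. A quick check using only the two timings quoted above (the variable names are just for illustration):

```python
# Timings quoted in the post (user time, 1.5 million rows).
direct_s = 9.0    # Spark SQL -> Postgres
dremio_s = 12.0   # Spark SQL -> Dremio -> Postgres

# Extra time through Dremio, relative to the direct connection.
extra_time_pct = (dremio_s - direct_s) / direct_s * 100

# Time saved by going direct, relative to the Dremio path.
time_saved_pct = (dremio_s - direct_s) / dremio_s * 100

print(f"Through Dremio takes {extra_time_pct:.0f}% longer than direct")
print(f"Direct takes {time_saved_pct:.0f}% less time than through Dremio")
```

So the same 3-second gap reads as roughly a 33% slowdown or a 25% saving, depending on the direction of the comparison.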

Thanks

steven · July 21, 2017, 1:31am

Putting Dremio in between sparksql and Postgres does introduce an extra hop, and this takes some time. It’s hard to say exactly how much slowdown to expect. You can get some idea of what is taking time by looking at the job profile, which you can find on the Jobs page of the Dremio UI. If you want, you can download it and share it here.

benoy · July 21, 2017, 3:14am

You can speed up any subsequent queries on the same dataset by enabling reflections. Reflections effectively cache the dataset (raw or aggregated) and serve queries from the cache.
More info here: https://docs.dremio.com/acceleration/

dmarkhas · July 21, 2017, 10:37am

It makes sense that this is slower, since you are effectively moving the same data over the network twice (from Postgres to Dremio, then from Dremio to Spark).
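To see how much the extra hop costs in a given setup, the two read paths can be timed side by side. Below is a minimal sketch in PySpark, not a definitive benchmark. The helper names (`postgres_options`, `dremio_options`, `timed`, `read_all`), the hosts, credentials, and table names are all hypothetical placeholders, and it assumes the Postgres and Dremio JDBC driver jars are already on the Spark classpath. It measures wall-clock time, which is not the same as the "user time" quoted in the original post.

```python
import time
from typing import Callable, Dict, Tuple, TypeVar

T = TypeVar("T")


def postgres_options(host: str, database: str, table: str,
                     user: str, password: str, port: int = 5432) -> Dict[str, str]:
    """Spark JDBC options for reading a table directly from Postgres."""
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "driver": "org.postgresql.Driver",
        "dbtable": table,
        "user": user,
        "password": password,
    }


def dremio_options(host: str, table: str,
                   user: str, password: str, port: int = 31010) -> Dict[str, str]:
    """Spark JDBC options for reading the same table through Dremio."""
    return {
        "url": f"jdbc:dremio:direct={host}:{port}",
        "driver": "com.dremio.jdbc.Driver",
        "dbtable": table,  # the dataset path as Dremio exposes it (placeholder)
        "user": user,
        "password": password,
    }


def timed(action: Callable[[], T]) -> Tuple[T, float]:
    """Run action() and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = action()
    return result, time.perf_counter() - start


def read_all(spark, options: Dict[str, str]) -> int:
    """Load the table over JDBC and count rows via the RDD so rows are actually read."""
    df = spark.read.format("jdbc").options(**options).load()
    return df.rdd.count()
```

With an existing `SparkSession` named `spark`, a comparison would look like `timed(lambda: read_all(spark, postgres_options(...)))` versus `timed(lambda: read_all(spark, dremio_options(...)))`. Running each a few times and discarding the first run helps separate connection and warm-up costs from the transfer itself.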
