We are working on a Node.js application that queries Dremio to populate some dashboards. We are seeing dramatic data-transfer speed differences between CentOS and macOS.
We are benchmarking against a query that returns about 57k records.
On our local macOS development machines, connecting to the Dremio cluster through the Mac ODBC driver is very fast: the query runs and the data arrives in about 4 seconds, with node at around 60% CPU for under 2 seconds.
On the production server running CentOS, using the Linux ODBC driver against the same cluster, the same query takes 2+ minutes, with node pinned at 100% CPU for the entire duration.
We have ruled out network topologies and speeds as a factor.
Are there known performance issues with the Linux ODBC connector?
Are there settings to tune in the Linux ODBC connector that could improve its performance?
Have you compared the Dremio query profiles for the 2 different runs? In the final (0) phase of each query, what times do you see for the “Blocked on Downstream”? You can share the profiles as described here.
To answer your question, there is no known performance issue with the Linux ODBC connector.
In the slow run, Dremio finished processing the records in < 1 s, but the ODBC client was not ready to accept them. That wait is what “Blocked on Downstream” means in Phase 0: there is nothing downstream inside Dremio at that point, only the ODBC client.
Now, you also said ODBC from the Mac is faster. Can you tell us about the locations of the Mac and the CentOS server relative to Dremio?
What happens if you transfer, say, a 100 MB file from the Dremio coordinator to the Mac and to the CentOS server? Is the speed the same?
Transferring a 100 MB file from the Dremio coordinator to the Mac takes about 7 seconds, and to the CentOS server < 1 second, so I don’t think network speed is the issue…
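Put in throughput terms, the raw network numbers actually favor the CentOS box, which is what makes the result so strange. A quick sanity check (using our measured times, not part of the app):

```javascript
// Effective network throughput implied by the 100 MB file-transfer test.
const FILE_MB = 100;
const macSeconds = 7;    // measured transfer time to the Mac
const centosSeconds = 1; // upper bound; the actual time was under 1 second

const macMBps = FILE_MB / macSeconds;       // ~14.3 MB/s
const centosMBps = FILE_MB / centosSeconds; // >100 MB/s (a lower bound)

console.log(`Mac:    ~${macMBps.toFixed(1)} MB/s`);
console.log(`CentOS: >${centosMBps.toFixed(1)} MB/s`);
// The slower link (Mac) finishes the query in ~4 s; the faster link
// (CentOS) takes 2+ minutes, so the bottleneck is not the network.
```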
Something seems to be bottlenecking at the Node/ODBC driver level on CentOS, but I’m not sure how to dig deeper to discover exactly what.
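One thing I've tried is sampling CPU usage while the query is in flight, to distinguish "busy converting rows" (CPU tracks wall time, as we see on CentOS) from "waiting on the wire" (CPU stays near zero). A sketch, again assuming the `odbc` npm package and a placeholder DSN/query:

```javascript
// Periodically report how busy the process was over the last interval
// while an ODBC query runs. process.cpuUsage() covers the whole process,
// including the driver's worker threads.
function sampleCpu(intervalMs = 1000) {
  let last = process.cpuUsage();
  const timer = setInterval(() => {
    const delta = process.cpuUsage(last);
    last = process.cpuUsage();
    const busyPct = ((delta.user + delta.system) / 1000 / intervalMs) * 100;
    console.log(`cpu: ~${busyPct.toFixed(0)}% over the last ${intervalMs} ms`);
  }, intervalMs);
  timer.unref(); // don't let the sampler keep the process alive
  return () => clearInterval(timer); // caller stops sampling with this
}

async function run() {
  const odbc = require('odbc');
  const connection = await odbc.connect('DSN=Dremio'); // placeholder DSN
  const stop = sampleCpu();
  const rows = await connection.query('SELECT * FROM benchmark_view'); // placeholder
  stop();
  console.log(`${rows.length} rows fetched`);
  await connection.close();
}

// Only hit a live Dremio when explicitly asked to.
if (process.env.RUN_DREMIO_BENCH) run().catch(console.error);

module.exports = { sampleCpu };
```

If this shows the process pegged for the whole window, V8's built-in sampling profiler (`node --prof`, then `node --prof-process` on the isolate log) should point at the hot functions, and would tell us whether the time is going into the driver binding's data conversion.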