Suppose we need to deploy two Dremio clusters on two separate CDH clusters because the two environments are isolated from each other, and we want to connect one Dremio cluster to the other to analyze data. I have developed a Dremio connector using Dremio ARP for this, but I am not confident about its performance. Can anyone offer suggestions, or tell me whether I am on the right track?
You could do that, but the problem is you’ll be pulling data from one cluster to the other through a single JDBC connection. If your network is fast between the clusters, it’s probably better to simply read the data in CDH1 from a Dremio environment on CDH2. You could use C3 (columnar cloud cache) if the network isn’t fast enough.
Thanks for your suggestion!
I have tested the Dremio connector I developed with ARP. The result is just as you said: “you’ll be pulling data from one cluster to the other through a single JDBC connection”.
When I run the SQL “select * from CDH2.table where age > 10” on the CDH1 Dremio, the SQL that the CDH2 Dremio receives is “select * from table”. Why wasn’t the “where age > 10” sent to the CDH2 Dremio? It means that query pushdown is not taking effect.
Can you tell me how I should develop a Dremio connector that supports pushdown?
(Because the two CDH clusters are isolated and can only reach each other through an nginx proxy, the Dremio connector is needed.)
You can implement push-downs in an ARP connector. You should probably read some of the documents on this and explore one of the existing open source connectors, such as this one: https://github.com/narendrans/dremio-snowflake
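To make the cost of the missing pushdown concrete, here is a toy Python model of the “where age > 10” query from above. It is only a sketch: `ROWS`, `remote_scan`, `query_without_pushdown` and `query_with_pushdown` are hypothetical names made up for illustration and are not part of Dremio or ARP. As far as I understand it, in a real ARP connector the operations that may be pushed down are declared in the connector’s ARP YAML file, so that is the place to check in the example connector.

```python
# Toy model only: every name here is hypothetical, not a Dremio or ARP API.

# Stand-in for the table that lives on the remote cluster (CDH2).
ROWS = [
    {"name": "a", "age": 5},
    {"name": "b", "age": 12},
    {"name": "c", "age": 30},
    {"name": "d", "age": 8},
]


def remote_scan(predicate=None):
    """Simulate CDH2 answering a query over the single connection.

    With no predicate this behaves like "select * from table";
    with a predicate it behaves like "select * from table where ...".
    """
    if predicate is None:
        return list(ROWS)
    return [row for row in ROWS if predicate(row)]


def query_without_pushdown(predicate):
    """CDH1 fetches every row, then filters locally."""
    transferred = remote_scan()
    result = [row for row in transferred if predicate(row)]
    return result, len(transferred)


def query_with_pushdown(predicate):
    """CDH1 sends the filter along, so only matching rows are transferred."""
    transferred = remote_scan(predicate)
    return transferred, len(transferred)


def older_than_10(row):
    return row["age"] > 10


no_push, moved_no_push = query_without_pushdown(older_than_10)
push, moved_push = query_with_pushdown(older_than_10)

# Same answer either way, but a different number of rows crosses the network.
print(moved_no_push, moved_push)  # prints: 4 2
```

Both variants return the same rows; the difference is how much data has to travel through the proxy and the single JDBC connection, which is why getting the filter pushed to the CDH2 Dremio matters for large tables.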