Suppose we need to deploy two Dremio clusters on two separate CDH clusters, because the two clusters are isolated from each other, and we want to connect one Dremio to the other to analyze data. To do this, I have developed a Dremio connector using Dremio ARP, but I am not confident about its performance. Can anyone offer suggestions, or tell me whether I am on the right track?
You could do that, but the problem is you’ll be pulling data from one cluster to the other through a single JDBC connection. If your network is fast between the clusters, it’s probably better to simply read the data in CDH1 from a Dremio environment on CDH2. You could use C3 (columnar cloud cache) if the network isn’t fast enough.
Thanks for your suggestion!
I have tested the Dremio connector I developed with ARP. The result is just as you said: “you’ll be pulling data from one cluster to the other through a single JDBC connection”.
When I run the SQL “select * from CDH2.table where age > 10” on the CDH1 Dremio, the SQL that the CDH2 Dremio receives is “select * from table”. Why wasn’t the “where age > 10” sent to the CDH2 Dremio? It means that query pushdown is not taking effect.
Can you tell me how I should develop a Dremio connector that supports pushdown?
(Because the two CDH clusters are isolated and we must use an nginx proxy to connect them to each other, the Dremio connector is needed.)
You can implement push-downs in an ARP connector. You should probably read the documentation on this and explore one of the existing open source connectors, such as https://github.com/narendrans/dremio-snowflake
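To make the idea concrete, here is a minimal conceptual sketch in Java of why the filter from the earlier example may or may not reach the remote side. It is not Dremio's API: the class `PushdownSketch`, the method `remoteSql`, and the `declaredOperators` set are all hypothetical names. The assumption is that a connector declares which operators the remote source supports (in ARP this declaration lives in the connector's YAML file), and anything not declared is evaluated locally after the rows are fetched.

```java
import java.util.Set;

public class PushdownSketch {

    /**
     * Builds the SQL that would be sent to the remote source.
     * declaredOperators is a hypothetical stand-in for the operators
     * a connector declares as supported; it is not a Dremio type.
     */
    public static String remoteSql(String table, String column, String operator,
                                   String literal, Set<String> declaredOperators) {
        String base = "SELECT * FROM " + table;
        if (declaredOperators.contains(operator)) {
            // Operator is declared: the filter travels with the query,
            // so only matching rows cross the network.
            return base + " WHERE " + column + " " + operator + " " + literal;
        }
        // Operator is not declared: the remote side gets a full scan and
        // the local engine has to apply the filter after fetching every row.
        return base;
    }

    public static void main(String[] args) {
        // No operators declared: the remote query loses the WHERE clause.
        System.out.println(remoteSql("table", "age", ">", "10", Set.<String>of()));
        // ">" declared: the WHERE clause is pushed down.
        System.out.println(remoteSql("table", "age", ">", "10", Set.of(">", "=")));
    }
}
```

If your connector behaves like the first call, a likely place to look is whether the comparison operators and the data types involved are declared in the connector's ARP file, using one of the existing open source connectors as a reference.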
I saw that the GitHub Snowflake ARP connector is built in Java. Can it be built in another language? Also, once it is developed, how do I deploy it?
Currently only in Java. Have you gone through the contribute section?