We have two REST API data sources, say A and B.
We built two plugins to connect to them, PluginA and PluginB.
When we do the join (select A.*, B.* from A, B where A.join_key = B.join_key), logically it fetches all rows from both A and B and then does a hash join on join_key.
We now want to optimize this: execute PluginA first to get the join_key values, then send those join_key values to PluginB.
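To make the intended optimization concrete, here is a minimal Python sketch that contrasts the two strategies. It is not Dremio plugin code: the in-memory lists and the names fetch_a, fetch_b, hash_join, join_fetch_all and join_with_key_pushdown are hypothetical stand-ins for the two REST sources and the planner's behavior.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the two REST API sources.
SOURCE_A = [
    {"join_key": 1, "a_val": "x"},
    {"join_key": 2, "a_val": "y"},
]
SOURCE_B = [
    {"join_key": 1, "b_val": "p"},
    {"join_key": 2, "b_val": "q"},
    {"join_key": 3, "b_val": "r"},
    {"join_key": 4, "b_val": "s"},
]


def fetch_a():
    """Pretend call to source A: returns every row."""
    return list(SOURCE_A)


def fetch_b(keys=None):
    """Pretend call to source B.

    With keys=None every row is returned; otherwise only rows whose
    join_key is in keys, as if the keys were sent as a request filter.
    """
    if keys is None:
        return list(SOURCE_B)
    wanted = set(keys)
    return [row for row in SOURCE_B if row["join_key"] in wanted]


def hash_join(left, right, key="join_key"):
    """Build a hash index on the right side, then probe it with the left side."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def join_fetch_all():
    """Default behavior described above: read both sides fully, then join."""
    return hash_join(fetch_a(), fetch_b())


def join_with_key_pushdown():
    """Desired behavior: read A first, then ask B only for the matching keys."""
    rows_a = fetch_a()
    keys = {row["join_key"] for row in rows_a}
    rows_b = fetch_b(keys)
    return hash_join(rows_a, rows_b)
```

Both functions return the same joined rows, but the second one only asks B for the keys that A produced, which is why A has to run first.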
However, we found that the execution order of PluginA and PluginB is not consistent (it seems to execute the plugin with more fields first). We want PluginA to always run first. Could anyone advise how to control the execution order?
I tried changing the cost reported by each plugin to make Dremio change the execution order, but it does not seem to work. As the screenshot below shows, the costs differ but the plan has the same execution order.
@popejune What are the two sources? When you built the two plugins, were they for any standard data source?
They are not standard ones. They are REST APIs of some kind. We built the plugins ourselves since Dremio does not support them.
@popejune Just for testing, try disabling planner.enable_join_optimization and see whether the join order is maintained as written in the SQL. This is a global setting, so it may affect other queries.
It’s working. Let me do more testing.
@popejune This may affect reordering of joins, so testing would be great
Hi @popejune, would you mind sharing how you built the custom plugin? I searched the web but found only the tutorial from Dremio; I tried it but had no luck at all.
@chulucninh09 Have you seen the below link?
Hi @balaji.ramaswamy, I read the link and the tutorial on building custom ARP connectors. Unfortunately, ARP is for SQL-like interfaces with type mapping and a JDBC driver.
I want to know how @popejune did it with an API, since we have to handle many things differently, such as authentication, API throttling, and retrying.
I did read the Hive custom connector, but it seemed like overkill for me. If @popejune can give an overall idea, I believe I can do it more easily.
Hi @chulucninh09, you have to check the Dremio source code and refer to some existing plugins, e.g. Elasticsearch.
Thanks @popejune, I had looked into Hive instead of the ES plugin. Thanks so much for pointing me in the right direction!