Hey everyone, I’ve been using Dremio-OSS inside my organization for a couple of years. We use it at what you might call “boutique scale,” managing record sets that top out at a few hundred million rows. We run RKE2 Kubernetes on a custom deployment platform we developed in-house, which runs on bare metal, virtual machines, or cloud instances with the same standards & architecture.
Having this superpower, we’re able to spawn instances to suit any use case. Recently, requirements demanded we isolate a portion of a client’s data on VMs in an isolated, HIPAA-compliant cloud provider environment. However, their overall data volume is such that we didn’t want to spend vast sums managing ALL their data on the cloud host when we have cheap, plentiful CPU and storage back at home that we don’t have to rent by the hour.
So, our solution is to run our K8s with Dremio on-premise for the larger non-sensitive data, and run our K8s with Dremio in the cloud for the more specific sensitive data. We can then consume both sources in the Tableau instance ALSO in the cloud, and provide the requested decision-support.
What I would LIKE to do, in addition to this, is allow the secure, cloud-based Dremio instance to consume records from the on-premise Dremio instance. That would let us cache pre-aggregated result sets from the big remote data lake in the cloud instance, combine them with the sensitive-data results, and serve the whole thing to Tableau through a single ODBC connection, without paying big money to host the bulk data in the cloud.
I don’t see a way to accomplish this simply. I actually assumed there was a happy little narwhal in the external sources or data lakes connection options, and was sad to discover there’s none to be found.
I can mount the on-prem instance’s MinIO storage in the cloud instance and sort of get what I want, but that doesn’t give me access to any of the Dremio virtual table goodness. Instead I have to materialize virtual table results there by running CTAS queries or dbt jobs. What I really want is direct ODBC or Arrow Flight access from one instance to the other, but I’m not sure how to implement that.
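To make the Arrow Flight idea concrete, here is a rough sketch of a scripted bridge, not a native Dremio-to-Dremio source. It assumes the on-prem instance exposes Arrow Flight on Dremio’s default port 32010, uses basic username/password auth, and that pyarrow is installed. The hostname, credentials, query, and the FlightEndpoint, fetch_table, and export_to_parquet names are all hypothetical. It still materializes results (as a Parquet file), but it does so from a query against the on-prem virtual tables rather than from raw storage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlightEndpoint:
    """Where a Dremio Arrow Flight service is listening (values are illustrative)."""
    host: str
    port: int = 32010   # Dremio's default Arrow Flight port
    tls: bool = False

    @property
    def uri(self) -> str:
        # pyarrow.flight accepts grpc+tcp:// (plaintext) and grpc+tls:// URIs.
        scheme = "grpc+tls" if self.tls else "grpc+tcp"
        return f"{scheme}://{self.host}:{self.port}"


def fetch_table(endpoint: FlightEndpoint, user: str, password: str, sql: str):
    """Run `sql` on the remote Dremio over Arrow Flight and return a pyarrow.Table."""
    # Imported lazily so this module can be loaded without pyarrow installed.
    import pyarrow as pa
    from pyarrow import flight

    client = flight.FlightClient(endpoint.uri)
    # Basic auth hands back a bearer-token header pair for subsequent calls.
    token_header = client.authenticate_basic_token(user, password)
    options = flight.FlightCallOptions(headers=[token_header])

    # The SQL text travels as the command of a Flight descriptor.
    descriptor = flight.FlightDescriptor.for_command(sql)
    info = client.get_flight_info(descriptor, options)

    # Read every endpoint the server advertises and stitch the parts together.
    parts = [client.do_get(ep.ticket, options).read_all() for ep in info.endpoints]
    return pa.concat_tables(parts)


def export_to_parquet(table, path: str) -> None:
    """Write the fetched result to a Parquet file."""
    import pyarrow.parquet as pq

    pq.write_table(table, path)
```

A scheduled job on the cloud side could call fetch_table(FlightEndpoint("dremio-onprem.example.internal"), user, password, "SELECT ... FROM some_space.some_view") and pass the result to export_to_parquet, writing into storage the cloud Dremio already reads. This is a workaround under the assumptions above, and the cloud instance would still see a file rather than a live source.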
My only other idea was to look for some kind of ODBC proxy that could consume any source with a driver and represent it as something pedestrian like Postgres or MySQL. I considered trying AWS Glue, but that looks complicated and expensive.
Any other options or suggestions? Is this a Dremio Enterprise feature? Are there some tools or services I should be considering?