Questions around Elasticsearch-Dremio integration

We are looking at using Dremio on top of our existing Elasticsearch cluster. We have the following questions about the ES-Dremio integration.

1) Query Interception

Scenario - Support different authorization models

Can we intercept ES queries before they are sent to the database and modify the query, or the index against which the query is run?

Use cases

a) The same query for two different clients (please note that we are a multi-tenant system) must go to two separate ES indexes.

b) We want to add a filter to every ES query to support our authorization constructs (currently based on an ES terms lookup in a different index - https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html#_terms_lookup_twitter_example )

2) Joins and Concurrency

Scenario - Join between data in ES and MySQL

I am assuming Dremio pushes queries down to ES and MySQL, loads the results into memory or into a DB, and performs the joins there. What data isolation does this temporary store provide? Is it possible for two users running exactly the same join to see each other’s data?

3) Special Operations
We want to run some special operations via the user interface, e.g. group by immediate reportees: this gets a user’s immediate reportees from MySQL and runs a group-by-user query on ES that includes only those users.

4) ES features supported

We make heavy use of ES nested fields and parent-child relations. I think they are not currently supported in Dremio? Are they supported in the enterprise version?

Let me try and answer some of these questions…

1.a - An Elasticsearch index will appear as a table in Dremio. You can query the appropriate index accordingly.
1.b - You could pass the list of terms in a WHERE clause using IN. I suppose with this approach you would first query Elasticsearch to get the list, then use that to construct the SQL expression. I don’t know of a way to take advantage of the terms lookup mechanism via SQL.
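
As a rough sketch of the approach in 1.b: once you have fetched the list of terms from Elasticsearch yourself, you could build the SQL expression along these lines. The table path, column name and helper name are made up for illustration; none of them are Dremio APIs.

```python
def build_in_filter_query(table, column, terms):
    """Build a SQL query restricted to the given terms via an IN list."""
    if not terms:
        # An empty IN list is not valid SQL, so match nothing instead.
        return f"SELECT * FROM {table} WHERE 1 = 0"
    # Escape single quotes by doubling them (standard SQL string literals).
    quoted = ", ".join("'" + str(t).replace("'", "''") + "'" for t in terms)
    return f"SELECT * FROM {table} WHERE {column} IN ({quoted})"


# Hypothetical names: an "elastic" source, an index/type path and a column.
sql = build_in_filter_query("elastic.tenant_a.docs", "owner_id", ["u1", "o'brien"])
```

If the client you use to talk to Dremio supports parameterized queries, those would be preferable to building strings by hand.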

Related to this, each Dremio query has the notion of an external user. You can use this information in a CASE statement to affect the query at query time. See here: https://docs.dremio.com/security/row-level-permissions.html
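
To illustrate the CASE idea, here is a minimal sketch that generates such a query. It assumes the current user is available through a SQL function named `query_user()`; please check the linked page for the exact name. The table, column and user-to-tenant mapping are hypothetical.

```python
def sql_quote(value):
    """Quote a value as a SQL string literal, doubling embedded quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def row_filter_sql(table, tenant_column, user_to_tenant):
    """Build a query that only returns rows of the current user's tenant."""
    if not user_to_tenant:
        # No mappings at all: return a query that matches nothing.
        return f"SELECT * FROM {table} WHERE 1 = 0"
    whens = " ".join(
        f"WHEN {sql_quote(user)} THEN {sql_quote(tenant)}"
        for user, tenant in user_to_tenant.items()
    )
    # Unknown users fall through to NULL, so the comparison matches no rows.
    return (
        f"SELECT * FROM {table} WHERE {tenant_column} = "
        f"CASE query_user() {whens} ELSE NULL END"
    )


# Assumed user names and tenant ids, purely for illustration.
sql = row_filter_sql(
    "elastic.shared.docs", "tenant_id", {"alice": "tenant_a", "bob": "tenant_b"}
)
```

Saving the generated query as a view and granting users access to the view only, rather than the underlying index, would be one way to apply the filter to every query.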

  2. Dremio performs joins in memory. Queries are isolated and cannot see each other’s data. No data is persisted to disk, with two exceptions: 1) when there is insufficient memory, Dremio may spill data to disk in a secure, temporary area; 2) Dremio’s Data Reflections allow you to materialize data to accelerate different query patterns. Access to data stored in Data Reflections is governed by the same access controls as the physical source.

  3. This should work fine.

  4. Dremio supports nested fields in JSON in Elasticsearch. If you can elaborate on how you use them, we can check whether there is a known limitation.
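
For the "group by immediate reportees" operation, the query could look roughly like the one generated below. This is only a sketch: the source names (`elastic`, `mysql`), the index and table paths and all column names are assumptions about your setup, not anything Dremio defines.

```python
def reportee_group_by_sql(manager_id):
    """Build a join of ES events with MySQL reportees, grouped by user."""
    # Coerce to int so that nothing but a number ends up in the SQL text.
    manager_id = int(manager_id)
    return (
        "SELECT e.user_id, COUNT(*) AS doc_count "
        "FROM elastic.activity.events AS e "      # hypothetical ES index/type
        "JOIN mysql.hr.employees AS m "           # hypothetical MySQL table
        "ON e.user_id = m.user_id "
        f"WHERE m.manager_id = {manager_id} "
        "GROUP BY e.user_id"
    )


sql = reportee_group_by_sql(42)
```

Dremio would plan the join itself, so the application only needs to submit this one statement rather than querying MySQL and ES separately.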

Hope this helps