Questions around Elasticsearch-Dremio integration

anshal · November 9, 2018, 5:53am

We are looking at using Dremio on top of our existing Elasticsearch cluster. We have the following questions about ES-Dremio integration:

1) Query Interception

Scenario - Support different authorization models

Can we intercept ES queries before they are sent to the database, and modify either the query or the index it is run against?

Use cases

a) The same query from two different clients (note that we are a multi-tenant system) must go to two separate ES indexes.

b) We want to add a filter to every ES query to support our authorization constructs (currently based on an ES terms lookup against a different index - https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html#_terms_lookup_twitter_example )
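The rewrite that such an interception layer would have to perform can be sketched as plain dictionary manipulation on the ES query body (this is not a Dremio API). The tenant-to-index map, the `acl_lookup` index, and the `owner_id`/`allowed_owners` fields are hypothetical; the filter follows the terms-lookup shape from the ES page linked above.

```python
import copy
from typing import Dict, Tuple

# Hypothetical mapping from tenant to the ES index holding that tenant's data.
TENANT_INDEXES: Dict[str, str] = {
    "tenant_a": "orders_tenant_a",
    "tenant_b": "orders_tenant_b",
}


def intercept(tenant: str, user_id: str, query_body: dict) -> Tuple[str, dict]:
    """Return (index, rewritten_body) for one request.

    a) Route the same logical query to the tenant's own index.
    b) AND an authorization filter onto the original query, using a terms
       lookup against a separate (hypothetical) ACL index.
    """
    index = TENANT_INDEXES[tenant]  # raises KeyError for unknown tenants
    body = copy.deepcopy(query_body)  # leave the caller's query untouched
    original = body.pop("query", {"match_all": {}})
    acl_filter = {
        "terms": {
            "owner_id": {                  # hypothetical field on searched docs
                "index": "acl_lookup",     # hypothetical ACL index
                "id": user_id,             # ACL document for this user
                "path": "allowed_owners",  # hypothetical field listing visible owners
            }
        }
    }
    # Filter context restricts the hits without changing relevance scores.
    body["query"] = {"bool": {"must": [original], "filter": [acl_filter]}}
    return index, body


if __name__ == "__main__":
    idx, rewritten = intercept("tenant_b", "42", {"query": {"match": {"status": "open"}}})
    print(idx, rewritten)
```

Putting the ACL clause in `filter` rather than `must` keeps the user's original relevance ranking intact while still excluding unauthorized documents.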

2) Joins and Concurrency

Scenario - Join between data in ES and MySQL

I am assuming Dremio pushes queries down to ES and MySQL, loads the data into memory or a DB, and performs the joins there. What data isolation does this temporary store provide? Is it possible for two users running exactly the same join to see each other's data?

3) Special Operations

We want to run some special operations via the user interface, e.g. "group by immediate reportees", which gets a user's immediate reportees from MySQL and runs a group-by-user query on ES that includes only those users.
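The requested operation is a semi-join followed by an aggregation. A minimal in-memory sketch, with hypothetical rows standing in for the MySQL reporting table (`employee`, `manager` columns) and for ES documents (a `user` field):

```python
from collections import Counter
from typing import Dict, List


def group_by_reportees(manager: str, org_rows: List[dict], es_docs: List[dict]) -> Dict[str, int]:
    """Count ES documents per user, restricted to `manager`'s direct reports."""
    # Step 1 (MySQL side): the manager's immediate reportees only, not the
    # whole subtree below them.
    reportees = {row["employee"] for row in org_rows if row["manager"] == manager}
    # Step 2 (ES side): group by user, keeping only documents of reportees.
    return dict(Counter(doc["user"] for doc in es_docs if doc["user"] in reportees))


if __name__ == "__main__":
    org = [
        {"employee": "bob", "manager": "alice"},
        {"employee": "carol", "manager": "alice"},
        {"employee": "dave", "manager": "bob"},
    ]
    docs = [{"user": "bob"}, {"user": "bob"}, {"user": "carol"}, {"user": "dave"}]
    print(group_by_reportees("alice", org, docs))  # {'bob': 2, 'carol': 1}
```

Against Dremio, the same logic would be written as one SQL statement that joins the MySQL table with the ES index and groups by user.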

4) ES features supported

We heavily use ES nested fields and parent-child relations. I think they are not currently supported in Dremio? Are they supported in the Enterprise Edition?

kelly · November 14, 2018, 7:58pm

Let me try and answer some of these questions…

1.a - An Elasticsearch index will appear as a table in Dremio. You can query the appropriate index accordingly.
1.b - You could pass the list of terms in a WHERE clause using IN. I suppose with this approach you would first query Elasticsearch to get the list, then use that to construct the SQL expression. I don’t know of a way to take advantage of the terms lookup mechanism via SQL.
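A minimal sketch of the two-step approach in 1.b: fetch the allowed terms first (here just a Python list standing in for the result of the Elasticsearch lookup), then render them as SQL string literals in an IN list. The table and column names are hypothetical. Escaping by doubling single quotes is the standard SQL string-literal rule; a client library's parameter binding is preferable where one is available.

```python
from typing import List


def in_clause(column: str, terms: List[str]) -> str:
    """Render `"column" IN ('t1', 't2', ...)` with escaped string literals."""
    if not terms:
        # An empty IN () list is invalid SQL; match nothing instead.
        return "1 = 0"
    # Standard SQL escapes a single quote inside a literal by doubling it.
    literals = ", ".join("'" + str(t).replace("'", "''") + "'" for t in terms)
    return f'"{column}" IN ({literals})'


def authorized_query(table: str, column: str, allowed: List[str]) -> str:
    """Build a query restricted to the terms fetched from the lookup index."""
    return f"SELECT * FROM {table} WHERE {in_clause(column, allowed)}"


if __name__ == "__main__":
    # `allowed` stands in for terms previously read from the ES lookup index.
    allowed = ["alice", "o'brien"]
    print(authorized_query("tenant_a_orders", "owner_id", allowed))
```

The trade-off versus a native terms lookup is an extra round trip and a query string that grows with the number of terms.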

Relatedly, each Dremio query has the notion of an external user. You can use this information in a CASE expression to change what the query returns at query time. See here: https://docs.dremio.com/security/row-level-permissions.html
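Conceptually, such a CASE expression is a per-row predicate whose outcome depends on who runs the query. The Python model below shows that evaluation in memory; the user-to-tenant map, the `admins` set, and the `tenant` column are hypothetical, and in Dremio the predicate itself would be written in SQL as described on the linked page.

```python
from typing import Dict, List, Optional, Set


def visible_rows(rows: List[dict], query_user: str,
                 user_tenant: Dict[str, str], admins: Set[str]) -> List[dict]:
    """In-memory model of a view filter such as:

    WHERE CASE WHEN <query user is an admin> THEN TRUE
               ELSE tenant = <tenant of the query user> END
    """
    if query_user in admins:
        return list(rows)  # admins see every row
    tenant: Optional[str] = user_tenant.get(query_user)
    if tenant is None:
        return []  # unknown users see nothing
    return [row for row in rows if row["tenant"] == tenant]


if __name__ == "__main__":
    data = [{"id": 1, "tenant": "a"}, {"id": 2, "tenant": "b"}]
    print(visible_rows(data, "ann", {"ann": "a"}, set()))  # only tenant "a"
```

Because the filter lives in the view, every client querying it gets tenant-scoped results without rewriting its own SQL.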

2 - Dremio performs joins in memory. Queries are isolated and cannot see each other's data. No data is persisted to disk, with two exceptions: 1) if there is insufficient memory, Dremio may spill data to disk in a secure, temporary area; 2) Dremio's Data Reflections allow you to materialize data to accelerate different query patterns. Access to data stored in Data Reflections is governed by the same access controls as the physical source.
3 - This should work fine.
4 - Dremio supports nested fields in JSON in Elasticsearch. Could you elaborate on how you use them, so we can check whether there is a known limitation?
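For intuition about in-memory joins, here is a textbook hash join in Python: build a hash table from one input (say, rows from MySQL), then probe it with the other (say, ES documents). This is a generic illustration, not Dremio's execution engine; the point it shows is that each execution builds its own table, so two runs of the same join share no intermediate state.

```python
from collections import defaultdict
from typing import List


def hash_join(build_rows: List[dict], probe_rows: List[dict],
              build_key: str, probe_key: str) -> List[dict]:
    """Inner equi-join: build on one input, probe with the other."""
    # Build phase: this table is local to the call and never shared.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)
    # Probe phase: emit one merged row per matching pair. On a field-name
    # collision, the probe row's value wins.
    joined = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            joined.append({**match, **row})
    return joined


if __name__ == "__main__":
    mysql_users = [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bo"}]
    es_docs = [{"user_id": 1, "msg": "x"}, {"user_id": 3, "msg": "z"}]
    print(hash_join(mysql_users, es_docs, "user_id", "user_id"))
```

Building on the smaller input keeps the in-memory table small; when even that does not fit, engines typically spill partitions to disk, which matches the first exception mentioned above.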

Hope this helps
