Will data load immediately from Elasticsearch into Dremio?

  • Hi guys, I have many indices in ES and they are updated frequently, every second. My question is: when a doc is updated in my Elasticsearch, will Dremio realize it? I ask because I see in the settings that the minimum reload time is 1h. And if so, what should I do to configure Dremio to do it? I tried on small data and saw that it updates immediately, but I don't know whether it will work with bigger data.
  • I now connect ES to Dremio and use JDBC to connect Java to Dremio to run SQL on Dremio and ES. Is there a better setup than mine?
    Thanks so much.

@robocon20x

You can write SQL via JDBC or directly via the Dremio UI. For the part where Dremio has to recognize the data at once, you can run the ALTER PDS command once a new index is created.

But if an existing index that Dremio already knows about is updated, Dremio would do a pushdown to Elasticsearch anyway, which may not require a metadata refresh.
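For the new-index case, here is a minimal Java sketch of issuing the ALTER PDS command over JDBC. It assumes the `ALTER PDS <path> REFRESH METADATA` form is available in your Dremio version and that the Dremio JDBC driver is on the classpath. The source name `elastic`, the index `my_index`, the type `doc`, and the credentials are hypothetical placeholders, and the `jdbc:dremio:direct=host:31010` URL is only an example to adjust for your cluster.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

class DremioMetadataRefresh {

    // Quote one path segment as a SQL identifier (standard SQL style:
    // wrap in double quotes and double any embedded double quote).
    static String quote(String part) {
        return "\"" + part.replace("\"", "\"\"") + "\"";
    }

    // Build the refresh statement for a dataset path, e.g. source.index.type.
    static String refreshMetadataSql(String... path) {
        StringBuilder sb = new StringBuilder("ALTER PDS ");
        for (int i = 0; i < path.length; i++) {
            if (i > 0) {
                sb.append('.');
            }
            sb.append(quote(path[i]));
        }
        return sb.append(" REFRESH METADATA").toString();
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical dataset path: replace with your own source/index/type.
        String sql = refreshMetadataSql("elastic", "my_index", "doc");
        System.out.println(sql);

        // Only connect when a JDBC URL is passed as the first argument,
        // e.g. jdbc:dremio:direct=localhost:31010 (assumed host and port).
        if (args.length == 0) {
            return;
        }
        // "user" and "password" are placeholders for real credentials.
        try (Connection conn = DriverManager.getConnection(args[0], "user", "password");
             Statement stmt = conn.createStatement()) {
            stmt.execute(sql);
        }
    }
}
```

Running it without arguments only prints the statement, so you can check the generated SQL in the Dremio UI before wiring it into whatever creates your indices.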

Do these updates just add documents to an existing index?

Ah no sir, I just need Dremio to run SQL joins against the Elasticsearch DB; my ES is updated by HBase-Indexer. My question is just: if I update a row in Elasticsearch, will Dremio immediately know it? I have tested with an ES index with small data, but I'm not sure about the big ones, which may have more than 2B docs per index. Thanks a lot, sir.

Hi @robocon20x, sorry for the late reply. In order for Dremio to know about the ES data, you have to refresh metadata (this happens automatically in the background every hour).

Apologies for the late reply again

Hi @robocon20x,
I am not sure I understand your question, but if it is "when will new source data be visible in Dremio?", there are two answers:

  • If you did not create any reflection (or there is no reflection matching your query plan), Dremio will hit the source to retrieve the data, possibly bringing back new or updated data.

  • If Dremio can use a reflection to answer your query quickly, you will get a possibly outdated version of the data, because reflections are refreshed on a configurable schedule.

Dremio is better than a regular DB engine, where you have to name the materialized view you want to use: it can pick the best available reflection to answer your query, even if you did not mention it in your SQL. But in the end you might get a possibly outdated result.
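To make the two cases above concrete, here is a small Java sketch of the freshness rule. It is a simplified model for reasoning, not Dremio code: `maxStaleness` and its parameters are hypothetical names, and it ignores the time a reflection refresh itself takes to run.

```java
import java.time.Duration;

class FreshnessSketch {

    // Worst-case age of the data a query can return under the two cases above.
    static Duration maxStaleness(boolean reflectionMatches, Duration reflectionRefreshInterval) {
        // No matching reflection: the query goes to the source and sees current data.
        if (!reflectionMatches) {
            return Duration.ZERO;
        }
        // Matching reflection: the result can be as old as one refresh interval.
        return reflectionRefreshInterval;
    }

    public static void main(String[] args) {
        Duration hourly = Duration.ofHours(1); // example schedule, an assumption
        System.out.println("no reflection:   " + maxStaleness(false, hourly));
        System.out.println("with reflection: " + maxStaleness(true, hourly));
    }
}
```

So for an index that changes every second, a query only sees the changes right away when no reflection is used to answer it.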

Did that help?
