Dremio 17 + ES7.8: IllegalStateException: Invalid response

marcosllama · September 15, 2021, 2:43pm

Hello,

My Dremio 17 on ES7.8 was working fine and stable, with reflections refreshing every hour. Today, reflections stop refreshing and queries stop working on 2 of the 3 promoted ES indexes and all dependent datasets and reflections. No changes have been made in Dremio or ES lately. Indexes are OK in ES.

Error message is: IllegalStateException: Invalid response

Versions:
dremio-community-17.0.0-202107060524010627_31b5222b_1.noarch on CentOS 7
elasticsearch-7.8.0-1.x86_64 on CentOS 7

/var/log/dremio/queries.json:

> {"queryId":"1ebdfd6d-cfc2-16a9-fed3-0b68c5b53700","schema":"[***, al*****e]","queryText":"SELECT * FROM \"_doc\"","start":1631715985527,"finish":1631715987281,"outcome":"FAILED","outcomeReason":"SYSTEM ERROR: IllegalStateException: Invalid response\n\nSqlOperatorImpl ELASTICSEARCH_SUB_SCAN\nLocation 1:1:5\nFragment 1:0\n\n[Error Id: 6af94f73-8458-4720-8026-e9f94b79faff on gr*****.es:0]\n\n (java.lang.IllegalStateException) Invalid response\n com.google.common.base.Preconditions.checkState():508\n com.dremio.plugins.elastic.execution.ElasticSearch7JsonReader.getScrollAndTotalSizeThenSeekToHits():51\n com.dremio.plugins.elastic.execution.ElasticsearchRecordReader.getFirstPage():227\n com.dremio.plugins.elastic.execution.ElasticsearchRecordReader.next():272\n com.dremio.sabot.op.scan.ScanOperator.outputData():323\n com.dremio.sabot.driver.SmartOp$SmartProducer.outputData():521\n com.dremio.sabot.driver.StraightPipe.pump():56\n com.dremio.sabot.driver.Pipeline.doPump():108\n com.dremio.sabot.driver.Pipeline.pumpOnce():98\n com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():345\n com.dremio.sabot.exec.fragment.FragmentExecutor.run():294\n com.dremio.sabot.exec.fragment.FragmentExecutor.access$1600():94\n com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():747\n com.dremio.sabot.task.AsyncTaskWrapper.run():112\n com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop():243\n com.dremio.sabot.task.slicing.SlicingThread.run():171\n","username":"admin"}

Restarted ES and Dremio, but the problem remains.

Any help would be appreciated

Thank you
Marcos

balaji.ramaswamy · September 16, 2021, 7:26am

@marcosllama Are you able to send the profile when this was working and failing?

marcosllama · September 16, 2021, 8:03am

da77f504-eda1-4297-8a79-77f1d6f8efcc.zip (30,0 KB)
935f6eb0-156f-44dd-848c-0e0ec7729d70.zip (30,5 KB)

Hello, this is a reflection job working on sep 14th ~3AM and failing on sep 15th ~3AM

Thank you
Marcos

marcosllama · September 17, 2021, 10:57am

Update: I installed a fresh instance of Dremio on a different machine, and I can query the promoted ES indexes just fine, while queries on the same ES indexes keep failing on my production Dremio.

balaji.ramaswamy · September 21, 2021, 6:15am

@marcosllama From an Elasticseach client, are you able to run the below ES command, like say from Kibana and see if you get results back?

{
  "from" : 0,
  "size" : 4000,
  "query" : {
    "range" : {
      "@timestamp" : {
        "from" : "2021-09-14T02:29:02.351Z",
        "to" : null,
        "include_lower" : false,
        "include_upper" : true,
        "format" : "date_time",
        "boost" : 1.0
      }
    }
  }
}

marcosllama · September 21, 2021, 7:36am

ES.zip (107,4 KB)

Here it is. Sensitive data has been “Obfuscated”.

Thank you,
Marcos

balaji.ramaswamy · September 25, 2021, 10:30pm

@marcosllama Seems like intermittent, is this failure very consistent? are there multiple elasticsearch servers that one is probably giving back the “Invalid response”?

marcosllama · September 26, 2021, 8:21pm

Hi,

No, it’s not intermittent. It fails every time I try to query 2 of the 3 promoted ES indexes. The remaining index and all dependent datasets and reflections work fine.

There is only one ES server.

Marcos

marcosllama · September 27, 2021, 10:15am

One particular thing about the indexes: My ES indexes are created and named on a monthly basis (i.e. indexABC-2021.08, indexABC-2021.09…) so in order to keep name consistency, I have created rolling aliases (alias-indexABC) containing the two last indexes: It means every month the aliases are modified in order to add the last index and delete the index from two months back. ¿Maybe this operation can affect Dremio?

Update: I performed a test on the remaining working index: I re-added an index from the previous month to the alias (which had been previously removed as per the procedure described above), and the alias index started failing on Dremio, exactly with the same error messages: VALIDATION ERROR: Table ‘json’ not found and IllegalStateException: Invalid response. I see now that on the other two failing indexes, something similar might have happened (previously removed
months were readded to the alias).

2nd update: Confirmed & SOLVED: In the original failing aliases, indexes from previous months (that had been initially correctly removed) had been erroneously re-added to the aliases, causing them to fail (does this make sense to you? HINT: Reflection refresh is set to “Incremental” using the ES @timestamp field). I removed these indexes from the aliases and THEY STARTED WORKING AGAIN. Though not immediately: I had to wait over 12 hours for the queries to start responding again in Dremio (I don’t completely understand this, as both reflection&metadata refresh periods are set to 1 hour.

balaji.ramaswamy · September 29, 2021, 7:08am

@marcosllama It could be your background metadata refresh is taking > 1 hour?

marcosllama · September 30, 2021, 11:09am

That’s possibly the reason. Thank you for your help!

Topic		Replies	Views
Elasticsearch 7.4.2 compatibility	4	1968	March 4, 2020
IllegalStateException: Dataset is not read correctly	1	20	May 7, 2025
Dremio compatibility with elasticsearch 7.0.1	10	2573	September 23, 2020
Error while epxloring Dataset Elasticsearch	6	1602	April 20, 2018
IllegalStateException when performing a select * on an AWS ELasticsearch	0	1247	July 25, 2019

Dremio 17 + ES7.8: IllegalStateException: Invalid response

Related topics