For the past two weeks or so, Dremio has been returning the following error on CREATE TABLE queries (and on some SELECT queries):
VALIDATION ERROR: this IndexWriter is closed
SQL Query CREATE TABLE etl_out."/folder1/folder2/folder3" AS SELECT * FROM etl_in."/folder4/results/400/data.parquet" (org.apache.lucene.store.AlreadyClosedException) this IndexWriter is closed
org.apache.lucene.index.IndexWriter.ensureOpen():749
org.apache.lucene.index.IndexWriter.ensureOpen():763
org.apache.lucene.index.IndexWriter.updateDocument():1567
com.dremio.datastore.indexed.LuceneSearchIndex.update():296
com.dremio.datastore.indexed.CoreIndexedStoreImpl.index():215
com.dremio.datastore.indexed.CoreIndexedStoreImpl.put():209
com.dremio.datastore.indexed.CoreIndexedStoreImpl.put():52
com.dremio.datastore.CoreBaseTimedStore.put():77
com.dremio.datastore.CoreBaseTimedStore$TimedIndexedStoreImplCore.put():143
com.dremio.datastore.indexed.LocalIndexedStore.put():103
com.dremio.service.namespace.NamespaceServiceImpl$DatasetMetadataSaverImpl.savePartitionChunk():538
com.dremio.exec.catalog.SafeNamespaceService$1.lambda$savePartitionChunk$2():304
com.dremio.exec.catalog.ManagedStoragePlugin$SafeRunner.doSafe():981
com.dremio.exec.catalog.SafeNamespaceService$1.savePartitionChunk():304
com.dremio.exec.catalog.DatasetSaver.save():107
com.dremio.exec.catalog.DatasetSaver.save():154
com.dremio.exec.catalog.DatasetManager.getTableFromPlugin():349
com.dremio.exec.catalog.DatasetManager.getTable():209
com.dremio.exec.catalog.CatalogImpl.getTable():130
com.dremio.exec.catalog.SourceAccessChecker.lambda$getTable$3():103
com.dremio.exec.catalog.SourceAccessChecker.checkAndGetTable():82
com.dremio.exec.catalog.SourceAccessChecker.getTable():103
com.dremio.exec.catalog.DelegatingCatalog.getTable():66
com.dremio.exec.catalog.CachingCatalog.getTable():93
com.dremio.exec.catalog.DremioCatalogReader.getTable():94
com.dremio.exec.catalog.DremioCatalogReader.getTable():71
org.apache.calcite.sql.validate.EmptyScope.getTableNamespace():76
org.apache.calcite.sql.validate.DelegatingScope.getTableNamespace():197
org.apache.calcite.sql.validate.IdentifierNamespace.resolveImpl():102
org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl():120
org.apache.calcite.sql.validate.AbstractNamespace.validate():84
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():943
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():924
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2971
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2956
org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect():3197
org.apache.calcite.sql.validate.SelectNamespace.validateImpl():60
org.apache.calcite.sql.validate.AbstractNamespace.validate():84
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():943
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():924
org.apache.calcite.sql.SqlSelect.validate():226
org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression():899
org.apache.calcite.sql.validate.SqlValidatorImpl.validate():609
com.dremio.exec.planner.sql.SqlConverter.validate():229
com.dremio.exec.planner.sql.handlers.PrelTransformer.validateNode():184
com.dremio.exec.planner.sql.handlers.PrelTransformer.validateAndConvert():173
com.dremio.exec.planner.sql.handlers.PrelTransformer.validateAndConvert():169
com.dremio.exec.planner.sql.handlers.query.CreateTableHandler.getPlan():84
com.dremio.exec.planner.sql.handlers.commands.HandlerToPreparePlan.plan():89
com.dremio.exec.work.foreman.AttemptManager.plan():421
com.dremio.exec.work.foreman.AttemptManager.lambda$run$0():324
com.dremio.service.commandpool.CommandWrapper.run():62
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():748
Caused By (org.apache.lucene.index.CorruptIndexException) checksum failed (hardware problem?) : expected=c72546fd actual=afb25909 (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/opt/dremio/data/db/search/metadata-dataset-splits/core/_6r_Lucene54_0.dvd")))
org.apache.lucene.codecs.CodecUtil.checkFooter():419
org.apache.lucene.codecs.CodecUtil.checksumEntireFile():526
org.apache.lucene.codecs.lucene54.Lucene54DocValuesProducer.checkIntegrity():474
org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsReader.checkIntegrity():366
org.apache.lucene.codecs.DocValuesConsumer.merge():137
org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge():153
org.apache.lucene.index.SegmentMerger.mergeDocValues():167
org.apache.lucene.index.SegmentMerger.merge():111
org.apache.lucene.index.IndexWriter.mergeMiddle():4356
org.apache.lucene.index.IndexWriter.merge():3931
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge():624
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run():661
Until now, manually restarting Dremio fixed the problem and queries worked again for a while. Lately, though, it has gotten worse (or I broke something while trying to fix the issue).
After searching online I found this thread: https://lucene.472066.n3.nabble.com/checksum-failed-hardware-problem-td4407328.html . It suggests that the cause may be a disk health problem.
I have 3 disks mounted on a single Linux machine. A smartctl short test shows no sign of bad sectors or similar problems, and I have no other issues reading from or writing to those disks. So hardware health doesn’t seem to be the problem.
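A smartctl short test checks the drive as a whole rather than this particular file. As a rough complementary check, the file can be hashed several times to see whether every read returns the same bytes. This is only a sketch: the helper names `sha256_of` and `reads_are_stable` are made up for illustration, and repeated reads may be served from the OS page cache instead of the disk, so matching digests do not prove the disk is healthy.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reads_are_stable(path: Path, attempts: int = 3) -> bool:
    """Return True if every read of the file yields the same digest."""
    digests = {sha256_of(path) for _ in range(attempts)}
    return len(digests) == 1
```

For example, `reads_are_stable(Path("/opt/dremio/data/db/search/metadata-dataset-splits/core/_6r_Lucene54_0.dvd"))` returning `False` would point at unstable reads (disk, controller, or memory), while `True` would be more consistent with corruption that is already persisted in the file.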
From the error message above, you can see that Dremio (Lucene, to be precise) is trying to read a .dvd file at /opt/dremio/data/db/search/metadata-dataset-splits/core/_6r_Lucene54_0.dvd. Every query that fails with the IndexWriter error fails because of a bad checksum for this file, and the expected and actual checksum values in the error are identical across different queries.
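The checksum comparison can be reproduced outside Dremio to confirm that the file on disk really is inconsistent with its own footer. The sketch below assumes the footer layout used by `CodecUtil` in Lucene 6.x as I understand it: the last 16 bytes are a 4-byte footer magic, a 4-byte algorithm id (0), and an 8-byte big-endian CRC32 computed over everything before the checksum field. The function name `check_lucene_footer` is hypothetical, and the layout should be checked against the Lucene source for the exact version in use.

```python
import struct
import zlib
from pathlib import Path
from typing import Tuple

CODEC_MAGIC = 0x3FD76C17
FOOTER_MAGIC = ~CODEC_MAGIC & 0xFFFFFFFF  # bitwise complement of the header magic
FOOTER_LENGTH = 16  # magic (4) + algorithm id (4) + checksum (8)


def check_lucene_footer(path: Path) -> Tuple[int, int]:
    """Return (expected, actual) CRC32 values for a Lucene index file.

    Assumes the Lucene 6.x footer layout described above.
    """
    data = path.read_bytes()  # reads the whole file; fine for a one-off check
    if len(data) < FOOTER_LENGTH:
        raise ValueError("file is too short to contain a codec footer")
    magic, algorithm_id, expected = struct.unpack(">IIQ", data[-FOOTER_LENGTH:])
    if magic != FOOTER_MAGIC:
        raise ValueError(f"unexpected footer magic: {magic:#010x}")
    if algorithm_id != 0:
        raise ValueError(f"unknown checksum algorithm id: {algorithm_id}")
    # The stored CRC covers every byte except the 8-byte checksum itself.
    actual = zlib.crc32(data[:-8]) & 0xFFFFFFFF
    return expected, actual
```

Calling it on the .dvd file and printing both values with `format(value, "x")` gives hex strings in the same form as the `expected=` and `actual=` fields of the error. If they differ in the same way as in the stack trace, the corruption is in the file itself and not in how Dremio reads it.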
I also spent some time reading the Lucene Java code but didn’t find any clues (I am not a Java dev).
Have you seen a similar issue? Do you have any ideas on how I can investigate it further? Could it be caused by using multiple disks together, or by the filesystem? Or is it a bug in Lucene?
Thank you very much!
Apache Arrow is used for querying.
Dremio version: 4.1.7
Lucene version: 6.6.0 (https://github.com/apache/lucene-solr/tree/branch_6_6/lucene/core/src/java/org/apache/lucene)