FileAlreadyExistsException in LuceneSearchIndex.commit()

Hey!

Today my Dremio running in production crashed and I seeing the logs I saw this stacktrace.

2019-10-30 18:25:37,108 [qtp1160298887-2380] ERROR c.d.d.server.GenericExceptionMapper - Unexpected exception when processing GET http://dremio.myhost.com/apiv2/jobs?sort=st&order=DESCENDING&filter=%28qt%3D%3D%22UI%22%2Cqt%3D%3D%22EXTERNAL%22%29%3B%28st%3Dgt%3D1572438343274%29&limit=100000 : org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
    at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:749) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
    at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:763) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3206) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
    at com.dremio.datastore.indexed.LuceneSearchIndex.commit(LuceneSearchIndex.java:267) ~[dremio-services-datastore-4.0.2-201910020123580864-a98a0b9.jar:4.0.2-201910020123580864-a98a0b9]
    at com.dremio.datastore.indexed.LuceneSearchIndex.access$400(LuceneSearchIndex.java:58) ~[dremio-services-datastore-4.0.2-201910020123580864-a98a0b9.jar:4.0.2-201910020123580864-a98a0b9]
    at com.dremio.datastore.indexed.LuceneSearchIndex$CommitterThread.commitLoop(LuceneSearchIndex.java:164) ~[dremio-services-datastore-4.0.2-201910020123580864-a98a0b9.jar:4.0.2-201910020123580864-a98a0b9]
    at com.dremio.datastore.indexed.LuceneSearchIndex$CommitterThread.access$000(LuceneSearchIndex.java:124) ~[dremio-services-datastore-4.0.2-201910020123580864-a98a0b9.jar:4.0.2-201910020123580864-a98a0b9]
    at com.dremio.datastore.indexed.LuceneSearchIndex$CommitterThread$1.run(LuceneSearchIndex.java:133) ~[dremio-services-datastore-4.0.2-201910020123580864-a98a0b9.jar:4.0.2-201910020123580864-a98a0b9]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_222]
Caused by: java.nio.file.FileAlreadyExistsException: /mnt/dremio-metadata/db/search/jobs/core/_e3s.cfs
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:88) ~[na:1.8.0_222]
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[na:1.8.0_222]
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[na:1.8.0_222]
    at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214) ~[na:1.8.0_222]
    at java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434) ~[na:1.8.0_222]
    at java.nio.file.Files.newOutputStream(Files.java:216) ~[na:1.8.0_222]
    at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:413) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
    at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:409) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
    at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
    at org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
    at org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
    at org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
    at org.apache.lucene.codecs.lucene50.Lucene50CompoundFormat.write(Lucene50CompoundFormat.java:82) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:30:05]
    at org.apache.lucene.index.IndexWriter.createCompoundFile(IndexWriter.java:4945) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
    at org.apache.lucene.index.DocumentsWriterPerThread.sealFlushedSegment(DocumentsWriterPerThread.java:529) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
    at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:481) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
    at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:539) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
    at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:653) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:445) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
    at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:291) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
    at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:266) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
    at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:256) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
    at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:140) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
    at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:156) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
    at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
    at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
    at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
    at com.dremio.datastore.indexed.LuceneSearchIndex.checkIfChanged(LuceneSearchIndex.java:258) ~[dremio-services-datastore-4.0.2-201910020123580864-a98a0b9.jar:4.0.2-201910020123580864-a98a0b9]
    at com.dremio.datastore.indexed.LuceneSearchIndex.search(LuceneSearchIndex.java:380) ~[dremio-services-datastore-4.0.2-201910020123580864-a98a0b9.jar:4.0.2-201910020123580864-a98a0b9]
    at com.dremio.datastore.indexed.CoreSearchIterable$SearchIterator.hasNext(CoreSearchIterable.java:126) ~[dremio-services-datastore-4.0.2-201910020123580864-a98a0b9.jar:4.0.2-201910020123580864-a98a0b9]
    at com.google.common.collect.TransformedIterator.hasNext(TransformedIterator.java:42) ~[guava-20.0.jar:na]
    at com.google.common.collect.TransformedIterator.hasNext(TransformedIterator.java:42) ~[guava-20.0.jar:na]
    at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:249) ~[guava-20.0.jar:na]
    at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:209) ~[guava-20.0.jar:na]
    at com.dremio.dac.resource.JobsResource.getJobs(JobsResource.java:89) ~[dremio-dac-backend-4.0.2-201910020123580864-a98a0b9.jar:4.0.2-201910020123580864-a98a0b9]
    at sun.reflect.GeneratedMethodAccessor221.invoke(Unknown Source) ~[na:na]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_222]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_222]
    at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81) ~[jersey-server-2.25.1.jar:na]
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144) ~[jersey-server-2.25.1.jar:na]
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161) ~[jersey-server-2.25.1.jar:na]
    at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:205) ~[jersey-server-2.25.1.jar:na]
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99) ~[jersey-server-2.25.1.jar:na]
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389) ~[jersey-server-2.25.1.jar:na]
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347) ~[jersey-server-2.25.1.jar:na]
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102) ~[jersey-server-2.25.1.jar:na]
    at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326) ~[jersey-server-2.25.1.jar:na]
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) [jersey-common-2.25.1.jar:na]
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) [jersey-common-2.25.1.jar:na]
    at org.glassfish.jersey.internal.Errors.process(Errors.java:315) [jersey-common-2.25.1.jar:na]
    at org.glassfish.jersey.internal.Errors.process(Errors.java:297) [jersey-common-2.25.1.jar:na]
    at org.glassfish.jersey.internal.Errors.process(Errors.java:267) [jersey-common-2.25.1.jar:na]
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317) [jersey-common-2.25.1.jar:na]
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305) [jersey-server-2.25.1.jar:na]
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154) [jersey-server-2.25.1.jar:na]
    at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473) [jersey-container-servlet-core-2.25.1.jar:na]
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427) [jersey-container-servlet-core-2.25.1.jar:na]
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388) [jersey-container-servlet-core-2.25.1.jar:na]
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341) [jersey-container-servlet-core-2.25.1.jar:na]
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228) [jersey-container-servlet-core-2.25.1.jar:na]
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:873) [jetty-servlet-9.4.15.v20190215.jar:9.4.15.v20190215]
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623) [jetty-servlet-9.4.15.v20190215.jar:9.4.15.v20190215]
    at com.dremio.dac.server.SecurityHeadersFilter.doFilter(SecurityHeadersFilter.java:52) [dremio-dac-backend-4.0.2-201910020123580864-a98a0b9.jar:4.0.2-201910020123580864-a98a0b9]
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610) [jetty-servlet-9.4.15.v20190215.jar:9.4.15.v20190215]
    at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:51) [jetty-servlets-9.4.15.v20190215.jar:9.4.15.v20190215]
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610) [jetty-servlet-9.4.15.v20190215.jar:9.4.15.v20190215]
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540) [jetty-servlet-9.4.15.v20190215.jar:9.4.15.v20190215]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255) [jetty-server-9.4.15.v20190215.jar:9.4.15.v20190215]
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345) [jetty-server-9.4.15.v20190215.jar:9.4.15.v20190215]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203) [jetty-server-9.4.15.v20190215.jar:9.4.15.v20190215]
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480) [jetty-servlet-9.4.15.v20190215.jar:9.4.15.v20190215]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201) [jetty-server-9.4.15.v20190215.jar:9.4.15.v20190215]
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247) [jetty-server-9.4.15.v20190215.jar:9.4.15.v20190215]
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144) [jetty-server-9.4.15.v20190215.jar:9.4.15.v20190215]
    at org.eclipse.jetty.server.handler.RequestLogHandler.handle(RequestLogHandler.java:56) [jetty-server-9.4.15.v20190215.jar:9.4.15.v20190215]
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) [jetty-server-9.4.15.v20190215.jar:9.4.15.v20190215]
    at org.eclipse.jetty.server.Server.handle(Server.java:502) [jetty-server-9.4.15.v20190215.jar:9.4.15.v20190215]
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370) [jetty-server-9.4.15.v20190215.jar:9.4.15.v20190215]
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267) [jetty-server-9.4.15.v20190215.jar:9.4.15.v20190215]
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305) [jetty-io-9.4.15.v20190215.jar:9.4.15.v20190215]
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103) [jetty-io-9.4.15.v20190215.jar:9.4.15.v20190215]
    at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117) [jetty-io-9.4.15.v20190215.jar:9.4.15.v20190215]
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333) [jetty-util-9.4.15.v20190215.jar:9.4.15.v20190215]
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310) [jetty-util-9.4.15.v20190215.jar:9.4.15.v20190215]
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168) [jetty-util-9.4.15.v20190215.jar:9.4.15.v20190215]
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126) [jetty-util-9.4.15.v20190215.jar:9.4.15.v20190215]
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366) [jetty-util-9.4.15.v20190215.jar:9.4.15.v20190215]
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765) [jetty-util-9.4.15.v20190215.jar:9.4.15.v20190215]
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683) [jetty-util-9.4.15.v20190215.jar:9.4.15.v20190215]
    ... 1 common frames omitted

And after this I had this one:

2019-10-30 18:25:37,378 [shutdown-thread] INFO  c.d.d.s.TimingApplicationEventListener - ApplicationEventListener.onEvent DESTROY_FINISHED
2019-10-30 18:25:37,383 [shutdown-thread] INFO  c.d.exec.server.NodeRegistration - Waiting for work to complete before shutdown.
2019-10-30 18:25:37,384 [shutdown-thread] INFO  c.d.exec.server.NodeRegistration - Unregistering node ip-10-225-129-190.ec2.internal:31010
2019-10-30 18:25:37,412 [shutdown-thread] INFO  c.d.service.jobs.LocalJobsService - Stopping JobsService
2019-10-30 18:25:37,412 [shutdown-thread] INFO  c.d.s.s.LocalSchedulerService - Cancelling task com.dremio.service.jobs.LocalJobsService$JobResultsCleanupTask@1dcf83ea
2019-10-30 18:25:37,412 [shutdown-thread] INFO  c.d.s.s.LocalSchedulerService - Cancelling task com.dremio.service.jobs.LocalJobsService$JobProfilesCleanupTask@3cd300e8
2019-10-30 18:25:37,412 [shutdown-thread] INFO  c.d.service.jobs.LocalJobsService - Stopped JobsService
2019-10-30 18:25:37,413 [shutdown-thread] INFO  c.d.s.s.LocalSchedulerService - Cancelling task metadata-refresh-INFORMATION_SCHEMA
2019-10-30 18:25:37,413 [shutdown-thread] INFO  c.d.s.s.LocalSchedulerService - Cancelling task metadata-refresh-parquet_jcb_2
2019-10-30 18:25:37,413 [shutdown-thread] INFO  c.d.s.s.LocalSchedulerService - Cancelling task metadata-refresh-__jobResultsStore
2019-10-30 18:25:37,413 [shutdown-thread] INFO  c.d.s.s.LocalSchedulerService - Cancelling task metadata-refresh-histvlr
2019-10-30 18:25:37,415 [shutdown-thread] INFO  c.d.s.s.LocalSchedulerService - Cancelling task metadata-refresh-parquet_jcb_1
2019-10-30 18:25:37,415 [shutdown-thread] INFO  c.d.s.s.LocalSchedulerService - Cancelling task metadata-refresh-__logs
2019-10-30 18:25:37,415 [shutdown-thread] INFO  c.d.s.s.LocalSchedulerService - Cancelling task metadata-refresh-__support
2019-10-30 18:25:37,416 [shutdown-thread] INFO  c.d.s.s.LocalSchedulerService - Cancelling task metadata-refresh-__datasetDownload
2019-10-30 18:25:37,416 [shutdown-thread] INFO  c.d.s.s.LocalSchedulerService - Cancelling task metadata-refresh-sys
2019-10-30 18:25:37,416 [shutdown-thread] INFO  c.d.s.s.LocalSchedulerService - Cancelling task metadata-refresh-$scratch
2019-10-30 18:25:37,416 [shutdown-thread] INFO  c.d.s.s.LocalSchedulerService - Cancelling task metadata-refresh-parquet_usa_1
2019-10-30 18:25:37,416 [shutdown-thread] INFO  c.d.s.s.LocalSchedulerService - Cancelling task metadata-refresh-__home
2019-10-30 18:25:37,537 [shutdown-thread] INFO  c.d.s.s.LocalSchedulerService - Cancelling task metadata-refresh-Mirror-usa
2019-10-30 18:25:37,539 [shutdown-thread] INFO  c.d.s.s.LocalSchedulerService - Cancelling task metadata-refresh-__accelerator
2019-10-30 18:25:37,539 [shutdown-thread] INFO  c.d.s.s.LocalSchedulerService - Cancelling task metadata-refresh-parquet_bar_1
2019-10-30 18:25:37,539 [shutdown-thread] INFO  c.d.s.s.LocalSchedulerService - Cancelling task metadata-refresh-parquet_bar_2
2019-10-30 18:25:37,539 [shutdown-thread] INFO  c.d.s.s.LocalSchedulerService - Cancelling task catalog-source-synchronization
2019-10-30 18:25:37,540 [shutdown-thread] INFO  c.d.s.s.LocalSchedulerService - Cancelling task com.dremio.service.namespace.SplitOrphansCleanerService$$Lambda$133/1978096077@704e7cdb
2019-10-30 18:25:37,540 [shutdown-thread] INFO  c.d.s.s.LocalSchedulerService - Stopping SchedulerService
2019-10-30 18:25:38,541 [shutdown-thread] INFO  c.d.s.s.LocalSchedulerService - Stopped SchedulerService
2019-10-30 18:25:38,549 [shutdown-thread] INFO  c.d.sabot.rpc.user.UserRPCServer - [USER]: Server shutdown.
2019-10-30 18:25:38,551 [shutdown-thread] INFO  c.d.datastore.LocalKVStoreProvider - Stopping LocalKVStoreProvider
2019-10-30 18:26:23,430 [shutdown-thread] INFO  c.d.datastore.LocalKVStoreProvider - Stopped LocalKVStoreProvider
2019-10-30 18:26:23,431 [shutdown-thread] INFO  c.d.services.fabric.FabricServer - [FABRIC]: Server shutdown.
2019-10-30 18:26:23,432 [FABRIC-3] INFO  c.d.services.fabric.FabricClient - [FABRIC]: Channel closed null <--> null (fabric client)
2019-10-30 18:26:23,432 [FABRIC-5] INFO  c.d.services.fabric.FabricServer - [FABRIC]: Channel closed /10.225.129.190:45678 <--> /10.225.128.232:60170 (fabric server)
2019-10-30 18:26:23,433 [FABRIC-6] INFO  c.d.services.fabric.FabricServer - [FABRIC]: Channel closed /10.225.129.190:45678 <--> /10.225.130.30:54202 (fabric server)
2019-10-30 18:26:23,435 [shutdown-thread] INFO  c.d.s.c.TaskLeaderStatusListener - Stopping TaskLeaderStatusListener for: MASTER
2019-10-30 18:26:23,435 [shutdown-thread] INFO  c.d.s.c.TaskLeaderStatusListener - Stopped TaskLeaderStatusListener for: MASTER
2019-10-30 18:26:23,439 [shutdown-thread] INFO  c.d.s.coordinator.zk.ZKClusterClient - Stopping ZKClusterClient
2019-10-30 18:26:23,461 [shutdown-thread] INFO  c.d.s.coordinator.zk.ZKClusterClient - Stopped ZKClusterClient
2019-10-30 18:26:23,461 [shutdown-thread] INFO  com.dremio.dac.daemon.ZkServer - Stopping Zookeeper at localhost:2181
2019-10-30 18:26:23,487 [shutdown-thread] INFO  com.dremio.dac.daemon.ZkServer - Stopped Zookeeper at localhost:2181
2019-10-30 18:26:24,975 [main] INFO  com.dremio.common.config.SabotConfig - Configuration and plugin file(s) identified in 63ms.
Base Configuration:

I really don’t know what happened but looks like a race condition error.

I was investigating the code and I saw this:

private void commit() throws IOException {
    try (CommitCloser committer = commitWrapper.open(name)) {
      writer.commit();
      committer.succeeded();
    }
  }

and

try (WarningTimer watch = new WarningTimer("LuceneSearchIndex commit", 5000)) {
          try {
            commit();
          } catch (Throwable e) {
            commitException = e;
            return; // stop commit thread, next call to any other method will throw an exception
          }
        }

So, I did not understand this comment: stop commit thread, next call to any other method will throw an exception.

I have 2 questions:
1 - What can cause this error?
2 - Should we stop the thread and close all resources in this case?

2 Likes

@juliofalbo.ds

Did you run into max open files or disk full situation?

1 Like

Hi @balaji.ramaswamy, thanks for the help!

Are you talking about OS or Dremio config?

@juliofalbo.ds,

I am talking about Operating system open file limit for the Dremio user and if you ran out of OS disk space