If you do not have too many reflections, you should be able to just disable-save-enable-save and the reflection should turn green. Or you can wait for the next scheduled refresh.
Thanks
@balaji.ramaswamy
How many reflections is too many? We’re having a similar issue. Is there a way to set the refresh schedule at the datasource level so some would get refreshed at 1AM and others at 6AM?
From the UI, there’s currently no way to schedule a refresh at a specific time. You can “manually” trigger one using the method described above (disable-save-enable-save) at the time you want the refresh to start. Alternatively, you could schedule a REST API call to do the same action. See this post for an example of how you might achieve this:
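For reference, a hedged sketch of scripting that disable/enable cycle against the REST API. The login endpoint, the /api/v3/reflection/{id} path, and the "_dremio" authorization prefix are recalled from Dremio's v3 API docs and may differ by version; the coordinator URL, credentials, and reflection id below are placeholders you must supply.

```shell
# Sketch only -- endpoint paths and the "_dremio" auth prefix are assumptions;
# verify them against your Dremio version's API documentation.
DREMIO="http://localhost:9047"     # placeholder coordinator URL
RID="your-reflection-id"           # placeholder reflection id

# Log in and extract the session token from the JSON response.
TOKEN=$(curl -s -X POST "$DREMIO/apiv2/login" \
          -H 'Content-Type: application/json' \
          -d '{"userName":"admin","password":"secret"}' \
        | python3 -c 'import json,sys; print(json.load(sys.stdin)["token"])')

# Fetch the reflection, set "enabled" to the given value, PUT it back.
toggle_reflection() {
  curl -s -H "Authorization: _dremio$TOKEN" "$DREMIO/api/v3/reflection/$RID" \
  | python3 -c "import json,sys; r=json.load(sys.stdin); r['enabled']=$1; print(json.dumps(r))" \
  | curl -s -X PUT "$DREMIO/api/v3/reflection/$RID" \
         -H "Authorization: _dremio$TOKEN" \
         -H 'Content-Type: application/json' -d @-
}

toggle_reflection False   # disable + save
toggle_reflection True    # enable + save, which kicks off a refresh
```

Scheduled from cron at 1 AM for one set of reflection ids and 6 AM for another, this would approximate per-source refresh schedules.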
Ben,
We have a bigger issue. Every day we have to recreate our reflections and the following day they all have a “Something went wrong – The IndexWriter is closed” error. When we try to run a query we get the following error message:
Error while running command to get file permissions : java.io.IOException: Cannot run program "ls": error=24, Too many open files
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:913)
at org.apache.hadoop.util.Shell.run(Shell.java:869)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:1264)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:1246)
at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1078)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:697)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:672)
at com.dremio.exec.store.dfs.PseudoDistributedFileSystem$GetFileStatusTask.reduce(PseudoDistributedFileSystem.java:1169)
at com.dremio.exec.store.dfs.PseudoDistributedFileSystem$GetFileStatusTask.reduce(PseudoDistributedFileSystem.java:1114)
at com.dremio.exec.store.dfs.PseudoDistributedFileSystem$PDFSDistributedTask.get(PseudoDistributedFileSystem.java:861)
at com.dremio.exec.store.dfs.PseudoDistributedFileSystem.getFileStatus(PseudoDistributedFileSystem.java:702)
at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1450)
at com.dremio.exec.store.dfs.FileSystemWrapper.isDirectory(FileSystemWrapper.java:971)
at com.dremio.service.jobs.JobResultsStore.loadJobData(JobResultsStore.java:191)
at com.dremio.service.jobs.LocalJobsService$InternalJobLoader.load(LocalJobsService.java:1063)
at com.dremio.service.jobs.JobDataImpl.range(JobDataImpl.java:52)
at com.dremio.service.jobs.JobDataWrapper.range(JobDataWrapper.java:37)
at com.dremio.dac.model.job.JobDataWrapper.range(JobDataWrapper.java:37)
at com.dremio.dac.resource.JobResource.getDataForVersion(JobResource.java:161)
at sun.reflect.GeneratedMethodAccessor535.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:205)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305)
at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473)
at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)
at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:83)
at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:365)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.handler.RequestLogHandler.handle(RequestLogHandler.java:95)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:258)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=24, Too many open files
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 65 more
Is there something that Dremio does daily that would be causing this? Is it a Linux permissions error of some kind? I’ve already set the system NOFILES to 63556.
Also, community.dremio.com seems to be down this morning.
Matt
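One thing worth checking before raising limits further: on most distributions a service managed by systemd does not read /etc/security/limits.conf, so a raised NOFILES there may never reach the Dremio process. A quick way to read the limit the running process actually has is to look at /proc directly (the pgrep pattern below is an assumption about the process name; adjust if yours differs):

```shell
# Read the effective open-files limit of the running Dremio process from /proc.
# Falls back to the current shell ("self") if no dremio process is found.
DREMIO_PID=$(pgrep -f dremio | head -n 1)
grep 'Max open files' "/proc/${DREMIO_PID:-self}/limits"
```

If this still prints 1024 after editing limits.conf, the limit has to be set on the service itself (e.g. LimitNOFILE in the systemd unit).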
When I run the Linux lsof command it shows that dremio has 486299 open files, which exceeds the limit set in the OS.
Is this expected behavior?
Our server.out contains:
Wed Feb 6 13:52:49 EST 2019 Starting dremio on XXX.XXX.com
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 125002
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 125002
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
And server.log contains:
Caused by: java.nio.file.FileSystemException: /app/dremio/data/db/search/materialization_store/core/_6j4_Lucene54_0.dvd: Too many open files
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91) ~[na:1.8.0_191]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[na:1.8.0_191]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[na:1.8.0_191]
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214) ~[na:1.8.0_191]
at java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434) ~[na:1.8.0_191]
at java.nio.file.Files.newOutputStream(Files.java:216) ~[na:1.8.0_191]
at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:413) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:409) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.<init>(Lucene54DocValuesConsumer.java:73) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.codecs.lucene54.Lucene54DocValuesFormat.fieldsConsumer(Lucene54DocValuesFormat.java:108) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.getInstance(PerFieldDocValuesFormat.java:213) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addNumericField(PerFieldDocValuesFormat.java:111) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.index.NumericDocValuesWriter.flush(NumericDocValuesWriter.java:96) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.index.DefaultIndexingChain.writeDocValues(DefaultIndexingChain.java:258) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:142) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:444) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:539) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:653) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:445) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:291) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:266) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:256) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:140) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:156) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at com.dremio.datastore.indexed.LuceneSearchIndex.checkIfChanged(LuceneSearchIndex.java:265) ~[dremio-services-datastore-3.1.0-201901172111160703-dc6f6e5.jar:3.1.0-201901172111160703-dc6f6e5]
at com.dremio.datastore.indexed.LuceneSearchIndex.search(LuceneSearchIndex.java:387) ~[dremio-services-datastore-3.1.0-201901172111160703-dc6f6e5.jar:3.1.0-201901172111160703-dc6f6e5]
at com.dremio.datastore.indexed.CoreSearchIterable$SearchIterator.hasNext(CoreSearchIterable.java:126) ~[dremio-services-datastore-3.1.0-201901172111160703-dc6f6e5.jar:3.1.0-201901172111160703-dc6f6e5]
at com.google.common.collect.TransformedIterator.hasNext(TransformedIterator.java:42) ~[guava-20.0.jar:na]
at com.google.common.collect.Iterators.getNext(Iterators.java:818) ~[guava-20.0.jar:na]
at com.google.common.collect.Iterables.getFirst(Iterables.java:753) ~[guava-20.0.jar:na]
at com.dremio.service.reflection.store.MaterializationStore.getMostRecentMaterialization(MaterializationStore.java:221) ~[dremio-services-accelerator-3.1.0-201901172111160703-dc6f6e5.jar:3.1.0-201901172111160703-dc6f6e5]
at com.dremio.service.reflection.store.MaterializationStore.getRefreshesExclusivelyOwnedBy(MaterializationStore.java:259) ~[dremio-services-accelerator-3.1.0-201901172111160703-dc6f6e5.jar:3.1.0-201901172111160703-dc6f6e5]
at com.dremio.service.reflection.ReflectionManager.deleteMaterialization(ReflectionManager.java:469) ~[dremio-services-accelerator-3.1.0-201901172111160703-dc6f6e5.jar:3.1.0-201901172111160703-dc6f6e5]
at com.dremio.service.reflection.ReflectionManager.deleteDeprecatedMaterializations(ReflectionManager.java:245) ~[dremio-services-accelerator-3.1.0-201901172111160703-dc6f6e5.jar:3.1.0-201901172111160703-dc6f6e5]
at com.dremio.service.reflection.ReflectionManager.run(ReflectionManager.java:178) ~[dremio-services-accelerator-3.1.0-201901172111160703-dc6f6e5.jar:3.1.0-201901172111160703-dc6f6e5]
at com.dremio.common.WakeupHandler$1.run(WakeupHandler.java:63) [dremio-common-3.1.0-201901172111160703-dc6f6e5.jar:3.1.0-201901172111160703-dc6f6e5]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_191]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_191]
… 1 common frames omitted
Our dremio.conf file is:
paths: {
  local: "/app/dremio/data"
  #dist: "pdfs://"${paths.local}"/pdfs"
}
services: {
  coordinator.enabled: true,
  coordinator.master.enabled: true,
  executor.enabled: true
}
Your open files limit is at the default. Kindly increase it. As I explained in the other thread, make sure you are not running into the Linux bug and that the open files limit is actually being increased. It needs to be at least 65536.
Thanks
@balaji.ramaswamy
Thanks Balaji,
I have updated the /etc/systemd/system/Dremio.service file to include:
LimitNOFILE=65536
I stopped the service, ran the systemctl daemon-reload command, and restarted the service.
I then had to rebuild all of our reflections. I’ll check tomorrow to see if the issue returns.
I appreciate your attention and time.
Thanks,
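The steps above can also be done with a systemd drop-in instead of editing the unit file in place, so a package upgrade cannot overwrite the change. This is a sketch under assumptions: the unit name "Dremio.service" is taken from the path mentioned above, and the drop-in file name is arbitrary.

```shell
# Create a drop-in override rather than editing Dremio.service directly.
sudo mkdir -p /etc/systemd/system/Dremio.service.d
sudo tee /etc/systemd/system/Dremio.service.d/limits.conf >/dev/null <<'EOF'
[Service]
LimitNOFILE=65536
EOF

# Reload unit files and restart the service so the limit takes effect.
sudo systemctl daemon-reload
sudo systemctl restart Dremio

# Verify the limit systemd will actually hand the process:
systemctl show Dremio -p LimitNOFILE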
Balaji,
After applying the fix you suggested, our Dremio server went three days without any “too many open files” issues. Then, this afternoon, the problem returned. I have verified that the limit set is 65536. Is there yet another place we need to configure this setting?
Thanks,
Matt
Hi @summersmd
Maybe the number of open files exceeds 65536. While the Dremio query is running, you can monitor the open files using lsof and see if the count creeps up to 65536. The moment it fails, they will all be released.
Thanks
@balaji.ramaswamy
Balaji,
Why are there so many open files? This is a little concerning…
Balaji,
The command lsof | grep dremio | wc -l, which should count each file open by Dremio, returns 5748288. There is something odd going on here and I’m not sure how to prevent it.
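A caveat on counting this way: lsof output includes memory-mapped files, libraries, and duplicate rows per task, and grep matches any line mentioning "dremio" (paths, user names) across all processes, so piping it to wc -l can inflate the figure enormously on a multithreaded JVM. The number the NOFILE limit actually applies to is the per-process file descriptor count, which can be read from /proc (the pgrep pattern is an assumption about the process name):

```shell
# "lsof | grep X | wc -l" overcounts badly: it includes mmap'd files,
# shared libraries, and duplicate per-task rows, not just descriptors.
# Count the process's fd entries directly for the number that the
# open-files ulimit is compared against.
DREMIO_PID=$(pgrep -f dremio | head -n 1)
ls "/proc/${DREMIO_PID:-self}/fd" | wc -l
```

Watching this value during a reflection refresh (e.g. with watch) would show whether Dremio is genuinely approaching the 65536 limit.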
Hi @summersmd
Is this on an executor? I thought the ulimit for the dremio user is 65536. Did you increase it?
Can you please do “su - dremio” and do a “ulimit -a”
How many reflections are refreshed in parallel?
Can we start with one, see if it works, and then slowly increase parallelism?
Thanks
@balaji.ramaswamy
Balaji,
This is on a single-node configuration. I did set the limit for the dremio user to 65536. However, this morning when the Dremio server stopped responding and the reflections had to be rebuilt, the number of files that dremio had open appeared to be over 5 million.
-bash-4.2$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 125002
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65536
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
I have no idea how to determine how many reflections are refreshed in parallel or how to start with one and slowly increase them.
Can you provide some guidance?
Thanks,
Matt
Sent: Tuesday, February 12, 2019 3:04 PM
Balaji,
We’re getting the “too many open files” error again today, and our reflections have been corrupted again. Is there an upcoming patch to fix this? Would this still be a problem if we ran Dremio in a Docker container? I really want to keep moving forward with seeing if Dremio can meet our needs, but I am beginning to get concerned.
Thanks,
Matt
Hi @summersmd
We need to know which different issues you are hitting. Open the UI, check all job types (internal/accelerator/web UI/external/downloads), and filter to only failed jobs.
What are the different errors you see? Are they all max open files? Can we go through the different errors? Please send me a profile for each one of them.
Thanks
@balaji.ramaswamy
Balaji,
We’re running a single node instance of dremio.
This morning I went into Dremio and opened some reflections, only to find that all of them showed the following (see attached image IndexWriter.PNG):
Something went wrong
The IndexWriter is closed
I’ve also attached queuecontrol.PNG which shows our queue Control configuration.
Today’s server.log and server.out files are also attached. The log file shows “too many open files” errors; see attachment.
(Attachment server.log is missing)
(Attachment server.out is missing)
balaji.ramaswamy
February 13
(Attachment server_log.txt is missing)
(Attachment server_out.txt is missing)
You have attached 3 profiles.
653b050a-be56-42d6-8efa-0d894b59559c: “Table ‘json’ not found” is a known issue and we are working on it.
The other 2 profiles also show too many open files, but coming from RocksDB. Let me check on that and get back to you.
Thanks
@balaji.ramaswamy
Balaji,
Here are the profiles for 2 more issues we’re having. One of them is telling me that using the Raw Reflections would be too expensive (not sure I understand why).
The other is throwing a “The JDBC storage plugin failed while trying setup the SQL query.” error – the query runs but will not save or preview.
Thanks,
Matt
Hi @summersmd
The JDBC error is a known issue and we will be addressing it. The reflection matching but not getting picked up is something we need to investigate. Will have to get back to you on that
Thanks
@balaji.ramaswamy