After restarting Dremio, reflections failed


#1

I restarted Dremio on Kubernetes, and since then all reflections show an error.


#2

Hi @Vikash_Singh

If you do not have too many reflections, you can disable each one, save, re-enable it, and save again (disable-save-enable-save); the reflection should then turn green. Alternatively, you can wait for the next scheduled refresh.

Thanks
@balaji.ramaswamy


#3

How many reflections is too many? We’re having a similar issue. Is there a way to set the refresh schedule at the datasource level so some would get refreshed at 1AM and others at 6AM?


#4

@summersmd,

From the UI, there’s currently not a way to really schedule a refresh at a specific time. You can “manually” do this using the method described above (disable-save-enable-save) at the time you want the refresh to start. Alternatively you could schedule a REST API call to do the same action. See this post for an example of how you might achieve this:
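A minimal sketch of such a scheduled call, not taken from the linked post, might look like the following. The base URL, the credentials, the reflection ID, the `/apiv2/login` and `/api/v3/reflection/{id}` endpoints, and the `_dremio<token>` header format are all assumptions here and should be checked against the REST API documentation for your Dremio version.

```python
import json
import urllib.request
from typing import Optional

# Assumption: coordinator address; adjust for your deployment.
BASE_URL = "http://localhost:9047"


def with_enabled(reflection: dict, enabled: bool) -> dict:
    """Return a copy of a reflection definition with only `enabled` changed."""
    updated = dict(reflection)
    updated["enabled"] = enabled
    return updated


def call(method: str, path: str, token: str = "", body: Optional[dict] = None) -> dict:
    """Send one JSON request to the REST API and decode the JSON reply."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: token header format used by the Dremio REST API.
        headers["Authorization"] = "_dremio" + token
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def retrigger(reflection_id: str, user: str, password: str) -> None:
    """Disable, then re-enable a reflection (the disable-save-enable-save steps)."""
    token = call("POST", "/apiv2/login", body={"userName": user, "password": password})["token"]
    path = "/api/v3/reflection/" + reflection_id
    current = call("GET", path, token)
    disabled = call("PUT", path, token, with_enabled(current, False))
    # Reuse the server's reply so the second PUT carries the latest version tag.
    call("PUT", path, token, with_enabled(disabled, True))
```

Calling `retrigger(...)` from a cron job at 1AM for one group of reflections and at 6AM for another would approximate the per-source schedule asked about above.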


#5

Ben,

We have a bigger issue. Every day we have to recreate our reflections and the following day they all have a “Something went wrong – The IndexWriter is closed” error. When we try to run a query we get the following error message:

Error while running command to get file permissions : java.io.IOException: Cannot run program “ls”: error=24, Too many open files at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:913) at org.apache.hadoop.util.Shell.run(Shell.java:869) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170) at org.apache.hadoop.util.Shell.execCommand(Shell.java:1264) at org.apache.hadoop.util.Shell.execCommand(Shell.java:1246)
at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1078) at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:697) at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:672)
at com.dremio.exec.store.dfs.PseudoDistributedFileSystem$GetFileStatusTask.reduce(PseudoDistributedFileSystem.java:1169) at com.dremio.exec.store.dfs.PseudoDistributedFileSystem$GetFileStatusTask.reduce(PseudoDistributedFileSystem.java:1114) at com.dremio.exec.store.dfs.PseudoDistributedFileSystem$PDFSDistributedTask.get(PseudoDistributedFileSystem.java:861)
at com.dremio.exec.store.dfs.PseudoDistributedFileSystem.getFileStatus(PseudoDistributedFileSystem.java:702) at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1450) at com.dremio.exec.store.dfs.FileSystemWrapper.isDirectory(FileSystemWrapper.java:971)
at com.dremio.service.jobs.JobResultsStore.loadJobData(JobResultsStore.java:191) at com.dremio.service.jobs.LocalJobsService$InternalJobLoader.load(LocalJobsService.java:1063) at com.dremio.service.jobs.JobDataImpl.range(JobDataImpl.java:52) at com.dremio.service.jobs.JobDataWrapper.range(JobDataWrapper.java:37)
at com.dremio.dac.model.job.JobDataWrapper.range(JobDataWrapper.java:37) at com.dremio.dac.resource.JobResource.getDataForVersion(JobResource.java:161) at sun.reflect.GeneratedMethodAccessor535.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161) at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:205)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99) at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389) at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102) at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326) at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
at org.glassfish.jersey.internal.Errors.process(Errors.java:315) at org.glassfish.jersey.internal.Errors.process(Errors.java:297) at org.glassfish.jersey.internal.Errors.process(Errors.java:267) at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305) at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154) at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473) at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669) at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:83) at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:365) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.eclipse.jetty.server.handler.RequestLogHandler.handle(RequestLogHandler.java:95) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:258) at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=24, Too many open files at java.lang.UNIXProcess.forkAndExec(Native Method) at java.lang.UNIXProcess.<init>(UNIXProcess.java:247) at java.lang.ProcessImpl.start(ProcessImpl.java:134) at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) … 65 more

Is there something that Dremio does daily that would be causing this? Is it a Linux permissions error of some kind? I’ve already set the system NOFILES to 63556.

Also, community.dremio.com seems to be down this morning.

Matt


#6

Ben,

When I run the Linux lsof command it shows that dremio has 486299 open files, which exceeds the limit set in the OS.

Is this expected behavior?

Our server.out contains:

Wed Feb 6 13:52:49 EST 2019 Starting dremio on XXX.XXX.com
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 125002
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 125002
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

And server.log contains:

Caused by: java.nio.file.FileSystemException: /app/dremio/data/db/search/materialization_store/core/_6j4_Lucene54_0.dvd: Too many open files

at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91) ~[na:1.8.0_191]

at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[na:1.8.0_191]

at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[na:1.8.0_191]

at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214) ~[na:1.8.0_191]

at java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434) ~[na:1.8.0_191]

at java.nio.file.Files.newOutputStream(Files.java:216) ~[na:1.8.0_191]

at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:413) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]

at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:409) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]

at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]

at org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]

at org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]

at org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.<init>(Lucene54DocValuesConsumer.java:73) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]

at org.apache.lucene.codecs.lucene54.Lucene54DocValuesFormat.fieldsConsumer(Lucene54DocValuesFormat.java:108) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]

at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.getInstance(PerFieldDocValuesFormat.java:213) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]

at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addNumericField(PerFieldDocValuesFormat.java:111) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]

at org.apache.lucene.index.NumericDocValuesWriter.flush(NumericDocValuesWriter.java:96) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]

at org.apache.lucene.index.DefaultIndexingChain.writeDocValues(DefaultIndexingChain.java:258) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]

at org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:142) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]

at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:444) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]

at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:539) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]

at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:653) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]

at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:445) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]

at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:291) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]

at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:266) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]

at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:256) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]

at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:140) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]

at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:156) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]

at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]

at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]

at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]

at com.dremio.datastore.indexed.LuceneSearchIndex.checkIfChanged(LuceneSearchIndex.java:265) ~[dremio-services-datastore-3.1.0-201901172111160703-dc6f6e5.jar:3.1.0-201901172111160703-dc6f6e5]

at com.dremio.datastore.indexed.LuceneSearchIndex.search(LuceneSearchIndex.java:387) ~[dremio-services-datastore-3.1.0-201901172111160703-dc6f6e5.jar:3.1.0-201901172111160703-dc6f6e5]

at com.dremio.datastore.indexed.CoreSearchIterable$SearchIterator.hasNext(CoreSearchIterable.java:126) ~[dremio-services-datastore-3.1.0-201901172111160703-dc6f6e5.jar:3.1.0-201901172111160703-dc6f6e5]

at com.google.common.collect.TransformedIterator.hasNext(TransformedIterator.java:42) ~[guava-20.0.jar:na]

at com.google.common.collect.Iterators.getNext(Iterators.java:818) ~[guava-20.0.jar:na]

at com.google.common.collect.Iterables.getFirst(Iterables.java:753) ~[guava-20.0.jar:na]

at com.dremio.service.reflection.store.MaterializationStore.getMostRecentMaterialization(MaterializationStore.java:221) ~[dremio-services-accelerator-3.1.0-201901172111160703-dc6f6e5.jar:3.1.0-201901172111160703-dc6f6e5]

at com.dremio.service.reflection.store.MaterializationStore.getRefreshesExclusivelyOwnedBy(MaterializationStore.java:259) ~[dremio-services-accelerator-3.1.0-201901172111160703-dc6f6e5.jar:3.1.0-201901172111160703-dc6f6e5]

at com.dremio.service.reflection.ReflectionManager.deleteMaterialization(ReflectionManager.java:469) ~[dremio-services-accelerator-3.1.0-201901172111160703-dc6f6e5.jar:3.1.0-201901172111160703-dc6f6e5]

at com.dremio.service.reflection.ReflectionManager.deleteDeprecatedMaterializations(ReflectionManager.java:245) ~[dremio-services-accelerator-3.1.0-201901172111160703-dc6f6e5.jar:3.1.0-201901172111160703-dc6f6e5]

at com.dremio.service.reflection.ReflectionManager.run(ReflectionManager.java:178) ~[dremio-services-accelerator-3.1.0-201901172111160703-dc6f6e5.jar:3.1.0-201901172111160703-dc6f6e5]

at com.dremio.common.WakeupHandler$1.run(WakeupHandler.java:63) [dremio-common-3.1.0-201901172111160703-dc6f6e5.jar:3.1.0-201901172111160703-dc6f6e5]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_191]

at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_191]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_191]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_191]

… 1 common frames omitted

Our dremio.conf file is:

paths: {
  # the local path for dremio to store data.
  local: "/app/dremio/data"

  # the distributed path Dremio data including job results, downloads, uploads, etc
  #dist: "pdfs://"${paths.local}"/pdfs"
}

services: {
  coordinator.enabled: true,
  coordinator.master.enabled: true,
  executor.enabled: true
}


#7

@summersmd

Your open files limit is still at the default (1024 in your server.out). Please increase it. As I explained in the other thread, check whether you are running into the Linux bug and confirm that the limit is actually being increased. It needs to be at least 65536.

Thanks
@balaji.ramaswamy


#8

Thanks Balaji,

I have updated the /etc/systemd/system/Dremio.service file to include:

LimitNOFILE=65536

I stopped the service, ran the systemctl daemon-reload command, and restarted the service.
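To confirm that the new limit actually reached the running process (rather than only the unit file or a login shell), one option is to read the limits the kernel reports for that process. The sketch below assumes a Linux host with `/proc` mounted; the helper names are illustrative and not part of Dremio.

```python
from pathlib import Path
from typing import Tuple


def max_open_files(limits_text: str) -> Tuple[str, str]:
    """Extract the (soft, hard) 'Max open files' values from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # Columns after splitting: "Max" "open" "files" <soft> <hard> "files"
            return fields[3], fields[4]
    raise ValueError("no 'Max open files' row found")


def limits_for_pid(pid: int) -> Tuple[str, str]:
    """Read the open-file limits enforced for a running process (Linux only)."""
    return max_open_files(Path(f"/proc/{pid}/limits").read_text())
```

Calling `limits_for_pid(pid)` with the PID of the Dremio Java process should return `('65536', '65536')` if the `LimitNOFILE` setting took effect; a value of `1024` would mean the service is still starting with the old limit.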

I then had to rebuild all of our reflections. I’ll check tomorrow to see if the issue returns.

I appreciate your attention and time.

Thanks,

@summersmd


#9

Balaji,

After applying the fix you suggested, we had three days with our Dremio server having no issues with “too many open files”. Now, this afternoon the problem has returned. I have verified that the limit set is 65536. Is there yet another place we need to configure this setting?

Thanks,

Matt


#10

Hi @summersmd

Maybe the number of open files is exceeding 65536. While the Dremio query is running, you can monitor the open files using lsof and see whether the count creeps up to 65536. The minute the query fails, they will all be released.
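As an alternative to running lsof by hand, the descriptor count can be sampled periodically. This is a rough sketch assuming a Linux host where the script runs as the dremio user or root (needed to read another process's `/proc/<pid>/fd`); the 80% warning threshold and the function names are arbitrary choices for illustration.

```python
import os
import time


def count_open_fds(pid: int) -> int:
    """Count file descriptors currently held by a process (Linux /proc only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def watch(pid: int, limit: int = 65536, interval_s: float = 60.0, samples: int = 10) -> list:
    """Sample the descriptor count a fixed number of times and flag high usage."""
    history = []
    for i in range(samples):
        n = count_open_fds(pid)
        history.append(n)
        # Arbitrary threshold: warn once usage passes 80% of the configured limit.
        flag = "  <-- above 80% of limit" if n > 0.8 * limit else ""
        print(f"{time.strftime('%H:%M:%S')} pid={pid} open_fds={n}{flag}")
        if i < samples - 1:
            time.sleep(interval_s)
    return history
```

Running `watch(pid)` during a reflection refresh and comparing the printed counts against the limit shows whether usage climbs steadily (suggesting a leak) or spikes only while jobs run.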

Thanks
@balaji.ramaswamy


#11

Balaji,

Why are there so many open files? This is a little concerning…


#12

Balaji,

The command lsof | grep dremio | wc -l, which should show each file open by Dremio, returns a count of 5748288. There is something odd going on here and I’m not sure how to prevent it.
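One caveat on that number: lsof without arguments lists more than file descriptors. It also prints rows for memory-mapped files, the current directory and the program text, and on many versions it repeats rows per thread, so `lsof | grep dremio | wc -l` can be far larger than the count that is checked against the open files limit. A sketch for inspecting the real descriptors and seeing where they point, assuming Linux `/proc` and sufficient permissions (the function names are illustrative):

```python
import os
from collections import Counter


def fd_targets(pid: int) -> list:
    """Resolve each descriptor of a process to what it points at (Linux /proc only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets


def summarize(targets: list, depth: int = 5) -> Counter:
    """Group descriptor targets by leading path components; non-paths by kind."""
    groups = Counter()
    for t in targets:
        if t.startswith("/"):
            groups["/".join(t.split("/")[: depth + 1])] += 1
        else:
            # e.g. "socket:[12345]", "pipe:[678]", "anon_inode:[eventpoll]"
            groups[t.split(":", 1)[0]] += 1
    return groups
```

`summarize(fd_targets(pid)).most_common(10)` then shows which directories (for example the search index under the local data path) or sockets account for most of the open descriptors.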


#13

Hi @summersmd

Is this on an executor? I thought your ulimit for the dremio user was 65536. Did you increase it?

Can you please do “su - dremio” and then “ulimit -a”?

How many reflections are refreshed in parallel?

Can we start with one and see if it works, and then slowly increase the parallelism?

Thanks
@balaji.ramaswamy


#14

Balaji,

This is on a single node configuration. I did set the limit for the dremio user to 65536. However, this morning, when the dremio server stopped responding and the reflections had to be rebuilt, the number of files that dremio had open appeared to be over 5 million?

-bash-4.2$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 125002
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I have no idea how to determine how many reflections are refreshed in parallel or how to start with one and slowly increase them.

Can you provide some guidance?

Thanks,

Matt


#15

Balaji,

We’re getting the “too many open files” error again today. Our reflections have been corrupted again. Is there an upcoming patch to fix this? Would this still be a problem if we ran dremio in a docker container? I really want to keep moving forward with seeing if dremio can meet our needs but am beginning to get concerned.

Thanks,

Matt


#16

Hi @summersmd

We need to know which different issues you are hitting. Open the UI, check all job types (internal/accelerator/WEB UI/external/downloads), and filter for failed jobs only.

What are the different errors you see? Are they all max open files errors? Can we go through the different errors, and can you send me a profile for each one of them?

Thanks
@balaji.ramaswamy


#17

Balaji,

We’re running a single node instance of dremio.

This morning I went into dremio and opened some reflections only to find that all of them showed the following (see attached image IndexWriter.PNG):

Something went wrong

The IndexWriter is closed

I’ve also attached queuecontrol.PNG, which shows our Queue Control configuration.

Today’s server.log and server.out files are also attached. The log file shows “too many open files” errors (see attachment).

(Attachment server.log is missing)

(Attachment server.out is missing)


#20

@summersmd

You have attached 3 profiles.

653b050a-be56-42d6-8efa-0d894b59559c: Table ‘json’ not found is a known issue and we are working on it.

The other 2 profiles show “too many open files” errors, but these are coming from RocksDB. Let me check on that and get back to you.

Thanks
@balaji.ramaswamy


#21

Balaji,

Here are the profiles for 2 more issues we’re having. One of them is telling me that using the Raw Reflections would be too expensive (not sure I understand why).

The other is throwing a “The JDBC storage plugin failed while trying setup the SQL query.” error; this query runs but will not save or preview.

Thanks,

Matt