All Jobs remain in status Running forever

Hello,
I’ve installed Dremio on a RHEL 9.2 Server.
Dremio is running and the interface can be reached. Unfortunately, every job I start remains in status Running permanently, without ever succeeding or failing.

Furthermore, jobs cannot be cancelled, and the only way to stop pending jobs is to restart the service.

The Dremio version is 25.0.0, installed from the RPM package.

This is the /opt/dremio/conf/dremio.conf :

The only change made in the /opt/dremio/conf/dremio-env is this line:

DREMIO_JAVA_EXTRA_OPTS="-Djava.io.tmpdir=/opt/tmp"

(/opt/tmp is owned by the dremio user, which can create files there)

These are the pending jobs in the Dremio dashboard:

This is the content of /var/log/server.log

I’ve tried to research the error but I was not able to find any information regarding it.

I do apologize for attaching screenshots and not text logs but the Linux server is not connected to the internet, and I currently have no means of copying files outside of it.

Thanks for your support,

The above log screenshot is from the coordinator section. We see you are running the coordinator and executor on a single node. Let's make sure the basics are right:

  • Can you make sure /var/lib/dremio has enough space?
  • I see you have everything in local and dist: is commented out; this is no longer supported. Can you point dist:/// to your distributed storage? (This is unrelated to the issue, but you will hit it next.)
  • Can you check if server.out has anything suspicious?
  • How many cores does your node have?
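
The basic checks in the list above can be run from a shell. This is only a minimal sketch, not an official Dremio procedure: the log directory /var/log/dremio is an assumption (the thread does not confirm where server.out lives), and the function names are hypothetical helpers made up for illustration.

```shell
#!/usr/bin/env bash
# Assumed locations; override via environment variables if your install differs.
DREMIO_DATA_DIR="${DREMIO_DATA_DIR:-/var/lib/dremio}"
DREMIO_LOG_DIR="${DREMIO_LOG_DIR:-/var/log/dremio}"   # assumption, not confirmed in the thread

# Show free space on the filesystem that holds the given directory.
check_space() {
  df -h "$1"
}

# Print the number of CPU cores available to this host.
check_cores() {
  nproc
}

# Print the last lines of server.out from the given log directory, if readable.
check_server_out() {
  local f="$1/server.out"
  if [ -r "$f" ]; then
    tail -n 50 "$f"
  else
    echo "not readable: $f"
  fi
}

if [ -d "$DREMIO_DATA_DIR" ]; then
  check_space "$DREMIO_DATA_DIR"
else
  echo "missing: $DREMIO_DATA_DIR"
fi
check_cores
check_server_out "$DREMIO_LOG_DIR"
```

Running it as the dremio user also confirms that this user can actually read the data and log directories.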

I think this may be JVM specific. Can you try setting this VM option?

-Dio.netty.tryReflectionSetAccessible=true

  • /var/lib/dremio has 20 GB total space (19 GB free)
  • (Thanks for the feedback! I will change this one as well, pointing it to a locally mounted NAS.)
  • The node has 8 cores and 32 GB RAM
  • The server.out is attached below


Unfortunately, adding this flag seems to have no effect; the error is still the same.

I’ve enabled debug logs; the only additional information I see is:

Another update, hoping that it might be useful information.

I’ve set up an identical system on a VirtualBox VM: the exact same Dremio version, identical setup, same OS and roughly the same specs.

Jobs in Dremio on the VM seem to work, so the cause may be external to the setup itself, but I’m still looking for clues as to why it’s not working on the main server.

I’ve tried enabling all debug logs and running SELECT 1 on both systems.

I’ve been able to spot a few differences (besides the fact that in the VM the job is successful):

  1. The error "Unable to get query profile" does not appear in the VM’s logs.
  2. I connected to ZooKeeper with zkCli.sh and saw that on the main server jobs appear to queue up in /dremio/semaphore/query.small/leases, whereas on the VM the queue is empty again after the job runs successfully.
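
For anyone wanting to repeat the ZooKeeper check from the second point, a minimal sketch follows. It assumes that zkCli.sh is on the PATH and that ZooKeeper listens on localhost:2181; neither is stated in the thread, so adjust both. The function name is a hypothetical helper; only the znode path comes from the post above.

```shell
#!/usr/bin/env bash
# Assumed ZooKeeper address; override with ZK_SERVER if yours differs.
ZK_SERVER="${ZK_SERVER:-localhost:2181}"
# Znode path reported in the thread for the small-query queue.
LEASE_PATH="/dremio/semaphore/query.small/leases"

# List the lease entries currently held under the semaphore path.
list_leases() {
  zkCli.sh -server "$ZK_SERVER" ls "$LEASE_PATH"
}

if command -v zkCli.sh >/dev/null 2>&1; then
  list_leases
else
  echo "zkCli.sh not found on PATH"
fi
```

Running this before and after a SELECT 1 shows whether lease entries are released once the job finishes or keep accumulating.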

I’m attaching the log of this last finding, hoping that it can provide some more context

Thanks again for your support

I’ve made yet another attempt: I updated to version 25.0.5 with the same configuration.

Unfortunately, I still face the same issue, but I now see some additional error logs at the startup of the Dremio service that I had not seen before:

An error related to PDFS not being supported

An error related to master coordinator being down

And several thread errors

I hope this can be helpful to somehow identify a possible cause or to understand better what attempts I can make to narrow down the issue.

@balaji.ramaswamy @Benny_Chow

Thanks,
Riccardo

@riccardo_ditosto_gel Yes, this is an error that I saw in another community post. There was a similar issue fixed in 25.0 and I see you are on 25.0.5

PDFS is no longer supported, so it could be that. On your VirtualBox VM, do you have dist:/// defined in your dremio.conf, like below?

dist: "hdfs://localhost:8020/iceberg_meta"

In my vbox I have local configured, not dist

I see in the vbox logs that I have “PDFS is no longer supported” there as well.
I also have “source sys failed unexpectedly” and “master coordinator is down” logs, the only difference is that I do not have the thread errors.

@riccardo_ditosto_gel Are you able to send your dremio.conf and dremio-env?

Make sure to remove any passwords, such as the trust store password.

dremio.conf

#
# Copyright (C) 2017-2019 Dremio Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

paths: {
  # the local path for dremio to store data.
  #local: "/var/lib/dremio"

  # the distributed path Dremio data including job results, downloads, uploads, etc
  dist: "file:///mnt/external_nas/data"
}

services: {
  coordinator.enabled: true,
  coordinator.master.enabled: true,
  executor.enabled: true,
  flight.use_session_service: true,
  coordinator.web.ssl.enabled: true
}

dremio-env

#
# Copyright (C) 2017-2019 Dremio Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

#
# Dremio environment variables used by Dremio daemon
#

#
# Directory where Dremio logs are written
# Default to $DREMIO_HOME/log
#
#DREMIO_LOG_DIR=${DREMIO_HOME}/log

#
# Send logs to console and not to log files. The DREMIO_LOG_DIR is ignored if set.
#
#DREMIO_LOG_TO_CONSOLE=1

#
# Directory where Dremio pidfiles are written
# Default to $DREMIO_HOME/run
#
#DREMIO_PID_DIR=${DREMIO_HOME}/run

#
# Max total memory size (in MB) for the Dremio process
#
# If not set, default to using max heap and max direct.
#
# If both max heap and max direct are set, this is not used
# If one is set, the other is calculated as difference
# of max memory and the one that is set.
#
#DREMIO_MAX_MEMORY_SIZE_MB=

#
# Max heap memory size (in MB) for the Dremio process
#
# Default to 4096 for server
#
#DREMIO_MAX_HEAP_MEMORY_SIZE_MB=4096

#
# Max direct memory size (in MB) for the Dremio process
#
# Default to 8192 for server
#
#DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=8192

#
# Max permanent generation memory size (in MB) for the Dremio process
# (Only used for Java 7)
#
# Default to 512 for server
#
#DREMIO_MAX_PERMGEN_MEMORY_SIZE_MB=512

#
# Garbage collection logging is enabled by default. Set the following
# parameter to "no" to disable garbage collection logging.
#
#DREMIO_GC_LOGS_ENABLED="yes"

#
# Send GC logs to console and not to log files. The DREMIO_LOG_DIR is ignored if set.
# Default is set to "no"
#
#DREMIO_GC_LOG_TO_CONSOLE="no"

#
# By default G1GC is used as java garbage collection.
# This can be overridden by changing this parameter
#
#DREMIO_GC_OPTS="-XX:+UseG1GC"

#
# Java version will be checked by default.
# Currently only java 8 is supported by dremio.
# This check can be disabled by changing value to false.
#
#DREMIO_JAVA_VERSION_CHECK="true"

#
# The scheduling priority for the server
#
# Default to 0
#
# DREMIO_NICENESS=0
#

#
# Number of seconds after which the server is killed forcibly if it hasn't stopped
#
# Default to 120
#
#DREMIO_STOP_TIMEOUT=120

# Extra Java options - shared between dremio and dremio-admin commands
#
#DREMIO_JAVA_EXTRA_OPTS=

# Extra Java options - client only (dremio-admin command)
#
#DREMIO_JAVA_CLIENT_EXTRA_OPTS=

# Extra Java options - server only (dremio command)
#
#DREMIO_JAVA_SERVER_EXTRA_OPTS=""

DREMIO_JAVA_EXTRA_OPTS="-Djava.io.tmpdir=/opt/dremio/tmp -Dio.netty.tryReflectionSetAccessible=true"

@riccardo_ditosto_gel Your configuration files look good, so let's try one more thing. This will make server.log verbose, but hopefully it helps us find the issue. In logback.xml under the Dremio conf folder you will see the two entries below. Set both to debug, save the file, restart Dremio, run a query, and upload server.log and server.out after the query has been running for a few minutes.

<level value="${dremio.log.root.level:-debug}"/>

<logger name="com.dremio">
  <level value="${dremio.log.level:-debug}"/>
</logger>
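
Before restarting, it can help to confirm that both entries were really changed. The sketch below just greps logback.xml for the two level properties. The file path is an assumption based on the /opt/dremio/conf directory mentioned earlier in the thread, and the function name is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Assumed location of logback.xml; override with LOGBACK_XML if yours differs.
LOGBACK_XML="${LOGBACK_XML:-/opt/dremio/conf/logback.xml}"

# Print every line (with its line number) that sets one of the two
# level properties: dremio.log.root.level or dremio.log.level.
show_log_levels() {
  grep -nE 'dremio\.log\.(root\.)?level' "$1"
}

if [ -r "$LOGBACK_XML" ]; then
  show_log_levels "$LOGBACK_XML"
else
  echo "not readable: $LOGBACK_XML"
fi
```

Both printed lines should end in `:-debug}"/>` after the edit.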