Dremio.pid Where are you?

In my installation, /etc/dremio/dremio-env is simply a soft link to the dremio-env in the install directory, which is the file I am editing.

This is my current scenario. Dremio runs until it runs out of memory and creates a heap file. I stop Dremio, then start it again. It brings up the web services for the UI and API, and even accepts connections from clients. Then it starts attempting the heavy lifting, which I assume is where it tries to read a Dremio PID file from <dremio_install_dir>/run. That directory has no PID file, because the PID file is in the DREMIO_PID_DIR=${DREMIO_HOME}/data/logs folder as defined in the dremio-env config file.

As a work around…

I then create a soft link

ln -s ${DREMIO_HOME}/data/logs/dremio.pid
service dremio start

Then it works again till I have a crash (which I do not have unless I am loading dremio up with a massive query or acceleration)

If I comment out the dremio-env DREMIO_PID_DIR it still does the same thing.

Oh… and the next time it crashes, it removes the soft link at /run/dremio.pid

dremio.pid itself is not being created and/or read until I create the soft link manually

Here is the error on the last crash we had.
ubuntu@ip-XXXX:~$ sudo service dremio status
● dremio.service - Dremio Daemon Server
Loaded: loaded (/etc/systemd/system/dremio.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Tue 2018-03-20 20:23:06 UTC; 22min ago
Process: 27060 ExecStop=/var/lib/dremio/bin/dremio --config /var/lib/dremio/conf stop (code=exited, status=0/SUCCESS)
Process: 4212 ExecStart=/var/lib/dremio/bin/dremio --config /var/lib/dremio/conf start (code=exited, status=0/SUCCESS)
Main PID: 4342

Mar 18 22:07:42 ip-XXXX systemd[1]: Starting Dremio Daemon Server...
Mar 18 22:07:42 ip-XXXX dremio[4212]: starting dremio, logging to /var/lib/dremio/data/logs/server.out
Mar 18 22:07:43 XXXX systemd[1]: dremio.service: PID file /var/lib/dremio/run/dremio.pid not readable (yet?) after start: No such file or directory
Mar 18 22:09:11 ip-XXXX systemd[1]: dremio.service: Supervising process 4342 which is not our child. We'll most likely not notice when it exits.
Mar 18 22:09:11 ip-XXXX systemd[1]: Started Dremio Daemon Server.

Notice the
/var/lib/dremio/run/dremio.pid

The location defined in dremio-env is not the one being checked for an existing running Dremio process.

dremio-env file contents

#
# Dremio environment variables used by Dremio daemon
#

#
# Directory where Dremio logs are written
# Default to $DREMIO_HOME/log
#
DREMIO_LOG_DIR=${DREMIO_HOME}/data/logs

#
# Directory where Dremio pidfiles are written
# Default to $DREMIO_HOME/run
#
DREMIO_PID_DIR=${DREMIO_HOME}/data/logs

#
# Max heap memory size (in MB) for the Dremio process
#
# Default to 4096 for server
#
#DREMIO_MAX_HEAP_MEMORY_SIZE_MB=6144

#
# Max direct memory size (in MB) for the Dremio process
#
# Default to 8192 for server
#
#DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=10240

#
# Max permanent generation memory size (in MB) for the Dremio process
# (Only used for Java 7)
#
# Default to 512 for server
#
#DREMIO_MAX_PERMGEN_MEMORY_SIZE_MB=512

#
# Garbage collection logging is enabled by default. Set the following
# parameter to "no" to disable garbage collection logging.
#
#DREMIO_GC_LOGS_ENABLED="yes"

#
# The scheduling priority for the server
#
# Default to 0
#
# DREMIO_NICENESS=0
#

#
# Number of seconds after which the server is killed forcibly it it hasn't stopped
#
# Default to 120
#
#DREMIO_STOP_TIMEOUT=120

# Extra Java options
#
#DREMIO_JAVA_EXTRA_OPTS=

cat /etc/systemd/system/dremio.service

#
# Dremio unit file for systemd
#
# Installation is assumed to be under /opt/dremio
#
[Unit]
Description=Dremio Daemon Server
After=syslog.target network.target

[Service]
Type=forking
User=dremio
PIDFile=/var/lib/dremio/run/dremio.pid

ExecStart=/var/lib/dremio/bin/dremio --config /var/lib/dremio/conf start
ExecStop=/var/lib/dremio/bin/dremio --config /var/lib/dremio/conf stop
RestartForceExitStatus=3

[Install]
WantedBy=multi-user.target

And there it is, I think! Ha!!!

the startup script does not look at the config?

In your example, there is a mismatch between where you asked Dremio to create the pidfile (${DREMIO_HOME}/data/logs, which I assume expands to /var/lib/dremio/data/logs?) and where systemd looks to verify whether the process is running (/var/lib/dremio/run/).

Both values have to match; otherwise systemd might decide to close the container, killing the running Dremio process at the same time. And no, the startup script is not aware of the systemd configuration.
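
To check for this kind of mismatch, here is a rough sketch in shell. It is not part of Dremio: the function names (pid_dir_from_env, pid_file_from_unit, check_pid_paths) are made up for illustration. It assumes the pidfile is named dremio.pid (as in the systemd log above) and that an unset DREMIO_PID_DIR falls back to $DREMIO_HOME/run, as the comment in dremio-env states.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with Dremio) that compare the pidfile
# location implied by dremio-env with the PIDFile= line of the systemd unit.

# Print the pidfile directory that dremio-env selects.
# $1 = path to dremio-env (include a slash so "." does not search PATH)
# $2 = value to use for DREMIO_HOME
# Assumption: the default is $DREMIO_HOME/run, per the comment in dremio-env.
pid_dir_from_env() {
    (
        # Run in a subshell so sourcing the file cannot leak variables.
        DREMIO_HOME=$2
        DREMIO_PID_DIR=
        . "$1"
        printf '%s\n' "${DREMIO_PID_DIR:-$DREMIO_HOME/run}"
    )
}

# Print the value of the last PIDFile= line in a systemd unit file.
# $1 = path to the unit file
pid_file_from_unit() {
    sed -n 's/^PIDFile=//p' "$1" | tail -n 1
}

# Report whether both locations agree; returns non-zero on a mismatch.
# $1 = dremio-env, $2 = unit file, $3 = DREMIO_HOME
check_pid_paths() {
    expected="$(pid_dir_from_env "$1" "$3")/dremio.pid"
    actual=$(pid_file_from_unit "$2")
    if [ "$expected" = "$actual" ]; then
        echo "OK: both point at $actual"
    else
        echo "MISMATCH: dremio-env -> $expected, systemd unit -> $actual"
        return 1
    fi
}
```

With the paths from this thread, the call would be something like: check_pid_paths /etc/dremio/dremio-env /etc/systemd/system/dremio.service /var/lib/dremio. If it reports a mismatch, either change DREMIO_PID_DIR or change PIDFile= so that both agree. After editing the unit file, systemctl daemon-reload is needed before restarting the service.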

Thanks laurent, all good on the restart now.
