Large PID files in dremio/log directory, huh?

What are these?
How and when do they grow?
How do we estimate the growth?

-rw------- 1 dremio dremio 3.3G Mar 16 00:01 java_pid16496.hprof
-rw------- 1 dremio dremio 1.9G Mar 18 01:41 java_pid21663.hprof

They caught me off guard, filled up the drive they are on, and crashed Dremio.

Can they be configured to go somewhere else? Seems like we should just attach some space for the large pid files and the growing log files.

Ah… Read the manual first… Oops.

https://docs.dremio.com/deployment/dremio-config.html?h=config

Logging and pid directories: these directories must be created first.

DREMIO_LOG_DIR=/var/log/dremio
DREMIO_PID_DIR=/var/run/dremio

From what I found, this config variable does not override the dremio.pid location, only the location of the java_pid* files:

DREMIO_PID_DIR=/var/run/dremio

The server was still looking in the default directory listed above after I changed the configuration.

The java_pid*.hprof files are memory dumps of the Java process, written when the process crashes or runs out of memory. You can configure a different path by adding '-XX:HeapDumpPath=' to $DREMIO_JAVA_EXTRA_OPTS in the dremio-env file. You might also want to check the log files to understand what caused the process to stop.
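A minimal sketch of what that could look like in dremio-env, assuming a dedicated volume mounted at /mnt/dremio-dumps (a hypothetical path; it has to exist and be writable by the dremio user). -XX:+HeapDumpOnOutOfMemoryError and -XX:HeapDumpPath are standard HotSpot JVM options. Whether Dremio already sets the first one itself is not confirmed in this thread, so repeating it here is only a precaution.

```shell
# dremio-env fragment (sketch only).
# /mnt/dremio-dumps is a hypothetical mount point, not a Dremio default.
HEAP_DUMP_DIR="/mnt/dremio-dumps"

# Append to any options that are already set instead of overwriting them.
# When HeapDumpPath is a directory, the JVM writes java_pid<pid>.hprof into it.
DREMIO_JAVA_EXTRA_OPTS="${DREMIO_JAVA_EXTRA_OPTS:-} -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${HEAP_DUMP_DIR}"
```

Dremio needs a restart for the change to take effect, since the options are only read when the JVM starts.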

Thanks for the response laurent!

Are the java_pid* files safe to delete?

Also, any ideas why the DREMIO_PID_DIR variable is not picking up a directory other than the default /var/run/dremio?

They are safe to delete, though as @laurent pointed out, you may want to find out why you had them in the first place, as they indicate a Java process crash.

Did you try the following?

  1. stop Dremio
  2. change DREMIO_PID_DIR in <dremio_install_dir>/conf/dremio-env file
  3. start Dremio

Yes, several times. Rebooted too. Just doesn’t seem to pick up on the variable change for the dremio.pid file. Still thinks it is in <dremio_install_dir>/run

Some clarifications:

  • DREMIO_PID_DIR only changes the destination directory for the dremio.pid file (which is used to keep track of the running Dremio process).
  • By default, java_pid<id>.hprof are written under DREMIO_LOG_DIR.
  • Those files are memory dump files, not pid files (they contain a pid in their name because the id is the process id of the process the dump came from). They are debugging information and totally safe to remove.

If you installed Dremio using the rpm package, note that the dremio-env file is located at /etc/dremio/dremio-env
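Since the java_pid<id>.hprof files are safe to remove, here is a rough sketch of two helper functions for keeping an eye on them and cleaning them up. The function names and the 7-day default are made up for illustration; the directory argument would be whatever DREMIO_LOG_DIR (or your heap dump path) points to.

```shell
# List heap dumps directly inside a directory, with their size in KiB.
list_heap_dumps() {
    find "$1" -maxdepth 1 -type f -name 'java_pid*.hprof' -exec du -k {} +
}

# Delete heap dumps older than N days (N defaults to 7).
# Only files matching java_pid*.hprof are touched; logs are left alone.
remove_old_heap_dumps() {
    find "$1" -maxdepth 1 -type f -name 'java_pid*.hprof' \
        -mtime +"${2:-7}" -exec rm -f {} +
}
```

For example, `list_heap_dumps /var/log/dremio` shows what is there, and `remove_old_heap_dumps /var/log/dremio 14` drops dumps older than two weeks. If you still need to investigate a crash, copy the relevant dump elsewhere before cleaning up.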

Thanks for your answers on this. Getting to learn the ins and outs of the system. :slight_smile:

The /etc/dremio/dremio-env with my installation is simply a soft link to the install directory where I am editing the dremio-env

This is my current scenario. Dremio runs until it runs out of memory and creates a heap dump file. I stop Dremio, then start it again. It starts up web services for the UI and API, and even accepts connections from clients. Then it starts attempting to do all the heavy lifting, which I assume is where it tries to read a dremio PID file from <dremio_install_dir>/run. That directory has no PID file, as the PID file is in the folder set by DREMIO_PID_DIR=${DREMIO_HOME}/data/logs in the dremio-env config file.

As a work around…

I then create a soft link

ln -s ${DREMIO_HOME}/data/logs/dremio.pid ${DREMIO_HOME}/run/dremio.pid
service dremio start

Then it works again until I have a crash (which I do not have unless I am loading Dremio up with a massive query or acceleration).

If I comment out the dremio-env DREMIO_PID_DIR it still does the same thing.

Oh… and next time it crashes, it removes the soft link at /run/dremio.pid

Could you say more about how/where you have Dremio deployed? Number of servers, amount of RAM, etc?

AWS EC2 c4.2xlarge, using 200GB EBS for data, logs and heap dumps.

Just one server while we are testing out Dremio; we have not moved to a multi-node deployment yet. That is our next step.

The Dremio process doesn’t actually read the pidfile once started; that happens much earlier, in the startup script. The pidfile basically captures the process id of the current Dremio process. If it is absent, it means that no other Dremio process is running. If it is present, the startup script will check whether the process is still alive.

When the process crashes, the pidfile is not removed, but it will be taken care of at the next restart.

Basically, you should not need to create or manage that file yourself (but you can use it to identify whether a Dremio process is running).
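To illustrate that kind of liveness check, here is a small sketch. It is not Dremio's actual startup script; the function name is hypothetical and the pidfile path is whatever your DREMIO_PID_DIR resolves to.

```shell
# Return 0 if the pidfile names a live process, 1 otherwise.
# Note: kill -0 also fails if you lack permission to signal the process,
# so run this as the dremio user or as root.
dremio_is_running() {
    pidfile="$1"
    [ -r "$pidfile" ] || return 1          # no pidfile: nothing running
    pid=$(cat "$pidfile")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;           # empty or non-numeric content
    esac
    kill -0 "$pid" 2>/dev/null             # signal 0 only tests for existence
}
```

Usage would look like `dremio_is_running /var/run/dremio/dremio.pid && echo "Dremio is up"`. A stale pidfile left behind by a crash makes the function return 1, provided the old process id has not been reused by another process.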

I think I hijacked my own post. :slight_smile:

Started with the Java memory dump files that have the PID in their name, then jumped over to why dremio.pid itself is not being created and/or read.

Here is the error on the last crash we had.
ubuntu@ip-XXXX:~$ sudo service dremio status
● dremio.service - Dremio Daemon Server
Loaded: loaded (/etc/systemd/system/dremio.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Tue 2018-03-20 20:23:06 UTC; 22min ago
Process: 27060 ExecStop=/var/lib/dremio/bin/dremio --config /var/lib/dremio/conf stop (code=exited, status=0/SUCCESS)
Process: 4212 ExecStart=/var/lib/dremio/bin/dremio --config /var/lib/dremio/conf start (code=exited, status=0/SUCCESS)
Main PID: 4342

Mar 18 22:07:42 ip-XXXX systemd[1]: Starting Dremio Daemon Server...
Mar 18 22:07:42 ip-XXXX dremio[4212]: starting dremio, logging to /var/lib/dremio/data/logs/server.out
Mar 18 22:07:43 XXXX systemd[1]: dremio.service: PID file /var/lib/dremio/run/dremio.pid not readable (yet?) after start: No such file or directory
Mar 18 22:09:11 ip-XXXX systemd[1]: dremio.service: Supervising process 4342 which is not our child. We'll most likely not notice when it exits.
Mar 18 22:09:11 ip-XXXX systemd[1]: Started Dremio Daemon Server.

Notice the path

/var/lib/dremio/run/dremio.pid

The location defined in dremio-env is not being used to look for an existing running Dremio process.

Indeed :slight_smile: Do you mind opening a new discussion? You might also want to share your systemd unit file and your dremio-env file.
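One thing that may be worth checking before posting: systemd logs "PID file ... not readable (yet?) after start" when the path in the unit's PIDFile= directive does not exist. The unit file is not shown in this thread, so it is only an assumption that its PIDFile= still points at the default run directory. The helper names below are hypothetical; the sketch just compares the unit's PIDFile= with the directory set in dremio-env.

```shell
# Print the PIDFile= value from a systemd unit file (the last one wins).
unit_pidfile() {
    sed -n 's/^[[:space:]]*PIDFile=//p' "$1" | tail -n 1
}

# Compare the unit's PIDFile= with <pid_dir>/dremio.pid.
# Returns 0 on a match, 1 on a mismatch.
check_pidfile_match() {
    unit_file="$1"
    pid_dir="$2"
    expected="${pid_dir%/}/dremio.pid"
    actual=$(unit_pidfile "$unit_file")
    if [ "$actual" = "$expected" ]; then
        echo "match: $actual"
    else
        echo "mismatch: unit has '$actual', dremio-env implies '$expected'"
        return 1
    fi
}
```

For the setup described above, that would be `check_pidfile_match /etc/systemd/system/dremio.service /var/lib/dremio/data/logs`. If it reports a mismatch, the two settings disagree, which would be consistent with the status output above.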