Dremio Services not coming up in Docker swarm after nodes rebooted

Hi All,

We are hitting the error below in the Dremio service log when redeploying the stack in Docker Swarm. Is it still a known issue that the persistent metadata files remain held by another process after the nodes reboot?
Should we clean up the files and try to restart the service?

dremio_dremio.1.p0p8nkfl73u2@d01cdk003 | 2019-06-07 09:23:03,500 [main] INFO c.dremio.datastore.ByteStoreManager - Lock file to RocksDB is currently hold by another process. Will wait until lock is freed.
dremio_dremio.1.p0p8nkfl73u2@d01cdk003 | Lock file to RocksDB is currently hold by another process. Will wait until lock is freed.

Hi @paulsuk1982

Can you see any other Dremio process running?

What output does this command give?

ps -aux | grep -i dremio

Thanks
@Venugopal_Menda

Hi @Venugopal_Menda
Thanks for the update. I am not the cluster admin, so I need to check with the Docker admin and will let you know, probably within a few hours.

We just checked: there are no Dremio processes running on any of the Docker nodes.

Should we clean up the volumes now and try to redeploy the stack?

We tried a metadata cleanup with the Dremio admin script, following the article below. Every cleanup operation we run fails with the same error:
2019-06-07 14:59:27,634 [main] INFO c.dremio.datastore.ByteStoreManager - Lock file to RocksDB is currently hold by another process. Will wait until lock is freed.
Lock file to RocksDB is currently hold by another process. Will wait until lock is freed.

https://docs.dremio.com/advanced-administration/metadata-cleanup.html

What needs to be done now?

Glad to say we got rid of the issue.
The metadata cleanup didn’t help; the same "lock file held by another process" message kept popping up.
Finally, we removed the LOCK file from the db/catalog directory, which resolved the issue, and the services started up.
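Before deleting a LOCK file by hand, it may be worth confirming that nothing actually holds it. The sketch below assumes the lock is an fcntl-style advisory lock (which, as far as I know, is what RocksDB takes on POSIX systems) and that the check runs on a host or container that mounts the same volume. The function name lock_is_held is made up for illustration, and on network-backed volumes the result depends on the filesystem's lock manager, so treat it as a hint rather than proof.

```python
import errno
import fcntl
import os


def lock_is_held(lock_path):
    """Return True if another process holds an fcntl-style lock on lock_path.

    Assumption: the lock is a POSIX advisory lock. This must run in a
    separate process from the one that might own the lock.
    """
    fd = os.open(lock_path, os.O_RDWR)
    try:
        try:
            # Non-blocking attempt to take an exclusive lock.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                return True  # some other process holds the lock
            raise  # anything else is a real error, do not hide it
        # We got the lock, so nobody else had it; release it again.
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)
```

Calling lock_is_held on the LOCK file under db/catalog (the full path depends on where your Dremio data directory is mounted) should return False once every Dremio process is stopped. If it returns True, something still owns the lock and removing the file is risky.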

Now the container and the service are up and running in the swarm cluster, but the Dremio service is not reachable, and telnet to port 9047 is refused as well.
Has anyone faced this kind of issue before and can suggest what to check on the network side?

This site can’t be reached

f01bhr004 refused to connect.

ERR_CONNECTION_REFUSED

We are getting a GC allocation failure in the Dremio service log and the Dremio service is inaccessible. Could anyone who has faced this issue with Dremio before please help?

2019-06-10T20:46:18.930+0000: [GC (Allocation Failure) [PSYoungGen: 95744K->8831K(111616K)] 95744K->8911K(367104K), 0.0069614 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
2019-06-10 20:46:18,990 [main] INFO com.dremio.common.config.SabotConfig - Configuration and plugin file(s) identified in 120ms.

The Dremio service is not starting, apparently because of the GC allocation error. Could anyone please suggest possible options? If anyone has encountered a similar issue, sharing your experience would be appreciated.

| starting dremio
| 2019-06-10 20:46:18,842 [main] INFO com.dremio.exec.util.GuavaPatcher - Google’s Stopwatch patched for old HBase Guava version.
| 2019-06-10 20:46:18,851 [main] INFO com.dremio.exec.util.GuavaPatcher - Google’s Closeables patched for old HBase Guava version.
| 2019-06-10T20:46:18.930+0000: [GC (Allocation Failure) [PSYoungGen: 95744K->8831K(111616K)] 95744K->8911K(367104K), 0.0069614 secs] [Times:

| 2019-06-10 20:46:18,990 [main] INFO com.dremio.common.config.SabotConfig - Configuration and plugin file(s) identified in 120ms.
| Base Configuration:
| - jar:file:/opt/dremio/jars/dremio-common-3.1.9-201904051346520183-a35b753.jar!/sabot-default.conf
|
| Intermediate Configuration and Plugin files, in order of precedence:
| - jar:file:/opt/dremio/jars/dremio-services-options-3.1.9-201904051346520183-a35b753.jar!/sabot-module.conf
| - jar:file:/opt/dremio/jars/dremio-common-3.1.9-201904051346520183-a35b753.jar!/sabot-module.conf
| - jar:file:/opt/dremio/jars/dremio-dac-common-3.1.9-201904051346520183-a35b753.jar!/sabot-module.conf
| - jar:file:/opt/dremio/jars/dremio-services-accelerator-3.1.9-201904051346520183-a35b753.jar!/sabot-module.conf
| - jar:file:/opt/dremio/jars/dremio-services-users-3.1.9-201904051346520183-a35b753.jar!/sabot-module.conf
| - jar:file:/opt/dremio/jars/dremio-sabot-logical-3.1.9-201904051346520183-a35b753.jar!/sabot-module.conf
| - jar:file:/opt/dremio/jars/dremio-dac-backend-3.1.9-201904051346520183-a35b753.jar!/sabot-module.conf
| - jar:file:/opt/dremio/jars/dremio-extra-plugin-jdbc-3.1.9-201904051346520183-a35b753.jar!/sabot-module.conf
| - jar:file:/opt/dremio/jars/dremio-hbase-plugin-3.1.9-201904051346520183-a35b753.jar!/sabot-module.conf
| - jar:file:/opt/dremio/jars/dremio-s3-plugin-3.1.9-201904051346520183-a35b753.jar!/sabot-module.conf

@paulsuk1982, a GC allocation failure is a normal part of Java heap management. You can ignore those messages.

Does Dremio come up?
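To see why the GC line is harmless, it can help to pull the numbers out of it. The sketch below parses the ParallelGC line format shown in the log above and reports how full the heap was after the collection. The regular expression and the name summarize_gc are illustrative only and cover just this one line format.

```python
import re

# Matches lines such as:
# [GC (Allocation Failure) [PSYoungGen: 95744K->8831K(111616K)] 95744K->8911K(367104K), 0.0069614 secs]
GC_LINE = re.compile(
    r"\[GC \((?P<cause>[^)]+)\) "
    r"\[PSYoungGen: (?P<young_before>\d+)K->(?P<young_after>\d+)K\((?P<young_cap>\d+)K\)\] "
    r"(?P<heap_before>\d+)K->(?P<heap_after>\d+)K\((?P<heap_cap>\d+)K\), "
    r"(?P<secs>[\d.]+) secs\]"
)


def summarize_gc(line):
    """Return a small summary dict for a young-generation GC line, or None."""
    match = GC_LINE.search(line)
    if match is None:
        return None
    heap_after = int(match["heap_after"])
    heap_cap = int(match["heap_cap"])
    return {
        "cause": match["cause"],
        "heap_after_kb": heap_after,
        "heap_capacity_kb": heap_cap,
        # Share of the committed heap still in use after the collection.
        "heap_used_pct": round(100.0 * heap_after / heap_cap, 1),
        "pause_secs": float(match["secs"]),
    }
```

For the line in this thread the summary shows the heap at 8911K of 367104K after the collection, roughly 2.4% used, with a pause of about 7 ms. "Allocation Failure" here is only the reason the young-generation collection was triggered, not an out-of-memory condition.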

Dremio shows up in the service list and the container is up. If we run dremio status inside the container, it reports running.
However, we can’t connect to the Dremio UI, or from a client over ODBC; the connection is refused. Telnet to port 9047 is refused as well.
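One way to narrow down a refused connection is to probe the port from two places: inside the container against localhost, and from outside against the node hostname. The sketch below is a minimal TCP probe using only the standard library. The function name probe is made up, and the ports in the usage comment assume the web UI on 9047 (as in this thread) and Dremio's default client port 31010 for ODBC/JDBC; adjust them if your stack file maps different ports.

```python
import socket


def probe(host, port, timeout=3.0):
    """Try a TCP connection and classify the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing accepts connections on this port.
        return "refused"
    except socket.timeout:
        # No answer at all, often a firewall dropping packets.
        return "timeout"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar errors.
        return "error: %s" % exc


# Example usage (hostname taken from the thread, ports are assumptions):
#   probe("localhost", 9047)    # run inside the container
#   probe("f01bhr004", 9047)    # run from a machine outside the swarm
#   probe("f01bhr004", 31010)   # ODBC/JDBC client port
```

If the probe reports "open" inside the container but "refused" from outside, the process is listening and the likely suspects are the published ports in the stack file or the swarm routing. If it is "refused" even inside the container, Dremio has probably not finished starting or is bound to a different address or port, and the server log is the next place to look.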