Dremio Services not coming up in Docker swarm after nodes rebooted

paulsuk1982 · June 7, 2019, 9:34am

Hi All,

We are hitting the error below in the Dremio service log when redeploying the stack in Docker swarm. Is it still a known issue that persistent metadata files are held by another process after the nodes reboot?
Should we clean up the files and try to restart the service?

dremio_dremio.1.p0p8nkfl73u2@d01cdk003 | 2019-06-07 09:23:03,500 [main] INFO c.dremio.datastore.ByteStoreManager - Lock file to RocksDB is currently hold by another process. Will wait until lock is freed.
dremio_dremio.1.p0p8nkfl73u2@d01cdk003 | Lock file to RocksDB is currently hold by another process. Will wait until lock is freed.

Venugopal_Menda · June 7, 2019, 11:36am

Hi @paulsuk1982

Can you see any other Dremio process running?

What is the output of this command?

ps -aux | grep -i dremio
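
For anyone who wants to script the same check, here is a minimal Python sketch. It assumes a Linux host with the standard `ps` tool; the function names are made up for illustration. Because it filters in Python instead of piping to grep, the grep process itself does not show up as a false match. Note that it only sees processes on the node (or inside the container) where it runs, so on a swarm it would have to be run on every node.

```python
import subprocess


def matching_processes(ps_output, needle="dremio"):
    """Return the ps lines whose text mentions needle (case-insensitive)."""
    needle = needle.lower()
    lines = ps_output.splitlines()[1:]  # skip the header row printed by ps
    return [line.strip() for line in lines if needle in line.lower()]


def list_dremio_processes():
    """Run ps on this machine and return the lines that mention dremio."""
    # `ps -eo pid,args` prints every process with its full command line.
    result = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=True
    )
    return matching_processes(result.stdout)


if __name__ == "__main__":
    for line in list_dremio_processes():
        print(line)
```

An empty result means no matching process is visible from where the script ran, nothing more.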

Thanks
@Venugopal_Menda

paulsuk1982 · June 7, 2019, 11:59am

Hi @Venugopal_Menda
Thanks for the update. I am not the cluster admin, so I need to check with the Docker admin and will let you know, maybe within a few hours.

paulsuk1982 · June 7, 2019, 2:38pm

There are no Dremio services running on any of the Docker nodes; we just checked.

Should we clean up the volumes now and try to redeploy the stack?

paulsuk1982 · June 7, 2019, 3:13pm

We tried a metadata cleanup with the dremio admin script, following the article below. Every cleanup operation we run fails with the same error:
2019-06-07 14:59:27,634 [main] INFO c.dremio.datastore.ByteStoreManager - Lock file to RocksDB is currently hold by another process. Will wait until lock is freed.
Lock file to RocksDB is currently hold by another process. Will wait until lock is freed.

https://docs.dremio.com/advanced-administration/metadata-cleanup.html

What needs to be done now?

paulsuk1982 · June 7, 2019, 5:01pm

Glad we got rid of the issue.
Metadata cleanup didn’t help; it kept hitting the same "file locked by another process" error.
Finally we removed the LOCK file from the db/catalog directory, which resolved the issue, and the services started up.
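
Before deleting a LOCK file by hand, it may be worth checking whether something really holds it. The sketch below is only an illustration and rests on an assumption: that the lock is a POSIX advisory lock taken with fcntl on the LOCK file, which is how RocksDB locks its directory on Linux as far as I know. The function name and the path in the usage line are hypothetical, and results on network-mounted volumes may not be reliable.

```python
import fcntl
import os


def lock_is_held(lock_path):
    """Return True if another process holds an fcntl lock on lock_path."""
    # Open read-write without creating or truncating the file.
    fd = os.open(lock_path, os.O_RDWR)
    try:
        try:
            # Non-blocking attempt to take an exclusive lock on the file.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            # The attempt was refused, so some other process holds the lock.
            return True
        # We got the lock, so nobody else had it; release it again.
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical location; adjust to the real data directory.
    print(lock_is_held("db/catalog/LOCK"))
```

If this returns True, a live process (possibly in another container sharing the volume) still has the store open, and removing the file underneath it risks corrupting the metadata. Run it only while Dremio is stopped, since the probe briefly takes the lock itself.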

paulsuk1982 · June 10, 2019, 7:28pm

Now the container and service are up and running in the swarm cluster, but the Dremio service is not reachable, and telnet to port 9047 is also rejected.
Has anyone faced this kind of issue before, and can you suggest what to check on the network side?

This site can’t be reached

f01bhr004 refused to connect.

Try:

Checking the connection
Checking the proxy and the firewall

ERR_CONNECTION_REFUSED

paulsuk1982 · June 10, 2019, 9:21pm

We are getting a GC allocation failure in the Dremio service log and the Dremio service is inaccessible. Could anyone who has faced this issue in Dremio please help?

2019-06-10T20:46:18.930+0000: [GC (Allocation Failure) [PSYoungGen: 95744K->8831K(111616K)] 95744K->8911K(367104K), 0.0069614 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
2019-06-10 20:46:18,990 [main] INFO com.dremio.common.config.SabotConfig - Configuration and plugin file(s) identified in 120ms.

paulsuk1982 · June 11, 2019, 3:02am

The Dremio service is not starting, apparently due to the GC allocation error. Could anyone please suggest possible options? If anyone has encountered a similar issue, sharing the experience would be appreciated.

| starting dremio
| 2019-06-10 20:46:18,842 [main] INFO com.dremio.exec.util.GuavaPatcher - Google’s Stopwatch patched for old HBase Guava version.
| 2019-06-10 20:46:18,851 [main] INFO com.dremio.exec.util.GuavaPatcher - Google’s Closeables patched for old HBase Guava version.
| 2019-06-10T20:46:18.930+0000: [GC (Allocation Failure) [PSYoungGen: 95744K->8831K(111616K)] 95744K->8911K(367104K), 0.0069614 secs] [Times:

| 2019-06-10 20:46:18,990 [main] INFO com.dremio.common.config.SabotConfig - Configuration and plugin file(s) identified in 120ms.
| Base Configuration:
| - jar:file:/opt/dremio/jars/dremio-common-3.1.9-201904051346520183-a35b753.jar!/sabot-default.conf
|
| Intermediate Configuration and Plugin files, in order of precedence:
| - jar:file:/opt/dremio/jars/dremio-services-options-3.1.9-201904051346520183-a35b753.jar!/sabot-module.conf
| - jar:file:/opt/dremio/jars/dremio-common-3.1.9-201904051346520183-a35b753.jar!/sabot-module.conf
| - jar:file:/opt/dremio/jars/dremio-dac-common-3.1.9-201904051346520183-a35b753.jar!/sabot-module.conf
| - jar:file:/opt/dremio/jars/dremio-services-accelerator-3.1.9-201904051346520183-a35b753.jar!/sabot-module.conf
| - jar:file:/opt/dremio/jars/dremio-services-users-3.1.9-201904051346520183-a35b753.jar!/sabot-module.conf
| - jar:file:/opt/dremio/jars/dremio-sabot-logical-3.1.9-201904051346520183-a35b753.jar!/sabot-module.conf
| - jar:file:/opt/dremio/jars/dremio-dac-backend-3.1.9-201904051346520183-a35b753.jar!/sabot-module.conf
| - jar:file:/opt/dremio/jars/dremio-extra-plugin-jdbc-3.1.9-201904051346520183-a35b753.jar!/sabot-module.conf
| - jar:file:/opt/dremio/jars/dremio-hbase-plugin-3.1.9-201904051346520183-a35b753.jar!/sabot-module.conf
| - jar:file:/opt/dremio/jars/dremio-s3-plugin-3.1.9-201904051346520183-a35b753.jar!/sabot-module.conf

ben · June 11, 2019, 6:57pm

@paulsuk1982, a GC allocation failure is a normal part of Java heap management. You can ignore those messages.

Does Dremio come up?
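
To make the point about the GC message concrete, here is a small Python sketch that pulls the numbers out of the log line quoted earlier in this thread. The regular expression is written only for that one line format (the PSYoungGen collector output shown above) and the function name is made up for illustration. "Allocation Failure" is just the reason the young-generation collection was triggered; the figures after it show the heap before and after the collection.

```python
import re

# Matches the young-generation GC line format quoted in this thread.
GC_LINE = re.compile(
    r"\[GC \((?P<cause>[^)]+)\) "
    r"\[PSYoungGen: (?P<young_before>\d+)K->(?P<young_after>\d+)K"
    r"\((?P<young_cap>\d+)K\)\] "
    r"(?P<heap_before>\d+)K->(?P<heap_after>\d+)K\((?P<heap_cap>\d+)K\), "
    r"(?P<pause>[\d.]+) secs\]"
)


def summarize_gc(line):
    """Return heap figures from one GC log line, or None if it does not match."""
    m = GC_LINE.search(line)
    if m is None:
        return None
    heap_after = int(m["heap_after"])
    heap_cap = int(m["heap_cap"])
    return {
        "cause": m["cause"],
        "heap_after_kb": heap_after,
        "heap_capacity_kb": heap_cap,
        "heap_used_fraction": heap_after / heap_cap,
        "pause_seconds": float(m["pause"]),
    }


if __name__ == "__main__":
    line = (
        "2019-06-10T20:46:18.930+0000: [GC (Allocation Failure) "
        "[PSYoungGen: 95744K->8831K(111616K)] 95744K->8911K(367104K), "
        "0.0069614 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]"
    )
    print(summarize_gc(line))
```

For the line from the log, the heap holds 8911K out of 367104K after the collection and the pause is about 7 ms, so the JVM is nowhere near running out of memory at that point.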

paulsuk1982 · June 11, 2019, 7:18pm

Dremio shows up in the services list and the container is up. Inside the container, .dremio status reports that it is running.
However, we can’t connect to the Dremio UI, or from a client over ODBC; the connection is rejected. Telnet to port 9047 is also rejected.
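
One way to narrow this down is to run the same TCP check from inside the container and then from the swarm node, and compare. Below is a minimal Python sketch of such a check; the function names are made up for illustration. The ports are assumptions based on Dremio's defaults (9047 for the web UI, 31010 for ODBC/JDBC clients), and the stack may have them configured or published differently.

```python
import socket


def port_accepts_connections(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports):
    """Map each port to whether it currently accepts TCP connections."""
    return {port: port_accepts_connections(host, port) for port in ports}


if __name__ == "__main__":
    # Assumed default Dremio ports; adjust to match the stack file.
    print(check_ports("127.0.0.1", [9047, 31010]))
```

If the ports answer inside the container but are refused from the node, the ports section of the stack file and the swarm ingress network would be the first things to look at. If they are refused inside the container as well, Dremio has probably not finished starting, and the rest of the service log is the place to look.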
