Dremio auto shutdown

Vikash_Singh · November 20, 2019, 5:41am

Hi,
I am still struggling with Dremio stability. It shuts down on its own and I am not able to access the GUI.

2019-11-20 05:31:01,148 [222b3083-e945-9936-0b2d-c8ae5cddf000:foreman] INFO c.d.e.p.s.h.commands.FragmentStarter - User Error Occurred [ErrorId: a52310dd-4d25-4362-adc9-c647ce78d216]
com.dremio.common.exceptions.UserException: Exceeded timeout (5000) while waiting after sending work fragments to remote nodes. Sent 1 and only heard response back from 0 nodes.
at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:776) ~[dremio-common-4.0.2-201910020123580864-a98a0b9.jar:4.0.2-201910020123580864-a98a0b9]
at com.dremio.exec.planner.sql.handlers.commands.FragmentStarter.startFragments(FragmentStarter.java:157) [dremio-sabot-kernel-4.0.2-201910020123580864-a98a0b9.jar:4.0.2-201910020123580864-a98a0b9]
at com.dremio.exec.planner.sql.handlers.commands.FragmentStarter.start(FragmentStarter.java:80) [dremio-sabot-kernel-4.0.2-201910020123580864-a98a0b9.jar:4.0.2-201910020123580864-a98a0b9]
at com.dremio.exec.planner.sql.handlers.commands.AsyncCommand.startFragments(AsyncCommand.java:98) [dremio-sabot-kernel-4.0.2-201910020123580864-a98a0b9.jar:4.0.2-201910020123580864-a98a0b9]
at com.dremio.exec.work.foreman.AttemptManager.run(AttemptManager.java:332) [dremio-sabot-kernel-4.0.2-201910020123580864-a98a0b9.jar:4.0.2-201910020123580864-a98a0b9]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_222]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_222]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_222]
2019-11-20 05:31:01,148 [222b3083-b764-ad0e-427f-5ad6b89c1200:foreman] INFO c.d.e.p.s.h.commands.FragmentStarter - User Error Occurred [ErrorId: 00d35199-3b9c-402d-a2ce-76c1931481bf]
com.dremio.common.exceptions.UserException: Exceeded timeout (5000) while waiting after sending work fragments to remote nodes. Sent 1 and only heard response back from 0 nodes.
at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:776) ~[dremio-common-4.0.2-201910020123580864-a98a0b9.jar:4.0.2-201910020123580864-a98a0b9]
at com.dremio.exec.planner.sql.handlers.commands.FragmentStarter.startFragments(FragmentStarter.java:157) [dremio-sabot-kernel-4.0.2-201910020123580864-a98a0b9.jar:4.0.2-201910020123580864-a98a0b9]
at com.dremio.exec.planner.sql.handlers.commands.FragmentStarter.start(FragmentStarter.java:80) [dremio-sabot-kernel-4.0.2-201910020123580864-a98a0b9.jar:4.0.2-201910020123580864-a98a0b9]
at com.dremio.exec.planner.sql.handlers.commands.AsyncCommand.startFragments(AsyncCommand.java:98) [dremio-sabot-kernel-4.0.2-201910020123580864-a98a0b9.jar:4.0.2-201910020123580864-a98a0b9]
at com.dremio.exec.work.foreman.AttemptManager.run(AttemptManager.java:332) [dremio-sabot-kernel-4.0.2-201910020123580864-a98a0b9.jar:4.0.2-201910020123580864-a98a0b9]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_222]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_222]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_222]

But in Kubernetes, all the pods are in the Running state.

Vikash_Singh · November 20, 2019, 5:42am

After restarting the Dremio master node, it works.

Vikash_Singh · November 20, 2019, 2:23pm

I need a solution. I can't keep restarting the master pod, as this is a production system.

balaji.ramaswamy · November 20, 2019, 4:27pm

@Vikash_Singh

Send us the complete server.log from the master coordinator. A few questions:

#1 Do you have a second coordinator other than the master? If yes, we do not recommend that.
#2 Have you by any chance provisioned all the OS RAM to Dremio? If yes, it could be the oom-killer that is killing Dremio. You need to leave at least 4 GB to the OS.
#3 Have you set heap on the coordinator explicitly? If you have set MAX, then everything minus 2 GB goes to heap and may end up in full GC cycles. Check the Dremio master pod log to see if you see Full GCs.
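
For questions #2 and #3 on Kubernetes, here is a minimal sketch of two checks. It assumes the master pod is named `dremio-master-0` (the name used elsewhere in this thread) and that GC logging is enabled, so that "Full GC" lines actually appear in the pod log. The function names are illustrative and not part of Dremio or kubectl.

```shell
#!/bin/sh
# Print the reason the pod's container(s) last terminated.
# A value of "OOMKilled" means the container was killed for running out of memory.
last_termination_reason() {
  pod="$1"
  kubectl get pod "$pod" \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
}

# Count the lines mentioning "Full GC" in a saved log file.
count_full_gc() {
  grep -c "Full GC" "$1"
}
```

For example, `last_termination_reason dremio-master-0` prints nothing if the container has never been terminated, and `count_full_gc master.log` can be run on a log saved from the pod.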

Vikash_Singh · November 21, 2019, 5:45am

Hello @balaji.ramaswamy,

Do you have a second coordinator other than the master? --> No
Have you by any chance provisioned all the OS RAM? --> I have 128 GB of RAM, of which only 116 GB is assigned to Dremio; the rest is free for the OS.
Have you set heap on the coordinator explicitly? --> Yes, these are the values in the Dremio env:

DREMIO_MAX_HEAP_MEMORY_SIZE_MB=10240
DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=51200
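
As a quick sanity check on these numbers, the arithmetic can be sketched as below. It uses only the figures quoted in this thread; how the 116 GB "assigned to Dremio" maps onto pod limits is not stated, so the variable names are just labels for the quoted values.

```shell
#!/bin/sh
# Values quoted in this thread.
heap_mb=10240        # DREMIO_MAX_HEAP_MEMORY_SIZE_MB
direct_mb=51200      # DREMIO_MAX_DIRECT_MEMORY_SIZE_MB
total_gb=128         # RAM on the machine
assigned_gb=116      # RAM assigned to Dremio

# Heap plus direct memory configured for the JVM, in MB and GB.
jvm_mb=$((heap_mb + direct_mb))
jvm_gb=$((jvm_mb / 1024))

# What remains for the OS after the 116 GB assignment.
os_headroom_gb=$((total_gb - assigned_gb))

echo "heap + direct: ${jvm_mb} MB (${jvm_gb} GB)"
echo "left for the OS: ${os_headroom_gb} GB"
```

This prints 61440 MB (60 GB) for heap plus direct memory and 12 GB left for the OS, which is above the 4 GB minimum mentioned earlier in the thread.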

Please find the attached logs for the above error.

dremio-master-0.zip (1.1 MB)

balaji.ramaswamy · November 21, 2019, 6:14am

@Vikash_Singh

The log file you sent has no Dremio startup or shutdown. Do these logs roll over when you restart Dremio?

Vikash_Singh · November 21, 2019, 6:27am

Yes, that might be the case. Since it is a production system, I can't stop the server and debug. I will provide the logs the next time I hit the same issue; I run into it roughly every week.

Can you also guide me on how to copy the full Dremio log from the Kubernetes pod to the local filesystem? The following command appears to hang for me:

kubectl logs -f dremio-master-0 > dremio-master-0.log
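
On the hanging command: `-f` tells `kubectl logs` to follow the stream, so it keeps running until interrupted instead of exiting once the existing output has been written. A minimal sketch of one-shot alternatives follows. The function names are illustrative, and the path to `server.log` inside the pod is an assumption that depends on how the image configures its log directory. Note that `kubectl cp` needs `tar` to be available in the container.

```shell
#!/bin/sh
# Write the current container's stdout/stderr to a file and return (no -f).
dump_pod_logs() {
  kubectl logs "$1" > "$2"
}

# Same, but for the container instance that ran before the last restart.
dump_previous_pod_logs() {
  kubectl logs --previous "$1" > "$2"
}

# Copy a file out of the pod. remote_path is hypothetical; adjust it
# to wherever the Dremio log directory lives in your image.
copy_pod_file() {
  pod="$1"
  remote_path="$2"
  local_path="$3"
  kubectl cp "$pod:$remote_path" "$local_path"
}
```

For example, `dump_pod_logs dremio-master-0 dremio-master-0.log` saves the current log, and `copy_pod_file dremio-master-0 /path/to/server.log ./server.log` copies the log file itself once its real path is known.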

Vikash_Singh · December 13, 2019, 11:13am

@balaji.ramaswamy I am still facing the Dremio UI shutting down on its own, and I also see the master pod status at 0/1.
I am attaching the master log for the issue: dremio-master-0.zip (531.4 KB)

Vikash_Singh · December 13, 2019, 11:14am

This is quite urgent for me. I am not able to run Dremio in production; every day I have to manually restart the Dremio master pod. Please suggest a permanent solution.

balaji.ramaswamy · December 16, 2019, 4:26am

@Vikash_Singh

The attached log file has no entries on Dremio shutting down or starting up. Would you be able to send the log file that has the keyword “KVstore” or “localhost”?
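
One way to find which saved log file contains those keywords is sketched below. It assumes the logs have already been copied to a local directory; the function name is illustrative, and the match is case-insensitive because the exact capitalization in the log is not known from this thread.

```shell
#!/bin/sh
# List the files in log_dir that contain either keyword (case-insensitive).
find_lifecycle_logs() {
  log_dir="$1"
  grep -l -i -E 'kvstore|localhost' "$log_dir"/*
}
```

Calling `find_lifecycle_logs ./logs` prints one matching file name per line. Compressed, rolled-over logs would need `zgrep` in place of `grep`.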

Thanks
@balaji.ramaswamy

Diego · March 23, 2023, 9:58pm

Hello, @balaji.ramaswamy

Is there some way to force stop/start worker nodes programmatically?

balaji.ramaswamy · March 29, 2023, 12:48am

@Diego Depends on the deployment

K8s and YARN should restart automatically
Standalone VMs would require a custom script to start on failure

Thanks
Bali
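
For the standalone VM case, a custom restart script could look roughly like the sketch below. It is a generic helper, not something Dremio ships: the function name is made up, and the check and start commands are passed in because they depend on how Dremio was installed.

```shell
#!/bin/sh
# ensure_running CHECK_CMD START_CMD
# Runs CHECK_CMD; if it exits non-zero, runs START_CMD.
ensure_running() {
  check_cmd="$1"
  start_cmd="$2"
  if sh -c "$check_cmd" >/dev/null 2>&1; then
    echo "already running"
  else
    echo "not running, starting"
    sh -c "$start_cmd"
  fi
}
```

Run from cron, a call might be `ensure_running "systemctl is-active --quiet dremio" "systemctl start dremio"`, assuming Dremio is installed as a systemd service named `dremio`.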

Diego · March 29, 2023, 5:43am

@balaji.ramaswamy I use the Dremio AWS version. I’m interested in scheduling an hour to stop the nodes and an hour to start them.

Diego · April 5, 2023, 10:18am

@balaji.ramaswamy, I noticed that when I stop the Dremio service, the executor nodes are also stopped after a few minutes. I set the nodes to auto start so that they come up together when the service is turned on. But the second engine does not come up.

balaji.ramaswamy · April 8, 2023, 10:21pm

@Diego The second engine is a preview engine that will be used only for queries run in preview mode. Try hitting preview instead of run and see if that helps.

Diego · April 11, 2023, 1:33pm

@balaji.ramaswamy When I click on preview, do the nodes start up automatically? I guess not.

balaji.ramaswamy · April 12, 2023, 10:22pm

@Diego If you hit preview on a query, the engine should start up automatically.
