Dremio FileNotFoundException error

I get this error when I run a reflection on Dremio from Looker:

FileNotFoundException: /grid/0/hadoop/yarn/local/usercache/dremio/appcache/application_1543574874181_12061/container_e54_1543574874181_12061_01_000006/tmp/4708707843919621304/dremio.app/opt/dremio/jars/3rdparty/hadoop-common-2.8.3.jar (No such file or directory)

Hi @marco

Are your Dremio executors deployed via YARN? Are the executors up and running? Can you look at the provisioning screen and validate?

Thanks
@balaji.ramaswamy

Hi @balaji.ramaswamy, the executors are deployed via YARN and are running.

Hi @balaji.ramaswamy and @marco ,
I had the same issue.

It looks like an intermittent faulty connection between Dremio and the YARN containers.
A “manual” workaround is to stop and then start the provisioning of workers via the web UI.
Any suggestions on how to monitor and automate this?

Thanks in advance.
Rosario

Any update regarding this issue?

This happens when there are orphaned YARN containers. Try this:

  1. Stop the cluster through the UI
  2. Run select * from sys.nodes

Does it return any nodes? If yes, those are the ones that need to be killed manually.
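
For the manual kill, here is a minimal sketch in Python that pulls the YARN application and container IDs out of the exception message and builds the matching yarn CLI command. It assumes the yarn CLI is available on the host and that the application named in the error path is the orphaned one, which should be confirmed against the sys.nodes output first. The function names are hypothetical, and the script only prints the command rather than running it.

```python
import re

# YARN application IDs look like application_<clusterTimestamp>_<sequence>.
APP_ID_RE = re.compile(r"application_\d+_\d+")
# Container IDs may carry an epoch part (e.g. "e54_") after "container_".
CONTAINER_ID_RE = re.compile(r"container_(?:e\d+_)?\d+_\d+_\d+_\d+")


def yarn_ids_from_error(message):
    """Return (application IDs, container IDs) found in an error message.

    Duplicates are dropped and first-seen order is kept.
    """
    apps = list(dict.fromkeys(APP_ID_RE.findall(message)))
    containers = list(dict.fromkeys(CONTAINER_ID_RE.findall(message)))
    return apps, containers


def kill_command(app_id):
    """Build (but do not run) the yarn CLI call that kills one application."""
    return ["yarn", "application", "-kill", app_id]


if __name__ == "__main__":
    error = (
        "FileNotFoundException: /grid/0/hadoop/yarn/local/usercache/dremio/"
        "appcache/application_1543574874181_12061/"
        "container_e54_1543574874181_12061_01_000006/tmp/4708707843919621304/"
        "dremio.app/opt/dremio/jars/3rdparty/hadoop-common-2.8.3.jar "
        "(No such file or directory)"
    )
    apps, containers = yarn_ids_from_error(error)
    for app in apps:
        # Review the command before running it by hand.
        print(" ".join(kill_command(app)))
```

Note that killing the application stops every container that belongs to it, so it is worth checking with yarn application -list that the ID really is the leftover Dremio application before running the printed command.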