Which error keywords should I capture from the logs for monitoring?

Hi there,

I’m rebuilding our monitoring system for Dremio.
Are there any common error keywords I should capture from the log files so that I can raise an alert for the team to look into?
Right now I have these:

  • Full GC (Allocation Failure) → a JVM full GC was triggered
  • ERROR Fabric Channel closed → the connection between nodes was closed
  • java.lang.OutOfMemoryError: → OOM
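
For illustration, matching these keywords could look roughly like the sketch below. It is a minimal example and not our actual setup: the alert names and the `scan_lines` helper are made up, and I am assuming that all fragments of a keyword appear on the same log line.

```python
from collections import Counter

# Hypothetical alert names mapped to the fragments that must all appear
# on one log line. The fragments come from the list above; the split of
# the Fabric keyword into three parts is an assumption about the log format.
ALERTS = {
    "jvm_full_gc": ("Full GC (Allocation Failure)",),
    "fabric_channel_closed": ("ERROR", "Fabric", "Channel closed"),
    "out_of_memory": ("java.lang.OutOfMemoryError:",),
}


def scan_lines(lines):
    """Count how many lines match each alert."""
    counts = Counter()
    for line in lines:
        for name, fragments in ALERTS.items():
            if all(fragment in line for fragment in fragments):
                counts[name] += 1
    return counts
```

Calling `scan_lines(open(path))` on a log file returns a counter per alert name, which a monitor could compare against a threshold.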

Are there any other keywords I should pay attention to?

Much appreciated.

@quangth2 There are quite a few, but unfortunately there are no error codes built in yet. This is something Dremio is looking into. Meanwhile, can you look at the Dremio queries.json for the last 30 days, collect every outcomereason for entries with status='FAILED' only, and build from there?
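
A minimal sketch of that tally, assuming queries.json holds one JSON object per line. The `failed_reasons` helper is hypothetical, and the key names are passed in rather than hard-coded because the exact spelling and casing should be confirmed against your own file.

```python
import json
from collections import Counter


def failed_reasons(lines, status_key, reason_key):
    """Count failure reasons across JSON-lines records with a FAILED status."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue
        if record.get(status_key) == "FAILED":
            # Group records with a missing or empty reason under one label.
            counts[record.get(reason_key) or "<no reason>"] += 1
    return counts
```

For example, `failed_reasons(open(path), "status", "outcomereason").most_common(20)` would list the most frequent failure reasons, which can then be turned into alert keywords.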