I’m using Dremio version 4.7.0 with Oracle JDK 1.8.0_261.
When I execute a Gandiva-based query, node execution fails with the error “ExecutionSetupException: One or more nodes lost connectivity during query. Identified nodes were”.
Checking the execution log, I see a JVM crash with a thread dump, which makes the Dremio process restart:
dremio[18563]: Creating gandiva cache with capacity: 500
A fatal error has been detected by the Java Runtime Environment:
SIGILL (0x4) at pc=0x00007fb4036290c0, pid=18563, tid=0x00007fb3f10d0700
JRE version: Java™ SE Runtime Environment (8.0_261-b12) (build 1.8.0_261-b12)
Java VM: Java HotSpot™ 64-Bit Server VM (25.261-b12 mixed mode linux-amd64 compressed oops)
Problematic frame:
C 0x00007fb4036290c0
Core dump written. Default location: //core or core.18563
An error report file with more information is saved as:
/tmp/hs_err_pid18563.log
If you would like to submit a bug report, please visit:
I am currently using an Intel® Xeon® Silver 4216 CPU, which supports the AVX-512 instruction flags … I guess the JVM crash dump is due to AVX-512 instructions being incompatible with Gandiva.
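One way to confirm whether the host (or the KVM guest) actually exposes AVX-512 to the process is to inspect /proc/cpuinfo. A minimal sketch, assuming a Linux host; the avx512_flags helper below is mine for illustration, not part of Dremio:

```python
import re

def avx512_flags(cpuinfo_text):
    """Return the sorted set of AVX-512 feature flags (avx512f,
    avx512dq, avx512cd, ...) found in a /proc/cpuinfo dump."""
    return sorted(set(re.findall(r"avx512\w*", cpuinfo_text)))

if __name__ == "__main__":
    # On the affected node, read the live CPU flags:
    with open("/proc/cpuinfo") as f:
        print(avx512_flags(f.read()))
```

If this prints an empty list inside the VM but not on the bare-metal host, the hypervisor is masking AVX-512 and the crash would point elsewhere.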
Yes! The query profiles from both versions (4.7.0 & 4.7.3) have the same content … the problem is that both Dremio versions (4.7.0 & 4.7.3) are faulty on this hardware.
I have now debugged this and noticed that Oracle JDK 1.8 only supports a 30-bit MaxVectorSize, while Gandiva’s Arrow execution uses 512-bit AVX-512 instructions … this incompatibility leads to the JVM crash and thread dump.
I need a practical solution for this error, but both versions show the same hardware incompatibility … this means our new Intel hardware upgrade is being held back by this Dremio bug.
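As a possible stopgap while the team investigates, HotSpot’s own AVX usage can be capped with the standard x86 flags -XX:UseAVX and -XX:MaxVectorSize. A hedged sketch of what this could look like in dremio-env (DREMIO_JAVA_SERVER_EXTRA_OPTS is the usual place for extra server JVM options; note that Gandiva generates native code through LLVM, not through the JVM JIT, so whether these flags avoid the SIGILL at all is an assumption to verify on this cluster):

```shell
# dremio-env (sketch): cap the JVM JIT at AVX2 so it emits no AVX-512
# instructions. -XX:UseAVX=2 and -XX:MaxVectorSize=32 are real HotSpot
# x86 flags, but their effect on Gandiva's LLVM-generated code is an
# assumption to test, not a confirmed fix.
DREMIO_JAVA_SERVER_EXTRA_OPTS="-XX:UseAVX=2 -XX:MaxVectorSize=32"
```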
Thanks for the response … looking forward to the results from the team. Dremio currently has great potential for public service at our corporation … this is the only problem we are having.
We have 2 clusters running Dremio, one on Kubernetes and one on VMs; both use KVM virtualization on Intel hardware (Intel® Xeon® Silver 4216 CPU).
I have now re-run the failed Gandiva query with the ndv(cast(“rundate” as date)) as “count_distinct_rundate” part removed, per your instructions, on both Kubernetes and the VMs … the result is still a JVM thread dump.
I am sending you the JVM thread dump and the failed job profile as attachments below.