Hello,
After updating Dremio from 24.3.2 to 25.1.0, we are experiencing crashes (SIGSEGV) when running queries that ran fine before.
We are running Dremio on K8s, installed with the “official” Helm chart.
The Dremio executor crashes with this error:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f1a3b472881, pid=1, tid=268
#
# JRE version: OpenJDK Runtime Environment Temurin-11.0.24+8 (11.0.24+8) (build 11.0.24+8)
# Java VM: OpenJDK 64-Bit Server VM Temurin-11.0.24+8 (11.0.24+8, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C [libjoust.so+0x72881] Joust::HashTable::LTable<16, false>::Find(unsigned short const*, int, int, unsigned char const*, unsigned char const*, int const*, int*, Joust::HashTable::NullMaskType, Joust::HashTable::NullMaskValue)+0x191
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /opt/dremio/core.1)
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid1.log
Logs from the master at the same time:
2024-09-10 14:33:26,294 [FABRIC-rpc-event-queue] INFO c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 191fa749-540c-9a0c-cc51-9991a070c300:0:0 with assigned weight 1 and scheduling weight 1
2024-09-10 14:33:26,298 [e9 - 191fa749-540c-9a0c-cc51-9991a070c300:frag:0:0] INFO c.d.exec.expr.ExpressionSplitter - Named expression: FunctionHolderExpression [args=[ValueVectorReadExpression [fieldId=TypedFieldId [fieldIds=[0], remainder=null]], FunctionHolderExpression [args=[FunctionHolderExpression [args=[FunctionHolderExpression [args=[ValueExpression[quoted_string=.], ValueVectorReadExpression [fieldId=TypedFieldId [fieldIds=[0], remainder=null]]], name=position, returnType=int32, isRandom=false], ValueExpression[int=1]], name=add, returnType=int32, isRandom=false]], name=castBIGINT, returnType=int64, isRandom=false]], name=substring, returnType=varchar, isRandom=false]
2024-09-10 14:33:26,310 [e9 - 191fa749-540c-9a0c-cc51-9991a070c300:frag:0:0] ERROR com.dremio.sabot.driver.SmartOp - StatusRuntimeException: CANCELLED: Server sendMessage() failed with Error
com.dremio.common.exceptions.UserException: StatusRuntimeException: CANCELLED: Server sendMessage() failed with Error
    at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:984)
    at com.dremio.sabot.driver.SmartOp.contextualize(SmartOp.java:203)
    at com.dremio.sabot.driver.SmartOp$SmartProducer.outputData(SmartOp.java:599)
    at com.dremio.sabot.driver.StraightPipe.pump(StraightPipe.java:55)
    at com.dremio.sabot.driver.Pipeline.doPump(Pipeline.java:134)
    at com.dremio.sabot.driver.Pipeline.pumpOnce(Pipeline.java:124)
    at com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run(FragmentExecutor.java:655)
    at com.dremio.sabot.exec.fragment.FragmentExecutor.run(FragmentExecutor.java:560)
    at com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run(FragmentExecutor.java:1234)
    at com.dremio.sabot.task.AsyncTaskWrapper.run(AsyncTaskWrapper.java:130)
    at com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop(SlicingThread.java:279)
    at com.dremio.sabot.task.slicing.SlicingThread.run(SlicingThread.java:186)
Caused by: io.grpc.StatusRuntimeException: CANCELLED: Server sendMessage() failed with Error
    at io.grpc.Status.asRuntimeException(Status.java:533)
    at io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:631)
    at com.dremio.exec.store.ischema.writers.TableWriter.write(TableWriter.java:77)
    at com.dremio.exec.store.ischema.InformationSchemaRecordReader.next(InformationSchemaRecordReader.java:102)
    at com.dremio.sabot.op.scan.ScanOperator.outputData(ScanOperator.java:418)
    at com.dremio.sabot.driver.SmartOp$SmartProducer.outputData(SmartOp.java:595)
    ... 9 common frames omitted
2024-09-10 14:33:36,524 [FABRIC-rpc-event-queue] INFO com.dremio.sabot.exec.MaestroProxy - All queries on executor are active on coordinator. No queries to cancel.
We have now switched back to Dremio 24.3.2 and the queries are working again without any problems.
Is anyone else experiencing these crashes? Any idea how to fix this?
Thank you in advance and best regards,
Nico