I am running Dremio OSS on Kubernetes and accessing it through an Istio gateway.
My client is a .NET console app that runs a simple query. When I request a small number of records (e.g. using LIMIT 100), I get results back. However, once the number goes beyond a few hundred, I get the following exception:
exception: Status(StatusCode="Internal", Detail="Error reading next message. InvalidDataException: Unexpected end of content while reading the message content.", DebugException="System.IO.InvalidDataException: Unexpected end of content while reading the message content.
   at Grpc.Net.Client.StreamExtensions.ReadMessageContentAsync(Stream responseStream, Memory`1 messageData, Int32 length, CancellationToken cancellationToken)
   at Grpc.Net.Client.StreamExtensions.ReadMessageAsync[TResponse](Stream responseStream, GrpcCall call, Func`2 deserializer, String grpcEncoding, Boolean singleMessage, CancellationToken cancellationToken)
   at Grpc.Net.Client.Internal.HttpContentClientStreamReader`2.MoveNextCore(CancellationToken cancellationToken)")
My code essentially calls GetFlightInfo and then DoGet. In some cases, GetFlightInfo itself does not return until the HttpClient times out. In those cases, the Jobs UI shows a “create prepared stmt” job but not the corresponding “execute prepared stmt” job I would expect.
However, if I run the code a few more times, it eventually progresses.
Using the Arrow Flight Java and Python clients, I am able to get all the records when calls succeed. They also get “stuck” sometimes, though less often, and in those cases no calls go through at all (unlike .NET, where GetFlightInfo succeeds but DoGet fails).
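For reference, the same GetFlightInfo/DoGet sequence can be sketched with the pyarrow.flight client, stepping through increasing LIMIT values to find where failures start. This is a minimal, illustrative repro rather than the exact code used above: the endpoint address (Dremio's default Flight port 32010), the credentials, and the table name are hypothetical placeholders, and the per-call timeout is an assumption added so that a “stuck” call raises an error instead of hanging.

```python
"""Repro sketch: run one query through Arrow Flight at increasing LIMITs.

Hypothetical placeholders (adjust for your cluster): LOCATION, TABLE, and
the credentials passed to fetch_row_count in the __main__ block.
"""


def build_query(table: str, limit: int) -> str:
    # Same query shape as the failing .NET call, with a configurable LIMIT.
    return f"SELECT * FROM {table} LIMIT {limit}"


def fetch_row_count(location: str, user: str, password: str, sql: str,
                    timeout_s: float = 30.0) -> int:
    # Imported lazily so build_query works even without pyarrow installed.
    from pyarrow import flight

    client = flight.FlightClient(location)
    # Basic auth returns a (header_name, header_value) bearer-token pair.
    auth_opts = flight.FlightCallOptions(timeout=timeout_s)
    token_pair = client.authenticate_basic_token(user, password, auth_opts)
    # Assumed timeout: a hung call surfaces as an error instead of blocking.
    options = flight.FlightCallOptions(timeout=timeout_s, headers=[token_pair])

    # Step 1: GetFlightInfo plans the query and returns endpoints/tickets.
    descriptor = flight.FlightDescriptor.for_command(sql)
    info = client.get_flight_info(descriptor, options)

    # Step 2: DoGet streams the record batches for each endpoint's ticket.
    rows = 0
    for endpoint in info.endpoints:
        reader = client.do_get(endpoint.ticket, options)
        rows += reader.read_all().num_rows
    return rows


if __name__ == "__main__":
    LOCATION = "grpc+tcp://dremio.example.com:32010"  # hypothetical endpoint
    TABLE = "my_space.my_table"  # hypothetical table
    for limit in (100, 500, 1000, 5000):
        sql = build_query(TABLE, limit)
        try:
            n = fetch_row_count(LOCATION, "user", "password", sql)
            print(f"LIMIT {limit}: {n} rows")
        except Exception as exc:  # report and continue with the next LIMIT
            print(f"LIMIT {limit}: failed with {exc!r}")
```

Running this from the same machine as the .NET app, both directly and through the Istio ingress, would show whether the failure threshold is client-specific or tied to the network path.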
I also ran this through Wireshark and can see the following:
[Expert Info (Error/Malformed): C:\gitlab-builds\builds\MsQ3pox2\1\wireshark\wireshark\epan\dissectors\packet-http2.c:3086: failed assertion “!((pinfo)->fd->visited) && datalen == length”]
I have already tried various combinations on the client and server side:
- .NET 6 and 7
- various versions of Arrow Flight and gRPC
- connecting directly via the external IP vs. going through the Istio ingress
- with and without Istio sidecar injection enabled
- confirming that HTTP/2 is used all the way to the Dremio master pod
- running the .NET app on Windows 11 vs. WSL2 Ubuntu 22.04
Sometimes the Envoy proxy on the Dremio master shows that it received the inbound call but no job gets created; other times the job gets created but the client receives no response and eventually hits its idle timeout.
There are NO issues when I execute the same queries through the Dremio UI. The SQL and Jobs REST APIs also seem like a more reliable bet right now.
These are fairly small queries, with payloads of roughly 2-4 MB.
I am trying to understand what the potential problem(s) might be here and what other areas to look into. Any clues would be much appreciated.