Issue with Connecting to Dremio Software instance via Flight Client

Hi,

I am trying to connect, via the Flight SQL interface, to a Dremio instance deployed as a standalone cluster on a VM in the US, from a client located in the EU.

From time to time, I get the exception below:

java.lang.InterruptedException
	at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:385)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
	at org.apache.arrow.flight.auth2.ClientHandshakeWrapper.doClientHandshake(ClientHandshakeWrapper.java:51)
	at org.apache.arrow.flight.FlightClient.handshake(FlightClient.java:210)
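In the JDK, `CompletableFuture.get` reports `InterruptedException` when the thread waiting on the future is interrupted. This first trace therefore most likely means the caller's thread was interrupted while the handshake was still pending, rather than that Dremio returned an error. The stdlib-only sketch below separates the possible outcomes (completed, timed out, interrupted, failed) when waiting on such a future. `BoundedWait` and `waitOutcome` are hypothetical names for illustration and are not part of Arrow Flight.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: classifies how a bounded wait on a future ended.
final class BoundedWait {
    static String waitOutcome(CompletableFuture<?> future, long timeoutMillis) {
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return "completed";
        } catch (TimeoutException e) {
            // The future did not complete within the deadline.
            return "timed out";
        } catch (InterruptedException e) {
            // The *waiting* thread was interrupted; restore the flag so callers can see it.
            Thread.currentThread().interrupt();
            return "interrupted";
        } catch (ExecutionException e) {
            // The future completed exceptionally; report the underlying cause.
            return "failed: " + e.getCause();
        }
    }

    public static void main(String[] args) {
        System.out.println(waitOutcome(CompletableFuture.completedFuture("ok"), 100));
        System.out.println(waitOutcome(new CompletableFuture<String>(), 100));
    }
}
```

Run as-is, `main` prints `completed` and then, after about 100 ms, `timed out`.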

Sometimes I get an error saying that the connection timed out.

I checked the Jobs page in the Dremio UI and I cannot see any queries that were executed.

I also looked through the Dremio instance logs in the /var/log directory, but found nothing there either.

When I connect to the same US Dremio instance via the JDBC client, it works. Connecting to the Dremio instance deployed in the EU also works fine.

I think there might be an issue with the timeout configuration. Is there any way to find out what is wrong and how to tune it?
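One way to separate network problems from Dremio problems is to time plain TCP connections from the EU client to the US node and compare them with the EU node. Below is a minimal stdlib sketch, assuming the Flight endpoint listens on port 32010 (Dremio's usual default; check `dremio.conf`) and using a placeholder host name. `ConnectProbe` is a hypothetical helper. It only measures the TCP connect, so it says nothing about TLS, authentication, or query execution.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic: times plain TCP connects to a host/port.
final class ConnectProbe {
    // Returns the TCP connect time in milliseconds; throws
    // java.net.SocketTimeoutException if it exceeds connectTimeoutMillis.
    static long connectMillis(String host, int port, int connectTimeoutMillis) throws IOException {
        // Resolve DNS first so the timing covers only the TCP handshake.
        InetSocketAddress address = new InetSocketAddress(host, port);
        try (Socket socket = new Socket()) {
            long start = System.nanoTime();
            socket.connect(address, connectTimeoutMillis);
            return (System.nanoTime() - start) / 1_000_000;
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder defaults (assumptions); pass the real host and Flight port as arguments.
        String host = args.length > 0 ? args[0] : "dremio-us.example.com";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 32010;
        for (int i = 1; i <= 5; i++) {
            System.out.println("connect " + i + ": " + connectMillis(host, port, 10_000) + " ms");
        }
    }
}
```

Usage: `java ConnectProbe <us-host> 32010`, then repeat against the EU host. Consistently high or highly variable connect times to the US node would point at the network path rather than at the Flight configuration.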

Hi pira,
Which versions of the client and Dremio are you using?

There is a support key mentioned here that may help:

Thank you

US Dremio Cluster version: 22.1.1-202208230402290397-a7010f28
EU Dremio Cluster version: 23.0.1-202210141019030815-c1de8bcc
Dremio Flight SQL client version: 8.0.0

After submitting the query from the topic above, the number of timeouts seems smaller than before, but I need to do some more testing.

Thanks.

I did more testing and the timeouts still occur. The above properties did not help.

Hi @pira ,

Could you share what value you have tried setting flight.client.readiness.timeout.millis to?

The default is 5000 and the max is 900000 (15 minutes).
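Because the option is expressed in milliseconds, a dropped or extra zero changes the timeout by a factor of ten: 15 minutes is 15 × 60 × 1000 = 900000 ms, whereas 90000 ms is only 1.5 minutes. A quick check with `java.util.concurrent.TimeUnit` (the class name `ReadinessTimeout` is just for illustration):

```java
import java.util.concurrent.TimeUnit;

// Illustration only: unit checks for flight.client.readiness.timeout.millis values.
final class ReadinessTimeout {
    static final long DEFAULT_MILLIS = 5_000; // default quoted in this thread

    static long minutesToMillis(long minutes) {
        return TimeUnit.MINUTES.toMillis(minutes);
    }

    static double millisToMinutes(long millis) {
        return millis / 60_000.0;
    }

    public static void main(String[] args) {
        System.out.println(minutesToMillis(15));                              // 900000
        System.out.println(millisToMinutes(90_000));                          // 1.5
        System.out.println(TimeUnit.MILLISECONDS.toSeconds(DEFAULT_MILLIS));  // 5
    }
}
```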

Hi @cindy.la

I’ll try setting it to the max value, then.

Hi @cindy.la,
unfortunately, setting that option to the max value did not help.

Hi @pira ,

Is it timing out at the max value of 15 minutes after you set the support key?
Do you often see the timeout after running a long query (possibly one that runs longer than 15 minutes)?

Cindy

Hi,

No, it doesn’t time out at the 15-minute mark after I set that support key.
The timeouts occur more often when I run a query that needs to return a lot of data, for example a SELECT statement.
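One possible reason large results fail more often across regions: a flow-controlled stream (a TCP connection, and each HTTP/2 stream that gRPC runs on top of it) can deliver at most one window of data per round trip, so throughput is bounded by window ÷ RTT regardless of link speed, and a slower transfer leaves more time for a stall to surface. The sketch below is a back-of-the-envelope estimate only. The window size, RTTs and link speed are illustrative assumptions, not measurements from this setup, and `TransferEstimate` is a hypothetical name.

```java
// Back-of-the-envelope estimate; all inputs are illustrative assumptions.
final class TransferEstimate {
    // A flow-controlled stream can move at most one window per round trip.
    static double windowLimitedBytesPerSec(long windowBytes, double rttSeconds) {
        return windowBytes / rttSeconds;
    }

    // Rough transfer time: the slower of the link and the window limit wins.
    // Ignores connection setup, TLS, and time spent executing the query.
    static double transferSeconds(long resultBytes, long windowBytes,
                                  double rttSeconds, double linkBytesPerSec) {
        double effective = Math.min(linkBytesPerSec,
                                    windowLimitedBytesPerSec(windowBytes, rttSeconds));
        return resultBytes / effective;
    }

    public static void main(String[] args) {
        long result = 1L << 30; // assumed 1 GiB result set
        long window = 1L << 20; // assumed 1 MiB flow-control window
        double link = 100e6;    // assumed 100 MB/s link
        System.out.printf("10 ms RTT:  %.1f s%n", transferSeconds(result, window, 0.010, link));
        System.out.printf("100 ms RTT: %.1f s%n", transferSeconds(result, window, 0.100, link));
    }
}
```

With these assumed numbers the 10 ms case is link-bound (about 10.7 s) while the 100 ms case is window-bound (about 102 s).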

@pira

Are you able to provide the Dremio job profile?

@balaji.ramaswamy
I’m attaching the job profile and pasting the stack trace from my Flight SQL client:

SEVERE: Failed on completing future
org.apache.arrow.flight.FlightRuntimeException: UNAVAILABLE: io exception
	at org.apache.arrow.flight.CallStatus.toRuntimeException(CallStatus.java:131)
	at org.apache.arrow.flight.grpc.StatusUtils.fromGrpcRuntimeException(StatusUtils.java:164)
	at org.apache.arrow.flight.grpc.StatusUtils.fromThrowable(StatusUtils.java:185)
	at org.apache.arrow.flight.auth2.ClientHandshakeWrapper.doClientHandshake(ClientHandshakeWrapper.java:59)
	at org.apache.arrow.flight.FlightClient.handshake(FlightClient.java:210)
	
	...
	Caused by: java.io.IOException: Operation timed out
	at java.base/sun.nio.ch.FileDispatcherImpl.read0(Native Method)
	at java.base/sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
	at java.base/sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:276)
	at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:233)
	at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:223)
	at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:356)
	at io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:258)
	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1132)
	at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:357)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:151)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	... 1 more
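The outer `FlightRuntimeException: UNAVAILABLE: io exception` is only a wrapper. The innermost cause is a `java.io.IOException: Operation timed out` raised by the socket read inside Netty, and that message comes from the operating system. The failure also happens inside `FlightClient.handshake`, before any query is submitted, which is consistent with no job appearing in the Dremio UI. When catching such exceptions in client code it helps to log the root cause explicitly. Below is a minimal stdlib sketch; `RootCause` is a hypothetical helper.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical helper: finds the innermost cause of a wrapped exception.
final class RootCause {
    static Throwable rootCause(Throwable t) {
        // Track visited throwables by identity so a cyclic cause chain cannot loop forever.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Throwable current = t;
        while (current.getCause() != null && seen.add(current)) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        Exception io = new java.io.IOException("Operation timed out");
        Exception wrapped = new RuntimeException("UNAVAILABLE: io exception", io);
        System.out.println(rootCause(wrapped)); // java.io.IOException: Operation timed out
    }
}
```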

66eb3fc3-ffa1-41f8-9f47-974116832177 2.zip (11.0 KB)

The profile you sent is for a prepared statement, and its status is COMPLETED. Do you see any other failed jobs?

I tried running the query again from my Arrow Flight SQL client and it failed with the same error as above; however, I can’t see any logged jobs in the Dremio UI now.