Arrow Flight Sql - how to get list of columns for specific table

Hi,

I’ve found following specification for Apache Arrow Flight SQL recently made available in Dremio:
https://docs.dremio.com/software/developing-client-apps/arrow-flight-sql/

There are commands which allows to query different types of metadata but It is not entirely obvious for me how to list all columns for a specific table.
There are commands to get tables, schemas, catalogs, primary and foreign keys but it is not straight forward how to get columns unless I missed something ?
JDBC specification includes corresponding getColumns method.
Is there any way to get all columns for a table using Flight Sql ?

thanks for any tip

I’ve found getting columns missing also in provided instruction on Flight SQL Client development:
https://docs.dremio.com/software/developing-client-apps/arrow-flight-sql/
Basically I need following:

select * from INFORMATION_SCHEMA.COLUMNS

exposed via Flight SQL Dremio’s service with filtering capability by table, schema, column name and so on (e.g. WHERE COLUMN_NAME=‘column’

I’ve discovered myself that it might be achieved by sending CommandGetTables command with includeSchema parameter set to true, unfortunately such request ends with an error, so I cannot check if it really works.

In the job profile I see the query recognized as
[Get Columns] Catalog Filter: , Schema Filter: , Table Filter , Column Filter .
so it proves I am going right way but still as mentioned the request cannot be completed.

I found in the server.log file something which might help in finding where is the problem:

2022-06-21 14:39:10,820 [1d4e27f1-1618-8a1f-0788-004044c1e200/0:foreman-planning] WARN  com.dremio.exec.store.ischema.Column - Type not handled explicitly (code needs review): ANY
2022-06-21 14:39:10,832 [out-of-band-observer] INFO  query.logger - Query: 1d4e27f1-1618-8a1f-0788-004044c1e200; outcome: COMPLETED

Shall I open the ticket for this ? if so please direct me to the repository with the link. Thank you

Yes, the GetTables RPC with includeSchema is how you’d get column metadata similar to the COLUMNS view in INFORMATION_SCHEMA.

It seems likely that there’s a problem with the particular table you are testing. It’d be good to log a ticket with details about that tables (in particular its name and data types) and indicate if the problem occurs with other tables with simpler data types.

Thank you @jduong for this confirmation.
Regarding the table I am using for testing it seems irrelevant. I’ve tried almost every possible combination, With table, with no table (passing null -to collect all), with schema, with no schema (null), with catalog, and with no catalog (null) and always getting the exception below.

I’ve used Derby Flight SQL Service included in Apache Arrow Flight SQL source code and it works fine.

Exception in thread "main" org.apache.arrow.flight.FlightRuntimeException: INTERNAL: There was an error servicing your request.
	at org.apache.arrow.flight.CallStatus.toRuntimeException(CallStatus.java:131)
	at org.apache.arrow.flight.grpc.StatusUtils.fromGrpcRuntimeException(StatusUtils.java:164)
	at org.apache.arrow.flight.grpc.StatusUtils.fromThrowable(StatusUtils.java:185)
	at org.apache.arrow.flight.FlightStream$Observer.onError(FlightStream.java:454)
	at org.apache.arrow.flight.FlightClient$1.onError(FlightClient.java:328)
	at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:479)
	at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
	at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
	at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
	at org.apache.arrow.flight.grpc.ClientInterceptorAdapter$FlightClientCallListener.onClose(ClientInterceptorAdapter.java:117)
	at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:562)
	at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:70)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:743)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:722)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Additionally, I’ve found out that Dremio doesn’t seem to implement/support getXdbcTypeInfo FlightSQL request at all.

@jduong can you confirm it ? Thank you

Hi @msmetek ,

getXdbcTypeInfo is supported in Dremio v22 and higher.

1 Like

Thank you @jduong everything works including getting columns metadata in the version 22.0.0.

Thank you very much

Hi,
thank you for answering all the previous questions.

I am looking for one yet more answer. I would like to test one scenario where
I am getting data from various different endpoints using Flight SQL.

Here is example how to get data using FlightSql

FlightInfo flightInfo; // Use a FlightSqlClient method to get a FlightInfo

// 1. Fetch each partition sequentially (though this can be done in parallel)
for (FlightEndpoint endpoint : flightInfo.getEndpoints()) {

  // 2. Get a stream of results as Arrow vectors
  try (FlightStream stream = flightSqlClient.getStream(endpoint.getTicket())) {

    // 3. Iterate through the stream until the end
    while (stream.next()) {

      // 4. Get a chunk of results (VectorSchemaRoot) and print it to the console
      VectorSchemaRoot vectorSchemaRoot = stream.getRoot();
      System.out.println(vectorSchemaRoot.contentToTSVString());
    }
  }
}

and I am talking specifically about this line:

// 1. Fetch each partition sequentially (though this can be done in parallel)
for (FlightEndpoint endpoint : flightInfo.getEndpoints()) {

I always get the same single endpoint and I would like to employ several partitions instead.
Could you please guide me what I should do in regards to cluster configuration and data source
to achieve getting many endpoints from flight info.

Thank you

Hi,
could somebody pls share any tips, suggestion, any pointers to the documentation on this ?

Thank you

Hi @msmetek ,

As part of the GetTables request you can specify the schema and table to indicate the exact table you want to get metadata for. You can set the include_schema flag to get column metadata.

That’s not what I was asking for :slight_smile:

Sorry, I was rather referring to my last question I posted here I which is I admit out of this topic and is asking about something different.

Maybe to avoid further misunderstanding I will close this one and post my new question in the new topic.