Connection between Dremio and Spark with Arrow Flight

Hi All,

I’ve been thinking about how the three components above fit together. Is it still necessary to communicate with Dremio from Spark using Flight?

I see the project here from Ryan, but it has not been updated. It’s a PoC, and I need some clarification.

So I’m curious whether the idea is still beneficial, given that Dremio now provides a Flight endpoint itself.

Basically, Spark can access the Parquet and Iceberg data directly and manipulate it there. But that only works for physical data sources. For virtual datasets we can’t do that.
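To make the direct path concrete, here is a minimal PySpark sketch. It assumes `spark` is an existing SparkSession with the Iceberg runtime and a catalog already configured; the function name and its arguments are hypothetical and only for illustration.

```python
def read_physical_sources(spark, parquet_path, iceberg_table):
    """Read physical data directly with Spark, bypassing Dremio.

    Assumptions (not from this thread): `spark` is a SparkSession that has
    the Iceberg runtime and catalog configured, `parquet_path` points at a
    Parquet directory, and `iceberg_table` is a table identifier or path
    that the configured catalog can resolve.
    """
    # Plain Parquet files: Spark's built-in reader.
    parquet_df = spark.read.parquet(parquet_path)
    # Iceberg table: read through the Iceberg data source.
    iceberg_df = spark.read.format("iceberg").load(iceberg_table)
    return parquet_df, iceberg_df
```

This only covers data that physically exists as files or tables. A virtual dataset is a definition inside Dremio, so there is nothing for Spark to read this way.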

We can use Flight to work with the virtual datasets. But do we need Spark to interact with Dremio in any other scenario? Can anyone think of a good use case for that?
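For the Flight path, a rough sketch with `pyarrow.flight` would look something like this. The helper names are hypothetical, and I’m assuming basic username/password authentication and port 32010 for the Flight endpoint; adjust both for your own deployment.

```python
def flight_location(host, port=32010, tls=False):
    """Build a Flight URI; the default port here is an assumption."""
    scheme = "grpc+tls" if tls else "grpc+tcp"
    return f"{scheme}://{host}:{port}"


def query_virtual_dataset(host, username, password, sql, port=32010, tls=False):
    """Run a SQL query over Flight and return the result as a pyarrow.Table.

    The SQL can reference a virtual dataset by name, since the query is
    planned and executed on the server side.
    """
    # Imported lazily so the URI helper above works without pyarrow installed.
    from pyarrow import flight

    client = flight.FlightClient(flight_location(host, port, tls))
    # Exchange username/password for a bearer-token header pair.
    token_pair = client.authenticate_basic_token(username, password)
    options = flight.FlightCallOptions(headers=[token_pair])

    # Ask the server to plan the query, then fetch the first endpoint's stream.
    descriptor = flight.FlightDescriptor.for_command(sql)
    info = client.get_flight_info(descriptor, options)
    reader = client.do_get(info.endpoints[0].ticket, options)
    return reader.read_all()
```

With that, something like `query_virtual_dataset("localhost", user, password, "SELECT * FROM my_space.my_view")` (a made-up dataset name) would return an Arrow table without Spark being involved at all, which is why I’m asking where Spark still adds value.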

@balaji.ramaswamy @Viktor

Any thoughts on this?

Cheers

@weltam This work is not complete and is not currently prioritized.

There is one for your reference: GitHub - qwshen/spark-flight-connector: A Spark Connector that reads data from / writes data to Flight end-points with Arrow-Flight and Flight-SQL. Thanks