I’ve been reading through your Dremio material and wondering whether we can leverage Dremio to improve the performance of our SQL-on-Spark workloads. I have three questions:
- How does Dremio achieve high performance, and even zero-copy, when moving data over the network?
- If we have Parquet files on HDFS that are generated by Spark and queried via Spark SQL, how can we migrate to Dremio? What does the data flow look like when querying through Dremio: HDFS -> Arrow -> Dremio engine?
- Arrow supports moving data from Spark to other systems; how is that implemented? Do we need to inject code into Spark?