I set a low-cardinality column as a partition column in a reflection and then queried that dataset using Spark SQL. Normally I would expect Spark's Catalyst optimizer to discover those partitions and partition the data accordingly (https://spark.apache.org/docs/2.2.2/sql-programming-guide.html#partition-discovery), but this did not happen: I got a single partition for the whole dataset. Is this expected?
Are you using the JDBC driver to talk to Dremio from Spark? Spark's partition discovery doesn't work for JDBC connections; it only applies to file-based sources like Parquet. You can use the option from Section 5 of this article instead: https://email@example.com/tips-for-using-jdbc-in-apache-spark-sql-396ea7b2e3d3
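For context, the relevant mechanism is Spark's JDBC partitioning options (`partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`): Spark splits the bound range into equal strides and issues one query per partition. A simplified sketch of how those per-partition `WHERE` clauses are derived (the real Spark implementation also clamps strides and handles edge cases more carefully; the column name and bounds here are made up for illustration):

```python
def jdbc_partition_predicates(column, lower, upper, num_partitions):
    """Simplified sketch of how Spark's JDBC source turns
    lowerBound/upperBound/numPartitions into one WHERE clause
    per partition (one JDBC query runs per clause)."""
    stride = (upper - lower) // num_partitions
    predicates = []
    bound = lower
    for i in range(num_partitions):
        lo = bound
        bound += stride
        if i == 0:
            # First partition also picks up NULLs.
            predicates.append(f"{column} < {bound} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # Last partition is unbounded above.
            predicates.append(f"{column} >= {lo}")
        else:
            predicates.append(f"{column} >= {lo} AND {column} < {bound}")
    return predicates

print(jdbc_partition_predicates("region_id", 0, 100, 4))
```

In practice you would pass the real options to the reader, e.g. `spark.read.jdbc(url, table, column="region_id", lowerBound=0, upperBound=100, numPartitions=4, properties=props)`, which makes Spark read the table with four parallel JDBC queries instead of one.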
There is a prototype custom connector that lets Spark talk to Dremio natively, which will be drastically faster than JDBC and will respect partitions. I'll post an update when it's released.