So I am using the Snowflake/Dremio plugin and the latest Snowflake JDBC drivers. I have several dimension tables from Snowflake configured as reflections (on S3), and performance is very good (slightly faster than Snowflake).
My issue relates to a test join between the reflection dimension table and the Snowflake fact table. The fact table has around 8 billion rows, and I issue a left join with a limit of 1. Snowflake does this in just under 4 seconds.
Dremio is still going after 31 minutes.
I can confirm the dimension uses the reflection. When I look in Snowflake’s history, though, I can see Dremio blindly issued a “select * from fact-table” and didn’t push down the limit or any details of the join field and value.
Obviously this is horrid and unusable.
Am I missing something? I tried to set the fact table as a reflection, but after 24 hours it was still going, and with no indication of progress I gave up and cancelled it.
I am running a 3-node cluster (1 master, 2 executors) on m5.xlarge EC2 instances and have set memory to 15G.
I did another test: rather than using limit 1, I chose a specific and unique id from the fact table. In this case, Dremio correctly pushed down the “where” clause.
Snowflake took 8 minutes and Dremio took 6 minutes.
So perhaps the initial query was a bit of an edge case? I am still interested in whether it truly is an edge case, though.
Thanks for the reply. I believe that server was waiting on Snowflake to respond; my issue relates to what Dremio decided to send down to Snowflake. I was expecting a more efficient query to be sent down, given the limit of 1.
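To make that expectation concrete, here is a minimal sketch of the query shape. It runs against Python's built-in sqlite3, not Dremio or Snowflake, so it says nothing about what either planner actually does. The table and column names (`fact`, `dim`, `fact_id`, `dim_id`, `label`) are placeholders, and it assumes the fact table is on the left side of the join and that `dim_id` is unique in the dimension. Under those assumptions, every fact row produces exactly one output row, so taking the limit on the fact side first still satisfies the original query. That is roughly what I hoped would be sent to the source instead of a full scan.

```python
import sqlite3

def build_demo_db():
    """Create a tiny in-memory stand-in for the fact and dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim  (dim_id INTEGER PRIMARY KEY, label TEXT);
        CREATE TABLE fact (fact_id INTEGER PRIMARY KEY, dim_id INTEGER, amount REAL);
    """)
    conn.executemany("INSERT INTO dim VALUES (?, ?)", [(1, "a"), (2, "b")])
    # dim_id 3 has no match in dim, so some fact rows join to NULL.
    conn.executemany(
        "INSERT INTO fact VALUES (?, ?, ?)",
        [(i, 1 + i % 3, float(i)) for i in range(1, 1001)],
    )
    return conn

# Shape of the test query as I issued it (placeholder names).
ORIGINAL = """
SELECT f.fact_id, f.dim_id, d.label
FROM fact f
LEFT JOIN dim d ON d.dim_id = f.dim_id
LIMIT 1
"""

# Hand-written variant with the limit applied to the fact side first.
# Only valid because the fact table is the left input of the LEFT JOIN.
REWRITTEN = """
SELECT f.fact_id, f.dim_id, d.label
FROM (SELECT * FROM fact LIMIT 1) f
LEFT JOIN dim d ON d.dim_id = f.dim_id
LIMIT 1
"""

conn = build_demo_db()
original_rows = conn.execute(ORIGINAL).fetchall()
rewritten_rows = conn.execute(REWRITTEN).fetchall()
print(original_rows, rewritten_rows)
```

Both queries return a single row. Without an ORDER BY, any one row is an acceptable answer, so the two need not return the same row. Whether Dremio would push the inner limit down to Snowflake when the query is written this way is something I have not verified.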