Poor response when joining from reflection to snowflake

So I am using the Snowflake/Dremio plugin and the latest Snowflake JDBC drivers. I have several dimension tables from Snowflake configured as reflections (on S3), and performance is very good (slightly faster than Snowflake).

My issue relates to a test join between the reflection-backed dimension table and the Snowflake fact table. The fact table has around 8 billion rows; I issue a left join with LIMIT 1. Snowflake does this in just under 4 seconds.

Dremio is still going after 31 mins.

I can confirm the dimension table uses the reflection. When I look in Snowflake's query history, though, I can see Dremio blindly issued a `select * from fact-table` and didn't push down the limit or any details of the join field and value.
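For illustration, this is roughly the difference I saw (the table name is taken from the query later in this thread; the expected rewrite is my assumption of what the planner could emit, not what Dremio actually does):

```sql
-- What appeared in Snowflake's query history: a full, unfiltered scan.
SELECT * FROM "KLDW"."PUBLIC"."F_STAY";

-- What I expected, given the outer LIMIT 1. Pushing the limit to the
-- preserved (left) side of a left join is sound here, since every
-- fact row produces at least one output row:
SELECT * FROM "KLDW"."PUBLIC"."F_STAY" LIMIT 1;
```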

Obviously this is horrid and unusable.

Am I missing something? I tried to set the fact table up as a reflection, but after 24 hours it was still going, and with no indication of progress I gave up and cancelled it.

I am running a 3-node cluster (1 master, 2 executors) on m5.xlarge EC2 instances and have set memory to 15 GB.

Can you share the profile for this query?

247e41c5-16d4-4a4f-8d7d-9e22dfa5bad9.zip (21.5 KB)

I did another test: rather than using LIMIT 1, I chose a specific and unique id from the fact table. In this case, Dremio correctly pushed down the WHERE clause.
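The second test looked something like this (the filter column and value are hypothetical placeholders; the join itself matches the query shared later in this thread):

```sql
-- Variant where Dremio did push the predicate down to Snowflake:
-- filtering on a specific, unique id instead of using LIMIT 1.
SELECT *
FROM "snowflake".kldw.public.f_stay AS fs
LEFT JOIN amenity_expense AS ae
  ON fs.kl_co_code = ae.kl_co_code
WHERE fs.stay_id = 12345678;  -- hypothetical unique id
```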

Snowflake took 8 mins and dremio took 6 mins.

So perhaps the initial query was a bit of an edge case? I am still interested in whether it truly is an edge case, though.


I see the first one that did not respond was waiting on "etl-cluster-2.dev.kl". Do you see the same behavior every time you run the query below?

SELECT * FROM "snowflake".kldw.public.f_stay as fs left join amenity_expense as ae on fs.kl_co_code = ae.kl_co_code limit 1

If yes, can we check GC logs during this time?

Thanks for the reply. I believe that server was waiting on Snowflake to respond; my issue relates to what Dremio decided to send down to Snowflake. I was expecting a more efficient query to be sent down, given the LIMIT 1.

I will try again and see if it’s repeatable.


Is "KLDW"."PUBLIC"."F_STAY" a very big table in terms of size/rows?

Yes - we have over 8 billion rows in there.


Thanks for the update. We will look into how to optimize the LIMIT query pushdown.

Any update on the LIMIT query pushdown optimization?