Poor response when joining from reflection to snowflake

So I am using the Snowflake/Dremio plugin and the latest Snowflake JDBC drivers. I have several dimension tables from Snowflake configured as reflections (on S3), and performance is very good (slightly faster than Snowflake).

My issue relates to a test join between the reflection-backed dimension table and the Snowflake fact table. The fact table has around 8 billion rows; I issue a left join with LIMIT 1. Snowflake does this in just under 4 seconds.

Dremio is still going after 31 mins.

I can confirm the dimension table uses the reflection. When I look in Snowflake's query history, though, I can see Dremio blindly issued a `select * from fact-table` and didn't push down the limit or any details of the join field and value.
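For illustration, this is roughly the difference I saw (the table name is taken from the query later in this thread; the expected rewrite is my assumption of what the planner could emit, not what Dremio actually does):

```sql
-- What appeared in Snowflake's query history: a full, unfiltered scan.
SELECT * FROM "KLDW"."PUBLIC"."F_STAY";

-- What I expected, given the outer LIMIT 1. Pushing the limit to the
-- preserved (left) side of a left join is sound here, since every
-- fact row produces at least one output row:
SELECT * FROM "KLDW"."PUBLIC"."F_STAY" LIMIT 1;
```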

Obviously this is horrid and unusable.

Am I missing something? I tried to set the fact table up as a reflection, but after 24 hours it was still going, and with no indication of progress I gave up and cancelled it.

I am running a 3-node cluster (1 master, 2 executors) on m5.xlarge EC2 instances and have set memory to 15 GB.

Can you share the profile for this query?

247e41c5-16d4-4a4f-8d7d-9e22dfa5bad9.zip (21.5 KB)

I did another test: rather than using LIMIT 1, I chose a specific and unique id from the fact table. In this case, Dremio correctly pushed down the WHERE clause.
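The second test looked something like this (the filter column and value are hypothetical placeholders; the join itself matches the query shared later in this thread):

```sql
-- Variant where Dremio did push the predicate down to Snowflake:
-- filtering on a specific, unique id instead of using LIMIT 1.
SELECT *
FROM "snowflake".kldw.public.f_stay AS fs
LEFT JOIN amenity_expense AS ae
  ON fs.kl_co_code = ae.kl_co_code
WHERE fs.stay_id = 12345678;  -- hypothetical unique id
```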

Snowflake took 8 mins and dremio took 6 mins.

So perhaps the initial query was a bit of an edge case? I am still interested in whether it truly is an edge case, though.


I see the first one that did not respond was waiting on "etl-cluster-2.dev.kl". Do you see the same behavior every time you run the query below?

SELECT * FROM "snowflake".kldw.public.f_stay as fs left join amenity_expense as ae on fs.kl_co_code = ae.kl_co_code limit 1

If yes, can we check GC logs during this time?

Thanks for the reply. I believe that server was waiting on Snowflake to respond; my issue relates to what Dremio decided to send down to Snowflake. I was expecting a more efficient query to be sent down, given the LIMIT 1.

I will try again and see if it’s repeatable.


Is "KLDW"."PUBLIC"."F_STAY" a very big table in terms of size/rows?

Yes - we have over 8 billion rows in there.


Thanks for the update. We will look into how to optimize the LIMIT query pushdown.

Any update on the LIMIT query pushdown optimization?