Query execution time vs accelerations

Hello

I’m trying to use accelerations to speed up query execution, but I’m having a hard time understanding what is going on. Here is my set-up:

Dremio runs on an industrial PC (Windows 10) with 256 GB RAM + a 500 GB drive.
I have one NAS source which contains a collection of Parquet files:
My fact table is 1 folder with 200 partitioned files (450 million records), whose columns are essentially a bunch of IDs and a few measures.
Then I have 5 to 6 “decoding” tables that are generally fairly wide (up to 50 columns) and occasionally pretty long as well (the smallest has 200 records, the largest 30 million).

I have created a logical view MY_VIEW with all the joins whose definition is something like:
SELECT facts.*, decod1.*, decod2.*, decod3.*, decod4.*
FROM facts
LEFT JOIN decod1 ON facts.DECOD1_ID = decod1.ID
LEFT JOIN decod2 ON facts.DECOD2_ID = decod2.ID
LEFT JOIN decod3 ON facts.DECOD3_ID = decod3.ID
LEFT JOIN decod4 ON facts.DECOD4_ID = decod4.ID

I have tried to accelerate this view but I ran out of memory (I know I’ll have to go for aggregate reflections, but I’m not sure yet which fields I should put in)

Then I tried to execute the following query:
SELECT dim1, sum(measure1) FROM MY_VIEW GROUP BY dim1
which runs in about 2 minutes. I was somehow hoping that, since I have left joins everywhere, the query planner would do whatever is necessary to keep IO as low as possible.

Now when I run the query without using the view:
SELECT decod1.dim1, sum(facts.measure1)
FROM facts LEFT JOIN decod1 ON facts.DECOD1_ID = decod1.ID
GROUP BY dim1
it runs in 6 seconds.

Then I thought it may be that the query planner can’t properly read my files’ metadata (because the Parquet files were generated using pyarrow), so I tried to accelerate each table separately (defined as a view, e.g. SELECT * FROM facts). It didn’t help improve the query time at all.

Hence my questions:

  • I’ve noticed that the slow version of the query had many accelerations to choose from, but ultimately chose just one (qualifying the others as too expensive). However I would have guessed that using these accelerations, while expensive, would generally cost less than reading the source files. Does Dremio really have the ability to use several accelerations to serve a query (so far I’ve never seen that happen), as stated in your doc, or is it something on your roadmap?
  • Is it at all useful to create one-for-one accelerations on my source data (in the end, if I understand correctly, Dremio accelerations create under the hood the same kind of Parquet files, except that you may have forked early on from Arrow to match Dremio-specific requirements)?
  • A side question, by the way: do you use any compression codec for the accelerations? Is it configurable?
  • And finally, do you understand why my initial query is that slow compared to the other one (as they are functionally equivalent and yield the same result)?
    I enclose the query profile, which I assume should be helpful if you have time to investigate.

Best regards,
Romain

7c01fddf-984e-4837-8eec-110def8ae800.zip (86.4 KB)

Hey @romain, thanks for the details and the explanations.

With regards to memory — in the Windows and Mac versions the memory usage is hard-coded, as these are meant for basic evaluations on smaller datasets. Most likely, your Dremio instance is using 2 GB for heap and 4 GB for direct memory, despite the available 256 GB. Have you done any configuration to work around this? We recommend working with any dataset of non-trivial size in a multi-node Dremio cluster on Linux.
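For reference, on a standalone Linux install these limits are normally raised in conf/dremio-env; the two variables below are the usual knobs (values in MB). Whether the Windows service picks up equivalent settings is version-dependent, so treat this purely as a sketch:

```shell
# Sketch: conf/dremio-env on a Linux install (values in MB).
# Sizes below are illustrative; tune them to your host.
DREMIO_MAX_HEAP_MEMORY_SIZE_MB=8192       # JVM heap
DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=131072   # off-heap (direct) memory, where most query work happens
```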

With regards to reflection strategy – As you mentioned, setting up aggregation reflections on top of your virtual dataset seems like a good next step to get you the most performance boost. A raw reflection on the joined virtual dataset could also give you a nice boost (but I think this is the query that’s running out of memory?).
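As a sketch only: recent Dremio releases let you declare an aggregation reflection in SQL (in older versions the same thing is configured through the UI). Using the names from this thread, something along these lines could cover the example query:

```sql
-- Hypothetical aggregation reflection covering
--   SELECT dim1, SUM(measure1) FROM MY_VIEW GROUP BY dim1
ALTER DATASET MY_VIEW
CREATE AGGREGATE REFLECTION agg_by_dim1
USING DIMENSIONS (dim1)
MEASURES (measure1);
```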

Responses to your other questions inline:

I’ve noticed that the slow version of the query had many accelerations to choose from, but ultimately chose just one (qualifying the others as too expensive). However I would have guessed that using these accelerations, while expensive, would generally cost less than reading the source files. Does Dremio really have the ability to use several accelerations to serve a query (so far I’ve never seen that happen), as stated in your doc, or is it something on your roadmap?
Yes, Dremio can use multiple reflections to satisfy a query – this is available in the product today. When the source data is in Parquet, Dremio’s costing takes into account 1) whether the raw reflections have sorting or partitioning, and 2) whether the user query has a filter on the column that the reflection was sorted or partitioned on. If these conditions are not met, Dremio may choose to go against the raw Parquet files instead. This is probably what’s happening in your scenario.
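To illustrate the costing point with hypothetical names: a raw reflection on facts sorted or partitioned by some column only looks cheap to the planner when the query actually filters on that column.

```sql
-- Suppose the raw reflection on facts is partitioned by EVENT_DATE (hypothetical).
-- A filter on the partition key lets Dremio prune files, so the reflection
-- is likely to be costed cheaper than the source scan:
SELECT DECOD1_ID, SUM(measure1)
FROM facts
WHERE EVENT_DATE >= '2018-01-01'
GROUP BY DECOD1_ID;

-- Without such a filter, reading the reflection is just another full Parquet
-- scan, and the planner may prefer the original source files.
```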

Is it at all useful to create one-for-one accelerations on my source data (in the end, if I understand correctly, Dremio accelerations create under the hood the same kind of Parquet files, except that you may have forked early on from Arrow to match Dremio-specific requirements)?
When working with Parquet files as source, unless you are sorting or partitioning the raw reflections on a column that your queries will filter on, creating vanilla raw reflections is not needed.

A side question, by the way: do you use any compression codec for the accelerations? Is it configurable?
Data Reflections use Snappy compression.

And finally, do you understand why my initial query is that slow compared to the other one (as they are functionally equivalent and yield the same result)?
Looks like one is a 4-way join and the other a 2-way join? Could you share the query profile for the fast one as well, to help clarify things?

Hope these help address your questions.

Hi @can
Thanks for your answers, that’s definitely very helpful.

With respect to memory, I ran into these limitations at the very beginning of my tests. Now I have Dremio running as a service, with access to 200 GB RAM (I’ve set the same large value for heap and direct memory). I do understand that Dremio is meant to run on proper hardware, but unfortunately at my workplace, getting access to such clusters is neither easy, cheap, nor flexible, so I’d rather do as much prep work as possible on the hardware I have before requesting it.

I’ve tried again to materialize the joined virtual dataset, but again with no success (I interrupted the query after more than 5 hours of work, when both CPU and RAM usage were low but the disk write speed suddenly fell close to zero as the disk was getting close to full).

Your answer on Dremio’s reflection usage strategy is very enlightening. One question though: does sorting have an impact on query execution time (CPU-wise), or is it just an enabler for better compression of materializations, hence reduced IO (as a side note, I have already carefully sorted my input Parquet files)?

I don’t want to abuse your time, but I have another question: I have noticed an option on reflections to either minimize refresh time or the number of files produced. Would you mind elaborating a bit on the relative merits and drawbacks of these options?

Finally, enclosed are the two query profiles (+ the definition of the denormalized query). It’s indeed 1 join against 8, but I was hoping that the query planner would skip entirely the useless (LEFT) joins, since they’re not involved in the result I’m asking for.

Thanks,
Romain

attachments.zip (71.9 KB)

Hi @romain,

Thanks for uploading the query profiles and extra details provided here.

It’s indeed 1 join against 8, but I was hoping that the query planner would skip entirely the useless (LEFT) joins since they’re not involved in the result I’m asking for.

Answer
I believe our VDS is similar to a traditional database view, in that its SQL logic is always applied. In your specific case that 8-way join would be executed, regardless of which columns/dimensions appear in the SELECT clause.

It’s worth mentioning that a VDS is also used in the field to apply certain security policies, i.e. row-level security or column-level permissions. To further illustrate the latter, here is an example:
Say we have a fact table holding bank transaction records and two dimensions: branch_location and customers. We’d like to restrict access for a specific group of analysts to just North America region and VIP-customer-related transactions. In that case we’ll create a VDS that joins these tables, applies the relevant filters accordingly (Branch.region=NA, Customers.cust_type=VIP), and share it with that group of analysts (Enterprise version only; the community edition has just a single role, admin).
Now every query executed by those analysts against that VDS must always apply these joins and filters, regardless of the columns in the SELECT clause; otherwise we’d break the security policies and grant access to data they aren’t supposed to see.
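A sketch of such a security VDS, using the hypothetical table and column names from the example above (the exact DDL varies by version; a VDS can also be saved from the UI):

```sql
-- Hypothetical VDS: analysts in this group only ever see
-- NA-branch, VIP-customer transactions.
CREATE VDS secured.na_vip_transactions AS
SELECT t.*
FROM transactions t
JOIN branch_location b ON t.branch_id = b.id
JOIN customers c ON t.customer_id = c.id
WHERE b.region = 'NA'
  AND c.cust_type = 'VIP';
```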

In your case a raw reflection would definitely help with respect to performance but that leads us to the next item…

I’ve tried again to materialize the joined virtual dataset, but again with no success (I interrupted the query after >5 hours work when both CPU and RAM were low but the disk write speed suddently flet close to zero as it was getting closer to full)

Answer
Given that you’re limited on resources and not running on a decent multi-node cluster, it might make sense to sample down the source data used here and build that virtual dataset + raw reflection at a smaller scale. In addition, you can also leverage aggregation reflections as suggested before.

does sorting have an impact on query execution time (CPU wise) or is it just an enabler for better compression of materializations hence reduced IO (as a side note I have already carefully sorted my input parquet files)?

Answer
Yes, if the reflection data is sorted and leveraged by the query acceleration (i.e. the sorted columns are used in your WHERE predicates), that would certainly help reduce IO and CPU (and obviously overall query execution time).

Thanks for these explanations
Best regards,
Romain

Hi Romain,
I am attempting the same kind of test on a similar Windows machine.
Can you share the config file modification that enables Dremio to get past the memory allocation limitation?
Best
Boris