Inconsistent Iceberg predicate pushdowns

We are experiencing inconsistencies with Iceberg predicate pushdowns, and I am having trouble understanding why.

Our queries typically run against Iceberg fact tables with billions of rows, joined with smaller dimension tables from other data sources. The dimension tables have reflections defined.

Our fact tables are partitioned by day or month transforms on a timestamp column and sorted by a key.

Our queries filter the fact tables on a min/max timestamp range and usually a single key.
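
To give a sense of the shape of the tables and queries, here is a rough sketch (all table and column names are placeholders, and the DDL simply mirrors Dremio’s CREATE TABLE support for Iceberg partition transforms and local sorting):

```sql
-- Illustrative only: placeholder names, not our real schema.
CREATE TABLE lake.sales_fact (
  event_ts   TIMESTAMP,
  entity_key BIGINT,
  amount     DOUBLE
)
PARTITION BY (DAY(event_ts))  -- day (or month) transform on the timestamp
LOCALSORT BY (entity_key);    -- sorted by the key we usually filter on

-- Typical query shape: a timestamp range plus a single key, both of which
-- should let Iceberg prune partitions and data files.
SELECT f.event_ts, f.entity_key, f.amount, d.entity_name
FROM lake.sales_fact f
JOIN dims.entity d ON d.entity_key = f.entity_key
WHERE f.event_ts >= TIMESTAMP '2024-01-01 00:00:00'
  AND f.event_ts <  TIMESTAMP '2024-02-01 00:00:00'
  AND f.entity_key = 42;
```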

What we experience is that small changes to the query (which should not significantly impact the planning), upgrading to a newer Dremio version, or even running the same queries on a separate Dremio cluster running the same version with the same catalog and underlying data sources can make the planner behave completely differently.

Sometimes the pushdown works as expected; other times no pushdown happens at all, and the entire fact table is read.

The funny thing is that in our dev environment, where the fact table has 100s of millions of rows instead of billions, the planner always seems to make the right choice and does the pushdowns as expected, so we don’t discover the problems until we deploy changes or upgrade the Dremio version in production.

I really don’t understand why the planner would ever choose NOT to push predicates down to the Iceberg tables. Any insight into why this might happen, or how to diagnose why it sometimes works and sometimes doesn’t, would be greatly appreciated.

I might be able to understand that the planner behaves differently between Dremio versions, or if the queries change significantly, but having different behavior on two clusters running the same version with the same underlying data, or on our dev environment versus the production environment puzzles me. 100s of millions of rows vs billions of rows should both benefit equally from pushdowns.

I really wish I could share profiles, but unfortunately the profiles contain sensitive information that I cannot share.

@dotjdk Predicate pushdown should happen. Any chance you can share the query profile, please?

It does work sometimes, but we can’t really rely on it, as it seems to change “at random” when we make small changes to the query, upgrade Dremio, or run the query on a different cluster.

Unfortunately, as mentioned, the query profile contains sensitive information, and due to the binary encoded content in the profile, anonymizing it isn’t even an option.

We are now at a point where we have moved pretty much all of the functionality we wanted to build in VDSs into the application code. We now run only the most basic query against the source table (a very simple query joining the fact table with a dimension table, filtering on the sort column and the partitioned timestamp columns), save the results in an extract table, and do the rest of our joins/filtering/aggregations on the extract tables directly from the application. But after upgrading from 24.2.5 to 24.3.0, even that join caused the pushdown to fail in our production environment, and it had to be commented out for now just to be able to query the table at all. This means our fact table results are currently not enriched with the dimension data, and we have to find another workaround.
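
To make that reduced flow concrete, it looks roughly like this (again with placeholder names):

```sql
-- 1. The only query still executed through Dremio: join the fact table with
--    one dimension table, filter on the partition timestamp and the sort key,
--    and materialize the result as an extract table (CTAS).
CREATE TABLE extracts.sales_extract AS
SELECT f.event_ts, f.entity_key, f.amount, d.entity_name
FROM lake.sales_fact f
JOIN dims.entity d ON d.entity_key = f.entity_key
WHERE f.event_ts >= TIMESTAMP '2024-01-01 00:00:00'
  AND f.event_ts <  TIMESTAMP '2024-01-08 00:00:00'
  AND f.entity_key = 42;

-- 2. All remaining joins/filters/aggregations are issued by the application
--    directly against the extract table, for example:
SELECT entity_name, SUM(amount) AS total_amount
FROM extracts.sales_extract
GROUP BY entity_name;
```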

As mentioned by my colleague in a recent post, it would be extremely beneficial, both to us and most likely also to Dremio, if it were possible to share profiles in an anonymized manner, either by making the entire profile non-binary or by adding an option to anonymize it when saving.

I think I found “part” of the issue.

In our dev environment we were running 24.3.0 until today, and the planning worked as expected.

Today I upgraded our dev environment to 24.3.2, and after the upgrade the pushdown stopped working. Nothing had changed dataset/VDS/reflection-wise between the upgrades. (This is the part that still puzzles me.)

I then did a full reset of the cluster (deleted the PVCs) and redeployed our catalog (minus reflections).

After this reset, the pushdown worked again.

I enabled our reflections on the lookup tables, and the pushdown stopped working.

I then remembered some talk about default reflections a couple of years ago, so in desperation I tried enabling all columns in the reflections on the lookup tables, and voilà… the pushdown worked again.

Just to confirm, I switched back to only enabling the columns we actually use, and the pushdown stopped working.

The profile says that the query is accelerated by the reflections in both cases (with all columns and with only the selected columns enabled).

I still can’t explain why the pushdown behavior would change after an upgrade though.

@balaji.ramaswamy I have verified that the above fixed it in our production setup as well

Hi @dotjdk, when troubleshooting filter pushdowns that are not happening with reflections, we look at the plan in the Query Profile → Planning → Logical Planning phase and review whether filters appear in the operator FilesystemScanDrel(table=[“__accelerator”…. If filters are being pushed down, you’ll see attributes for the partition/non-partition filters that get pushed into Iceberg for manifest file and row group pruning.

The thing you pointed out about including all columns in the reflection is what we call a default raw reflection. In general, we produce the best plans with default raw reflections and don’t miss out on rewrite opportunities like pushing filters past joins and into the scans. When using algebraic matching, we can miss a pushdown, and that’s a bug; it does, however, depend on how complex the plan was to begin with. We are working on fixing this.
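
To make the distinction concrete, here is roughly how the two variants look when defined through SQL rather than the UI (dataset and column names are placeholders):

```sql
-- Default raw reflection: USING DISPLAY lists every column of the dataset,
-- so the materialization can be substituted very early in planning.
ALTER DATASET dims.entity
  CREATE RAW REFLECTION entity_raw_default
  USING DISPLAY (entity_key, entity_name, entity_type, valid_from, valid_to);

-- Selective raw reflection: only a subset of the columns, so matching it
-- relies on algebraic matching later in planning, which is where a filter
-- pushdown can get missed.
ALTER DATASET dims.entity
  CREATE RAW REFLECTION entity_raw_selective
  USING DISPLAY (entity_key, entity_name);
```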

If you are able to share the logical plan described above that is missing the filters, I could tell you if it’s a case we are fixing.

Regarding the binary stuff in the query profile, it’s primarily:

  • Table schemas in Arrow format

  • Serialized plans used for reflection recommendation

  • Serialized reflection matching information

There is no table data in the profiles. The most sensitive thing will likely be the string literals used in filter conditions and CASE expressions in the current query and in the referenced views’ SQL definitions.

I really wish I could share the plans, but they contain confidential information (table/column names and filter predicates), so it is not currently possible.

The (missing) pushdowns were to the huge Iceberg fact tables, not to the reflections on the dimension tables.

As the dimension tables are relatively small, I can live with the fact that I have to enable all columns. It just puzzles me why:

  1. the plan would change when I select only the columns needed for the query in the dimension table reflections, especially when the planner was still using the reflection for acceleration
  2. joining the dimension tables, which are several orders of magnitude smaller than the fact tables, would cause the planner to skip the pushdown to Iceberg in any case

Planning a query goes through many phases. When you had all columns selected, you’ll see in the query profile’s convertToRel plan that the materialization was already substituted into the plan, whereas when only the needed columns were selected, the materialization only shows up later, in the logical planning phase.

You can imagine that, for the planner to substitute a materialization into a query tree, it has to normalize the query tree and the materialization’s original plan to figure out if they are the same. This normalization process includes things like destroying join orders when they don’t matter and trying alternative query forms, such as decomposing aggregations into joins to match aggregate reflections. In the end, this normalization caused the filter pushdown to get missed. Again, how well we are able to push down filters in the rewritten plan with reflections depends on how complex the query was to begin with.

I don’t know if this will help you, but as a workaround you could create a view containing just the columns you need and then rewrite the original query to use this view directly. A reflection on this new view will be substituted during convertToRel, and you’ll get the expected filter pushdown.
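
A rough sketch of that workaround, with placeholder names:

```sql
-- 1. A view exposing only the columns the query actually needs.
CREATE VIEW myspace.entity_slim AS
SELECT entity_key, entity_name
FROM dims.entity;

-- 2. A raw reflection on the new view; because it covers all of the view's
--    columns, it acts as a default raw reflection for that view.
ALTER DATASET myspace.entity_slim
  CREATE RAW REFLECTION entity_slim_raw
  USING DISPLAY (entity_key, entity_name);

-- 3. The original query rewritten to join the view directly, so the
--    materialization is substituted during convertToRel and the timestamp/key
--    filters can still be pushed down into the Iceberg fact table scan.
SELECT f.event_ts, f.entity_key, f.amount, d.entity_name
FROM lake.sales_fact f
JOIN myspace.entity_slim d ON d.entity_key = f.entity_key
WHERE f.event_ts >= TIMESTAMP '2024-01-01 00:00:00'
  AND f.entity_key = 42;
```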

I appreciate the fact that reflection substitution is a complex problem. Thanks for the detailed explanation. It helps a lot to understand what to look out for in the profile, and it makes sense now why changing the reflection columns may change the final plan.

I had never really grasped the concept of default reflections and how they could affect the planning until now.