Hi!
I’ve got a Physical Dataset (A) with an aggregate reflection on dimension columns d1, d2 and aggregate (SUM) columns a1, a2, …, a750. (Also a raw reflection on this).
I’ve got another PDS (B) with columns c1, c2, c3 with a raw reflection.
I’ve separately got a Virtual Dataset which joins A with B on columns d1 = c1. (many to 1).
Using an external client (python) if I query the VDS with a WHERE clause on c2 (which should give me a subset of A) [and summing the a1 - a750 cols] then I see in the jobs that this query is not being accelerated by the aggr reflection on A (considered, not matched.) and is instead hitting the raw reflection on A. This query takes 220 seconds.
If with the same client I don’t query the VDS and instead query A and B directly, specifying the same join and WHERE on the same column then it DOES hit the aggregate reflection on A and the query completes in 40 seconds.
Is this behaviour explainable? I was surprised as I thought the query on the VDS would still optimise in the same way.
Thanks for your help