The reflection now works, but the reflected query is a lot slower than the original query, even though it's a really small set of files. This really small query takes about 6s to run.
As you can see in the profile, your query takes ~7.7s to complete, of which 6.8s is wait time in the Parquet row group scan:
Thread    Setup Time  Process Time  Wait Time  Max Batches  Max Records  Peak Memory
00-00-07  0.039s      0.047s        6.841s     2            26           8MB
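A quick back-of-the-envelope check makes the point concrete. It assumes the setup, process and wait columns are additive for this scan thread and uses the ~7.7s total quoted above; the numbers are copied from the profile, nothing else is measured. It shows the thread is almost entirely waiting rather than computing, which is consistent with (though does not by itself prove) slow reads from storage.

```python
# Operator timings for thread 00-00-07, copied from the profile (seconds).
# Assumption: setup + process + wait approximates the thread's total time.
timings = {"setup": 0.039, "process": 0.047, "wait": 6.841}

thread_total = sum(timings.values())
wait_fraction = timings["wait"] / thread_total

# Share of the whole ~7.7s query that this scan thread spent waiting.
query_total = 7.7
wait_share_of_query = timings["wait"] / query_total

print(f"thread time: {thread_total:.3f}s")
print(f"wait fraction of thread: {wait_fraction:.1%}")
print(f"wait share of query: {wait_share_of_query:.1%}")
```

This prints a thread time of 6.927s, a wait fraction of 98.8% and a wait share of 88.8%: only about 0.09s of the scan is actual setup and processing.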
Are your reflections stored on S3, and are the reflection files small? This is a known issue with cloud sources such as ADLS and S3. Just to check the raw speed of the reflections, would it be possible to store them locally?
Yes, this is a small test data set, so the files are tiny: around 2 MB, since there aren't many records in the current data set.
If we store them locally, do we need to do anything special in a 6-node cluster? In other words, do we need to mount shared storage such as EFS, or will the coordinator somehow distribute the reflections? How can we store them locally and still use the whole cluster?