I ran two performance test cases on a Dremio cluster and do not fully understand the results.
- Run a query against the data source (S3) without a reflection
I created the tables for the test and did not enable any reflection, so that nothing would be cached. I executed the same query twice, and the two runs performed very differently:
The first run took 61s.
The second run took 18s.
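For anyone who wants to reproduce the comparison, here is a minimal sketch of how the two runs could be timed from a client. `run_query` is a hypothetical stand-in for whatever submits the SQL to Dremio and waits for the full result; it is not a Dremio API, and the `fake_query` workload only exists so the sketch runs without a cluster.

```python
import time
from typing import Callable, List


def time_runs(run_query: Callable[[], object], repeats: int = 2) -> List[float]:
    """Call run_query `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()  # hypothetical: submit the SQL and block until all rows are read
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Stand-in workload (assumption) so the sketch is runnable on its own.
    def fake_query() -> None:
        time.sleep(0.01)

    for trial, seconds in enumerate(time_runs(fake_query), start=1):
        print(f"trial {trial}: {seconds:.3f}s")
```

Timing on the client includes network and result-transfer time, so the numbers will not match the durations shown in the job history exactly.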
I expected nothing to be cached because I had not set a reflection, but the data seems to have been cached and the query accelerated by it, even though the job's page in the job history did not show the accelerated icon (the firing rocket).
My question: what makes the second execution faster? Does it use a local cache on the second run if the data source has not been updated?
- Run the query with a reflection
Turning on the reflection fetched the data from the data source into local storage. I used a raw reflection. I then ran the same query, and its performance was similar to the second run in the previous case (18s).
My question: what is the difference between the second execution without a reflection and an execution with a reflection?
Understanding how Dremio works here would be really helpful to me.
Thanks!