Questions about cache and reflection

I have two performance test cases on a Dremio cluster whose results I don’t completely understand.

  1. Run a query against the data source (S3) without reflection
    I created the tables and did not enable any reflection on them, to prevent caching. The query was executed twice, and performance differed between the two executions:

Initial trial took 61s
Second trial took 18s

I expected that the data would not be cached, because I didn’t set a reflection. But it seems the data was cached and the query was accelerated by it, even though the job details page in the job history didn’t show the accelerated icon (the firing rocket).

My question is: what makes the second execution faster? Does it use a local cache on the second trial if there has been no update at the data source?

  2. Run a query with reflection
    Turning on the reflection fetched data from the data source to local storage. I used a raw reflection. I then ran the same query, and performance was similar to the second trial of the previous case (18s).

My question is: what is the difference between the second execution without reflection and the execution with reflection?

It would really help me understand how Dremio works.

Thanks!

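@indigoblue8848 There are 4 cases in total, depending on whether the query is accelerated by a reflection and whether C3 (Columnar Cloud Cache) is used: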

  • Unaccelerated query without C3
  • Unaccelerated query with C3
  • Accelerated query without C3
  • Accelerated query with C3

Kindly send me the job profiles of the different runs and I can explain what happened.
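
If it helps to keep the runs organized, the four cases above can be laid out as a small test matrix. This is only a bookkeeping sketch in plain Python: it does not call any Dremio API, and the names `build_test_matrix` and `speedup` are hypothetical helpers made up for illustration. The 61s and 18s figures are the timings reported in the question.

```python
from itertools import product


def build_test_matrix():
    """Return the four (accelerated, C3) combinations as labelled cases."""
    cases = []
    for accelerated, c3 in product([False, True], repeat=2):
        label = "{} query {} C3".format(
            "Accelerated" if accelerated else "Unaccelerated",
            "with" if c3 else "without",
        )
        cases.append({"label": label, "accelerated": accelerated, "c3": c3})
    return cases


def speedup(first_seconds, second_seconds):
    """Ratio of the first run's wall-clock time to the second run's."""
    return first_seconds / second_seconds


if __name__ == "__main__":
    for case in build_test_matrix():
        print(case["label"])
    # Timings from the question: first run 61s, second run 18s.
    print("speedup: {:.2f}x".format(speedup(61, 18)))
```

Recording one job profile per label makes it easier to see which of the four cases a given run actually fell into.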

OK, I’ll share my profiles with you.

@balaji.ramaswamy Hello, I’ve finished reproducing all the cases you noted.

Thanks for your help.