How much space does enabling arrow caching take

How much space does enabling arrow caching take?

Is there a way to see this via the UI?

@lfcosio

Are you talking about enabling Arrow caching for reflections? Here are some pointers before you enable it:

  • Arrow caching is beneficial when a query takes very little compute time and most of the time is spent reading from Parquet files. This is most common when a small number of rows are selected from a reflection. The primary benefit is that Arrow caching avoids Parquet decompression.
  • The space cost is very workload dependent. Data in Arrow format will likely be some multiple larger than the same data in compressed Parquet format. However, Dremio uses C3 (Columnar Cloud Cache) to stay efficient and converts only commonly used data to Arrow format.
  • The option to use Arrow caching is a configuration setting. After it is selected, Dremio will promote the cached reflection as part of the process that caches reflections locally.
  • sys.reflections should tell you whether a given reflection has Arrow caching enabled.

Thanks for the info. You mentioned that it caches it locally. How can I see how much space it actually consumes?

I checked sys.reflections together with sys.materializations and found that the size did not change when I enabled and then disabled Arrow caching. Or will the size only increase once I run some queries against it? Will the bytes column in sys.materializations adjust its value to account for Arrow caching? Thanks!

@lfcosio

Does it show the same size on disk too?

Hi guys, I've enabled Arrow caching for some reflections, but in sys.reflections the arrow_cache column is set to false for all records. Any ideas?

thanks

@LucGth

This is a known bug and we are looking into it. In the meantime, when a reflection with Arrow caching enabled is used, you will see it in the planning tab of the job profile (arrowCachingEnabled=True).

The REST API call for the reflection should also report it as true:

"arrowCachingEnabled": true,
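If you want to check that flag from a script rather than by eye, here is a minimal Python sketch. It assumes you have already fetched the reflection's JSON from the REST API and have it as a string. The sample payload is hypothetical and trimmed down; only the "arrowCachingEnabled" field comes from this thread, and the helper name is made up for illustration.

```python
import json


def arrow_caching_enabled(reflection_json: str) -> bool:
    """Return True if a reflection's JSON payload reports Arrow caching as enabled."""
    reflection = json.loads(reflection_json)
    # A missing key is treated as "not enabled".
    return bool(reflection.get("arrowCachingEnabled", False))


# Hypothetical sample payload; the "id" and "name" values are placeholders.
sample = '{"id": "example-id", "name": "example_reflection", "arrowCachingEnabled": true}'
print(arrow_caching_enabled(sample))
```

This only reads the setting on the reflection definition. It does not tell you how much local cache space the Arrow data is using.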

@balaji.ramaswamy Is it possible to enable Arrow caching in Dremio v22+?

Thanks in advance!

@sergey-kryuchkov

Is there a particular reason why you need to enable Arrow caching? Is there a reflection performance issue with the default settings?

Yes, when we use reflections we do not get enough performance with big datasets (more than 800 million rows).

@sergey-kryuchkov Would it be possible to send the job profile? I want to make sure Arrow caching will help.