We have a reflection over a VDS that involves an AWS Glue table.
The reflection is incremental (reflection partition column (with truncate) = Glue table partition column).
The Reflections tab shows: Footprint: 594.96 MB (226.55 GB), i.e. Total Footprint = 226 GB.
The total size of the Glue table's underlying files is 2.7 GB, while the VDS filters away 60% of the records.
Reflection is updated every hour.
From the Dremio wiki:

> **Total Footprint**
> Shows the current size, in kilobytes, of all of the existing materializations of the Reflection. More than one materialization of a Reflection can exist at the same time, so that refreshes do not interrupt running queries that are being satisfied by the Reflection.
I guess this is not the case here. Why is the total footprint so huge? (I suspect it has been growing constantly over time.) How can we control its size? Is there a way to do the cleanup?
@vladislav-stolyarov Look at the reflection refresh job and take the first UUID, which is the reflection id. Under the location configured in dremio.conf under dist:/// there will be a folder called accelerator; under that, the first UUID is a folder, and under that, the second UUID is the materialization id. Are you able to do a `du -sh *` from the reflection folder?
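There is no direct `du -sh *` for an S3-backed distributed store, but the same per-subfolder totals can be computed from an object listing. A minimal sketch (the bucket/prefix names are placeholders, and the boto3 part is left as a comment because it needs AWS credentials):

```python
from collections import defaultdict

def folder_sizes(objects):
    """Aggregate object sizes by the first path component under a prefix.

    `objects` is an iterable of (relative_key, size) pairs, e.g. from an
    S3 listing. Returns a dict: top-level subfolder name -> total bytes.
    """
    totals = defaultdict(int)
    for key, size in objects:
        top = key.split("/", 1)[0]
        totals[top] += size
    return dict(totals)

# With boto3, the pairs could come from a paginated listing, e.g.:
#   s3 = boto3.client("s3")
#   pages = s3.get_paginator("list_objects_v2").paginate(
#       Bucket="my-dremio-bucket",                       # placeholder
#       Prefix="dremio/accelerator/<reflection_id>/")    # placeholder
#   objects = ((o["Key"], o["Size"])
#              for page in pages for o in page.get("Contents", []))

if __name__ == "__main__":
    sample = [
        ("mat-uuid-1_0/part-0.parquet", 100),
        ("mat-uuid-1_0/part-1.parquet", 50),
        ("mat-uuid-2_0/part-0.parquet", 70),
    ]
    print(folder_sizes(sample))
    # -> {'mat-uuid-1_0': 150, 'mat-uuid-2_0': 70}
```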
In my dremio.conf I have a slightly different scheme. I guess you meant this one? paths.accelerator = "dremioS3:///dremio-me-...../dremio/accelerator"
If I go to that folder and search for the subfolder matching the reflection_id, I can find it.
Its size is: total number of objects 33,152, total size 74.4 GB.
As an experiment I took another VDS whose reflection is not incremental, unlike the problematic one above, and found:
Its total footprint is 6x the current footprint, which is normal.
On S3 it has 7 subfolders; 6 of them match a materialization_id (from the materializations table for this reflection_id), and only one folder is orphaned.
The subfolder names are in the format {materialization_id}_0.
By contrast, my problematic reflection's folder has a single subfolder with thousands of subfolders inside. What is interesting is that none of those subfolder names match any materialization_id.
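The cross-check above (folder names vs. the materializations table) can be sketched as a small helper; the `{materialization_id}_0` naming is taken from the observation above, everything else is illustrative:

```python
def find_orphans(subfolders, materialization_ids):
    """Return subfolder names (format '{materialization_id}_0') that do not
    correspond to any known materialization id."""
    known = set(materialization_ids)
    return sorted(
        name for name in subfolders
        if name.rsplit("_", 1)[0] not in known
    )

if __name__ == "__main__":
    folders = ["aaa_0", "bbb_0", "ccc_0"]  # names listed from S3
    mats = ["aaa", "bbb"]                  # ids from the materializations table
    print(find_orphans(folders, mats))
    # -> ['ccc_0']
```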
So I guess incremental reflections follow a different storage pattern.
Hi @vladislav-stolyarov, this is a known bug that is specific to INCREMENTAL reflections. Reflections track stats for each refresh job, and they are not added up correctly when there is compaction between refreshes. There's a sys.refreshes table with more details about these refreshes.
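To inspect the refresh history mentioned above, a query along these lines could be a starting point (the `reflection_id` filter column is an assumption; check the table's actual schema with `DESCRIBE sys.refreshes` first):

```sql
-- Refresh history for the problematic reflection
-- (replace <reflection_id> with the first UUID from the refresh job)
SELECT *
FROM sys.refreshes
WHERE reflection_id = '<reflection_id>'
```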