Can Dremio support auto analyze table?

colagy · October 18, 2023, 8:47am

ANALYZE TABLE <table_name> [ for_clause ] statistics_clause
for_clause::
    FOR [ ALL COLUMNS |
          COLUMNS ( <list_of_columns> ) ]
statistics_clause::
    COMPUTE STATISTICS |
    DELETE STATISTICS

This is the ANALYZE TABLE grammar. I currently have to execute this SQL by hand every day.
I want the table to be analyzed automatically every day.

hmarchman-jones · October 18, 2023, 5:16pm

Script it. You can connect with JDBC, ODBC, Arrow, etc., and trigger it with cron or tsched.
You could even use the REST API and do it with simple cURL requests.
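
As a rough sketch of the REST route, something like the following Python script could be run from cron. It builds the statement from the grammar quoted above and posts it to the SQL endpoint. Treat the details as assumptions to check against your own deployment: the coordinator URL, the table path, and the `DREMIO_TOKEN` environment variable are placeholders, and the endpoint path `/api/v3/sql` with a `Bearer` personal access token is what I would expect on Dremio Software, not something confirmed in this thread.

```python
import json
import os
import urllib.request

# Hypothetical values: replace with your own coordinator URL and table paths.
DREMIO_URL = "http://localhost:9047"
TABLES = ["my_source.sales.fact_orders"]


def build_analyze_sql(table, columns=None):
    """Build an ANALYZE TABLE ... COMPUTE STATISTICS statement."""
    if columns:
        for_clause = "FOR COLUMNS (" + ", ".join(columns) + ")"
    else:
        for_clause = "FOR ALL COLUMNS"
    return f"ANALYZE TABLE {table} {for_clause} COMPUTE STATISTICS"


def build_request(base_url, token, sql):
    """Prepare a POST request for the SQL endpoint (path is an assumption)."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v3/sql",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run(base_url, token, tables):
    """Submit one ANALYZE TABLE job per table and print each response."""
    for table in tables:
        req = build_request(base_url, token, build_analyze_sql(table))
        with urllib.request.urlopen(req) as resp:
            # The response is expected to describe the submitted job;
            # the job itself runs asynchronously on the cluster.
            print(table, json.load(resp))


if __name__ == "__main__":
    # The token is read from the environment so it never sits in the script.
    token = os.environ.get("DREMIO_TOKEN")
    if token:
        run(DREMIO_URL, token, TABLES)
    else:
        print("Set DREMIO_TOKEN to submit the ANALYZE TABLE statements.")
```

Saved as, say, `analyze_tables.py` (a made-up name), a crontab entry such as `0 2 * * * python3 /path/to/analyze_tables.py` would run it nightly at 02:00. Because the request only submits the job, you would still need to check the job status separately if you want to know that the statistics were actually computed.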

balaji.ramaswamy · October 23, 2023, 3:10am

Thanks for your response @hmarchman-jones

@colagy What was the reason you started running ANALYZE TABLE? Without it, are you not getting the right row count estimates? Did you get an inefficient plan?

wundi · October 25, 2023, 6:07am

We see this on some of our dimension-fact table joins on Iceberg tables. I can’t share the profile, but we have a query where we filter the fact table on a very high cardinality column (reducing from billions to thousands).

Without analysing the table, the query planner joins the fact and dimension tables before applying the filter on the fact table - essentially dropping predicate pushdown to the Iceberg tables. If we run ANALYZE TABLE and compute statistics for the fact table, the filter predicate is correctly pushed all the way to the Iceberg table. Needless to say, the performance difference is many orders of magnitude.

Whether this is a bug I can’t say, but to my knowledge Dremio doesn’t compute any statistics that aren’t part of the Iceberg metadata already.
Expanding on that, I am not aware of any case where it is beneficial to skip pushing down predicates to the Iceberg tables, since it allows filtering on statistics both at the metadata and file/page level.
There may be some very specific cases where the filter does not actually eliminate anything and only costs some additional CPU cycles, but I personally haven’t seen these in practice, nor have I had Iceberg filtering be the main bottleneck.

colagy · October 25, 2023, 3:24pm

Our data is getting bigger every day, and I’m worried that incorrect statistics will reduce query performance.

balaji.ramaswamy · November 1, 2023, 5:50am

@colagy Is this Parquet? Are you collecting stats proactively, or do you have a profile in which you see incorrect row count estimates? If so, are you able to send the profile? We want to see where the issue is.
