Can Dremio support auto analyze table?

ANALYZE TABLE <table_name> [ for_clause ] statistics_clause
    for_clause::
    FOR  [ ALL COLUMNS |
            COLUMNS ( <list_of_columns> ) ]
statistics_clause::
    COMPUTE STATISTICS |
    DELETE STATISTICS

This is the ANALYZE TABLE grammar. I have to execute the sql every day.
I want to analyze the table every day automicly.

Script it. You can connect with JDBC, ODBC, arrow etc, and trigger it with cron or tsched.
You could even use the REST API and do it with simple cURL requests.

Thanks for your response @hmarchman-jones

@colagy What is the reason you started to analyze table? Without that are you not getting right row count estimates? Did you get an inefficient pln?

We see this on some of our dimension-fact table joins on Iceberg tables. I can’t share the profile, but we have a query where we filter the fact table on a very high cardinality column (reducing from billions to thousands).

Without analysing the table, the query planner joins the fact and dimension tables before applying the filter on the fact table - essentially dropping predicate pushdown to the Iceberg tables. If we run ANALYZE TABLE and compute statistics for the fact table, the filter predicate is correctly pushed all the way to the Iceberg table. Needless to say, the performance difference is many orders of magnitude.

Whether this is a bug I can’t say, but to my knowledge Dremio doesn’t compute any statistics that aren’t part of the Iceberg metadata already.
Expanding on that, I am not aware of any case where it is beneficial to skip pushing down predicates to the Iceberg tables, since it allows filtering on statistics both at the metadata and file/page level.
For some very specific cases, where the filtering is not actually filtering and costing some additional CPU cycles maybe, but personally haven’t seen these cases in practice or had the Iceberg filtering be the main bottleneck.

Our data is getting bigger every day, and I’m worried that incorrect statistic information will reduce
query performance

@colagy Is this Parquet? are you proactively collecting stats or you have a profile that you see incorrect row count estimates? If yes, are you able to send the profile and want to see where the issue is