ANALYZE TABLE datalake.iceberg.mytable FOR COLUMNS ("from", ts) COMPUTE STATISTICS
translates to a job with the following query without quotes around the from column
SELECT 'datalake.iceberg.mytable' as TABLE_PATH, ndv("ORIGINAL_from") as "NDV_from" , ndv("ORIGINAL_ts") as "NDV_ts" , count(*) as "RCOUNT_null" , count("ORIGINAL_from") as "COLRCOUNT_from" , count("ORIGINAL_ts") as "COLRCOUNT_ts" , tdigest("ORIGINAL_ts", true) as "TDIGEST_ts" FROM (Select from as "ORIGINAL_from", ts as "ORIGINAL_ts"from datalake.iceberg.mytable)
The job fails because of the missing quotes in the inner SELECT
I was doing a pretty selective query on a table in our dev env. The table has about 6.5 billion rows with a VDS that joins it to a few dimension tables. I waited a maximum of 20 minutes for the query to finish before giving up. (The query returns less than 100K rows)
After the ANALYZE TABLE it was down to 4-5 seconds
Unfortunately I am unable to share the profiles as they contain confidential information. I can verify the row count estimates when I am back at the office.