Summary Job Statistics

We’re rolling out Dremio internally and want to measure utilization. The target metric is jobs per month. Ideally, we could filter using the same criteria as in the Jobs UI (start time, status, user, etc.).

What’s the best way to collect this information? There are no summary stats in the Jobs UI (that I can see), and the REST endpoint can only pull jobs by specific jobId.

Note: This article states there is a sys.job_result table, but I am not seeing that when I SHOW TABLES IN SYS. If this does exist, I may be able to use it to group results …

@candlergrimes

Dremio writes a file called queries.json to the coordinator's log folder; times are in UTC. It records every job that hits Dremio, whether it comes through the REST API, JDBC, ODBC or the UI. The file is moved to the archive every 24 hours and kept for 30 days. Retention can be configured in conf/logback.xml (restart required). You can copy the queries.json files to S3, HDFS or Azure storage, promote the folder containing the 30 days of queries.json files, and run SQL on it. You get very rich information like queryID, querytext, username, start time, finish time, etc.
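If you would rather aggregate the file outside Dremio, a minimal Python sketch of the jobs-per-month count might look like the following. It assumes queries.json holds one JSON object per line, that the start time is stored as epoch milliseconds under a field named "start", and that the user and status are stored under "username" and "outcome". Those field names and the helper jobs_per_month are assumptions for illustration, so check them against your own file before relying on the numbers.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def jobs_per_month(lines, start_field="start", user_field="username",
                   outcome_field="outcome", user=None, outcome=None):
    """Count jobs per UTC calendar month from newline-delimited JSON records.

    The default field names are assumptions; pass your own if they differ.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Optional filters, loosely mirroring the Jobs UI criteria.
        if user is not None and record.get(user_field) != user:
            continue
        if outcome is not None and record.get(outcome_field) != outcome:
            continue
        # Assumption: the start time is epoch milliseconds (UTC).
        started = datetime.fromtimestamp(record[start_field] / 1000,
                                         tz=timezone.utc)
        counts[started.strftime("%Y-%m")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical records standing in for lines read from queries.json.
    sample = [
        '{"username": "alice", "outcome": "COMPLETED", "start": 1704067200000}',
        '{"username": "bob", "outcome": "FAILED", "start": 1706745600000}',
    ]
    for month, count in sorted(jobs_per_month(sample).items()):
        print(month, count)
```

To run it against real logs, pass an open file handle instead of the sample list, for example jobs_per_month(open(path)), once per archived file, and add the resulting counters together.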

Thanks
Bali

Hi,
Is there an option to log additional attributes (records returned, input size, output size, peak memory consumed) in the queries.json file?

@vrb

Currently there is no option to add fields, but at some point Dremio will have a system table for these queries that will include more information.

@balaji.ramaswamy any updates on this: “Dremio will have a system table for these queries and will have more information”?
I’m currently trying to extract usage metrics from Dremio on my OS deployment.

In Dremio 25 we have a new “Monitor” page that provides cluster usage metrics, like job count over time and top 10 longest running jobs.