We’re rolling out Dremio internally and want to measure utilization. The target metric is jobs per month. Ideally, we could filter using the same criteria as in the Jobs UI (start time, status, user, etc.).
What’s the best way to collect this information? There are no summary stats in the Jobs UI (that I can see), and the REST endpoint can only pull jobs by specific jobId.
Note: This article states there is a sys.job_result table, but I am not seeing that when I run SHOW TABLES IN SYS. If this does exist, I may be able to use it to group results …
Dremio writes a file called queries.json to the coordinator log folder; times are in UTC. It records every job that hits Dremio, whether it comes from the REST API, JDBC, ODBC, or the UI. The file is moved to the archive every 24 hours and kept for 30 days. Retention can be configured in conf/logback.xml (restart required). You can copy the queries.json files to S3, HDFS, or Azure Storage, promote the folder containing the 30 days of queries.json files, and run SQL on it. You get very rich information, such as queryID, querytext, username, start time, finish time, etc.
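If you just want a quick jobs-per-month count without promoting the folder, here is a minimal Python sketch that works on the raw file. It assumes each line of queries.json is one JSON object, that the start time is stored in epoch milliseconds under a key named "start", and that status and user are stored under "outcome" and "username". Those key names and the time unit are assumptions, so check them against a line from your own file; the function name jobs_per_month is just illustrative.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def jobs_per_month(lines, outcome=None, username=None,
                   start_field="start", outcome_field="outcome",
                   username_field="username"):
    """Count jobs per UTC calendar month from queries.json lines.

    Assumes one JSON object per line and a start time in epoch
    milliseconds. The field names are assumptions; adjust them to
    match the keys in your own queries.json.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Optional filters, similar to the status and user filters in the Jobs UI.
        if outcome is not None and record.get(outcome_field) != outcome:
            continue
        if username is not None and record.get(username_field) != username:
            continue
        started = datetime.fromtimestamp(record[start_field] / 1000,
                                         tz=timezone.utc)
        counts[started.strftime("%Y-%m")] += 1
    return dict(counts)
```

To use it, open a copied log file (the path is whatever you copied it to) and pass the file object, for example jobs_per_month(open("queries.json"), outcome="COMPLETED"). For the full 30 days you would loop over the archived files and add up the counts.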
Is there an option to log additional attributes (records returned, input size, output size, peak memory consumed) in the queries.json file?
Currently there is no option to add fields, but at some point Dremio will have a system table for these queries, and it will have more information.