One of Dremio’s core capabilities is letting end users create Virtual Data Sets (VDSs) on a self-service basis. That raises a concern: users who are not SQL savvy may inadvertently create VDSs built on high-cost SQL queries, and even unintentional Cartesian products could result. What is Dremio’s solution, if any, for monitoring and preempting such occurrences? Or does Dremio depend on the timeout mechanisms of the underlying data sources, or on external monitoring tools, to preempt them?
Hey Lew, thanks for the question! We’re planning to address such scenarios as part of our workload management work in our Enterprise Edition (more updates coming soon). As we release more capabilities here, admins will be able to manage this within Dremio. Over time, the idea is to provide a rule-based system where users can target queries based on operations in the query (full scan, Cartesian join, no predicates, etc.) as well as run-time metrics (run time, rows scanned, rows sorted, memory spilled/user, etc.). They can use this information to automatically reject queries or place them into queues, as well as cancel active queries.
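To make the rule-based idea a bit more concrete, here is a minimal Python sketch of how such rules might be evaluated. This is not Dremio's implementation or API. Every name in it (`QueryInfo`, `Rule`, `Action`, `evaluate`) and the 600-second threshold are hypothetical and chosen only for illustration. It assumes each query can be summarized by a few plan-level flags and run-time metrics, and that the first matching rule decides what happens.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Action(Enum):
    ALLOW = "allow"
    REJECT = "reject"
    QUEUE_LOW_PRIORITY = "queue_low_priority"
    CANCEL = "cancel"


@dataclass
class QueryInfo:
    # Hypothetical summary of a query: plan-level flags plus run-time metrics.
    has_cartesian_join: bool = False
    has_full_scan: bool = False
    has_predicates: bool = True
    runtime_seconds: float = 0.0
    rows_scanned: int = 0


@dataclass
class Rule:
    name: str
    matches: Callable[[QueryInfo], bool]
    action: Action


def evaluate(query: QueryInfo, rules: List[Rule]) -> Action:
    """Return the action of the first matching rule, or ALLOW if none match."""
    for rule in rules:
        if rule.matches(query):
            return rule.action
    return Action.ALLOW


# Illustrative rule set; the thresholds are assumptions, not recommended values.
RULES = [
    Rule("reject cartesian joins",
         lambda q: q.has_cartesian_join, Action.REJECT),
    Rule("cancel long-running queries",
         lambda q: q.runtime_seconds > 600, Action.CANCEL),
    Rule("queue unfiltered full scans",
         lambda q: q.has_full_scan and not q.has_predicates,
         Action.QUEUE_LOW_PRIORITY),
]
```

For example, `evaluate(QueryInfo(has_cartesian_join=True), RULES)` returns `Action.REJECT`, while a query that matches no rule falls through to `Action.ALLOW`. Because the first match wins, the order of the rules matters: here a Cartesian join is rejected even if the query would also qualify for the low-priority queue.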