Running Dremio OSS on AWS Lambda


We are working on a project that aims to make query engines serverless by running them on AWS Lambda. We have already integrated 7 of them, including Dremio OSS. We publish a benchmark that summarizes the execution times on a very simple query: cloudfuse - Standalone engines. The focus here is more on the setup times (cold and warm) than on the query execution itself. For now we are running the engines in standalone mode, but we are getting started with distributed mode.

Currently, Dremio is set up by a script that uses the REST API to create the initial user and the necessary resources. This weighs down the cold start. If any of you have ideas on how to make the start time faster, contributions are more than welcome. The infrastructure repository for the benchmark is open source: GitHub - cloudfuse-io/lambdatization: Run query engines in Cloud Functions
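
For readers unfamiliar with this kind of setup, here is a rough sketch of what bootstrapping over the REST API can look like. It is not the script from the repository. The endpoint paths (`/apiv2/bootstrap/firstuser`, `/apiv2/login`), the `_dremionull` authorization placeholder, the payload fields and the port 9047 are assumptions based on commonly used Dremio OSS setups and may differ between versions. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: Dremio's web/REST endpoint on its usual default port.
DREMIO_URL = "http://localhost:9047"


def build_request(method, path, payload, auth=None):
    """Build (but do not send) a JSON request for the Dremio REST API."""
    headers = {"Content-Type": "application/json"}
    if auth is not None:
        headers["Authorization"] = auth
    return urllib.request.Request(
        DREMIO_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method=method,
    )


def first_user_request(user, password):
    """Request that creates the initial admin user (assumed endpoint)."""
    payload = {
        "userName": user,
        "firstName": "Lambda",          # placeholder values for illustration
        "lastName": "Admin",
        "email": "admin@example.com",
        "password": password,
    }
    # Assumption: no user exists yet, so a placeholder token is sent.
    return build_request("PUT", "/apiv2/bootstrap/firstuser", payload,
                         auth="_dremionull")


def login_request(user, password):
    """Request that exchanges credentials for a session token (assumed endpoint)."""
    return build_request("POST", "/apiv2/login",
                         {"userName": user, "password": password})


def send(request):
    """Send a prepared request and decode the JSON response, if any."""
    with urllib.request.urlopen(request, timeout=30) as response:
        body = response.read()
    return json.loads(body) if body else {}
```

With a running server, the two steps would be chained as `send(first_user_request(...))` followed by `send(login_request(...))`, and the token from the login response would then be used to create sources. Each of these is a separate HTTP round trip made after the server is up, which is why this step adds to the cold start.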