Using external storage like S3 for storing Dremio's metadata

Hey guys,
As far as I can see, Dremio uses local storage for its metadata (metadata and catalog information) and distributed storage for query results, reflections, user uploads, and downloads.

How can we restore the metadata stored on local disk if the cluster/coordinator is terminated because of a failure?

Can we use external storage or a data lake such as S3 to store our Dremio metadata, so that if the cluster is terminated we can always relaunch a new cluster without losing the metadata? Or does Dremio offer another solution for this problem?

Thanks,
Jalandhar

I have also created some topics about metadata storage.

@jalandhar @koolay

Currently, we do not support an external DB such as MySQL.

Thanks

@jalandhar Good question. We recommend keeping the metadata store on an EBS volume, so that when the EC2 instance dies or is terminated, the EBS volume is still available to the next EC2 instance you use for Dremio. In fact, this is how we architect the deployment of our AWS Edition.
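A minimal sketch of what that layout might look like, assuming the EBS volume is mounted at a hypothetical `/mnt/ebs` and using the `paths.local` and `paths.dist` keys of dremio.conf. The helper renders the `paths` block and checks that the local metadata directory really lives on the persistent volume (a pure string check, no filesystem access). The mount point, bucket name, and helper names are illustrative only.

```python
import json
import posixpath


def render_paths_block(local_dir: str, dist_uri: str) -> str:
    """Render a `paths` block for dremio.conf (HOCON syntax).

    local_dir: directory on the persistent EBS volume for the local metadata store.
    dist_uri:  distributed-storage URI for results, reflections, uploads, etc.
    """
    # json.dumps yields a double-quoted, escaped string, which HOCON also accepts.
    return (
        "paths: {\n"
        f"  local: {json.dumps(local_dir)}\n"
        f"  dist: {json.dumps(dist_uri)}\n"
        "}\n"
    )


def is_on_mount(path: str, mount_point: str) -> bool:
    """Pure path check (no filesystem access): is `path` inside `mount_point`?"""
    path = posixpath.normpath(path)
    mount = posixpath.normpath(mount_point)
    return posixpath.commonpath([path, mount]) == mount


if __name__ == "__main__":
    local = "/mnt/ebs/dremio"  # hypothetical EBS mount point
    assert is_on_mount(local, "/mnt/ebs"), "metadata would not survive instance loss"
    print(render_paths_block(local, "dremioS3:///my-bucket/dremio"))
```

The mount check matters because a path such as `/var/lib/dremio` on the instance's root disk would be lost with the instance, which is exactly the failure the EBS recommendation avoids.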


@balaji.ramaswamy

By comparison, Loki removed its dependency on BoltDB in the new release; Loki now needs only S3 for storage.

Thanks @jason,

This problem is solved for us for now: Dremio takes backups from the EBS volume and can read that backup whenever we launch a new cluster.

Thanks,
Jalandhar
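For anyone wanting to reproduce this setup, a rough sketch of a scheduled backup step, assuming Dremio's `dremio-admin backup` command with user, password, and target-directory flags (`-u`, `-p`, `-d`, per our reading of the docs; verify them for your version) and a backup directory on the same hypothetical `/mnt/ebs` volume. It only builds the command line; the directory naming and the helper name are made up for illustration, and the restore side on the new cluster is not shown.

```python
import shlex
from datetime import datetime, timezone
from typing import List, Optional


def backup_command(dremio_home: str, user: str, password: str,
                   backup_root: str, now: Optional[datetime] = None) -> List[str]:
    """Build (but do not run) a dremio-admin backup invocation.

    The -u/-p/-d flags are an assumption based on the dremio-admin docs;
    check the documentation for your Dremio version before relying on them.
    """
    now = now or datetime.now(timezone.utc)
    # One timestamped directory per backup on the persistent volume (hypothetical naming).
    backup_dir = f"{backup_root.rstrip('/')}/dremio-backup-{now:%Y%m%dT%H%M%SZ}"
    return [
        f"{dremio_home.rstrip('/')}/bin/dremio-admin", "backup",
        "-u", user, "-p", password, "-d", backup_dir,
    ]


if __name__ == "__main__":
    argv = backup_command("/opt/dremio", "admin", "<PASSWORD>", "/mnt/ebs/backups")
    # Print for inspection; pass argv to subprocess.run(argv, check=True) to execute it.
    print(shlex.join(argv))
```

Returning an argument list rather than a shell string keeps the password and paths from being reinterpreted by a shell if you later hand it to `subprocess.run`.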


@balaji.ramaswamy
Do you have any update on the supported external storage? Could it support distributed storage? Thanks!

@Ming With the unlimited splits feature, the metadata has to go to a distributed store.

@balaji.ramaswamy
Would you explain in more detail?
Does "metadata" mean the data stored locally?
Does RocksDB move to the distributed storage instead of the local drive or NAS?
What is the benefit of the unlimited splits feature?

@Ming Have you gone through this to start with?

https://docs.dremio.com/software/advanced-administration/metadata-caching/#improved-metadata-refreshes-preview

Hi guys,
Could you briefly explain which external storage sources (such as S3, Postgres, or MySQL) can be configured as the metadata store for Dremio? I understand that an EBS volume can be used, but can we use S3, as with Loki mentioned above? I haven't seen any configuration that supports S3 as a metadata store for Dremio.

@aaasif04 S3, Azure Storage (Gen2), GCS, HDFS, and NAS are all valid options.
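As a rough illustration of how those options map onto the `paths.dist` setting in dremio.conf, the sketch below builds a URI per storage type. The scheme prefixes (`dremioS3`, `dremioAzureStorage`, `dremiogcs`, `hdfs`, `file`) reflect our reading of Dremio's distributed-storage documentation and should be verified for your version; credentials and connector settings are configured separately and are not shown. The helper and parameter names are hypothetical.

```python
# Hypothetical helper: build a paths.dist URI for each distributed-storage option.
# Scheme prefixes are our reading of Dremio's docs; verify them for your version.
DIST_TEMPLATES = {
    "s3": "dremioS3:///{bucket}/{path}",
    "azure": "dremioAzureStorage://:///{bucket}/{path}",  # {bucket} = file system name
    "gcs": "dremiogcs:///{bucket}/{path}",
    "hdfs": "hdfs://{host}:{port}/{path}",
    "nas": "file:///{path}",
}
OBJECT_STORES = {"s3", "azure", "gcs"}


def dist_uri(kind: str, path: str, bucket: str = "", host: str = "", port: int = 8020) -> str:
    """Return a paths.dist URI for the given storage kind (case-insensitive)."""
    kind = kind.lower()
    if kind not in DIST_TEMPLATES:
        raise ValueError(f"unsupported storage kind: {kind!r}")
    if kind in OBJECT_STORES and not bucket:
        raise ValueError(f"{kind} needs a bucket / file system name")
    if kind == "hdfs" and not host:
        raise ValueError("hdfs needs a namenode host")
    # str.format ignores keyword arguments the template does not use.
    return DIST_TEMPLATES[kind].format(bucket=bucket, host=host, port=port,
                                       path=path.strip("/"))


if __name__ == "__main__":
    print(dist_uri("s3", "dremio", bucket="my-bucket"))
    print(dist_uri("nas", "/mnt/nas/dremio"))
```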

Hi,

As I understand it, this storage only holds the accelerator, table, job result, download, and upload data. Can we use S3 for the metadata store?

@aaasif04

Once you are on 21.x or above, the metadata folder should be created automatically at the same level as the accelerator folder. Do you not see it? What version of Dremio are you running? Can you please upload your dremio.conf and a screenshot of the support keys you have set?
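A small sketch of the check being asked about here: given the Dremio version and a listing of the top level of the `paths.dist` location, it reports whether a `metadata` folder sits next to `accelerator`. The 21.x threshold and the folder names come from this reply; whether the folder appears may also depend on your support-key settings, which is why they are requested. The function names are hypothetical, and producing the listing (`os.listdir` on NAS, an S3 prefix listing, etc.) is left to the caller.

```python
from typing import Iterable


def major_version(version: str) -> int:
    """'21.1.0-202106...' -> 21"""
    return int(version.split(".", 1)[0])


def check_metadata_folder(version: str, dist_entries: Iterable[str]) -> str:
    """Classify a listing of the paths.dist root (folder names, with or without '/')."""
    names = {entry.strip("/") for entry in dist_entries}
    if major_version(version) < 21:
        return "pre-21.x: metadata folder not expected under dist"
    if "accelerator" not in names:
        return "no accelerator folder: check paths.dist"
    if "metadata" in names:
        return "ok: metadata folder sits next to accelerator"
    # 21.x+ with accelerator but no metadata: worth checking version and support keys.
    return "missing: no metadata folder next to accelerator"
```

On a NAS-backed cluster, for example, you could call `check_metadata_folder("21.1.0", os.listdir("/mnt/nas/dremio"))`, where the path is whatever your `paths.dist` points to.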