Has anyone run into something similar with the AWS community edition of Dremio?
When launching a new project there is an option to enable automatic backups. I tested it, and it creates a daily EBS snapshot with all metadata and settings for the project.
But what if I want to enable this setting for a project that has already been launched? Is there a way or procedure to do so? I checked the page "Introducing Parallel Projects | Dremio", but it does not seem to offer an option to enable automatic backups for an existing project.
If anyone has had a similar case and made it work, I would appreciate any guidance. Thanks!
If you enable backups at the time of project creation, you’ll see that this flag is enabled in your dremio.conf: provisioning.coordinator.enableAutoBackups
If you want to change this setting at a later stage, all you have to do is toggle this value and restart the dremio service on the coordinator. In your case, to enable backups, change it to true, or add the line to dremio.conf if it is not already there.
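For reference, the line would look roughly like the fragment below. The flag name is the one given above; the `key: value` form is an assumption based on the HOCON syntax that dremio.conf uses (HOCON also accepts `=`), so match whatever style your file already has.

```
# dremio.conf on the coordinator (sketch; restart the dremio service afterwards)
provisioning.coordinator.enableAutoBackups: true
```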
What version of Dremio are you running?
Also, can you check server.log from the time you restarted the dremio service and see whether the backup service logged any errors? If not, do you see something like: “Scheduling auto backup…”
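If the log is large, a small script can pull out the relevant lines. This is only a sketch: the function name `scan_backup_log` is made up for illustration, and the two marker strings are taken from the messages quoted in this thread, so they may differ in other Dremio versions.

```python
from typing import Dict, Iterable, List

# Marker strings as quoted in this thread; other versions may word them differently.
SCHEDULED_MARKER = "Scheduling auto backup"
ERROR_MARKER = "Error creating snapshot"


def scan_backup_log(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect log lines that show a backup being scheduled or failing."""
    found: Dict[str, List[str]] = {"scheduled": [], "errors": []}
    for line in lines:
        if SCHEDULED_MARKER in line:
            found["scheduled"].append(line.rstrip("\n"))
        elif ERROR_MARKER in line:
            found["errors"].append(line.rstrip("\n"))
    return found
```

Pass it any iterable of lines, for example an open file handle on the coordinator's server.log, and check whether `found["errors"]` is empty.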
I checked the logs and do see an error that appears to come from the backup run:
2021-09-17 13:37:55,594 [scheduler-4] INFO c.d.dac.resource.AwsBackupService - Error creating snapshot Snapshot completion timed out. null [com.dremio.dac.server.AwsConfigurator.createSnapshot(AwsConfigurator.java:1846), com.dremio.dac.resource.AwsBackupService.backupProject(AwsBackupService.java:186), com.dremio.dac.resource.AwsBackupService$AutoBackupPolicy.run(AwsBackupService.java:339), com.dremio.service.scheduler.LocalSchedulerService$CancellableTask.run(LocalSchedulerService.java:191), java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511), java.util.concurrent.FutureTask.run(FutureTask.java:266), java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180), java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293), java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149), java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624), java.lang.Thread.run(Thread.java:748)]
2021-09-17 13:37:55,595 [scheduler-4] INFO c.d.dac.resource.AwsBackupService - User Error Occurred [ErrorId: 17442745-7e1f-4fd1-844c-477f0a671d84]
com.dremio.common.exceptions.UserException: Failure while creating snapshot.
at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:804)
at com.dremio.dac.resource.AwsBackupService.backupProject(AwsBackupService.java:190)
at com.dremio.dac.resource.AwsBackupService$AutoBackupPolicy.run(AwsBackupService.java:339)
at com.dremio.service.scheduler.LocalSchedulerService$CancellableTask.run(LocalSchedulerService.java:191)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException: Snapshot completion timed out.
at com.dremio.dac.server.AwsConfigurator.createSnapshot(AwsConfigurator.java:1846)
at com.dremio.dac.resource.AwsBackupService.backupProject(AwsBackupService.java:186)
... 9 common frames omitted
The snapshot is timing out because it takes more than 5 minutes on the AWS side, and we have a 5-minute timeout for the snapshot. If you can take a manual snapshot first, the subsequent snapshots will work fine: EBS snapshots are incremental, so later ones only need to copy the blocks that changed.
This issue was resolved in 18.0, where we increased the timeout and added a flag to configure it to an even larger value.
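For anyone on an older version, the manual first snapshot mentioned above could be taken from the EC2 console, or with a short boto3 script along these lines. This is a sketch under assumptions: `take_manual_snapshot` is a hypothetical helper, the volume ID has to be the EBS volume that holds your project's data (look it up yourself; it is not given in this thread), and the one-hour wait is an arbitrary choice.

```python
def take_manual_snapshot(ec2, volume_id,
                         description="Manual Dremio project snapshot",
                         poll_seconds=15, max_attempts=240):
    """Start an EBS snapshot and block until AWS reports it as completed.

    `ec2` is a boto3 EC2 client, e.g. boto3.client("ec2").
    The defaults wait up to 15 s * 240 = 1 hour, well past the 5-minute limit.
    """
    response = ec2.create_snapshot(VolumeId=volume_id, Description=description)
    snapshot_id = response["SnapshotId"]

    # Poll with boto3's built-in waiter until the snapshot state is "completed".
    waiter = ec2.get_waiter("snapshot_completed")
    waiter.wait(
        SnapshotIds=[snapshot_id],
        WaiterConfig={"Delay": poll_seconds, "MaxAttempts": max_attempts},
    )
    return snapshot_id


# Usage (requires boto3 and AWS credentials; the volume ID is a placeholder):
#   import boto3
#   take_manual_snapshot(boto3.client("ec2"), "vol-0123456789abcdef0")
```

Taking the client as a parameter keeps the helper free of credentials and region settings, which you would configure in the usual boto3 way.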