Error upgrading v25.0.3 to v25.0.5

Hi @balaji.ramaswamy @ben

I’m trying to update my current version of Dremio (25.0.3, AWS Edition, Enterprise) to the new version 25.0.5, but when I try to open the project, I receive the following error:
Failure while starting services. com.dremio.services.credentials.SecretCredentialsException: Cannot decode token.

After failing to update, I tried restoring a backup of the project in version 25.0.3 again, but the same error is occurring.

Full error trace in Open Project window:

Command Execution Log
Attaching the Project
Attaching EBS to EC2
Completed EBS attachment to EC2
Attaching EFS to EC2
Completed EFS mounting
Completed EFS attachment to EC2
Starting Services
Failed to start Services. Failure while starting services. com.dremio.services.credentials.SecretCredentialsException: Cannot decode token

Can you help, please? I have my environment down.

Hi @joelhansen

I think at this point, the EC2 instance to which the project EBS is going to attach will be up and running (though, obviously Dremio itself has not started).

Can you look for the /var/lib/dremio/security folder? I’m wondering if you have encrypted some secret with dremio-admin encrypt and this is somehow causing the problem.

Also, there may be a more verbose log message in the server.log in /var/dremio_efs/log/coordinator.
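If the log is large, a small script can pull out just the entries around the exception. This is only a sketch: the log location is the one mentioned above, while the function name `find_credential_errors` and its `context` parameter are made up for illustration.

```python
from pathlib import Path


def find_credential_errors(log_path, needle="SecretCredentialsException", context=2):
    """Return a list of (line_number, surrounding_lines) for each line containing needle."""
    # errors="replace" keeps the scan going even if the log has odd bytes in it.
    lines = Path(log_path).read_text(errors="replace").splitlines()
    hits = []
    for index, line in enumerate(lines):
        if needle in line:
            start = max(0, index - context)
            end = min(len(lines), index + context + 1)
            # Line numbers are reported 1-based, as a text editor would show them.
            hits.append((index + 1, lines[start:end]))
    return hits


# Example call (path taken from the post above; adjust if your log lives elsewhere):
# for number, snippet in find_credential_errors("/var/dremio_efs/log/coordinator/server.log"):
#     print(f"--- match at line {number} ---")
#     print("\n".join(snippet))
```

The lines just before each match usually name the source whose credentials could not be decoded.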

Hi @ben !

I can assure you that we have never run the command dremio-admin encrypt.
Inside the folder /var/lib/dremio there are only two other folders, as the photo below shows:
image

Here are the logs from /var/log/dremio, not from /var/dremio_efs/log, because by the time it fails, both the EFS and EBS volumes have disconnected.

As shown in the pictures above, the same error occurs as it tries to get the credentials for different services (S3, AWS Glue, MongoDB, MySQL), but the interesting thing is that it does not happen for all of our resources, only those.

@joelhansen in these screenshots of the logs, I do not see the Dremio app actually exiting. Does it come online, but with some of the sources unavailable?

No, @ben, the Dremio app never comes online. When we open the main page, it shows us the “Open Project” page, instead of the login page.

Hi @ben

We created a new stack in CloudFormation on AWS with version 25.0.7 and tried to open our project from version 25.0.3 (both the latest state and an automated backup), but the same error continues to occur when starting the services.

@joelhansen

The problem is that, if you did not back up the security folder yourself, the EBS snapshotting service did not either.

Do you have any working Dremio v25.0.3 instance up and running? If so, do not delete any instance that has the security folder. Copy that folder to the EBS volume.

25.0.7 does not reverse the problem; it just prevents it from happening in the first place by actually backing up the security folder through the background EBS snapshotting.

No, @ben, we do not have any instances running on 25.0.3.

As we discussed with you on a Zoom call on the 12th, we tried to update from version 25.0.3 and were unable to do so. You then helped us try several approaches, but none of them worked.

We recreated some basic things from scratch, but we want to be able to restore our official database so we can extract our views and replicate them in the new installation.
We don’t need to be able to recreate sources, users, or reflections. We can redo all of this, but we need the SQL that defined our views in version 25.0.3.
If it is not possible to restore the complete project, is there a way to restore or extract just the view definitions?

@joelhansen and team were able to resolve this issue.

The following description of the solution is to help Dremio community members if they ever do encounter this problem (hopefully very unlikely, at this point).

During startup, Dremio checks all the configured data sources and makes sure it can connect to them. Losing the security folder with the encryption keys for the source credentials should not “normally” cause the application to exit during this check, except if you have a MongoDB source. For reasons that are too complicated to go into here, checking the MongoDB connection causes the whole application to exit when it cannot decrypt the source credentials.

You can work around this by temporarily removing the jars related to MongoDB from the software distribution:

  • dremio-mongo-plugin-<dremio version>.jar
  • dremio-ce-mongo-plugin-<dremio version>.jar

Dremio will then start up with a “dead” Mongo source, which you should delete. You can recover your Dremio catalog by re-entering the credentials for your sources and making sure the security folder is saved the next time you terminate your EC2 instance (in newer AWSE versions of Dremio 25, the folder is backed up properly to the project EBS volume). You can then re-add the Mongo jars, restart Dremio, and reconfigure the Mongo source. If you give it the same name, there will be few issues with VDS/views that reference collections (datasets) in that source.
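As a rough illustration of the "temporarily remove, later re-add" step, here is a Python sketch that moves the two MongoDB plugin jars into a holding directory and can put them back afterwards. The jar name patterns come from the list above. Everything else is an assumption: the function names are invented, and the install and holding paths in the example call are placeholders, so check where your distribution actually keeps its jars. Stop Dremio before moving anything.

```python
import shutil
from pathlib import Path

# Jar name patterns from the post above; "*" stands in for the Dremio version.
MONGO_JAR_PATTERNS = ("dremio-mongo-plugin-*.jar", "dremio-ce-mongo-plugin-*.jar")


def set_aside_mongo_jars(dremio_home, holding_dir):
    """Move MongoDB plugin jars found under dremio_home into holding_dir.

    Returns a list of (original_path, holding_path) pairs so the move can be undone.
    """
    holding = Path(holding_dir)
    holding.mkdir(parents=True, exist_ok=True)
    moved = []
    for pattern in MONGO_JAR_PATTERNS:
        # sorted() materialises the matches before any file is moved.
        for jar in sorted(Path(dremio_home).rglob(pattern)):
            target = holding / jar.name
            shutil.move(str(jar), str(target))
            moved.append((jar, target))
    return moved


def restore_jars(moved):
    """Move each jar back to the location it was taken from."""
    for original, target in moved:
        shutil.move(str(target), str(original))


# Example call with placeholder paths (both are assumptions, not documented locations):
# moved = set_aside_mongo_jars("/opt/dremio", "/root/mongo-jars-aside")
# ... start Dremio, delete the dead Mongo source, re-enter credentials, stop Dremio ...
# restore_jars(moved)
```

Keep the holding directory outside the Dremio install tree so the jars are not picked up from there on startup.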

Hi @ben

Thank you very much for the detailed explanation, and thanks to you and your team for assisting us in resolving this issue. Our situation was exactly like this: we had a MongoDB source, and with your support, we were able to get our instance up and running again.