Reading data from S3 failing after upgrade to 1.2.2

Hi, guys!

Just upgraded my Dremio cluster from 1.1.0 to 1.2.2 and now I’m getting this error for every data that I try to list, even when I go to the Node Activity screen:

VALIDATION ERROR: Failed to create workspaces for buckets owned by the account.

I’m persisting data on AWS S3. I’ve double checked the core-site.xml and my access key looks fine (is active).

Edit: Just tried to update the key and nothing :pensive:

Where goes the profile:

418de095-df84-4bb6-b90a-928c05e9d4d7.zip (3.4 KB)

I just tried something: Removed some S3 source and updated it with those new credentials that I generated before. Now I can query data from S3 buckets, but from other sources, like MySQL and ElasticSearch, I’m getting other error:

java.nio.file.AccessDeniedException: /var/lib/dremio/db/search/dac-namespace/core/_2s9.cfe

Should I erase all the reflections from the S3 reflection layer after an upgrade?

b8a26626-fa9b-482a-b06f-57d4a1815a02.zip (6.5 KB)

Hi @allan.sene

We have had a similar issue at one of our customer’s and it was regarding the way the software was installed/upgraded.

Can you please share how you upgraded from 1.1.0 to 1.2.2?

Can you do a ps -ef | grep dremio on your co-ordinator box too?

Thanks,
@balaji.ramaswamy

Hi @balaji.ramaswamy

I followed exactly what this doc says: https://docs.dremio.com/advanced-administration/upgrade/rpm.html

Then:

  1. Generated a new AIM access/secret for S3
  2. Updated the core-site.xml with it and restarted the cluster (didn’t work)
  3. Updated one data source, that consumes from S3 (error changed)

Funny thing is that I can query my files on S3. Just the others sources don’t work anymore.

ps -ef from master/executor:

[ec2-user@dremio-master-01 ~]$ ps -ef | grep dremio
dremio    68237      1  0 Nov08 ?        00:00:00 bash /opt/dremio/bin/dremio internal_start dremio
dremio    68311  68237  0 Nov08 ?        00:03:55 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.151.x86_64/jre/bin/java -Djava.util.logging.config.class=org.slf4j.bridge.SLF4JBridgeHandler -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/dremio/server.gc -Ddremio.log.path=/var/log/dremio -Xmx4096m -XX:MaxDirectMemorySize=8192m -XX:MaxPermSize=512m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/dremio -cp /etc/dremio:/opt/dremio/jars/*:/opt/dremio/jars/ext/*:/opt/dremio/jars/3rdparty/* com.dremio.dac.daemon.DremioDaemon dremio start
ec2-user  72093  72064  0 16:06 pts/0    00:00:00 grep --color=auto dremio

from executor-only:

[ec2-user@dremio-executor-01 ~]$ ps -ef | grep dremio
dremio    64322      1  0 Nov08 ?        00:00:00 bash /opt/dremio/bin/dremio internal_start dremio
dremio    64396  64322  0 Nov08 ?        00:01:25 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.151.x86_64/jre/bin/java -Djava.util.logging.config.class=org.slf4j.bridge.SLF4JBridgeHandler -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/dremio/server.gc -Ddremio.log.path=/var/log/dremio -Xmx4096m -XX:MaxDirectMemorySize=8192m -XX:MaxPermSize=512m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/dremio -cp /etc/dremio:/opt/dremio/jars/*:/opt/dremio/jars/ext/*:/opt/dremio/jars/3rdparty/* com.dremio.dac.daemon.DremioDaemon dremio start
ec2-user  66500  66477  0 16:08 pts/0    00:00:00 grep --color=auto dremio

Hi @allan.sene

Do you get the same error on other sources now? Can you please send me a profile from a failed job?

Thanks,
@balaji.ramaswamy

I suppose that my problem is with the executor node. For some reason, when I turn off it, everything returns back to normal. :thinking:

Just checked all the configs and everything seems ok: dremio.conf and core-site.xml are the same on both nodes - except the service.coordinator.enable variable, of course.

I rebooted both 2 machines and when the executor-only is up, or the query hangs and keeps on running forever or it fails. When it hangs, even when I try to cancel their by the UI, nothing happens:

master-mysql-failure.zip (6.5 KB)

Hi @allan.sene

Apologies for the delay in responding,

Just wanted to find out if everything is working as expected now? Are you able to use Dremio with a separate node for the co-ordinator and a separate node for the executor?

Thanks,
@balaji.ramaswamy

Hi @balaji.ramaswamy

Unfortunately, we are dropping Dremio for now and looking for another alternative to our scenario. We need to put something stable in production really fast and the experience with Dremio on AWS is not so great for now.

I really appreciate your assistance and hope that we can try the platform later when it becomes more stable.

Thanks man :slight_smile:

Hey @allan.sene, we’re very sorry to hear that. We’re actively working on improving our users’ experience working with Dremio and would love to incorporate your feedback and concerns into the process. If you are up for it, we’d like to do a deep dive session and talk through your experience — I’ll reach out via DM. It would be really valuable to understand what went wrong and what we could do better to support you and others who might ran into similar issues going forward.

Thanks,
Can