Issues with Dremio 24.2.2 on AKS: Configuring ADLS for Dist Storage

We are currently running Dremio 24.2.2 on AKS (Kubernetes 1.27.3). In our deployment we followed the instructions in the Dremio documentation (Configuring Distributed Storage | Dremio Documentation) to set up distributed storage on ADLS. We tried both access key and OAuth 2.0 authentication for the storage account and hit the same issue with both methods. The detailed error message is below, and a redacted sketch of the core-site.xml settings we used follows it.

     IO_EXCEPTION ERROR: Unable to load key provider class.

SqlOperatorImpl ICEBERG_SUB_SCAN
Location 0:0:10
SqlOperatorImpl ICEBERG_SUB_SCAN
Location 0:0:10
Fragment 0:0

[Error Id: afd0c0cb-5077-4fa9-b5f4-85456e10917f on dremio-executor-dremio-qasprint-1-0.dremio-cluster-pod-dremio-qasprint-1.qa.svc.cluster.local:0]

  (org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException) Unable to load key provider class.
    org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getTokenProvider():526
    org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient():878
    org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>():151
    org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize():106
    org.apache.hadoop.fs.FileSystem.createFileSystem():3469
    org.apache.hadoop.fs.FileSystem.get():537
    com.dremio.exec.store.dfs.DremioFileSystemCache.get():69
    com.dremio.plugins.azure.AzureStorageFileSystem$ContainerCreatorImpl$FileSystemSupplierImpl.create():266
    com.dremio.plugins.util.ContainerFileSystem$FileSystemSupplier.get():245
    com.dremio.plugins.util.ContainerFileSystem$ContainerHolder.fs():203
    com.dremio.plugins.util.ContainerFileSystem.getFileStatus():493
    com.dremio.exec.hadoop.HadoopFileSystem.getFileAttributes():258
    com.dremio.exec.store.hive.exec.DremioFileSystem.getFileStatus():365
    com.dremio.exec.store.hive.exec.dfs.DremioHadoopFileSystemWrapper.getFileAttributes():239
    com.dremio.io.file.FilterFileSystem.getFileAttributes():77
    com.dremio.exec.store.dfs.LoggedFileSystem.getFileAttributes():113
    com.dremio.exec.store.iceberg.DremioFileIO.newInputFile():78
    org.apache.iceberg.TableMetadataParser.read():266
    com.dremio.exec.store.iceberg.IcebergUtils.loadTableMetadata():1331
    com.dremio.exec.store.iceberg.IcebergManifestListRecordReader.setup():136
    com.dremio.sabot.op.scan.ScanOperator.setupReaderAsCorrectUser():348
    com.dremio.sabot.op.scan.ScanOperator.setupReader():339
    com.dremio.sabot.op.scan.ScanOperator.setup():303
    com.dremio.sabot.driver.SmartOp$SmartProducer.setup():595
    com.dremio.sabot.driver.Pipe$SetupVisitor.visitProducer():80
    com.dremio.sabot.driver.Pipe$SetupVisitor.visitProducer():64
    com.dremio.sabot.driver.SmartOp$SmartProducer.accept():565
    com.dremio.sabot.driver.StraightPipe.setup():102
    com.dremio.sabot.driver.StraightPipe.setup():102
    com.dremio.sabot.driver.StraightPipe.setup():102
    com.dremio.sabot.driver.StraightPipe.setup():102
    com.dremio.sabot.driver.StraightPipe.setup():102
    com.dremio.sabot.driver.StraightPipe.setup():102
    com.dremio.sabot.driver.StraightPipe.setup():102
    com.dremio.sabot.driver.StraightPipe.setup():102
    com.dremio.sabot.driver.StraightPipe.setup():102
    com.dremio.sabot.driver.StraightPipe.setup():102
    com.dremio.sabot.driver.Pipeline.setup():71
    com.dremio.sabot.exec.fragment.FragmentExecutor.setupExecution():621
    com.dremio.sabot.exec.fragment.FragmentExecutor.run():443
    com.dremio.sabot.exec.fragment.FragmentExecutor.access$1700():108
    com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():1007
    com.dremio.sabot.task.AsyncTaskWrapper.run():122
    com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop():249
    com.dremio.sabot.task.slicing.SlicingThread.run():171

Interestingly, when we tested the same configuration with version 24.1, we did not encounter any problems.
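
For context, our core-site.xml follows the pattern from the documentation page linked above; a redacted sketch is below. The values are placeholders rather than our real account, endpoint, and secret, and we used either the access-key properties or the OAuth 2.0 properties, not both at once.

<configuration>
  <!-- Sketch of our ADLS distributed-storage settings; all values are placeholders -->
  <!-- Common settings -->
  <property>
    <name>dremio.azure.account</name>
    <value>STORAGE_ACCOUNT_NAME</value>
  </property>
  <property>
    <name>dremio.azure.mode</name>
    <value>STORAGE_V2</value>
  </property>
  <property>
    <name>dremio.azure.secure</name>
    <value>true</value>
  </property>

  <!-- Attempt 1: access key -->
  <property>
    <name>dremio.azure.key</name>
    <value>STORAGE_ACCOUNT_ACCESS_KEY</value>
  </property>

  <!-- Attempt 2: OAuth 2.0 (Azure AD), used instead of the key property above -->
  <property>
    <name>dremio.azure.credentialsType</name>
    <value>AZURE_ACTIVE_DIRECTORY</value>
  </property>
  <property>
    <name>dremio.azure.clientId</name>
    <value>APP_CLIENT_ID</value>
  </property>
  <property>
    <name>dremio.azure.tokenEndpoint</name>
    <value>https://login.microsoftonline.com/TENANT_ID/oauth2/token</value>
  </property>
  <property>
    <name>dremio.azure.clientSecret</name>
    <value>APP_CLIENT_SECRET</value>
  </property>
</configuration>
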

Any insights, thoughts, or suggestions regarding this issue would be highly appreciated.

Thanks,
Ratheesh

Hi @Ratheesh,

In 24.2 the Hive 2-dependent plugins were upgraded to Hive 3. Was this environment upgraded to 24.2, or is it a new install? If it was upgraded, did you follow the upgrade notes?

Thanks, Bogdan

Hi @bogdan.coman

Thanks for your response.
We have tried both the upgrade and fresh-install scenarios and got the same error in both cases. We are running Hive Metastore 3.1.3.

Thanks,
Ratheesh

Hi @Ratheesh,

Is this error reported only for the dremio-executor-dremio-qasprint-1-0 executor, or do other executors hit it as well?
Can you attach a copy of the dremio.conf that you use in both 24.2 and 24.1?
Does the server.log report anything regarding ADLS connectivity issues? (A couple of commands for grabbing it from the pod are below.)
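
If you don't have the log handy, something like the following should pull it from the executor pod named in the error (namespace and pod name are taken from the error message above; whether anything lands in /opt/dremio/log/server.log depends on whether your chart logs to the console or to files):

# Executor stdout/stderr (most Helm-based deployments log to the console)
kubectl -n qa logs dremio-executor-dremio-qasprint-1-0 --tail=500
# server.log inside the pod, if file logging is enabled (path is the Dremio default and may differ)
kubectl -n qa exec dremio-executor-dremio-qasprint-1-0 -- tail -n 500 /opt/dremio/log/server.log
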

Thanks, Bogdan

Hi @bogdan.coman

The same dremio.conf file was used across both versions, with no modifications to it. The issue has since been resolved: I downgraded the dremio-azure-storage-plugin, and distributed storage is now working with dremio-azure-storage-plugin-24.1.0-202306130653310132-d30779f6.jar.
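
For anyone hitting the same thing, one way to apply the downgrade on Kubernetes is to bake the older jar into a custom image and point the deployment at it. A minimal sketch, assuming a 24.2.2 base image; the base image name/tag and the exact file name of the bundled 24.2.2 plugin jar are illustrative, so adjust them to your distribution:

# Sketch: replace the bundled 24.2.2 Azure storage plugin with the 24.1.0 jar (names are assumptions)
FROM dremio/dremio-oss:24.2.2
# Remove the plugin shipped with the base image (exact file name depends on the build)
RUN rm -f /opt/dremio/jars/dremio-azure-storage-plugin-24.2.2-*.jar
# Add the known-good jar taken from a 24.1.0 distribution
COPY dremio-azure-storage-plugin-24.1.0-202306130653310132-d30779f6.jar /opt/dremio/jars/

Then roll the coordinator and executor pods so they pick up the new image.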

dremio.conf

paths: {
  # Local path for dremio to store data.
  local: ${DREMIO_HOME}"/data"
  # local: "dremioS3:///{{ .Values.S3_DATALAKE_BUCKET_NAME }}/dremio-master-pvc/"
  # Distributed path for Dremio data including job results, downloads,
  # uploads, etc.
  results: "pdfs://"${paths.local}"/results"
  dist: "dremioAzureStorage://:///dremio-data/dremio-qa/dremio-data/"
}

services: {
  # The services running are controlled via command line options passed in
  # while starting the services via kubernetes. Updating the values listed below will not
  # impact what is running:
  # - coordinator.enabled
  # - coordinator.master.enabled
  # - coordinator.master.embedded-zookeeper.enabled
  # - executor.enabled
  #
  # Other service parameters can be customized via this file.
  executor: {
    cache: {
      path.db: "/opt/dremio/cloudcache/c0"
      pctquota.db: 100

      path.fs: ["/opt/dremio/cloudcache/c0"]
      pctquota.fs: [100]
      ensurefreespace.fs: [0]

    }
  }
}
debug: {
  # Enable caching for distributed storage, it is turned off by default
  dist.caching.enabled: true,
  # Max percent of total available cache space to use when possible for distributed storage
  dist.max.cache.space.percent: 100
}

services.web-admin.host: "0.0.0.0"
services.web-admin.port: 9078