I get an error when running a query against any table pointing to Parquet files in MinIO: "IllegalStateException: Invalid AWSCredentialsProvider provided:".
I have other tables pointing to CSV files in the same MinIO bucket and those work fine. I can even preview the data in the Parquet file when creating the dataset. The error occurs only when running the SELECT. Any advice?
Even though the error seems to be related to AWS credentials, I wonder why I have no issues accessing other CSV-based datasets in the same MinIO, and why I can even preview the data. This happens only with my Parquet-based datasets in MinIO.
@moy8011 Welcome to Dremio Community.
My 2 cents → How are you configuring the MinIO credentials and settings in Dremio? The reason I'm asking is that previews and even CSV scans are sometimes done from a single node, while S3 scans are typically distributed in nature, so this may point to a configuration mismatch between the executors in your cluster. The same core-site.xml needs to be present on all nodes.
We have deployed Dremio with Helm in a Kubernetes cluster, so the core-site.xml is replicated to all the nodes as a ConfigMap.
We put a trace on the MinIO bucket and noticed that when querying Parquet files the request path is incorrect; it looks like Dremio is repeating the path:
when the same dataset is being created, during the preview, the trace shows correctly:
I tried changing the context and different variations of the SELECT statement, but no luck.
Any idea why it’s repeating the path?
Can we get a sense of what you are putting into core-site.xml (and why), and what all the non-default values in the S3 source config are (besides the compatibility flag)?
Looks like this:
```xml
<?xml version="1.0"?>
<!-- If you are editing any content in this file, please remove lines with double curly braces around them -->
<configuration>
  <!-- S3 Configuration Section -->
  <property>
    <name>fs.dremioS3.impl</name>
    <description>The FileSystem implementation. Must be set to com.dremio.plugins.s3.store.S3FileSystem</description>
    <value>com.dremio.plugins.s3.store.S3FileSystem</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <description>AWS access key ID.</description>
    <value>XXXXXXXXXXXXXXXX</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <description>AWS secret key.</description>
    <value>XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX</value>
  </property>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>minio.tenant.xxx.xxxxxx.local</value>
  </property>
  <property>
    <name>fs.s3a.path.style.access</name>
    <value>true</value>
  </property>
  <property>
    <name>dremio.s3.compat</name>
    <value>true</value>
  </property>
</configuration>
```
In the S3 source, I’m setting these:
Also, I'm enabling compatibility mode and choosing PARQUET as the default CTAS format.
I followed this documentation: Configuring Amazon S3 for MinIO | Dremio Documentation
What are the values (in each)?
It's our MinIO endpoint, where we have the CSVs and the Parquet files.
Are you using slashes for table names during table creation or query?
Nope. I even tried removing the file extension to avoid quoting. I also tried creating the dataset on the folder rather than on the file, and it's the same behavior.
- Looking at the traces you provided, could you go through all your configurations to find out where the term `minio1` is coming from? It's odd that this specific term gets inserted into the HeadObject call.
- Remove `fs.s3a.endpoint.region` from the source settings. It shouldn't be needed.
- What happens when you explicitly set the credentials provider in the source settings, i.e. `org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider`, along with using the "AWS Access Key" mode of authentication?
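For reference, explicitly pinning the provider in core-site.xml might look like the fragment below. This is only a sketch: `fs.s3a.aws.credentials.provider` is the standard Hadoop S3A key, and whether you set it in core-site.xml or as a connection property on the Dremio S3 source depends on your setup.

```xml
<!-- Illustrative only: forces S3A to use the access-key/secret-key pair
     instead of letting it guess a credentials provider chain. -->
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider</value>
</property>
```

If this makes the error go away, the original failure was likely the executors falling back to a different (or misconfigured) provider chain than the coordinator used for the preview.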
A full stack trace of the error would also help. You can get it from the logs, or from the Job Profile of the query that failed → Raw Profile → Error tab.