‘Unable to find bucket named’ error when querying S3


When I create a physical dataset on an S3 data source (specifically, a Hive-partitioned dataset), Dremio shows me a sample of the data and successfully creates the dataset.

However, when I re-open the dataset in the SQL Editor, I encounter an ‘Unable to find bucket named xxx’ error.

I am able to create a dataset on another S3 bucket fine.

Both buckets have hyphens in the name.
The problem bucket has data files under 2 levels of folders, while the working dataset has 1 level.
The problem bucket’s data files are Hive-partitioned 3 levels deep (YYYY=year/MM=month/DD=day), while the working dataset is not partitioned.

Any clues?


Hello @BJ_Choi,

Forgive me if you already checked, but is bucket ‘xxxx’ actually present in S3?
It’s possible that Dremio has stale metadata.
On the S3 source, what are the top-level dataset discovery settings?

Hi, Ben

Yes, the bucket ‘xxxx’ is present in S3. I can query it fine via AWS Athena.

My ‘Edit Source : Metadata’ screen looks identical to yours.

Via ‘Settings’ on the dataset, Dremio is able to show data in the partitioned file (see below). I get the error when I try to query it from the SQL Editor.

Even though the query fails, there may still be a job associated with it; can you supply the profile if one is present?

I’d like to see the complete error stack.

Thanks, Ben. How do I grab the profile?

If the error can’t be found there, it may be in the logs, in particular in server.log.

The downloaded profile is attached.

In the server’s queries.json log I see this entry:

{"queryId":"237ea235-d876-2350-f452-26731c2e3c00","schema":"[s3, ttam-datalake-dev-d]","queryText":"SELECT * FROM mailgun","start":1551982025451,"finish":1551982025487,"outcome":"FAILED","username":"dremio_admin"}
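For anyone hitting the same thing: queries.json is newline-delimited JSON, so failed queries can be pulled out with a few lines of Python. This is a minimal sketch; the field names match the entry above, but the log path varies by install, so adjust for your deployment:

```python
import json

def failed_queries(lines):
    """Yield queries.json entries whose outcome is FAILED."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("outcome") == "FAILED":
            yield entry

# Demo on the entry quoted above; in practice you would iterate
# over the open log file, e.g. open(".../log/queries.json").
sample = (
    '{"queryId":"237ea235-d876-2350-f452-26731c2e3c00",'
    '"schema":"[s3, ttam-datalake-dev-d]",'
    '"queryText":"SELECT * FROM mailgun",'
    '"start":1551982025451,"finish":1551982025487,'
    '"outcome":"FAILED","username":"dremio_admin"}'
)
for entry in failed_queries([sample]):
    print(entry["queryId"], entry["queryText"])
```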

d2093d0e-627a-47d9-85a1-04acbf6daea5.zip (18.0 KB)

It looks like the error reports a single file within one of the partitions:


The file could be absent, or there might be a permissions issue.

First let’s try running a refresh on that source in Dremio:
ALTER PDS s3."ttam-datalake-dev-d".mailgun REFRESH METADATA

Hi, Ben

I was able to execute the refresh command successfully, but the error didn’t go away.

So, having only 1 file per partition is a problem?



I don’t know whether you got your issue squared away, but I had a similar issue and this is what worked for me: remove the S3 source (the connection, not the PDS) and re-add it. I know this can be a pain if you already have a lot of PDSes defined on that connection, but that did it for me. The fact that this is what cleared it up makes me suspect the metadata is being cached on the executor while the refresh only runs on the master. Anyway, hope this helps.
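If you have many PDSes and want to script the remove/re-add rather than click through the UI, Dremio exposes a v3 catalog REST API that can delete and re-create sources. The sketch below only builds the requests; the endpoint paths, the `entityType`/`config` payload shape, and the credential field names are assumptions from my reading of the API docs, so verify them against your Dremio version before using:

```python
import json

def delete_source_request(source_id, tag):
    """Build the DELETE request for an existing source.

    Assumption: the v3 catalog endpoint takes the entity id in the
    path and a ?tag=... query parameter to guard against concurrent
    modification.
    """
    return {
        "method": "DELETE",
        "path": f"/api/v3/catalog/{source_id}?tag={tag}",
    }

def create_source_request(name, access_key, access_secret):
    """Build the POST body that re-creates an S3 source.

    Assumption: S3 source config uses accessKey/accessSecret fields;
    keep the secret out of version control.
    """
    return {
        "method": "POST",
        "path": "/api/v3/catalog",
        "body": {
            "entityType": "source",
            "type": "S3",
            "name": name,
            "config": {"accessKey": access_key, "accessSecret": access_secret},
        },
    }

# Hypothetical values for illustration only.
req = create_source_request("s3", "AKIA-EXAMPLE", "<secret>")
print(json.dumps(req["body"], indent=2))
```

Once the source is re-added, the PDSes have to be promoted again, which is exactly the pain mentioned above; weigh that against how often the stale-metadata error recurs.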