When I create a physical dataset on an S3 datasource (specifically, a Hive-partitioned dataset), Dremio shows me a sample of the data and successfully creates the dataset.
However, when I re-open the dataset in the SQL Editor, I get an ‘Unable to find bucket named xxx’ error.
I am able to create a dataset on another S3 bucket without issue.
Both buckets have hyphens in their names.
The problem bucket has data files nested two folder levels deep, while the working dataset has only one level.
The problem bucket’s data files are Hive-partitioned three levels deep (YYYY=year/MM=month/DD=day), while the working dataset is not partitioned.
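For readers unfamiliar with the layout: Hive-style partitioning encodes partition columns as key=value folder names in the object path. A minimal sketch of how such a path decomposes (the object key below is hypothetical, modeled on the dataset in this thread):

```python
# Sketch: extract Hive-style partition columns (key=value folder names)
# from an S3 object key. The example key is hypothetical.

def parse_hive_partitions(key: str) -> dict:
    """Return the key=value partition folders found in an object key."""
    partitions = {}
    for segment in key.split("/"):
        if "=" in segment:
            name, _, value = segment.partition("=")
            partitions[name] = value
    return partitions

# A three-level layout like the one described above:
print(parse_hive_partitions("mailgun/YYYY=2019/MM=03/DD=07/data.parquet"))
# → {'YYYY': '2019', 'MM': '03', 'DD': '07'}
```

Query engines such as Dremio, Athena, and Hive prune partitions by matching these folder names against filter predicates, which is why the extra folder levels are relevant here.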
Forgive me if you already checked, but is the bucket ‘xxxx’ actually present in S3?
It’s possible that Dremio has stale metadata.
On the S3 source, what are the top-level dataset discovery settings?
Yes, the bucket ‘xxxx’ is present in S3. I can query it fine via AWS Athena.
My ‘Edit Source : Metadata’ screen looks identical to yours.
Via ‘Settings’ on the dataset, Dremio is able to show data in the partitioned file (see below). I get the error only when I try to query it from the SQL Editor.
{"queryId":"237ea235-d876-2350-f452-26731c2e3c00","schema":"[s3, ttam-datalake-dev-d]","queryText":"SELECT * FROM mailgun","start":1551982025451,"finish":1551982025487,"outcome":"FAILED","username":"dremio_admin"}
Don’t know whether you got your issue squared away, but I had a similar issue, and this is what worked for me: remove the S3 source (the connection, not the PDS) and re-add it. I know this can be a pain if you already have a lot of PDSes defined on that connection, but that did it for me. The fact that this cleared it up makes me suspect the metadata is being cached on the executor while the refresh is only executed on the master. Anyway, hope this helps.