Dremio 14.0.0 fails to detect files in S3 when folder names start with an underscore "_"

I have an S3 bucket that uses a Hive-style partition key naming scheme, where all partition prefixes start with an underscore to avoid clashes with actual data columns.

Given this example:

  • I have files within the same bucket “my-bucket”:

    s3://my-bucket/data-source/_partition=part1/_date=2021-03-04/file1.json
    s3://my-bucket/data-source/_partition=part2/_date=2021-03-04/file2.json

  • When I navigate to s3://my-bucket and promote data-source to a physical dataset, Dremio gives me an error:

No files were found.

This worked with 13.1.0, but it stopped working after upgrading to 14.0.0.

In the release notes for version 3.0 I discovered a fixed issue which says:

Format previews did not work when a directory has ‘hidden’ files (files starting with an underscore in the file name).
Resolved by ignoring period and underscores in files when performing format previews.

Do you think these might be related?
Is anybody else facing the same problem, or does anyone know a solution?


@triduong.tran

That is right, Dremio currently ignores files and folders starting with an underscore.

Thanks
Bali

@balaji.ramaswamy I’m still confused: the release note states that only file names starting with an underscore are ignored, not prefixes/folders on S3 like those shown in the example. In my case the file names themselves do not start with an underscore, only the folders do. To me this behaviour is unexpected and should be fixed.
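
To make the distinction concrete, here is a minimal Python sketch of the two possible rules applied to the example keys. This is not Dremio's actual code; the function names are made up for illustration, and I am only assuming that "hidden" means a name starting with an underscore or a period, as the release note suggests.

```python
from pathlib import PurePosixPath

# Assumption: names starting with "_" or "." are treated as hidden.
HIDDEN_PREFIXES = ("_", ".")


def hidden_by_filename(key: str) -> bool:
    """Rule A (hypothetical): hidden only if the file name itself is hidden."""
    return PurePosixPath(key).name.startswith(HIDDEN_PREFIXES)


def hidden_by_any_component(key: str) -> bool:
    """Rule B (hypothetical): hidden if any folder or the file name is hidden."""
    return any(
        part.startswith(HIDDEN_PREFIXES) for part in PurePosixPath(key).parts
    )


# Object keys from the example, relative to the bucket root.
keys = [
    "data-source/_partition=part1/_date=2021-03-04/file1.json",
    "data-source/_partition=part2/_date=2021-03-04/file2.json",
]

# Rule A keeps both files, because file1.json and file2.json are not hidden.
visible_filename_rule = [k for k in keys if not hidden_by_filename(k)]

# Rule B drops both files, because the _partition=... folders are hidden.
visible_component_rule = [k for k in keys if not hidden_by_any_component(k)]

print(visible_filename_rule)
print(visible_component_rule)
```

Under rule A both files remain visible, which matches how I read the release note. Under rule B nothing is left, which would explain the "No files were found." error.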

@balaji.ramaswamy would you mind elaborating on your view of this?

@triduong.tran

Let me check on this internally and get back to you.