S3 (minio) errors while getting data from reflection

Hello,

Dremio Build 4.1.8-202003120636020140-9c2a6b13 Community Edition

I’ve connected Dremio to a private cloud S3 storage service. It is not AWS but a minio-compatible service, so I used the minio settings to configure the shared storage for reflections.

The source file is a bzipped CSV located on the same S3 storage. I added type conversions, created a reflection on it, and am now trying to run queries against it.

While count(*) works fine, select distinct vendor_id results in an error.

Please check full Java stacktrace here: https://gist.github.com/deem0n/9f6e3239ebfb4874911b0ff923378464

The most suspicious lines look like this:

Failed request for bucket dremio, path /distributed-storage/accelerator/216b0949-f3b0-4ac5-a024-fa4a5d432fea/70f0ef0e-da92-44ed-b1c8-2d20a8923a24_0/0_0_8.parquet for bytes=267386880-268435455

I have 33 parquet files and thus 33 errors. All of the errors concern file tails: it seems Dremio requests a byte range that extends past the end of the file. For example, the actual size of 0_0_8.parquet is 267.9 MB (267,905,381 bytes), while the failed request asks for bytes up to 268,435,455.

Another example:

2020-04-06 08:23:31,584 [s3-read-209] ERROR c.d.p.s.s.S3AsyncByteReaderUsingSyncClient - [e0 - 217518fe-1e45-b694-6ebc-418be5575100:frag:2:5] Failed request for bucket dremio, path /distributed-storage/accelerator/216b0949-f3b0-4ac5-a024-fa4a5d432fea/70f0ef0e-da92-44ed-b1c8-2d20a8923a24_0/0_0_12.parquet for bytes=267386880-268435455, took 568 ms

Real file size: 267.6 MB (267,564,957 bytes)
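To make the arithmetic explicit, here is a small sketch comparing the requested range from the logs with the two reported file sizes. The range happens to be exactly the 256th 1 MiB block, and in both cases it starts inside the file and ends beyond it. As far as I understand HTTP range semantics (RFC 7233), a last-byte position past the end of the object should be treated as "up to the end", so a compliant server would be expected to answer with the remaining tail rather than fail. The helper name served_range is made up for illustration.

```python
MIB = 1024 * 1024


def served_range(first, last, size):
    """Byte positions a spec-compliant server would return for bytes=first-last.

    Returns None when the range starts at or beyond the end of the file
    (the unsatisfiable case, normally answered with HTTP 416).
    """
    if first >= size:
        return None
    # A last position past the end of the file is clamped to the final byte.
    return first, min(last, size - 1)


# Range copied from the log lines above.
requested = (267_386_880, 268_435_455)

# File sizes reported in the post.
sizes = {"0_0_8.parquet": 267_905_381, "0_0_12.parquet": 267_564_957}

for name, size in sizes.items():
    first, last = served_range(*requested, size)
    print(name, f"bytes {first}-{last}/{size}", f"({last - first + 1} bytes)")
```

So the reads themselves look like ordinary "last block" reads; whether they succeed depends on how the storage server handles a range that overshoots the end of the object.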

Error message in UI is:

software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: dremio.s3.gagalabs.ru

Is this a bug in Dremio, or is the S3 implementation used as file storage unsupported?
How can we fix it?

Thanks

@deem0n,

As a workaround, have you tried disabling the reflection on the CSV and re-enabling it? If all the errors complain about the same reflection/materialization (216b0949-f3b0-4ac5-a024-fa4a5d432fea/70f0ef0e-da92-44ed-b1c8-2d20a8923a24), starting a fresh build of the reflection with a new ID and new files may address the issue.

Also, further down in the error stack I see:
Caused by: java.net.UnknownHostException: dremio.s3.gagalabs.ru

Is this the Minio server host?

Yes, I’ve tried rebuilding the reflection a couple of times, with the same result (just new GUIDs in the file names).

Regarding the S3 server: s3.gagalabs.ru is the host name and dremio is the bucket name. It seems the S3 client combines them into a longer DNS name of the form bucket.h.o.s.t (virtual-hosted-style addressing), but I believe network connectivity is fine, since reflection generation works without problems.
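For anyone puzzled by the dremio.s3.gagalabs.ru host name in the stack trace, here is a rough sketch of the two common ways an S3 client can form a request URL from an endpoint, a bucket, and a key. The function names and the shortened object key are hypothetical, and the https scheme is an assumption; the endpoint is the placeholder host used in this thread.

```python
def virtual_hosted_url(endpoint, bucket, key, scheme="https"):
    """Bucket becomes part of the DNS name: bucket.endpoint/key."""
    return f"{scheme}://{bucket}.{endpoint}/{key.lstrip('/')}"


def path_style_url(endpoint, bucket, key, scheme="https"):
    """Bucket is the first path segment: endpoint/bucket/key."""
    return f"{scheme}://{endpoint}/{bucket}/{key.lstrip('/')}"


endpoint = "s3.gagalabs.ru"        # placeholder host from this thread
bucket = "dremio"
key = "/distributed-storage/accelerator/example.parquet"  # hypothetical key

print(virtual_hosted_url(endpoint, bucket, key))
print(path_style_url(endpoint, bucket, key))
```

With virtual-hosted-style addressing the bucket-prefixed name has to resolve in DNS, which is why a bucket.host name can show up in an UnknownHostException even though the configured endpoint is only the host.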

Please note that I replaced the real host name in the logs with a fake one, so you cannot ping s3.gagalabs.ru.

Also, I am not sure the S3 server is really minio, but I can clarify this with the client.

I checked with the S3 provider, and they confirmed that range requests were not working properly on their side. So it is not a Dremio bug. Thanks to all who tried to help!
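In case it helps someone who hits the same symptom, below is a minimal sketch of how the response to a tail range request could be sanity-checked. It is not tied to any particular S3 client: you pass in the status code, the Content-Range header, and the body length you got back from whatever tool you use. The function name is made up, and the sketch assumes the requested range starts inside the object.

```python
import re


def check_partial_response(status, content_range, body_len, first, last, size):
    """Return a list of problems found in a response to bytes=first-last.

    Assumes first < size. The expected last byte is clamped to the end of
    the object, as for a range that overshoots the file size.
    """
    problems = []
    want_last = min(last, size - 1)

    if status != 206:
        problems.append(f"expected status 206, got {status}")

    # This sketch expects an explicit total length; a '*' total, although
    # permitted by the HTTP spec, would be reported as malformed here.
    m = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+)", content_range or "")
    if not m:
        problems.append(f"missing or malformed Content-Range: {content_range!r}")
    else:
        got = tuple(int(g) for g in m.groups())
        if got != (first, want_last, size):
            problems.append(f"Content-Range {got} != {(first, want_last, size)}")

    if body_len != want_last - first + 1:
        problems.append(f"body has {body_len} bytes, expected {want_last - first + 1}")

    return problems
```

For example, with the 0_0_8.parquet numbers from the first post, a healthy server would be expected to produce an empty problem list for status 206, Content-Range "bytes 267386880-267905380/267905381", and a body of 518501 bytes.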