S3 (minio) errors while getting data from reflection

deem0n · April 6, 2020, 9:03am

Hello,

Dremio Build 4.1.8-202003120636020140-9c2a6b13 Community Edition

I’ve connected Dremio to a private cloud S3 storage. It is not AWS but a minio-compatible service, so I used the minio settings to configure the shared storage for reflections.

My source file is a bzipped CSV located on the same S3 storage. I created a type conversion and a reflection on it, and now I am trying to run queries against it.

While count(*) works fine, select distinct vendor_id results in an error.

Please check the full Java stack trace here: dremio.stacktrace.txt · GitHub

The most suspicious lines look like this:

Failed request for bucket dremio, path /distributed-storage/accelerator/216b0949-f3b0-4ac5-a024-fa4a5d432fea/70f0ef0e-da92-44ed-b1c8-2d20a8923a24_0/0_0_8.parquet for bytes=267386880-268435455

I have 33 parquet files and thus 33 errors. All the errors concern the tails of the files: it seems Dremio tries to read data that is not present, i.e. past the end of the file. For example, the actual size of 0_0_8.parquet is 267.9 MB (267,905,381 bytes).

Another example:

2020-04-06 08:23:31,584 [s3-read-209] ERROR c.d.p.s.s.S3AsyncByteReaderUsingSyncClient - [e0 - 217518fe-1e45-b694-6ebc-418be5575100:frag:2:5] Failed request for bucket dremio, path /distributed-storage/accelerator/216b0949-f3b0-4ac5-a024-fa4a5d432fea/70f0ef0e-da92-44ed-b1c8-2d20a8923a24_0/0_0_12.parquet for bytes=267386880-268435455, took 568 ms

Real file size: 267.6 MB (267,564,957 bytes)
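For reference, the arithmetic behind the two log lines above can be checked with a minimal sketch. It assumes standard HTTP range semantics, under which a range that starts inside the object is satisfiable and its end is clamped to the last byte. The function name `clamp_range` is hypothetical and is not part of Dremio or any S3 SDK.

```python
def clamp_range(first, last, size):
    """Return the (first, last) byte positions a compliant server would serve,
    or None if the range starts at or past the end of the object."""
    if first >= size:
        return None
    return first, min(last, size - 1)


MIB = 1024 * 1024
# The range from the log lines: bytes=267386880-268435455
requested = (255 * MIB, 256 * MIB - 1)

# File sizes as reported in the post.
for name, size in [("0_0_8.parquet", 267_905_381), ("0_0_12.parquet", 267_564_957)]:
    served = clamp_range(requested[0], requested[1], size)
    length = served[1] - served[0] + 1
    print(name, served, length)
# 0_0_8.parquet (267386880, 267905380) 518501
# 0_0_12.parquet (267386880, 267564956) 178077
```

In both cases the requested range starts inside the file and only its end overshoots, so under these assumptions each request targets the final, partial 1 MiB chunk rather than data lying entirely beyond the file.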

Error message in UI is:

software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: dremio.s3.gagalabs.ru

Is this a bug in Dremio, or is this S3 implementation unsupported as file storage?
How can we fix it?

Thanks

ben · April 6, 2020, 4:03pm

@deem0n,

As a workaround, have you tried disabling the reflection on the CSV and re-enabling it? If all the errors complain about the same reflection/materialization (216b0949-f3b0-4ac5-a024-fa4a5d432fea/70f0ef0e-da92-44ed-b1c8-2d20a8923a24), starting a fresh build of the reflection with a new ID and new files may address the issue.

Also, further down in the error stack I see:
Caused by: java.net.UnknownHostException: dremio.s3.gagalabs.ru

Is this the Minio server host?

deem0n · April 6, 2020, 4:38pm

Yes, I’ve tried rebuilding the reflection a couple of times, with the same problem (just a new GUID in the filenames).

Regarding the S3 server: s3.gagalabs.ru is supposed to be the host name, and dremio is the bucket name. It seems the S3 protocol uses DNS to combine them into longer names like bucket.h.o.s.t, but I believe network connectivity is fine, as I see no problems with reflection generation.
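The DNS behaviour described above matches the two common S3 addressing styles: virtual-hosted style, where the bucket becomes part of the host name, and path style, where the bucket is the first path segment. The sketch below only illustrates how the two URLs differ; the helper `object_url` is hypothetical, the `https` scheme is an assumption, and the host is the placeholder name used in this thread.

```python
from urllib.parse import urlsplit


def object_url(endpoint, bucket, key, path_style):
    """Build an object URL in path style or virtual-hosted style."""
    parts = urlsplit(endpoint)
    key = key.lstrip("/")
    if path_style:
        # Bucket is the first path segment: host/bucket/key
        return f"{parts.scheme}://{parts.netloc}/{bucket}/{key}"
    # Bucket is prepended to the host name: bucket.host/key
    return f"{parts.scheme}://{bucket}.{parts.netloc}/{key}"


endpoint = "https://s3.gagalabs.ru"  # placeholder host; scheme is assumed
key = "/distributed-storage/accelerator/example.parquet"  # illustrative key

print(object_url(endpoint, "dremio", key, path_style=False))
print(object_url(endpoint, "dremio", key, path_style=True))
```

The host name dremio.s3.gagalabs.ru in the error message has the virtual-hosted form, which requires the bucket-prefixed name to resolve in DNS. Whether that played any part in the failure is not established in this thread.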

Please note that I replaced the real host name in the logs with a fake one, so you cannot ping s3.gagalabs.ru.

Also, I am not sure the S3 server is really minio, but I can clarify that with the client.

deem0n · April 8, 2020, 8:55am

I checked with the S3 provider, and they confirmed that range requests were not working properly on their side. So it is not a Dremio bug. Thanks to all who tried to help!
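For anyone who hits something similar with another S3-compatible provider, one way to narrow it down is to compare a ranged GET response with what HTTP range semantics prescribe. The sketch below assumes you have already obtained the status code, the Content-Range header and the body length with any HTTP client; the function name `range_response_problems` is hypothetical, and the thread does not say how the provider's range handling actually failed.

```python
import re


def range_response_problems(status, content_range, body_len, first, last, size):
    """List deviations from a compliant 206 response to bytes=first-last.

    Assumes first < size, i.e. the requested range is satisfiable.
    """
    expected_last = min(last, size - 1)
    expected_len = expected_last - first + 1
    problems = []
    if status != 206:
        problems.append(f"expected status 206, got {status}")
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", content_range or "")
    if not match:
        problems.append(f"missing or malformed Content-Range: {content_range!r}")
    elif (int(match.group(1)), int(match.group(2))) != (first, expected_last):
        problems.append(f"expected bytes {first}-{expected_last}, got {content_range!r}")
    if body_len != expected_len:
        problems.append(f"expected {expected_len} body bytes, got {body_len}")
    return problems


# A compliant answer for the 0_0_8.parquet request from the first post.
print(range_response_problems(
    206, "bytes 267386880-267905380/267905381", 518501,
    267386880, 268435455, 267905381))  # prints []

# An illustrative non-compliant answer (not taken from the thread).
print(range_response_problems(416, None, 0, 267386880, 268435455, 267905381))
```

An empty list means the response is consistent with a correctly clamped range; anything else is worth raising with the storage provider.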
