Hello,
Dremio Build 4.1.8-202003120636020140-9c2a6b13 Community Edition
I’ve connected Dremio to private cloud S3 storage. It is not AWS but a MinIO-compatible service, so I used the MinIO settings to configure shared storage for reflections.
My source file is a bzipped CSV, located on the same S3 storage. I created a type conversion and a reflection on it, and I am now trying to run queries against it.
While count(*) works fine, select distinct vendor_id results in an error.
Please check the full Java stack trace here: dremio.stacktrace.txt · GitHub
The most suspicious lines look like this:
Failed request for bucket dremio, path /distributed-storage/accelerator/216b0949-f3b0-4ac5-a024-fa4a5d432fea/70f0ef0e-da92-44ed-b1c8-2d20a8923a24_0/0_0_8.parquet for bytes=267386880-268435455
I have 33 Parquet files and thus 33 errors. All of the errors concern file tails: it seems Dremio tries to read data that is not present, i.e. data past the end of the file. For example, the actual size of 0_0_8.parquet is 267.9 MB (267,905,381 bytes), while the requested range ends at byte 268,435,455.
Another example:
2020-04-06 08:23:31,584 [s3-read-209] ERROR c.d.p.s.s.S3AsyncByteReaderUsingSyncClient - [e0 - 217518fe-1e45-b694-6ebc-418be5575100:frag:2:5] Failed request for bucket dremio, path /distributed-storage/accelerator/216b0949-f3b0-4ac5-a024-fa4a5d432fea/70f0ef0e-da92-44ed-b1c8-2d20a8923a24_0/0_0_12.parquet for bytes=267386880-268435455, took 568 ms
Real file size: 267.6 MB (267,564,957 bytes)
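To make the mismatch concrete, here is a minimal Python sketch that compares the requested range from the log with the two real file sizes above. It is not Dremio code, and the helper name describe_range is made up for illustration. The numbers are copied from the log lines and file sizes in this post.

```python
# Requested byte range, copied from the "Failed request" log lines.
# The end offset is inclusive, as in an HTTP Range header.
RANGE_START = 267_386_880
RANGE_END = 268_435_455

# Real object sizes in bytes, as reported by the S3 storage.
FILE_SIZES = {
    "0_0_8.parquet": 267_905_381,
    "0_0_12.parquet": 267_564_957,
}


def describe_range(start, end, size):
    """Compare an inclusive byte range against an object of `size` bytes.

    Hypothetical helper for illustration only.
    """
    last_byte = size - 1  # offset of the final byte in the object
    return {
        "range_length": end - start + 1,
        "starts_inside_file": start <= last_byte,
        "bytes_available": max(0, min(end, last_byte) - start + 1),
        "bytes_past_eof": max(0, end - last_byte),
    }


if __name__ == "__main__":
    for name, size in FILE_SIZES.items():
        print(name, describe_range(RANGE_START, RANGE_END, size))
```

For both files the range is exactly 1 MiB long and starts inside the file. Only the end of the range lies past the end of the file, so each failing request is the read of the last, partial 1 MiB block.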
The error message in the UI is:
software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: dremio.s3.gagalabs.ru
Is this a bug in Dremio, or is this S3 implementation unsupported as file storage?
How can we fix it?
Thanks