Error creating s3 reflection

Looks like the acceleration failed to create with this message:

“Failure while attempting to read metadata for "__accelerator"."591c069b-f248-4443-bffb-73d14a6e3532"."4d840196-9145-4e58-8c86-3a2713fa17e3".”

ca118f02-b684-4355-8088-321a1ca8729c.zip (18.5 KB)

Is this some sort of S3 permission issue or something? I feel like maybe this is a config issue…

Hi @jhaynie

Can you please try the following? In the core-site.xml under the Dremio conf folder on all of your executors, add this entry:

<property>
  <name>fs.s3a.connection.maximum</name>
  <value>5000</value>
</property>
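For reference, here is a minimal sketch of how that entry sits inside a complete core-site.xml (the surrounding `<configuration>` wrapper is the standard Hadoop layout; the comment about defaults is my understanding, not from this thread):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Raise the S3A connection pool limit; the Hadoop default is much
       lower and can be exhausted when Dremio reads/writes reflections -->
  <property>
    <name>fs.s3a.connection.maximum</name>
    <value>5000</value>
  </property>
</configuration>
```

Restart the executors after editing the file so the new setting is picked up.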

OK, that now seems to allow it to work correctly.

677f57f7-b890-4bbe-bf86-f7043ade39a0.zip (11.3 KB)

The reflection now works, but the reflected query is a lot slower than the original query. It's a really small set of files, yet it takes around 6s to run this really small query.

Any ideas?

Hi @jhaynie

If you look at the profile, your query takes ~7.7s to complete, of which 6.8s is wait time on the Parquet row group scan:

| Thread   | Setup Time | Process Time | Wait Time | Max Batches | Max Records | Peak Memory |
|----------|------------|--------------|-----------|-------------|-------------|-------------|
| 00-00-07 | 0.039s     | 0.047s       | 6.841s    | 2           | 26          | 8MB         |

Are your reflections on S3? Are the reflection files small? This is a known issue on cloud sources like ADLS and S3. Just to compare the speed of the reflections, would it be possible to store them locally?

Thanks
@balaji.ramaswamy

yes, this is a small test data set, so the files are super tiny, around 2 MB each, since there aren't many records at all in the current data set.

if we store locally, is there anything special we need to do in a 6-node cluster? In other words, do we need to mount EFS, or will the coordinator somehow distribute the reflections? How can we store them locally and still use the cluster?

Dremio will distribute the Data Reflections across the local storage of your nodes, see: https://docs.dremio.com/deployment/dremio-config.html#distributed-storage
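As a sketch, local distributed storage is configured via `paths.dist` in dremio.conf on every node; the `pdfs://` scheme is Dremio's mechanism for spreading data across the executors' local drives (the path below is illustrative, not from this thread):

```
paths: {
  # Local disk on each node; must exist on every executor
  local: "/var/lib/dremio"
  # Store reflections/uploads across the cluster's local drives
  dist: "pdfs://"${paths.local}"/pdfs"
}
```

The same `paths.dist` value has to be set consistently on the coordinator and all executors, then the cluster restarted.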