Support for DynamoDB

How can I get Dremio to query DynamoDB? Back in 2020, I saw a blog post stating we can query DynamoDB (Accelerating Queries with Dremio’s DynamoDB ARP Connector | Dremio).
Is there any plan for Dremio to natively support DynamoDB?

@rajupillai

Currently, DynamoDB is only available via an ARP connector, as outlined in the blog.

@rajupillai After messing around with the ARP connector plugged into the Simba JDBC driver for a while (on AWS Dremio v13.2.0), all I can say is that it’s currently only useful in extremely limited circumstances.

Here are some of the issues I’ve had to deal with:

  1. The latest compiled plugin is for v6.0.0, so I had to compile it against the latest v13 dremio-oss, which is v13.1.0.
  2. Metadata is a real pain for any large dataset, and it becomes a serious problem if you also have low read capacity set for archived data. The JDBC driver requires you to use a Schema Editor to read a sample of all your tables and save it to either a metadata table or a JSON file (again, there is no good documentation for this process). The Schema Editor can take hours to read large datasets with read capacity limits in place (see the rough estimate after this list).
  3. The Dremio driver has no way to configure a metadata JSON, so I had to modify the ARP connector to add config options to the JDBC connection URL, allowing me to set a path to my metadata files plus some other driver settings.
  4. Once I had configured the above, I was able to browse my tables and preview data; however, I could not run queries or add reflections. Because there is no native support, you have to make sure you set up the executor bootstrap to copy the plugins onto any instances that spin up (there is no mention of this in the blog/GitHub install info).
  5. After fixing the above I was able to start building a reflection, but I found that something would hang overnight, and I would need to restart Dremio in order to get previews again. The reflections would be marked as failed and unusable.
  6. It does stuff like this in the background: “INFO c.d.e.store.jdbc.JdbcDatasetMetadata - Took 26629032 ms to get row count for [DynamoDB”… (that’s roughly 7.4 hours) - probably not advisable on tables with billions of rows.
  7. You have to pay for the Simba JDBC driver license (and they don’t answer support queries during the evaluation period).
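
To put the “can take hours” part of item 2 into rough numbers: a scan that is throttled by provisioned read capacity has a hard lower bound on duration. The sketch below is just back-of-envelope math (the 50 GB table size and 100 RCU figures are made-up example values, and it assumes the Schema Editor’s sampling ends up reading a large fraction of the table):

```python
# Back-of-envelope estimate of how long a capacity-throttled full scan takes.
# DynamoDB semantics: 1 RCU = one 4 KB strongly consistent read per second,
# or two 4 KB eventually consistent reads per second.

def full_scan_hours(table_size_bytes: float, provisioned_rcu: float,
                    eventually_consistent: bool = True) -> float:
    """Rough lower bound on scan time when provisioned RCUs are the bottleneck."""
    bytes_per_rcu_per_sec = 4096 * (2 if eventually_consistent else 1)
    seconds = table_size_bytes / (provisioned_rcu * bytes_per_rcu_per_sec)
    return seconds / 3600

if __name__ == "__main__":
    # e.g. a 50 GB archived table deliberately left at 100 RCU to keep costs down
    print(f"{full_scan_hours(50e9, 100):.1f} hours")  # ~17 hours
```

So even a mid-sized archived table at low read capacity is an overnight job before you have run a single query.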

Altogether an extremely painful experience with no satisfactory conclusion as yet

As a side note, I see similar issues with the MySQL integration when dealing with billion-row tables, where it attempts to build reflections with a rather naive “SELECT * FROM {TABLENAME}”.
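
For contrast, this is roughly the access pattern I would expect on tables that size: read in primary-key order in bounded chunks rather than in one unbounded statement. A minimal sketch (pymysql; the host, table, and column names are hypothetical):

```python
# Keyset-paginated read - the opposite of a single unbounded SELECT *.
# Connection details, table and column names are placeholders.
import pymysql

conn = pymysql.connect(host="mysql-host", user="reader",
                       password="...", database="mydb")
last_id, batch_size = 0, 100_000
with conn.cursor() as cur:
    while True:
        cur.execute(
            "SELECT id, payload FROM big_table "
            "WHERE id > %s ORDER BY id LIMIT %s",
            (last_id, batch_size),
        )
        rows = cur.fetchall()
        if not rows:
            break
        # ... write the batch out (e.g. to Parquet files bound for S3) ...
        last_id = rows[-1][0]
conn.close()
```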

It looks like we will have to resort to exporting to S3, which is a real shame
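
In case it helps anyone else heading the same way, here is a minimal sketch of the export path I mean: paginate a Scan and land newline-delimited JSON in S3, which Dremio can then read as an ordinary S3 source. The table, bucket, and key names are placeholders, and note the items come back in DynamoDB’s typed JSON unless you deserialize them:

```python
# Paginate a DynamoDB scan and write newline-delimited JSON objects to S3.
# Bucket/table/prefix names are placeholders for illustration.
import json
import boto3

dynamodb = boto3.client("dynamodb")
s3 = boto3.client("s3")

def flush(batch, part):
    body = "\n".join(json.dumps(item, default=str) for item in batch)
    s3.put_object(Bucket="my-export-bucket",
                  Key=f"dynamodb-export/part-{part:05d}.json",
                  Body=body.encode("utf-8"))

batch, part = [], 0
for page in dynamodb.get_paginator("scan").paginate(TableName="my-archive-table"):
    # Items arrive in DynamoDB's typed format, e.g. {"id": {"S": "abc"}};
    # use boto3.dynamodb.types.TypeDeserializer if you want plain values.
    batch.extend(page["Items"])
    if len(batch) >= 50_000:
        flush(batch, part)
        batch, part = [], part + 1
if batch:
    flush(batch, part)
```

A plain Scan like this still burns read capacity, so the same throttling math from earlier applies; DynamoDB’s built-in point-in-time export to S3 sidesteps that if it’s an option for you.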