View to dereference hash-addressed blobs stored in S3 and referenced from Iceberg tables

Hello.

Dremio has a field size limit of 32K, which can be overridden with limits.single_field_size_bytes; however, I understand this limit exists for stability and performance reasons. In any case, best practice suggests very large fields should probably be stored elsewhere, such as in S3, with only a reference to the data kept in the table.
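
As a minimal sketch of that pattern (all names below are hypothetical, not something Dremio prescribes), the Iceberg table would carry only a fixed-length digest in place of the oversized column:

```sql
-- Hypothetical Iceberg table: the large value itself is not stored here, only a
-- 64-character hex SHA-256 digest that names an object under s3://my-bucket/objects/.
CREATE TABLE nessie.docs.documents (
    doc_id       BIGINT,
    created_at   TIMESTAMP,
    body_sha256  VARCHAR
);
```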

Suppose I have Iceberg data on MinIO S3 at s3://my-bucket/iceberg and large data fields in s3://my-bucket/objects. I would like to store SHA-256 hashes of my data in the Iceberg tables, with the data itself stored hash-addressed in S3, e.g. s3://my-bucket/objects/b5e170cce3c3849ecbb753ac172e18826a6e99ac523a000993d16e72bdde837a. I would then like to set up a view in locally-hosted Dremio that retrieves the data in a query.
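
To make the intent concrete: the nearest thing I can express in plain Dremio SQL today is a view that only rebuilds the hash-addressed object URI from the stored digest (names are illustrative, continuing the sketch above); actually dereferencing that URI to the blob’s contents is the part I’m asking about.

```sql
-- Illustrative only: this view yields the S3 URI of each blob, not its contents.
CREATE VIEW my_space.documents_with_uri AS
SELECT
    d.doc_id,
    d.created_at,
    CONCAT('s3://my-bucket/objects/', d.body_sha256) AS body_uri
FROM nessie.docs.documents AS d;
```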

Here’s where I’m stuck, though: when I add a non-Nessie, general S3 source to Dremio, it appears to be inaccessible to SQL queries until I promote some objects/folders to a tabular format (Parquet, CSV, delimited text, etc.). Until I do that, I cannot figure out how to access the data from a query. Once I define the tabular format, I can query it. However, the large objects I want to store are not necessarily going to be tabular data; they are just going to be the result of importing NVARCHAR(MAX) and VARBINARY(MAX) data from upstream OLTP SQL Server databases.

Is what I’m trying to do possible? Does it make sense?
Thank you for your help.

@ozzah If I understand it right, everything except the large objects will be in columnar Iceberg format, while the columns with large field widths would sit in row-wise CSV files? You can join them in Dremio, but those columns will still hit the field size limit. Or is the plan to keep the large objects in the upstream SQL Server database? That might work, since Dremio will push the query down to the source, and you can join the result with the Iceberg table.
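
For what it’s worth, once the large values live in some other Dremio-queryable source (a promoted CSV/Parquet dataset, or the relational source itself), the join is just a key lookup on the digest. Here is a rough sketch with made-up dataset and column names; note the wide column is still subject to limits.single_field_size_bytes when it flows through Dremio:

```sql
-- Rough sketch: join the Iceberg table to whichever promoted dataset (or
-- pushed-down relational source) holds the large values, keyed on the digest.
-- The wide "body" column is still subject to the field size limit in Dremio.
SELECT
    d.doc_id,
    d.created_at,
    b.body
FROM nessie.docs.documents AS d
JOIN s3."my-bucket"."objects_index.csv" AS b
  ON b.sha256 = d.body_sha256;
```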

Note: A field-width increase enhancement is also on the roadmap; we will update this thread with a ballpark ETA.