When you connect to a Dremio instance a large number of schema names are returned.
This includes mappings to external Parquet files in S3 under data lake section of Admin UX.
Dremio server 4.9.3-202010281843560195-edc49b6d
Dremio driver 4.9.1-202010230218060541-2e764ed0
From a consuming application perspective, many of the schemas returned are not expected and there appears to be no way to make Dremio suppress them by default.
The Resultset from DatabaseMetadata.getTables returns a null value for the schema name.
Persons familiar with other database systems would not expect to see the large number of schemas. The would expect to have some means to group those objects into fewer (i.e. one) logical schema. They certainly would not be expecting to see a potentially very large list of schemas by default.
The Dremio JDBC driver response from DatabaseMetadata.getTables claims qualified names is “DREMIO”.“A.B.C.D1.T1” and the simple name is DREMIO.A.B.C.D1.T1
Not able to completely follow your concern. Let me try to understand and help
Is one of your concerns about full qualified names? From JDBC that is mandatory
Is your other concern about, too many datasets returned by your call? What is the call you are trying to make? Is it something like everything under INFORMATION_SCHEMA.“TABLES”?
A simple JDBC program calls the DatabaseMetadata.getSchemas method.
ResultSet rs = dbMeta.getSchemas(null ,null);
The ResultSet returned by the Dremio is expected to include a row for each schema that is in the system.
There is no documentation for the driver etc that define that this is how Dremio is designed to work.
[$scratch],[DREMIO]
[@dbcert],[DREMIO]
…
[A],[DREMIO] <---- this would be reference to S3
[A.B],[DREMIO] <------ this is the bucket in S3
[A.B.C],[DREMIO] <-- this is a folder in S3
[A.B.C.D1],[DREMIO] <-- this would be a folder in S3
[A.B.C.D2],[DREMIO]
[A.B.C.D3],[DREMIO]
…
[A.B.C.D.T9999],[DREMIO]
[dbcert],[DREMIO] <---- THIS is my schema
[sys],[DREMIO]
[sys.cache],[DREMIO]
I also mentioned that Dremio returns a CATALOG name of Dremio in the JDBC driver interface. However, the SQL parser is not expecting 3-part CATALOG.SCHEMA.TABLE names. So why is it returning a catalog etc?
Keep in mind, applications which use dynamic systems to get metadata from the DatabaseMetadata interface etc, will see these issues. People who hand code will typically be unaware etc.
The schemas returned by the driver are indeed all the schemas that have tables in them. Dremio supports a hierarchy of folders (not just two levels like most databases) so you could have a folder a.b.c and in it two tables foo and bar. In that case a.b.c would be one of the schemas returned.
WRT the original question, this won’t be a problem because files in the object store (e.g., a Parquet file in a folder in S3) are not tables unless someone actually promoted the file to a table.