Hi, can we have Dremio query via the AWS Glue Data Catalog? I saw a thread from a year ago where you said you are “watching closely”… Are you still watching, or do you have a way for us to query all our AWS lake data?
We will indeed be adding Glue metastore support in an upcoming release. Stay tuned…
When exactly will that release be available?
We are still figuring out which exact release, but likely towards the end of the year or soon afterwards.
Thanks. While we wait on that, is any integration coming soon with just AWS Athena?
Hi, any update on the AWS Glue Data Catalog integration?
You can easily connect AWS Glue to Athena, then connect Dremio to Athena.
Yeah, but that way you push the work down to Athena instead of Dremio, incurring the extra cost of Athena scanning the data. I would like to scan the data with the Dremio cluster.
I’m not an expert on AWS Glue, but Glue does not store data itself. It only orchestrates and runs ETL jobs, then stores the data in some target data store (S3), so you can connect Dremio directly to that target S3 location…
More info: https://medium.com/redbus-in/data-lake-formation-with-aws-glue-apache-drill-9f770a738100
and https://docs.aws.amazon.com/glue/latest/dg/components-key-concepts.html (Tables and databases in AWS Glue are objects in the AWS Glue Data Catalog. They contain metadata; they don’t contain data from a data store)
But if you really need to query Glue directly (not recommended), you can develop a custom Dremio connector using the Glue API and a custom JDBC driver.
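To illustrate the point above that Glue Data Catalog tables contain only metadata, here is a minimal Python sketch that lists each table's schema and the S3 location it points to. It assumes boto3 is installed and AWS credentials and a region are configured; the helper names `table_locations` and `list_glue_tables` are made up for this example and are not part of Dremio or Glue.

```python
from typing import Any, Dict, Iterable, List


def table_locations(pages: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Flatten Glue GetTables response pages into table/location/columns records.

    Hypothetical helper: it only reads the metadata fields that the Glue
    GetTables API returns (TableList, StorageDescriptor, Location, Columns).
    """
    rows = []
    for page in pages:
        for table in page.get("TableList", []):
            sd = table.get("StorageDescriptor", {})
            rows.append({
                "table": table["Name"],
                # Where the data actually lives (typically an s3:// path)
                "location": sd.get("Location"),
                "columns": [(c["Name"], c.get("Type")) for c in sd.get("Columns", [])],
            })
    return rows


def list_glue_tables(database: str) -> List[Dict[str, Any]]:
    """Fetch all tables of one Glue database (hypothetical wrapper, needs AWS access)."""
    import boto3  # imported here so table_locations can be used without boto3

    paginator = boto3.client("glue").get_paginator("get_tables")
    return table_locations(paginator.paginate(DatabaseName=database))
```

Calling `list_glue_tables("my_database")` (the database name is a placeholder) returns one record per table. The `location` values are the paths you would point an S3 source at if you connect Dremio directly to the underlying data.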
Yes, I was talking about AWS Glue Data Catalog. My interest is in accessing the metadata in the Catalog, serving as a Metastore for a Dremio cluster.
I am waiting for the same feature as well.
Does anyone know if this was done? Is it possible to connect to tables in the Glue Data Catalog as a Hive Metastore? @tshiran
This will be in 4.6, which will be available in a couple of weeks.