Data Governance / Expanding Data Catalogue Functionality


#1

Hi Team,

Let me start by saying what a great product you guys have! I believe it will be a game changer…

Now, I am looking into adopting Dremio into our organisation; however, after reviewing the open source version of the product I am not quite sure how I could use Dremio’s data catalogue…

Requirements: I would like to have in place a centralised environment where people from the organisation can search for data, review the data (data profiling), request access, and also have access to our business glossary (as opposed to a technical glossary). Of course, I don’t expect Dremio to do all this, as these are the functions of a data governance tool such as Alation or Collibra (to name a few). Based on this, here are my questions:

  1. Does the paid version of the product provide additional data governance/catalogue capabilities?
  2. Would it be possible to connect Dremio to products such as Alation, perhaps via the Web API? I know the technical answer would be yes; however, how would you go about it? Any particular recommendations? I would like to achieve a unified/seamless experience for my end users.
  3. Can you (or we) extend Dremio’s functionality to allow for some of these governance/catalogue capabilities? I would love to at least have a business glossary and be able to add tags to the different data sets, capturing information such as who the data owner is, who the data steward is, what the quality of the data is (some kind of user-driven rating), and who is using the data (all of which would require some level of connectivity to AD). At the end of the day, it is about connecting people!
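
To make the third point a bit more concrete, the metadata in question could be sketched roughly as below. This is only an illustration of the shape of the data: `DatasetMetadata`, its fields, and the 1 to 5 rating scale are hypothetical names and assumptions, not part of Dremio or of any existing catalogue API.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, Optional, Set


@dataclass
class DatasetMetadata:
    """Hypothetical governance record for one dataset (not a Dremio type)."""
    path: str
    owner: Optional[str] = None
    steward: Optional[str] = None
    tags: Set[str] = field(default_factory=set)
    glossary_terms: Set[str] = field(default_factory=set)
    ratings: Dict[str, int] = field(default_factory=dict)  # user -> score

    def rate(self, user: str, score: int) -> None:
        # Assumed scale of 1 to 5; a later rating replaces the user's earlier one.
        if not 1 <= score <= 5:
            raise ValueError("score must be between 1 and 5")
        self.ratings[user] = score

    def average_rating(self) -> Optional[float]:
        # None means nobody has rated the dataset yet.
        return mean(self.ratings.values()) if self.ratings else None


# Example record with made-up dataset and user names.
sales = DatasetMetadata(path="finance.sales", owner="alice", steward="bob")
sales.tags.update({"finance", "pii"})
sales.rate("carol", 4)
sales.rate("dave", 5)
```

Owner and steward are plain strings here; in practice they would presumably be identities resolved against AD.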

Keep in mind that I am also asking similar questions on the opposite end; for instance, I reached out to Alation asking for options to ingest data from third-party data sources such as Dremio…

Your ideas and feedback are very welcome!


#2

Answers to your questions:

  1. Enterprise Edition includes a) LDAP integration, b) RBAC, c) dynamic masking abilities, d) data provenance & lineage, and a few other things.

  2. Yes, this should be possible, and we have some customers who are exploring such an integration. Unfortunately, I do not have additional details I can share here.

  3. Much of what you are asking for is coming in Dremio 3.0, due out later this year. If you’re interested in providing some feedback into our process, ping @can and he can set up a chat.

Thanks for the feedback!


#3

Kelly,

Thanks for your reply, I will definitely get in touch with @can.


#4

Hi @kelly,

There are other questions like this one on the forum. It would in fact be very interesting to manage semantics and tags, and to be able to describe and comment on datasets on the platform. A good overview of features that you could add to Dremio in the data governance area is what Airbnb Nerds describe in this presentation:

Scaling tribal knowledge at Airbnb

Video


Presentation

Cheers


#5

@can

In the context of the requirements above from @akikax - is it correct to think of the catalog API as being generic?

  1. Not coupled to the exact implementation in Dremio, extensible, etc
  2. Aiming to become an industry standard for data management, as opposed to just a (very powerful) technology?

From what I have learned so far, it seems that it is, but it is not promoted this way, and there is a “risk” of it being perceived as a technology only when, in fact, it could be a standard for integrating custom data lakes with both application logic and more business-oriented technologies and apps.

As an architect, can I promote it in this sense (is this the product intent / direction)?

Thanks


#6

The ability to search catalog info and semantic annotations using Elasticsearch or Solr would also be powerful indeed.

But it would be hard to generalize in an API, given that semantic annotations may extend outside Dremio (in a wiki, for example). Maybe a basic library plus the ability to customize the indexing workflow? (thinking out loud)
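
A very rough sketch of what such a "basic library with a customizable indexing workflow" might look like: each annotation source (catalog, wiki, and so on) is just a function that returns text for a dataset, and the index combines whatever sources are plugged in. Everything here is hypothetical, including the names `CatalogIndex` and `AnnotationSource`; the in-memory inverted index is only a stand-in for Elasticsearch or Solr, and nothing below is an existing Dremio API.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set

# An annotation source maps a dataset path to free text (catalog, wiki, ...).
AnnotationSource = Callable[[str], str]


class CatalogIndex:
    """Toy inverted index; a stand-in for Elasticsearch or Solr."""

    def __init__(self, sources: Iterable[AnnotationSource]) -> None:
        self.sources = list(sources)
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def index(self, dataset: str) -> None:
        # Pull text from every plugged-in source and record each token.
        for source in self.sources:
            for token in re.findall(r"\w+", source(dataset).lower()):
                self.postings[token].add(dataset)

    def search(self, query: str) -> List[str]:
        # Return datasets whose annotations contain every query token.
        tokens = re.findall(r"\w+", query.lower())
        if not tokens:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return sorted(hits)


# Made-up annotations: one source inside the catalog, one in an external wiki.
catalog_notes = {"finance.sales": "Monthly sales by region", "hr.staff": "Employee roster"}
wiki_notes = {"finance.sales": "Owned by the revenue team"}

index = CatalogIndex([
    lambda d: catalog_notes.get(d, ""),
    lambda d: wiki_notes.get(d, ""),
])
for name in catalog_notes:
    index.index(name)
```

Here a search for "revenue sales" matches `finance.sales` only because the wiki source was plugged in alongside the catalog one, which is the kind of customization meant above.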