Hello guys,
Is this on your radar? I know about the masking features, but piping in a rules repository and supporting it in the final stages of serving data would make Dremio stand out.
Can you give an example for how you would use this as part of processing a query? How are you using it with other systems today?
Masking PII for various users/groups or assets, or applying transformations to rows (e.g. removing the store's share from a price; we're a mobile app company). Our data lives in Hadoop, exposed and governed as Hive tables (views would help here), and we need that last step: integrating the rules. These rules must also be shared, from one central place, with the real-time data (the same copy as in Hadoop, only served in real time).
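A minimal sketch of the two kinds of rules described above (per-group PII masking and removing the store's share from a price), written as a plain Python function rather than as any Dremio or Drools feature. The 30% store share, the `compliance`/`support` groups, the `Purchase` row shape and the `apply_rules` name are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical store commission; the thread does not state the real share.
STORE_SHARE = 0.30

# Hypothetical groups allowed to see unmasked PII.
PII_GROUPS = frozenset({"compliance", "support"})


@dataclass(frozen=True)
class Purchase:
    """Hypothetical row shape: a customer email and the gross price paid."""
    email: str
    gross_price: float


def mask_email(email: str) -> str:
    """Keep the first character and the domain, hide the rest of the local part."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}" if domain else "***"


def apply_rules(row: Purchase, user_groups: set) -> dict:
    """Apply the masking rule and the price rule to one row for the querying user."""
    # Masking rule: only members of a PII group see the raw email.
    email = row.email if user_groups & PII_GROUPS else mask_email(row.email)
    # Transformation rule: strip the store's share from the gross price.
    net_price = round(row.gross_price * (1 - STORE_SHARE), 2)
    return {"email": email, "net_price": net_price}


if __name__ == "__main__":
    row = Purchase("alice@example.com", 10.0)
    print(apply_rules(row, {"marketing"}))
    print(apply_rules(row, {"compliance"}))
```

The point of keeping the rules in one function (or one rule base) is that batch queries over Hive and the real-time feed can call the same definition instead of each re-implementing it.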
If Dremio, as a "last mile" tool, supported integration with a business rules engine, that would set it apart. I mentioned Drools because it seems the most popular and widely supported.
In short, data governance.
Are you able to use it with Hive or some other engine?
You can definitely implement this kind of logic in Dremio using SQL today, and we are working on ways to make it easier to express and manage.
Thanks for the color and use case.
We'll probably implement UDFs and require users to call a long-named "validate_business_rules" UDF. But calling a UDF is optional, so nothing enforces it; we wanted something more controlled, integrated and provided out of the box. I thought I'd tease you with a use case that's hot and not usually covered by big data tools: data governance.
Interesting. We were just looking into using Drools.
Couldn't Dremio easily sit on the input side of Drools (serving all your sourced data) and then again on the output side: once Drools has applied the rules logic, write the results to a data store of your choice (S3, a database, etc.) and have Dremio pick that data back up?