Now that I have Dremio set up, I am trying to integrate my workflows. There is one capability I don’t see and I’m wondering if I’m missing something.
I have data that needs a lot of transformation. Doing it in SQL is too complex, so I use Python scripts instead. It’s important to me to be able to track data history and lineage. Can I do that with Dremio?
It seems to me that to replicate my workflow with Dremio, I need to set my original data as a source, connect to Dremio with Python and run my scripts, then put the transformed data into Dremio as a new dataset. That gets the data into Dremio, but someone opening it can’t tell where it came from or how it was transformed. That tracking would have to happen outside Dremio, with some naming convention and a reference to the script in version control. Does that seem right?
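To make the "outside Dremio" part concrete, here is a minimal sketch of the kind of bookkeeping I mean. It makes no Dremio calls at all. The function names, the record fields, and the `source__script__commit` naming convention are all my own hypothetical choices, not anything Dremio provides, and the dataset path, script path, and commit hash in the example are made up.

```python
import hashlib
import json
import os
from datetime import datetime, timezone


def build_lineage_record(source_dataset, script_path, script_text, commit):
    """Describe how a derived dataset was produced.

    The field names are a hypothetical convention, not a Dremio feature.
    """
    script_hash = hashlib.sha256(script_text.encode("utf-8")).hexdigest()
    return {
        "source_dataset": source_dataset,   # where the input data came from
        "script_path": script_path,         # path of the script in version control
        "script_sha256": script_hash,       # fingerprint of the exact script text
        "git_commit": commit,               # commit the script was run from
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


def derived_dataset_name(source_dataset, script_path, commit):
    """Encode source, script, and short commit hash in the dataset name."""
    base = source_dataset.split(".")[-1]
    script = os.path.splitext(os.path.basename(script_path))[0]
    return f"{base}__{script}__{commit[:7]}"


# Example with made-up values.
record = build_lineage_record(
    source_dataset="raw.sales.orders",
    script_path="transforms/clean_orders.py",
    script_text="print('transform goes here')",
    commit="0123456789abcdef",
)
name = derived_dataset_name(
    "raw.sales.orders", "transforms/clean_orders.py", "0123456789abcdef"
)
print(name)
print(json.dumps(record, indent=2))
```

The record could be stored next to the script in version control or attached to the dataset's description, so anyone who opens the dataset can find the script and commit that produced it.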
Relatedly, it seems Dremio requires users to set up and manage their own Python environments. I have previously worked with Civis Platform, which provides containers for users to run scripts against their data, and which tracks all of the scripts executed. Am I right in thinking Dremio doesn’t have something like that?