I am playing around with pyodbc to query data sources from Dremio and store the result in a DataFrame. I was wondering if the reverse is possible: can I take a DataFrame I’ve created (say, from web scraping) and save it as a file in Dremio?
As a newbie… why would you use Dremio from Python to query a data source? Wouldn’t it be better to connect to the data source directly? But I like your idea of externalizing a DataFrame to Dremio. Then again, as far as I know Dremio cannot do DML, only CTAS. I really wish it could do DML as well; that would make it a really cool tool.
Hey Raju, thank you for your response! Where I work we have a large HPC cluster with many users. It would be nice for users to be able to submit jobs that pull the data they need from large data lakes and then perform whatever processing they want. For example, use Dremio to query/transform their data and then run some analytics in Python. This can be done with the Dremio ODBC driver and pyodbc.
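The Dremio-to-DataFrame step described above can be sketched roughly as follows. This is a minimal sketch, assuming the Dremio ODBC driver is installed and configured as an ODBC DSN; the DSN name, space, and dataset names in the usage comment are hypothetical placeholders:

```python
def query_dremio(dsn: str, sql: str):
    """Run a SQL query against Dremio over ODBC, return a pandas DataFrame.

    Assumes the Dremio ODBC driver is installed and a DSN with the given
    name is configured on the machine (DSN name is a placeholder).
    """
    # Local imports keep the sketch importable even without the driver present.
    import pyodbc
    import pandas as pd

    conn = pyodbc.connect(f"DSN={dsn}", autocommit=True)
    try:
        # pandas can read straight from a pyodbc connection into a DataFrame.
        return pd.read_sql(sql, conn)
    finally:
        conn.close()

# Example usage (needs a running Dremio and a configured DSN):
# df = query_dremio("Dremio", 'SELECT * FROM "my_space"."my_dataset" LIMIT 100')
```

The same function can be called from a batch job on the cluster, so users never have to open the Dremio UI to pull data.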
Now that I think of it, if I save my DataFrame to a directory under a NAS data source, it should be available in Dremio. The reason I ask is that I was hoping to find a way to use Dremio for ETL in addition to other things. From an automation standpoint, it would be nice for users not to have to open the UI, but instead write scripts that run at different times via crontab to push/pull data through Dremio. So I kind of answered my own question.
Thanks again!