Complete data pipeline

Can anyone share a list of references for implementing a complete data pipeline using Dremio?

What are the best ways to ingest data into Dremio and process it? I would like a comparison of the different ingestion options and guidance on which scenarios call for which tool.

I have seen references stating that data can be processed using Spark, and others stating it can be processed using dbt. When should I use Spark and when should I use dbt?

@sksankar2006 What format is your source data in? Is it CSV/TXT/PARQUET?

Hi, it will be in multiple formats. We are building a new data lakehouse setup that could grow exponentially in size. The file formats will include CSV, JSON, and .dat, and we plan to add Kafka as a streaming service to ingest streaming data. What I am looking for is reference architecture documentation that covers:

  • The best way to implement it.
  • Whether to use Spark, dbt, or the native SQL in Dremio itself.
  • If I have to mix all of these, where each of them fits.
  • How to maintain CI/CD for a Dremio project. I am not finding a reference apart from one document that mentions dbt.
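For context on the dbt option: with the dbt-dremio adapter, a transformation is just a SQL file that dbt compiles and runs against Dremio, so CI/CD largely reduces to running dbt commands in a pipeline. A minimal sketch of what such a model might look like (the model name, source names, and column names here are hypothetical examples, not from any official reference):

```sql
-- models/staging/stg_orders.sql  (hypothetical model and source names)
-- dbt renders the Jinja below and submits the resulting SQL to Dremio.
{{ config(materialized='table') }}

select
    order_id,
    cast(order_ts as timestamp) as order_ts,
    amount
from {{ source('raw', 'orders_csv') }}  -- source pointing at raw CSV files in the lake
where order_id is not null
```

A CI job would then typically run `dbt deps` and `dbt build` against a test target, and promote to production only when the build and its tests pass.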