I’m quite new to the big data field as well as Dremio, so can you please advise?
Hue, Zeppelin, Jupyter, and BI tools are end-user tools that help you write queries and visualize data. We call these Data Consumer Tools. They all submit queries to a query processing engine such as Hive.
Most of these tools can connect to Dremio to issue queries (Hue is limited to Hadoop sources). Dremio includes a high-performance SQL execution engine based on Apache Arrow that can be used to query data in HDFS, S3, RDBMS, MongoDB, Elasticsearch, and others.
What may be confusing you is that Dremio also provides a GUI that might seem a little like Hue, Jupyter, and the others. The role of this GUI is to curate datasets, search Dremio’s data catalog, view data lineage, and perform administrative tasks. In fact, you could imagine the following scenario:
- search for data in Dremio
- preview dataset
- launch Jupyter notebook connected to the dataset over ODBC to Dremio for processing and queries.
And:
- search for data in Dremio
- perform transformations and joins to build a new dataset (zero copy: no data is duplicated)
- click button to launch Tableau connected to your new dataset.
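To make the Jupyter step in the first scenario a bit more concrete, here is a rough sketch of what a notebook cell could look like. It assumes the `pyodbc` package and Dremio's ODBC driver are installed. The driver name `Dremio Connector`, port 31010, the host, the credentials, and the dataset path are all placeholders that you would need to adjust for your setup, and the two helper functions are just names I made up for illustration.

```python
def build_connection_string(host, port=31010, user="", password="",
                            driver="Dremio Connector"):
    """Assemble an ODBC connection string for Dremio.

    The key names and the default driver name and port are assumptions based
    on a typical Dremio ODBC setup; check what your installed driver expects.
    """
    parts = {
        "Driver": "{" + driver + "}",
        "ConnectionType": "Direct",
        "HOST": host,
        "PORT": str(port),
        "AuthenticationType": "Plain",
        "UID": user,
        "PWD": password,
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())


def preview_dataset(connection, dataset_path, limit=10):
    """Fetch the first few rows of a dataset as a list of dicts.

    `connection` is any DB-API style connection (for example one returned by
    pyodbc.connect). `dataset_path` is inserted into the SQL text as-is, so
    only pass paths you trust.
    """
    cursor = connection.cursor()
    cursor.execute(f"SELECT * FROM {dataset_path} LIMIT {int(limit)}")
    columns = [column[0] for column in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Example usage in a notebook (needs a running Dremio and the ODBC driver):
#
#   import pyodbc
#   conn = pyodbc.connect(
#       build_connection_string("localhost", user="me", password="secret"),
#       autocommit=True,
#   )
#   rows = preview_dataset(conn, '"my_space"."my_dataset"', limit=5)
```

The point is that the notebook only sends SQL over ODBC. Dremio executes the query against the underlying source, and the notebook just receives the result rows.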
Hope that helps.