I have a few questions about Dremio and am trying to compare it with Presto.
In Presto, we normally create tables in Hive and have Presto run queries against the metadata created in Hive. I'm trying to understand how the same works in Dremio, and have the questions below.
- Does Dremio support metadata management, or do we need Hive, as with Presto?
- If Dremio supports metadata management, how do we create tables in Dremio? Using Docker, I am able to create a data lake, but I'm not sure how to create tables. Do we need to create PDS and VDS from the data lake?
- Is this the only way to create tables (using PDS and VDS)?
- I have data in ADLS partitioned by a specific column and want to create tables accordingly. I'm not sure how to create a table with a partition column in Dremio.
- How do we refresh a Dremio table whenever a new partition is added in ADLS? In our use case, we add at least 100 partitions to tables every day, and we want these reflected in the Dremio table in real time.
Thanks
@snelaturu
#1 Yes, Dremio does its own metadata management:
http://docs.dremio.com/advanced-administration/metadata-caching.html
http://docs.dremio.com/sql-reference/sql-commands/datasets.html#refreshing-physical-dataset-metadata
#2 How to create tables in Dremio? This is not required. Just add the source, and the background metadata refresh will scan for new tables, new columns, new data types, newly added files, etc.
#3 If you want to create new tables based on existing tables, you can use CTAS (CREATE TABLE AS SELECT):
http://docs.dremio.com/sql-reference/sql-commands/tables.html
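As a rough illustration of #3, here is a minimal Python sketch that only builds a CTAS statement as a string. The helper names (`quote_path`, `ctas_sql`), the `$scratch` target, and the `adls`/`events` names are assumptions for the example, not anything taken from the docs page above; check that page for where CTAS is allowed to write in your setup.

```python
def quote_path(parts):
    """Join dataset path components into a double-quoted, dot-separated name.

    Hypothetical helper: components containing a double quote are rejected
    rather than escaped, to keep the sketch simple.
    """
    for part in parts:
        if not part or '"' in part:
            raise ValueError(f"unsupported path component: {part!r}")
    return ".".join(f'"{part}"' for part in parts)


def ctas_sql(target_path, select_sql):
    """Build a CREATE TABLE ... AS SELECT statement for the given target."""
    # Drop surrounding whitespace and a trailing semicolon from the SELECT.
    select_sql = select_sql.strip().rstrip(";").strip()
    return f"CREATE TABLE {quote_path(target_path)} AS {select_sql}"


if __name__ == "__main__":
    # "$scratch" and "adls"/"events" are placeholder names for illustration.
    print(ctas_sql(["$scratch", "events_copy"],
                   'SELECT * FROM "adls"."events"'))
```

You would then run the generated statement from the Dremio SQL editor or any client you already use.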
#4 You simply add the Azure storage as a source in Dremio and promote the dataset (the folder above the partition directories). Dremio will turn it into a table with the partitions defined.
http://docs.dremio.com/data-sources/azure-storage.html
http://docs.dremio.com/rest-api/catalog/post-catalog-id.html
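If you want to script the promotion in #4 instead of clicking through the UI, the following Python sketch builds the URL and JSON body for the catalog endpoint linked above. It only constructs the request and does not send it. The body fields and the URL-encoded `dremio:/...` id reflect my reading of that REST page and should be verified against it; the function name `promotion_request`, the `http://localhost:9047` base URL, the `Parquet` format, and the `adls`/`container`/`events` path are assumptions for the example.

```python
import json
from urllib.parse import quote


def promotion_request(base_url, folder_path, format_type="Parquet"):
    """Return (url, body) for promoting a folder to a physical dataset.

    Hypothetical helper: it assumes the catalog id of an unpromoted folder
    is the URL-encoded string "dremio:/<source>/<folder>/...".
    """
    entity_id = quote("dremio:/" + "/".join(folder_path), safe="")
    body = {
        "entityType": "dataset",
        "id": entity_id,
        "path": list(folder_path),
        "type": "PHYSICAL_DATASET",
        "format": {"type": format_type},
    }
    url = f"{base_url.rstrip('/')}/api/v3/catalog/{entity_id}"
    return url, body


if __name__ == "__main__":
    # Placeholder source and folder names; point this at the folder that
    # sits directly above the partition directories.
    url, body = promotion_request("http://localhost:9047",
                                  ["adls", "container", "events"])
    print("POST", url)
    print(json.dumps(body, indent=2))
```

The request still needs an authenticated session, which is left out here.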
#5 Real-time refresh is coming later this year. Until then, you can either shorten the background refresh interval so that metadata is refreshed more often, or, once a new partition is added, refresh the metadata for just that dataset using SQL:
http://docs.dremio.com/sql-reference/sql-commands/datasets.html#refreshing-physical-dataset-metadata
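For the per-dataset refresh in #5, a job that writes new partitions could generate the refresh statement right after each write. Below is a minimal Python sketch that builds the `ALTER PDS ... REFRESH METADATA` statement described on the page above. The helper names and the `adls`/`container`/`events` path are assumptions for illustration, and actually submitting the statement (JDBC, ODBC, REST, or the UI) is left to whatever client you use.

```python
def quote_path(parts):
    """Join dataset path components into a double-quoted, dot-separated name.

    Hypothetical helper: components containing a double quote are rejected
    rather than escaped, to keep the sketch simple.
    """
    for part in parts:
        if not part or '"' in part:
            raise ValueError(f"unsupported path component: {part!r}")
    return ".".join(f'"{part}"' for part in parts)


def refresh_metadata_sql(dataset_path):
    """Build the statement that refreshes metadata for one physical dataset."""
    return f"ALTER PDS {quote_path(dataset_path)} REFRESH METADATA"


if __name__ == "__main__":
    # Placeholder path to the promoted dataset (the folder above the partitions).
    print(refresh_metadata_sql(["adls", "container", "events"]))
```

With around 100 new partitions a day, it is probably cheaper to issue one refresh per batch of new partitions than one per partition.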