After importing the dataset into Dremio, the folder for store_sales shows up as dir0 with values:
ss_sold_date_sk=2452635
ss_sold_date_sk=2452636
ss_sold_date_sk=2452637
ss_sold_date_sk=2452638
ss_sold_date_sk=2452639
ss_sold_date_sk=2452640
ss_sold_date_sk=2452641
ss_sold_date_sk=2452642
We have a date dimension table date_dim with a column d_date_sk containing the matching integer date keys.
What are the best practices in such situations? Hive handles this by stripping the 'ss_sold_date_sk=' prefix and exposing the partition value as a column, so the join is straightforward.
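To make the mismatch concrete, a direct join finds no matches, because dir0 holds the raw string while d_date_sk is a bare integer (hdfs below is a placeholder for our source name):

```sql
-- Finds nothing: dir0 is the string 'ss_sold_date_sk=2452635',
-- while d_date_sk is the integer 2452635.
SELECT *
FROM hdfs."store_sales" AS s
JOIN date_dim AS d
  ON s.dir0 = CAST(d.d_date_sk AS VARCHAR)
```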
Have you tried creating a virtual dataset to extract the date key out of the dir0 column? Then you could use that new dataset to perform your join the same way as in Hive.
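For example, a virtual dataset along these lines would strip the prefix and cast the remainder to an integer; the source and path names below are placeholders for however you promoted the folder:

```sql
-- Sketch only: "hdfs" and the dataset path are placeholders.
-- Strip the 'ss_sold_date_sk=' prefix from dir0 and cast what remains
-- to an integer so it joins cleanly against date_dim.d_date_sk.
SELECT
  CAST(SUBSTR(dir0, LENGTH('ss_sold_date_sk=') + 1) AS INTEGER) AS ss_sold_date_sk,
  s.*
FROM hdfs."store_sales" AS s
```

Saved as, say, store_sales_clean, the join then reads the same as in Hive:

```sql
SELECT *
FROM store_sales_clean AS s
JOIN date_dim AS d
  ON s.ss_sold_date_sk = d.d_date_sk
```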
Here are a couple of links which might be of interest to you:
Is there a plan to make this part of the discovery process so that it is compatible with Hive? We would like the same queries to run on both without any custom UDFs in Dremio.
Also, is there an easy way to generate the full SELECT statement? Is there IntelliSense built into the UI?
We would love to have select fielda, fieldb from table1 generated with a click on the table. That would make it easy to create virtual datasets across the 100-odd tables we might have.
Do you mean you don't have access to a Hive Metastore on your standalone cluster, or that you deploy Dremio using YARN (which you don't need to do if you don't want to)?
If you use the Hive source instead of the HDFS source, the metadata is fetched from the Hive metastore, but the query itself is still executed by Dremio, which reads the underlying data directly.
As long as you provide the metastore URI of your remote Hive when configuring the Hive source, it should work. You don't need a Hive installation on your standalone cluster.
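For reference, the metastore URI is typically of the form thrift://&lt;metastore-host&gt;:9083. Once the Hive source is configured (named hive here as an assumption), the partition column already appears under its Hive name, so no dir0 extraction is needed:

```sql
-- Through the Hive source, ss_sold_date_sk is a real column,
-- so the join is written exactly as it would be in Hive.
SELECT *
FROM hive."default"."store_sales" AS s
JOIN hive."default"."date_dim" AS d
  ON s.ss_sold_date_sk = d.d_date_sk
```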