I am completely new to Big data and so to dremio. Apologies if my queries are basic.
I pushed my data at a certain location in hadoop. This has almost 900 million records distributed automatically by Spark API in almost 1500 file. Each records has appx 470 columns mix of String and integer and date fields
Now I created a VDS in dremio pointing to this HDFS location. Now, If I run a simple query like
select * from tableName where isin=‘isin1’, it runs very fine and comes back in 1-2 minutes but as soon as I modify this query to lookup multiple isin, it takes almost 50 minutes.
If i further modify the query to include date fields in filter, it further worsens.
select * from tableName where isin=‘isin1’ and ‘2020-05-02’ between from_dt and to_dt
Noe:- The VDS doe snot have any reflection or indexes till now
how can we solve it? Should we create indexes/refelction over these columns.