Hi @rleyba
The answer varies slightly depending on whether the source is file-system based, like ADLS/S3/HDFS/Hive, or a source we push queries down to, like Oracle or, in your case, Elasticsearch.
Let's take a simple Hive table called reftest as an example:
create table reftest (col1 int, col2 varchar(20));
Let us first insert two rows
insert into reftest values (1,'A');
insert into reftest values (2,'B');
If we now create a reflection on this table, queries against it will use the reflection and return two rows.
Now suppose we add two more rows in Hive:
insert into reftest values (3,'C');
insert into reftest values (4,'D');
If we run the query now, Dremio does not know there are two additional files to read on HDFS, so the query uses the reflection and still returns only 2 rows. Even if we disable the reflection, Dremio still returns only 2 rows, because its cached metadata lists only the original files.
If we then refresh the reflection and run the query again, Dremio still uses the reflection and returns 2 rows, since the reflection is rebuilt from the same stale metadata.
As the next step, we need to run the following command:
alter pds reftest refresh metadata
Now, if we do not use the reflection, the query returns all 4 rows. If we do use the reflection and it has not been refreshed, Dremio uses the reflection and returns only 2 rows.
Refreshing the reflection and then running the query uses the reflection and returns all 4 rows.
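To make the Hive sequence concrete, here is a toy Python model of it. This is not Dremio code or a Dremio API; the class, method and function names are hypothetical. It assumes, as described above, that Dremio reads only the files listed in its cached metadata, and that a reflection is a snapshot taken through that same metadata.

```python
class FileSystemSource:
    """Toy model (hypothetical, not a Dremio API) of a file-based source like Hive on HDFS."""

    def __init__(self):
        self.files = []            # data files physically present on storage
        self.known_files = []      # file list Dremio has cached as dataset metadata
        self.reflection = None     # rows captured at the last reflection refresh
        self.use_reflection = True

    def write_rows(self, rows):
        # In this model, each Hive INSERT writes a new data file.
        self.files.append(list(rows))

    def refresh_metadata(self):
        # Stands in for "alter pds ... refresh metadata": re-list the files.
        self.known_files = list(self.files)

    def _scan(self):
        # Dremio only reads files it knows about from metadata.
        return [row for f in self.known_files for row in f]

    def refresh_reflection(self):
        # The reflection is rebuilt through the cached metadata too.
        self.reflection = self._scan()

    def query(self):
        if self.use_reflection and self.reflection is not None:
            return self.reflection
        return self._scan()


def hive_walkthrough():
    src = FileSystemSource()
    src.write_rows([(1, "A")])
    src.write_rows([(2, "B")])
    src.refresh_metadata()                # metadata collected when the table is first added
    src.refresh_reflection()              # reflection created
    counts = [len(src.query())]           # reflection -> 2
    src.write_rows([(3, "C")])
    src.write_rows([(4, "D")])
    counts.append(len(src.query()))       # reflection -> 2
    src.use_reflection = False
    counts.append(len(src.query()))       # new files unknown -> 2
    src.use_reflection = True
    src.refresh_reflection()
    counts.append(len(src.query()))       # rebuilt from stale metadata -> 2
    src.refresh_metadata()
    src.use_reflection = False
    counts.append(len(src.query()))       # all files known -> 4
    src.use_reflection = True
    counts.append(len(src.query()))       # stale reflection -> 2
    src.refresh_reflection()
    counts.append(len(src.query()))       # -> 4
    return counts


print(hive_walkthrough())  # [2, 2, 2, 2, 4, 2, 4]
```

The point the model makes is ordering: for a file-based source, the metadata refresh has to happen before the reflection refresh, otherwise the reflection is rebuilt from the old file list.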
For an RDBMS like Oracle, or a NoSQL source like MongoDB or Elasticsearch, where we use pushdown, the behavior is slightly different. Take a similar table in Oracle:
create table reftestora(col1 number, col2 varchar(20));
insert into reftestora values (1,'A');
insert into reftestora values (2,'B');
If we create a reflection on reftestora, queries will use the reflection and return both rows.
Let us insert two more rows
insert into reftestora values (3,'C');
insert into reftestora values (4,'D');
If we disable the reflection and run the query, then, unlike Hive, we do not need a PDS metadata refresh: the query is pushed down to Oracle and returns all 4 rows. A PDS metadata refresh is only needed if there is a schema-level change.
Now, if we turn the reflection back on and run the query (without having refreshed the reflection), the query uses the reflection and returns only 2 rows.
Refreshing the reflection and then running the query returns all 4 rows.
Note: As stated earlier, unlike with file-system-based sources, a PDS metadata refresh is not needed here.
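The rule of thumb from this thread can be written as a tiny helper. It is a hypothetical summary, not a Dremio API; the function name and parameters are illustrative. It assumes that, whatever changed, the reflection itself also needs a refresh before accelerated queries see the change.

```python
from typing import List


def required_refreshes(file_based: bool, schema_changed: bool) -> List[str]:
    """Hypothetical summary of which refreshes a change needs (not a Dremio API)."""
    steps = []
    # File-based sources (ADLS/S3/HDFS/Hive) need a metadata refresh to see new files;
    # pushdown sources (Oracle, MongoDB, Elasticsearch) need one only for schema changes.
    if file_based or schema_changed:
        steps.append("refresh metadata")
    # Assumed in all cases: the reflection must be refreshed to pick up the change.
    steps.append("refresh reflection")
    return steps


print(required_refreshes(file_based=True, schema_changed=False))
print(required_refreshes(file_based=False, schema_changed=False))
```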
So if we add two more rows:
insert into reftestora values (5,'E');
insert into reftestora values (6,'F');
All we need to do is refresh the reflection (no PDS metadata refresh), and the query should be accelerated and return all 6 rows.
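For comparison with the Hive case, here is the pushdown sequence as a toy Python model. Again, this is not Dremio code; the names are hypothetical. It assumes that a pushed-down query and a reflection refresh both read the source table directly, so no metadata refresh appears anywhere.

```python
class PushdownSource:
    """Toy model (hypothetical, not a Dremio API) of a pushdown source like Oracle."""

    def __init__(self):
        self.table = []            # rows stored in the source database
        self.reflection = None     # rows captured at the last reflection refresh
        self.use_reflection = True

    def insert(self, row):
        self.table.append(row)

    def refresh_reflection(self):
        # In this model the refresh reads the source directly and sees every row.
        self.reflection = list(self.table)

    def query(self):
        if self.use_reflection and self.reflection is not None:
            return self.reflection
        # Pushed down: the source answers with its current rows.
        return list(self.table)


def oracle_walkthrough():
    src = PushdownSource()
    src.insert((1, "A"))
    src.insert((2, "B"))
    src.refresh_reflection()              # reflection created
    counts = [len(src.query())]           # reflection -> 2
    src.insert((3, "C"))
    src.insert((4, "D"))
    src.use_reflection = False
    counts.append(len(src.query()))       # pushed down -> 4
    src.use_reflection = True
    counts.append(len(src.query()))       # stale reflection -> 2
    src.refresh_reflection()
    counts.append(len(src.query()))       # -> 4
    src.insert((5, "E"))
    src.insert((6, "F"))
    src.refresh_reflection()              # no metadata refresh step
    counts.append(len(src.query()))       # -> 6
    return counts


print(oracle_walkthrough())  # [2, 4, 2, 4, 6]
```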
Kindly let us know if you have any other questions
Thanks
@balaji.ramaswamy