I am using Dremio 2.0.5, deployed in OpenShift containers with 10 nodes.
I am looking for solutions to the following:
1. Is there any option available to purge deprecated reflections?
2. The distributed data store is on a shared mount point in OpenShift. If any one node goes down, the reflection status becomes INCOMPLETE. Since the store is on a shared mount, one node being down should not have an impact. I looked at the code; it checks the partitions against the active nodes. Is there any solution for this?
For 1: Let's say the reflection refresh policy is every 2 hours and expiry is after 5 hours. When will the reflection be deleted from the distributed store after expiry? Is there any advanced option / support key available to configure this? I want to purge reflections that are disabled, as well as deprecated reflections.
For 2: Yes, one of the nodes acts as both coordinator and executor. I will go through the High Availability docs and let you know if anything comes up.
By default, deprecated reflections are removed 4 hours after they are marked as deprecated (primarily to let any queries that are still using the reflection finish). We do have a system option called reflection.deletion.grace_seconds that determines the grace period before they are removed. Just be aware that if a reflection is deleted while a query is still using it, things will go wrong.
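If it helps, here is a minimal sketch of adjusting that grace period. I am assuming the key can be set with ALTER SYSTEM in your build; if not, it can be added as a support key through the Admin > Support page. The value of 3600 (1 hour) is just an illustrative number:

-- Hypothetical example: shorten the grace period for deprecated reflections to 1 hour.
ALTER SYSTEM SET "reflection.deletion.grace_seconds" = 3600;

-- Restore the 4-hour default (14400 seconds) mentioned above.
ALTER SYSTEM SET "reflection.deletion.grace_seconds" = 14400;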
The first question I have is how you have configured your shared mount point in your Dremio conf file. Are you using something like dist: "pdfs://"${paths.local}"/pdfs"?
You mention INCOMPLETE - so when you run select * from sys.reflections, the STATUS for your reflection is INCOMPLETE? If so, can you run select * from sys.materializations and check what the data_partitions column says for your reflection?
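For reference, both checks can be run straight from the SQL editor. I am keeping them as select * since the exact column layout varies a bit between versions:

-- Look at the STATUS column for the reflection in question.
select * from sys.reflections;

-- Look at the data_partitions column for the corresponding materialization.
select * from sys.materializations;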
In dremio.conf:
paths: {
  local: "/var/lib/dremio/local",
  dist: "pdfs:///var/lib/dremio/share"
}
Yes, the sys.reflections status is INCOMPLETE.
The data_partitions column in sys.materializations contains the IP addresses of node1 and node2 (xx.xx.xx.xx,xx.xx.xx.xx).
For pdfs, the current expected behavior is that if any of the nodes are unreachable, the reflection will not work. I will open an internal ticket to see whether we should be doing that check for pdfs or not.
Basically, pdfs does not guarantee that the data is available on all nodes since it is only pseudo-distributed: you could be pointing pdfs at a local folder on each node and not necessarily at a shared drive.
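As a quick way to confirm which node pdfs thinks is unreachable, you could compare the addresses in the data_partitions column against the nodes the coordinator currently sees. This is only a sketch; sys.nodes exists in recent Dremio versions, but its exact columns may differ in 2.0.5:

-- Nodes currently registered with the coordinator; any address listed in
-- data_partitions but missing here is what flips the reflection to INCOMPLETE.
select * from sys.nodes;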