We have a Hive source with metadata full refresh configured to run once a day, and every full refresh takes almost 7 to 9 hours.
Below are the logs for a metadata full refresh:
Source ‘XXXXX’ refreshed in 24666 seconds. Details:
Shallow probed 102798 datasets: 1 added, 102795 unchanged, 2 deleted
Deep probed 1030 queried datasets: 786 changed, 242 unchanged, 0 deleted, 2 unreadable
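As a rough sanity check on the numbers in the log, the Python sketch below parses the summary lines and converts them into hours and an average time per deep-probed dataset. The regular expressions are assumptions about this log's wording. The average is only an upper bound because it charges the entire run to the 1030 deep probes and ignores the time spent shallow probing the 102798 datasets.

```python
import re

# Summary lines copied from the refresh log above (source name kept as 'XXXXX').
LOG = """Source 'XXXXX' refreshed in 24666 seconds. Details:
Shallow probed 102798 datasets: 1 added, 102795 unchanged, 2 deleted
Deep probed 1030 queried datasets: 786 changed, 242 unchanged, 0 deleted, 2 unreadable"""


def summarize(log: str) -> dict:
    """Extract headline numbers from a refresh summary.

    The regexes assume the exact wording shown in LOG; other versions may differ.
    """
    seconds = int(re.search(r"refreshed in (\d+) seconds", log).group(1))
    deep = int(re.search(r"Deep probed (\d+) queried datasets", log).group(1))
    changed = int(re.search(r"(\d+) changed", log).group(1))
    return {
        "hours": seconds / 3600,
        "deep_probed": deep,
        "changed": changed,
        # Upper bound: attributes the whole run to deep probes only.
        "max_avg_seconds_per_deep_probe": seconds / deep,
    }


if __name__ == "__main__":
    s = summarize(LOG)
    print(f"{s['hours']:.2f} h total, "
          f"<= {s['max_avg_seconds_per_deep_probe']:.1f} s per deep-probed dataset")
```

On these figures the run is about 6.85 hours, which matches the "almost 7 hours" observed, and works out to at most roughly 24 seconds per deep-probed dataset.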
There are a few tables with around 1600 partitions each, and each of these tables takes approximately 10 minutes to refresh.
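To put the large-table figure in per-partition terms, here is a minimal Python sketch of the arithmetic. Both inputs are the approximate numbers quoted in the post (about 1600 partitions, about 10 minutes), not measurements, and the helper name is hypothetical.

```python
# Approximate figures from the post, not measurements.
PARTITIONS = 1600
REFRESH_SECONDS = 10 * 60  # "approximately 10 mins"


def seconds_per_partition(partitions: int, seconds: float) -> float:
    """Hypothetical helper: average refresh time spent per partition."""
    return seconds / partitions


if __name__ == "__main__":
    print(f"~{seconds_per_partition(PARTITIONS, REFRESH_SECONDS):.3f} s per partition")
```

That is roughly 0.375 seconds per partition, so the refresh time of these tables grows with their partition count.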
Are there any optimisations for this in the latest Dremio version, or is this normal behaviour?
Please also suggest any way to reduce the processing time.
We are using Dremio 3.3.1.