We are running a cluster with a few thousand relatively small datasets that need to be refreshed at a relatively high rate (every 15 minutes, or even every 5).
We are seeing issues scaling the metadata refresh and are becoming concerned that Dremio clusters are bottlenecked on refreshes by the single-master architecture, or more specifically by the use of RocksDB, which allows only a single writer.
Have we narrowed the cause down to the writes to RocksDB? The delay is usually in waiting for the source (S3, HDFS, etc.) to provide the metadata. How did we confirm this?
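One rough way to answer the "how did we confirm" question is to time the two phases of a refresh separately. The sketch below is only an illustration of that idea, not Dremio's refresh code: `fetch` and `write` are hypothetical stand-ins for "list or read metadata from the source" and "persist it to the metadata store", and would have to be replaced by whatever probes are available in a given setup.

```python
import time


def time_phases(datasets, fetch, write):
    """Accumulate wall-clock time spent fetching vs. writing metadata.

    fetch(dataset) -> metadata and write(dataset, metadata) are
    hypothetical callables supplied by the caller.
    """
    totals = {"fetch": 0.0, "write": 0.0}
    for ds in datasets:
        t0 = time.perf_counter()
        meta = fetch(ds)      # time spent waiting on the source
        t1 = time.perf_counter()
        write(ds, meta)       # time spent persisting the result
        t2 = time.perf_counter()
        totals["fetch"] += t1 - t0
        totals["write"] += t2 - t1
    return totals


if __name__ == "__main__":
    store = {}

    def slow_fetch(ds):
        time.sleep(0.005)     # simulated source latency
        return {"dataset": ds}

    def fast_write(ds, meta):
        store[ds] = meta      # in-memory stand-in for the metadata store

    print(time_phases(["a", "b", "c"], slow_fetch, fast_write))
```

If the fetch total dominates, the source is the more likely bottleneck; if the write total dominates, the store deserves a closer look.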
Even if the bottleneck is currently not RocksDB, the concern is that metadata refresh only scales vertically, unlike read access during query planning, which can be scaled horizontally across multiple coordinators. There’s a limit to the I/O bandwidth a single master coordinator can achieve. Using RocksDB as the metadata store precludes horizontally scaling metadata refresh, since RocksDB is a single-writer store.
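To put a rough number on the vertical-scaling concern, here is a back-of-the-envelope sketch. It assumes a simplified model in which some portion of every refresh is serialized through one writer; the dataset count, interval, and per-refresh cost in the example are made-up illustrative values, not measurements from our cluster.

```python
def writer_utilization(num_datasets, interval_s, serialized_s_per_refresh):
    """Fraction of each refresh interval a single serialized writer is busy.

    Simplified model (an assumption): every dataset is refreshed once per
    interval, and serialized_s_per_refresh seconds of each refresh cannot
    overlap with any other refresh. A result at or above 1.0 means the
    writer cannot keep up, regardless of how many coordinators are added.
    """
    return num_datasets * serialized_s_per_refresh / interval_s


if __name__ == "__main__":
    # Illustrative values only: 3000 datasets, 5-minute interval,
    # 50 ms of serialized work per refresh.
    print(writer_utilization(3000, 300, 0.05))
```

Under this model the headroom shrinks linearly as datasets are added or the interval is shortened, and the only remedy is reducing the serialized cost per refresh.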
That said, we’re still investigating what is actually bottlenecking metadata refreshes in the configuration we’re currently using.