I just started using Dremio and I want to know: when I add a data source, where is the information about the data source and its metadata stored?
The data source definition is stored in a local RocksDB KV store. There’s also a cache of tables and views in the KV store. Metadata about specific tables depends on the source. If the table is Iceberg or Parquet, then metadata like the manifest list and files is not stored in the KV store.
Are data source information, data source metadata (such as table names and column names), and job information all stored in RocksDB? I also found Lucene-related files in the data directory. Do you know how they work together? I want to understand what kind of data in Dremio is stored in RocksDB, what kind of data uses Lucene, and how the two are used together.
@fuliu Lucene is used for search, while everything under catalog is the metadata plus the files needed for recovery in case Dremio was not stopped properly.
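To make the split between a KV store and a search index more concrete, here is a toy sketch in Python. It is not Dremio code: `TinyCatalog`, its methods, and the record layout are all made up for illustration, a plain dict stands in for the KV store, and a hand-rolled inverted index stands in for Lucene. The only point is the general pattern: full records live under a key, and the index maps search terms back to those keys.

```python
from collections import defaultdict


class TinyCatalog:
    """Toy model only: dict = KV store, inverted index = search index."""

    def __init__(self):
        self.kv = {}                    # key -> full record (source of truth)
        self.index = defaultdict(set)   # lowercase token -> keys containing it

    @staticmethod
    def _tokens(record):
        # Split a dotted name such as "sales.orders" into searchable tokens.
        return record["name"].lower().replace(".", " ").split()

    def put(self, key, record):
        # Drop index entries for the record being overwritten, if any.
        old = self.kv.get(key)
        if old is not None:
            for token in self._tokens(old):
                self.index[token].discard(key)
        self.kv[key] = record
        for token in self._tokens(record):
            self.index[token].add(key)

    def search(self, term):
        # The index only yields keys; the records come from the KV store.
        keys = sorted(self.index.get(term.lower(), set()))
        return [self.kv[k] for k in keys]


catalog = TinyCatalog()
catalog.put("ds1", {"name": "sales.orders"})
catalog.put("ds2", {"name": "sales.customers"})
matches = catalog.search("sales")
```

In this sketch the index can always be rebuilt from the KV store, which is why the KV store is treated as the source of truth.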
@balaji.ramaswamy Can Dremio’s KV store be replaced with MySQL, and Lucene be replaced with Elasticsearch? Do you know anyone in the community who has tried this?
No. But there is a well-defined interface for the KV store, so it’s possible to add other implementations besides RocksDB.
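As a rough illustration of what "an interface with more than one implementation" means, here is a minimal Python sketch. Everything in it is hypothetical: the `KVStore` class, its method names, the `source/` key prefix, and `register_source` are invented for this example and are not Dremio's actual interface, which is Java and may look quite different.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, Optional, Tuple


class KVStore(ABC):
    """Hypothetical storage contract; callers depend only on this."""

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]: ...

    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...

    @abstractmethod
    def scan(self, prefix: str = "") -> Iterator[Tuple[str, bytes]]: ...


class InMemoryKVStore(KVStore):
    """One possible backend; a RocksDB- or MySQL-backed class would
    implement the same four methods."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

    def scan(self, prefix: str = "") -> Iterator[Tuple[str, bytes]]:
        # Yield entries in key order, like a sorted KV store would.
        for key in sorted(self._data):
            if key.startswith(prefix):
                yield key, self._data[key]


def register_source(store: KVStore, name: str, definition: bytes) -> None:
    # Caller code is written against KVStore, not a concrete backend.
    store.put(f"source/{name}", definition)


store = InMemoryKVStore()
register_source(store, "s3_lake", b'{"type": "S3"}')
sources = list(store.scan("source/"))
```

Swapping the backend then means writing another `KVStore` subclass; whether a given product supports that in practice is a separate question.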
I have an idea to integrate Dremio with a metadata management platform. I want to use MySQL and Elasticsearch as storage. Do you think it is feasible, and what problems would need to be solved?
@fuliu Currently, for the software version, only the embedded RocksDB KV store is supported.