[Discussion] Best Practice NFS Server for Dremio Clustering (Ceph with Ganesha, Direct CephFS, or Something Else?)

Hi everyone,

I would like to share my current setup and experience running Dremio in a clustered environment, and I also have some questions for those who might have encountered similar situations.

Current Dremio Cluster Setup:

  1. Master Coordinator Node

  NOTE: I use an external ZooKeeper, but it is installed on this node.

  • 1 node
  • 16 vCPU
  • 64 GB RAM

  2. Executor Node

  • 1 node
  • 32 vCPU
  • 126 GB RAM
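
For context, this is roughly how the node roles are split in dremio.conf (a minimal sketch of my setup; the ZooKeeper address is a placeholder):

# dremio.conf on the master coordinator (ZooKeeper also runs on this host)
services.coordinator.enabled: true
services.coordinator.master.enabled: true
services.coordinator.master.embedded-zookeeper.enabled: false
services.executor.enabled: false
zookeeper: "zk-host.example.internal:2181"   # placeholder address

# dremio.conf on the executor
services.coordinator.enabled: false
services.executor.enabled: true
zookeeper: "zk-host.example.internal:2181"   # placeholder address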

As we know, Dremio clustering requires centralized, NFSv4-compatible storage with working file locking, especially for:

  • rocksdb (key-value metadata store)
  • c3cache (columnar cloud cache)
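
In dremio.conf those two stores map onto the NFS mount points shown further below, roughly like this (a sketch; the cloud cache key names are how I understand them from the Dremio docs, so please double-check them against your Dremio version):

# Master coordinator: RocksDB metadata store lives on the rocksdb mount
paths.db: "/mnt/nfs/dremio/rocksdb"

# Executors: C3 cache directories live on the c3cache mount
services.executor.cache.path.db: "/mnt/nfs/dremio/c3cache"
services.executor.cache.path.fs: ["/mnt/nfs/dremio/c3cache"]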

Storage Backend Configuration:

I’m using Ceph with a single 1 TB OSD and the following pool layout:

  • rocksdb = 900 GB
  • c3cache = 100 GB

and exporting these volumes via NFS Ganesha with the configuration below; the exports are then mounted on each Dremio node:

EXPORT
{
    Export_Id = 101;
    Path = "/volumes/dremio_group/rocksdb";
    Pseudo = "/dremio/rocksdb";
    Squash = "No_root_squash";
    SecType = "sys";
    Protocols = 4;

    CLIENT {
        Clients = ip_address_node_dremio_1, ip_address_node_dremio_2;
        Access_Type = RW;
    }

    FSAL {
        Name = CEPH;
        User_Id = "ganesha";
        Secret_Access_Key = "XXXXXXXXXXXXXXXXXXXXXXXX"; 
    }
}

EXPORT
{
    Export_Id = 102;
    Path = "/volumes/dremio_group/c3cache";
    Pseudo = "/dremio/c3cache";
    Squash = "No_root_squash";
    SecType = "sys";
    Protocols = 4;

    CLIENT {
        Clients = ip_address_node_dremio_1, ip_address_node_dremio_2;
        Access_Type = RW;
    }

    FSAL {
        Name = CEPH;
        User_Id = "ganesha";
        Secret_Access_Key = "XXXXXXXXXXXXXXXXXXXXXXXX"; 
    }
}

On the Dremio nodes the exports show up as NFSv4 mounts (output of df -hT):

nfs-lakehouse-1.example.internal:/dremio/rocksdb nfs4  1000G  277G  724G  28% /mnt/nfs/dremio/rocksdb
nfs-lakehouse-1.example.internal:/dremio/c3cache nfs4  1000G  277G  724G  28% /mnt/nfs/dremio/c3cache
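
For reference, the client side is a plain NFSv4 mount; in /etc/fstab form it looks roughly like this (the mount options are just what I would normally start with, not tuned values):

nfs-lakehouse-1.example.internal:/dremio/rocksdb  /mnt/nfs/dremio/rocksdb  nfs  vers=4,rw,hard,_netdev  0  0
nfs-lakehouse-1.example.internal:/dremio/c3cache  /mnt/nfs/dremio/c3cache  nfs  vers=4,rw,hard,_netdev  0  0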

The Issue:

When I tried copying files into the mounted NFS path from the Dremio nodes (via NFS Ganesha), I noticed the transfer speed was capped at around 3–4 MB/s.

NOTE: I researched this on the official NFS Ganesha GitHub repository, and many users have reported similar performance problems. From what I could find, it has been a known issue since at least 2019, with several open discussions and reports describing slow throughput when NFS Ganesha handles large or high-frequency I/O workloads.
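
The numbers above come from simple large-file copies. A quick way to reproduce that kind of sequential-write measurement from a Dremio node (hypothetical test file name, bypassing the client page cache) would be:

# ~1 GB sequential write onto the Ganesha-backed mount
dd if=/dev/zero of=/mnt/nfs/dremio/rocksdb/ddtest.bin bs=1M count=1024 oflag=direct status=progress

# remove the test file afterwards
rm /mnt/nfs/dremio/rocksdb/ddtest.bin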

To troubleshoot, I tested mounting CephFS directly on the Dremio node, bypassing NFS Ganesha, using this command:

sudo mount -t ceph nfs-lakehouse-1.example.internal:6789:/ /mnt/test -o name=admin,secretfile=admin.key

When copying the same files, the transfer speed jumped significantly to around 150–250 MB/s.
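
If a direct mount ends up being the way to go, the equivalent persistent entry in /etc/fstab would look something like this (same monitor address as the test command; the mount point and key file path are illustrative):

nfs-lakehouse-1.example.internal:6789:/  /mnt/cephfs  ceph  name=admin,secretfile=/etc/ceph/admin.key,noatime,_netdev  0  0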

This strongly suggests the bottleneck is in the NFS Ganesha layer rather than in CephFS itself.
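
Before ruling Ganesha out completely, the knobs I plan to experiment with are the per-export I/O size limits and the RPC buffer sizes, roughly like this (parameter names as documented for ganesha.conf; defaults and behaviour vary between Ganesha versions, so treat this as an experiment rather than a known fix):

NFS_CORE_PARAM
{
    MaxRPCSendBufferSize = 1048576;
    MaxRPCRecvBufferSize = 1048576;
}

# and inside each EXPORT block, raise the per-RPC read/write size:
MaxRead = 1048576;
MaxWrite = 1048576;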

My Questions:

  1. Is there any recommended NFS server that works well and performs efficiently for Dremio clustering, especially for rocksdb and c3cache?

  2. Is it safe and supported to mount CephFS directly on Dremio nodes without using NFS Ganesha?

  • Does CephFS natively support NFSv4-style locking required by Dremio?
  • Has anyone successfully run a production Dremio cluster using a direct CephFS mount?
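
On the locking point, one basic sanity check I know of (it only exercises advisory locks, not the full NFSv4 lock semantics Dremio relies on) is to take and release a lock on the mounted path with flock, for example:

# try to grab an exclusive, non-blocking lock on a test file on the mount (hypothetical path)
flock --exclusive --nonblock /mnt/test/.lock-check -c 'echo lock acquired'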

I’m looking for a scalable solution since I plan to:

  • Add more OSDs or disk capacity
  • Add more coordinator (standby or secondary) and executor nodes
  • Expand storage and compute without needing large migrations
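
On the first point, my understanding is that with a cephadm-managed cluster this is mostly a matter of adding OSDs and, if needed, growing the CephFS subvolume quotas (assuming the /volumes/dremio_group/... paths are CephFS subvolumes, which is what they look like). Roughly, with placeholder host, device, and filesystem names:

# add an OSD on a new disk
ceph orch daemon add osd storage-host-2:/dev/sdb

# grow the rocksdb subvolume quota to 2 TiB (size in bytes)
ceph fs subvolume resize cephfs rocksdb 2199023255552 --group_name dremio_group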

Thank you in advance. I hope this discussion is also helpful for others working on similar setups.

Best regards,
Arman

Reply:

I can’t comment on CephFS as a distributed file system store, but I can tell you that if you add additional coordinator nodes for query planning, they need to proxy through the master coordinator to talk to rocksdb. So it is more performant to vertically scale the master coordinator than to add scale-out coordinator nodes.
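
For clarity, by a scale-out coordinator I mean a node configured roughly like this in dremio.conf (a sketch; it keeps no local metadata store of its own):

services.coordinator.enabled: true
services.coordinator.master.enabled: false
services.executor.enabled: false
zookeeper: "zk-host.example.internal:2181"   # placeholder address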