Dremio Performance & Load Balancing Issue

I am testing query performance and executor load balancing in Dremio and need guidance to understand whether I am facing a configuration or architectural bottleneck.

Environment Setup

  • Source: MySQL table with ~10 million rows

  • Dremio Object:

    • Created a view on the MySQL table

    • Enabled Raw Reflection

      • Partitioned by: event_date (20 distinct days)

      • Sort by: msisdn

  • Reflection is accelerated and queries are served from the Iceberg reflection table

Cluster Topology

Node       Role                  Physical Cores                        Logical CPUs  Total RAM  Available RAM
server35   Executor only         16                                    32            31 GB      9.1 GB
srvr82     Executor only         8                                     16            31 GB      14 GB
server179  Master + Coordinator  16 (physical or logical not stated)   -             62 GB      18 GB

No executor is running on the coordinator node.

Query Under Test

Executed continuously using JMeter with 50 threads in an infinite loop, with random values:
SELECT * FROM test_case.test_04 WHERE event_date = '2025/02/11' AND msisdn = 8030043017;

Observed Behavior

  • TPS achieved: ~19–20 queries/sec

  • Issue:

    • TPS does not increase even with 2 executors

    • CPU usage appears skewed, not evenly balanced across executors

    • One executor often appears more loaded than the other

  • Reflection is definitely being used (verified via query profile)
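
As a rough sanity check on the TPS figure above, Little's Law can be used to estimate the average time each query spends in the system. This is only a sketch: it assumes the 50 JMeter threads run back to back with no think time or timers, and the function name `implied_latency_seconds` is purely illustrative.

```python
def implied_latency_seconds(concurrent_clients: int, throughput_qps: float) -> float:
    """Little's Law for a closed loop with no think time: N = X * R, so R = N / X."""
    if throughput_qps <= 0:
        raise ValueError("throughput must be positive")
    return concurrent_clients / throughput_qps


threads = 50  # JMeter thread count from the test setup
for tps in (19.0, 20.0):  # observed TPS range
    latency = implied_latency_seconds(threads, tps)
    print(f"{tps:.0f} TPS with {threads} threads -> ~{latency:.2f} s per query")
```

Under that assumption, each query takes roughly 2.5 seconds end to end (queueing, planning, and execution combined), which is long for a single-row lookup and suggests comparing the per-phase timings in the query profile.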

I also tested a date-range query:
SELECT * FROM test_case.test_02 WHERE event_date BETWEEN '2023-11-01' AND '2023-11-20' AND msisdn = 8030461446;

For this query I either run into memory issues or get only ~1–3 TPS.
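
A back-of-the-envelope comparison may help explain why the range query is so much heavier than the single-day lookup. The sketch below assumes that test_02 is laid out like the setup described earlier (~10 million rows, partitioned by event_date into 20 evenly sized days); that is an assumption, not something confirmed for this table. It only accounts for partition pruning, not for any further skipping that the msisdn sort order may allow.

```python
from datetime import date


def days_in_range(start: date, end: date) -> int:
    """Number of calendar days covered by an inclusive BETWEEN range."""
    return (end - start).days + 1


TOTAL_ROWS = 10_000_000  # "~10 million rows" from the environment setup
DISTINCT_DAYS = 20       # event_date partitions, assumed evenly sized

rows_per_partition = TOTAL_ROWS // DISTINCT_DAYS
range_days = days_in_range(date(2023, 11, 1), date(2023, 11, 20))
partitions_hit = min(range_days, DISTINCT_DAYS)

print("rows behind a single-day lookup:", rows_per_partition)
print("rows behind the date-range query:", partitions_hit * rows_per_partition)
```

If those assumptions hold, the range predicate spans every partition, so partition pruning removes nothing and each of the 50 concurrent queries has to consider the full dataset.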

Questions / Help Needed

  1. Is ~20 TPS expected for this query pattern (high-concurrency point lookups on Iceberg reflections)?

  2. Why might queries not be evenly distributed across both executors?

  3. Could the following be causing bottlenecks?

    • Single-row lookup (msisdn + event_date)

    • Reflection partitioning strategy

    • Executor core / memory imbalance (16 cores vs 8 cores)

    • Coordinator planning or fragment assignment behavior

  4. How can I verify executor-level load balancing correctly?

    • Query profile metrics?

    • Specific executor CPU / fragment stats to look at?

  5. Are there recommended reflection designs or tuning settings for high-TPS lookup workloads?

Goal

I want to:

  • Achieve better TPS

  • Ensure work is evenly balanced between both executors

  • Understand whether this use case is realistically scalable in Dremio or if I’m hitting a design limitation

Any guidance, tuning suggestions, or best-practice recommendations would be greatly appreciated.
Dremio version: 25.2.0

229561ab-ed25-412f-946c-a7d77bb7c9e3.zip (13.5 KB)
Thanks in advance

Hi team,

Could you please provide an update on the points above?

Thanks in advance

@ruchitha All of the time was spent in JDBC_SUB_SCAN, which is always single-threaded, so the number of executors does not matter. But I thought you said the query was accelerated?