I am testing query performance and executor load balancing in Dremio and need guidance to understand whether I am facing a configuration or architectural bottleneck.
Environment Setup
- Source: MySQL table with ~10 million rows
- Dremio object:
  - Created a view on the MySQL table
  - Enabled a Raw Reflection on the view:
    - Partitioned by: `event_date` (20 distinct days)
    - Sorted by: `msisdn`
- The reflection is accelerated and queries are served from the Iceberg reflection table
Cluster Topology
| Node | Role | Physical Cores | Logical CPUs | Total RAM | Available RAM |
|---|---|---|---|---|---|
| server35 | Executor only | 16 | 32 | 31 GB | 9.1 GB |
| srvr82 | Executor only | 8 | 16 | 31 GB | 14 GB |
| server179 | Master + Coordinator | — | 16 | 62 GB | 18 GB |
No executor is running on the coordinator node.
Query Under Test
Executed continuously using JMeter with 50 threads in an infinite loop, using random parameter values:

SELECT * FROM test_case.test_04 WHERE event_date = '2025/02/11' AND msisdn = 8030043017;
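For reference, the randomized load roughly corresponds to the sketch below; the date pool and `msisdn` range are made-up placeholders, not my real test data:

```python
import random

# Illustrative sketch of the randomized point-lookup load driven by JMeter.
# The 20-day date pool and the msisdn range are hypothetical placeholders.
EVENT_DATES = [f"2025/02/{day:02d}" for day in range(1, 21)]  # 20 distinct days


def build_query() -> str:
    """Build one point-lookup query with random parameter values."""
    event_date = random.choice(EVENT_DATES)
    msisdn = random.randint(8_000_000_000, 8_099_999_999)
    return (
        "SELECT * FROM test_case.test_04 "
        f"WHERE event_date = '{event_date}' AND msisdn = {msisdn};"
    )


if __name__ == "__main__":
    print(build_query())
```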
Observed Behavior
- TPS achieved: ~19–20 queries/sec
- Issues:
  - TPS does not increase even with 2 executors
  - CPU usage appears skewed, not evenly balanced across the executors
  - One executor is often noticeably more loaded than the other
- The reflection is definitely being used (verified via the query profile)

For the range query below I run into memory issues and get only ~1–3 TPS:

SELECT * FROM test_case.test_02 WHERE event_date BETWEEN '2023-11-01' AND '2023-11-20' AND msisdn = 8030461446;
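A quick back-of-envelope on why the two queries behave so differently, assuming test_02 is laid out like the test_04 reflection (~10M rows over 20 daily partitions; this is an assumption on my part):

```python
# Back-of-envelope scan sizes, assuming ~10M rows split evenly
# across 20 daily event_date partitions (hypothetical layout).
total_rows = 10_000_000
partitions = 20

rows_per_partition = total_rows // partitions  # 500_000

# Point lookup: event_date prunes the scan to a single partition.
point_lookup_scan = rows_per_partition

# 20-day BETWEEN range: touches every partition, i.e. the whole table.
range_scan = 20 * rows_per_partition

print(point_lookup_scan, range_scan)  # 500000 10000000
```

So if partition pruning is the main accelerator here, the range query effectively scans the entire reflection, which would explain the memory pressure and the drop to ~1–3 TPS.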
Questions / Help Needed
- Is ~20 TPS expected for this query pattern (high-concurrency point lookups on Iceberg reflections)?
- Why might queries not be evenly distributed across both executors?
- Could any of the following be causing bottlenecks?
  - Single-row lookups (`msisdn` + `event_date`)
  - The reflection partitioning strategy
  - Executor core/memory imbalance (16 cores vs 8 cores)
  - Coordinator planning or fragment-assignment behavior
- How can I correctly verify executor-level load balancing?
  - Which query-profile metrics should I check?
  - Which executor CPU / fragment stats should I look at?
- Are there recommended reflection designs or tuning settings for high-TPS lookup workloads?
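To make the load-balancing question concrete, this is roughly how I would quantify skew from per-executor totals (e.g. summed fragment busy time taken from a query profile); the numbers below are hypothetical, not measured on my cluster:

```python
# Hypothetical per-executor busy-time totals (milliseconds), e.g. summed
# fragment busy time per node from a query profile. Not real measurements.
busy_millis = {
    "server35": 9_200,
    "srvr82": 4_100,
}

avg = sum(busy_millis.values()) / len(busy_millis)
# Ratio of the busiest executor to the average: 1.0 means perfectly balanced.
skew = max(busy_millis.values()) / avg

print(f"avg={avg:.0f} ms, skew={skew:.2f}")
```

What I am unsure about is which profile metrics are the right inputs for this kind of comparison.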
Goal
I want to:
- Achieve better TPS
- Ensure work is evenly balanced between both executors
- Understand whether this use case is realistically scalable in Dremio, or whether I'm hitting a design limitation
Any guidance, tuning suggestions, or best-practice recommendations would be greatly appreciated.
Dremio version: 25.2.0
229561ab-ed25-412f-946c-a7d77bb7c9e3.zip (13.5 KB)
Thanks in advance