Cpu 100%+ and query slowly

my sql:

SELECT o.org_id "orgId", o.org_name "orgName",
                o.branch_count_all "branchCount",
                o.ebranch_count_all "ebranchCount",
            a.*
            FROM (
            SELECT substr(t.org_code, 1, length('000002000008') + 6) "orgCode",
              sum(active_branch) "activeBranch", sum(total_meeting) "totalMeeting",
              sum(t.category_1_active) "category1Active", sum(t.category_1_sum) "category1Sum" ,
              sum(t.category_2_active) "category2Active", sum(t.category_2_sum) "category2Sum" ,
              sum(t.category_3_active) "category3Active", sum(t.category_3_sum) "category3Sum" ,
              sum(t.category_4_active) "category4Active", sum(t.category_4_sum) "category4Sum" ,
              sum(t.category_5_active) "category5Active", sum(t.category_5_sum) "category5Sum" ,
              sum(t.category_6_active) "category6Active", sum(t.category_6_sum) "category6Sum" ,
              sum(t.category_7_active) "category7Active", sum(t.category_7_sum) "category7Sum" ,
              sum(t.category_8_active) "category8Active", sum(t.category_8_sum) "category8Sum" ,
              sum(t.category_9_active) "category9Active", sum(t.category_9_sum) "category9Sum" ,
              sum(t.category_classroom_active) "categoryClassroomActive", sum(t.category_classroom_sum) "categoryClassroomSum"
            from (
            SELECT t.org_code, 
            (case when sum(active_branch)>=1 THEN 1 else 0 end ) as active_branch,
            sum(total_meeting) as total_meeting,
            (case when sum(t.category_1_active)>=1 THEN 1 else 0 end ) as category_1_active, sum(t.category_1_sum) category_1_sum ,
            (case when sum(t.category_2_active)>=1 THEN 1 else 0 end ) as category_2_active, sum(t.category_2_sum) category_2_sum ,
            (case when sum(t.category_3_active)>=1 THEN 1 else 0 end ) as category_3_active, sum(t.category_3_sum) category_3_sum ,
            (case when sum(t.category_4_active)>=1 THEN 1 else 0 end ) as category_4_active, sum(t.category_4_sum) category_4_sum ,
            (case when sum(t.category_5_active)>=1 THEN 1 else 0 end ) as category_5_active, sum(t.category_5_sum) category_5_sum ,
            (case when sum(t.category_6_active)>=1 THEN 1 else 0 end ) as category_6_active, sum(t.category_6_sum) category_6_sum ,
            (case when sum(t.category_7_active)>=1 THEN 1 else 0 end ) as category_7_active, sum(t.category_7_sum) category_7_sum ,
            (case when sum(t.category_8_active)>=1 THEN 1 else 0 end ) as category_8_active, sum(t.category_8_sum) category_8_sum ,
            (case when sum(t.category_9_active)>=1 THEN 1 else 0 end ) as category_9_active, sum(t.category_9_sum) category_9_sum ,
            (case when sum(t.category_classroom_active)>=1 THEN 1 else 0 end ) as category_classroom_active, sum(t.category_classroom_sum) category_classroom_sum
            FROM
            "dws_ezb_meeting_start" t
            where 1 = 1
                and t.org_code like '000002000008%'
                and t.start_month >= 202007
                and t.start_month <= 202107
                and (t.is_functional = 1 or t.is_functional = 2 )
            GROUP BY t.org_code
                    HAVING sum(active_branch ) > 0
            ) t
            GROUP BY substr(t.org_code, 1, length('000002000008') + 6) 
        ) a JOIN "dws_ezb_org_count" o ON a."orgCode" = o.org_code order by o.org_code LIMIT 10 OFFSET 0

When I use one concurrent SQL query, it takes 1 second, but when I use two concurrent SQL queries, it takes 2 seconds for each SQL query, and 5 seconds for five concurrent SQL queries. When viewing the system information, it is found that when executing an SQL query, the CPU utilization is as high as 100% +, and the CPU consumption of each node is very high.

There are about 6 million data in total。

profile:

9919a031-91c5-4071-877b-fffaf30ad26e.zip (40.1 KB)

@lanranjun As you bump up the concurrency, the query execution time will slightly drop down. Is your expectation to have 1s response time for 5 queries too?

@balaji.ramaswamy Yes, I hope that in the case of concurrency, each request can reach 1 second. Dremio is used to replace MySQL because the amount of data is too large and there are too many aggregation operations

@lanranjun 5s for concurrency of 5 seems reasonable, we probably need the profile to see where the 5s was spent, if it is all in sleep then you need more cores

@balaji.ramaswamy Our dremio server is used to access as a front-end web API, so there will be a large number of query requests at the same time,
profile file:
9919a031-91c5-4071-877b-fffaf30ad26e.zip (40.1 KB)

@lanranjun Takes about 4.6s, I see you only have 13 splits so thats the maximum parallelism on the scan you can get