Dremio parallel job execution

Hi all,
I am having some trouble understanding how minor fragments are executed.
I am attaching a visualization plan below. There are two leaf branches. Are the minor fragments in the leaf branches executed in parallel? After these processes, from unionAll onward, is the execution sequential?

@JoiceJacob Kindly send us the profile and we can explain. It does seem like your query was planned in a single phase, which means it was single threaded all the way. I am not sure where you see parallel threads in the scan.

@balaji.ramaswamy
Here is the job profile:
8b053f4d-c8e5-4955-8a7e-14eb1e3e3eef.zip (16.7 KB)

@JoiceJacob Your entire query was planned in a single phase, so it was single threaded all the way. This is because you had very few rows in the JDBC scan and zero rows in your PARQUET_SCAN.

@balaji.ramaswamy
Thank you for your response.
As you suggested, I have executed the query with more data. The visualization plan and the generated job profile are attached below.


Job profile:
e9cddd73-440f-462b-96a6-9ed4ffbff934.zip (35.0 KB)

I am also attaching another visualization plan and the job profile.


Job profile:
bfc14e40-7080-45a4-a52d-ed3e27fc3e5e.zip (57.6 KB)

Are the minor fragments in the leaf branches executed in parallel in both cases?

@JoiceJacob Irrespective of the number of rows, JDBC_SCAN is always single threaded. Other than this, I see the HASH_AGG phase is multi-threaded. What is the real issue here?
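
To make the point about thread counts concrete, here is a toy sketch in Python. It is not Dremio's planner logic: the function `estimate_width`, the constant `ROWS_PER_THREAD`, and the `max_width` cap are all hypothetical names and values chosen for illustration. It only encodes the two observations from this thread: a JDBC scan stays on one thread regardless of row count, and other scans get more threads only when there are enough rows.

```python
import math

# Hypothetical threshold: rows one thread is assumed to handle before
# more threads are added. This is an illustrative value, not a Dremio default.
ROWS_PER_THREAD = 100_000


def estimate_width(scan_type: str, estimated_rows: int, max_width: int = 8) -> int:
    """Toy estimate of how many threads (minor fragments) a scan gets."""
    if scan_type == "JDBC_SCAN":
        # Per the replies in this thread: always single threaded.
        return 1
    # Other scans: at least one thread, more as the row estimate grows,
    # capped at max_width.
    wanted = math.ceil(estimated_rows / ROWS_PER_THREAD)
    return max(1, min(max_width, wanted))


print(estimate_width("JDBC_SCAN", 5_000_000))   # JDBC: one thread even with many rows
print(estimate_width("PARQUET_SCAN", 0))        # no rows: one thread
print(estimate_width("PARQUET_SCAN", 450_000))  # enough rows: several threads
```

The actual thread count for each phase is what the job profile reports, so treat the numbers above as an illustration only.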

@balaji.ramaswamy
Let me elaborate on my question.
In the above visualization plan there are two JDBC scans.

  • Are both JDBC scans running in parallel, or is the second JDBC scan executed only after the first one completes?

  • Likewise, are the Parquet scans running in parallel or one after another?

@JoiceJacob

Are both JDBC scans running in parallel? Yes
Likewise, are the Parquet scans running in parallel or one after another? Yes

There will be some CPU slicing based on the number of parallel threads in the scan (only for the Parquet scan).

Hi @balaji.ramaswamy ,

We did some testing for the HashJoin case, and it seems the two scans always run one after the other.

I think the advantage of running them one by one is that the second scan is not run if the first one returns no data.

But is there any config to let them run in parallel?

@popejune Operators in the same phase will not run in parallel, while all phases start in parallel.
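
As a rough mental model of that last statement, here is a minimal Python sketch. It assumes one thread per phase, and the phase and operator names in `phases` are made up for illustration. The helpers `run_phase` and `run_query` are hypothetical and do not correspond to any Dremio API. Each phase starts at the same time, and inside a phase the operators are stepped through one after another.

```python
import threading


def run_phase(name, operators, log, lock):
    # Operators inside one phase run one after another on this thread.
    for op in operators:
        with lock:
            log.append((name, op))


def run_query(phases):
    """Start every phase on its own thread and wait for all of them."""
    log, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=run_phase, args=(name, ops, log, lock))
        for name, ops in phases.items()
    ]
    for t in threads:  # all phases start together
        t.start()
    for t in threads:
        t.join()
    return log


# Hypothetical plan: two scans feeding a join live in the same phase,
# so in this model they are never executed at the same instant.
phases = {
    "phase-0": ["HASH_JOIN", "PROJECT"],
    "phase-1": ["JDBC_SCAN_A", "JDBC_SCAN_B"],
}

for phase, op in run_query(phases):
    print(phase, op)
```

The interleaving between phases can differ from run to run, but the order of operators within any single phase always matches the order given in `phases`, which is consistent with the two scans in the HashJoin test appearing to run one by one.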