Hi Dremio Community,
I’m encountering an issue with executor utilization when running queries through the Dremio UI.
• When I execute a query using the preview option, all executors are effectively utilized.
• However, when I run the same query using the UI run option, it appears to use only a single executor.
Example:
I am creating a table in Nessie catalog from existing Iceberg tables stored in a GCS bucket using the following query:
CREATE TABLE nessie_catalog.db.new_table
PARTITION BY (column_name)
AS SELECT * FROM source_table;
When I run this query through the preview option, all executors are engaged efficiently. However, using the UI run option, only a single executor is utilized, leading to slower execution.
I’m unable to determine why this discrepancy exists. Is there a configuration setting or optimization that controls this behavior? Any insights or solutions would be greatly appreciated!
Thanks in advance.
Can you share the query profiles? They contain resource allocation information that tells us which queue/engine was selected, how many nodes were available, the cores per node, and many other settings that can control parallelism.
@Rituraj_Kumar Preview only applies to SELECT statements. If you are running a CTAS, the parallelism depends on two factors: the number of cores and the row count estimate on the scan. Open the Planning tab, go to Final Physical Transformation, and look at the row count estimate of the DATA SCAN TABLE_FUNCTION. A thread is created for every 100K rows in the estimate; if the estimate is high enough that this would exceed 75% of the number of cores, the core limit takes precedence. As @Benny_Chow rightly said, send us the job profile and we can explain why a single thread is used.
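As a rough back-of-the-envelope for that heuristic (the numbers below are made up, not from your profile, and the expression is just shorthand for the rule above, not Dremio internals):
SELECT LEAST(
         CEILING(5000000 / 100000.0),  -- row-count-driven threads: 5M estimated rows -> 50
         FLOOR(32 * 0.75)              -- cap at ~75% of cores on a 32-core executor -> 24
       ) AS estimated_scan_threads;    -- the smaller value wins, here 24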
Hi @Benny_Chow and @balaji.ramaswamy ,
Thank you for the detailed explanation! I will check the Final Physical Transformation section and review the row count estimate in the DATA SCAN TABLE_FUNCTION. Based on your explanation, it seems the parallelism is being limited by either the estimated row count or the number of available cores, correct?
I have attached the query profile ZIP file for further analysis. Please take a look, and let me know if you find any insights.
Appreciate your help!
bb11190c-5362-46d7-b647-907d35933509.zip (23.2 KB)
@Rituraj_Kumar This looks like a very old issue where the rowcount gets set to 1.0. Can you please try the queries below and send the profiles?
alter pds gcs_storage_nabu_poc.stg_bet forget metadata; --will remove reflections/acls on the pds
alter pds gcs_storage_nabu_poc.stg_bet refresh metadata; --re-collects metadata for the pds
select count(*) from gcs_storage_nabu_poc.stg_bet; --profile shows the new row count estimate
select * from gcs_storage_nabu_poc.stg_bet; --profile shows the scan parallelism
Send the 4 profiles. Also, when you ran the refresh metadata, it would have generated an internal job like the one below; please send that profile too:
REFRESH DATASET gcs_storage_nabu_poc.stg_bet
Thanks
Bali
Hi @balaji.ramaswamy,
I have completed the steps you provided and am attaching the requested profiles. However, Step 3 (select count(*) from gcs_storage_nabu_poc.stg_bet) is taking a long time to complete, as it is running on a single thread.
For now, I am sharing the current profiles, and I will send the updated profile once the refresh process is finished.
Please let me know if you need anything else.
step_1_alter_pds_forget_metadata.zip (7.4 KB)
step2_refresh_metadata.zip (7.5 KB)
step_4_select_all.zip (21.4 KB)
step_3__count_all_rows.zip (13.2 KB)
Below is the updated profile for Step 3, which ran the following SQL command: SELECT COUNT(*) FROM gcs_storage_nabu_poc.stg_bet.
step_3_count_all_rows_completed.zip (14.0 KB)
@Rituraj_Kumar I still see a rowcount of 1.0; I'm wondering if this is specific to the Nessie catalog.
Are you able to create an Iceberg table on, say, a Hive/Glue catalog, run a simple count or select *, and send over the profile?
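For reference, that test could look something like the queries below (glue_catalog, test_db and stg_bet_copy are only placeholders; substitute whatever Hive/Glue source you have connected in Dremio):
CREATE TABLE glue_catalog.test_db.stg_bet_copy
AS SELECT * FROM gcs_storage_nabu_poc.stg_bet; --hypothetical Glue-backed copy of the same data
select count(*) from glue_catalog.test_db.stg_bet_copy;
select * from glue_catalog.test_db.stg_bet_copy;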
@balaji.ramaswamy - I decided to skip the Nessie catalog for now and created the Iceberg table directly in Dremio from the Parquet files. Queries now scale properly in both run and preview modes.
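In case it helps anyone hitting the same issue, the workaround was roughly of this shape (the target location, partition column, and source names below are illustrative, not the exact statement I ran):
CREATE TABLE some_writable_source.db.new_table
PARTITION BY (column_name)
AS SELECT * FROM gcs_storage_nabu_poc.parquet_source_table; --illustrative names only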