Slow insert into table

I am using dbt-dremio to run an incremental model.

In the first phase, it creates a __dbt_tmp table.

This phase runs pretty quickly.


(Screenshot: Fast creation of __dbt_tmp)

After that, in the second phase, it selects * from the __dbt_tmp table and then inserts into the target table.
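
Roughly, the two phases look like this (table and source names are illustrative, not the exact SQL dbt-dremio generates):

```sql
-- Phase 1: create the temp table from the model's SELECT (fast in my case)
CREATE TABLE Minio.finance.l1.my_model__dbt_tmp AS
SELECT * FROM Minio.finance.raw.my_source;  -- hypothetical model body

-- Phase 2: append from the temp table into the target table (slow in my case)
INSERT INTO Minio.finance.l1.my_model
SELECT * FROM Minio.finance.l1.my_model__dbt_tmp;
```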

The second phase, however, runs pretty slowly.


(Screenshot: Very slow insert from __dbt_tmp into target table)

Attached below is the job profile of the long-running insert.

e0ae6a1c-8dc2-4565-9881-e460d2e5c7ca.zip (20.0 KB)

Not too sure what the cause is.

@Ken Where is the data for table Minio.finance.l1.rollup_daily_hk_stock_shareholding_change__dbt_tmp stored? From the name it looks like MinIO.

All the time is spent on wait time for table_function 03-xx-02. Expand Operator Metrics and you will see that NUM_CACHE_MISSES matches NUM_CACHE_HITS, which also matches the total number of readers; that looks like a reporting issue that I will follow up on. The total number of data files to be read is 30,162.

Can you run select * from this table a few times (via JDBC, as the UI will truncate the results) and then try the CTAS?
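
For example (the temp table name is from your profile; the CTAS target is just a placeholder):

```sql
-- Run a full scan a few times via JDBC (the UI truncates results, so it
-- may not touch every data file) to warm the C3 cache:
SELECT *
FROM Minio.finance.l1.rollup_daily_hk_stock_shareholding_change__dbt_tmp;

-- Then retry the CTAS and compare the wait times in the new job profile:
CREATE TABLE Minio.finance.l1.ctas_test AS
SELECT *
FROM Minio.finance.l1.rollup_daily_hk_stock_shareholding_change__dbt_tmp;
```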

Hi @balaji.ramaswamy

Thanks for following up.

I have this job running every night.

Attached below is the log file from yesterday’s run.

c1e8234f-2eb5-4344-8a7f-a182f1705074.zip (22.9 KB)

The simple insert still took 33 minutes to run.

=========

In fact, I see that the __dbt_tmp table (i.e. the source table) is created with a partition on sehk_code.

Actually, I only want the partition on the destination table.

I suspect the insert is slow because there are too many partitions in the source table.

Let me try changing/removing the partition and see how long it runs tonight.

I changed the partition_by to another field called as_of_date, and the INSERT INTO xxxx SELECT * FROM yyy is now much faster.
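
For reference, the change was along these lines (the file path and model body are illustrative; only the partition_by change is the real fix):

```sql
-- models/l1/rollup_daily_hk_stock_shareholding_change.sql (illustrative path)
-- Before: partition_by = ['sehk_code'], a high-cardinality key that produced
-- ~30K tiny Parquet files. as_of_date creates one new partition per daily run.
{{
  config(
    materialized = 'incremental',
    partition_by = ['as_of_date']
  )
}}

select *
from {{ ref('stg_hk_stock_shareholding') }}  -- hypothetical upstream model
{% if is_incremental() %}
where as_of_date > (select max(as_of_date) from {{ this }})
{% endif %}
```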

I believe the problem was the partition I chose.

Yes, the problem was over-partitioning the source table. You had 30K Parquet files with about 15 records per file; there is too much overhead in opening and closing so many files for such a small dataset.
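
That works out to roughly 30,162 files × 15 rows ≈ 450K records, which is a lot of file-open overhead for so little data. If you want to check this yourself, you can query the Iceberg metadata (a sketch, assuming the table_files metadata function is available for your source; the temp table name is from your profile, and the final table name is assumed):

```sql
-- Per-file record counts: many files with tiny record_count values is
-- the signature of over-partitioning.
SELECT file_path, record_count
FROM TABLE(table_files('Minio.finance.l1.rollup_daily_hk_stock_shareholding_change__dbt_tmp'))
ORDER BY record_count
LIMIT 100;

-- For a table that has already accumulated many small files, compaction
-- can rewrite them into fewer, larger ones:
OPTIMIZE TABLE Minio.finance.l1.rollup_daily_hk_stock_shareholding_change REWRITE DATA USING BIN_PACK;
```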

Were you able to get the incremental model to work successfully? I.e. was there an incremental filter present in the query that was used to materialize that dbt_tmp table?

Hi Benny,

Thanks for following up.

The issue was not with creating __dbt_tmp itself.

__dbt_tmp was created with lots of small partitions, and that caused problems in the second stage of selecting from __dbt_tmp into the final table.

But the issue was already resolved after I changed to another partition key for the incremental model.

I used the date as the key, which is more appropriate, as I get new rows every day (and they all fall into the new day’s partition).

Thank you!

Hey there,
I totally get your query: the insertion process from the __dbt_tmp table to the target table is slower than expected. Analyzing the job profile you provided can help identify bottlenecks or inefficiencies. Optimizing the SQL queries and ensuring the Dremio environment is configured properly may improve performance. If needed, seek assistance from the dbt-dremio community or support team.


Hello Benny, can you please teach me how you identified this in the Query Profile? This could be very helpful for analyzing other jobs.

You can use either the Visual Profile or the Query Profile’s Operator metrics to get partitioning and row count information. Basically, every Iceberg table scan is going to have a pattern that looks like this:

ICEBERG_SUB_SCAN → TABLE_FUNCTION → OTHER OPERATORS → TABLE_FUNCTION

ICEBERG_SUB_SCAN is a scan against the Iceberg manifest list and outputs the number of manifest files (after partition field pruning).

The first TABLE_FUNCTION is a scan against the manifest files and outputs the number of data files (after partition and non-partition field pruning).

The second TABLE_FUNCTION is the distributed scan against the Parquet data files, which can also include row group pruning (see NUM_ROW_GROUPS_PRUNED). Its output is the total number of records read from the Iceberg table. NUM_CACHE_HITS and NUM_CACHE_MISSES are particularly important here for IO performance because they tell you about the state of the C3 cache.

OTHER OPERATORS is a special case to handle data file pruning on filter expressions that cannot be pushed down into the Iceberg manifest file scan.

The visual profile sums the output rows across threads for each operator, whereas the operator metrics break the records down per thread; the per-thread counts sum back to the same total shown in the visual profile.
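
If you want a quick sanity check outside the profile, you can also inspect the plan. A sketch (the table name and partition column are taken from earlier in this thread; the date literal is illustrative): filtering on the partition column should shrink the data file count you later see on the first TABLE_FUNCTION.

```sql
-- Partition-pruned scan: compare the plan/profile for this query against
-- an unfiltered SELECT * to see the effect of pruning on file counts.
EXPLAIN PLAN FOR
SELECT *
FROM Minio.finance.l1.rollup_daily_hk_stock_shareholding_change
WHERE as_of_date = DATE '2024-01-02';
```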

Hope that helps!