Error with dbt, Dremio, and incremental updates

Hi all,

I work with Dremio and have the following setup:

  • an Iceberg table that I want to update
  • a view with the differential source data, which dbt should append to the target Iceberg table using an incremental update strategy.

When I run dbt run in VS Code, I get the following error message:

14:52:25.120274 [error] [MainThread]: Runtime Error in model raw_demo_dbt_incr_updates (models/raw/demo_dbt_incr_updates/raw_demo_dbt_incr_updates.sql)
ERROR: SYSTEM ERROR: NoSuchElementException

SqlOperatorImpl NESTED_LOOP_JOIN
Location 0:0:25
ErrorOrigin: EXECUTOR
[Error Id: 19bd9ddd-6824-4d6e-b6d3-d24af8ba3bcd ]

(java.util.NoSuchElementException) null
  java.util.ArrayList$Itr.next():1000
  com.dremio.sabot.op.join.nlje.NLJEOperator.getInitialMatchState():188
  com.dremio.sabot.op.join.nlje.NLJEOperator.noMoreToConsumeRight():257
  com.dremio.sabot.driver.SmartOp$SmartDualInput.noMoreToConsumeRight():517
  com.dremio.sabot.driver.WyePipe.pump():83
  com.dremio.sabot.driver.Pipeline.doPump():134
  com.dremio.sabot.driver.Pipeline.pumpOnce():124
  com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():655
  com.dremio.sabot.exec.fragment.FragmentExecutor.run():560
  com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():1234
  com.dremio.sabot.task.AsyncTaskWrapper.run():130
  com.dremio.sabot.task.single.DedicatedFragmentRunnable.run():53
  java.util.concurrent.Executors$RunnableAdapter.call():515
  java.util.concurrent.FutureTask.run():264
  java.util.concurrent.ThreadPoolExecutor.runWorker():1128
  java.util.concurrent.ThreadPoolExecutor$Worker.run():628
  java.lang.Thread.run():829

How can I fix this error?

@memyself This job should have produced a profile; are you able to download the profile and send it?

Hi Balaji,

I’m sending you some data.

The table I want to update incrementally is stored under the Nessie main branch.

The Dremio SQL code from the log file:

/* {"app": "dbt", "dbt_version": "1.9.6", "profile_name": "dpf_dremio_project", "target_name": "dev", "node_id": "model.dpf_dremio_project.raw_demo_dbt_incr_updates"} */

create table "$scratch"."raw"."raw_demo_dbt_incr_updates__dbt_tmp"
as (
  select
    id,
    dater,
    "domain_name",
    "ip_v4_address",
    "ip_v6_address",
    url,
    city
  from "test-nessie-dev"."raw"."raw_demo_dbt_incr_updates_view"
  WHERE id > (SELECT MAX(id) FROM "$scratch"."raw"."raw_demo_dbt_incr_updates")
)

Here I get the error: NoSuchElementException

Regards,

memyself

@memyself Got it! The failure would generate a job. Run the create table again and, once it fails, go to the Jobs page, find the failed job, download the profile, and send it. The error in the profile will tell us where the failure is happening.

Hi Balaji,

When I look at the raw profile in Dremio, I get:

On the Query tab:

/* {"app": "dbt", "dbt_version": "1.9.6", "profile_name": "dpf_dremio_project", "target_name": "dev", "node_id": "model.dpf_dremio_project.raw_demo_dbt_incr_updates"} */

create table "$scratch"."raw"."raw_demo_dbt_incr_updates__dbt_tmp"

The $scratch variable is the problem.
When I use the database name instead (here, the Nessie catalog), the table is created.
That seems like a bug to me. Can you confirm this?

On the Error tab, I get the same message as above:

SYSTEM ERROR: NoSuchElementException

SqlOperatorImpl NESTED_LOOP_JOIN
Location 0:0:17
ErrorOrigin: EXECUTOR
[Error Id: dadcf005-5f56-4dbe-b4f0-38959c4a7ad5 ]

(java.util.NoSuchElementException) null
java.util.ArrayList$Itr.next():1000
com.dremio.sabot.op.join.nlje.NLJEOperator.getInitialMatchState():188
com.dremio.sabot.op.join.nlje.NLJEOperator.noMoreToConsumeRight():257
com.dremio.sabot.driver.SmartOp$SmartDualInput.noMoreToConsumeRight():517
com.dremio.sabot.driver.WyePipe.pump():83
com.dremio.sabot.driver.Pipeline.doPump():134
com.dremio.sabot.driver.Pipeline.pumpOnce():124
com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():655
com.dremio.sabot.exec.fragment.FragmentExecutor.run():560
com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():1234
com.dremio.sabot.task.AsyncTaskWrapper.run():130
com.dremio.sabot.task.single.DedicatedFragmentRunnable.run():53
java.util.concurrent.Executors$RunnableAdapter.call():515
java.util.concurrent.FutureTask.run():264
java.util.concurrent.ThreadPoolExecutor.runWorker():1128
java.util.concurrent.ThreadPoolExecutor$Worker.run():628
java.lang.Thread.run():829

What does your ~/.dbt/profiles.yml look like?

For example, with my dbt profile as below

Table will be created at hive_source.test_dbt

View will be created at test_dbt_space.test_dbt_space_folder

$scratch is the default value for the storage source/path in case you don't specify it when setting up dbt.
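For reference, a minimal dbt-dremio profiles.yml that sets the storage source and path explicitly might look roughly like the sketch below. The host, credentials, and folder names are placeholders based on the names seen in this thread, not the actual profile:

dpf_dremio_project:
  target: dev
  outputs:
    dev:
      type: dremio
      software_host: dremio.example.com   # placeholder host
      port: 9047
      user: dbt_user                      # placeholder credentials
      password: dbt_password
      use_ssl: false
      threads: 1
      # physical tables are created here; defaults to $scratch if omitted
      object_storage_source: test-nessie-dev
      object_storage_path: raw
      # views are created here; defaults to the @user home space if omitted
      dremio_space: test-nessie-dev
      dremio_space_folder: raw

With something like this in place, the __dbt_tmp table would be created under "test-nessie-dev"."raw" instead of "$scratch"."raw".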

Ahh, I guess I'm a step closer to the goal.

I added object_storage_source to the profiles.yml file, and the error mentioned above disappears.

But I get a new one:

Dremio job log SQL:

create or replace view "test-nessie-dev"."raw"."raw_demo_dbt_incr_updates"
as select *
from "test-nessie-dev"."raw"."raw_demo_dbt_incr_updates"

with the error: A non-view table with given name ["test-nessie-dev".raw.raw_demo_dbt_incr_updates] already exists in schema ["test-nessie-dev".raw].

For my understanding: I've created an Iceberg table which I want to incrementally update.
I've added an entry for the table in the schema.yml and a SQL file with the name of the table.
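For illustration, such a schema.yml entry might look roughly like the sketch below; the actual file is not shown in this thread, and the description and the test on the id column are assumptions:

version: 2

models:
  - name: raw_demo_dbt_incr_updates
    description: Incrementally appended Iceberg table   # illustrative
    columns:
      - name: id          # the unique_key used in the model config
        tests:
          - not_null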

Inside the SQL file I've entered:

{{ config(
    materialized='incremental',
    unique_key='id',
    incremental_strategy='append',
    schema='raw'
) }}

What is wrong in my code?

Need some info:

  • Your model file name. For example, tableA.sql will create/insert data into the table/view at source.database.tableA.
  • The full query of your model .sql file; you only show the {{ config }} part. I need the source table and the destination table.
  • Maybe also your storage/space settings in profiles.yml, for more info.

For example, in my case:
testdbtdesinc.sql
{{
  config(
    materialized = 'incremental',
    unique_key = 'id',
    alias = 'TestDBTDesInc',
    tags = ["example"]
  )
}}
select
  *
from hive_source.test_dbt.TestDBTSourceInc
{% if is_incremental() %}
where
  id not in (select distinct id from {{ this }})
{% endif %}

This will insert new data from hive_source.test_dbt.TestDBTSourceInc into the hive_source.test_dbt.testdbtdesinc table.

dbt run --select testdbtdesinc.sql --debug

This generates two important queries:

create table "hive_source"."test_dbt"."testdbtdesinc__dbt_tmp"
as (
  -- incremental cdc
  select
    *
  from hive_source.test_dbt.TestDBTSourceInc
  where
    id not in (select distinct id from "hive_source"."test_dbt"."testdbtdesinc")
)

and

insert into "hive_source"."test_dbt"."testdbtdesinc" ("id", "data")
select "id", "data" from "hive_source"."test_dbt"."testdbtdesinc__dbt_tmp"

Hi @quangbilly, thanks for the fast answer. I've run dbt in debug mode and see the following steps:

  • drop table if exists "test-nessie-dev"."raw"."raw_demo_dbt_incr_updates__dbt_tmp"

  • create table "test-nessie-dev"."raw"."raw_demo_dbt_incr_updates__dbt_tmp"
    as select from my view

  • insert into "test-nessie-dev"."raw"."raw_demo_dbt_incr_updates" ("id", "dater", "domain_name", "ip_v4_address", "ip_v6_address", "url", "city")
    select "id", "dater", "domain_name", "ip_v4_address", "ip_v6_address", "url", "city" from "test-nessie-dev"."raw"."raw_demo_dbt_incr_updates__dbt_tmp"

  • create or replace view "test-nessie-dev"."raw"."raw_demo_dbt_incr_updates"
    as select * from "test-nessie-dev"."raw"."raw_demo_dbt_incr_updates"

The last step produces the error:

ERROR: A non-view table with given name ["csp-dpf-nessie-dev".raw.raw_demo_dbt_incr_updates] already exists in schema ["csp-dpf-nessie-dev".raw]

Can you explain why the last step is executed?

As I said, I think you're inserting a table into itself (raw_demo_dbt_incr_updates).

I asked the same question before.

What is the source table, and what is the destination table (the table that the data is written to)?

And what is your model file name? Is it "raw_demo_dbt_incr_updates.sql"?

And also, please share the content of that file.