Dremio query forming

Hello! I am performing the following SQL-Query on a Postgres database.

select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= date ‘1998-12-01’ - interval ‘90’ day
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus;

Everything works as expected. But when I look into the postgres server query store, I see that Dremio is forming my initial SQL-Query into a slightly different one. This is the SQL Query that Dremio performs on my postgres server.

SELECT
“l_returnflag”,
“l_linestatus”,
CAST(
CASE WHEN COUNT(“l_quantity”) = $ 1 THEN $ 2 ELSE CAST(SUM(“l_quantity”) AS NUMERIC(38, 6)) END AS NUMERIC(38, 6)
) AS “sum_qty”,
CAST(
CASE WHEN COUNT(“l_extendedprice”) = $ 3 THEN $ 4 ELSE CAST(SUM(“l_extendedprice”) AS NUMERIC(38, 6)) END AS NUMERIC(38, 6)
) AS “sum_base_price”,
CAST(
CASE WHEN COUNT(“$f4”) = $ 5 THEN $ 6 ELSE CAST(SUM(“$f4”) AS NUMERIC(38, 6)) END AS NUMERIC(38, 6)
) AS “sum_disc_price”,
CAST(
CASE WHEN COUNT(“$f5”) = $ 7 THEN $ 8 ELSE CAST(SUM(“$f5”) AS NUMERIC(38, 6)) END AS NUMERIC(38, 6)
) AS “sum_charge”,
(
CAST(
CASE WHEN COUNT(“l_quantity”) = $ 9 THEN $ 10 ELSE CAST(SUM(“l_quantity”) AS NUMERIC(38, 6)) END AS DOUBLE PRECISION
) / COUNT(“l_quantity”)
) AS “avg_qty”,
(
CAST(
CASE WHEN COUNT(“l_extendedprice”) = $ 11 THEN $ 12 ELSE CAST(SUM(“l_extendedprice”) AS NUMERIC(38, 6)) END AS DOUBLE PRECISION
) / COUNT(“l_extendedprice”)
) AS “avg_price”,
(
CAST(
CASE WHEN COUNT(“l_discount”) = $ 13 THEN $ 14 ELSE CAST(SUM(“l_discount”) AS NUMERIC(38, 6)) END AS DOUBLE PRECISION
) / COUNT(“l_discount”)
) AS “avg_disc”,
COUNT(*) AS “count_order”
FROM
(
SELECT
“l_returnflag”,
“l_linestatus”,
“l_quantity”,
“l_extendedprice”,
(“l_extendedprice” * ((1 - “l_discount”))) AS “$f4”,
(
(“l_extendedprice” * ((1 - “l_discount”))) * ((1 + “l_tax”))
) AS “$f5”,
“l_discount”
FROM
(
SELECT
“lineitem”.“l_quantity”,
“lineitem”.“l_extendedprice”,
“lineitem”.“l_discount”,
“lineitem”.“l_tax”,
“lineitem”.“l_returnflag”,
“lineitem”.“l_linestatus”,
“lineitem”.“l_shipdate”
FROM
“public”.“lineitem”
) AS “lineitem”
WHERE
“l_shipdate” <= DATE ‘1998-12-01’ - interval ‘90’ day
) AS “lineitem”
GROUP BY
“l_returnflag”,
“l_linestatus”
ORDER BY
“l_returnflag”,
“l_linestatus”

Is there any documentation on why Dremio does that? And especially why forming the initial statement to this particular one?

Thanks for any help!

@vincent_mayer Dremio uses Advanced Relational Pushdown (ARP) framework to rewrite the SQL in native RDBMS language as sometimes the SQL written on Dremio may be using something that is not available in Postgres. If you do not want Dremio to pushdown using ARP then you can use the External query feature

https://docs.dremio.com/cloud/sql/commands/copy-into-table/

Thanks! Do you know where I can learn more about the ARP Framework?

@vincent_mayer Does the OSS code does not help?

@balaji.ramaswamy I think got it. How does the forming work for a NoSQL database, e.g. MongoDB? I can find the driver but there is no arp.yaml file where the rules for forming are specified.

@balaji.ramaswamy Do you also know how Dremio performs queries on NoSQL and SQL Databases simultaneously? How does Dremio connect data from a SQL and a NoSQL Database and generate the result?

@vincent_mayer Dremio will generate the pushdown separately to the 2 sources and the JOIN will happen in memory on the Dremio executors

@balaji.ramaswamy Is there any documentation on how the join will happen in memory on the Dremio executors?

@vincent_mayer Is your question related to which type of exchange happens during a join like hashtorandom or broadcast etc?

@balaji.ramaswamy Yes :slight_smile:

@vincent_mayer Look at sys.options for the threshold for broadcast and that is one of the major input factors that affects the type of the exchange, let me know if you are unable to find the key

@balaji.ramaswamy where can I find sys.options?

@vincent_mayer Table in Dremio

@vincent_mayer have you tried using the table(external_query()) function?

@balaji.ramaswamy Can you tell me how to find the key?

@vincent_mayer Would you be able to send me the profile so I can look at the plan and see what is going on?

@balaji.ramaswamy
9942af70-e17e-4a1b-a9f9-b59e1d2e15a9.zip (32,3 KB)
This is the job profile. My questions basically are:

  • What part of the system decides which data gets joined?
  • And what criteria does the system use to pick the right join?

Glad for any help regarding these questions.

@vincent_mayer

Is the below push down not using the right SQL plan on your backend DB?

SELECT "part"."p_partkey" AS "p_partkey", "part"."p_name", "part"."p_mfgr" AS "p_mfgr", "part"."p_brand" AS "p_brand", "part"."p_type", "part"."p_size" AS "p_size", "part"."p_container" AS "p_container", "part"."p_retailprice" AS "p_retailprice", "part"."p_comment", "supplier"."s_suppkey", "supplier"."s_name", "supplier"."s_address" COLLATE "C" AS "s_address", "supplier"."s_nationkey", "supplier"."s_phone", "supplier"."s_acctbal", "supplier"."s_comment" COLLATE "C" AS "s_comment"
FROM (SELECT "part"."p_partkey", "part"."p_name" COLLATE "C" AS "p_name", "part"."p_mfgr", "part"."p_brand", "part"."p_type" COLLATE "C" AS "p_type", "part"."p_size", "part"."p_container", "part"."p_retailprice", "part"."p_comment" COLLATE "C" AS "p_comment"
FROM "public"."part"
WHERE "part"."p_size" = 15 AND "part"."p_type" COLLATE "C" LIKE '%BRASS' COLLATE "C") AS "part"
INNER JOIN "public"."supplier" ON TRUE

@balaji.ramaswamy To be honest, I dont know. How can I check?

@vincent_mayer Let us try something else

Can you please try running the above SQL directly on your backend DB and see how much time it takes?