Simple query trying to return billions of rows

We have a pretty simple query that took 33 minutes, 23 seconds and is trying to return almost 2 billion rows. The query doesn't actually match anywhere near that many records in the DB.

Here’s the profile

d38ae1f1-0de4-4c1e-8e69-a033bf99607b.zip (2.1 MB)

This is running the latest 3.0 release of Dremio.

Hi @jhaynie

It looks like what is called an expanding join, and that can cause exactly this behavior. See below for how your query progressed operator by operator; the profile is always read bottom up. Starting at operator 25, the JSON_SUB_SCAN all the way at the bottom, everything looks reasonable until operator 7, a PROJECT that produced 3,341 records. The subsequent HASH_JOIN (operator 6) shows 1,545,905 records, but the output of that join, seen in the PROJECT above it (operator 5), is 1,915,864,488 (~2B records).
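For context, here is a minimal sketch of how an expanding join blows up like this. The table and column names are invented, not taken from your query: when the join key repeats on both sides, the join emits one row per matching pair, so counts multiply. Producing 1,915,864,488 output rows from 3,341 input rows works out to roughly 573,000 matches per input row, which points at a highly non-unique join key.

-- Hypothetical sketch of an expanding join (names invented).
-- If account_id repeats on both sides, every pairing is emitted:
-- 1,000 rows x 1,000 rows sharing one key value = 1,000,000 output rows.
SELECT e.event_id, o.order_id
FROM events e
JOIN orders o
  ON e.account_id = o.account_id;  -- non-unique on both sides, so output fans out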

Can you please check your JOINs and see whether any join conditions are missing, or whether the joins are producing duplicate rows? Screenshot attached too.
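One quick way to spot a fan-out before running the full query is to count rows per join key on each side; any key that repeats on both sides multiplies the output. A sketch with hypothetical names:

-- Hypothetical check: keys with cnt > 1 on both sides will fan out the join.
SELECT account_id, COUNT(*) AS cnt
FROM orders
GROUP BY account_id
HAVING COUNT(*) > 1
ORDER BY cnt DESC;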

Kindly let us know if you have any questions.

{"majorFragmentId":0,"operatorId":0,"name":"SCREEN","records":1}
{"majorFragmentId":0,"operatorId":1,"name":"PROJECT","records":1}
{"majorFragmentId":0,"operatorId":2,"name":"WRITER_COMMITTER","records":1}
{"majorFragmentId":0,"operatorId":3,"name":"ARROW_WRITER","records":1915864488}
{"majorFragmentId":0,"operatorId":4,"name":"PROJECT","records":1915864488}
{"majorFragmentId":0,"operatorId":5,"name":"PROJECT","records":1915864488}
{"majorFragmentId":0,"operatorId":6,"name":"HASH_JOIN","records":1545905}
{"majorFragmentId":0,"operatorId":7,"name":"PROJECT","records":3341}
{"majorFragmentId":0,"operatorId":8,"name":"HASH_JOIN","records":4583}
{"majorFragmentId":0,"operatorId":9,"name":"PROJECT","records":3341}
{"majorFragmentId":0,"operatorId":10,"name":"PROJECT","records":3341}
{"majorFragmentId":0,"operatorId":11,"name":"HASH_JOIN","records":3342}
{"majorFragmentId":0,"operatorId":12,"name":"SELECTION_VECTOR_REMOVER","records":3341}
{"majorFragmentId":0,"operatorId":13,"name":"SELECTION_VECTOR_REMOVER","records":3341}
{"majorFragmentId":0,"operatorId":14,"name":"PROJECT","records":3341}
{"majorFragmentId":0,"operatorId":15,"name":"HASH_JOIN","records":17}
{"majorFragmentId":0,"operatorId":16,"name":"FILTER","records":4174}
{"majorFragmentId":0,"operatorId":17,"name":"FILTER","records":4174}
{"majorFragmentId":0,"operatorId":18,"name":"SELECTION_VECTOR_REMOVER","records":3341}
{"majorFragmentId":0,"operatorId":19,"name":"PROJECT","records":16}
{"majorFragmentId":0,"operatorId":20,"name":"PARQUET_ROW_GROUP_SCAN","records":1}
{"majorFragmentId":0,"operatorId":21,"name":"JSON_SUB_SCAN","records":4174}
{"majorFragmentId":0,"operatorId":22,"name":"JSON_SUB_SCAN","records":4174}
{"majorFragmentId":0,"operatorId":23,"name":"FILTER","records":4174}
{"majorFragmentId":0,"operatorId":24,"name":"PARQUET_ROW_GROUP_SCAN","records":16}
{"majorFragmentId":0,"operatorId":25,"name":"JSON_SUB_SCAN","records":4174}

We found a table that wasn't fully joined, and fixing that join resolved it … thanks.
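In case it helps anyone else, the shape of the fix (with table and column names changed, so these are illustrative only) was just adding the join condition that had been left off:

-- Illustrative names, not our actual schema.
SELECT f.*, d.*
FROM facts f
JOIN details d
  ON d.fact_id  = f.id
 AND d.batch_id = f.batch_id;  -- this second predicate was the missing one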