Subquery executes to empty result making EXISTS/NOT EXISTS fail when inside of the subquery outer query column is nullable. Here i made very simple example to reproduce it, cos actual query i use is pretty complex:
so table(parquet file on s3, but same with aws glue) where ID is nullable float:
ID Text
0 1.0 One
1 NaN Two
2 2.0 Three
SELECT * FROM "table_nullable.parquet" a
returns 3 rowsSELECT * FROM "table_nullable.parquet" a where a.ID is null
returns 1 rows.SELECT * FROM "table_nullable.parquet" a where exists(select * from "table_nullable.parquet" b)
returns 3 rowsSELECT * FROM "table_nullable.parquet" a where exists(select * from "table_nullable.parquet" b WHERE A.ID IS NULL)
returns empty result set, but should return 1 row.
using WITH + COALESCE helps:WITH fixed AS (SELECT COALESCE(ID, -1) ID, "Text" FROM "table_nullable.parquet") SELECT * FROM fixed a where exists(select * from fixed b where a.ID = -1)
returns exactly 1 row
Check out last query has predicate with outer a
table referenced as a.id is null
which is true for 1 rows in outer query always.
This example is oversimplified and senseless, but in real use cases it makes such queries fail:
select * from table a where [not] exists (select * from table b where (b.x=a.x or (b.x is null and a.x is null)) and b.date > a.date)
Even coalesce(b.x,..)=coalesce(a.x,..)
doesn help. only work around this issue is to wrap main query in with
with coalesce(x,0)
instead of x.
Attached profile of the 4th failed query
190ad7c5-e167-4630-928b-c363ee06bc75.zip (10.9 КБ)
Build
19.1.0-202111160130570172-0ee00450
AWS Edition (activated)