Subquery executes to empty result making EXISTS/NOT EXISTS fail when inside of the subquery outer query column is nullable. Here i made very simple example to reproduce it, cos actual query i use is pretty complex:
so table(parquet file on s3, but same with aws glue) where ID is nullable float:
ID Text 0 1.0 One 1 NaN Two 2 2.0 Three
SELECT * FROM "table_nullable.parquet" areturns 3 rows
SELECT * FROM "table_nullable.parquet" a where a.ID is nullreturns 1 rows.
SELECT * FROM "table_nullable.parquet" a where exists(select * from "table_nullable.parquet" b)returns 3 rows
SELECT * FROM "table_nullable.parquet" a where exists(select * from "table_nullable.parquet" b WHERE A.ID IS NULL)returns empty result set, but should return 1 row.
using WITH + COALESCE helps:
WITH fixed AS (SELECT COALESCE(ID, -1) ID, "Text" FROM "table_nullable.parquet") SELECT * FROM fixed a where exists(select * from fixed b where a.ID = -1)
returns exactly 1 row
Check out last query has predicate with outer
a table referenced as
a.id is null which is true for 1 rows in outer query always.
This example is oversimplified and senseless, but in real use cases it makes such queries fail:
select * from table a where [not] exists (select * from table b where (b.x=a.x or (b.x is null and a.x is null)) and b.date > a.date)
coalesce(b.x,..)=coalesce(a.x,..) doesn help. only work around this issue is to wrap main query in
coalesce(x,0) instead of x.
Attached profile of the 4th failed query
190ad7c5-e167-4630-928b-c363ee06bc75.zip (10.9 КБ)
AWS Edition (activated)