Dremio 20.1
I have a Glue catalog data source, which we'll call glueci, containing a table with ~3000 partitions. The partition being queried has 1 row in it (assume column1='abc' is the partition lookup). When 'is not null' is used in the WHERE clause and a specific column is selected, the query runs for a very long time (1-2 min); otherwise the response takes <=1s. I tried to reproduce this with an S3 data source and the same Parquet file, but that seems to work fine. Dropping and re-adding glueci did not help.
-- SELECT * returns fine
SELECT *
FROM glueci.catalog1.table1
WHERE column1='abc' and column2=945737 and column3 is not null
-- not isnull() returns fine
SELECT column3
FROM glueci.catalog1.table1
WHERE column1='abc' and column2=945737 and not isnull(column3)
-- no 3rd condition returns fine
SELECT column3
FROM glueci.catalog1.table1
WHERE column1='abc' and column2=945737
-- this takes a long time
SELECT column3
FROM glueci.catalog1.table1
WHERE column1='abc' and column2=945737 and column3 is not null
-- this takes a long time
SELECT column3
FROM glueci.catalog1.table1
WHERE column1='abc' and column2=945737 and not(column3 is null)
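
One way to narrow this down might be to compare the plans of a slow variant and a fast variant. The following is only a sketch: it assumes EXPLAIN PLAN FOR behaves against the Glue source the same way it does for other sources, and it reuses the placeholder table and column names from the queries above. If the slow plan shows a scan over far more partitions or files than the fast one, that would point at partition pruning rather than at the Parquet read itself.

```sql
-- Plan for the slow variant (IS NOT NULL on the selected column)
EXPLAIN PLAN FOR
SELECT column3
FROM glueci.catalog1.table1
WHERE column1='abc' and column2=945737 and column3 is not null;

-- Plan for the fast variant (same predicate written with isnull())
EXPLAIN PLAN FOR
SELECT column3
FROM glueci.catalog1.table1
WHERE column1='abc' and column2=945737 and not isnull(column3);
```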