Data lost in query after using filter

Hi,

We are seeing some strange query behaviour that we cannot explain.
Situation: we have a Parquet file on HDFS that we query with Dremio.

Query 1:


profileQuery1.zip (13,6 KB)

Query 2 (same query as Query 1, but with a WHERE clause on the GROUP BY column):


profileQuery2.zip (12,0 KB)

As you can see, the results for the filtered data differ before and after applying the filter.

Query 3:


profileQuery3.zip (12,2 KB)

As you can see, the results now match those of Query 1.
The same holds when we use btrim, rtrim or ltrim instead of lower.

How can this be explained?

Dremio Community Edition
12.1.0-202101041749050132-55c827cb

Thanks in advance.

Rgds,

Danny

@DannyPannemans It looks like the filter pushdown is the issue. Does the Parquet file contain sensitive data? If not, could you share it?

Hi,

Indeed, there was sensitive data in it. I have masked that data and checked whether the problem still occurs, and yes… the problem remains. It is now safe to send you the data.
Please find it attached.

Kind regards,

Danny

(Attachment part-00000-8e5b9a5a-7d6c-481f-99e8-8581ba71b7bc-c000.snappy.parquet is missing)

Hi,

Sorry… the attachment bounced in my previous mail.


Kind regards,

Danny

(Attachment part-00000-8e5b9a5a-7d6c-481f-99e8-8581ba71b7bc-c000.snappy.zip is missing)

Hi,

Did you receive the Parquet data I sent via WeTransfer? The attachment was too big to send by mail.

Danny

Hi,

Small reminder… Have you had a chance to look at the data I provided?

Thanks a lot for your swift reply.

Kind regards,

Danny

@DannyPannemans Apologies for the silence; I was out for several reasons. Would you mind sending me the file again? The link has expired.