We are seeing query behaviour that we cannot explain.
Setup: we are querying a Parquet file on HDFS with Dremio.
Query 1:
profileQuery1.zip (13,6 KB)
Query 2 (the same query as Query 1, but with a WHERE clause on the GROUP BY column):
profileQuery2.zip (12,0 KB)
As you can see, for the filtered values the results differ from those returned by Query 1 without the filter.
Query 3:
profileQuery3.zip (12,2 KB)
As you can see, the results now match those of Query 1.
The same holds when we use btrim, rtrim or ltrim instead of lower.
How can this be explained?
Dremio Community Edition
Thanks in advance.
@DannyPannemans It looks like the push-down is the issue. Does the Parquet file contain sensitive data? If not, would it be possible to share it?
There was indeed sensitive data in it. I have masked that data and verified that the problem still occurs, so it is now safe to send you the file.
Please find it attached.
Kind regards,
(Attachment part-00000-8e5b9a5a-7d6c-481f-99e8-8581ba71b7bc-c000.snappy.parquet is missing)
Sorry, the attachment bounced in my previous mail.
(Attachment part-00000-8e5b9a5a-7d6c-481f-99e8-8581ba71b7bc-c000.snappy.zip is missing)
Did you receive the Parquet data I sent via WeTransfer? The attachment was too big to send by mail.
A small reminder: have you had a chance to look at the data I provided?
Thanks a lot for your swift reply.
Kind regards
@DannyPannemans Apologies for the silence; I was away for several reasons. Would you mind sending me the file again, as the link has expired?
Hope you are doing well, Balaji.
Please find the second attempt below:
Thanks @DannyPannemans for the file. I was able to reproduce the issue (obviously), and I will discuss it internally and get back to you.
Hi Balaji,
Do you have any update on this matter? This ‘bug’ makes our data products very unreliable…
Thanks in advance.
@DannyPannemans Sorry about the delay. I have filed a bug and will follow up.