Hi Dremio Team and wider Community!
I’m running into a weird problem with filtering: I have a collection in MongoDB that contains documents with varying formats. Some look like this:
{
"_id": "100_20181010",
"time": {
"date": "2018-10-10T00:00:00.000Z",
"offset": 0
},
// ....
"subscribedApplications": [
"someData", "someOtherData"
]
}
SubscribedApplications might also be empty, i.e. []. Note that “date” has the time part set to 00:00:00.
Most of the documents, however, don’t have the “subscribedApplications” element at all, but look like this:
{
"_id": "362",
"time": {
"date": "2013-06-22T15:03:14.000Z",
"offset": 120
},
"source": "361",
"type": "c8y_TemperatureMeasurement",
//....
}
So, here we have Date+Timestamp in “date”, while the above version had time set to 00:00:00…
Dremio loads the raw data as follows:
For some reason, the “date” part is always incorrectly interpreted as a Date instead of DateTime (or String), removing the time part completely.
I’m only interested in the elements without “subscribedApplications”, but I need the timestamp, so I tried to filter and cast, like that:
SELECT "_id", cast(pmdata."time"."date" AS TIMESTAMP) AS "time", source, type, flatten("_seriesValueFragments") AS "_seriesValueFragments"
FROM C8Y_Edge.management.pmdata
WHERE subscribedApplications = NULL
I saved that as a virtual dataset (“measurements”) – timestamp is casted properly. Then, I tried to execute a few queries. Things like select *, count(), …, work fine, but filters are completely broken.
A query like
SELECT *
FROM c8y.measurements
WHERE "_id"='362'
returns an empty result set, but it should return exactly one record (same for source=‘361’ and so on).
It gets even worse when I further unnest the JSON. Then, filtering for things like “value” > 10 just don’t work, full result set (including records with value < 10) is returned…
The problem goes away when I remove “WHERE subscribedApplications = NULL” from my virtual dataset definition, but then I cannot cast to TIMESTAMP anymore. Manually excluding the first record type (fortunately, they all have source=100) also fixes the filtering problem…
Any idea what might be wrong here? I attach a small JSON export from the Mongo collection.
raw_data.zip (1.2 KB)
The full virtual dataset DDL:
virtualDataSet.zip (425 Bytes)
Query profile for a “_id”=X query:
id_eq_411.zip (6.8 KB)
Query profile for a “value > X” query:
val_gt_10.zip (7.4 KB)
Thanks, Tim