I have an Elasticsearch index which contains two fields called “header” and “control”. Both contain nested JSON structures. I’ve pasted a sample below, and a full example is attached.
master.systemx.trade.eod.control.zip (1.2 KB)
I’m running a query in Dremio like the one below:
SELECT header, data.header.batchId AS batchId, control, "_index", "_type", "_uid", "_id"
FROM ES."master.systemx.trade.eod.control".data AS data
WHERE data.header.batchId = '1517350156032'
When I profile the query that Dremio pushes down to Elasticsearch, I see that Dremio does a “select all” on the index, whereas I would expect an ES query that filters on “data.header.batchId”.
ElasticScan(table=[[ES, master.systemx.trade.eod.control, data]], resource=[master.systemx.trade.eod.control/data], columns=[[header, control, _index, _type, _uid, _id]], pushdown=[{
  "from" : 0,
  "size" : 1000,
  "query" : {
    "match_all" : {
      "boost" : 1.0
    }
  }
For very large indexes this simple query takes a very long time, because Dremio pulls all the data from the index, whereas running the equivalent Lucene query directly on Elasticsearch is instantaneous.
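To make the expectation concrete, here is a minimal sketch of the kind of request body one would expect in place of the match_all above. It is an illustration only, not output from Dremio: the helper name build_filtered_query is hypothetical, and using a term query assumes that header.batchId is indexed so that an exact-value lookup matches (for example as a keyword field).

```python
import json


def build_filtered_query(field, value, size=1000):
    """Build an Elasticsearch search body that filters on one exact field value.

    Hypothetical helper for illustration; a term query is assumed to be
    appropriate for the field's mapping.
    """
    return {
        "from": 0,
        "size": size,
        "query": {"term": {field: value}},
    }


# Field path is relative to _source, so it is header.batchId rather than
# data.header.batchId ("data" is the table alias in the SQL).
body = build_filtered_query("header.batchId", "1517350156032")
print(json.dumps(body, indent=2))
```

A body like this could be sent to the index's _search endpoint to compare its response time with the Dremio query.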
I know that creating a reflection on the index will definitely help, but is there a way for Dremio to push down a query to Elasticsearch that includes the filter on a field inside a nested JSON structure, or is this a limitation?
Sample from elasticsearch:
{
  "_index" : "master.systemx.trade.eod.control",
  "_type" : "data",
  "_id" : "AWFJIXuEBISOowAUEIxL",
  "_score" : 1.0,
  "_source" : {
    "header" : {
      "messageId" : "systemx.ctrl.1517350386841",
      "batchId" : "1517350156032",
      "sourceSystem" : "systemx",
      "secondarySourceSystem" : null,
      "sourceSystemCreationTimestamp" : "2018-01-30T22:13:06.841Z",
      "sentBy" : "systemx",
      "sentTo" : "MYSYSTEM",
      "messageType" : "Control",
      "schemaVersion" : "0.4.16-SNAPSHOT",
      "processing" : "Batch"
    },
    "control" : {
      "action" : "End",
      "subject" : "EOD",
      "eodDate" : "2018-01-30",
      "details" : "Trade Data Batch End",
      "batchSizeIntended" : 33850,
      "batchSizeSent" : 33850
    }
  }
},
{
  "_index" : "master.systemx.trade.eod.control",
  "_type" : "data",
  "_id" : "AWJntEJZcqKwS7XmuU_E",
  "_score" : 1.0,
  "_source" : {
    "header" : {
      "messageId" : "systemx.ctrl.1522158289293",
      "batchId" : "1522158098381",
      "sourceSystem" : "systemx",
      "secondarySourceSystem" : null,
      "sourceSystemCreationTimestamp" : "2018-03-27T13:44:49.293Z",
      "sentBy" : "systemx",
      "sentTo" : "MYSYSTEM",
      "messageType" : "Control",
      "schemaVersion" : "0.4.21",
      "processing" : "Batch"
    },
    "control" : {
      "action" : "End",
      "subject" : "EOD",
      "eodDate" : "2018-03-23",
      "details" : "Trade Data Batch End",
      "batchSizeIntended" : 34525,
      "batchSizeSent" : 34525
    }
  }
},
...