Dremio returns wrong value on simple count(*) on Elasticsearch

Hi,
I’m using Dremio build no. 20.1.0-202202061055110045-36733c65, community edition.
ES version 7.10.2.
A simple count(*) query translates to the following physical plan:

00-00    Screen : rowType = RecordType(VARCHAR(65536) Fragment, BIGINT Records, VARCHAR(65536) Path, VARBINARY(65536) Metadata, INTEGER Partition, BIGINT FileSize, VARBINARY(65536) IcebergMetadata, VARBINARY(65536) fileschema, VARBINARY(65536) ARRAY PartitionData): rowcount = 1.0, cumulative cost = {17158.1 rows, 17156.10011 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 415715
00-01      Project(Fragment=[$0], Records=[$1], Path=[$2], Metadata=[$3], Partition=[$4], FileSize=[$5], IcebergMetadata=[$6], fileschema=[$7], PartitionData=[$8]) : rowType = RecordType(VARCHAR(65536) Fragment, BIGINT Records, VARCHAR(65536) Path, VARBINARY(65536) Metadata, INTEGER Partition, BIGINT FileSize, VARBINARY(65536) IcebergMetadata, VARBINARY(65536) fileschema, VARBINARY(65536) ARRAY PartitionData): rowcount = 1.0, cumulative cost = {17158.0 rows, 17156.00011 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 415714
00-02        WriterCommitter(final=[/opt/dremio/data/pdfs/results/1d5f99df-8dac-05ba-4d82-a9d3f85f3d00]) : rowType = RecordType(VARCHAR(65536) Fragment, BIGINT Records, VARCHAR(65536) Path, VARBINARY(65536) Metadata, INTEGER Partition, BIGINT FileSize, VARBINARY(65536) IcebergMetadata, VARBINARY(65536) fileschema, VARBINARY(65536) ARRAY PartitionData): rowcount = 1.0, cumulative cost = {17157.0 rows, 17156.00002 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 415713
00-03          Writer : rowType = RecordType(VARCHAR(65536) Fragment, BIGINT Records, VARCHAR(65536) Path, VARBINARY(65536) Metadata, INTEGER Partition, BIGINT FileSize, VARBINARY(65536) IcebergMetadata, VARBINARY(65536) fileschema, VARBINARY(65536) ARRAY PartitionData): rowcount = 1.0, cumulative cost = {17156.0 rows, 17155.00002 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 415712
00-04            Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {17155.0 rows, 17154.00002 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 415711
00-05              Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {17154.0 rows, 17154.00001 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 415710
00-06                ElasticScan(table=[[elastic, recordings_catalog_prod, _doc]], resource=[recordings_catalog_prod/_doc], columns=[[`*`]], pushdown
 =[{
  "size" : 0,
  "query" : {
    "match_all" : {
      "boost" : 1.0
    }
  }
}]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {17153.0 rows, 17154.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 415652

And the returned value is 10,000, less than the actual value. The plan details hint that Dremio actually has the right value internally, but it always returns 10,000. Also, at least to my understanding, the count is not even pushed down, so I guess 10k is the default max number of records Dremio gets form ES.

This seems to require modifying the configuration of the ES database.

Thanks @bigfacewo , but I think the real solution is to push the count operation down to ES, isn’t it?

@bigfacewo

Can we have the profile please? also what happens if you run the below pushdown directly on elastic?

[{
  "size" : 0,
  "query" : {
    "match_all" : {
      "boost" : 1.0
    }
  }
}]

Hello,

do you find out the root cause and solution ? I have the same issue when running count(*) with ES.

@hieuiph Are you able you able to attach the job profile?

4047a007-877b-436f-82d3-6bde69e74f66.zip (11,7 Ko)
Hello ,

I attach the profile. We use dremio 20.4

thanks

@hieuiph

Few questions

  • Dremio is scanning one record from ES, what is the right count?
  • If you do select * instead of count, does that return the right value? are you able to send the result?
  • What version of ES are you using?
  • What happens if you run the below directly on ES? How many records are returned
[{
  "size" : 0,
  "query" : {
    "match_all" : {
      "boost" : 1.0
    }
  }
}]