Dremio returns wrong value for a simple count(*) on Elasticsearch

Hi,
I’m using Dremio build no. 20.1.0-202202061055110045-36733c65, community edition.
ES version 7.10.2.
A simple count(*) query translates to the following physical plan:

00-00    Screen : rowType = RecordType(VARCHAR(65536) Fragment, BIGINT Records, VARCHAR(65536) Path, VARBINARY(65536) Metadata, INTEGER Partition, BIGINT FileSize, VARBINARY(65536) IcebergMetadata, VARBINARY(65536) fileschema, VARBINARY(65536) ARRAY PartitionData): rowcount = 1.0, cumulative cost = {17158.1 rows, 17156.10011 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 415715
00-01      Project(Fragment=[$0], Records=[$1], Path=[$2], Metadata=[$3], Partition=[$4], FileSize=[$5], IcebergMetadata=[$6], fileschema=[$7], PartitionData=[$8]) : rowType = RecordType(VARCHAR(65536) Fragment, BIGINT Records, VARCHAR(65536) Path, VARBINARY(65536) Metadata, INTEGER Partition, BIGINT FileSize, VARBINARY(65536) IcebergMetadata, VARBINARY(65536) fileschema, VARBINARY(65536) ARRAY PartitionData): rowcount = 1.0, cumulative cost = {17158.0 rows, 17156.00011 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 415714
00-02        WriterCommitter(final=[/opt/dremio/data/pdfs/results/1d5f99df-8dac-05ba-4d82-a9d3f85f3d00]) : rowType = RecordType(VARCHAR(65536) Fragment, BIGINT Records, VARCHAR(65536) Path, VARBINARY(65536) Metadata, INTEGER Partition, BIGINT FileSize, VARBINARY(65536) IcebergMetadata, VARBINARY(65536) fileschema, VARBINARY(65536) ARRAY PartitionData): rowcount = 1.0, cumulative cost = {17157.0 rows, 17156.00002 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 415713
00-03          Writer : rowType = RecordType(VARCHAR(65536) Fragment, BIGINT Records, VARCHAR(65536) Path, VARBINARY(65536) Metadata, INTEGER Partition, BIGINT FileSize, VARBINARY(65536) IcebergMetadata, VARBINARY(65536) fileschema, VARBINARY(65536) ARRAY PartitionData): rowcount = 1.0, cumulative cost = {17156.0 rows, 17155.00002 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 415712
00-04            Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {17155.0 rows, 17154.00002 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 415711
00-05              Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {17154.0 rows, 17154.00001 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 415710
00-06                ElasticScan(table=[[elastic, recordings_catalog_prod, _doc]], resource=[recordings_catalog_prod/_doc], columns=[[`*`]], pushdown=[{
  "size" : 0,
  "query" : {
    "match_all" : {
      "boost" : 1.0
    }
  }
}]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {17153.0 rows, 17154.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 415652

And the returned value is 10,000, which is less than the actual count. The plan details suggest that Dremio actually has the right value internally, yet it always returns 10,000. Also, as far as I can tell, the count is not even pushed down, so I suspect 10,000 is the default maximum number of records Dremio reads from ES.
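For what it's worth, this lines up with Elasticsearch 7.x behaviour: with "size": 0, the hits.total reported in the response is capped at 10,000 unless track_total_hits is set. A quick sanity check against the index from the plan above (the localhost URL is just a placeholder for your cluster):

curl -s -X POST 'http://localhost:9200/recordings_catalog_prod/_search' \
  -H 'Content-Type: application/json' \
  -d '{
    "size": 0,
    "track_total_hits": true,
    "query": { "match_all": {} }
  }'

Without "track_total_hits": true, ES 7.x answers with "hits": { "total": { "value": 10000, "relation": "gte" } }, which would explain the constant 10,000.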

This seems to require modifying the Elasticsearch configuration.

Thanks @bigfacewo, but I think the real solution is to push the count operation down to ES, isn’t it?
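To illustrate what I mean (this is only what I'd expect an equivalent pushdown to look like, not what Dremio actually generates): Elasticsearch has a dedicated _count API that returns the exact document count and is not subject to the 10,000 cap:

curl -s -X POST 'http://localhost:9200/recordings_catalog_prod/_count' \
  -H 'Content-Type: application/json' \
  -d '{ "query": { "match_all": {} } }'

The "count" field in the response is exact, so a pushdown along these lines (or a search with "track_total_hits": true) should return the right number.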


Can we have the profile, please? Also, what happens if you run the pushdown below directly against Elasticsearch?

[{
  "size" : 0,
  "query" : {
    "match_all" : {
      "boost" : 1.0
    }
  }
}]
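(For reference, one way to run that pushdown directly, assuming the cluster is reachable at localhost:9200:

curl -s -X POST 'http://localhost:9200/recordings_catalog_prod/_search' \
  -H 'Content-Type: application/json' \
  -d '{ "size": 0, "query": { "match_all": { "boost": 1.0 } } }'

On ES 7.x the part to look at in the response is hits.total, which is capped at 10,000 by default.)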