Hi,
I’m using Dremio build no. 20.1.0-202202061055110045-36733c65, community edition.
ES version 7.10.2.
A simple count(*) query translates to the following physical plan:
00-00 Screen : rowType = RecordType(VARCHAR(65536) Fragment, BIGINT Records, VARCHAR(65536) Path, VARBINARY(65536) Metadata, INTEGER Partition, BIGINT FileSize, VARBINARY(65536) IcebergMetadata, VARBINARY(65536) fileschema, VARBINARY(65536) ARRAY PartitionData): rowcount = 1.0, cumulative cost = {17158.1 rows, 17156.10011 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 415715
00-01 Project(Fragment=[$0], Records=[$1], Path=[$2], Metadata=[$3], Partition=[$4], FileSize=[$5], IcebergMetadata=[$6], fileschema=[$7], PartitionData=[$8]) : rowType = RecordType(VARCHAR(65536) Fragment, BIGINT Records, VARCHAR(65536) Path, VARBINARY(65536) Metadata, INTEGER Partition, BIGINT FileSize, VARBINARY(65536) IcebergMetadata, VARBINARY(65536) fileschema, VARBINARY(65536) ARRAY PartitionData): rowcount = 1.0, cumulative cost = {17158.0 rows, 17156.00011 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 415714
00-02 WriterCommitter(final=[/opt/dremio/data/pdfs/results/1d5f99df-8dac-05ba-4d82-a9d3f85f3d00]) : rowType = RecordType(VARCHAR(65536) Fragment, BIGINT Records, VARCHAR(65536) Path, VARBINARY(65536) Metadata, INTEGER Partition, BIGINT FileSize, VARBINARY(65536) IcebergMetadata, VARBINARY(65536) fileschema, VARBINARY(65536) ARRAY PartitionData): rowcount = 1.0, cumulative cost = {17157.0 rows, 17156.00002 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 415713
00-03 Writer : rowType = RecordType(VARCHAR(65536) Fragment, BIGINT Records, VARCHAR(65536) Path, VARBINARY(65536) Metadata, INTEGER Partition, BIGINT FileSize, VARBINARY(65536) IcebergMetadata, VARBINARY(65536) fileschema, VARBINARY(65536) ARRAY PartitionData): rowcount = 1.0, cumulative cost = {17156.0 rows, 17155.00002 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 415712
00-04 Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {17155.0 rows, 17154.00002 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 415711
00-05 Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {17154.0 rows, 17154.00001 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 415710
00-06 ElasticScan(table=[[elastic, recordings_catalog_prod, _doc]], resource=[recordings_catalog_prod/_doc], columns=[[`*`]], pushdown
=[{
"size" : 0,
"query" : {
"match_all" : {
"boost" : 1.0
}
}
}]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {17153.0 rows, 17154.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 415652
And the returned value is 10,000, less than the actual value. The plan details hint that Dremio actually has the right value internally, but it always returns 10,000. Also, at least to my understanding, the count is not even pushed down, so I guess 10k is the default max number of records Dremio gets form ES.