Improving reflection performance

Hi,

What parameters are available to fiddle with and improve reflection performance? Is there any concurrency and partition control to achieve that?

I am running a single node dremio server on a c5.x ec2. Was able to cut down latency on a big dataset(15 GB) from 25 to 8 secs on representative queries by changing query queue and concurrent reflection control from the admin panel.

Any other suggestions?

What is the data source and what is the performance without reflections?

Can you share your query profile?

Where are you storing your reflections? EBS?

select "accounts"."STATE" AS accountsSTATE, MAX( "measures"."BALANCE") AS measuresBALANCE 

from measures as measures, accounts as accounts 

where 1=1 AND "measures"."ACCT_NO"="accounts"."ACCT_NO" AND "measures"."PERIOD_END" >= '1921-01-01' AND "measures"."PERIOD_END" < '2018-03-20' 

group by "accounts"."STATE" 


order by measuresBALANCE desc
limit 10 

00-00 Screen : rowType = RecordType(VARCHAR(65536) accountsSTATE, DOUBLE measuresBALANCE): rowcount = 10.0, cumulative cost = {6.258067439200002E7 rows, 4.73785018861381E8 cpu, 3.0017523200000003E7 io, 6.453442842111998E9 network, 1.8503958528000003E8 memory}, id = 61902
00-01 ComplexToJson : rowType = RecordType(VARCHAR(65536) accountsSTATE, DOUBLE measuresBALANCE): rowcount = 10.0, cumulative cost = {6.258067339200002E7 rows, 4.73785017861381E8 cpu, 3.0017523200000003E7 io, 6.453442842111998E9 network, 1.8503958528000003E8 memory}, id = 61901
00-02 Project(accountsSTATE=[$0], measuresBALANCE=[$1]) : rowType = RecordType(VARCHAR(65536) accountsSTATE, DOUBLE measuresBALANCE): rowcount = 10.0, cumulative cost = {6.258066339200002E7 rows, 4.73785007861381E8 cpu, 3.0017523200000003E7 io, 6.453442842111998E9 network, 1.8503958528000003E8 memory}, id = 61900
00-03 Limit(fetch=[10]) : rowType = RecordType(VARCHAR(65536) STATE, DOUBLE measuresBALANCE): rowcount = 10.0, cumulative cost = {6.258065339200002E7 rows, 4.73784987861381E8 cpu, 3.0017523200000003E7 io, 6.453442842111998E9 network, 1.8503958528000003E8 memory}, id = 61899
00-04 TopN(limit=[10]) : rowType = RecordType(VARCHAR(65536) STATE, DOUBLE measuresBALANCE): rowcount = 25001.37599999999, cumulative cost = {6.258064339200002E7 rows, 4.73784947861381E8 cpu, 3.0017523200000003E7 io, 6.453442842111998E9 network, 1.8503958528000003E8 memory}, id = 61898
00-05 RoundRobinExchange : rowType = RecordType(VARCHAR(65536) STATE, DOUBLE measuresBALANCE): rowcount = 25001.37599999999, cumulative cost = {6.255564201600002E7 rows, 4.73452736768E8 cpu, 3.0017523200000003E7 io, 6.453442842111998E9 network, 1.8503958528000003E8 memory}, id = 61897
01-01 HashAgg(group=[{0}], measuresBALANCE=[MAX($1)]) : rowType = RecordType(VARCHAR(65536) STATE, DOUBLE measuresBALANCE): rowcount = 25001.37599999999, cumulative cost = {6.2530640640000015E7 rows, 4.7325272576E8 cpu, 3.0017523200000003E7 io, 6.248631569919998E9 network, 1.8503958528000003E8 memory}, id = 61896
01-02 Project(STATE=[$0], measuresBALANCE=[$1]) : rowType = RecordType(VARCHAR(65536) STATE, DOUBLE measuresBALANCE): rowcount = 500027.5199999999, cumulative cost = {6.203061312000001E7 rows, 4.6325217536E8 cpu, 3.0017523200000003E7 io, 6.248631569919998E9 network, 1.7623910092800003E8 memory}, id = 61895
01-03 HashToRandomExchange(dist0=[[$0]]) : rowType = RecordType(VARCHAR(65536) STATE, DOUBLE measuresBALANCE, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 500027.5199999999, cumulative cost = {6.153058560000001E7 rows, 4.6225212032E8 cpu, 3.0017523200000003E7 io, 6.248631569919998E9 network, 1.7623910092800003E8 memory}, id = 61894
02-01 UnorderedMuxExchange : rowType = RecordType(VARCHAR(65536) STATE, DOUBLE measuresBALANCE, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 500027.5199999999, cumulative cost = {6.1030558080000006E7 rows, 4.5425168E8 cpu, 3.0017523200000003E7 io, 1.0429340416E8 network, 1.7623910092800003E8 memory}, id = 61893
03-01 Project(STATE=[$0], measuresBALANCE=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32($0)]) : rowType = RecordType(VARCHAR(65536) STATE, DOUBLE measuresBALANCE, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 500027.5199999999, cumulative cost = {6.053053056E7 rows, 4.5375165248E8 cpu, 3.0017523200000003E7 io, 1.0429340416E8 network, 1.7623910092800003E8 memory}, id = 61892
03-02 HashAgg(group=[{0}], measuresBALANCE=[MAX($1)]) : rowType = RecordType(VARCHAR(65536) STATE, DOUBLE measuresBALANCE): rowcount = 500027.5199999999, cumulative cost = {6.003050304E7 rows, 4.5025145984000003E8 cpu, 3.0017523200000003E7 io, 1.0429340416E8 network, 1.7623910092800003E8 memory}, id = 61891
03-03 Project(STATE=[$3], measuresBALANCE=[$1]) : rowType = RecordType(VARCHAR(65536) STATE, DOUBLE measuresBALANCE): rowcount = 1.00005504E7, cumulative cost = {5.002995264E7 rows, 2.5024045184000003E8 cpu, 3.0017523200000003E7 io, 1.0429340416E8 network, 229413.888 memory}, id = 61890
03-04 HashJoin(condition=[=($0, $2)], joinType=[inner]) : rowType = RecordType(BIGINT ACCT_NO, DOUBLE measuresBALANCE, BIGINT ACCT_NO0, VARCHAR(65536) STATE): rowcount = 1.00005504E7, cumulative cost = {4.002940224E7 rows, 2.3023935104000002E8 cpu, 3.0017523200000003E7 io, 1.0429340416E8 network, 229413.888 memory}, id = 61889
03-06 Project(ACCT_NO=[$0], measuresBALANCE=[$1]) : rowType = RecordType(BIGINT ACCT_NO, DOUBLE measuresBALANCE): rowcount = 1.00005504E7, cumulative cost = {3.0001651200000003E7 rows, 7.00038528E7 cpu, 3.0001651200000003E7 io, 3.0001651200000003E7 network, 0.0 memory}, id = 61883
03-08 Project(ACCT_NO=[$1], BALANCE=[$2]) : rowType = RecordType(BIGINT ACCT_NO, DOUBLE BALANCE): rowcount = 1.00005504E7, cumulative cost = {2.00011008E7 rows, 5.0002752E7 cpu, 3.0001651200000003E7 io, 3.0001651200000003E7 network, 0.0 memory}, id = 61882
03-09 ParquetScan(table=["__accelerator".“258e9e3e-ee39-4dd3-959c-55f27b98cf46”.“f6f2dbb1-93ef-407f-8997-96cd7a4779ee”], columns=[PERIOD_END, ACCT_NO, BALANCE], splits=[2], filters=[[Filter on PERIOD_END: booleanAnd(greater_than_or_equal_to(PERIOD_END, cast( -1546300800000 as TIMESTAMP)) , less_than(PERIOD_END, cast( 1521504000000 as TIMESTAMP)) ) ]]) : rowType = RecordType(TIMESTAMP(0) PERIOD_END, BIGINT ACCT_NO, DOUBLE BALANCE): rowcount = 1.00005504E7, cumulative cost = {1.00005504E7 rows, 3.0001651200000003E7 cpu, 3.0001651200000003E7 io, 3.0001651200000003E7 network, 0.0 memory}, id = 61881
03-05 Project(ACCT_NO0=[$0], STATE=[$1]) : rowType = RecordType(BIGINT ACCT_NO0, VARCHAR(65536) STATE): rowcount = 1130.8799999999997, cumulative cost = {26069.760000000002 rows, 217644.80000000002 cpu, 15872.0 io, 7.429175296E7 network, 209510.40000000002 memory}, id = 61888
03-07 BroadcastExchange : rowType = RecordType(BIGINT ACCT_NO, VARCHAR(65536) STATE): rowcount = 1130.8799999999997, cumulative cost = {24938.88 rows, 215383.04 cpu, 15872.0 io, 7.429175296E7 network, 209510.40000000002 memory}, id = 61887
04-01 HashAgg(group=[{0, 1}]) : rowType = RecordType(BIGINT ACCT_NO, VARCHAR(65536) STATE): rowcount = 1130.8799999999997, cumulative cost = {23808.0 rows, 206336.0 cpu, 15872.0 io, 6.5027584E7 network, 209510.40000000002 memory}, id = 61886
04-02 UnionExchange : rowType = RecordType(BIGINT ACCT_NO, VARCHAR(65536) STATE): rowcount = 7936.0, cumulative cost = {15872.0 rows, 79360.0 cpu, 15872.0 io, 6.5027584E7 network, 0.0 memory}, id = 61885
05-01 ParquetScan(table=["__accelerator".“dd9d92c6-78de-4cbf-89e6-4d84001beb47”.“1482bfce-1dad-4a1b-b9bd-d8c16075fbdf”], columns=[ACCT_NO, STATE], splits=[1]) : rowType = RecordType(BIGINT ACCT_NO, VARCHAR(65536) STATE): rowcount = 7936.0, cumulative cost = {7936.0 rows, 15872.0 cpu, 15872.0 io, 15872.0 network, 0.0 memory}, id = 61884

Are you using a raw reflections or an aggregation reflection? For this query an aggregation reflection would be appropriate.

One of my colleagues may comment on your query plan.

I was trying to create aggregation reflection for that dataset. But somehow the server dies when I define it.

What is the error? You can go to jobs and filter on acceleration.

Nothing in the job logs. Looks like I’ll have to enable detailed debug logging to see what is really happening/