Failed to update materialization cache. Stripped hash doesn't match expect stripped hash

Hi Everyone.

We have run into a strange issue: when a reflection is refreshed, Dremio automatically drops it after 4 hours.
We saw an exception in the server log. The stack trace is below:

2022-04-11 10:10:02,689 [dremio-general-4395] WARN c.d.s.reflection.ReflectionManager - failed to update materialization cache for 3fe013b7-70f1-4143-a7c5-10d63ab54798/7318b383-151e-4757-b5fd-1a904c9ac722
com.dremio.exec.planner.acceleration.MaterializationExpander$ExpansionException: Stripped hash doesn't match expect stripped hash. Stripped logic likely changed. Non-matching plan: LogicalAggregate(group=[{0, 1, 2, 3, 4, 5, 6, 7, 8}], CONVERT_COUNT_STAR=[COUNT($9)])
LogicalProject(event_name=[$22], app_version=[$33], reg_vid=[$45], start_engine_id=[$68], trigger_cause=[$76], current_country=[$97], current_state=[$98], current_city=[$99], slogtime_datemonth=[DATE_TRUNC('MONTH', $111)], $f9=[1])
ScanCrel(table=[GlueCatalog."kipawa-recent-v8".ford_sync], columns=[action, area_type, autosuggest_id, autosuggest_iid, avoid_carpool_lanes, avoid_country_borders, avoid_ferries, avoid_highway, avoid_tolls, avoid_tunnels, avoid_unpaved_roads, category, caused_by, coupon_detail_id, dest_type, display_screen, display, distance_remaining, ds_id, ds_version, duration, end_time, event_name, gecoding_source, impression_limit, is_resumed, is_sponsored, data_size, layer_type, layers_status, steps, layers_time_cost, update_type, app_version, car_id, current_lat, current_lon, device_model, log_id, log_version, map_matched_lat, map_matched_lon, raw_gps_lat, raw_gps_lon, raw_gps_timestamp, reg_vid, session_id, session_timer, time_zone, utc_timestamp, visitor_id, mode, nav_id, origin_lat, origin_lon, parent_log_id, parent_route_id, parent_search_id, predictive_nav_setting, recognized_command, request_id, request_time, response_time, retry_count, schema_definition, search_id, share_eta, source, start_engine_id, start_time, status, time_cost, token, traffic_flow_vendor, traffic_incident_vendor, transaction_id, trigger_cause, trigger, download_usage, subsystem, upload_usage, vendor, log_type, user_id, altitude, heading_angle, timestamp, payload_count, payload_type, origin_country, origin_state, origin_city, dest_country, dest_state, dest_city, current_country, current_state, current_city, payload-trip_score, current_gps-longitude, slogtime_day, current_geohash_4, current_geohash_3, payload-start_speed-timestamp, logshed_app_id, logshed_api_key, payload-end_speed-timestamp, slogtime_datetime, slogtime, slogtime_month, current_gps-latitude, current_geohash_5, payload-level, slogtime_year, dest_gps-longitude, payload-log_context-client_name, payload-end_speed-value, payload-start_speed-value, origin_gps-latitude, client_address, dest_gps-latitude, slogtime_hour, slogtime_date, origin_gps-longitude, slogtime_datehour, current_geohash_2, payload-log_context-trip_id, payload-trip_id,
payload-log_context-client_version, incident_count, connection_type, app_id, device_id, interaction_method, impression_id, route_id, entity_id, term, category_id, card_id, dest_lat, dest_lon, destination_id, distance, eta, label, position_double, position_string, score, traffic, type, lat, lon, speed, edge_id, horizontal_accuracy, vertical_accuracy, actual_speed, assumed_speed, mercator_coord_x, mercator_coord_y, traffic_speed, features, order_id, purchase_time, payload-car_id, payload-purchase_id, region, matched_road_lon, raw_dr_lon, raw_dr_ehpe, matched_road_lat, raw_dr_lat, sort_type, log_year, log_month, log_day], splits=[21089])
.
at com.dremio.exec.planner.acceleration.MaterializationExpander.expand(MaterializationExpander.java:84)
at com.dremio.exec.planner.acceleration.MaterializationDescriptor.getMaterializationFor(MaterializationDescriptor.java:160)
at com.dremio.service.reflection.ReflectionServiceImpl$CacheHelperImpl.expand(ReflectionServiceImpl.java:1136)
at com.dremio.service.reflection.ReflectionServiceImpl$CacheHelperImpl.expand(ReflectionServiceImpl.java:1117)
at com.dremio.service.reflection.MaterializationCache.updateEntry(MaterializationCache.java:222)
at com.dremio.service.reflection.MaterializationCache.update(MaterializationCache.java:269)
at com.dremio.service.reflection.ReflectionServiceImpl$DescriptorCacheImpl.update(ReflectionServiceImpl.java:1223)
at com.dremio.service.reflection.ReflectionManager.metadataRefreshJobSucceeded(ReflectionManager.java:972)
at com.dremio.service.reflection.ReflectionManager.handleSuccessfulJob(ReflectionManager.java:783)
at com.dremio.service.reflection.ReflectionManager.handleRefreshingEntry(ReflectionManager.java:464)
at com.dremio.service.reflection.ReflectionManager.handleEntry(ReflectionManager.java:388)
at com.dremio.service.reflection.ReflectionManager.handleEntries(ReflectionManager.java:341)
at com.dremio.service.reflection.ReflectionManager.sync(ReflectionManager.java:223)
at com.dremio.service.reflection.ReflectionManager.run(ReflectionManager.java:202)
at com.dremio.common.WakeupHandler$1.run(WakeupHandler.java:67)
at com.dremio.context.RequestContext.run(RequestContext.java:95)
at com.dremio.common.concurrent.ContextMigratingExecutorService.lambda$decorate$3(ContextMigratingExecutorService.java:199)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Is anyone else having similar issues?

Dremio version:
Build: 20.0.0-202201050826310141-8cc7162b
Edition: AWS Edition (activated)

Thank you.

@Switch Is your reflection still valid? What happens if you refresh the reflection: does the error go away, or do you still see it?

It’s valid now.
This issue does not happen every time.
Sometimes the reflection refreshes successfully, sometimes it does not.

@Switch Did you create the reflection using SQL, the API, or the UI?

The reflections were all created in the UI.
I found that these reflections use incremental refresh. However, data is sometimes deleted from the underlying PDSs. Could that be a possible cause of this issue?
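
To illustrate the concern, here is a toy model in Python. It is not Dremio's implementation, and all names in it (`incremental_refresh`, the `id` field) are made up for the example. It only shows the general idea that an append-only incremental refresh picks up new rows but never removes rows that were deleted from the source. It does not show that this is what causes the hash error above.

```python
def incremental_refresh(materialized, source, last_seen_id):
    """Append rows whose id is greater than last_seen_id; never remove rows."""
    new_rows = [row for row in source if row["id"] > last_seen_id]
    materialized.extend(new_rows)
    # Remember the highest id seen so the next refresh only reads newer rows.
    return max((row["id"] for row in new_rows), default=last_seen_id)

# First refresh: the materialized copy matches the source.
source = [{"id": 1, "event": "a"}, {"id": 2, "event": "b"}]
materialized = []
last_seen = incremental_refresh(materialized, source, 0)

# A row is deleted from the source and a new one is appended.
source = [row for row in source if row["id"] != 1]
source.append({"id": 3, "event": "c"})
last_seen = incremental_refresh(materialized, source, last_seen)

print(sorted(row["id"] for row in source))        # [2, 3]
print(sorted(row["id"] for row in materialized))  # [1, 2, 3]
```

In this toy model the deleted row (id 1) stays in the materialized copy, because the refresh only ever appends.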

@Switch Deletes are not supported with incremental refresh, so this could be related. Are you able to see when the failure occurs? Was there a delete before that refresh?
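
One way to line up the failures with deletes is to pull the timestamps of the warning out of the server log. The sketch below is a minimal Python example that assumes every warning line has the same layout as the single WARN line quoted in this thread, and that the two IDs separated by `/` are the reflection ID and the materialization ID (an assumption, not something confirmed here). The function name `find_cache_failures` is made up for the example.

```python
import re
from datetime import datetime

# Pattern for the WARN line shown in this thread. The log layout is assumed
# to match that one example; adjust it if your log format differs.
WARN_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} .*"
    r"failed to update materialization cache for "
    r"(?P<first_id>[0-9a-f-]{36})/(?P<second_id>[0-9a-f-]{36})"
)

def find_cache_failures(lines):
    """Return a (timestamp, first_id, second_id) tuple per matching line."""
    failures = []
    for line in lines:
        match = WARN_RE.match(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
            # first_id/second_id are assumed to be reflection/materialization IDs.
            failures.append((ts, match.group("first_id"), match.group("second_id")))
    return failures

sample = [
    "2022-04-11 10:10:02,689 [dremio-general-4395] WARN c.d.s.reflection.ReflectionManager - "
    "failed to update materialization cache for "
    "3fe013b7-70f1-4143-a7c5-10d63ab54798/7318b383-151e-4757-b5fd-1a904c9ac722",
    "at com.dremio.exec.planner.acceleration.MaterializationExpander.expand(MaterializationExpander.java:84)",
]

for ts, first_id, second_id in find_cache_failures(sample):
    print(ts.isoformat(), first_id, second_id)
```

To run it against a real log, pass an open file object instead of `sample`. The resulting timestamps can then be compared with the times at which data was deleted from the underlying PDSs.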