SOURCE_BAD_STATE ERROR: The source ["__metadata"] is currently unavailable

VahagnBleyan · April 29, 2022, 10:32am

Hi guys, on our AWS Enterprise Edition I’m getting this kind of error in server.log when trying to refresh metadata on an S3 resource. It keeps spamming server.log, and in the UI I see the error "Unable to refresh dataset." when running ALTER TABLE "cbs-etl-parquet-dremio" REFRESH METADATA FORCE UPDATE.

Dremio version is 21.0.0-202204051402590848-b4cdafb2

2022-04-29 10:20:01,117 [Fabric-RPC-Offload19] INFO  c.d.s.s.LocalSchedulerService - Cancelling task com.dremio.resource.wlm.scheduler.DremioWLMAllocator$QueryRunTimeWatcher@694b2835
2022-04-29 10:20:01,193 [Fabric-RPC-Offload23] WARN  c.d.exec.work.foreman.AttemptManager - Dropping request to move to COMPLETED state as query 1d94442f-7c88-1a38-0369-b8745cf53600 is already at FAILED state (which is terminal).
2022-04-29 10:20:01,194 [metadata-refresh-modifiable-scheduler-23:JobId{id=1d94442f-7c88-1a38-0369-b8745cf53600, name=null, sessionId=null}] INFO  c.d.service.jobs.LocalJobsService - Submitted job (JobID JobId{id=1d94442f-7c88-1a38-0369-b8745cf53600, name=null, sessionId=null}) has failed
2022-04-29 10:20:01,194 [out-of-band-observer] INFO  query.logger - Query: 1d94442f-7c88-1a38-0369-b8745cf53600; outcome: FAILED
2022-04-29 10:20:01,207 [metadata-refresh-modifiable-scheduler-23:JobId{id=1d94442f-7c88-1a38-0369-b8745cf53600, name=null, sessionId=null}] INFO  c.d.service.jobs.LocalJobsService - Submitted job (JobID JobId{id=1d94442f-7c88-1a38-0369-b8745cf53600, name=null, sessionId=null}) has failed
2022-04-29 10:20:01,750 [metadata-refresh-modifiable-scheduler-23] INFO  c.d.service.jobs.LocalJobsService - The SQL query REFRESH DATASET "dremio-campaigns"."51189"."1573"."12-03-2022"."vins_for_targetset"."1648752545" will be submitted on the same thread
2022-04-29 10:20:01,753 [1d94442d-a834-1713-2b11-a4b5c0005200/0:foreman-planning] INFO  c.d.e.p.s.h.RefreshDatasetHandler - Initialised com.dremio.exec.planner.sql.handlers.RefreshDatasetHandler
2022-04-29 10:20:01,753 [1d94442d-a834-1713-2b11-a4b5c0005200/0:foreman-planning] INFO  c.d.e.p.s.h.r.UnlimitedSplitsMetadataProvider - Table metadata found for "dremio-campaigns"."51189"."1573"."12-03-2022".vins_for_targetset."1648752545", at s3://dremio-me-704b4030-c493-438a-b4c5-ffa816f05ae1-adef02cd770ee811/dremio/metadata/1a1d3f25-34e2-4392-9d48-9db2eef6be98/metadata/00000-1a45e952-fea3-49d7-8e78-3a9ff3751afc.metadata.json
2022-04-29 10:20:02,084 [1d94442d-a834-1713-2b11-a4b5c0005200/0:foreman-planning] INFO  c.d.e.p.s.h.r.AbstractRefreshPlanBuilder - Writing metadata for "dremio-campaigns"."51189"."1573"."12-03-2022".vins_for_targetset."1648752545" at /dremio-me-704b4030-c493-438a-b4c5-ffa816f05ae1-adef02cd770ee811/dremio/metadata/1a1d3f25-34e2-4392-9d48-9db2eef6be98
2022-04-29 10:20:02,158 [metadata-refresh-modifiable-scheduler-23] INFO  c.d.service.jobs.LocalJobsService - New job submitted. Job Id: JobId{id=1d94442d-a834-1713-2b11-a4b5c0005200, name=null, sessionId=null} - Type: METADATA_REFRESH - Query: REFRESH DATASET "dremio-campaigns"."51189"."1573"."12-03-2022"."vins_for_targetset"."1648752545"
2022-04-29 10:20:02,160 [Fabric-RPC-Offload23] INFO  c.d.exec.maestro.FragmentTracker - Fragment 1d94442d-a834-1713-2b11-a4b5c0005200:0:0 failed, cancelling remaining fragments.
2022-04-29 10:20:02,161 [Fabric-RPC-Offload22] INFO  c.d.exec.maestro.FragmentTracker - Fragment 1d94442d-a834-1713-2b11-a4b5c0005200:1:22 failed, cancelling remaining fragments.
2022-04-29 10:20:02,162 [Fabric-RPC-Offload20] INFO  c.d.exec.work.foreman.AttemptManager - 1d94442d-a834-1713-2b11-a4b5c0005200: State change requested RUNNING --> FAILED, Exception com.dremio.common.exceptions.UserRemoteException: SOURCE_BAD_STATE ERROR: The source ["__metadata"] is currently unavailable. Info: []

Can you help me to fix this issue?

Thanks
Vahagn

balaji.ramaswamy · May 2, 2022, 6:09am

@VahagnBleyan Are you seeing a lot of “SOURCE_BAD_STATE ERROR: The source ["__metadata"] is currently unavailable” errors? If that is the case, the storage location that Dremio reads metadata from and writes it to is unavailable. Where is Dremio configured to write metadata? Are you able to write files to the same location using another method?
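One way to try the "write files using another method" check from each Dremio node is a small write probe against the metadata bucket. This is only a minimal sketch, assuming boto3 is installed and AWS credentials are available on the node. The function name `can_write`, the `dremio/connectivity-probe` prefix, and the bucket name in the usage comment are hypothetical placeholders, not values from this thread.

```python
import uuid


def can_write(s3_client, bucket, prefix="dremio/connectivity-probe"):
    """Write and then delete a tiny object; return (ok, detail).

    s3_client is expected to be a boto3 S3 client (or anything exposing
    put_object/delete_object with the same keyword arguments).
    """
    key = f"{prefix}/{uuid.uuid4()}.txt"
    try:
        s3_client.put_object(Bucket=bucket, Key=key, Body=b"probe")
        s3_client.delete_object(Bucket=bucket, Key=key)
    except Exception as exc:
        # Network timeouts and access-denied errors both end up here.
        return False, f"{type(exc).__name__}: {exc}"
    return True, key


# Usage on a node (assumes boto3 is installed; bucket name is a placeholder):
#   import boto3
#   from botocore.config import Config
#   cfg = Config(connect_timeout=5, read_timeout=5, retries={"max_attempts": 1})
#   print(can_write(boto3.client("s3", config=cfg), "your-metadata-bucket"))
```

Running it on the coordinator and on an executor separately can show whether only some nodes lack a route to S3. The short timeouts in the usage comment are there so an unreachable endpoint fails quickly instead of hanging.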

VahagnBleyan · May 2, 2022, 8:27am

Yes, I see a lot of those messages. Storage is configured on an AWS S3 bucket, and it’s accessible from the Dremio instance.
While writing this I realized that I had disabled public IPs on the engine nodes, and the errors must be because of that. Enabling public IPs solved the problem, thanks to your reply :)
Is there any way to start elastic engine instances in a different subnet than the coordinator?
Also, it would be great to have clearer error descriptions that make it obvious there is a connection error, for example a timeout or unreachable error.

Thanks
Vahagn

balaji.ramaswamy · May 4, 2022, 10:04pm

@VahagnBleyan A different subnet should be OK as long as the nodes can communicate on the inter-node communication port, though latency might be an issue.

VahagnBleyan · May 6, 2022, 7:21am

And how can I configure that?

balaji.ramaswamy · May 9, 2022, 10:40pm

@VahagnBleyan Are you able to ping and telnet back and forth between the coordinator and the executor?
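If telnet is not installed on the nodes, the same reachability test can be approximated with a few lines of Python. This is a rough sketch, not an official Dremio tool: `port_open` is a hypothetical helper, the host name in the usage comment is a placeholder, and port 45678 is assumed to be the inter-node port (it is the usual default, but check your own configuration).

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Usage, run once from the coordinator and once from an executor:
#   print(port_open("executor-host.example", 45678))  # assumed inter-node port
```

A `False` result in only one direction usually points at a security group or network ACL rule rather than at Dremio itself.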
