Physical Datasets disappearing

We are using Dremio 19.0 CE and we frequently find that physical datasets promoted from S3 files disappear from the catalog: a query fails with "Error while expanding view" and we have to promote them again.

Note that we have the "Remove dataset definitions if underlying data is unavailable" option, so we don’t expect them to be automatically forgotten.

What can we do to investigate the issue?

@ma82 Have you unchecked the above option? Under the Dremio log folder there will be a file called metadata_refresh.log, plus older metadata_refresh logs under the log/archive folder with the date as part of the file name. If you look through these logs, do you see Dremio losing connectivity to S3?
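Here is a rough Python sketch of how you could scan those files; the log directory and the search pattern are assumptions on my part, so point them at your installation and at whatever your logs actually contain:

```python
import gzip
import re
from pathlib import Path

# Assumption: default log location; point this at your Dremio log folder.
DREMIO_LOG_DIR = Path("/var/log/dremio")
# Assumption: rough pattern for connectivity trouble; tune it to your log contents.
PATTERN = re.compile(r"S3|connect|timeout|exception", re.IGNORECASE)

def scan(path, opener=open):
    """Print lines from one log file that hint at S3 connectivity problems."""
    with opener(path, "rt", errors="replace") as f:
        for line in f:
            if PATTERN.search(line):
                print(f"{path}: {line.rstrip()}")

# The current log, then the dated archives under log/archive.
scan(DREMIO_LOG_DIR / "metadata_refresh.log")
for archived in sorted((DREMIO_LOG_DIR / "archive").glob("metadata_refresh*")):
    scan(archived, gzip.open if archived.suffix == ".gz" else open)
```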

@balaji.ramaswamy

Thanks for your reply.

Yes, we had the "Remove dataset definitions if underlying data is unavailable" option unchecked; sorry for writing imprecisely.

As a workaround we decided to try enabling the "Automatically format files into physical datasets when users issue queries" option. It seems to work correctly, but we’d prefer not to incur the cost of reformatting datasets that we expect to already be formatted.

I’m looking for metadata_refresh.log, but in our Helm chart-based setup I cannot find it in any of the pods.
Should I look into the Kubernetes logs of these pods instead? If so, can I filter them somehow?

@ma82 You are right, with the Helm chart deployment all logs probably go to standard out (see the sketch after these questions for one way to filter them). A few questions:

  • Does your Dremio coordinator go down unexpectedly?
  • Do you change any settings on your S3 source?
  • Does Dremio lose connection to S3 at any time?
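Since the logs are only on standard out, something like the sketch below could pull and filter them via the official Kubernetes Python client; the namespace, label selector, and keywords are placeholders I made up for illustration, so substitute whatever your Helm release actually applies to the coordinator pods:

```python
from kubernetes import client, config

# Assumptions: namespace, label selector, and keywords are placeholders;
# use the values your Helm release actually applies to the coordinator pods.
NAMESPACE = "dremio"
LABEL_SELECTOR = "app=dremio-coordinator"
KEYWORDS = ("metadata", "refresh", "s3")

config.load_kube_config()  # reads ~/.kube/config; use load_incluster_config() inside a pod
v1 = client.CoreV1Api()

for pod in v1.list_namespaced_pod(NAMESPACE, label_selector=LABEL_SELECTOR).items:
    log = v1.read_namespaced_pod_log(pod.metadata.name, NAMESPACE, timestamps=True)
    for line in log.splitlines():
        if any(keyword in line.lower() for keyword in KEYWORDS):
            print(f"{pod.metadata.name}: {line}")
```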

Hi @balaji.ramaswamy

  • Our Dremio coordinator doesn’t go down unexpectedly. We do sometimes restart it or the executors (e.g. for Kubernetes node updates), and in those cases we do see physical datasets disappearing; however, that doesn’t explain all the sudden losses of catalog entries.
  • We change settings on our S3 source very rarely, so we don’t expect this to be a relevant cause.
  • We can’t see any sign of lost connections to S3; that is not something I would expect to happen frequently from EKS pods inside an AWS VPC, so I don’t think it is a relevant factor.

Please let me know whether there is a particular message I can grep for.

@ma82 Would you be able to check whether a metadata refresh was happening in the background during the restarts? Basically, match the restart timestamps against metadata_refresh.log and see if a refresh was in progress.
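A small correlation script could look like the sketch below; the restart times and the timestamp format are placeholder assumptions, not something Dremio guarantees, so adjust both to your environment:

```python
from datetime import datetime, timedelta

# Assumptions: the restart times are placeholders (take the real ones from
# `kubectl describe pod` or your maintenance records), and the timestamp
# format is a guess at the log layout; adjust both to your environment.
RESTARTS = [datetime(2022, 3, 1, 14, 30)]
WINDOW = timedelta(minutes=10)
TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # parsed from the first 19 characters of a line

def refreshes_near_restarts(log_path):
    """Print metadata refresh log lines that fall within WINDOW of a restart."""
    with open(log_path, errors="replace") as f:
        for line in f:
            try:
                ts = datetime.strptime(line[:19], TS_FORMAT)
            except ValueError:
                continue  # skip lines without a leading timestamp
            if any(abs(ts - restart) <= WINDOW for restart in RESTARTS):
                print(line.rstrip())

refreshes_near_restarts("metadata_refresh.log")
```

Any line it prints is a refresh that overlapped a restart window and is worth a closer look.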