Refreshing metadata

Hi,
I have a distributed distribution of dremio using docker and I have an oracle database in another container. I have configured the connection to check every minute if there are changes to the database as indicated here. Then I go to the database and insert new lines and wait a couple of minutes and I do not see any changes. When a lot of time passes, the changes appear, but not every minute as I want.
I appreciate all help.

Hi @German

You have set metadata refresh via the source at one hour … are you saying you are not seeing new data even after 1 hour? Are there any errors in the server.log?

Thanks
@balaji.ramaswamy

Hi @balaji.ramaswamy

This is my current data refresh configuration. In my previous post I said that I had followed the instructions on that page, which says that the refresh of data is every hour, but I had put it in a minute, I apologize if I had not understood. The minutes pass and until a good while, the changes in the union are not seen when I modify the database.

Thanks.

Hi @German

It could be possible the metadata refresh failed. Are there any errors in the server.log for metadata refresh. Another option is to enable the DEBUG logger for source metadata refresh, add below entry into logback.xml under the $DREMIO_HOME/conf folder , restart coordinator and try again. Review logfile for errors

<logger name= "com.dremio.exec.catalog.SourceMetadataManager" additivity= "false" >

<level value= "debug" />

<appender-ref ref= "text" />

</logger>

Thanks
@balaji.ramaswamy

1 Like

@balaji.ramaswamy

Thank you! That worked perfectly, now refreshes the metadata every minute. I had not published the server.log because in the / var / log / dremio folder that file does not appear, only the server.out file

Hi @German

Adding DEBUG should not fix the issue. If the issue is fixed then kindly remove the debug logger and restart Dremio

Kindly let us know if you have any other question

Thanks
@balaji.ramaswamy

I’m having this issue as well suddenly out of the blue. Noticed this in server log:

2019-12-19 04:43:42,508 [metadata-refresh-test-client-s3] INFO c.d.e.catalog.SourceMetadataManager - Source ‘test-client-s3’ sync run started
2019-12-19 04:43:43,571 [metadata-refresh-test-client-s3] INFO c.d.e.catalog.SourceMetadataManager - Source ‘test-client-s3’ sync run ended. Took 1062 milliseconds
2019-12-19 04:44:33,723 [metadata-refresh-test-client-s3] INFO c.d.e.catalog.SourceMetadataManager - Source ‘test-client-s3’ sync run started
2019-12-19 04:44:46,582 [metadata-refresh-test-client-s3] INFO c.d.e.catalog.SourceMetadataManager - Source ‘test-client-s3’ sync run ended. Took 12859 milliseconds
2019-12-19 04:45:34,109 [metadata-refresh-test-client-s3] INFO c.d.e.catalog.SourceMetadataManager - Source ‘test-client-s3’ sync run started
2019-12-19 04:45:34,485 [metadata-refresh-test-client-s3] INFO c.d.e.catalog.SourceMetadataManager - Source ‘test-client-s3’ sync run ended. Took 375 milliseconds
2019-12-19 04:45:46,582 [metadata-refresh-test-client-s3] INFO c.d.e.catalog.SourceMetadataManager - Source ‘test-client-s3’ sync run started
2019-12-19 04:45:57,637 [metadata-refresh-test-client-s3] INFO c.d.e.catalog.SourceMetadataManager - Source ‘test-client-s3’ sync run ended. Took 11054 milliseconds
2019-12-19 04:46:46,859 [metadata-refresh-test-client-s3] INFO c.d.e.catalog.SourceMetadataManager - Source ‘test-client-s3’ sync run started
2019-12-19 04:46:47,211 [metadata-refresh-test-client-s3] INFO c.d.e.catalog.SourceMetadataManager - Source ‘test-client-s3’ sync run ended. Took 352 milliseconds
2019-12-19 04:46:57,637 [metadata-refresh-test-client-s3] INFO c.d.e.catalog.SourceMetadataManager - Source ‘test-client-s3’ sync run started
2019-12-19 04:48:11,446 [metadata-refresh-test-client-s3] INFO c.d.e.catalog.SourceMetadataManager - Source ‘test-client-s3’ sync run ended. Took 73809 milliseconds
2019-12-19 05:27:23,374 [metadata-refresh-__logs] INFO c.d.e.catalog.SourceMetadataManager - Source ‘__logs’ sync run started
2019-12-19 05:27:23,374 [metadata-refresh-__logs] INFO c.d.e.catalog.SourceMetadataManager - Source ‘__logs’ sync run ended. Took 0 milliseconds
2019-12-19 06:27:23,374 [metadata-refresh-__logs] INFO c.d.e.catalog.SourceMetadataManager - Source ‘__logs’ sync run started
2019-12-19 06:27:23,374 [metadata-refresh-__logs] INFO c.d.e.catalog.SourceMetadataManager - Source ‘__logs’ sync run ended. Took 0 milliseconds
2019-12-19 07:27:23,374 [metadata-refresh-__logs] INFO c.d.e.catalog.SourceMetadataManager - Source ‘__logs’ sync run started
2019-12-19 07:27:23,375 [metadata-refresh-__logs] INFO c.d.e.catalog.SourceMetadataManager - Source ‘__logs’ sync run ended. Took 0 milliseconds
2019-12-19 08:27:23,375 [metadata-refresh-__logs] INFO c.d.e.catalog.SourceMetadataManager - Source ‘__logs’ sync run started
2019-12-19 08:27:23,375 [metadata-refresh-__logs] INFO c.d.e.catalog.SourceMetadataManager - Source ‘__logs’ sync run ended. Took 0 milliseconds
2019-12-19 09:27:23,375 [metadata-refresh-__logs] INFO c.d.e.catalog.SourceMetadataManager - Source ‘__logs’ sync run started
2019-12-19 09:27:23,376 [metadata-refresh-__logs] INFO c.d.e.catalog.SourceMetadataManager - Source ‘__logs’ sync run ended. Took 0 milliseconds
2019-12-19 10:27:23,375 [metadata-refresh-__logs] INFO c.d.e.catalog.SourceMetadataManager - Source ‘__logs’ sync run started
2019-12-19 10:27:23,375 [metadata-refresh-__logs] INFO c.d.e.catalog.SourceMetadataManager - Source ‘__logs’ sync run ended. Took 0 milliseconds
2019-12-19 11:27:23,375 [metadata-refresh-__logs] INFO c.d.e.catalog.SourceMetadataManager - Source ‘__logs’ sync run started
2019-12-19 11:27:23,375 [metadata-refresh-__logs] INFO c.d.e.catalog.SourceMetadataManager - Source ‘__logs’ sync run ended. Took 0 milliseconds
2019-12-19 12:27:23,376 [metadata-refresh-__logs] INFO c.d.e.catalog.SourceMetadataManager - Source ‘__logs’ sync run started
2019-12-19 12:27:23,376 [metadata-refresh-__logs] INFO c.d.e.catalog.SourceMetadataManager - Source ‘__logs’ sync run ended. Took 0 milliseconds
2019-12-19 13:27:23,376 [metadata-refresh-__logs] INFO c.d.e.catalog.SourceMetadataManager - Source ‘__logs’ sync run started
2019-12-19 13:27:23,376 [metadata-refresh-__logs] INFO c.d.e.catalog.SourceMetadataManager - Source ‘__logs’ sync run ended. Took 0 milliseconds
2019-12-19 14:27:23,376 [metadata-refresh-__logs] INFO c.d.e.catalog.SourceMetadataManager - Source ‘__logs’ sync run started
2019-12-19 14:27:23,377 [metadata-refresh-__logs] INFO c.d.e.catalog.SourceMetadataManager - Source ‘__logs’ sync run ended. Took 0 milliseconds
2019-12-19 15:27:23,377 [metadata-refresh-__logs] INFO c.d.e.catalog.SourceMetadataManager - Source ‘__logs’ sync run started
2019-12-19 15:27:23,377 [metadata-refresh-__logs] INFO c.d.e.catalog.SourceMetadataManager - Source ‘__logs’ sync run ended. Took 0 milliseconds
2019-12-19 16:27:23,377 [metadata-refresh-__logs] INFO c.d.e.catalog.SourceMetadataManager - Source ‘__logs’ sync run started
2019-12-19 16:27:23,377 [metadata-refresh-__logs] INFO c.d.e.catalog.SourceMetadataManager - Source ‘__logs’ sync run ended. Took 0 milliseconds
2019-12-19 17:27:23,378 [metadata-refresh-__logs] INFO c.d.e.catalog.SourceMetadataManager - Source ‘__logs’ sync run started
2019-12-19 17:27:23,378 [metadata-refresh-__logs] INFO c.d.e.catalog.SourceMetadataManager - Source ‘__logs’ sync run ended. Took 0 milliseconds

The logs went from taking alot of time to refresh metadata to 0 seconds and the metadata appears to not be refreshing. Also I found this in the server log:

[]# cat server.log | grep metadata
2019-12-19 04:48:11,446 [scheduler-17] INFO c.d.concurrent.SingletonRunnable - Task metadata-refresh-test-client-s3 already running, skipping secondary execution.

Restarting the dremio master node fixed the issue, i’ll do this from now on if the metadata refresh scheduler starts mysteriously failing.

@kprifogle

Sorry, I do not find any errors, these are just regular metadata information. Have you turned on DEBUG ?

The error is that the metadata refresh only takes a trivial amount of time, and when I restart dremio it suddenly starts taking longer and additionally the metadata actually refreshes. I think maybe the metadata refresh server is falling into some type of failure mode? This happened infrequently at first but now happens every day. Is there anything in the metadata that would cause this? Are we updating too frequently?

@balaji.ramaswamy The error is actually that if you look for test-client-s3 its not refreshing in the log at all for the timespans where the metadata is not refreshing, its only syncing the logs which takes 0 seconds. Does the mean the metadata refresh is breaking somehow? How do we ensure that it stays up? The second error seems to imply that the metadata refresh is getting stuck and is being skipped. Its set to refresh every 1 minute and it looks like some times the refresh may take longer than that? (73,000 milliseconds)

My current theory of whats happening is that:

  1. A particular metadata refresh takes more than 1 minute.
  2. It skips the next refresh to avoid queuing up refresh jobs.
  3. Somehow it never then queues them again.

@kprifogle

What version Dremio is this. Are you refreshing every minute? Is that data changing so often? Why don’t you consider refreshing the datasets explicitly for ones that have only changed right after your ETL job? We had one bug similar to what you are saying on an older version. Send met the server.log as an attachment to confirm

3.2.0-201905102005330382-0598733 is my version. Unforunately I’m not being allowed to upgrade.