swh.loader.metadata.journal_client module

class swh.loader.metadata.journal_client.JournalClient(scheduler: swh.scheduler.interface.SchedulerInterface, storage: swh.storage.interface.StorageInterface, metadata_fetcher_credentials: Dict[str, Dict[str, List[Dict[str, str]]]] | None, reload_after_days: int)

Bases: object

scheduler: SchedulerInterface
storage: StorageInterface
metadata_fetcher_credentials: Dict[str, Dict[str, List[Dict[str, str]]]] | None
reload_after_days: int
statsd_timed(name: str, tags: Dict[str, Any] = {})
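The nested type of metadata_fetcher_credentials is easier to read with a concrete value. The sketch below assumes the layout used by Software Heritage lister configuration, `{forge type: {instance name: [credential, ...]}}`, with each credential holding `username` and `password`; that layout and the helper `pick_credentials` are illustrative assumptions, not part of this module.

```python
import random
from typing import Dict, List, Optional

# Shape of metadata_fetcher_credentials, as given in the signature above.
Credentials = Dict[str, Dict[str, List[Dict[str, str]]]]


def pick_credentials(
    credentials: Optional[Credentials],
    forge: str,
    instance: str,
    rng: Optional[random.Random] = None,
) -> Optional[Dict[str, str]]:
    """Hypothetical helper: return one credential for (forge, instance), or None.

    Assumes the outer key is the forge type and the inner key the instance name.
    """
    if not credentials:  # the attribute may be None
        return None
    entries = credentials.get(forge, {}).get(instance, [])
    if not entries:
        return None
    # Pick one account at random, e.g. to spread API rate limits across accounts.
    return (rng or random).choice(entries)


# Example value (hypothetical account data).
creds: Credentials = {
    "github": {"github": [{"username": "alice", "password": "ghp_example"}]},
}
```

With this value, `pick_credentials(creds, "github", "github")` returns Alice's entry, while an unknown forge or a `None` credentials mapping yields `None`.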

Wrapper for swh.core.statsd.Statsd.timed(), which uses the standard metric name and tags.

statsd_timing(name: str, value: float, tags: Dict[str, Any] = {}) -> None

Wrapper for swh.core.statsd.Statsd.timing(), which uses the standard metric name and tags for loaders.
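The two wrappers above share one idea: callers pass only an operation name, and the wrapper supplies the metric name and base tags. The sketch below shows that pattern with a recording stand-in for `swh.core.statsd.Statsd`; the metric name `operation_duration_seconds`, the `operation` tag, and the classes `RecordingStatsd` and `MetricsHelper` are hypothetical, since this page does not spell out the standard name or tags.

```python
import time
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List, Optional, Tuple

# Hypothetical metric name; the real "standard metric name" is not given here.
METRIC = "operation_duration_seconds"


class RecordingStatsd:
    """Hypothetical stand-in for a statsd client: stores timings in memory."""

    def __init__(self) -> None:
        self.timings: List[Tuple[str, float, Dict[str, Any]]] = []

    def timing(self, metric: str, value: float, tags: Dict[str, Any]) -> None:
        self.timings.append((metric, value, dict(tags)))


class MetricsHelper:
    """Hypothetical class showing the wrapper pattern, not the module's code."""

    def __init__(self, statsd: RecordingStatsd) -> None:
        self.statsd = statsd

    def statsd_timing(
        self, name: str, value: float, tags: Optional[Dict[str, Any]] = None
    ) -> None:
        # Fixed metric name; the operation name becomes a tag, merged with extras.
        self.statsd.timing(METRIC, value, tags={"operation": name, **(tags or {})})

    @contextmanager
    def statsd_timed(
        self, name: str, tags: Optional[Dict[str, Any]] = None
    ) -> Iterator[None]:
        # Measure wall-clock time of the block (seconds here; units are an assumption).
        start = time.monotonic()
        try:
            yield
        finally:
            self.statsd_timing(name, time.monotonic() - start, tags)
```

Usage: `with helper.statsd_timed("fetch", {"forge": "github"}): ...` records one timing tagged with both the operation and the forge. The sketch uses `None` defaults instead of the `{}` defaults shown in the documented signatures, to avoid sharing a mutable default.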

process_journal_objects(messages: Dict[str, List[Dict]]) -> None

Loads metadata for origins not recently loaded:

  1. reads messages from the origin journal topic

  2. queries the scheduler for the listers that produced each origin (to guess what type of forge hosts it)

  3. if it is a forge from which extrinsic metadata can be fetched, checks the storage for metadata fetched recently for that origin

  4. if there is none, triggers a metadata load
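The four steps can be sketched as a pure function that decides which origins need a metadata load. This is a simplified illustration, not the module's implementation: it assumes origin messages arrive under the `"origin"` key with a `"url"` field, replaces the scheduler and storage with the hypothetical callables `listers_for_origin` and `last_metadata_fetch`, assumes `reload_after_days` defines "recently", and collects URLs instead of scheduling loads. `SUPPORTED_FORGES` is likewise hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, List, Optional, Set

# Hypothetical set of forge types (lister names) with an extrinsic metadata fetcher.
SUPPORTED_FORGES = {"github"}


def origins_to_reload(
    messages: Dict[str, List[Dict]],
    listers_for_origin: Callable[[str], Set[str]],
    last_metadata_fetch: Callable[[str], Optional[datetime]],
    reload_after_days: int,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return URLs of origins whose extrinsic metadata should be (re)loaded."""
    now = now or datetime.now(timezone.utc)
    threshold = now - timedelta(days=reload_after_days)
    to_load: List[str] = []
    # 1. read origin messages (assumed to be under the "origin" key)
    for origin in messages.get("origin", []):
        url = origin["url"]
        # 2. guess the forge type from the listers that produced this origin
        if not listers_for_origin(url) & SUPPORTED_FORGES:
            continue  # no metadata fetcher for this kind of forge
        # 3. skip origins whose metadata was fetched recently
        last = last_metadata_fetch(url)
        if last is not None and last >= threshold:
            continue
        # 4. otherwise, a metadata load is needed (collected here, not scheduled)
        to_load.append(url)
    return to_load


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
msgs = {"origin": [
    {"url": "https://github.com/a/b"},
    {"url": "https://github.com/c/d"},
    {"url": "https://example.org/x"},
]}
listers = {
    "https://github.com/a/b": {"github"},
    "https://github.com/c/d": {"github"},
    "https://example.org/x": {"cgit"},
}
fetched = {"https://github.com/a/b": datetime(2024, 1, 30, tzinfo=timezone.utc)}
result = origins_to_reload(msgs, lambda u: listers.get(u, set()), fetched.get, 30, now)
# result == ["https://github.com/c/d"]
```

In the example, `a/b` was fetched one day earlier and is skipped, `example.org/x` comes from a forge without a fetcher, and only `c/d` needs a load.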