swh.loader.dir package


swh.loader.dir.converters module


Convert a timestamp to a UTC datetime.
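
As a rough illustration of this kind of conversion, the sketch below uses only the standard library. The helper name `to_utc_datetime` is hypothetical, and it assumes the timestamp is a number of seconds since the Unix epoch; the actual function in this module may accept a different input shape.

```python
from datetime import datetime, timezone


def to_utc_datetime(timestamp):
    """Hypothetical helper: read `timestamp` as seconds since the Unix epoch
    and return a timezone-aware datetime in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


print(to_utc_datetime(0).isoformat())  # 1970-01-01T00:00:00+00:00
```

Passing `tz=timezone.utc` yields an aware datetime, which avoids any dependence on the local timezone of the machine running the loader.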


Convert a git timezone offset string (e.g. +0200, -0310) to minutes.

Parameters: offset_str – a string representing an offset.
Returns: a positive or negative number of minutes corresponding to the input.
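
The conversion described above can be sketched as follows. The function name `offset_to_minutes` is an assumption made for illustration, and the sketch assumes well-formed input of the form `+HHMM` or `-HHMM`; it is not necessarily how the module implements it.

```python
def offset_to_minutes(offset_str):
    """Hypothetical helper: convert a '+HHMM' / '-HHMM' offset to minutes."""
    # A leading '-' means the offset is west of UTC, hence negative.
    sign = -1 if offset_str.startswith('-') else 1
    digits = offset_str.lstrip('+-')
    hours = int(digits[:2])
    minutes = int(digits[2:4])
    return sign * (hours * 60 + minutes)


print(offset_to_minutes('+0200'))  # 120
print(offset_to_minutes('-0310'))  # -190
```

Applying the sign to the combined total, rather than to the hours alone, keeps offsets such as -0030 negative.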
swh.loader.dir.converters.commit_to_revision(commit, log=None)[source]

Format a commit as a revision.

swh.loader.dir.converters.annotated_tag_to_release(release, log=None)[source]

Format a swh release.

swh.loader.dir.loader module

swh.loader.dir.loader.revision_from(directory_hash, revision)[source]
swh.loader.dir.loader.release_from(revision_hash, release)[source]
swh.loader.dir.loader.snapshot_from(revision_hash, branch)[source]

Build a snapshot from a revision hash and a branch.

class swh.loader.dir.loader.DirLoader(logging_class='swh.loader.dir.DirLoader', config=None)[source]

Bases: swh.loader.core.loader.BufferedLoader

A bulk loader for a directory.

visit_type = 'dir'
__init__(logging_class='swh.loader.dir.DirLoader', config=None)[source]

Initialize self. See help(type(self)) for accurate signature.

list_objs(*, dir_path, revision, release, branch_name)[source]

List all objects from dir_path.

Parameters:
  • dir_path (str) – the directory to list
  • revision (dict) – revision dictionary representation
  • release (dict) – release dictionary representation
  • branch_name (str) – branch name

Returns: a mapping from object types (‘content’, ‘directory’, ‘revision’, ‘release’, ‘snapshot’) to a dictionary mapping each object’s id to the object

Return type:


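To make the shape of the value returned by list_objs concrete, here is a minimal sketch of such a nested mapping. The helper names, the placeholder id, and the fields inside the object dictionary are all hypothetical; only the five object-type keys come from the description above.

```python
OBJECT_TYPES = ('content', 'directory', 'revision', 'release', 'snapshot')


def empty_objects():
    """Hypothetical helper: one empty id-to-object dictionary per object type."""
    return {object_type: {} for object_type in OBJECT_TYPES}


def count_by_type(objects):
    """Hypothetical helper: number of objects stored under each type."""
    return {object_type: len(by_id) for object_type, by_id in objects.items()}


objects = empty_objects()
placeholder_id = b'\x01' * 20  # stand-in for a real object id
objects['content'][placeholder_id] = {'id': placeholder_id, 'length': 12}

print(count_by_type(objects)['content'])  # 1
```

Keying each inner dictionary by object id means an object encountered twice while walking the directory is stored only once.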
load(*, dir_path, origin, visit_date, revision, release, branch_name=None)[source]

Load the content of the directory to the archive.

Parameters:
  • dir_path – root of the directory to import
  • origin (dict) – an origin dictionary as returned by swh.storage.storage.Storage.origin_get_one()
  • visit_date (str) – the date the origin was visited (as an isoformatted string)
  • revision (dict) – a revision as passed to swh.storage.storage.Storage.revision_add(), excluding the id and directory keys (computed from the directory)
  • release (dict) – a release as passed to swh.storage.storage.Storage.release_add(), excluding the id, target and target_type keys (computed from the revision)
  • branch_name (str) – the optional branch_name to use for snapshot
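
The keyword arguments listed above might be assembled as in the sketch below. The helper `build_load_kwargs` is hypothetical, and the keys placed inside the origin, revision, and release dictionaries are illustrative assumptions rather than the schema expected by swh.storage. The call to load() is left as a comment because it requires a configured Software Heritage environment.

```python
from datetime import datetime, timezone


def build_load_kwargs(dir_path, origin_url, visit_date=None, branch_name=None):
    """Hypothetical helper: assemble keyword arguments for DirLoader.load()."""
    if visit_date is None:
        visit_date = datetime.now(tz=timezone.utc)
    return {
        'dir_path': dir_path,
        # Assumed origin keys; real origins come from the storage API.
        'origin': {'url': origin_url, 'type': 'dir'},
        # load() expects the visit date as an isoformatted string.
        'visit_date': visit_date.isoformat(),
        # Illustrative contents only; id and directory are computed by the loader.
        'revision': {'message': b'synthetic revision'},
        # Illustrative contents only; id, target and target_type are computed.
        'release': {'name': b'v0.0.1'},
        'branch_name': branch_name,
    }


kwargs = build_load_kwargs(
    '/tmp/project',
    'file:///tmp/project',
    visit_date=datetime(2018, 1, 1, tzinfo=timezone.utc),
    branch_name='master',
)
# DirLoader().load(**kwargs)  # needs a configured swh environment
```

Because every parameter of load() is keyword-only, collecting the arguments in a dictionary and unpacking it with `**kwargs` keeps the call site explicit.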
prepare_origin_visit(*, origin, visit_date=None, **kwargs)[source]

First step executed by the loader, to prepare the origin and visit references. Sets or updates self.origin and, optionally, self.origin_url and self.visit_date.

prepare(*, dir_path, origin, revision, release, visit_date=None, branch_name=None)[source]

Prepare the loader for directory loading.

Args: identical to load().


Nothing to clean up.


Walk the directory and load all objects with their hashes.

Stores the results in self.objects.


Store fetched data in the database.

Should call the maybe_load_xyz() methods, which handle the bundles sent to storage, rather than sending data to storage directly.


swh.loader.dir.tasks module

Module contents