swh.loader.dir package

Submodules

swh.loader.dir.converters module

swh.loader.dir.converters.to_datetime(ts)

Convert a timestamp to a UTC datetime.
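
As a rough illustration of this conversion, and not the module's actual implementation, the sketch below assumes that ts is a POSIX timestamp in seconds. The helper name to_datetime_sketch is hypothetical.

```python
from datetime import datetime, timezone


def to_datetime_sketch(ts):
    """Hypothetical stand-in: convert a POSIX timestamp to an aware UTC datetime."""
    # Assumption: ts counts seconds since the Unix epoch.
    return datetime.fromtimestamp(ts, tz=timezone.utc)


print(to_datetime_sketch(0).isoformat())  # 1970-01-01T00:00:00+00:00
```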

swh.loader.dir.converters.format_to_minutes(offset_str)

Convert a git timezone offset string (e.g. +0200, -0310) to minutes.

Parameters: offset_str – a string representing an offset.
Returns: the offset as a positive or negative number of minutes.
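
The conversion can be sketched as follows. This is a minimal reimplementation for illustration, assuming the input is always a sign followed by four digits (HHMM); the name format_to_minutes_sketch is hypothetical and the real function may parse its input differently.

```python
def format_to_minutes_sketch(offset_str):
    """Hypothetical stand-in: convert '+HHMM' or '-HHMM' to signed minutes."""
    # Assumption: a leading sign followed by exactly four digits.
    sign = -1 if offset_str[0] == "-" else 1
    hours = int(offset_str[1:3])
    minutes = int(offset_str[3:5])
    return sign * (hours * 60 + minutes)


print(format_to_minutes_sketch("+0200"))  # 120
print(format_to_minutes_sketch("-0310"))  # -190
```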

swh.loader.dir.converters.commit_to_revision(commit, log=None)

Format a commit as a revision.

swh.loader.dir.converters.annotated_tag_to_release(release, log=None)

Format an annotated tag as a swh release.

swh.loader.dir.loader module

swh.loader.dir.loader.revision_from(directory_hash, revision)

swh.loader.dir.loader.release_from(revision_hash, release)

swh.loader.dir.loader.snapshot_from(revision_hash, branch)

Build a snapshot from a revision hash and a branch.

class swh.loader.dir.loader.DirLoader(logging_class='swh.loader.dir.DirLoader', config=None)

Bases: swh.loader.core.loader.BufferedLoader

A bulk loader for a directory.

CONFIG_BASE_FILENAME = 'loader/dir'

list_objs(*, dir_path, revision, release, branch_name)

List all objects from dir_path.

Parameters:
  • dir_path (str) – the directory to list
  • revision (dict) – revision dictionary representation
  • release (dict) – release dictionary representation
  • branch_name (str) – branch name
Returns: a mapping from object types ('content', 'directory', 'revision', 'release', 'snapshot') to a dictionary mapping each object's id to the object.

Return type: dict
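
To make the shape of the returned mapping concrete, here is a small sketch. The ids and object bodies are placeholders invented for illustration (real ids are hashes computed by the loader), and count_by_type is a hypothetical helper.

```python
# Placeholder ids and objects; only the two-level structure is the point here.
objects = {
    "content": {"c1": {"id": "c1"}, "c2": {"id": "c2"}},
    "directory": {"d1": {"id": "d1"}},
    "revision": {"r1": {"id": "r1"}},
    "release": {"rel1": {"id": "rel1"}},
    "snapshot": {"s1": {"id": "s1"}},
}


def count_by_type(objs):
    """Return how many objects were listed for each object type."""
    return {obj_type: len(by_id) for obj_type, by_id in objs.items()}


print(count_by_type(objects))
```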

load(*, dir_path, origin, visit_date, revision, release, branch_name=None)

Load the content of the directory to the archive.

Parameters:
  • dir_path – root of the directory to import
  • origin (dict) – an origin dictionary as returned by swh.storage.storage.Storage.origin_get_one()
  • visit_date (str) – the date the origin was visited (as an isoformatted string)
  • revision (dict) – a revision as passed to swh.storage.storage.Storage.revision_add(), excluding the id and directory keys (computed from the directory)
  • release (dict) – a release as passed to swh.storage.storage.Storage.release_add(), excluding the id, target and target_type keys (computed from the revision)
  • branch_name (str) – the optional branch_name to use for snapshot
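
The parameters above can be assembled as in the sketch below. build_load_kwargs is a hypothetical helper, and the path, origin fields, and revision and release fields are placeholders rather than the schema expected by swh.storage; only the excluded keys and the ISO-formatted visit_date come from the parameter descriptions.

```python
from datetime import datetime, timezone

# Keys that load() computes itself, per the parameter descriptions.
REVISION_COMPUTED_KEYS = {"id", "directory"}
RELEASE_COMPUTED_KEYS = {"id", "target", "target_type"}


def build_load_kwargs(dir_path, origin, revision, release, branch_name=None):
    """Hypothetical helper: assemble keyword arguments for DirLoader.load()."""
    for name, data, computed in (
        ("revision", revision, REVISION_COMPUTED_KEYS),
        ("release", release, RELEASE_COMPUTED_KEYS),
    ):
        extra = computed & set(data)
        if extra:
            raise ValueError(f"{name} must not contain computed keys: {sorted(extra)}")
    return {
        "dir_path": dir_path,
        "origin": origin,
        # visit_date is passed as an ISO-formatted string.
        "visit_date": datetime.now(tz=timezone.utc).isoformat(),
        "revision": revision,
        "release": release,
        "branch_name": branch_name,
    }


kwargs = build_load_kwargs(
    dir_path="/tmp/project",                    # placeholder path
    origin={"url": "file:///tmp/project"},      # placeholder origin fields
    revision={"message": "synthetic revision"},  # placeholder fields
    release={"name": "v1.0"},                   # placeholder fields
    branch_name="master",
)
# With swh.loader.dir installed and configured, the call would then be:
# DirLoader().load(**kwargs)
```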

prepare_origin_visit(*, origin, visit_date=None, **kwargs)

First step executed by the loader to prepare origin and visit references. Sets or updates self.origin and, optionally, self.origin_url and self.visit_date.

prepare(*, dir_path, origin, revision, release, visit_date=None, branch_name=None)

Prepare the loader for directory loading.

Args: identical to load().

cleanup()

Nothing to clean up.

fetch_data()

Walk the directory and load all objects with their hashes.

Stores the results in self.objects.
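
The general idea of walking a directory and hashing what it contains can be sketched with the standard library alone. This is not how the loader itself computes its identifiers: walk_and_hash is a hypothetical helper that only hashes file contents with SHA-1, chosen here for brevity.

```python
import hashlib
import os
import tempfile


def walk_and_hash(dir_path):
    """Hypothetical sketch: map each file's relative path to the SHA-1 of its bytes."""
    hashes = {}
    for root, _dirs, files in os.walk(dir_path):
        for name in files:
            full_path = os.path.join(root, name)
            with open(full_path, "rb") as f:
                digest = hashlib.sha1(f.read()).hexdigest()
            hashes[os.path.relpath(full_path, dir_path)] = digest
    return hashes


# Usage on a throwaway directory containing a single file.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "hello.txt"), "wb") as f:
        f.write(b"hello\n")
    print(walk_and_hash(tmp))
```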

store_data()

Store fetched data in the database.

Implementations should call the maybe_load_xyz() methods, which handle the bundles sent to storage, rather than sending data to storage directly.
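
The reason for going through buffering methods rather than sending directly can be illustrated with a toy buffer. BufferedSender and its maybe_load method are hypothetical names for this sketch, not the BufferedLoader API; the sketch only shows objects being grouped into bundles before each send.

```python
class BufferedSender:
    """Hypothetical buffer: collect objects and send them to storage in bundles."""

    def __init__(self, send, max_size):
        self.send = send          # callable that receives a list of objects
        self.max_size = max_size  # bundle size that triggers a send
        self.buffer = []

    def maybe_load(self, obj):
        """Queue obj; send a bundle only once the buffer is full."""
        self.buffer.append(obj)
        if len(self.buffer) >= self.max_size:
            self.flush()

    def flush(self):
        """Send whatever is still buffered."""
        if self.buffer:
            self.send(list(self.buffer))
            self.buffer.clear()


sent = []
sender = BufferedSender(send=sent.append, max_size=2)
for obj in ["a", "b", "c"]:
    sender.maybe_load(obj)
sender.flush()
print(sent)  # [['a', 'b'], ['c']]
```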


swh.loader.dir.tasks module

Module contents