swh.loader.svn.loader module

Loader in charge of injecting svn mirrors, whether new or already known, into swh-storage.

class swh.loader.svn.loader.SvnLoader(url, origin_url=None, visit_date=None, destination_path=None, swh_revision=None, start_from_scratch=False)[source]

Bases: swh.loader.core.loader.BaseLoader

Swh svn loader.

The repository is either remote or local. The loader also handles updates to a repository that was already loaded during a previous visit.

CONFIG_BASE_FILENAME = 'loader/svn'
ADDITIONAL_CONFIG = {'check_revision': ('dict', {'status': False, 'limit': 1000}), 'debug': ('bool', False), 'temp_directory': ('str', '/tmp')}
visit_type = 'svn'
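
Each ADDITIONAL_CONFIG entry pairs a type tag with a default value. As a minimal sketch of how such defaults might be combined with user-provided settings (resolve_config is a hypothetical helper written for illustration, not part of swh.loader.svn):

```python
import copy
from typing import Any, Dict, Tuple

# Mirrors the ADDITIONAL_CONFIG mapping documented above: name -> (type tag, default).
ADDITIONAL_CONFIG: Dict[str, Tuple[str, Any]] = {
    "check_revision": ("dict", {"status": False, "limit": 1000}),
    "debug": ("bool", False),
    "temp_directory": ("str", "/tmp"),
}


def resolve_config(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: start from the defaults, then apply overrides."""
    # Deep-copy the defaults so that mutable values (the dict) are not shared.
    config = {
        name: copy.deepcopy(default)
        for name, (_type_tag, default) in ADDITIONAL_CONFIG.items()
    }
    config.update(overrides)
    return config


config = resolve_config({"debug": True})
```

Here only debug is overridden; check_revision and temp_directory keep their documented defaults.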
pre_cleanup()[source]

Clean up potential dangling files from prior runs (e.g. OOM-killed tasks).

cleanup()[source]

Clean up the svn repository’s working representation on disk.

swh_revision_hash_tree_at_svn_revision(revision)[source]

Compute and return the hash tree at a given svn revision.

Parameters

revision (int) – the svn revision we want to check

Returns

The hash tree directory as bytes.

build_swh_revision(rev, commit, dir_id, parents)[source]

Build the swh revision dictionary.

This adds:

  • the ‘synthetic’ flag, set to true

  • the ‘extra_headers’ containing the repository’s uuid and the svn revision number.

Parameters
  • rev (dict) – the svn revision

  • commit (dict) – the commit metadata

  • dir_id (bytes) – the upper tree’s hash identifier

  • parents ([bytes]) – the parents’ identifiers

Returns

The swh revision corresponding to the svn revision.
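
To make the two additions concrete, here is a minimal sketch of such a revision dictionary. It is not the loader's implementation: build_revision_sketch, the repo_uuid argument, the header key names (svn_repo_uuid, svn_revision), the "directory" and "parents" keys, and passing the revision number as an int are all assumptions made for illustration.

```python
from typing import Any, Dict, List


def build_revision_sketch(
    rev: int,
    commit: Dict[str, Any],
    dir_id: bytes,
    parents: List[bytes],
    repo_uuid: bytes,
) -> Dict[str, Any]:
    """Hypothetical illustration of the revision dictionary described above."""
    revision = dict(commit)  # keep the commit metadata (author, message, ...)
    revision.update(
        {
            "directory": dir_id,
            "parents": list(parents),
            # Flag marking the revision as built by the loader.
            "synthetic": True,
            # Assumed key names for the repository uuid and svn revision number.
            "extra_headers": [
                (b"svn_repo_uuid", repo_uuid),
                (b"svn_revision", str(rev).encode()),
            ],
        }
    )
    return revision


revision = build_revision_sketch(
    3, {"message": b"fix typo"}, b"\x01" * 20, [], b"some-uuid"
)
```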

check_history_not_altered(svnrepo, revision_start: int, swh_rev: swh.model.model.Revision) → bool[source]

Given an svn repository, check whether the history was modified in between visits.

start_from(start_from_scratch: bool = False) → Tuple[int, int, Dict[int, Tuple[bytes, ]]][source]

Determine from where to start the loading.

Parameters

start_from_scratch – Whether to start loading from scratch, as opposed to resuming from the last snapshot

Returns

tuple (revision_start, revision_end, revision_parents)

Raises
process_svn_revisions(svnrepo, revision_start, revision_end, revision_parents) → Iterator[Tuple[List[swh.model.model.Content], List[swh.model.model.SkippedContent], List[swh.model.model.Directory], swh.model.model.Revision]][source]

Process svn revisions from revision_start to revision_end.

At each svn revision, apply new diffs and simultaneously compute swh hashes. This yields the computed swh objects as a tuple (contents, skipped_contents, directories, revision).

Note that, as configured by self.check_revision, a supplementary check periodically takes place to detect hash-tree divergence (related to T570).

Yields

tuple (contents, skipped_contents, directories, revision) of dictionaries with keys such as sha1_git, sha1, etc.

Raises

ValueError if a hash divergence is detected
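
The control flow described above can be sketched as a generator. This is an illustration under assumptions, not the loader's code: replay and export_hash are hypothetical callbacks standing in for "apply the diff and hash" and "recompute the hash tree from scratch", check_every stands in for the configured check interval, and revision_end is assumed to be inclusive.

```python
from typing import Callable, Iterator, Tuple


def process_revisions_sketch(
    revision_start: int,
    revision_end: int,
    replay: Callable[[int], bytes],
    export_hash: Callable[[int], bytes],
    check_every: int = 1000,
) -> Iterator[Tuple[int, bytes]]:
    """Yield (revision number, directory hash) for each svn revision."""
    revisions = range(revision_start, revision_end + 1)
    for count, rev in enumerate(revisions, start=1):
        dir_id = replay(rev)  # hash computed incrementally while replaying
        # Periodic supplementary check against an independently computed hash.
        if check_every and count % check_every == 0:
            if export_hash(rev) != dir_id:
                raise ValueError(f"Hash tree divergence detected at revision {rev}")
        yield rev, dir_id


results = list(
    process_revisions_sketch(
        1, 4, lambda r: b"h%d" % r, lambda r: b"h%d" % r, check_every=2
    )
)
```

Because the function is a generator, a divergence only surfaces when the consumer reaches the offending revision; earlier revisions have already been yielded by then.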

prepare_origin_visit(*args, **kwargs)[source]

First step executed by the loader to prepare origin and visit references. Set/update self.origin, and optionally self.origin_url, self.visit_date.

prepare(*args, **kwargs)[source]

Second step executed by the loader to prepare some state needed by the loader.

fetch_data()[source]

Fetch svn revision information.

This applies each svn revision as a patch on disk and, at the same time, computes the swh hashes.

In effect, fetch_data fetches that data and computes the necessary swh objects, which are then stored in the internal state instance variables (initialized in _prepare_state).

It is up to store_data to actually communicate with the storage to store those objects.

Returns

True to continue fetching data (next svn revision), False to stop.

Return type

bool

store_data()[source]

Store the data accumulated in the internal instance variables. If the iteration over the svn revisions is done, create the snapshot and flush the data to storage.

This also resets the internal instance variable state.
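
The interplay between fetch_data and store_data can be sketched with a toy class. Everything here is hypothetical (FetchStoreSketch, run, and the attribute names); the driving loop assumes the base loader calls fetch_data and store_data in alternation until fetch_data returns False.

```python
from typing import List


class FetchStoreSketch:
    """Toy stand-in for a loader: buffers items in fetch_data, flushes in store_data."""

    def __init__(self, revisions: List[int]) -> None:
        self._pending = iter(revisions)
        self._buffer: List[int] = []  # internal state filled by fetch_data
        self._done = False
        self.stored: List[int] = []  # stands in for the storage
        self.snapshot_created = False

    def fetch_data(self) -> bool:
        rev = next(self._pending, None)
        if rev is None:
            self._done = True
            return False  # nothing left: stop fetching
        self._buffer.append(rev)
        return True  # continue with the next svn revision

    def store_data(self) -> None:
        self.stored.extend(self._buffer)
        self._buffer = []  # reset the internal state
        if self._done:
            self.snapshot_created = True  # final step once iteration is over


def run(loader: FetchStoreSketch) -> None:
    # Assumed driving loop: fetch, store, repeat while there is more data.
    while True:
        more_data = loader.fetch_data()
        loader.store_data()
        if not more_data:
            break


loader = FetchStoreSketch([1, 2, 3])
run(loader)
```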

generate_and_load_snapshot(revision: Optional[swh.model.model.Revision] = None, snapshot: Optional[swh.model.model.Snapshot] = None) → swh.model.model.Snapshot[source]

Create the snapshot from either an existing revision or an existing snapshot.

The revision (presumably new) takes priority over the snapshot (presumably an existing one).

Parameters
  • revision (dict) – Last revision seen if any (None by default)

  • snapshot (dict) – Snapshot to use if any (None by default)

Returns

Optional[Snapshot] The newly created snapshot
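
The priority rule can be sketched as follows. This is only an illustration: pick_snapshot is a hypothetical function, objects are modelled as plain dictionaries, and the single b"HEAD" branch layout is an assumption rather than something stated in this reference.

```python
from typing import Any, Dict, Optional


def pick_snapshot(
    revision: Optional[Dict[str, Any]] = None,
    snapshot: Optional[Dict[str, Any]] = None,
) -> Optional[Dict[str, Any]]:
    """Hypothetical sketch: a revision, when given, wins over the snapshot."""
    if revision is not None:
        # Build a fresh snapshot pointing at the last revision seen.
        return {
            "branches": {
                b"HEAD": {"target": revision["id"], "target_type": "revision"}
            }
        }
    # No new revision: fall back to the existing snapshot (possibly None).
    return snapshot


existing = {"branches": {b"HEAD": {"target": b"old", "target_type": "revision"}}}
fresh = pick_snapshot(revision={"id": b"new"}, snapshot=existing)
```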

load_status()[source]

Detailed loading status.

Defaults to logging an eventful load.

Returns

A dictionary that is eventually passed back as the task’s result to the scheduler, allowing tuning of the task recurrence mechanism.

visit_status()[source]

Detailed visit status.

Defaults to logging a full visit.

class swh.loader.svn.loader.SvnLoaderFromDumpArchive(url, archive_path, origin_url=None, destination_path=None, swh_revision=None, start_from_scratch=None, visit_date=None)[source]

Bases: swh.loader.svn.loader.SvnLoader

Uncompress an archive containing an svn dump, mount the svn dump as an svn repository and load said repository.

prepare(*args, **kwargs)[source]

Second step executed by the loader to prepare some state needed by the loader.

cleanup()[source]

Clean up the svn repository’s working representation on disk.

class swh.loader.svn.loader.SvnLoaderFromRemoteDump(url, origin_url=None, destination_path=None, swh_revision=None, start_from_scratch=False, visit_date=None)[source]

Bases: swh.loader.svn.loader.SvnLoader

Create a subversion repository dump using the svnrdump utility, mount it locally and load the repository from it.

get_last_loaded_svn_rev(svn_url: str) → int[source]

Check if the svn repository has already been visited and return the last loaded svn revision number or -1 otherwise.
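
As a rough sketch of the "-1 otherwise" convention, assume the last loaded revision carries its svn revision number in an extra header named svn_revision (an assumed key name); last_loaded_rev_sketch is a hypothetical helper and does not query any storage.

```python
from typing import Iterable, Optional, Tuple


def last_loaded_rev_sketch(
    extra_headers: Optional[Iterable[Tuple[bytes, bytes]]],
) -> int:
    """Return the svn revision recorded in the headers, or -1 if there is none."""
    if not extra_headers:
        return -1  # repository never visited, or no usable revision
    for key, value in extra_headers:
        if key == b"svn_revision":  # assumed header name
            return int(value.decode())
    return -1


known = last_loaded_rev_sketch([(b"svn_repo_uuid", b"uuid"), (b"svn_revision", b"42")])
unknown = last_loaded_rev_sketch(None)
```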

dump_svn_revisions(svn_url, last_loaded_svn_rev=-1)[source]

Generate a subversion dump file using the svnrdump tool. If the svnrdump command fails, the produced dump file is analyzed to determine whether a partial load is still feasible.
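
A minimal sketch of both halves, under stated assumptions: svnrdump_command and last_dumped_revision are hypothetical helpers, resuming at last_loaded_svn_rev + 1 is an assumed policy, and the analysis simply looks for the Revision-number headers that the subversion dump format writes at the start of each revision record. The command is only built here, not executed.

```python
import re
from typing import List


def svnrdump_command(svn_url: str, last_loaded_svn_rev: int = -1) -> List[str]:
    """Build (but do not run) an svnrdump invocation."""
    cmd = ["svnrdump", "dump", "-q"]
    if last_loaded_svn_rev >= 0:
        # Assumed policy: only dump revisions after the last loaded one.
        cmd += ["-r", f"{last_loaded_svn_rev + 1}:HEAD"]
    cmd.append(svn_url)
    return cmd


_REVISION_HEADER = re.compile(rb"^Revision-number: (\d+)$", re.MULTILINE)


def last_dumped_revision(dump_bytes: bytes) -> int:
    """Return the highest revision header found in a dump, or -1 if none."""
    matches = _REVISION_HEADER.findall(dump_bytes)
    return int(matches[-1]) if matches else -1


sample_dump = (
    b"SVN-fs-dump-format-version: 3\n\n"
    b"Revision-number: 0\nProp-content-length: 10\n\n"
    b"Revision-number: 1\nProp-content-length: 10\n"
)
command = svnrdump_command("file:///srv/repo", last_loaded_svn_rev=10)
last_rev = last_dumped_revision(sample_dump)
```

In a truncated dump the last revision record may itself be incomplete, and file contents could in principle contain a matching line, so a real analysis would likely need to be more conservative than this sketch.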

prepare(*args, **kwargs)[source]

Second step executed by the loader to prepare some state needed by the loader.

cleanup()[source]

Clean up the svn repository’s working representation on disk.

visit_status()[source]

Detailed visit status.

Defaults to logging a full visit.