swh.loader.svn.loader module
Loader in charge of injecting either new or existing svn mirrors to swh-storage.
- class swh.loader.svn.loader.SvnLoader(storage: StorageInterface, url: str, origin_url: str | None = None, visit_date: datetime | None = None, incremental: bool = True, temp_directory: str = '/tmp', debug: bool = False, check_revision: int = 0, check_revision_from: int = 0, **kwargs: Any)
Bases:
BaseLoader
SVN loader. The repository is either remote or local. The loader also handles updates to a previously loaded repository.
Load a svn repository (either remote or local).
- Parameters:
url – The default origin url
origin_url – Optional original url override to use as origin reference in the archive. If not provided, “url” is used as origin.
visit_date – Optional date to override the visit date
incremental – If True, the default, starts from the last snapshot (if any). Otherwise, starts from the initial commit of the repository.
temp_directory – The temporary directory to use as root directory for working directory computations
debug – If true, run the loader in debug mode. At the end of the loading, the temporary working directory is not cleaned up to ease inspection. Defaults to false.
check_revision – The number of svn commits between checks for hash divergence
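As a rough usage sketch, the constructor parameters above could be wired together as follows. It assumes that the swh-storage and swh-loader-svn packages are installed, that an in-memory storage backend is acceptable for a trial run, and that load() is inherited from BaseLoader. The helper names build_loader_kwargs and run_loader, the repository URL, and the check_revision value are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def build_loader_kwargs(url: str, debug: bool = False) -> Dict[str, Any]:
    """Collect the keyword arguments documented above (storage excluded)."""
    return {
        "url": url,
        "visit_date": datetime.now(tz=timezone.utc),
        "incremental": True,       # resume from the last snapshot, if any
        "temp_directory": "/tmp",  # root of the working directory
        "debug": debug,            # keep the working directory when True
        "check_revision": 100,     # illustrative value: check every 100 commits
    }


def run_loader(url: str) -> Dict[str, Any]:
    """Instantiate and run the loader.

    Imports are deferred so that this sketch can be imported even when
    the swh packages are not installed.
    """
    from swh.loader.svn.loader import SvnLoader
    from swh.storage import get_storage

    storage = get_storage("memory")  # assumption: in-memory backend
    loader = SvnLoader(storage, **build_loader_kwargs(url))
    return loader.load()  # assumption: load() comes from BaseLoader


# Example call (placeholder URL, not executed here):
# run_loader("file:///srv/svn/example-repo")
```

Keeping the keyword arguments in a separate helper makes it easy to reuse the same configuration with the subclasses documented further down.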
- swh_revision_hash_tree_at_svn_revision(revision: int) → Directory
Compute and return the hash tree at a given svn revision.
- Parameters:
revision – the svn revision we want to check
- Returns:
The hash tree at that revision, as a Directory (per the return annotation).
- build_swh_revision(rev: int, commit: Dict, dir_id: bytes, parents: Sequence[bytes]) → Revision
Build the swh revision dictionary.
This adds:
the ‘synthetic’ flag, set to true
the ‘extra_headers’ field, containing the repository’s uuid and the svn revision number.
- Parameters:
rev – the svn revision number
commit – the commit data: revision id, date, author, and message
dir_id – the upper tree’s hash identifier
parents – the parents’ identifiers
- Returns:
The swh revision corresponding to the svn revision.
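To make the two additions concrete, here is a minimal sketch that uses a plain dictionary as a stand-in for the Revision model object. The function name sketch_swh_revision, the commit keys (message, author_name, author_date), and the header keys svn_repo_uuid and svn_revision are assumptions made for illustration, not the library's confirmed layout.

```python
from typing import Any, Dict, Sequence


def sketch_swh_revision(
    rev: int,
    commit: Dict[str, Any],
    repo_uuid: bytes,
    dir_id: bytes,
    parents: Sequence[bytes],
) -> Dict[str, Any]:
    """Build a dict mimicking a synthetic revision for svn revision `rev`."""
    return {
        "message": commit["message"],
        "author": commit["author_name"],
        "date": commit["author_date"],
        "directory": dir_id,         # identifier of the root tree
        "parents": tuple(parents),   # identifiers of the parent revisions
        "synthetic": True,           # flag described above
        "extra_headers": (           # assumed key names
            (b"svn_repo_uuid", repo_uuid),
            (b"svn_revision", str(rev).encode()),
        ),
    }
```

Storing the svn revision number in the headers is what later allows an incremental visit to recover where the previous visit stopped.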
- check_history_not_altered(revision_start: int, swh_rev: Revision) → bool
Given a svn repository, check if the history was modified in between visits.
- start_from() → Tuple[int, int]
Determine from where to start the loading.
- Returns:
tuple (revision_start, revision_end)
- Raises:
SvnLoaderHistoryAltered – When a hash divergence has been detected (should not happen)
SvnLoaderUneventful – Nothing changed since last visit
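The decision described above can be sketched as follows. This is not the loader's actual code: the exception classes are local stand-ins for SvnLoaderHistoryAltered and SvnLoaderUneventful, the function name and parameters are hypothetical, and starting a first visit at revision 1 is an assumption.

```python
from typing import Optional, Tuple


class HistoryAltered(Exception):
    """Local stand-in for SvnLoaderHistoryAltered."""


class Uneventful(Exception):
    """Local stand-in for SvnLoaderUneventful."""


def sketch_start_from(
    last_loaded_rev: Optional[int],
    head_rev: int,
    history_altered: bool = False,
) -> Tuple[int, int]:
    """Return (revision_start, revision_end) for the next visit."""
    if last_loaded_rev is None:
        # No previous visit: load the whole history (assumed to start at 1).
        return 1, head_rev
    if history_altered:
        raise HistoryAltered("hash divergence with the previous visit")
    if last_loaded_rev >= head_rev:
        raise Uneventful("nothing changed since the last visit")
    # Resume right after the last revision that was loaded.
    return last_loaded_rev + 1, head_rev
```

For example, sketch_start_from(4, 10) resumes at revision 5, while sketch_start_from(10, 10) raises Uneventful.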
- process_svn_revisions(svnrepo, revision_start, revision_end) → Iterator[Tuple[List[Content], List[SkippedContent], List[Directory], Revision]]
Process svn revisions from revision_start to revision_end.
At each svn revision, apply the new diffs and simultaneously compute the swh hashes. The computed swh objects are yielded as a tuple (contents, skipped_contents, directories, revision), matching the return annotation.
Note that every self.check_revision revisions, a supplementary check is run to detect hash-tree divergence (related to T570).
- Yields:
tuple (contents, skipped_contents, directories, revision) of swh model objects carrying the computed hashes (sha1_git, sha1, etc.)
- Raises:
ValueError – In case of a hash divergence detection
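The periodic divergence check can be illustrated with a small generator. This is a simplified sketch, not the loader's implementation: the two callables stand in for the incremental hash computation and the full recomputation, the function name is hypothetical, and treating check_revision = 0 as "never check" is an assumption.

```python
from typing import Callable, Iterator, Tuple


def sketch_process_revisions(
    revision_start: int,
    revision_end: int,
    check_revision: int,
    compute_hash: Callable[[int], bytes],
    recompute_hash: Callable[[int], bytes],
) -> Iterator[Tuple[int, bytes]]:
    """Yield (svn revision, directory id), checking every `check_revision`."""
    revisions = range(revision_start, revision_end + 1)
    for count, rev in enumerate(revisions, start=1):
        dir_id = compute_hash(rev)  # hash obtained while applying the diff
        if check_revision and count % check_revision == 0:
            # Supplementary check: recompute from scratch and compare.
            if recompute_hash(rev) != dir_id:
                raise ValueError(
                    f"Hash tree divergence detected at svn revision {rev}"
                )
        yield rev, dir_id
```

A larger check_revision value trades detection latency for speed, since the full recomputation is the expensive step.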
- prepare()
Second step executed by the loader to prepare some state needed by the loader.
- Raises:
NotFound – If the origin to ingest is not found.
- fetch_data()
Fetch svn revision information.
This applies each svn revision as a patch on disk and, at the same time, computes the swh hashes.
In effect, fetch_data fetches the data and computes the necessary swh objects. They are then stored in the internal state instance variables (initialized in _prepare_state).
It is up to store_data to actually talk to the storage to store those objects.
- Returns:
True to continue fetching data (next svn revision), False to stop.
- Return type:
bool
- store_data()
Store the data accumulated in the internal instance variables. If the iteration over the svn revisions is done, create the snapshot and flush the data to storage.
This also resets the internal instance variable state.
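The interplay between fetch_data and store_data can be sketched with a toy loader. Everything here is hypothetical: ToyLoader only mimics the contract described above (fetch returns True while more revisions remain, store flushes and resets the buffer), and the driver loop is an assumption about how a base loader might call the two methods.

```python
from typing import List


class ToyLoader:
    """Toy stand-in that mimics the fetch/store contract described above."""

    def __init__(self, revisions: List[int]) -> None:
        self._pending = iter(revisions)
        self._buffer: List[int] = []   # internal state filled by fetch_data
        self._done = False
        self.stored: List[int] = []    # what reached the "storage"
        self.snapshot_created = False

    def fetch_data(self) -> bool:
        """Process one revision; return False once there is nothing left."""
        rev = next(self._pending, None)
        if rev is None:
            self._done = True
            return False
        self._buffer.append(rev)
        return True

    def store_data(self) -> None:
        """Flush the buffer, reset it, and snapshot when iteration is over."""
        self.stored.extend(self._buffer)
        self._buffer = []
        if self._done:
            self.snapshot_created = True


def run(loader: ToyLoader) -> None:
    """Assumed driver loop: alternate fetch and store until fetch says stop."""
    while True:
        more = loader.fetch_data()
        loader.store_data()
        if not more:
            break
```

Flushing after every fetch keeps memory bounded, and the snapshot is only written once the last revision has been stored.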
- generate_and_load_snapshot(revision: Revision | None = None, snapshot: Snapshot | None = None) → Snapshot
Create the snapshot from either an existing revision or an existing snapshot.
The revision (presumably new) takes priority over the snapshot (presumably the existing one).
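The priority rule can be written down in a few lines. This sketch uses plain dictionaries instead of the Snapshot model; the function name, the HEAD branch name, and the error raised when neither argument is given are assumptions for illustration.

```python
from typing import Any, Dict, Optional


def sketch_pick_snapshot(
    revision_id: Optional[bytes] = None,
    snapshot: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return a snapshot-like dict, preferring the revision when given."""
    if revision_id is not None:
        # A (presumably new) revision wins: point a single branch at it.
        return {
            "branches": {
                b"HEAD": {"target": revision_id, "target_type": "revision"}
            }
        }
    if snapshot is not None:
        # Otherwise fall back to the (presumably existing) snapshot.
        return snapshot
    raise ValueError("either a revision or a snapshot is required")
```

Passing both arguments therefore yields a snapshot built from the revision, and the existing snapshot is ignored.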
- load_status()
Detailed loading status.
Defaults to logging an eventful load.
- Returns:
a dictionary that is eventually passed back as the task’s result to the scheduler, allowing tuning of the task recurrence mechanism.
- post_load(success: bool = True) → None
Permit the loader to perform additional actions, depending on the status, after the loading is done. The success flag indicates the loading’s status.
Defaults to doing nothing.
It is up to the implementer of this method to make sure this does not break.
- Parameters:
success (bool) – the success status of the loading
- class swh.loader.svn.loader.SvnLoaderFromDumpArchive(storage: StorageInterface, url: str, archive_path: str, origin_url: str | None = None, incremental: bool = False, visit_date: datetime | None = None, temp_directory: str = '/tmp', debug: bool = False, check_revision: int = 0, **kwargs: Any)
Bases:
SvnLoader
Uncompress an archive containing an svn dump, mount the svn dump as a local svn repository and load that repository.
Load a svn repository (either remote or local).
- Parameters:
url – The default origin url
origin_url – Optional original url override to use as origin reference in the archive. If not provided, “url” is used as origin.
visit_date – Optional date to override the visit date
incremental – If True, starts from the last snapshot (if any). Otherwise, starts from the initial commit of the repository. For this class the default is False, per the signature above.
temp_directory – The temporary directory to use as root directory for working directory computations
debug – If true, run the loader in debug mode. At the end of the loading, the temporary working directory is not cleaned up to ease inspection. Defaults to false.
check_revision – The number of svn commits between checks for hash divergence
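The uncompress-and-mount step in the class description can be approximated with standard Subversion tooling. The sketch below only builds the shell commands and does not run them. It assumes a gzip-compressed dump and uses svnadmin create and svnadmin load; the function name and paths are hypothetical, and the loader's real implementation may differ.

```python
import shlex
from typing import List


def sketch_mount_commands(archive_path: str, repo_path: str) -> List[str]:
    """Return shell commands that would mount a gzipped svn dump locally."""
    archive = shlex.quote(archive_path)
    repo = shlex.quote(repo_path)
    return [
        # Create an empty local repository to receive the dump.
        f"svnadmin create {repo}",
        # Stream the decompressed dump into that repository.
        f"gzip -dc {archive} | svnadmin load -q {repo}",
    ]
```

The resulting local repository can then be addressed with a file:// URL and loaded like any other svn repository.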
- class swh.loader.svn.loader.SvnLoaderFromRemoteDump(storage: StorageInterface, url: str, origin_url: str | None = None, incremental: bool = True, visit_date: datetime | None = None, temp_directory: str = '/tmp', debug: bool = False, check_revision: int = 0, **kwargs: Any)
Bases:
SvnLoader
Create a subversion repository dump out of a remote svn repository (using the svnrdump utility). Then, mount the repository locally and load that repository.
Load a svn repository (either remote or local).
- Parameters:
url – The default origin url
origin_url – Optional original url override to use as origin reference in the archive. If not provided, “url” is used as origin.
visit_date – Optional date to override the visit date
incremental – If True, the default, starts from the last snapshot (if any). Otherwise, starts from the initial commit of the repository.
temp_directory – The temporary directory to use as root directory for working directory computations
debug – If true, run the loader in debug mode. At the end of the loading, the temporary working directory is not cleaned up to ease inspection. Defaults to false.
check_revision – The number of svn commits between checks for hash divergence
- get_last_loaded_svn_rev(svn_url: str) → int
Check if the svn repository has already been visited and return the last loaded svn revision number or -1 otherwise.
- dump_svn_revisions(svn_url: str, last_loaded_svn_rev: int = -1) → Tuple[str, int]
Generate a compressed subversion dump file using the svnrdump tool and gzip. If the svnrdump command fails, the produced dump file is analyzed to determine whether a partial loading is still feasible.
- Raises:
NotFound – When the repository is no longer found at url
- Returns:
The dump_path of the repository mounted and the max dumped revision number (-1 if all revisions were dumped)
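A dump pipeline of the kind described above might look like the following. The sketch only assembles the shell command and does not execute it. The function name and default dump path are hypothetical, and resuming at the revision after last_loaded_svn_rev is an assumption about how an incremental dump could be bounded.

```python
import shlex


def sketch_dump_command(
    svn_url: str,
    last_loaded_svn_rev: int = -1,
    dump_path: str = "/tmp/repo.svndump.gz",
) -> str:
    """Build an `svnrdump dump | gzip` pipeline as a shell string."""
    parts = ["svnrdump", "dump", "-q"]
    if last_loaded_svn_rev >= 0:
        # Assumed policy: only dump revisions newer than the last loaded one.
        parts += ["-r", f"{last_loaded_svn_rev + 1}:HEAD"]
    parts.append(svn_url)
    dump = " ".join(shlex.quote(p) for p in parts)
    return f"{dump} | gzip > {shlex.quote(dump_path)}"
```

With the default of -1 no revision range is passed, so the whole history is dumped, which mirrors the sentinel used by get_last_loaded_svn_rev.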