swh.loader.cvs.loader module
Loader in charge of injecting either new or existing CVS repositories into swh-storage.
- class swh.loader.cvs.loader.CvsLoader(storage: StorageInterface, url: str, origin_url: str | None = None, visit_date: datetime | None = None, cvsroot_path: str | None = None, temp_directory: str = '/tmp', **kwargs: Any)
Bases: BaseLoader
SWH CVS loader.
The repository is local. The loader also handles updates to a previously loaded repository.
- compute_swh_revision(k: ChangeSetKey, logmsg: bytes | None) -> Tuple[Revision, Directory]
Compute swh hash data per CVS changeset.
- Returns:
tuple (rev, swh_directory):
  - rev: current SWH revision computed from the checked-out work tree
  - swh_directory: dictionary of path, swh hash data with type
- checkout_file_with_rcsparse(k: ChangeSetKey, f: FileRevision, rcsfile: rcsfile) -> None
- checkout_file_with_cvsclient(k: ChangeSetKey, f: FileRevision, cvsclient: CVSClient)
- process_cvs_changesets(cvs_changesets: List[ChangeSetKey], use_rcsparse: bool) -> Iterator[Tuple[List[Content], List[SkippedContent], List[Directory], Revision]]
Process CVS revisions.
At each CVS revision, check out contents and compute swh hashes.
- Yields:
tuple (contents, skipped-contents, directories, revision), each described as a dictionary with keys such as sha1_git, sha1, etc…
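To illustrate the kind of hash keys mentioned above, here is a minimal sketch using only the standard library. It assumes that `sha1_git` follows the Git blob convention (SHA-1 over a `blob <size>\0` header followed by the data). The helper name `content_hashes` and the returned dictionary layout are hypothetical and are not part of the loader's API; the loader itself produces `Content` objects rather than plain dictionaries.

```python
import hashlib


def content_hashes(data: bytes) -> dict:
    """Return illustrative sha1 and sha1_git digests for a blob of bytes.

    Hypothetical helper: it only mimics the hash keys named in the
    documentation and is not the loader's own hashing code.
    """
    # Git-style blob hashing prepends a "blob <size>\0" header to the data.
    header = b"blob %d\x00" % len(data)
    return {
        "length": len(data),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha1_git": hashlib.sha1(header + data).hexdigest(),
    }


if __name__ == "__main__":
    print(content_hashes(b"hello\n"))
```

The two digests differ for the same bytes because only `sha1_git` includes the header.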
- pre_cleanup() -> None
Clean up potential dangling files from prior runs (e.g. OOM-killed tasks).
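The general idea of such a cleanup can be sketched as follows. This is an illustration only, not the loader's implementation: the function name `cleanup_dangling_dirs` and the `prefix` argument are assumptions, and the sketch does not check whether another running process still owns a directory, which a production cleanup would need to do.

```python
import shutil
from pathlib import Path
from typing import List


def cleanup_dangling_dirs(temp_directory: str, prefix: str) -> List[str]:
    """Remove sub-directories of temp_directory whose name starts with prefix.

    Hypothetical helper: returns the sorted names of the removed directories.
    """
    removed = []
    for entry in Path(temp_directory).iterdir():
        # Only directories matching the (assumed) naming prefix are touched.
        if entry.is_dir() and entry.name.startswith(prefix):
            shutil.rmtree(entry, ignore_errors=True)
            removed.append(entry.name)
    return sorted(removed)
```

Matching on a dedicated prefix keeps the cleanup from touching unrelated content in a shared directory such as `/tmp`.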
- configure_custom_id_keyword(cvsconfig: TextIO)
Parse CVSROOT/config and look for a custom keyword definition. There are two different configuration directives in use for this purpose.
The first variant stems from a patch which was never accepted into upstream CVS and uses the tag directive:
    tag=MyName
With this, the “MyName” keyword becomes an alias for the “Id” keyword. This variant is prevalent in CVS versions shipped on BSD.
The second variant stems from upstream CVS 1.12 and looks like:
    LocalKeyword=MyName=SomeKeyword
    KeywordExpand=iMyName
We only support “SomeKeyword” if it specifies “Id” or “CVSHeader”, for now. The KeywordExpand directive can be used to suppress expansion of keywords by listing keywords after an initial “e” character (“exclude”, as opposed to an “include” list which uses an initial “i” character). For example, this disables expansion of the Date and Name keywords:
    KeywordExpand=eDate,Name
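The two directive variants described above can be parsed along the following lines. This is a minimal sketch, not the loader's actual code: the function name `parse_keyword_config` and the shape of the returned dictionary are hypothetical, and treating lines that start with `#` as comments is an assumption about the config file format.

```python
import io
from typing import Dict, TextIO


def parse_keyword_config(cvsconfig: TextIO) -> Dict[str, object]:
    """Collect custom-keyword directives from a CVSROOT/config-like stream.

    Hypothetical helper illustrating the directives described above.
    """
    result: Dict[str, object] = {
        "custom_id_keyword": None,
        "included": [],
        "excluded": [],
    }
    for raw in cvsconfig:
        line = raw.strip()
        # Skip blank lines, comments (assumed "#"), and non-directives.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        name, value = name.strip(), value.strip()
        if name == "tag":
            # BSD variant: tag=MyName makes MyName an alias for Id.
            result["custom_id_keyword"] = value
        elif name == "LocalKeyword":
            # Upstream variant: LocalKeyword=MyName=SomeKeyword.
            alias, _, target = value.partition("=")
            if target in ("Id", "CVSHeader"):
                result["custom_id_keyword"] = alias
        elif name == "KeywordExpand" and value:
            # First character selects include ("i") or exclude ("e") mode.
            keywords = [k for k in value[1:].split(",") if k]
            if value[0] == "i":
                result["included"] = keywords
            elif value[0] == "e":
                result["excluded"] = keywords
    return result


if __name__ == "__main__":
    sample = io.StringIO("LocalKeyword=MyName=Id\nKeywordExpand=eDate,Name\n")
    print(parse_keyword_config(sample))
```

In this sketch, a `LocalKeyword` alias whose target is anything other than “Id” or “CVSHeader” is ignored, mirroring the restriction stated above.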
- execute_rsync(rsync_cmd: List[str], **run_opts) -> CompletedProcess
- prepare() -> None
Second step executed by the loader to prepare some state needed by the loader.
- Raises:
NotFound exception if the origin to ingest is not found.
- build_swh_revision(k: ChangeSetKey, logmsg: bytes | None, dir_id: bytes, parents: Sequence[bytes]) -> Revision
Given a CVS revision, build a swh revision.
- Parameters:
k – changeset data
logmsg – the changeset’s log message
dir_id – the tree’s hash identifier
parents – the identifiers of the revision’s parents
- Returns:
The swh revision dictionary.
- generate_and_load_snapshot(revision: Revision | None = None) -> Snapshot
Create the snapshot from the existing revision, if one is given.
- Parameters:
revision (dict) – Last revision seen if any (None by default)
- Returns:
Optional[Snapshot] The newly created snapshot