swh.loader.cvs.loader module#

Loader in charge of injecting either new or existing CVS repositories into swh-storage.

swh.loader.cvs.loader.rsync_retry()[source]#
exception swh.loader.cvs.loader.BadPathException[source]#

Bases: Exception

class swh.loader.cvs.loader.CvsLoader(storage: StorageInterface, url: str, origin_url: str | None = None, visit_date: datetime | None = None, cvsroot_path: str | None = None, temp_directory: str = '/tmp', **kwargs: Any)[source]#

Bases: BaseLoader

Swh cvs loader.

The repository is local. The loader deals with updates to a previously loaded repository.

visit_type: str = 'cvs'#
cvs_module_name: str#
cvsclient: CVSClient#
rlog_file: BinaryIO#
swh_revision_gen: Iterator[Tuple[List[Content], List[SkippedContent], List[Directory], Revision]]#
compute_swh_revision(k: ChangeSetKey, logmsg: bytes | None) Tuple[Revision, Directory][source]#

Compute swh hash data per CVS changeset.

Returns:

tuple (rev, swh_directory):
  • rev – current SWH revision computed from the checked-out work tree

  • swh_directory – dictionary of path, swh hash data with type

file_path_is_safe(wtpath: bytes)[source]#
add_content(path: bytes, wtpath: bytes)[source]#
checkout_file_with_rcsparse(k: ChangeSetKey, f: FileRevision, rcsfile: rcsfile) None[source]#
checkout_file_with_cvsclient(k: ChangeSetKey, f: FileRevision, cvsclient: CVSClient)[source]#
process_cvs_changesets(cvs_changesets: List[ChangeSetKey], use_rcsparse: bool) Iterator[Tuple[List[Content], List[SkippedContent], List[Directory], Revision]][source]#

Process CVS revisions.

At each CVS revision, check out contents and compute swh hashes.

Yields:

tuple (contents, skipped-contents, directories, revision) of dictionaries with keys such as sha1_git, sha1, etc…

pre_cleanup() None[source]#

Cleanup potential dangling files from prior runs (e.g. OOM killed tasks)

cleanup() None[source]#

Last step executed by the loader.

configure_custom_id_keyword(cvsconfig: TextIO)[source]#

Parse CVSROOT/config and look for a custom keyword definition. There are two different configuration directives in use for this purpose.

The first variant stems from a patch which was never accepted into upstream CVS and uses the tag directive:

    tag=MyName

With this, the “MyName” keyword becomes an alias for the “Id” keyword. This variant is prevalent in CVS versions shipped on BSD.

The second variant stems from upstream CVS 1.12 and looks like:

    LocalKeyword=MyName=SomeKeyword
    KeywordExpand=iMyName

We only support “SomeKeyword” if it specifies “Id” or “CVSHeader”, for now. The KeywordExpand directive can be used to suppress expansion of keywords by listing keywords after an initial “e” character (“exclude”, as opposed to an “include” list which uses an initial “i” character). For example, this disables expansion of the Date and Name keywords:

    KeywordExpand=eDate,Name
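The two directive variants described above can be illustrated with a minimal parsing sketch. This is not the loader's actual implementation: the helper name `parse_keyword_config`, its return shape, and the choice to ignore "include" lists are assumptions made purely for illustration.

```python
import io
from typing import Optional, Set, TextIO, Tuple


def parse_keyword_config(cvsconfig: TextIO) -> Tuple[Optional[str], Set[str]]:
    """Return (custom_id_keyword, excluded_keywords) from a CVSROOT/config file.

    Hypothetical helper for illustration only; not part of swh.loader.cvs.
    """
    custom_keyword: Optional[str] = None
    excluded: Set[str] = set()
    for raw in cvsconfig:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a "directive=value" line
        name, value = name.strip(), value.strip()
        if name == "tag":
            # First variant: tag=MyName makes MyName an alias for Id.
            custom_keyword = value
        elif name == "LocalKeyword":
            # Second variant: LocalKeyword=MyName=SomeKeyword
            alias, _, target = value.partition("=")
            if target.strip() in ("Id", "CVSHeader"):
                custom_keyword = alias.strip()
        elif name == "KeywordExpand" and value.startswith("e"):
            # Exclude list, e.g. KeywordExpand=eDate,Name
            excluded.update(k.strip() for k in value[1:].split(",") if k.strip())
        # "Include" lists (KeywordExpand=i...) are ignored in this sketch.
    return custom_keyword, excluded


config = io.StringIO("LocalKeyword=MyName=Id\nKeywordExpand=eDate,Name\n")
print(parse_keyword_config(config))
```

Returning the alias together with the excluded set keeps the sketch side-effect free; the real method receives the same kind of text stream (`cvsconfig: TextIO`) but may record its findings differently.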

execute_rsync(rsync_cmd: List[str], **run_opts) CompletedProcess[source]#
fetch_cvs_repo_with_rsync(host: str, path: str) None[source]#
prepare() None[source]#
Second step executed by the loader to prepare some state needed by the loader.

Raises:

NotFound exception if the origin to ingest is not found.

fetch_data() bool[source]#

Fetch the next CVS revision.

build_swh_revision(k: ChangeSetKey, logmsg: bytes | None, dir_id: bytes, parents: Sequence[bytes]) Revision[source]#

Given a CVS revision, build a swh revision.

Parameters:
  • k – changeset data

  • logmsg – the changeset’s log message

  • dir_id – the tree’s hash identifier

  • parents – the identifiers of the revision’s parents

Returns:

The swh revision dictionary.

generate_and_load_snapshot(revision: Revision | None = None) Snapshot[source]#

Create the snapshot from the existing revision, if any.

Parameters:

revision (dict) – Last revision seen if any (None by default)

Returns:

Optional[Snapshot] The newly created snapshot

store_data() None[source]#

Add our current CVS changeset to the archive.
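The interplay between fetch_data() and store_data() can be sketched with a toy class. Everything here is hypothetical: `ToyLoader`, `run`, and the assumption that the boolean returned by fetch_data() signals whether more data remains are illustrative only and do not reproduce the real CvsLoader or BaseLoader code.

```python
from typing import Iterator, List, Optional


class ToyLoader:
    """Hypothetical stand-in that only shows the fetch/store rhythm."""

    def __init__(self, changesets: List[str]) -> None:
        self._gen: Iterator[str] = iter(changesets)
        self._current: Optional[str] = None
        self.archive: List[str] = []

    def fetch_data(self) -> bool:
        """Pull the next changeset; return True if one was fetched."""
        self._current = next(self._gen, None)
        return self._current is not None

    def store_data(self) -> None:
        """Add the current changeset, if any, to the (toy) archive."""
        if self._current is not None:
            self.archive.append(self._current)
            self._current = None


def run(loader: ToyLoader) -> None:
    # Assumed driver loop: alternate fetch and store until nothing is left.
    while True:
        more = loader.fetch_data()
        loader.store_data()
        if not more:
            break


loader = ToyLoader(["changeset-1", "changeset-2"])
run(loader)
print(loader.archive)
```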

load_status() Dict[str, Any][source]#

Detailed loading status.

Defaults to logging an eventful load.

Returns:

a dictionary that is eventually passed back as the task’s result to the scheduler, allowing tuning of the task recurrence mechanism.

visit_status() str[source]#

Detailed visit status.

Defaults to logging a full visit.