swh.loader.svn package

Submodules

swh.loader.svn.converters module

swh.loader.svn.converters.svn_date_to_gitsvn_date(strdate)[source]

Convert a string date to an swh one.

Parameters:
  • strdate – A string in the format expected by .utils.strdate_to_timestamp
Returns:

An swh date format with an integer timestamp.

swh.loader.svn.converters.svn_date_to_swh_date(strdate)[source]

Convert a string date to an swh one.

Parameters:
  • strdate – A string in the format expected by .utils.strdate_to_timestamp
Returns:

An swh date format

swh.loader.svn.converters.svn_author_to_swh_person(author)[source]

Convert an svn author to an swh person. Default policy: No information is added.

Parameters: author (string) – the svn author (in bytes)
Returns: a dictionary with keys:
  • fullname: the author’s associated fullname
  • name: the author’s associated name
  • email: None (no email in svn)
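
The default policy described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the function name author_to_person is hypothetical, and reusing the raw author value for both fullname and name is an assumption.

```python
from typing import Dict, Optional


def author_to_person(author: Optional[bytes]) -> Dict[str, Optional[bytes]]:
    """Hypothetical stand-in: map an svn author to a person dictionary."""
    # ASSUMPTION: fullname and name both carry the raw svn author value.
    # No information is added: svn stores no email, so that key stays None.
    return {"fullname": author, "name": author, "email": None}


person = author_to_person(b"jdoe")
```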
swh.loader.svn.converters.svn_author_to_gitsvn_person(author, repo_uuid)[source]

Convert an svn author to a person suitable for insertion.

Default policy: If no email is found, the email is created using the author and the repo_uuid.

Parameters:
  • author (string) – the svn author (in bytes)
  • repo_uuid (bytes) – the repository’s uuid
Returns: a dictionary with keys:
  • fullname: the author’s associated fullname
  • name: the author’s associated name
  • email: None (no email in svn)
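
One way to picture the default policy (create an email from the author and the repo_uuid when none is found) is the sketch below. The function name, the regular expression, and the "author@repo_uuid" address layout are assumptions modeled on git-svn conventions; they are not taken from the package.

```python
import re
from typing import Dict

# Matches an "<user@host>" fragment inside an author string.
_EMAIL_RE = re.compile(rb"<([^<>]+@[^<>]+)>")


def author_to_gitsvn_person(author: bytes, repo_uuid: bytes) -> Dict[str, bytes]:
    """Hypothetical sketch: build a person, synthesizing an email if needed."""
    match = _EMAIL_RE.search(author)
    if match:
        # An email is present: keep it and use the text before it as the name.
        email = match.group(1)
        name = author[: match.start()].strip() or author
        fullname = author
    else:
        # ASSUMPTION: git-svn style address "<author>@<repo_uuid>".
        email = author + b"@" + repo_uuid
        name = author
        fullname = name + b" <" + email + b">"
    return {"fullname": fullname, "name": name, "email": email}


synthesized = author_to_gitsvn_person(b"jdoe", b"1234-abcd")
```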
swh.loader.svn.converters.build_swh_revision(rev, commit, repo_uuid, dir_id, parents)[source]

Given a svn revision, build a swh revision.

This adds a [‘metadata’][‘extra-headers’] entry with the repository’s uuid and the svn revision.

Parameters:
  • rev – the svn revision number
  • commit – the commit metadata
  • repo_uuid – the repository’s uuid
  • dir_id – the tree’s hash identifier
  • parents – the revision’s parents’ identifiers
Returns:

The swh revision dictionary.
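
As a rough illustration of the shape such a dictionary could take, here is a sketch under stated assumptions. The function name and the header names svn_repo_uuid and svn_revision are hypothetical; the commit keys (author_name, author_date, message) follow the log entry keys documented for SvnRepo.logs. The real revision dictionary may contain more fields.

```python
from typing import Any, Dict, List


def build_revision_sketch(
    rev: int,
    commit: Dict[str, Any],
    repo_uuid: bytes,
    dir_id: bytes,
    parents: List[bytes],
) -> Dict[str, Any]:
    """Hypothetical sketch of a revision dictionary built from svn data."""
    return {
        "message": commit["message"],
        "author": commit["author_name"],
        "date": commit["author_date"],
        "directory": dir_id,
        "parents": parents,
        "metadata": {
            # ASSUMPTION: header names are illustrative only.
            "extra-headers": {
                "svn_repo_uuid": repo_uuid,
                "svn_revision": str(rev).encode("utf-8"),
            }
        },
    }


example_commit = {"message": b"initial import", "author_name": b"jdoe", "author_date": None}
revision = build_revision_sketch(3, example_commit, b"1234-abcd", b"\x01" * 20, [])
```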

swh.loader.svn.converters.build_gitsvn_swh_revision(rev, commit, dir_id, parents)[source]

Given a svn revision, build a swh revision.

Parameters:
  • rev – the svn revision number
  • commit – the commit metadata
  • dir_id – the tree’s hash identifier
  • parents – the revision’s parents’ identifiers
Returns:

The swh revision dictionary.

swh.loader.svn.exception module

exception swh.loader.svn.exception.SvnLoaderEventful(e, swh_revision)[source]

Bases: ValueError

Loading happened with some events. This transmits the latest revision seen.

__init__(e, swh_revision)[source]

Initialize self. See help(type(self)) for accurate signature.

exception swh.loader.svn.exception.SvnLoaderUneventful[source]

Bases: ValueError

Loading did nothing.

exception swh.loader.svn.exception.SvnLoaderHistoryAltered[source]

Bases: ValueError

History alteration detected.
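
A caller might tell the three exceptions of this module apart along the following lines. The classes here are local stand-ins that mirror the documented signatures, and both classify_outcome and the returned status strings are hypothetical.

```python
from typing import Any, Callable


class SvnLoaderEventful(ValueError):
    """Stand-in: loading produced events; carries the latest revision seen."""

    def __init__(self, e: Any, swh_revision: Any) -> None:
        super().__init__(e)
        self.swh_revision = swh_revision


class SvnLoaderUneventful(ValueError):
    """Stand-in: loading did nothing."""


class SvnLoaderHistoryAltered(ValueError):
    """Stand-in: the repository history was altered."""


def classify_outcome(load: Callable[[], None]) -> Any:
    """Run a loading callable and map its outcome to an illustrative status."""
    try:
        load()
    except SvnLoaderUneventful:
        return "uneventful"
    except SvnLoaderHistoryAltered:
        return "history-altered"
    except SvnLoaderEventful as exc:
        # The latest revision seen travels with the exception.
        return ("interrupted", exc.swh_revision)
    return "done"
```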

swh.loader.svn.loader module

Loader in charge of injecting either new or existing svn mirrors to swh-storage.

swh.loader.svn.loader._revision_id(revision)[source]
swh.loader.svn.loader.build_swh_snapshot(revision_id, branch=b'HEAD')[source]

Build a swh snapshot from the revision id, origin url, and visit.
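
For orientation, a snapshot with a single branch pointing at a revision could look like the sketch below. The dictionary layout (branches, target, target_type) is an assumption about the snapshot format, and build_snapshot_sketch is a hypothetical name.

```python
from typing import Any, Dict


def build_snapshot_sketch(revision_id: bytes, branch: bytes = b"HEAD") -> Dict[str, Any]:
    """Hypothetical sketch: one branch targeting the given revision."""
    return {
        "branches": {
            # ASSUMPTION: key names are illustrative of the snapshot layout.
            branch: {"target": revision_id, "target_type": "revision"},
        }
    }


snapshot = build_snapshot_sketch(b"\x0a" * 20)
```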

class swh.loader.svn.loader.SvnLoader[source]

Bases: swh.loader.core.loader.BufferedLoader

Swh svn loader.

The repository is either remote or local. The loader also handles updates to a previously loaded repository.

CONFIG_BASE_FILENAME = 'loader/svn'
ADDITIONAL_CONFIG = {'check_revision': ('dict', {'status': False, 'limit': 1000}), 'debug': ('bool', False), 'temp_directory': ('str', '/tmp')}
visit_type = 'svn'
__init__()[source]

Initialize self. See help(type(self)) for accurate signature.

pre_cleanup()[source]

Cleanup potential dangling files from prior runs (e.g. OOM killed tasks)

cleanup()[source]

Clean up the svn repository’s working representation on disk.

swh_revision_hash_tree_at_svn_revision(revision)[source]

Compute and return the hash tree at a given svn revision.

Parameters: revision (int) – the svn revision we want to check
Returns: The hash tree directory as bytes.
get_svn_repo(svn_url, local_dirname, origin_url)[source]

Instantiate the svnrepo collaborator needed to read the svn repository.

Parameters:
  • svn_url (str) – the svn repository url to read from
  • local_dirname (str) – the local path on disk to compute data
  • origin_url (str) – the corresponding origin url
Returns:

Instance of swh.loader.svn.svn clients

swh_latest_snapshot_revision(origin_url, previous_swh_revision=None)[source]

Look for the latest snapshot revision and return it, if any.

Parameters:
  • origin_url (str) – Origin identifier
  • previous_swh_revision – (optional) id of a possible previous swh revision
Returns:

The latest known point in time, as a dict with keys:

  • ‘revision’: latest visited revision
  • ‘snapshot’: latest snapshot

If none is found, return an empty dict.

Return type:

dict

build_swh_revision(rev, commit, dir_id, parents)[source]

Build the swh revision dictionary.

This adds:

  • the ‘synthetic’ flag to true
  • the ‘extra_headers’ containing the repository’s uuid and the svn revision number.
Parameters:
  • rev (dict) – the svn revision
  • commit (dict) – the commit metadata
  • dir_id (bytes) – the upper tree’s hash identifier
  • parents ([bytes]) – the parents’ identifiers
Returns:

The swh revision corresponding to the svn revision.

check_history_not_altered(svnrepo, revision_start, swh_rev)[source]

Given a svn repository, check if the history was not tampered with.

_init_from(partial_swh_revision, previous_swh_revision)[source]

Determine where to start from.

Parameters:
  • partial_swh_revision (dict) – A known revision from a previous loading that did not finish.
  • previous_swh_revision (dict) – A known revision from a previous loading that did finish.
Returns:

The revision from which to start, or None for a fresh start.

start_from(last_known_swh_revision=None, start_from_scratch=False)[source]

Determine from where to start the loading.

Parameters:
  • last_known_swh_revision (dict) – Last known swh revision or None
  • start_from_scratch (bool) – To start loading from scratch or not
Returns:

tuple (revision_start, revision_end, revision_parents)

Raises:
  • SvnLoaderHistoryAltered – When a hash divergence has been detected (should not happen)
  • SvnLoaderUneventful – Nothing changed since last visit
_check_revision_divergence(count, rev, dir_id)[source]

Check for hash revision computation divergence.

The rationale is that svn can trigger unknown edge cases (mixed CRLF, svn properties, etc.) that are not always easy to spot. Adding a check helps in spotting missed edge cases.

Parameters:
  • count (int) – The number of revisions done so far
  • rev (dict) – The actual revision we are computing from
  • dir_id (bytes) – The actual directory for the given revision
Returns:

False if no hash divergence detected

Raises: ValueError if a hash divergence is detected
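
A periodic check of this kind might be sketched as follows. The 'status' and 'limit' keys come from the check_revision entry of ADDITIONAL_CONFIG; checking when count is a multiple of limit, the expected_dir_id argument, and the function name are assumptions made for illustration.

```python
from typing import Dict, Union


def check_revision_divergence_sketch(
    count: int,
    rev: int,
    dir_id: bytes,
    expected_dir_id: bytes,
    check_revision: Dict[str, Union[bool, int]],
) -> bool:
    """Hypothetical sketch: compare hashes every `limit` revisions."""
    if not check_revision["status"]:
        return False  # checks are disabled
    # ASSUMPTION: a check happens when count is a multiple of limit.
    if count % check_revision["limit"] != 0:
        return False  # not a checkpoint revision
    if dir_id != expected_dir_id:
        raise ValueError("Hash tree divergence detected at revision %s" % rev)
    return False  # checked, no divergence


config = {"status": True, "limit": 1000}
```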
process_svn_revisions(svnrepo, revision_start, revision_end, revision_parents)[source]

Process svn revisions from revision_start to revision_end.

At each svn revision, apply new diffs and simultaneously compute swh hashes. This yields those computed swh hashes as a tuple (contents, directories, revision).

Note that at every self.check_revision, a supplementary check takes place to detect hash-tree divergence (related to T570).

Yields: tuple (contents, directories, revision) of dictionaries with keys sha1_git, sha1, etc.
Raises: ValueError if a hash divergence is detected
prepare_origin_visit(*, svn_url, visit_date=None, origin_url=None, **kwargs)[source]

First step executed by the loader to prepare origin and visit references. Set/update self.origin, and optionally self.origin_url, self.visit_date.

prepare(*, svn_url, destination_path=None, swh_revision=None, start_from_scratch=False, **kwargs)[source]

Second step executed by the loader to prepare some state needed by the loader.

fetch_data()[source]

Fetching svn revision information.

This applies each svn revision as a patch on disk and, at the same time, computes the swh hashes.

In effect, fetch_data fetches the data and computes the necessary swh objects. These are then stored in the internal state instance variables (initialized in _prepare_state).

It is up to store_data to actually talk to the storage to store those objects.

Returns:True to continue fetching data (next svn revision), False to stop.
Return type:bool
store_data()[source]

Store the data accumulated in the internal instance variables. If the iteration over the svn revisions is done, create the snapshot and flush the data to storage.

This also resets the internal instance variable state.
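
The division of labor between fetch_data and store_data can be illustrated with a toy loader. Everything here is a stand-in: ToyLoader, the run driver loop, and the use of the last stored revision as the "snapshot" are assumptions, not the behavior of the base loader class.

```python
from typing import Iterable, List, Optional


class ToyLoader:
    """Toy stand-in showing the fetch/store contract described above."""

    def __init__(self, revisions: Iterable[int]) -> None:
        self._pending = iter(revisions)
        self._buffer: List[int] = []   # internal state filled by fetch_data
        self.stored: List[int] = []    # stands in for the storage backend
        self.snapshot: Optional[int] = None
        self._done = False

    def fetch_data(self) -> bool:
        """Fetch one revision; return True while more remain."""
        rev = next(self._pending, None)
        if rev is None:
            self._done = True
            return False
        self._buffer.append(rev)
        return True

    def store_data(self) -> None:
        """Flush buffered data, reset state, snapshot once iteration is over."""
        self.stored.extend(self._buffer)
        self._buffer = []
        if self._done and self.stored:
            self.snapshot = self.stored[-1]


def run(loader: ToyLoader) -> None:
    # ASSUMPTION: the driver alternates fetch and store until fetch says stop.
    while True:
        more = loader.fetch_data()
        loader.store_data()
        if not more:
            break


toy = ToyLoader([1, 2, 3])
run(toy)
```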

generate_and_load_snapshot(revision=None, snapshot=None)[source]

Create the snapshot from either an existing revision or an existing snapshot.

The revision (presumably new) takes priority over the snapshot (presumably an existing one).

Parameters:
  • revision (dict) – Last revision seen if any (None by default)
  • snapshot (dict) – Snapshot to use if any (None by default)
load_status()[source]

Detailed loading status.

Defaults to logging an eventful load.

Returns: a dictionary that is eventually passed back as the task’s result to the scheduler, allowing tuning of the task recurrence mechanism.
visit_status()[source]

Detailed visit status.

Defaults to logging a full visit.

class swh.loader.svn.loader.SvnLoaderFromDumpArchive(archive_path)[source]

Bases: swh.loader.svn.loader.SvnLoader

Uncompress an archive containing an svn dump, mount the svn dump as an svn repository and load said repository.

__init__(archive_path)[source]

Initialize self. See help(type(self)) for accurate signature.

prepare(*, svn_url, destination_path=None, swh_revision=None, start_from_scratch=False, **kwargs)[source]

Second step executed by the loader to prepare some state needed by the loader.

cleanup()[source]

Clean up the svn repository’s working representation on disk.

class swh.loader.svn.loader.SvnLoaderFromRemoteDump[source]

Bases: swh.loader.svn.loader.SvnLoader

Create a subversion repository dump using the svnrdump utility, mount it locally and load the repository from it.

__init__()[source]

Initialize self. See help(type(self)) for accurate signature.

get_last_loaded_svn_rev(svn_url)[source]

Check if the svn repository has already been visited and return the last loaded svn revision number or -1 otherwise.

dump_svn_revisions(svn_url, last_loaded_svn_rev=-1)[source]

Generate a subversion dump file using the svnrdump tool. If the svnrdump command failed somehow, the produced dump file is analyzed to determine if a partial loading is still feasible.

prepare(*, svn_url, destination_path=None, swh_revision=None, start_from_scratch=False, **kwargs)[source]

Second step executed by the loader to prepare some state needed by the loader.

cleanup()[source]

Clean up the svn repository’s working representation on disk.

visit_status()[source]

Detailed visit status.

Defaults to logging a full visit.

swh.loader.svn.ra module

Remote Access client to svn server.

swh.loader.svn.ra._normalize_line_endings(lines, eol_style='native')[source]

Normalize line endings to unix (\n), windows (\r\n) or mac (\r).

Parameters:
  • lines (bytes) – The lines to normalize
  • eol_style (str) – The line ending format as defined for the svn:eol-style property. Acceptable values are ‘native’, ‘CRLF’, ‘LF’ and ‘CR’
Returns: bytes: lines with endings normalized
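
The normalization can be sketched in two passes: collapse every ending to LF, then expand to the requested style. This is an illustrative version, not the module's code; treating 'native' as LF is an assumption, and the names below are hypothetical.

```python
# ASSUMPTION: 'native' is treated as unix line endings.
_EOL_STYLES = {"native": b"\n", "LF": b"\n", "CRLF": b"\r\n", "CR": b"\r"}


def normalize_line_endings_sketch(lines: bytes, eol_style: str = "native") -> bytes:
    """Hypothetical sketch of svn:eol-style normalization on raw bytes."""
    # Collapse to LF first; CRLF goes first so it is not read as CR then LF.
    lines = lines.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    target = _EOL_STYLES[eol_style]
    if target != b"\n":
        lines = lines.replace(b"\n", target)
    return lines


mixed = b"a\r\nb\rc\n"
```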
swh.loader.svn.ra.apply_txdelta_handler(sbuf, target_stream)[source]

Return a function that can be called repeatedly with txdelta windows. When done, closes the target_stream.

Adapted from subvertpy.delta.apply_txdelta_handler to close the stream when done.

Parameters:
  • sbuf – Source buffer
  • target_stream – Target stream to write to.
Returns:

Function to be called to apply txdelta windows

Read the svn link’s content.

Parameters:data (bytes) – svn link’s raw content
Returns:The tuple of (filetype, destination path)
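
Assuming the raw content has the form b'type <path-to-src>' (a type word, one space, then the path), the parsing described above can be sketched as a single split. The function name is hypothetical.

```python
from typing import Tuple


def read_svn_link_sketch(data: bytes) -> Tuple[bytes, bytes]:
    """Hypothetical sketch: split svn link content into (filetype, path)."""
    # Split on the first space only, so paths containing spaces stay intact.
    filetype, destination = data.split(b" ", 1)
    return filetype, destination


parsed = read_svn_link_sketch(b"link ../target file")
```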

Determine if a filepath is an svnlink or something else.

Parameters: fullpath (str/bytes) – Full path to the potential symlink to check
Returns: True if the path is indeed a symlink (as per svn), False otherwise.
swh.loader.svn.ra._ra_codecs_error_handler(e)[source]

Subvertpy may fail to decode user svn properties to utf-8. As they are not used by the loader, return an empty string instead of the decoded content.

Parameters: e (UnicodeDecodeError) – exception raised during the svn properties decoding.
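
Python lets a custom handler be registered for decoding errors through codecs.register_error. The sketch below substitutes an empty string for each undecodable byte sequence and resumes after it. The handler name and the registered error name are hypothetical, and how the module registers its own handler is not shown here.

```python
import codecs
from typing import Tuple


def skip_undecodable(e: UnicodeError) -> Tuple[str, int]:
    """Hypothetical handler: replace undecodable bytes with an empty string."""
    if not isinstance(e, UnicodeDecodeError):
        raise e
    # Return the replacement text and the position where decoding resumes.
    return "", e.end


# ASSUMPTION: the error name below is illustrative only.
codecs.register_error("sketch_skip_undecodable", skip_undecodable)

decoded = b"abc\xffdef".decode("utf-8", errors="sketch_skip_undecodable")
```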
class swh.loader.svn.ra.FileEditor(directory, rootpath, path)[source]

Bases: object

File Editor in charge of updating file on disk and memory objects.

__slots__ = ['directory', 'path', 'fullpath', 'executable', 'link']
__init__(directory, rootpath, path)[source]

Initialize self. See help(type(self)) for accurate signature.

directory
path
executable
fullpath
change_prop(key, value)[source]
apply_textdelta(base_checksum)[source]
close()[source]

When done with the file, this is called.

Since the file now exists and is up to date, we can:

  • adapt its execution flag accordingly, if any
  • compute the objects’ checksums
  • replace the svnlink with a real symlink (for disk computation purposes)

Convert the symlink to a svnlink on disk.

Returns:The symlink’s svnlink data (b'type <path-to-src>')

Convert the svnlink to a symlink on disk.

This function expects self.fullpath to be a svn link.

Parameters:src (bytes) – Path to the link’s source
Returns:The svnlink’s data tuple:
  • type (should be only ‘link’)
  • <path-to-src>
Return type:tuple
class swh.loader.svn.ra.BaseDirEditor(directory, rootpath)[source]

Bases: object

Base class implementation of dir editor.

See DirEditor for an implementation that hashes every directory encountered.

Instantiate a new class inheriting from this class and define the following functions:

def update_checksum(self):
    # Compute the checksums at current state
    pass

def open_directory(self, *args):
    # Update an existing folder.
    pass

def add_directory(self, *args):
    # Add a new one.
    pass
__slots__ = ['directory', 'rootpath']
__init__(directory, rootpath)[source]

Initialize self. See help(type(self)) for accurate signature.

directory
rootpath
remove_child(path)[source]

Remove a path from the current objects.

The path can be resolved as link, file or directory.

This function also takes care of removing the link between the child and the parent.

Parameters:path – to remove from the current objects.
update_checksum()[source]
open_directory(*args)[source]
add_directory(*args)[source]
open_file(*args)[source]

Updating existing file.

add_file(path, copyfrom_path=None, copyfrom_rev=-1)[source]

Creating a new file.

change_prop(key, value)[source]

Change property callback on directory.

delete_entry(path, revision)[source]

Remove a path.

close()[source]

Function called when we finish walking a repository.

class swh.loader.svn.ra.DirEditor(directory, rootpath)[source]

Bases: swh.loader.svn.ra.BaseDirEditor

Directory Editor in charge of updating directory hashes computation.

This implementation includes empty folders in the hash computation.

update_checksum()[source]

Update the root path self.path’s checksums according to the children’s objects.

This function is expected to be called when the folder has been completely ‘walked’.

open_directory(*args)[source]

Updating existing directory.

add_directory(path, copyfrom_path=None, copyfrom_rev=-1)[source]

Adding a new directory.

class swh.loader.svn.ra.Editor(rootpath, directory)[source]

Bases: object

Editor in charge of replaying svn events and computing objects along.

This implementation accounts for empty folders during hash computations.

__init__(rootpath, directory)[source]

Initialize self. See help(type(self)) for accurate signature.

set_target_revision(revnum)[source]
abort()[source]
close()[source]
open_root(base_revnum)[source]

class swh.loader.svn.ra.Replay(conn, rootpath, directory=None)[source]

Bases: object

Replay class.

__init__(conn, rootpath, directory=None)[source]

Initialize self. See help(type(self)) for accurate signature.

replay(rev)[source]

Replay svn actions between rev and rev+1.

This method updates in place the self.editor.directory, as well as the filesystem.

Returns:The updated root directory
compute_hashes(rev)[source]

Compute hashes at revision rev. Expects the state to be at the previous revision’s objects.

Parameters:rev – The revision to start the replay from.
Returns:The updated objects between rev and rev+1. Beware that this mutates the filesystem at rootpath accordingly.

swh.loader.svn.svn module

SVN client in charge of iterating over svn logs and yielding commit representations, including the hash tree/content computations per svn commit.

class swh.loader.svn.svn.SvnRepo(remote_url, origin_url, local_dirname)[source]

Bases: object

Svn repository representation.

Parameters:
  • remote_url (str) –
  • origin_url (str) – Associated origin identifier
  • local_dirname (str) – Path to write intermediary svn action results
__init__(remote_url, origin_url, local_dirname)[source]

Initialize self. See help(type(self)) for accurate signature.

__str__()[source]

Return str(self).

head_revision()[source]

Retrieve current head revision.

initial_revision()[source]

Retrieve the initial revision from which the remote url appeared.

convert_commit_message(msg)[source]

Simply encode the commit message.

Parameters:msg (str) – the commit message to convert.
Returns:The transformed message as bytes.
convert_commit_date(date)[source]

Convert the commit date into a timestamp in swh format. The precision is kept.

Parameters:date (str) – the commit date to convert.
Returns:The transformed date.
convert_commit_author(author)[source]

Convert the commit author into an swh person.

The user becomes a dictionary of the form:

{
  'name': author,
  'email': '',
  'fullname': author
}
Parameters:author (str) – the commit author to convert.
Returns:The transformed author as dict.
logs(revision_start, revision_end)[source]

Stream svn logs between revision_start and revision_end by chunks of block_size logs.

Yields each revision and its associated revision information between revision_start and revision_end.

Parameters:
  • revision_start – the svn revision starting bound
  • revision_end – the svn revision ending bound
Yields:

tuple of revisions and logs:

  • revisions: list of revisions in order
  • logs: dictionary mapping each revision number to its log entry. The log entry is a dictionary with the following keys:
    • author_date: date of the commit
    • author_name: name of the author
    • message: commit message
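
Streaming "by chunks of block_size logs" can be pictured as walking the revision range in consecutive blocks. The sketch below only computes the block boundaries. The inclusive bounds, the default block size of 100, and the function name are assumptions.

```python
from typing import Iterator, Tuple


def revision_blocks(
    revision_start: int, revision_end: int, block_size: int = 100
) -> Iterator[Tuple[int, int]]:
    """Hypothetical sketch: yield inclusive (start, end) revision blocks."""
    start = revision_start
    while start <= revision_end:
        # Never run past the requested ending bound.
        end = min(start + block_size - 1, revision_end)
        yield start, end
        start = end + 1


blocks = list(revision_blocks(1, 250, block_size=100))
```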
export(revision)[source]

Export the repository at a given revision.

export_temporary(revision)[source]

Export the repository to a given revision in a temporary location. It is up to the caller of this function to clean up the temporary location when done (cf. self.clean_fs method).

Parameters: revision – Revision to export at
Returns: The tuple (local_dirname, local_url): local_dirname is the root folder of the temporary location, local_url is where the repository was exported.
swh_hash_data_per_revision(start_revision, end_revision)[source]

Compute swh hash data for each revision between start_revision and end_revision.

Parameters:
  • start_revision – starting revision
  • end_revision – ending revision
Yields:

tuple (rev, nextrev, commit, objects_per_path):

  • rev: current revision
  • nextrev: next revision
  • commit: commit data (author, date, message) for such revision
  • objects_per_path: dictionary of path, swh hash data with type

swh_hash_data_at_revision(revision)[source]

Compute the hash data at revision.

Expected to be used for update only.

clean_fs(local_dirname=None)[source]

Clean up the local working copy.

Parameters:
  • local_dirname (str) – Path to remove recursively if provided. Otherwise, remove the temporary upper root tree used for svn repository loading.
_SvnRepo__to_entry(log_entry)

swh.loader.svn.tasks module

swh.loader.svn.utils module

swh.loader.svn.utils.strdate_to_timestamp(strdate)[source]

Convert a string date to an int timestamp.

Parameters:
  • strdate – A string representing a date with format like ‘YYYY-mm-DDTHH:MM:SS.800722Z’
Returns:

A couple of integers: seconds, microseconds
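
A conversion of this kind can be sketched with the standard library alone. This is an illustration, not the module's implementation: the function name is hypothetical, and the input is assumed to always carry a fractional part and a trailing Z (UTC).

```python
import calendar
from datetime import datetime
from typing import Tuple


def strdate_to_timestamp_sketch(strdate: str) -> Tuple[int, int]:
    """Hypothetical sketch: parse an svn date into (seconds, microseconds)."""
    # ASSUMPTION: the date is UTC, with a fractional part and a trailing 'Z'.
    parsed = datetime.strptime(strdate, "%Y-%m-%dT%H:%M:%S.%fZ")
    # timegm interprets the time tuple as UTC and avoids float rounding.
    seconds = calendar.timegm(parsed.timetuple())
    return seconds, parsed.microsecond


timestamp = strdate_to_timestamp_sketch("2011-05-31T06:04:39.800722Z")
```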

class swh.loader.svn.utils.OutputStream(fileno)[source]

Bases: object

Helper class to read lines from a program’s output while it is running.

Parameters:fileno (int) – File descriptor of a program output stream opened in text mode
__init__(fileno)[source]

Initialize self. See help(type(self)) for accurate signature.

read_lines()[source]

Read available lines from the output stream and return them.

Returns: A tuple whose first member is the list of lines read and whose second member is a boolean indicating whether more lines are still available to read.
Return type: Tuple[List[str], bool]
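
The contract of read_lines (complete lines plus a flag saying whether more may come) can be sketched with os.read and a small buffer for the trailing partial line. OutputStreamSketch is a hypothetical stand-in; the 4096-byte read size is an assumption, and the sketch ignores multi-byte characters split across reads.

```python
import os
from typing import List, Tuple


class OutputStreamSketch:
    """Hypothetical sketch: read complete lines from a file descriptor."""

    def __init__(self, fileno: int) -> None:
        self._fileno = fileno
        self._buffer = ""  # trailing partial line kept between reads

    def read_lines(self) -> Tuple[List[str], bool]:
        chunk = os.read(self._fileno, 4096).decode()
        if not chunk:
            # End of stream: flush whatever partial line is left.
            lines = [self._buffer] if self._buffer else []
            self._buffer = ""
            return lines, False
        # Everything before the last newline is complete; keep the rest.
        *lines, self._buffer = (self._buffer + chunk).split("\n")
        return lines, True


read_fd, write_fd = os.pipe()
os.write(write_fd, b"first\nsecond\npart")
stream = OutputStreamSketch(read_fd)
first_batch = stream.read_lines()
os.write(write_fd, b"ial\n")
os.close(write_fd)
second_batch = stream.read_lines()
last_batch = stream.read_lines()
os.close(read_fd)
```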

swh.loader.svn.utils.init_svn_repo_from_dump(dump_path, prefix=None, suffix=None, root_dir='/tmp', gzip=False)[source]

Given a path to an svn dump, initialize an svn repository with the content of said dump.

Returns:

A tuple:

  • temporary folder (str): containing the mounted repository
  • repo_path (str): path to the mounted repository inside the temporary folder

Raises:
  • ValueError in case of failure to run the command to uncompress and load the dump.
swh.loader.svn.utils.init_svn_repo_from_archive_dump(archive_path, prefix=None, suffix=None, root_dir='/tmp')[source]

Given a path to an archive containing an svn dump, initialize an svn repository with the content of said dump.

Returns:

A tuple:

  • temporary folder (str): containing the mounted repository
  • repo_path (str): path to the mounted repository inside the temporary folder

Raises:
  • ValueError in case of failure to run the command to uncompress and load the dump.

Module contents