swh.loader.core.loader module

class swh.loader.core.loader.BaseLoader(logging_class: Optional[str] = None, config: Dict[str, Any] = {})[source]

Bases: swh.core.config.SWHConfig

Mixin base class for loaders.

To use this class, you must:

  • inherit from this class

  • and implement the @abstractmethod methods:

    • prepare(): First step executed by the loader to prepare some state needed by the load() method.

    • get_origin(): Retrieve the origin that is currently being loaded.

    • fetch_data(): Fetch the data to inject into swh (through the store_data() method).

    • store_data(): Store data fetched.

    • visit_status(): Explicit status of the visit (‘partial’ or ‘full’)

    • load_status(): Explicit status of the loading, for use by the scheduler (eventful/uneventful/temporary failure/permanent failure).

    • cleanup(): Last step executed by the loader.

The entry point for the resulting loader is load().

You can take a look at some example classes:

  • BaseSvnLoader
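
As a rough illustration of the contract above, the sketch below uses a stand-in abstract class so that it runs without swh installed. ToyBaseLoader, IncompleteLoader and ToyLoader are hypothetical names, and only a subset of the listed methods is modelled. It shows that a subclass cannot be instantiated until every abstract method is implemented.

```python
from abc import ABC, abstractmethod


class ToyBaseLoader(ABC):
    """Stand-in for BaseLoader (assumption: models only a few of its methods)."""

    @abstractmethod
    def prepare(self) -> None: ...

    @abstractmethod
    def fetch_data(self) -> bool: ...

    @abstractmethod
    def store_data(self) -> None: ...

    @abstractmethod
    def cleanup(self) -> None: ...

    def visit_status(self) -> str:
        # Non-abstract method with a default, as documented for visit_status().
        return "full"


class IncompleteLoader(ToyBaseLoader):
    """Omits store_data() and cleanup(), so the class stays abstract."""

    def prepare(self) -> None:
        pass

    def fetch_data(self) -> bool:
        return False


class ToyLoader(IncompleteLoader):
    """Supplies the remaining abstract methods and becomes instantiable."""

    def store_data(self) -> None:
        pass

    def cleanup(self) -> None:
        pass


try:
    IncompleteLoader()
    instantiated = True
except TypeError:
    # Python refuses to instantiate a class with unimplemented abstract methods.
    instantiated = False

loader = ToyLoader()
```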

CONFIG_BASE_FILENAME = None
DEFAULT_CONFIG = {'max_content_size': ('int', 104857600), 'save_data': ('bool', False), 'save_data_path': ('str', ''), 'storage': ('dict', {'cls': 'remote', 'url': 'http://localhost:5002/'})}
ADDITIONAL_CONFIG = {}
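
Each DEFAULT_CONFIG entry pairs a type name with a default value; the max_content_size default of 104857600 bytes is 100 MiB. The sketch below shows one way such (type, default) pairs could be flattened and then overridden. resolve_config is a hypothetical helper written for this example, not the SWHConfig implementation, and the override path is made up.

```python
from typing import Any, Dict, Tuple

DEFAULT_CONFIG: Dict[str, Tuple[str, Any]] = {
    "max_content_size": ("int", 104857600),
    "save_data": ("bool", False),
    "save_data_path": ("str", ""),
    "storage": ("dict", {"cls": "remote", "url": "http://localhost:5002/"}),
}
ADDITIONAL_CONFIG: Dict[str, Tuple[str, Any]] = {}


def resolve_config(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: keep each default value, then apply overrides."""
    merged = {**DEFAULT_CONFIG, **ADDITIONAL_CONFIG}
    config = {key: default for key, (_type_name, default) in merged.items()}
    config.update(overrides)
    return config


# "/tmp/loader-data" is an illustrative path, not a documented default.
config = resolve_config({"save_data": True, "save_data_path": "/tmp/loader-data"})
```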
save_data() → None[source]

Save the data associated with the current load.

get_save_data_path() → str[source]

The path to which we archive the loader’s raw data

flush() → None[source]

Flush any potential buffered data not sent to swh-storage.

abstract cleanup() → None[source]

Last step executed by the loader.

abstract prepare_origin_visit(*args, **kwargs) → None[source]

First step executed by the loader to prepare origin and visit references. Set/update self.origin, and optionally self.origin_url, self.visit_date.

abstract prepare(*args, **kwargs) → None[source]

Second step executed by the loader to prepare some state needed by the loader.

get_origin() → swh.model.model.Origin[source]

Get the origin that is currently being loaded. self.origin should be set in prepare_origin_visit().

Returns

an origin ready to be sent to storage by origin_add().

Return type

swh.model.model.Origin

abstract fetch_data() → bool[source]

Fetch the data from the source the loader is currently loading (e.g. a git/hg/svn/… repository).

Returns

a value that is interpreted as a boolean. If True, fetch_data needs to be called again to complete loading.
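
The return-value contract can be illustrated with a toy loader that pulls its source in batches. BatchLoader, its attributes, and the driver loop are assumptions made for this sketch; the real load() may drive these methods differently.

```python
from typing import List


class BatchLoader:
    """Toy loader (hypothetical): fetches one batch per fetch_data() call."""

    def __init__(self, batches: List[List[str]]) -> None:
        self.batches = list(batches)
        self.current: List[str] = []
        self.stored: List[str] = []

    def fetch_data(self) -> bool:
        # Take the next batch; return True while more batches remain.
        self.current = self.batches.pop(0) if self.batches else []
        return bool(self.batches)

    def store_data(self) -> None:
        # Persist whatever the last fetch_data() call retrieved.
        self.stored.extend(self.current)


loader = BatchLoader([["rev1", "rev2"], ["rev3"]])
calls = 0
while True:
    more_data = loader.fetch_data()
    loader.store_data()
    calls += 1
    if not more_data:
        break
```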

abstract store_data()[source]

Store fetched data in the database.

Should call the maybe_load_xyz() methods, which handle the bundles sent to storage, rather than send directly.

store_metadata() → None[source]

Store fetched metadata in the database.

For more information, see implementation in DepositLoader.

load_status() → Dict[str, str][source]

Detailed loading status.

Defaults to logging an eventful load.

Returns: a dictionary that is eventually passed back as the task’s result to the scheduler, allowing tuning of the task recurrence mechanism.
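
As a sketch, a subclass might report whether a visit found anything new. The "status" key, the ToyStatusLoader class, and its new_objects counter are assumptions for illustration; the two status values are taken from the states listed for load_status() in the class description.

```python
from typing import Dict


class ToyStatusLoader:
    """Hypothetical loader fragment; not part of swh.loader.core."""

    def __init__(self) -> None:
        self.new_objects = 0

    def load_status(self) -> Dict[str, str]:
        # Assumed result shape: a single "status" key.
        status = "eventful" if self.new_objects > 0 else "uneventful"
        return {"status": status}


loader = ToyStatusLoader()
before = loader.load_status()
loader.new_objects = 3
after = loader.load_status()
```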

post_load(success: bool = True) → None[source]

Permit the loader to do some additional actions according to status after the loading is done. The flag success indicates the loading’s status.

Defaults to doing nothing.

It is up to the implementer of this method to make sure it does not break.

Parameters

success (bool) – the success status of the loading

visit_status() → str[source]

Detailed visit status.

Defaults to logging a full visit.

pre_cleanup() → None[source]

As a first step, try to check for dangling data to clean up. This should do its best to avoid raising errors.

load(*args, **kwargs) → Dict[str, str][source]

Loading logic for the loader to follow:
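
The step ordering described for the individual methods on this page can be sketched as below. RecordingLoader is a hypothetical toy class; error handling, visit bookkeeping, and storage calls are omitted, and the exact sequence inside the real load() is an assumption.

```python
from typing import Dict, List


class RecordingLoader:
    """Toy loader (hypothetical) that records the order of its steps."""

    def __init__(self, rounds: int = 2) -> None:
        self.rounds = rounds
        self.steps: List[str] = []

    def prepare_origin_visit(self) -> None:
        self.steps.append("prepare_origin_visit")

    def prepare(self) -> None:
        self.steps.append("prepare")

    def fetch_data(self) -> bool:
        self.steps.append("fetch_data")
        self.rounds -= 1
        return self.rounds > 0  # True means "call me again"

    def store_data(self) -> None:
        self.steps.append("store_data")

    def cleanup(self) -> None:
        self.steps.append("cleanup")

    def load_status(self) -> Dict[str, str]:
        # Assumed result shape: a single "status" key.
        return {"status": "eventful"}

    def load(self) -> Dict[str, str]:
        self.prepare_origin_visit()  # documented as the first step
        self.prepare()               # documented as the second step
        try:
            while True:
                more_data = self.fetch_data()
                self.store_data()
                if not more_data:
                    break
        finally:
            self.cleanup()           # documented as the last step
        return self.load_status()


loader = RecordingLoader(rounds=2)
result = loader.load()
```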

class swh.loader.core.loader.DVCSLoader(logging_class: Optional[str] = None, config: Dict[str, Any] = {})[source]

Bases: swh.loader.core.loader.BaseLoader

This base class is a pattern for dvcs loaders (e.g. git, mercurial).

Those loaders are able to load all the data in one go. For example, the BulkUpdater loader defined in swh-loader-git.

For other loaders (stateful ones, e.g. SWHSvnLoader), inherit directly from BaseLoader.

ADDITIONAL_CONFIG = {}
cleanup() → None[source]

Clean up any state installed for computations.

has_contents() → bool[source]

Checks whether we need to load contents

get_contents() → Iterable[swh.model.model.BaseContent][source]

Get the contents that need to be loaded

has_directories() → bool[source]

Checks whether we need to load directories

get_directories() → Iterable[swh.model.model.Directory][source]

Get the directories that need to be loaded

has_revisions() → bool[source]

Checks whether we need to load revisions

get_revisions() → Iterable[swh.model.model.Revision][source]

Get the revisions that need to be loaded

has_releases() → bool[source]

Checks whether we need to load releases

get_releases() → Iterable[swh.model.model.Release][source]

Get the releases that need to be loaded

get_snapshot() → swh.model.model.Snapshot[source]

Get the snapshot that needs to be loaded

eventful() → bool[source]

Whether the load was eventful

store_data() → None[source]

Store fetched data in the database.

Should call the maybe_load_xyz() methods, which handle the bundles sent to storage, rather than send directly.
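
One way to read the has_*/get_* pairs: store_data() asks whether each object type needs loading and, if so, forwards the corresponding iterable to storage. The sketch below uses a hypothetical in-memory FakeStorage and plain strings in place of swh.model objects. The storage interface and the ordering are assumptions, not the DVCSLoader source.

```python
from typing import Dict, Iterable, List


class FakeStorage:
    """Hypothetical in-memory stand-in for swh-storage."""

    def __init__(self) -> None:
        self.objects: Dict[str, List[str]] = {}

    def add(self, object_type: str, objects: Iterable[str]) -> None:
        self.objects.setdefault(object_type, []).extend(objects)


class ToyDVCSLoader:
    """Toy loader (hypothetical): strings stand in for swh.model objects."""

    def __init__(self, storage: FakeStorage) -> None:
        self.storage = storage
        self.contents: List[str] = ["content-1", "content-2"]
        self.directories: List[str] = ["directory-1"]
        self.revisions: List[str] = ["revision-1"]
        self.releases: List[str] = []  # nothing to load for this type

    def has_contents(self) -> bool:
        return bool(self.contents)

    def get_contents(self) -> Iterable[str]:
        return self.contents

    def has_directories(self) -> bool:
        return bool(self.directories)

    def get_directories(self) -> Iterable[str]:
        return self.directories

    def has_revisions(self) -> bool:
        return bool(self.revisions)

    def get_revisions(self) -> Iterable[str]:
        return self.revisions

    def has_releases(self) -> bool:
        return bool(self.releases)

    def get_releases(self) -> Iterable[str]:
        return self.releases

    def get_snapshot(self) -> str:
        return "snapshot-1"

    def store_data(self) -> None:
        # Assumed order: contents, directories, revisions, releases, snapshot.
        if self.has_contents():
            self.storage.add("content", self.get_contents())
        if self.has_directories():
            self.storage.add("directory", self.get_directories())
        if self.has_revisions():
            self.storage.add("revision", self.get_revisions())
        if self.has_releases():
            self.storage.add("release", self.get_releases())
        self.storage.add("snapshot", [self.get_snapshot()])


storage = FakeStorage()
ToyDVCSLoader(storage).store_data()
```

Here the has_releases() check returns False, so no release objects are sent.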