swh.loader.core.loader module
-
class swh.loader.core.loader.BaseLoader(logging_class: Optional[str] = None, config: Optional[Dict[str, Any]] = None)
Bases: object
Mixin base class for loaders.
To use this class, you must:
* inherit from this class
* implement the @abstractmethod methods:
  * prepare(): First step executed by the loader to prepare some state needed by the load() method.
  * get_origin(): Retrieve the origin that is currently being loaded.
  * fetch_data(): Fetch the data to inject into swh (through the store_data() method).
  * store_data(): Store the fetched data.
  * visit_status(): Explicit status of the visit (‘partial’ or ‘full’).
  * load_status(): Explicit status of the loading, for use by the scheduler (eventful/uneventful/temporary failure/permanent failure).
  * cleanup(): Last step executed by the loader.
The entry point for the resulting loader is load().
You can take a look at some example classes:
SvnLoader
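The subclassing contract described above can be sketched with Python's abc module. This is a minimal, self-contained stand-in, not the real swh.loader.core.loader.BaseLoader: the names SketchLoader and ListLoader, the reduced set of three hooks, and the in-memory lists are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import List


class SketchLoader(ABC):
    """Hypothetical stand-in for BaseLoader; only three hooks are modelled."""

    @abstractmethod
    def prepare(self) -> None:
        """Prepare state needed before loading."""

    @abstractmethod
    def fetch_data(self) -> bool:
        """Fetch data; return True if another call is needed."""

    @abstractmethod
    def store_data(self) -> None:
        """Store whatever fetch_data() fetched."""


class ListLoader(SketchLoader):
    """Hypothetical concrete loader that 'loads' items from a Python list."""

    def __init__(self, items: List[str]) -> None:
        self.items = items
        self.fetched: List[str] = []  # data fetched but not yet stored
        self.stored: List[str] = []   # stands in for the storage backend

    def prepare(self) -> None:
        self.fetched.clear()
        self.stored.clear()

    def fetch_data(self) -> bool:
        self.fetched = list(self.items)
        return False  # everything was fetched in a single call

    def store_data(self) -> None:
        self.stored.extend(self.fetched)
```

With this sketch, instantiating SketchLoader directly (or any subclass that leaves one of the abstract methods unimplemented) raises TypeError, while ListLoader(["a", "b"]) can be constructed because it implements every hook.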
-
abstract prepare_origin_visit(*args, **kwargs) → None
First step executed by the loader to prepare origin and visit references. Set/update self.origin, and optionally self.origin_url, self.visit_date.
-
abstract prepare(*args, **kwargs) → None
Second step executed by the loader to prepare some state needed by the loader.
-
get_origin() → swh.model.model.Origin
Get the origin that is currently being loaded. self.origin should be set in prepare_origin().
Returns: an origin ready to be sent to storage by origin_add().
Return type: dict
-
abstract fetch_data() → bool
Fetch the data from the source the loader is currently loading (e.g. a git/hg/svn/… repository).
Returns: a value that is interpreted as a boolean. If True, fetch_data needs to be called again to complete loading.
-
abstract store_data()
Store fetched data in the database.
Should call the maybe_load_xyz() methods, which handle the bundles sent to storage, rather than sending data to storage directly.
-
store_metadata() → None
Store fetched metadata in the database.
For more information, see the implementation in DepositLoader.
-
load_status() → Dict[str, str]
Detailed loading status.
Defaults to logging an eventful load.
Returns: a dictionary that is eventually passed back as the task’s result to the scheduler, allowing tuning of the task recurrence mechanism.
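As a rough illustration of the kind of dictionary load_status() might return, the sketch below maps a loading outcome to one of the status values named in this documentation. The "status" key, the sketch_load_status function, and the snapshot_changed criterion are assumptions for illustration, not something this page specifies.

```python
from typing import Dict

# Status values listed in this documentation. The exact strings expected by
# the scheduler are an assumption here.
KNOWN_STATUSES = {
    "eventful",
    "uneventful",
    "temporary failure",
    "permanent failure",
}


def sketch_load_status(snapshot_changed: bool) -> Dict[str, str]:
    """Hypothetical helper: report 'eventful' only when something new was loaded."""
    status = "eventful" if snapshot_changed else "uneventful"
    return {"status": status}  # the "status" key is an assumed name
```

A scheduler receiving such a dictionary could, for instance, revisit uneventful origins less often; how the recurrence is actually tuned is outside the scope of this sketch.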
-
post_load(success: bool = True) → None
Permit the loader to do some additional actions according to status after the loading is done. The flag success indicates the loading’s status.
Defaults to doing nothing.
It is up to the implementer of this method to make sure it does not break.
Parameters: success (bool) – the success status of the loading
-
pre_cleanup() → None
As a first step, will try and check for dangling data to clean up. This should do its best to avoid raising issues.
-
load(*args, **kwargs) → Dict[str, str]
Loading logic for the loader to follow:
1. Call prepare_origin_visit() to prepare the origin and visit we will associate loading data to
2. Store the actual origin_visit to storage
3. Call prepare() to prepare any eventual state
4. Call get_origin() to get the origin we work with and store
5. while True:
   a. Call fetch_data() to fetch the data to store
   b. Call store_data() to store the data
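The loading sequence listed above can be sketched as follows. TracingLoader is a hypothetical stand-in that only records which hook runs when. It is not the real BaseLoader.load(). The store_origin_visit name, the placeholder origin, the loop's exit condition (stop once fetch_data() returns a false value) and the returned dictionary are assumptions.

```python
from typing import Dict, List


class TracingLoader:
    """Hypothetical stand-in that records the order in which hooks run."""

    def __init__(self, batches: List[List[str]]) -> None:
        self.pending = list(batches)   # batches still to fetch
        self.current: List[str] = []   # batch fetched most recently
        self.stored: List[str] = []    # stands in for the storage backend
        self.calls: List[str] = []     # trace of hook invocations
        self.origin: Dict[str, str] = {}

    def prepare_origin_visit(self) -> None:
        self.calls.append("prepare_origin_visit")
        self.origin = {"url": "https://example.org/repo"}  # placeholder origin

    def store_origin_visit(self) -> None:
        # Assumed name for the "store the origin_visit to storage" step.
        self.calls.append("store_origin_visit")

    def prepare(self) -> None:
        self.calls.append("prepare")

    def get_origin(self) -> Dict[str, str]:
        self.calls.append("get_origin")
        return self.origin

    def fetch_data(self) -> bool:
        self.calls.append("fetch_data")
        self.current = self.pending.pop(0) if self.pending else []
        return bool(self.pending)  # True while more batches remain

    def store_data(self) -> None:
        self.calls.append("store_data")
        self.stored.extend(self.current)

    def load(self) -> Dict[str, str]:
        self.prepare_origin_visit()
        self.store_origin_visit()
        self.prepare()
        self.get_origin()
        while True:
            more_to_fetch = self.fetch_data()
            self.store_data()
            if not more_to_fetch:  # assumed exit condition
                break
        return {"status": "eventful"}  # assumed result shape
```

Calling TracingLoader([["a"], ["b", "c"]]).load() runs the fetch/store pair twice, once per batch, which mirrors the contract that fetch_data() returns True when it needs to be called again.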
-
class swh.loader.core.loader.DVCSLoader(logging_class: Optional[str] = None, config: Optional[Dict[str, Any]] = None)
Bases: swh.loader.core.loader.BaseLoader
This base class is a pattern for DVCS loaders (e.g. git, mercurial).
Those loaders are able to load all the data in one go. For example, the BulkUpdater loader defined in swh-loader-git.
For other loaders (stateful ones, e.g. SWHSvnLoader), inherit directly from BaseLoader.
-
get_contents() → Iterable[swh.model.model.BaseContent]
Get the contents that need to be loaded
-
get_directories() → Iterable[swh.model.model.Directory]
Get the directories that need to be loaded
-
get_revisions() → Iterable[swh.model.model.Revision]
Get the revisions that need to be loaded
-
get_releases() → Iterable[swh.model.model.Release]
Get the releases that need to be loaded
-
get_snapshot() → swh.model.model.Snapshot
Get the snapshot that needs to be loaded
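One way the get_* methods could feed a single store_data() pass is sketched below. SketchDVCSLoader is a hypothetical stand-in, not the real DVCSLoader: plain strings replace the swh.model objects, a dictionary replaces the storage backend, and the storing order (contents first, snapshot last) is an assumption.

```python
from typing import Dict, Iterable, List


class SketchDVCSLoader:
    """Hypothetical stand-in: get_* methods supply objects, store_data() sends them."""

    def __init__(self) -> None:
        # Stands in for the storage backend: object kind -> stored objects.
        self.storage: Dict[str, List[str]] = {}

    def get_contents(self) -> Iterable[str]:
        return ["content-1", "content-2"]

    def get_directories(self) -> Iterable[str]:
        return ["dir-1"]

    def get_revisions(self) -> Iterable[str]:
        return ["rev-1"]

    def get_releases(self) -> Iterable[str]:
        return []  # a repository may have no releases

    def get_snapshot(self) -> str:
        return "snapshot-1"

    def store_data(self) -> None:
        # Assumed order: objects that others refer to are stored first.
        steps = [
            ("content", self.get_contents()),
            ("directory", self.get_directories()),
            ("revision", self.get_revisions()),
            ("release", self.get_releases()),
        ]
        for kind, objects in steps:
            self.storage.setdefault(kind, []).extend(objects)
        self.storage["snapshot"] = [self.get_snapshot()]
```

Because everything is available in one go, a single store_data() call is enough here, in contrast to a stateful loader that alternates several fetch and store rounds.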
-