swh.deposit.loader package


swh.deposit.loader.checker module

class swh.deposit.loader.checker.DepositChecker(client=None)[source]

Bases: object

Deposit checker implementation.

Triggers the deposit’s checks through the private API.


swh.deposit.loader.loader module

class swh.deposit.loader.loader.DepositLoader(client=None)[source]

Bases: swh.loader.tar.loader.LegacyLocalTarLoader

Deposit loader implementation.

This is a subclass of TarLoader because the main goal of this class is first to retrieve the deposit’s tarball contents as a single archive, along with its associated metadata, and then to hand that tarball to the TarLoader for loading.

This will:

  • retrieve the deposit’s archive locally
  • provide the archive to be loaded by the tar loader
  • clean up the temporary location used to retrieve the archive locally
  • update the deposit’s status accordingly
CONFIG_BASE_FILENAME = 'loader/deposit'
ADDITIONAL_CONFIG = {'extraction_dir': ('str', '/tmp/swh.deposit.loader/')}
load(*, archive_url, deposit_meta_url, deposit_update_url)[source]

Loading logic for the loader to follow:

    1. Call prepare_origin_visit() to prepare the origin and visit that the loaded data will be associated with
    2. Store the actual origin_visit to storage
    3. Call prepare() to set up any state needed for loading
    4. Call get_origin() to get the origin we work with and store
    5. while True:
      • Call fetch_data() to fetch the data to store
      • Call store_data() to store the data
    6. Call cleanup() to clean up any state put in place by the prepare() method.
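
The loading steps listed above can be sketched as a small stand-alone class. This is only an illustration of the call order, not the actual DepositLoader code: the SketchLoader class, its store_origin_visit() helper, and the list of in-memory batches are hypothetical. The sketch also assumes that fetch_data() returns a truthy value while more data remains, and that cleanup() runs in a finally block; neither detail is stated in the description above.

```python
class SketchLoader:
    """Hypothetical stand-in that records the order of the loading steps."""

    def __init__(self, batches):
        self._pending = list(batches)  # pretend data, one item per loop pass
        self._current = None
        self.calls = []   # names of the steps, in the order they ran
        self.stored = []  # what store_data() has "stored" so far

    def prepare_origin_visit(self, **kwargs):
        self.calls.append("prepare_origin_visit")

    def store_origin_visit(self):  # hypothetical name for the storing step
        self.calls.append("store_origin_visit")

    def prepare(self, **kwargs):
        self.calls.append("prepare")

    def get_origin(self):
        self.calls.append("get_origin")

    def fetch_data(self):
        self.calls.append("fetch_data")
        self._current = self._pending.pop(0) if self._pending else None
        # Assumption: a truthy return value means "call me again".
        return bool(self._pending)

    def store_data(self):
        self.calls.append("store_data")
        if self._current is not None:
            self.stored.append(self._current)

    def cleanup(self):
        self.calls.append("cleanup")

    def load(self, **kwargs):
        self.prepare_origin_visit(**kwargs)
        self.store_origin_visit()
        try:
            self.prepare(**kwargs)
            self.get_origin()
            while True:
                more_data = self.fetch_data()
                self.store_data()
                if not more_data:
                    break
        finally:
            # Assumption: clean up even when a step above raises.
            self.cleanup()
        return self.stored
```

Calling SketchLoader(["a", "b"]).load(...) with the three keyword arguments of load() runs the fetch/store pair twice before cleanup().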
prepare_origin_visit(*, deposit_meta_url, **kwargs)[source]

Prepare the origin visit information.

  • origin (dict) – Dict with keys {url, type}
  • visit_date (str) – Date representing the date of the visit. None by default will make it the current time during the loading process.
prepare(*, archive_url, deposit_meta_url, deposit_update_url)[source]

Prepare the loading by first retrieving the deposit’s raw archive content.


Storing the origin_metadata during the loading process.

provider_id and tool_id are resolved in the prepare() method.


Updating the deposit’s status according to its loading status.

If not successful, we update its status to ‘failed’. Otherwise, we update its status to ‘done’ and pass along its associated revision.
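
The status rule described above can be expressed as a small function. This is a minimal sketch: the function name build_status_update and the payload keys "status" and "revision_id" are hypothetical, chosen only to illustrate the rule, and are not taken from the deposit's private API.

```python
def build_status_update(success, revision_id=None):
    """Return a hypothetical status payload for a finished load.

    A failed load only reports 'failed'; a successful one reports 'done'
    and passes along the revision produced by the load.
    """
    if not success:
        return {"status": "failed"}
    return {"status": "done", "revision_id": revision_id}
```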


Clean up temporary directory where we retrieved the tarball.
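
One common way to guarantee this kind of cleanup is to wrap the work in a try/finally block. The sketch below uses only the Python standard library; the helper name with_temporary_extraction_dir and the directory prefix are assumptions made for illustration, and the real loader may organise this differently.

```python
import os
import shutil
import tempfile


def with_temporary_extraction_dir(action, root=None):
    """Run action(temp_dir) inside a fresh temporary directory.

    The directory is removed afterwards, whether or not action() raises.
    """
    temp_dir = tempfile.mkdtemp(prefix="swh.deposit.loader-", dir=root)
    try:
        return action(temp_dir)
    finally:
        shutil.rmtree(temp_dir, ignore_errors=True)
```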

swh.deposit.loader.scheduler module

Module in charge of sending deposit loading/checking requests, either as Celery tasks or as scheduled one-shot tasks.

class swh.deposit.loader.scheduler.SWHScheduling[source]

Bases: swh.core.config.SWHConfig

Base SWH scheduling class that groups the logic for scheduling deposit loading.

CONFIG_BASE_FILENAME = 'deposit/server'
DEFAULT_CONFIG = {'dry_run': ('bool', False)}

Schedule the new deposit loading.

Parameters:data (dict) – Deposit aggregated data
class swh.deposit.loader.scheduler.SWHCeleryScheduling(config=None)[source]

Bases: swh.deposit.loader.scheduler.SWHScheduling

Deposit loading as Celery task scheduling.


Schedule the new deposit loading directly through celery.

Parameters:depositdata (dict) – Deposit aggregated information.
class swh.deposit.loader.scheduler.SWHSchedulerScheduling(config=None)[source]

Bases: swh.deposit.loader.scheduler.SWHScheduling

Deposit loading through SWH’s task scheduling interface.

ADDITIONAL_CONFIG = {'scheduler': ('dict', {'args': {'url': 'http://localhost:5008'}, 'cls': 'remote'})}
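
Both DEFAULT_CONFIG and ADDITIONAL_CONFIG map each key to a (type name, default value) pair. The sketch below shows how such pairs could be flattened into plain default values. The default_values helper is hypothetical and only illustrates the shape of the data; it is not the merging logic of swh.core.config.

```python
DEFAULT_CONFIG = {"dry_run": ("bool", False)}
ADDITIONAL_CONFIG = {
    "scheduler": (
        "dict",
        {"cls": "remote", "args": {"url": "http://localhost:5008"}},
    ),
}


def default_values(*specs):
    """Flatten {key: (type_name, default)} specs into {key: default}.

    Later specs override earlier ones when they share a key.
    """
    merged = {}
    for spec in specs:
        for key, (_type_name, default) in spec.items():
            merged[key] = default
    return merged


config = default_values(DEFAULT_CONFIG, ADDITIONAL_CONFIG)
```

With the values shown, config["scheduler"]["args"]["url"] is the local scheduler URL and config["dry_run"] is False.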

Schedule the new deposit loading through swh.scheduler’s api.

Parameters:deposits (dict) – Deposit aggregated information.

Filter deposit given a specific status.


Convert deposit to argument for task to be executed.

swh.deposit.loader.tasks module

class swh.deposit.loader.tasks.LoadDepositArchiveTsk[source]

Bases: swh.scheduler.task.Task

Deposit archive loading task described by the following steps:

  1. Retrieve tarball from deposit’s private api and store locally in a temporary directory
  2. Trigger the loading
  3. Clean up the temporary directory
  4. Update the deposit’s status according to result using the deposit’s private update status api
task_queue = 'swh_loader_deposit'
run_task(*, archive_url, deposit_meta_url, deposit_update_url)[source]

Import a deposit tarball into swh.

Args: see DepositLoader.load().

ignore_result = False
rate_limit = None
reject_on_worker_lost = None
request_stack = <celery.utils.threads._LocalStack object>
serializer = 'json'
store_errors_even_if_ignored = False
track_started = False
typing = True
class swh.deposit.loader.tasks.ChecksDepositTsk[source]

Bases: swh.scheduler.task.Task

Deposit checks task.

task_queue = 'swh_checker_deposit'

Check a deposit’s status.

Args: see DepositChecker.check().

ignore_result = False
rate_limit = None
reject_on_worker_lost = None
request_stack = <celery.utils.threads._LocalStack object>
serializer = 'json'
store_errors_even_if_ignored = False
track_started = False
typing = True

Module contents