swh.web.common.origin_save module

swh.web.common.origin_save.get_origin_save_authorized_urls()[source]

Get the list of origin url prefixes authorized to be immediately loaded into the archive (whitelist).

Returns

The list of authorized origin url prefixes

Return type

list

swh.web.common.origin_save.get_origin_save_unauthorized_urls()[source]

Get the list of origin url prefixes forbidden to be loaded into the archive (blacklist).

Returns

The list of unauthorized origin url prefixes

Return type

list

swh.web.common.origin_save.can_save_origin(origin_url)[source]

Check if a software origin can be saved into the archive.

Based on the origin url, the save request will be either:

  • immediately accepted if the url is whitelisted

  • rejected if the url is blacklisted

  • put in pending state for manual review otherwise

Parameters

origin_url (str) – the software origin url to check

Returns

the origin save request status, either accepted, rejected or pending

Return type

str
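The decision rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's implementation: the function name classify_origin_url and the example prefixes are hypothetical, the lists are passed in explicitly rather than fetched through get_origin_save_authorized_urls() and get_origin_save_unauthorized_urls(), and the order in which the two lists are consulted is assumed.

```python
from typing import Iterable


def classify_origin_url(
    origin_url: str,
    authorized_prefixes: Iterable[str],
    unauthorized_prefixes: Iterable[str],
) -> str:
    """Hypothetical stand-in mirroring the documented decision rule."""
    # Assumption: the blacklist is consulted first, so a forbidden prefix
    # wins over a broader authorized one.
    if any(origin_url.startswith(prefix) for prefix in unauthorized_prefixes):
        return "rejected"
    # A url starting with a whitelisted prefix is accepted immediately.
    if any(origin_url.startswith(prefix) for prefix in authorized_prefixes):
        return "accepted"
    # Anything else waits for manual review.
    return "pending"


# Example prefixes, made up for illustration only.
whitelist = ["https://github.com/", "https://gitlab.com/"]
blacklist = ["https://github.com/spam-org/"]

status = classify_origin_url("https://github.com/user/repo", whitelist, blacklist)
```

With these made-up lists, a url under a whitelisted prefix is accepted, a url under the blacklisted prefix is rejected even though it also matches the broader whitelisted one, and any other url ends up pending.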

swh.web.common.origin_save.get_savable_visit_types()[source]
swh.web.common.origin_save.create_save_origin_request(visit_type, origin_url)[source]

Create a loading task to save a software origin into the archive.

This function creates a software origin loading task through the swh-scheduler component.

First, some checks are performed to verify that the visit type and origin url are valid and that the save request can be accepted. If those checks pass, the loading task is created. Otherwise, the save request is put in pending or rejected state.

All the submitted save requests are logged into the swh-web database to keep track of them.

Parameters
  • visit_type (str) – the type of visit to perform (currently only git but svn and hg will soon be available)

  • origin_url (str) – the url of the origin to save

Raises
Returns

A dict describing the save request with the following keys:

  • visit_type: the type of visit to perform

  • origin_url: the url of the origin

  • save_request_date: the date the request was submitted

  • save_request_status: the request status, either accepted, rejected or pending

  • save_task_status: the origin loading task status, either not created, not yet scheduled, scheduled, succeed or failed

Return type

dict
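As a hedged illustration of the returned structure, the sketch below checks that a dict has the documented keys and status values. The helper is_valid_save_request, the constant names, and the example values are hypothetical; in particular, the representation of save_request_date (an ISO 8601 string here) is an assumption, since the documentation only says it is a date.

```python
from typing import Any, Dict

# Keys and status values as listed in the documentation above.
EXPECTED_KEYS = {
    "visit_type",
    "origin_url",
    "save_request_date",
    "save_request_status",
    "save_task_status",
}
SAVE_REQUEST_STATUSES = {"accepted", "rejected", "pending"}
SAVE_TASK_STATUSES = {
    "not created",
    "not yet scheduled",
    "scheduled",
    "succeed",
    "failed",
}


def is_valid_save_request(save_request: Dict[str, Any]) -> bool:
    """Hypothetical helper: check a dict against the documented shape."""
    if not EXPECTED_KEYS.issubset(save_request.keys()):
        return False
    return (
        save_request["save_request_status"] in SAVE_REQUEST_STATUSES
        and save_request["save_task_status"] in SAVE_TASK_STATUSES
    )


# Example values, made up for illustration only.
example_request = {
    "visit_type": "git",
    "origin_url": "https://example.org/user/repo.git",
    "save_request_date": "2019-01-01T00:00:00+00:00",  # assumed format
    "save_request_status": "pending",
    "save_task_status": "not created",
}
```

A check like this can be useful in client code or tests that consume save requests, because it fails early when a key is missing or a status string is misspelled.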

swh.web.common.origin_save.get_save_origin_requests_from_queryset(requests_queryset)[source]

Get all save requests from a SaveOriginRequest queryset.

Parameters

requests_queryset (django.db.models.QuerySet) – input SaveOriginRequest queryset

Returns

A list of save origin request dicts, as described in swh.web.common.origin_save.create_save_origin_request()

Return type

list

swh.web.common.origin_save.get_save_origin_requests(visit_type, origin_url)[source]

Get all save requests for a given software origin.

Parameters
  • visit_type (str) – the type of visit

  • origin_url (str) – the url of the origin

Raises
Returns

A list of save origin request dicts, as described in swh.web.common.origin_save.create_save_origin_request()

Return type

list

swh.web.common.origin_save.get_save_origin_task_info(save_request_id: int, full_info: bool = True) → Dict[str, Any][source]

Get detailed information about an accepted save origin request and its associated loading task.

If the associated loading task info is archived and removed from the scheduler database, returns an empty dictionary.

Parameters
  • save_request_id – identifier of a save origin request

  • full_info – whether to return detailed info for staff users

Returns

A dictionary with the following keys:

  • type: loading task type

  • arguments: loading task arguments

  • id: loading task database identifier

  • backend_id: loading task celery identifier

  • scheduled: loading task scheduling date

  • ended: loading task termination date

  • status: loading task execution status

Depending on the availability of the task logs in the elasticsearch cluster of Software Heritage, the returned dictionary may also contain the following keys:

  • name: associated celery task name

  • message: relevant log message from task execution

  • duration: task execution time (only if it succeeded)

  • worker: name of the worker that executed the task

Return type

Dict[str, Any]
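As a hedged illustration of consuming the dictionary returned by get_save_origin_task_info(), the sketch below handles the empty-dictionary case and treats the log-derived keys as optional. The helper describe_task_info and all sample values (including the status and worker strings) are hypothetical and only stand in for real scheduler data.

```python
from typing import Any, Dict


def describe_task_info(task_info: Dict[str, Any]) -> str:
    """Hypothetical helper: build a one-line summary of a task info dict."""
    # An empty dictionary means the task info was archived and removed
    # from the scheduler database.
    if not task_info:
        return "no task info available"
    summary = f"task {task_info['id']} ({task_info['type']}): {task_info['status']}"
    # These keys are present only when the task logs are available.
    if "worker" in task_info:
        summary += f", worker={task_info['worker']}"
    if "duration" in task_info:
        summary += f", duration={task_info['duration']}"
    return summary


# Sample values, made up for illustration only.
sample_info = {
    "type": "load-git",
    "arguments": {"url": "https://example.org/user/repo.git"},
    "id": 42,
    "backend_id": "00000000-0000-0000-0000-000000000000",
    "scheduled": "2019-01-01T00:00:00+00:00",
    "ended": "2019-01-01T00:05:00+00:00",
    "status": "eventful",
    "worker": "worker01",
}

summary = describe_task_info(sample_info)
```

Using membership tests for the optional keys avoids a KeyError when the logs are unavailable, and the early return covers archived tasks.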

swh.web.common.origin_save.compute_save_requests_metrics()[source]

Compute a couple of Prometheus metrics related to origin save requests.