swh.web.common.origin_save module
Get the list of origin url prefixes authorized to be immediately loaded into the archive (whitelist).
- Returns
The list of authorized origin url prefixes
- Return type
list
Get the list of origin url prefixes forbidden to be loaded into the archive (blacklist).
- Returns
The list of unauthorized origin url prefixes
- Return type
list
swh.web.common.origin_save.can_save_origin(origin_url)
Check if a software origin can be saved into the archive.
Based on the origin url, the save request will be either:
immediately accepted if the url is whitelisted
rejected if the url is blacklisted
put in pending state for manual review otherwise
- Parameters
origin_url (str) – the software origin url to check
- Returns
the origin save request status, either accepted, rejected or pending
- Return type
str
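The decision rule above can be sketched as a simple prefix match. This is a minimal illustration, not the module's actual implementation: the function name `sketch_can_save_origin` and its two list parameters are hypothetical, and checking the blacklist before the whitelist is an assumption, since the text does not say which list wins when a url matches both.

```python
from typing import Sequence


def sketch_can_save_origin(
    origin_url: str,
    authorized_prefixes: Sequence[str],
    unauthorized_prefixes: Sequence[str],
) -> str:
    """Return 'accepted', 'rejected' or 'pending' for an origin url."""
    # Assumption: the blacklist is consulted first, so it wins on overlap.
    if any(origin_url.startswith(prefix) for prefix in unauthorized_prefixes):
        return "rejected"
    # A whitelisted prefix means the request can be accepted immediately.
    if any(origin_url.startswith(prefix) for prefix in authorized_prefixes):
        return "accepted"
    # Anything else waits for manual review.
    return "pending"
```

With a whitelist of `["https://github.com/"]`, a url under that prefix would be accepted, while a url matching neither list would be left pending.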
swh.web.common.origin_save.create_save_origin_request(visit_type, origin_url)
Create a loading task to save a software origin into the archive.
This function creates a software origin loading task through the swh-scheduler component.
First, checks are performed to verify that the visit type and origin url are valid and that the save request can be accepted. If those checks pass, the loading task is created. Otherwise, the save request is put in pending or rejected state.
All the submitted save requests are logged into the swh-web database to keep track of them.
- Parameters
visit_type (str) – the type of visit to perform (currently only git, but svn and hg will soon be available)
origin_url (str) – the url of the origin to save
- Raises
BadInputExc – the visit type or origin url is invalid
ForbiddenExc – the provided origin url is blacklisted
- Returns
A dict describing the save request with the following keys:
visit_type: the type of visit to perform
origin_url: the url of the origin
save_request_date: the date the request was submitted
save_request_status: the request status, either accepted, rejected or pending
save_task_status: the origin loading task status, either not created, not yet scheduled, scheduled, succeed or failed
- Return type
dict
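The flow described for this function (validate, decide, log, return a dict) can be sketched without the scheduler or the database. Everything below is a stand-in under stated assumptions: the exception classes only mirror the documented names, `sketch_create_save_request`, `can_save` and `request_log` are hypothetical, the url validity test is deliberately simplistic, and the mapping from request status to task status is a guess rather than documented behaviour.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List
from urllib.parse import urlparse


class BadInputExc(Exception):
    """Local stand-in for the documented BadInputExc."""


class ForbiddenExc(Exception):
    """Local stand-in for the documented ForbiddenExc."""


SUPPORTED_VISIT_TYPES = ("git",)  # only git is documented as available


def sketch_create_save_request(
    visit_type: str,
    origin_url: str,
    can_save: Callable[[str], str],
    request_log: List[Dict[str, str]],
) -> Dict[str, str]:
    if visit_type not in SUPPORTED_VISIT_TYPES:
        raise BadInputExc(f"unsupported visit type: {visit_type}")
    parsed = urlparse(origin_url)
    # Assumption: a url is "valid" if it has a scheme and a host.
    if not parsed.scheme or not parsed.netloc:
        raise BadInputExc(f"invalid origin url: {origin_url}")

    status = can_save(origin_url)  # 'accepted', 'rejected' or 'pending'
    request = {
        "visit_type": visit_type,
        "origin_url": origin_url,
        "save_request_date": datetime.now(timezone.utc).isoformat(),
        "save_request_status": status,
        # Assumption: a task only exists once the request is accepted.
        "save_task_status": (
            "not yet scheduled" if status == "accepted" else "not created"
        ),
    }
    request_log.append(request)  # every submitted request is kept
    if status == "rejected":
        raise ForbiddenExc(f"origin url is blacklisted: {origin_url}")
    return request
```

In this sketch a rejected request is recorded before the exception is raised, which is one reading of "all the submitted save requests are logged".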
swh.web.common.origin_save.get_save_origin_requests_from_queryset(requests_queryset)
Get all save requests from a SaveOriginRequest queryset.
- Parameters
requests_queryset (django.db.models.QuerySet) – input SaveOriginRequest queryset
- Returns
A list of save origin request dicts, as described in swh.web.common.origin_save.create_save_origin_request()
- Return type
list
swh.web.common.origin_save.get_save_origin_requests(visit_type, origin_url)
Get all save requests for a given software origin.
- Parameters
visit_type (str) – the type of visit
origin_url (str) – the url of the origin
- Raises
BadInputExc – the visit type or origin url is invalid
swh.web.common.exc.NotFoundExc – no save requests can be found for the given origin
- Returns
A list of save origin request dicts, as described in swh.web.common.origin_save.create_save_origin_request()
- Return type
list
swh.web.common.origin_save.get_save_origin_task_info(save_request_id: int, full_info: bool = True) → Dict[str, Any]
Get detailed information about an accepted save origin request and its associated loading task.
If the associated loading task info is archived and removed from the scheduler database, returns an empty dictionary.
- Parameters
save_request_id – identifier of a save origin request
full_info – whether to return detailed info for staff users
- Returns
A dictionary with the following keys:
type: loading task type
arguments: loading task arguments
id: loading task database identifier
backend_id: loading task celery identifier
scheduled: loading task scheduling date
ended: loading task termination date
status: loading task execution status
Depending on the availability of the task logs in the elasticsearch cluster of Software Heritage, the returned dictionary may also contain the following keys:
name: associated celery task name
message: relevant log message from task execution
duration: task execution time (only if it succeeded)
worker: name of the worker that executed the task
- Return type
Dict[str, Any]
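Because the result may be empty (archived task) and some keys are only present when task logs are available, callers need to treat the dictionary defensively. The helper below is a hypothetical consumer, not part of the module; it only assumes the key names listed above.

```python
from typing import Any, Dict


def describe_task_info(task_info: Dict[str, Any]) -> str:
    """Build a one-line summary from a task info dictionary."""
    # An empty dict means the task info was archived and removed.
    if not task_info:
        return "task info no longer available"
    summary = f"{task_info['type']} task {task_info['id']}: {task_info['status']}"
    # Log-derived keys are optional, so use .get() instead of indexing.
    worker = task_info.get("worker")
    if worker is not None:
        summary += f" (worker: {worker})"
    duration = task_info.get("duration")
    if duration is not None:
        summary += f" (duration: {duration})"
    return summary
```

Reading the optional keys with `.get()` keeps the helper usable whether or not the log lookup returned anything.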