swh.web.save_code_now.origin_save module

swh.web.save_code_now.origin_save.get_origin_save_authorized_urls() → List[str]

Get the list of origin url prefixes authorized to be immediately loaded into the archive (whitelist).

Returns:

the list of authorized origin url prefixes

Return type:

list

swh.web.save_code_now.origin_save.get_origin_save_unauthorized_urls() → List[str]

Get the list of origin url prefixes forbidden to be loaded into the archive (blacklist).

Returns:

the list of unauthorized origin url prefixes

Return type:

list

swh.web.save_code_now.origin_save.can_save_origin(origin_url: str, bypass_pending_review: bool = False) → str

Check if a software origin can be saved into the archive.

Based on the origin url, the save request will be either:

  • immediately accepted if the url is whitelisted

  • rejected if the url is blacklisted

  • put in pending state for manual review otherwise

Parameters:

origin_url (str) – the software origin url to check

Returns:

the origin save request status, either accepted, rejected or pending

Return type:

str
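
The decision rule above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the swh-web implementation: decide_save_status and the example prefix lists are hypothetical, the blacklist is assumed to take precedence over the whitelist, and bypass_pending_review is assumed to turn a would-be pending request into an accepted one.

```python
from typing import Sequence


def decide_save_status(
    origin_url: str,
    authorized_prefixes: Sequence[str],
    unauthorized_prefixes: Sequence[str],
    bypass_pending_review: bool = False,
) -> str:
    """Hypothetical stand-in for the prefix-based decision described above."""
    # Assumption: a blacklisted prefix wins over any whitelist match.
    if any(origin_url.startswith(p) for p in unauthorized_prefixes):
        return "rejected"
    if any(origin_url.startswith(p) for p in authorized_prefixes):
        return "accepted"
    # Assumption: bypassing review accepts what would otherwise be pending.
    return "accepted" if bypass_pending_review else "pending"


# Example prefix lists, invented for illustration only.
AUTHORIZED = ["https://github.com/", "https://gitlab.com/"]
UNAUTHORIZED = ["https://github.com/blocked-org/"]

print(decide_save_status("https://github.com/user/repo", AUTHORIZED, UNAUTHORIZED))
print(decide_save_status("https://example.org/repo.git", AUTHORIZED, UNAUTHORIZED))
```

In the real module the prefix lists would presumably come from get_origin_save_authorized_urls() and get_origin_save_unauthorized_urls() rather than from constants.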

swh.web.save_code_now.origin_save.get_scheduler_load_task_types() → List[str]

swh.web.save_code_now.origin_save.get_savable_visit_types_dict(privileged_user: bool = False) → Dict

Return the supported task types the user has access to.

Parameters:

privileged_user – Flag to determine if all visit types should be returned or not. Defaults to False to only list unprivileged visit types.

Returns:

the dict of supported visit types for the user

swh.web.save_code_now.origin_save.get_savable_visit_types(privileged_user: bool = False) → List[str]

Return the list of visit types the user can perform save requests on.

Parameters:

privileged_user – Flag to determine if all visit types should be returned or not. Defaults to False to only list unprivileged visit types.

Returns:

the list of savable visit types
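
The effect of the privileged_user flag on both functions can be illustrated with a small sketch. Everything here is hypothetical: the mapping values, the choice of "archives" as the only privileged visit type, and the function names are assumptions made for the example, not values taken from swh-web.

```python
from typing import Dict, List

# Hypothetical mapping from visit type to a task type name; the real
# mapping and the set of privileged visit types are defined by swh-web.
ALL_VISIT_TYPES: Dict[str, str] = {
    "git": "example-task-git",
    "hg": "example-task-hg",
    "svn": "example-task-svn",
    "archives": "example-task-archives",
}
PRIVILEGED_ONLY = {"archives"}  # assumption made for this sketch


def savable_visit_types_dict(privileged_user: bool = False) -> Dict[str, str]:
    """Return every visit type for privileged users, a subset otherwise."""
    if privileged_user:
        return dict(ALL_VISIT_TYPES)
    return {k: v for k, v in ALL_VISIT_TYPES.items() if k not in PRIVILEGED_ONLY}


def savable_visit_types(privileged_user: bool = False) -> List[str]:
    """Return only the visit type names, sorted for stable output."""
    return sorted(savable_visit_types_dict(privileged_user))


print(savable_visit_types())
print(savable_visit_types(privileged_user=True))
```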

swh.web.save_code_now.origin_save.validate_origin_url(origin_url: str) → None

Check that an origin URL is well formed and does not contain a password.

Parameters:

origin_url – The URL to check

Raises:

BadInputExc – if one of the checks fails
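
A minimal sketch of this kind of validation, using only the standard library, might look as follows. It is not the swh-web code: check_origin_url and BadInputSketch are hypothetical names, and "well formed" is assumed here to mean that the URL has both a scheme and a network location.

```python
from urllib.parse import urlsplit


class BadInputSketch(ValueError):
    """Hypothetical stand-in for swh-web's BadInputExc."""


def check_origin_url(origin_url: str) -> None:
    """Raise BadInputSketch if the URL is malformed or embeds a password."""
    parts = urlsplit(origin_url)
    # Assumption: well formed means a scheme and a network location exist.
    if not parts.scheme or not parts.netloc:
        raise BadInputSketch(f"malformed origin url: {origin_url!r}")
    # urlsplit exposes the password from the user:password@host form, if any.
    if parts.password is not None:
        raise BadInputSketch("origin url must not contain a password")


check_origin_url("https://example.org/user/repo.git")  # passes silently
```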

swh.web.save_code_now.origin_save.origin_exists(origin_url: str) → OriginExistenceCheckInfo

Check the origin url for existence. If it exists, extract some more useful information on the origin.

swh.web.save_code_now.origin_save.create_save_origin_request(visit_type: str, origin_url: str, privileged_user: bool = False, user_id: int | None = None, from_webhook: bool = False, webhook_origin: str | None = None, **kwargs) → SaveOriginRequestInfo

Create a loading task to save a software origin into the archive.

This function aims to create a software origin loading task through the use of the swh-scheduler component.

First, some checks are performed to verify that the visit type and origin url are valid and that the save request can be accepted. For the ‘archives’ visit type, this also ensures the artifacts actually exist. If those checks pass, the loading task is created. Otherwise, the save request is put in pending or rejected state.

All the submitted save requests are logged into the swh-web database to keep track of them.

Parameters:
  • visit_type – the type of visit to perform (e.g. git, hg, svn, archives, …)

  • origin_url – the url of the origin to save

  • privileged_user – Whether the user has more privileges than a regular user (bypass review, access to privileged visit types)

  • user_id – User identifier (provided when authenticated)

  • from_webhook – Indicates if the save request is created from a webhook receiver

  • webhook_origin – Indicates which forge type sent the webhook

  • kwargs – Optional parameters (e.g. artifact_url, artifact_filename, artifact_version)

Raises:
  • BadInputExc – the visit type or origin url is invalid or does not exist

  • ForbiddenExc – the provided origin url is blacklisted

Returns:

A dict describing the save request with the following keys:

  • visit_type: the type of visit to perform

  • origin_url: the url of the origin

  • save_request_date: the date the request was submitted

  • save_request_status: the request status, either accepted, rejected or pending

  • save_task_status: the origin loading task status, either not created, pending, scheduled, running, succeeded or failed

Return type:

dict
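
The shape of the returned dict can be illustrated with a typed sketch. It covers only the keys listed above; SaveRequestSketch, has_valid_statuses and the example values are hypothetical, and the ISO 8601 date string is an assumption about the date format.

```python
from typing import TypedDict

# Status values as listed in the description above.
REQUEST_STATUSES = {"accepted", "rejected", "pending"}
TASK_STATUSES = {
    "not created", "pending", "scheduled", "running", "succeeded", "failed",
}


class SaveRequestSketch(TypedDict):
    """Hypothetical type holding only the documented keys."""

    visit_type: str
    origin_url: str
    save_request_date: str
    save_request_status: str
    save_task_status: str


def has_valid_statuses(request: SaveRequestSketch) -> bool:
    """Check both status fields against the documented value sets."""
    return (
        request["save_request_status"] in REQUEST_STATUSES
        and request["save_task_status"] in TASK_STATUSES
    )


example_request: SaveRequestSketch = {
    "visit_type": "git",
    "origin_url": "https://example.org/user/repo.git",
    "save_request_date": "2024-01-01T00:00:00+00:00",  # assumed format
    "save_request_status": "accepted",
    "save_task_status": "scheduled",
}
print(has_valid_statuses(example_request))
```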

swh.web.save_code_now.origin_save.update_save_origin_requests_from_queryset(requests_queryset: QuerySet) → List[SaveOriginRequestInfo]

Update all save requests from a SaveOriginRequest queryset, store their new status in the database, and return the list of impacted save requests.

Parameters:

requests_queryset – input SaveOriginRequest queryset

Returns:

A list of save origin request info dicts as described in swh.web.save_code_now.origin_save.create_save_origin_request()

Return type:

list

swh.web.save_code_now.origin_save.get_save_origin_requests_to_update(origin_url: str | None = None) → QuerySet

Get the set of recent save origin requests that have non-terminal statuses and require an update.

Non-terminal requests are those whose status is accepted and whose task status is either created, pending, scheduled or running.

Parameters:

origin_url – If provided, only return requests to update for the given origin URL

Returns:

Django queryset of requests to update
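
The non-terminal condition can be expressed as a simple predicate. The sketch below works on plain dicts instead of a Django queryset and ignores the "recent" criterion, whose time window is not specified here; needs_update and requests_to_update are hypothetical names.

```python
from typing import Iterable, List, Optional

NON_TERMINAL_TASK_STATUSES = {"created", "pending", "scheduled", "running"}


def needs_update(request: dict) -> bool:
    """True for an accepted request whose task has not reached a final state."""
    return (
        request["save_request_status"] == "accepted"
        and request["save_task_status"] in NON_TERMINAL_TASK_STATUSES
    )


def requests_to_update(
    requests: Iterable[dict], origin_url: Optional[str] = None
) -> List[dict]:
    """Filter non-terminal requests, optionally for a single origin URL."""
    return [
        r
        for r in requests
        if needs_update(r) and (origin_url is None or r["origin_url"] == origin_url)
    ]
```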

swh.web.save_code_now.origin_save.refresh_save_origin_request_statuses() → List[SaveOriginRequestInfo]

Refresh non-terminal save origin requests (SOR) in the backend.

Non-terminal SOR are requests whose status is accepted and whose task status is either created, pending, scheduled or running.

This computes the list of such save requests, checks their status in the scheduler, then updates them in the database.

Finally, this returns the refreshed information on those save requests.
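
The compute, check, update sequence can be sketched as follows. This is an illustration only: a list of dicts stands in for the database, a callable stands in for the scheduler lookup, and refresh_statuses is a hypothetical name.

```python
from typing import Callable, List

NON_TERMINAL = {"created", "pending", "scheduled", "running"}


def refresh_statuses(
    requests: List[dict], lookup_task_status: Callable[[dict], str]
) -> List[dict]:
    """Update non-terminal requests in place and return the refreshed ones."""
    refreshed = []
    for request in requests:
        # Step 1: keep only accepted requests whose task is not finished.
        if request["save_request_status"] != "accepted":
            continue
        if request["save_task_status"] not in NON_TERMINAL:
            continue
        # Step 2: ask the (stand-in) scheduler for the current task status.
        # Step 3: store it back in the (stand-in) database record.
        request["save_task_status"] = lookup_task_status(request)
        refreshed.append(request)
    return refreshed
```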

swh.web.save_code_now.origin_save.get_save_origin_requests(visit_type: str, origin_url: str) → List[SaveOriginRequestInfo]

Get all save requests for a given software origin.

Parameters:
  • visit_type – the type of visit

  • origin_url – the url of the origin

Raises:
Returns:

A list of save origin request dicts as described in swh.web.save_code_now.origin_save.create_save_origin_request()

Return type:

list

swh.web.save_code_now.origin_save.get_save_origin_request(request_id: int) → SaveOriginRequestInfo

Get the save request with the given identifier.

Parameters:

request_id – the save request identifier

Raises:

swh.web.utils.exc.NotFoundExc – no save request can be found for the given identifier

Returns:

A save origin request dict as described in swh.web.save_code_now.origin_save.create_save_origin_request()

swh.web.save_code_now.origin_save.get_save_origin_task_info(save_request_id: int) → Dict[str, Any]

Get detailed information about an accepted save origin request and its associated loading task.

If the associated loading task info is archived and removed from the scheduler database, returns an empty dictionary.

Parameters:

save_request_id – identifier of a save origin request

Returns:

A dictionary with the following keys:

  • type: loading task type

  • arguments: loading task arguments

  • id: loading task database identifier

  • backend_id: loading task celery identifier

  • scheduled: loading task scheduling date

  • ended: loading task termination date

  • status: loading task execution status

  • visit_status: Actual visit status

  • metadata: any other metadata related to the loading task; typically comes with the error for a failed task

Return type:

dict
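
Because the function may return an empty dictionary when the task info has been archived, callers need to handle that case before reading any key. A minimal sketch, assuming a hypothetical helper summarize_task_info and a "failed" status value for failed tasks:

```python
from typing import Any, Dict


def summarize_task_info(task_info: Dict[str, Any]) -> str:
    """Build a one-line summary, tolerating the empty-dict case."""
    if not task_info:
        # Documented behaviour: empty dict when the task info was archived.
        return "task info unavailable"
    summary = f"{task_info['type']} task {task_info['id']}: {task_info['status']}"
    # Assumption: metadata is only worth showing for a failed task.
    if task_info.get("status") == "failed" and task_info.get("metadata"):
        summary += f" (metadata: {task_info['metadata']})"
    return summary


print(summarize_task_info({}))
```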

swh.web.save_code_now.origin_save.schedule_origins_recurrent_visits(save_requests: List[SaveOriginRequestInfo]) → int

Schedule recurrent visits of origin URLs submitted to Save Code Now.

Parameters:

save_requests – List of save origin requests from which to schedule recurrent visits

Returns:

The number of origins that were scheduled for recurrent visits

swh.web.save_code_now.origin_save.has_pending_save_code_now_requests() → bool

Return True if at least one submitted save request requires manual validation by a staff member.