swh.web.save_bulk.api_views module

class swh.web.save_bulk.api_views.OriginsDataCSVParser

Bases: BaseParser

media_type = 'text/csv'
parse(stream, media_type=None, parser_context=None)

Given a stream to read from, return the parsed representation. Should return parsed data, or a DataAndFiles object consisting of the parsed data and files.
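The actual body of parse is not shown here. As a rough illustration only, the following standard-library sketch shows what turning a two-column CSV stream into origin records could look like. The function name parse_origins_csv and its error handling are hypothetical and are not taken from swh.web.

```python
import csv
import io
from typing import BinaryIO


def parse_origins_csv(stream: BinaryIO, encoding: str = "utf-8") -> list[dict[str, str]]:
    """Read a CSV byte stream (URL in column 1, visit type in column 2).

    Hypothetical helper: it mimics the kind of work a CSV parser does,
    not the real OriginsDataCSVParser implementation.
    """
    text = stream.read().decode(encoding)
    origins = []
    # newline="" lets the csv module handle line endings itself
    for row in csv.reader(io.StringIO(text, newline="")):
        if not row:
            continue  # skip blank lines
        if len(row) < 2:
            raise ValueError(f"expected two columns, got: {row!r}")
        origins.append(
            {"origin_url": row[0].strip(), "visit_type": row[1].strip()}
        )
    return origins
```

The returned list of dictionaries uses the same origin_url and visit_type keys as the JSON and YAML schemas accepted by the bulk save endpoint.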

swh.web.save_bulk.api_views.api_origin_save_bulk(request: Request) → Response
POST /api/1/origin/save/bulk/

Request the saving of multiple software origins into the archive.

This endpoint lets you request the archival of multiple software origins through a POST request whose body contains a list of origin URLs and their visit types.

The following visit types are supported: bzr, cvs, hg, git, svn and tarball-directory.

The origins list data can be provided using the following content types:

  • text/csv (default)

    When using CSV format, the first column must contain the origin URLs and the second column the visit types.

    "https://git.example.org/user/project","git"
    "https://download.example.org/project/source.tar.gz","tarball-directory"
    

    To post the content of such a file to the endpoint, you can use the following curl command.

    $ curl -X POST -H "Authorization: Bearer ****" \
        --data-binary @/path/to/origins.csv \
        https://archive.softwareheritage.org/api/1/origin/save/bulk/
    
  • application/json

    When using JSON format, the following schema must be used.

    [
        {
            "origin_url": "https://git.example.org/user/project",
            "visit_type": "git"
        },
        {
            "origin_url": "https://download.example.org/project/source.tar.gz",
            "visit_type": "tarball-directory"
        }
    ]
    

    To post the content of such a file to the endpoint, you can use the following curl command.

    $ curl -X POST -H "Authorization: Bearer ****" \
        -H "Content-Type: application/json" \
        --data-binary @/path/to/origins.json \
        https://archive.softwareheritage.org/api/1/origin/save/bulk/
    
  • application/yaml

    When using YAML format, the following schema must be used.

    - origin_url: https://git.example.org/user/project
      visit_type: git
    
    - origin_url: https://download.example.org/project/source.tar.gz
      visit_type: tarball-directory
    

    To post the content of such a file to the endpoint, you can use the following curl command.

    $ curl -X POST -H "Authorization: Bearer ****" \
        -H "Content-Type: application/yaml" \
        --data-binary @/path/to/origins.yaml \
        https://archive.softwareheritage.org/api/1/origin/save/bulk/
    

Once received, the origins data are checked for correctness: URLs are validated and visit types are verified to be supported. A request is rejected if at least one origin is invalid, and all invalid origins are reported in the response to the rejected request.
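Since a single invalid origin causes the whole request to be rejected, it can help to check the list locally before posting it. The sketch below is a minimal client-side example using only the Python standard library: it pre-checks origins, serializes them to the CSV or JSON layouts shown above, and prepares (without sending) a POST request equivalent to the curl commands. The helper names are hypothetical, and the URL check is only an approximation; the server-side validation rules may be stricter.

```python
import csv
import io
import json
from urllib.parse import urlparse
from urllib.request import Request

SAVE_BULK_URL = "https://archive.softwareheritage.org/api/1/origin/save/bulk/"
# Visit types listed as supported in this documentation
SUPPORTED_VISIT_TYPES = {"bzr", "cvs", "hg", "git", "svn", "tarball-directory"}


def find_invalid_origins(origins):
    """Return (origin_url, visit_type, reason) for entries that look invalid.

    Assumption: a URL needs at least a scheme and a host to be plausible.
    """
    invalid = []
    for origin_url, visit_type in origins:
        parsed = urlparse(origin_url)
        if not (parsed.scheme and parsed.netloc):
            invalid.append((origin_url, visit_type, "malformed URL"))
        elif visit_type not in SUPPORTED_VISIT_TYPES:
            invalid.append((origin_url, visit_type, "unsupported visit type"))
    return invalid


def to_csv(origins) -> bytes:
    """Serialize origins as quoted two-column CSV (URL, visit type)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(origins)
    return buffer.getvalue().encode("utf-8")


def to_json(origins) -> bytes:
    """Serialize origins using the origin_url / visit_type JSON schema."""
    records = [{"origin_url": u, "visit_type": v} for u, v in origins]
    return json.dumps(records).encode("utf-8")


def build_request(body: bytes, content_type: str, token: str) -> Request:
    """Prepare the POST request; urllib.request.urlopen would send it."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": content_type,
    }
    return Request(SAVE_BULK_URL, data=body, headers=headers, method="POST")
```

A caller would typically run find_invalid_origins first, fix or drop the reported entries, and only then build the request with either to_csv (content type text/csv) or to_json (content type application/json).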

Warning

This endpoint is not publicly available: querying it requires authentication and a special user permission.

Request Headers:
  • Accept – the requested response content type, either application/json (default) or application/yaml

  • Content-Type – the content type of posted data, either text/csv (default), application/json or application/yaml

Response Headers:
Response JSON Object:
  • status (string) – either accepted or rejected

  • reason (string) – details about why a request got rejected

  • request_id (string) – request identifier (only when the request is accepted)

  • rejected_origins (array) – list of rejected origins and details about the reasons (only when the request is rejected)
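As an illustration of how a client might consume the response object described above, the following sketch returns the request identifier when the request is accepted and raises an exception carrying the reported details when it is rejected. The exception class and function name are hypothetical, and the entries of rejected_origins are treated as opaque values because their exact structure is not specified here.

```python
class SaveBulkRejected(Exception):
    """Raised when the endpoint reports status 'rejected' (hypothetical class)."""

    def __init__(self, reason, rejected_origins):
        super().__init__(reason)
        self.reason = reason
        self.rejected_origins = rejected_origins


def request_id_from_response(payload: dict) -> str:
    """Return request_id for an accepted request, raise otherwise."""
    status = payload.get("status")
    if status == "accepted":
        # request_id is only present when the request is accepted
        return payload["request_id"]
    if status == "rejected":
        # reason and rejected_origins are only present on rejection
        raise SaveBulkRejected(
            payload.get("reason", ""),
            payload.get("rejected_origins", []),
        )
    raise ValueError(f"unexpected status: {status!r}")
```

The returned identifier can then be used with the GET /api/1/origin/save/bulk/request/(request_id)/ endpoint to follow the archival of the submitted origins.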

Status Codes:
class swh.web.save_bulk.api_views.SumbittedOriginInfo

Bases: TypedDict

origin_url: str
visit_type: str
status: str
last_scheduling_date: str | None
last_visit_date: str | None
last_visit_status: str | None
last_snapshot_swhid: str | None
rejection_reason: str | None
browse_url: str | None
swh.web.save_bulk.api_views.api_origin_save_bulk_request_info(request: Request, request_id: UUID)
GET /api/1/origin/save/bulk/request/(request_id)/

Get feedback about loading statuses of origins submitted through a save bulk request.

This endpoint lets you track the archival statuses of origins submitted through the POST /api/1/origin/save/bulk/ endpoint. Info about the submitted origins is returned in a paginated way.

Note

Only origin visits whose dates are later than the request date are reported by this endpoint.

Warning

This endpoint is not publicly available: querying it requires authentication and a special user permission. Staff users are also allowed to query it.

Warning

Only the user that created a save bulk request or a staff user can get feedback about it.

Parameters:
  • request_id (string) – UUID identifier of a save bulk request

Query Parameters:
  • page (number) – The submitted origins info page number to retrieve

  • per_page (number) – Number of submitted origins info per page, defaults to 1000, maximum is 10000

Response JSON Array of Objects:
  • origin_url (string) – URL of submitted origin

  • visit_type (string) – visit type for the origin

  • status (string) – submitted origin status, either pending, accepted or rejected

  • last_scheduling_date (date) – ISO8601/RFC3339 representation of the last date (in UTC) when the origin was scheduled for loading into the archive, null if the origin got rejected

  • last_visit_date (date) – ISO8601/RFC3339 representation of the last date (in UTC) when the origin was visited by Software Heritage, null if the origin got rejected or was not visited yet

  • last_visit_status (string) – last visit status for the origin, either successful or failed, null if the origin got rejected or was not visited yet

  • last_snapshot_swhid (string) – last produced snapshot SWHID associated to the visit, null if the origin got rejected or was not visited yet

  • rejection_reason (string) – more details about the rejection, if the origin got rejected

  • browse_url (string) – URL to browse the submitted origin if it got accepted and loaded into the archive, null if the origin got rejected or was not visited yet

Request Headers:
  • Accept – the requested response content type, either application/json (default) or application/yaml

Response Headers:
  • Content-Type – this depends on Accept header of request

  • Link – indicates that a subsequent result page is available and contains the URL pointing to it

Status Codes:
  • 200 OK – no error

  • 401 Unauthorized – request is not authenticated

  • 403 Forbidden – user does not have permission to query the endpoint or to get feedback about a request they did not submit
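Because results are paginated and the Link response header points to the next page, a client usually loops until no further page is advertised. The sketch below assumes the header follows the usual RFC 8288 form, `<url>; rel="next"`, which is not spelled out in this documentation. The fetch callable is a hypothetical stand-in for the authenticated HTTP GET; it must return the decoded list of origin objects together with the raw Link header value (or None).

```python
import re
from collections import Counter
from typing import Callable, Iterator, Optional

# Assumed header shape: <https://.../?page=2>; rel="next"
_NEXT_LINK = re.compile(r'<([^>]+)>\s*;\s*rel="?next"?')


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Extract the URL of the next page from a Link header, if any."""
    if not link_header:
        return None
    match = _NEXT_LINK.search(link_header)
    return match.group(1) if match else None


def iter_submitted_origins(
    first_url: str,
    fetch: Callable[[str], tuple[list[dict], Optional[str]]],
) -> Iterator[dict]:
    """Yield every submitted origin info, following Link headers."""
    url: Optional[str] = first_url
    while url:
        origins, link_header = fetch(url)
        yield from origins
        url = next_page_url(link_header)


def count_by_status(origins) -> Counter:
    """Tally origins by their status field (pending, accepted or rejected)."""
    return Counter(origin["status"] for origin in origins)
```

Keeping the HTTP call behind fetch makes the loop easy to exercise with canned pages and leaves authentication and the choice of HTTP client to the caller.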

swh.web.save_bulk.api_views.api_origin_save_bulk_requests(request: Request)
GET /api/1/origin/save/bulk/requests/

List previously submitted save bulk requests.

This endpoint lists the save bulk requests submitted by your user account along with their info URLs (see GET /api/1/origin/save/bulk/request/(request_id)/). The list is returned in a paginated way if the number of requests is large.

Warning

This endpoint is not publicly available: querying it requires authentication and a special user permission.

Query Parameters:
  • page (number) – The submitted requests page number to retrieve

  • per_page (number) – Number of submitted requests per page, defaults to 1000, maximum is 10000
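As a small illustration of these query parameters, the following sketch builds the listing URL and clamps per_page to the documented maximum of 10000. The function name is hypothetical, the host is the one used in the curl examples for the bulk save endpoint, and the default of page=1 assumes 1-based page numbering, which is not stated here.

```python
from urllib.parse import urlencode

REQUESTS_URL = "https://archive.softwareheritage.org/api/1/origin/save/bulk/requests/"
DEFAULT_PER_PAGE = 1000  # documented default
MAX_PER_PAGE = 10000     # documented maximum


def requests_page_url(page: int = 1, per_page: int = DEFAULT_PER_PAGE) -> str:
    """Build the URL of one page of previously submitted save bulk requests."""
    # Keep per_page within [1, MAX_PER_PAGE] instead of sending an
    # out-of-range value to the server.
    per_page = max(1, min(per_page, MAX_PER_PAGE))
    query = urlencode({"page": page, "per_page": per_page})
    return f"{REQUESTS_URL}?{query}"
```

The resulting URL is meant to be fetched with the same Authorization header as the other endpoints.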

Response JSON Array of Objects:
  • request_id (string) – UUID identifier of the request

  • request_date (date) – the date the request was submitted

  • request_info_url (string) – URL to get detailed info about the request

Request Headers:
  • Accept – the requested response content type, either application/json (default) or application/yaml

Response Headers:
  • Content-Type – this depends on Accept header of request

  • Link – indicates that a subsequent result page is available and contains the URL pointing to it

Status Codes: