Origin

GET /api/1/origin/(origin_url)/get/

Get information about a software origin.

Parameters:
  • origin_url (string) – the origin url

Response JSON Object:
  • origin_visits_url (string) – link to GET /api/1/origin/(origin_url)/visits/ in order to get information about the visits for that origin

  • url (string) – the origin canonical url

  • metadata_authorities_url (string) – link to GET /api/1/raw-extrinsic-metadata/swhid/(target)/authorities/ to get the list of metadata authorities providing extrinsic metadata on this origin (and, indirectly, to the origin’s extrinsic metadata itself)

  • visit_types (array) – set of visit types for that origin

  • has_visits (boolean) – indicates if Software Heritage made at least one full visit of the origin

Request Headers:
  • Accept – the requested response content type, either application/json (default) or application/yaml

Response Headers:
  • Content-Type – this depends on Accept header of request

Status Codes:

Example:

https://archive.softwareheritage.org/api/1/origin/https://github.com/python/cpython/get/
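A minimal sketch of calling this endpoint with Python's standard library. The helper names `origin_get_url` and `fetch_origin` are illustrative and not part of any Software Heritage client. The base URL is taken from the example above, and the origin URL is assumed to be placed in the path unchanged, as that example does.

```python
import json
import urllib.request

# Base URL taken from the example above.
API_ROOT = "https://archive.softwareheritage.org/api/1"


def origin_get_url(origin_url: str) -> str:
    """Build the endpoint URL; the origin URL is embedded in the path as-is."""
    return f"{API_ROOT}/origin/{origin_url}/get/"


def fetch_origin(origin_url: str) -> dict:
    """Fetch origin information as a dict (performs a network request)."""
    request = urllib.request.Request(
        origin_get_url(origin_url),
        # application/json is the documented default; it is set explicitly here.
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `fetch_origin("https://github.com/python/cpython")` would return a dict with the fields listed above, such as `url`, `visit_types` and `has_visits`.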

GET /api/1/origin/search/(url_pattern)/

Search for software origins whose URLs contain a provided string pattern or match a provided regular expression. The search is case-insensitive.

Warning

This endpoint used to provide an offset query parameter, and guarantee an order on results. This is no longer true, and only the Link header should be used for paginating through results.

Parameters:
  • url_pattern (string) – a string pattern

Query Parameters:
  • use_ql (boolean) – whether to use swh search query language or not

  • limit (int) – the maximum number of found origins to return (bounded to 1000)

  • with_visit (boolean) – if true, only return origins with at least one visit by Software Heritage

  • visit_type (string) – if provided, only return origins with that specific visit type (currently the supported types are ???)

Response JSON Array of Objects:
  • origin_visits_url (string) – link to GET /api/1/origin/(origin_url)/visits/ in order to get information about the visits for that origin

  • url (string) – the origin canonical url

  • metadata_authorities_url (string) – link to GET /api/1/raw-extrinsic-metadata/swhid/(target)/authorities/ to get the list of metadata authorities providing extrinsic metadata on this origin (and, indirectly, to the origin’s extrinsic metadata itself)

  • visit_types (array) – set of visit types for that origin

  • has_visits (boolean) – indicates if Software Heritage made at least one full visit of the origin

Request Headers:
  • Accept – the requested response content type, either application/json (default) or application/yaml

Response Headers:
  • Content-Type – this depends on Accept header of request

  • Link – indicates that a subsequent result page is available and contains the url pointing to it

Status Codes:

Example:

https://archive.softwareheritage.org/api/1/origin/search/python/?limit=2
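Since only the Link header should be used for pagination, a client has to extract the next page URL from it. The sketch below assumes the header uses the common `<url>; rel="next"` form. `next_page_url`, `iter_origins` and the `fetch` callable are hypothetical names, and `fetch` is expected to return the decoded JSON array together with the raw Link header value (or None).

```python
import re
from typing import Callable, Iterator, List, Optional, Tuple

# Matches one `<url>; rel="name"` entry of a Link header (assumed format).
_LINK_ENTRY = re.compile(r'<([^>]*)>\s*;\s*rel="?([^",;]+)"?')

Page = Tuple[List[dict], Optional[str]]


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the URL tagged rel="next", or None when there is no next page."""
    if not link_header:
        return None
    for url, rel in _LINK_ENTRY.findall(link_header):
        if rel.strip() == "next":
            return url
    return None


def iter_origins(first_url: str, fetch: Callable[[str], Page]) -> Iterator[dict]:
    """Yield origins from every page, following Link headers until exhausted."""
    url: Optional[str] = first_url
    while url is not None:
        origins, link_header = fetch(url)
        yield from origins
        url = next_page_url(link_header)
```

Passing `fetch` in as a parameter keeps the pagination logic independent of the HTTP library used to perform the requests.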

GET /api/1/origin/(origin_url)/visits/

Get information about all visits of a software origin. Visits are returned sorted in descending order according to their date.

Parameters:
  • origin_url (str) – a software origin URL

Query Parameters:
  • per_page (int) – specify the number of visits to list, for pagination purposes

  • last_visit (int) – visit to start listing from, for pagination purposes

Request Headers:
  • Accept – the requested response content type, either application/json (default) or application/yaml

Response Headers:
  • Content-Type – this depends on Accept header of request

  • Link – indicates that a subsequent result page is available and contains the url pointing to it

Response JSON Array of Objects:
  • date (string) – ISO8601/RFC3339 representation of the visit date (in UTC)

  • origin (str) – the origin canonical url

  • origin_url (string) – link to get information about the origin

  • status (string) – status of the visit (either full, partial or ongoing)

  • visit (number) – the unique identifier of the visit

  • id (number) – the unique identifier of the origin

  • origin_visit_url (string) – link to GET /api/1/origin/(origin_url)/visit/(visit_id)/ in order to get information about the visit

  • snapshot (string) – the snapshot identifier of the visit (may be null if status is not full).

  • snapshot_url (string) – link to GET /api/1/snapshot/(snapshot_id)/ in order to get information about the snapshot of the visit (may be null if status is not full).

Status Codes:

Example:

https://archive.softwareheritage.org/api/1/origin/https://github.com/hylang/hy/visits/
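Because visits are returned newest first and `snapshot` may be null when the status is not full, a common need is to find the most recent usable snapshot. The following sketch works on an already decoded response array. The function name `latest_full_snapshot` is hypothetical, and any visit data passed to it in examples is made up.

```python
from typing import Iterable, Optional


def latest_full_snapshot(visits: Iterable[dict]) -> Optional[str]:
    """Return the snapshot identifier of the most recent full visit, or None.

    Relies on the documented ordering: visits are sorted by descending date,
    so the first full visit encountered is the most recent one.
    """
    for visit in visits:
        if visit.get("status") == "full" and visit.get("snapshot"):
            return visit["snapshot"]
    return None
```

When only the first page of results is inspected, the function returns None if that page contains no full visit, so a caller may need to follow the Link header to look further back.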

GET /api/1/origin/(origin_url)/visit/(visit_id)/

Get information about a specific visit of a software origin.

Parameters:
  • origin_url (str) – a software origin URL

  • visit_id (int) – a visit identifier

Request Headers:
  • Accept – the requested response content type, either application/json (default) or application/yaml

Response Headers:
  • Content-Type – this depends on Accept header of request

Response JSON Object:
  • date (string) – ISO8601/RFC3339 representation of the visit date (in UTC)

  • origin (str) – the origin canonical url

  • origin_url (string) – link to get information about the origin

  • status (string) – status of the visit (either full, partial or ongoing)

  • visit (number) – the unique identifier of the visit

  • snapshot (string) – the snapshot identifier of the visit (may be null if status is not full).

  • snapshot_url (string) – link to GET /api/1/snapshot/(snapshot_id)/ in order to get information about the snapshot of the visit (may be null if status is not full).

Status Codes:
  • 200 OK – no error

  • 404 Not Found – requested origin or visit cannot be found in the archive

Example:

https://archive.softwareheritage.org/api/1/origin/https://github.com/hylang/hy/visit/1/
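A short sketch of handling the two documented status codes with the standard library: a 404 is mapped to None, and any other HTTP error is re-raised. The names `visit_url` and `fetch_visit` and the injectable `opener` parameter are assumptions made for illustration; the base URL comes from the example above.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Callable, Optional

# Base URL taken from the example above.
API_ROOT = "https://archive.softwareheritage.org/api/1"


def visit_url(origin_url: str, visit_id: int) -> str:
    """Build the endpoint URL for one visit of an origin."""
    return f"{API_ROOT}/origin/{origin_url}/visit/{visit_id}/"


def fetch_visit(
    origin_url: str,
    visit_id: int,
    opener: Callable[[str], Any] = urllib.request.urlopen,
) -> Optional[dict]:
    """Return the visit as a dict, or None if the origin or visit is unknown."""
    try:
        with opener(visit_url(origin_url, visit_id)) as response:
            return json.load(response)
    except urllib.error.HTTPError as error:
        if error.code == 404:  # origin or visit not found in the archive
            return None
        raise
```

The `opener` parameter defaults to `urllib.request.urlopen` and exists only so the error handling can be exercised without network access.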

GET /api/1/origin/save/(visit_type)/url/(origin_url)/
POST /api/1/origin/save/(visit_type)/url/(origin_url)/
GET /api/1/origin/save/(request_id)/

Request the saving of a software origin into the archive or check the status of previously created save requests.

This endpoint makes it possible to create a saving task for a software origin through a POST request.

Depending on the provided origin URL, the save request can either be:

  • immediately accepted, for well-known code hosting providers such as GitHub or GitLab

  • rejected, in case the url is blacklisted by Software Heritage

  • put in pending state until a manual check is done in order to determine if it can be loaded or not

Once a saving request has been accepted, its associated saving task status can then be checked through a GET request on the same url. Returned status can either be:

  • not created: no saving task has been created

  • pending: saving task has been created and will be scheduled for execution

  • scheduled: the task execution has been scheduled

  • running: the task is currently being executed

  • succeeded: the saving task has been successfully executed

  • failed: the saving task has been executed but it failed

When issuing a POST request, an object is returned, while a GET request returns an array of objects (as multiple save requests might have been submitted for the same origin).

It is also possible to get info about a specific save request by sending a GET request to the /api/1/origin/save/(request_id)/ endpoint.
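The status values described above can be folded into a small decision helper for a polling client. This is only a sketch: `as_request_list` and `next_step` are hypothetical names, the returned strings are arbitrary labels, and the field names `save_request_status` and `save_task_status` are those of the response object documented for this endpoint.

```python
from typing import List, Union

# Task statuses after which no further change is expected.
TERMINAL_TASK_STATUSES = {"succeeded", "failed"}


def as_request_list(payload: Union[dict, List[dict]]) -> List[dict]:
    """Normalize a POST response (object) or GET response (array) to a list."""
    return payload if isinstance(payload, list) else [payload]


def next_step(save_request: dict) -> str:
    """Decide what a polling client should do for one save request."""
    request_status = save_request["save_request_status"]
    task_status = save_request["save_task_status"]
    if request_status == "rejected":
        return "stop: request rejected"
    if request_status == "pending":
        return "wait: manual check pending"
    if task_status in TERMINAL_TASK_STATUSES:
        return f"done: task {task_status}"
    # not created, pending, scheduled or running: check again later
    return "poll again"
```

Normalizing both response shapes to a list lets the same loop process the result of a POST and of a GET on the same URL.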

Parameters:
  • visit_type (string) – the type of visit to perform (currently the supported types are bzr, cvs, git, hg, and svn)

  • origin_url (string) – the url of the origin to save

  • request_id (number) – a save request identifier

Request Headers:
  • Accept – the requested response content type, either application/json (default) or application/yaml

Response Headers:
  • Content-Type – this depends on Accept header of request

Response JSON Object:
  • id (number) – the save request identifier

  • request_url (string) – Web API URL to follow up on that request

  • origin_url (string) – the url of the origin to save

  • visit_type (string) – the type of visit to perform

  • save_request_date (string) – the date (in iso format) the save request was issued

  • save_request_status (string) – the status of the save request, either accepted, rejected or pending

  • save_task_status (string) – the status of the origin saving task, either not created, pending, scheduled, running, succeeded or failed

  • visit_date (string) – the date (in iso format) of the visit if a visit occurred, null otherwise.

  • visit_status (string) – the status of the visit, either full, partial, not_found or failed if a visit occurred, null otherwise.

  • note (string) – optional note giving details about the save request, for instance why it has been rejected

  • snapshot_swhid (string) – SWHID of snapshot associated to the visit (null if it is missing or unknown)

  • snapshot_url (string) – Web API URL to retrieve snapshot data

  • from_webhook (boolean) – indicates if the save request was created from a popular forge webhook receiver (see POST /api/1/origin/save/webhook/github/ for instance)

  • webhook_origin (string) – indicates which forge type sent the webhook; the currently supported types are bitbucket, gitea, github, gitlab, and sourceforge

Status Codes:

POST /api/1/origin/save/webhook/bitbucket/

Webhook receiver for Bitbucket to request or update the archival of a repository when new commits are pushed to it.

To add such a webhook to one of your git repositories hosted on Bitbucket, please follow Bitbucket’s webhooks guide.

The webhook payload must use the application/json content type.

Please note that, to avoid abuse of the archival service offered by Software Heritage, at most one request per hour is created, so the effective loading of the repository into the archive might be delayed.

Response JSON Object:
  • id (number) – the save request identifier

  • request_url (string) – Web API URL to follow up on that request

  • origin_url (string) – the url of the origin to save

  • visit_type (string) – the type of visit to perform

  • save_request_date (string) – the date (in iso format) the save request was issued

  • save_request_status (string) – the status of the save request, either accepted, rejected or pending

  • save_task_status (string) – the status of the origin saving task, either not created, pending, scheduled, running, succeeded or failed

  • save_task_next_run (string) – the date and time from which the request is executed

Status Codes:
  • 200 OK – save request for repository has been successfully created from the webhook payload.

  • 400 Bad Request – no save request has been created due to invalid POST request or missing data in webhook payload

POST /api/1/origin/save/webhook/gitea/

Webhook receiver for Gitea to request or update the archival of a repository when new commits are pushed to it.

To add such a webhook to one of your git repositories hosted on Gitea, please follow Gitea’s webhooks guide.

The webhook payload must use the application/json content type.

Please note that, to avoid abuse of the archival service offered by Software Heritage, at most one request per hour is created, so the effective loading of the repository into the archive might be delayed.

Response JSON Object:
  • id (number) – the save request identifier

  • request_url (string) – Web API URL to follow up on that request

  • origin_url (string) – the url of the origin to save

  • visit_type (string) – the type of visit to perform

  • save_request_date (string) – the date (in iso format) the save request was issued

  • save_request_status (string) – the status of the save request, either accepted, rejected or pending

  • save_task_status (string) – the status of the origin saving task, either not created, pending, scheduled, running, succeeded or failed

  • save_task_next_run (string) – the date and time from which the request is executed

Status Codes:
  • 200 OK – save request for repository has been successfully created from the webhook payload.

  • 400 Bad Request – no save request has been created due to invalid POST request or missing data in webhook payload

POST /api/1/origin/save/webhook/github/

Webhook receiver for GitHub to request or update the archival of a repository when new commits are pushed to it.

To add such a webhook to one of your git repositories hosted on GitHub, please follow GitHub’s webhooks guide.

The webhook payload must use the application/json content type.

Please note that, to avoid abuse of the archival service offered by Software Heritage, at most one request per hour is created, so the effective loading of the repository into the archive might be delayed.

Response JSON Object:
  • id (number) – the save request identifier

  • request_url (string) – Web API URL to follow up on that request

  • origin_url (string) – the url of the origin to save

  • visit_type (string) – the type of visit to perform

  • save_request_date (string) – the date (in iso format) the save request was issued

  • save_request_status (string) – the status of the save request, either accepted, rejected or pending

  • save_task_status (string) – the status of the origin saving task, either not created, pending, scheduled, running, succeeded or failed

  • save_task_next_run (string) – the date and time from which the request is executed

Status Codes:
  • 200 OK – save request for repository has been successfully created from the webhook payload.

  • 400 Bad Request – no save request has been created due to invalid POST request or missing data in webhook payload

POST /api/1/origin/save/webhook/gitlab/

Webhook receiver for GitLab to request or update the archival of a repository when new commits are pushed to it.

To add such a webhook to one of your git repositories hosted on GitLab, please follow GitLab’s webhooks guide.

The webhook payload must use the application/json content type.

Please note that, to avoid abuse of the archival service offered by Software Heritage, at most one request per hour is created, so the effective loading of the repository into the archive might be delayed.

Response JSON Object:
  • id (number) – the save request identifier

  • request_url (string) – Web API URL to follow up on that request

  • origin_url (string) – the url of the origin to save

  • visit_type (string) – the type of visit to perform

  • save_request_date (string) – the date (in iso format) the save request was issued

  • save_request_status (string) – the status of the save request, either accepted, rejected or pending

  • save_task_status (string) – the status of the origin saving task, either not created, pending, scheduled, running, succeeded or failed

  • save_task_next_run (string) – the date and time from which the request is executed

Status Codes:
  • 200 OK – save request for repository has been successfully created from the webhook payload.

  • 400 Bad Request – no save request has been created due to invalid POST request or missing data in webhook payload

POST /api/1/origin/save/webhook/sourceforge/

Webhook receiver for SourceForge to request or update the archival of a repository when new commits are pushed to it.

To add such a webhook to one of your git, hg or svn repositories hosted on SourceForge, please follow SourceForge’s webhooks guide.

The webhook payload must use the application/json content type.

Please note that, to avoid abuse of the archival service offered by Software Heritage, at most one request per hour is created, so the effective loading of the repository into the archive might be delayed.

Response JSON Object:
  • id (number) – the save request identifier

  • request_url (string) – Web API URL to follow up on that request

  • origin_url (string) – the url of the origin to save

  • visit_type (string) – the type of visit to perform

  • save_request_date (string) – the date (in iso format) the save request was issued

  • save_request_status (string) – the status of the save request, either accepted, rejected or pending

  • save_task_status (string) – the status of the origin saving task, either not created, pending, scheduled, running, succeeded or failed

  • save_task_next_run (string) – the date and time from which the request is executed

Status Codes:
  • 200 OK – save request for repository has been successfully created from the webhook payload.

  • 400 Bad Request – no save request has been created due to invalid POST request or missing data in webhook payload
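The five webhook receivers share one URL pattern, so the address to configure on a forge can be derived from the forge name. The sketch below assumes the archive host used in the examples earlier in this section; `webhook_receiver_url` and `SUPPORTED_FORGES` are hypothetical names, and the forge list mirrors the receivers documented above.

```python
# Host taken from the example URLs earlier in this section (an assumption
# for any other deployment).
API_ROOT = "https://archive.softwareheritage.org/api/1"

# Forge types with a documented webhook receiver.
SUPPORTED_FORGES = ("bitbucket", "gitea", "github", "gitlab", "sourceforge")


def webhook_receiver_url(forge: str) -> str:
    """Return the webhook receiver URL to configure on the given forge."""
    if forge not in SUPPORTED_FORGES:
        raise ValueError(f"no webhook receiver documented for {forge!r}")
    return f"{API_ROOT}/origin/save/webhook/{forge}/"
```

The returned URL is the value to enter as the payload URL in the forge's webhook settings, with application/json selected as the content type.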