swh-web API URLs
Content
- GET /api/1/content/known/(sha1)[,(sha1), ...,(sha1)]/
Check whether one or more contents (aka “blobs”) are present in the archive, based on their sha1 checksums.
- Parameters:
sha1 (string) – hexadecimal representation of the sha1 checksum of a content whose existence should be checked. Multiple values can be provided, separated by ‘,’.
- Request Headers:
Accept – the requested response content type, either application/json (default) or application/yaml
- Response Headers:
Content-Type – this depends on Accept header of request
- Response JSON Object:
search_res (array) – array holding the search result for each provided sha1
search_stats (object) – statistics about the number of sha1 values provided and the percentage of those found in the archive
- Status Codes:
200 OK – no error
400 Bad Request – an invalid sha1 has been provided
Example:
https://archive.softwareheritage.org/api/1/content/known/dc2830a9e72f23c1dfebef4413003221baa5fb62,0c3f19cb47ebfbe643fb19fa94c874d18fa62d12/
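A minimal Python sketch (standard library only) that builds this URL client-side and rejects malformed checksums before they reach the server. The names API_ROOT and content_known_url are hypothetical, and treating a valid sha1 as exactly 40 lowercase hexadecimal digits is an assumption based on the sha1 digest size.

```python
import re
from typing import Iterable

# Public instance used in the examples of this page (assumption: adjust for other deployments).
API_ROOT = "https://archive.softwareheritage.org/api/1"

# A sha1 digest is 20 bytes, i.e. 40 hexadecimal digits.
_SHA1_RE = re.compile(r"[0-9a-f]{40}")


def content_known_url(sha1s: Iterable[str]) -> str:
    """Build a GET /api/1/content/known/ URL for one or more sha1 checksums (hypothetical helper)."""
    checksums = [s.strip().lower() for s in sha1s]
    if not checksums:
        raise ValueError("at least one sha1 checksum is required")
    for checksum in checksums:
        # Reject malformed values locally instead of waiting for a 400 Bad Request.
        if not _SHA1_RE.fullmatch(checksum):
            raise ValueError(f"invalid sha1 checksum: {checksum!r}")
    # Multiple checksums are joined with ',' as described above.
    return f"{API_ROOT}/content/known/{','.join(checksums)}/"
```

The resulting URL can then be fetched with urllib.request.urlopen and decoded with json.load to read search_res and search_stats.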
- GET /api/1/content/[(hash_type):](hash)/
Get information about a content (aka a “blob”) object. In the archive, a content object is identified based on checksum values computed using various hashing algorithms.
- Parameters:
hash_type (string) – optional parameter specifying which hashing algorithm has been used to compute the content checksum. It can be either sha1, sha1_git, sha256 or blake2s256. If that parameter is not provided, it is assumed that the hashing algorithm used is sha1.
hash (string) – hexadecimal representation of the checksum value computed with the specified hashing algorithm.
- Request Headers:
Accept – the requested response content type, either application/json (default) or application/yaml
- Response Headers:
Content-Type – this depends on Accept header of request
- Response JSON Object:
checksums (object) – object holding the computed checksum values for the requested content
data_url (string) – link to GET /api/1/content/[(hash_type):](hash)/raw/ for downloading the content raw bytes
filetype_url (string) – link to GET /api/1/content/[(hash_type):](hash)/filetype/ for getting information about the content MIME type
language_url (string) – link to GET /api/1/content/[(hash_type):](hash)/language/ for getting information about the programming language used in the content
length (number) – length of the content in bytes
license_url (string) – link to GET /api/1/content/[(hash_type):](hash)/license/ for getting information about the license of the content
- Status Codes:
200 OK – no error
400 Bad Request – an invalid hash_type or hash has been provided
404 Not Found – requested content can not be found in the archive
Example:
https://archive.softwareheritage.org/api/1/content/sha1_git:fe95a46679d128ff167b7c55df5d02356c5a1ae1/
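The hash_type prefix and the per-content sub-resources documented on this page can be combined into one URL builder. This is a hedged sketch: content_url, HASH_TYPES and CONTENT_SUFFIXES are hypothetical names, and the set of suffixes simply mirrors the endpoints listed in this section.

```python
from typing import Optional

API_ROOT = "https://archive.softwareheritage.org/api/1"

# Values accepted by the optional hash_type parameter, as listed above.
HASH_TYPES = {"sha1", "sha1_git", "sha256", "blake2s256"}

# Sub-resources documented on this page for a content object ("" = the content itself).
CONTENT_SUFFIXES = {"", "raw", "filetype", "language", "license"}


def content_url(hash_value: str, hash_type: Optional[str] = None, suffix: str = "") -> str:
    """Build a GET /api/1/content/[(hash_type):](hash)/[suffix/] URL (hypothetical helper)."""
    if hash_type is not None and hash_type not in HASH_TYPES:
        raise ValueError(f"unsupported hash_type: {hash_type!r}")
    if suffix not in CONTENT_SUFFIXES:
        raise ValueError(f"unknown content sub-resource: {suffix!r}")
    # Without a hash_type prefix the server assumes sha1.
    prefix = f"{hash_type}:" if hash_type else ""
    url = f"{API_ROOT}/content/{prefix}{hash_value.lower()}/"
    return f"{url}{suffix}/" if suffix else url
```

For instance, content_url(h, "sha1_git") reproduces the example URL above, and passing suffix="raw" gives the raw download URL.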
- GET /api/1/content/[(hash_type):](hash)/raw/
Get the raw content of a content object (aka a “blob”), as a byte sequence.
- Parameters:
hash_type (string) – optional parameter specifying which hashing algorithm has been used to compute the content checksum. It can be either sha1, sha1_git, sha256 or blake2s256. If that parameter is not provided, it is assumed that the hashing algorithm used is sha1.
hash (string) – hexadecimal representation of the checksum value computed with the specified hashing algorithm.
- Query Parameters:
filename (string) – if provided, the downloaded content will get that filename
- Response Headers:
Content-Type – application/octet-stream
- Status Codes:
200 OK – no error
400 Bad Request – an invalid hash_type or hash has been provided
404 Not Found – requested content can not be found in the archive
Example:
https://archive.softwareheritage.org/api/1/content/sha1:dc2830a9e72f23c1dfebef4413003221baa5fb62/raw/
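Since content objects are addressed by checksum, the raw bytes returned by this endpoint can be verified locally. The sketch below recomputes each supported checksum with hashlib; content_checksums and matches are hypothetical helpers, and it assumes that sha1_git is Git's blob identifier (sha1 over a "blob <length>\0" header followed by the bytes) and that blake2s256 is unkeyed BLAKE2s with a 256-bit digest.

```python
import hashlib


def content_checksums(data: bytes) -> dict:
    """Compute, locally, the checksum types accepted by hash_type (hypothetical helper)."""
    # sha1_git is assumed to follow Git's blob hashing: sha1 over "blob <length>\0" + bytes.
    git_header = b"blob %d\x00" % len(data)
    return {
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha1_git": hashlib.sha1(git_header + data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        # BLAKE2s with its default 32-byte (256-bit) digest.
        "blake2s256": hashlib.blake2s(data).hexdigest(),
    }


def matches(data: bytes, hash_value: str, hash_type: str = "sha1") -> bool:
    """Check that downloaded raw bytes hash back to the requested checksum."""
    return content_checksums(data)[hash_type] == hash_value.lower()
```

A client could call matches(raw_bytes, requested_hash, hash_type) after each download and discard mismatching responses.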
- GET /api/1/content/[(hash_type):](hash)/filetype/
Get information about the detected MIME type of a content object.
- Parameters:
hash_type (string) – optional parameter specifying which hashing algorithm has been used to compute the content checksum. It can be either sha1, sha1_git, sha256 or blake2s256. If that parameter is not provided, it is assumed that the hashing algorithm used is sha1.
hash (string) – hexadecimal representation of the checksum value computed with the specified hashing algorithm.
- Response JSON Object:
content_url (object) – link to GET /api/1/content/[(hash_type):](hash)/ for getting information about the content
encoding (string) – the detected content encoding
id (string) – the sha1 identifier of the content
mimetype (string) – the detected MIME type of the content
tool (object) – information about the tool used to detect the content filetype
- Request Headers:
Accept – the requested response content type, either application/json (default) or application/yaml
- Response Headers:
Content-Type – this depends on Accept header of request
- Status Codes:
200 OK – no error
400 Bad Request – an invalid hash_type or hash has been provided
404 Not Found – requested content can not be found in the archive
Example:
https://archive.softwareheritage.org/api/1/content/sha1:dc2830a9e72f23c1dfebef4413003221baa5fb62/filetype/
- GET /api/1/content/[(hash_type):](hash)/language/
Get information about the programming language used in a content object.
Note: this endpoint currently returns no data.
- Parameters:
hash_type (string) – optional parameter specifying which hashing algorithm has been used to compute the content checksum. It can be either sha1, sha1_git, sha256 or blake2s256. If that parameter is not provided, it is assumed that the hashing algorithm used is sha1.
hash (string) – hexadecimal representation of the checksum value computed with the specified hashing algorithm.
- Response JSON Object:
content_url (object) – link to GET /api/1/content/[(hash_type):](hash)/ for getting information about the content
id (string) – the sha1 identifier of the content
lang (string) – the detected programming language if any
tool (object) – information about the tool used to detect the programming language
- Request Headers:
Accept – the requested response content type, either application/json (default) or application/yaml
- Response Headers:
Content-Type – this depends on Accept header of request
- Status Codes:
200 OK – no error
400 Bad Request – an invalid hash_type or hash has been provided
404 Not Found – requested content can not be found in the archive
Example:
https://archive.softwareheritage.org/api/1/content/sha1:dc2830a9e72f23c1dfebef4413003221baa5fb62/language/
- GET /api/1/content/[(hash_type):](hash)/license/
Get information about the license of a content object.
- Parameters:
hash_type (string) – optional parameter specifying which hashing algorithm has been used to compute the content checksum. It can be either sha1, sha1_git, sha256 or blake2s256. If that parameter is not provided, it is assumed that the hashing algorithm used is sha1.
hash (string) – hexadecimal representation of the checksum value computed with the specified hashing algorithm.
- Response JSON Object:
content_url (object) – link to GET /api/1/content/[(hash_type):](hash)/ for getting information about the content
id (string) – the sha1 identifier of the content
licenses (array) – array of strings containing the detected license names
tool (object) – information about the tool used to detect the license
- Request Headers:
Accept – the requested response content type, either application/json (default) or application/yaml
- Response Headers:
Content-Type – this depends on Accept header of request
- Status Codes:
200 OK – no error
400 Bad Request – an invalid hash_type or hash has been provided
404 Not Found – requested content can not be found in the archive
Example:
https://archive.softwareheritage.org/api/1/content/sha1:dc2830a9e72f23c1dfebef4413003221baa5fb62/license/
Directory
- GET /api/1/directory/(sha1_git)/[(path)/]
Get information about directory objects. Directories are identified by sha1 checksums, compatible with Git directory identifiers. See swh.model.git_objects.directory_git_object() in our data model module for details about how they are computed.
When given only a directory identifier, this endpoint returns information about the directory itself, i.e. its content (usually a list of directory entries). When given a directory identifier and a path, this endpoint returns information about the directory entry pointed to by the relative path, with path resolution starting from the given directory.
- Parameters:
sha1_git (string) – hexadecimal representation of the directory sha1_git identifier
path (string) – optional parameter to get information about the directory entry pointed by that relative path
- Request Headers:
Accept – the requested response content type, either application/json (default) or application/yaml
- Response Headers:
Content-Type – this depends on Accept header of request
- Response JSON Array of Objects:
checksums (object) – object holding the computed checksum values for a directory entry (only for file entries)
dir_id (string) – sha1_git identifier of the requested directory
length (number) – length of a directory entry in bytes (only for file entries)
name (string) – the directory entry name
perms (number) – permissions for the directory entry
target (string) – sha1_git identifier of the directory entry
target_url (string) – link to GET /api/1/content/[(hash_type):](hash)/ or GET /api/1/directory/(sha1_git)/[(path)/] depending on the directory entry type
type (string) – the type of the directory entry, can be either dir, file or rev
- Status Codes:
200 OK – no error
400 Bad Request – an invalid sha1_git value has been provided
404 Not Found – requested directory can not be found in the archive
Example:
https://archive.softwareheritage.org/api/1/directory/977fc4b98c0e85816348cebd3b12026407c368b6/
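The optional path segment and the type field of each entry lend themselves to two small helpers. This is a sketch under stated assumptions: directory_url and entry_names are hypothetical names, the path is percent-encoded with urllib.parse.quote (which leaves "/" separators intact), and the entries are assumed to be the decoded JSON array described above.

```python
from typing import Iterable, List, Optional
from urllib.parse import quote

API_ROOT = "https://archive.softwareheritage.org/api/1"


def directory_url(sha1_git: str, path: Optional[str] = None) -> str:
    """Build GET /api/1/directory/(sha1_git)/[(path)/] (hypothetical helper)."""
    url = f"{API_ROOT}/directory/{sha1_git.lower()}/"
    if path and path.strip("/"):
        # The path is resolved relative to the given directory; quote() keeps '/' separators.
        url += quote(path.strip("/")) + "/"
    return url


def entry_names(entries: Iterable[dict], entry_type: str) -> List[str]:
    """Names of the directory entries of one type: "dir", "file" or "rev"."""
    if entry_type not in {"dir", "file", "rev"}:
        raise ValueError(f"unknown entry type: {entry_type!r}")
    return [entry["name"] for entry in entries if entry.get("type") == entry_type]
```

Filtering client-side keeps the request identical to the documented endpoint; no extra query parameter is assumed.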
Graph
- GET /api/1/graph/(graph_query)/
Provide fast access to the graph representation of the Software Heritage archive.
This endpoint acts as a proxy for the Software Heritage Graph service. For more details, please refer to the Graph RPC API documentation.
Warning
This endpoint is not publicly available: it requires authentication and a special user permission.
- Parameters:
graph_query (string) – query to forward to the Software Heritage Graph service (see its documentation)
- Query Parameters:
resolve_origins (boolean) – extra parameter defined by this proxy that resolves origin URLs from their sha1 representations
- Status Codes:
200 OK – no error
400 Bad Request – an invalid graph query has been provided
404 Not Found – provided graph node cannot be found
Examples:
https://archive.softwareheritage.org/api/1/graph/leaves/swh:1:dir:432d1b21c1256f7408a07c577b6974bbdbcc1323/
https://archive.softwareheritage.org/api/1/graph/neighbors/swh:1:rev:f39d7d78b70e0f39facb1e4fab77ad3df5c52a35/
https://archive.softwareheritage.org/api/1/graph/visit/nodes/swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc?direction=backward&resolve_origins=true
https://archive.softwareheritage.org/api/1/graph/visit/edges/swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc?direction=backward&resolve_origins=true
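Because this endpoint requires authentication, requests must carry credentials. The sketch below only prepares such a request; graph_request is a hypothetical helper, and sending the API token as an "Authorization: Bearer" header is an assumption to check against the Software Heritage authentication documentation.

```python
import urllib.request
from urllib.parse import urlencode

API_ROOT = "https://archive.softwareheritage.org/api/1"


def graph_request(graph_query: str, token: str, **params: str) -> urllib.request.Request:
    """Prepare an authenticated GET /api/1/graph/(graph_query)/ request (hypothetical helper)."""
    url = f"{API_ROOT}/graph/{graph_query.strip('/')}/"
    if params:
        # Forwarded query parameters, e.g. direction="backward", resolve_origins="true".
        url += "?" + urlencode(params)
    headers = {
        # Assumption: the API token is sent as a bearer token.
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(url, headers=headers)
```

The prepared request can be passed to urllib.request.urlopen; without a token holding the required permission, the server should refuse it.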
SWHIDs (persistent identifiers)
- GET /api/1/resolve/(swhid)/
Resolve a SoftWare Heritage persistent IDentifier (SWHID)
Try to resolve a provided SoftWare Heritage persistent IDentifier into a URL for browsing the pointed archive object.
If the provided identifier is valid, the existence of the object in the archive is also checked.
- Parameters:
swhid (string) – a SoftWare Heritage persistent IDentifier
- Response JSON Object:
browse_url (string) – the url for browsing the pointed object
metadata (object) – object holding optional parts of the SWHID
namespace (string) – the SWHID namespace
object_id (string) – the hash identifier of the pointed object
object_type (string) – the type of the pointed object
scheme_version (number) – the scheme version of the SWHID
- Request Headers:
Accept – the requested response content type, either application/json (default) or application/yaml
- Response Headers:
Content-Type – this depends on Accept header of request
- Status Codes:
200 OK – no error
400 Bad Request – an invalid SWHID has been provided
404 Not Found – the pointed object does not exist in the archive
Example:
https://archive.softwareheritage.org/api/1/resolve/swh:1:rev:96db9023b881d7cd9f379b0c154650d6c108e9a3;origin=https://github.com/openssl/openssl/
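The server performs the authoritative validation, but splitting a SWHID into the parts described by this response can help catch typos before sending it. A hedged sketch: parse_swhid and ParsedSWHID are hypothetical, only the core object types cnt, dir, rev, rel and snp with a 40-digit hexadecimal id are accepted, and everything after the first ';' is treated as key=value qualifiers. The field names mirror the response, but value spellings (for instance of object_type) may differ from what the server returns.

```python
import re
from typing import Dict, NamedTuple

# Core SWHID: swh:1:<object type>:<40 hex digits>
_CORE_RE = re.compile(r"(swh):(1):(cnt|dir|rev|rel|snp):([0-9a-f]{40})")


class ParsedSWHID(NamedTuple):
    namespace: str
    scheme_version: int
    object_type: str
    object_id: str
    metadata: Dict[str, str]  # qualifiers such as origin=...


def parse_swhid(swhid: str) -> ParsedSWHID:
    """Split a SWHID into core parts and qualifiers (hypothetical helper)."""
    core, *qualifiers = swhid.split(";")
    match = _CORE_RE.fullmatch(core)
    if match is None:
        raise ValueError(f"invalid core SWHID: {core!r}")
    metadata: Dict[str, str] = {}
    for qualifier in qualifiers:
        if not qualifier:
            continue  # tolerate a trailing ';'
        key, sep, value = qualifier.partition("=")
        if not sep:
            raise ValueError(f"malformed qualifier: {qualifier!r}")
        metadata[key] = value
    namespace, version, object_type, object_id = match.groups()
    return ParsedSWHID(namespace, int(version), object_type, object_id, metadata)
```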
- POST /api/1/known/
Check whether a list of objects is present in the Software Heritage archive.
The objects whose existence should be checked must be provided as SoftWare Heritage persistent IDentifiers.
- Request JSON Array of Objects:
- (string) – input array of SWHIDs; its length cannot exceed 1000.
- Response JSON Object:
<swhid> (object) –
an object whose keys are the input SWHIDs and whose values are objects with the following keys:
known (bool): whether the object was found
- Request Headers:
Accept – the requested response content type, either application/json (default) or application/yaml
- Response Headers:
Content-Type – this depends on Accept header of request
- Status Codes:
200 OK – no error
400 Bad Request – an invalid SWHID was provided
413 Request Entity Too Large – the input array of SWHIDs is too large
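To check more than 1000 SWHIDs without hitting 413 Request Entity Too Large, the input can be split into batches. This sketch prepares one POST request per batch; batched, known_request and MAX_SWHIDS_PER_REQUEST are hypothetical names, and only the 1000 limit and the JSON array body come from the description above.

```python
import json
import urllib.request
from typing import Iterator, List

API_ROOT = "https://archive.softwareheritage.org/api/1"

# Documented limit; larger input arrays are rejected with 413.
MAX_SWHIDS_PER_REQUEST = 1000


def batched(swhids: List[str], size: int = MAX_SWHIDS_PER_REQUEST) -> Iterator[List[str]]:
    """Split a list of SWHIDs into chunks no larger than the documented limit."""
    for start in range(0, len(swhids), size):
        yield swhids[start:start + size]


def known_request(batch: List[str]) -> urllib.request.Request:
    """Prepare a POST /api/1/known/ request for at most 1000 SWHIDs (hypothetical helper)."""
    if len(batch) > MAX_SWHIDS_PER_REQUEST:
        raise ValueError("too many SWHIDs for a single request")
    return urllib.request.Request(
        f"{API_ROOT}/known/",
        data=json.dumps(batch).encode("utf-8"),  # JSON array of SWHID strings
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
```

Since each response is keyed by SWHID, the results of several batches can be merged with dict.update.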
- GET /api/1/raw/(swhid)/
Get the object corresponding to the SWHID in raw form.
This endpoint exposes the internal representation (see the *_git_object functions in swh.model.git_objects), and so can be used to fetch a binary blob which hashes to the same identifier.
- Parameters:
swhid (string) – the object’s SWHID
- Response Headers:
Content-Type – application/octet-stream
- Status Codes:
200 OK – no error
404 Not Found – the requested object can not be found in the archive
Example:
https://archive.softwareheritage.org/api/1/raw/swh:1:snp:6a3a2cf0b2b90ce7ae1cf0a221ed68035b686f5a
Origin
- GET /api/1/origin/(origin_url)/get/
Get information about a software origin.
- Parameters:
origin_url (string) – the origin url
- Response JSON Object:
origin_visits_url (string) – link to get information about the visits for that origin
url (string) – the origin canonical url
metadata_authorities_url (string) – link to GET /api/1/raw-extrinsic-metadata/swhid/(target)/authorities/ to get the list of metadata authorities providing extrinsic metadata on this origin (and, indirectly, to the origin’s extrinsic metadata itself)
- Request Headers:
Accept – the requested response content type, either application/json (default) or application/yaml
- Response Headers:
Content-Type – this depends on Accept header of request
- Status Codes:
200 OK – no error
404 Not Found – requested origin can not be found in the archive
Example:
https://archive.softwareheritage.org/api/1/origin/https://github.com/python/cpython/get/
- GET /api/1/origin/search/(url_pattern)/
Search for software origins whose URLs contain a provided string pattern or match a provided regular expression. The search is case-insensitive.
Warning
This endpoint used to provide an offset query parameter and to guarantee an order on results. This is no longer the case: only the Link header should be used for paginating through results.
- Parameters:
url_pattern (string) – a string pattern
- Query Parameters:
use_ql (boolean) – whether to use swh search query language or not
limit (int) – the maximum number of found origins to return (bounded to 1000)
with_visit (boolean) – if true, only return origins with at least one visit by Software Heritage
visit_type (string) – if provided, only return origins with that specific visit type (currently the supported types are ???)
- Response JSON Array of Objects:
origin_visits_url (string) – link to get information about the visits for that origin
url (string) – the origin canonical url
metadata_authorities_url (string) – link to GET /api/1/raw-extrinsic-metadata/swhid/(target)/authorities/ to get the list of metadata authorities providing extrinsic metadata on this origin (and, indirectly, to the origin’s extrinsic metadata itself)
- Request Headers:
Accept – the requested response content type, either application/json (default) or application/yaml
- Response Headers:
Content-Type – this depends on Accept header of request
Link – indicates that a subsequent result page is available and contains the url pointing to it
- Status Codes:
200 OK – no error
Example:
https://archive.softwareheritage.org/api/1/origin/search/python/?limit=2
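Since pagination relies only on the Link header, a client can keep following the URL tagged rel="next" until none is returned. A minimal sketch, assuming the header uses the usual `<url>; rel="next"` syntax; next_page_url and iter_pages are hypothetical helpers, and the regular expression is a simplified parser rather than a full RFC 8288 implementation.

```python
import json
import re
import urllib.request
from typing import Iterator, Optional

# Matches entries such as: <https://...>; rel="next"
_LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,]+)"?')


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the URL tagged rel="next" in a Link header, or None on the last page."""
    if not link_header:
        return None
    for url, rel in _LINK_RE.findall(link_header):
        if rel == "next":
            return url
    return None


def iter_pages(url: str) -> Iterator[list]:
    """Yield each JSON result page, following Link headers until none is left."""
    next_url: Optional[str] = url
    while next_url:
        with urllib.request.urlopen(next_url) as response:
            page = json.load(response)
            link = response.headers.get("Link")
        yield page
        next_url = next_page_url(link)
```

For example, iter_pages("https://archive.softwareheritage.org/api/1/origin/search/python/?limit=2") would walk every result page; the connection is closed before each page is yielded.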
- GET /api/1/origin/(origin_url)/visits/
Get information about all visits of a software origin. Visits are returned sorted in descending order according to their date.
- Parameters:
origin_url (str) – a software origin URL
- Query Parameters:
per_page (int) – specify the number of visits to list, for pagination purposes
last_visit (int) – visit to start listing from, for pagination purposes
- Request Headers:
Accept – the requested response content type, either application/json (default) or application/yaml
- Response Headers:
Content-Type – this depends on Accept header of request
Link – indicates that a subsequent result page is available and contains the url pointing to it
- Response JSON Array of Objects:
date (string) – ISO8601/RFC3339 representation of the visit date (in UTC)
origin (str) – the origin canonical url
origin_url (string) – link to get information about the origin
status (string) – status of the visit (either full, partial or ongoing)
visit (number) – the unique identifier of the visit
id (number) – the unique identifier of the origin
origin_visit_url (string) – link to GET /api/1/origin/(origin_url)/visit/(visit_id)/ in order to get information about the visit
snapshot (string) – the snapshot identifier of the visit (may be null if status is not full).
snapshot_url (string) – link to GET /api/1/snapshot/(snapshot_id)/ in order to get information about the snapshot of the visit (may be null if status is not full).
- Status Codes:
200 OK – no error
404 Not Found – requested origin can not be found in the archive
Example:
https://archive.softwareheritage.org/api/1/origin/https://github.com/hylang/hy/visits/
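Because visits are returned newest first, the most recent full snapshot of an origin is the first full visit carrying a snapshot. A hedged sketch: visits_url and latest_full_snapshot are hypothetical helpers, the origin URL is embedded unencoded as in the example above, and only one page of visits is inspected (use per_page, last_visit or the Link header to go further).

```python
from typing import Iterable, Optional
from urllib.parse import urlencode

API_ROOT = "https://archive.softwareheritage.org/api/1"


def visits_url(origin_url: str, per_page: Optional[int] = None, last_visit: Optional[int] = None) -> str:
    """Build GET /api/1/origin/(origin_url)/visits/ with optional pagination parameters."""
    params: dict = {}
    if per_page is not None:
        params["per_page"] = per_page
    if last_visit is not None:
        params["last_visit"] = last_visit
    # The origin URL is embedded as-is, as in the example above.
    url = f"{API_ROOT}/origin/{origin_url}/visits/"
    return f"{url}?{urlencode(params)}" if params else url


def latest_full_snapshot(visits: Iterable[dict]) -> Optional[str]:
    """Return the snapshot id of the most recent full visit in one page of results."""
    # Visits are sorted newest first, so the first full visit with a snapshot wins.
    for visit in visits:
        if visit.get("status") == "full" and visit.get("snapshot"):
            return visit["snapshot"]
    return None
```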
- GET /api/1/origin/(origin_url)/visit/(visit_id)/
Get information about a specific visit of a software origin.
- Parameters:
origin_url (str) – a software origin URL
visit_id (int) – a visit identifier
- Request Headers:
Accept – the requested response content type, either application/json (default) or application/yaml
- Response Headers:
Content-Type – this depends on Accept header of request
- Response JSON Object:
date (string) – ISO8601/RFC3339 representation of the visit date (in UTC)
origin (str) – the origin canonical url
origin_url (string) – link to get information about the origin
status (string) – status of the visit (either full, partial or ongoing)
visit (number) – the unique identifier of the visit
snapshot (string) – the snapshot identifier of the visit (may be null if status is not full).
snapshot_url (string) – link to GET /api/1/snapshot/(snapshot_id)/ in order to get information about the snapshot of the visit (may be null if status is not full).
- Status Codes:
200 OK – no error
404 Not Found – requested origin or visit can not be found in the archive
Example:
https://archive.softwareheritage.org/api/1/origin/https://github.com/hylang/hy/visit/1/
- GET /api/1/origin/save/(visit_type)/url/(origin_url)/
- POST /api/1/origin/save/(visit_type)/url/(origin_url)/
- GET /api/1/origin/save/(request_id)/
Request the saving of a software origin into the archive or check the status of previously created save requests.
This endpoint creates a saving task for a software origin through a POST request.
Depending on the provided origin URL, the save request can either be:
immediately accepted, for well-known code hosting providers such as GitHub or GitLab
rejected, in case the URL is blacklisted by Software Heritage
put in pending state until a manual check is done to determine whether it can be loaded
Once a save request has been accepted, the status of its associated saving task can be checked through a GET request on the same URL. The returned status can be one of:
not created: no saving task has been created
not yet scheduled: saving task has been created but its execution has not yet been scheduled
scheduled: the task execution has been scheduled
succeeded: the saving task has been successfully executed
failed: the saving task has been executed but it failed
When issuing a POST request, an object is returned, while a GET request returns an array of objects (as multiple save requests might have been submitted for the same origin).
It is also possible to get information about a specific save request by sending a GET request to the /api/1/origin/save/(request_id)/ endpoint.
- Parameters:
visit_type (string) – the type of visit to perform (currently the supported types are bzr, cvs, git, hg, and svn)
origin_url (string) – the url of the origin to save
request_id (number) – a save request identifier
- Request Headers:
Accept – the requested response content type, either application/json (default) or application/yaml
- Response Headers:
Content-Type – this depends on Accept header of request
- Response JSON Object:
id (number) – the save request identifier
request_url (string) – Web API URL to follow up on that request
origin_url (string) – the url of the origin to save
visit_type (string) – the type of visit to perform
save_request_date (string) – the date (in iso format) the save request was issued
save_request_status (string) – the status of the save request, either accepted, rejected or pending
save_task_status (string) – the status of the origin saving task, either not created, not yet scheduled, scheduled, succeeded or failed
visit_date (string) – the date (in iso format) of the visit if a visit occurred, null otherwise.
visit_status (string) – the status of the visit, either full, partial, not_found or failed if a visit occurred, null otherwise.
note (string) – optional note giving details about the save request, for instance why it has been rejected
snapshot_swhid (string) – SWHID of snapshot associated to the visit (null if it is missing or unknown)
snapshot_url (string) – Web API URL to retrieve snapshot data
from_webhook (boolean) – indicates whether the save request was created from a popular forge webhook receiver (see POST /api/1/origin/save/webhook/github/ for instance)
webhook_origin (string) – indicates which forge type sent the webhook; currently the supported types are: bitbucket, gitea, github, gitlab, and sourceforge
- Status Codes:
200 OK – no error
400 Bad Request – an invalid visit type or origin url has been provided
403 Forbidden – the provided origin url is blacklisted
404 Not Found – no save requests have been found for a given origin
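Creating a save request and deciding when to stop polling can be sketched as follows. save_request, is_final, VISIT_TYPES and FINAL_TASK_STATUSES are hypothetical names; the visit types and status values are the ones listed above, and the origin URL is embedded unencoded, in the same way as in the origin examples of this page (an assumption for this endpoint).

```python
import urllib.request

API_ROOT = "https://archive.softwareheritage.org/api/1"

# Visit types listed above for save requests.
VISIT_TYPES = {"bzr", "cvs", "git", "hg", "svn"}

# Task statuses after which polling can stop.
FINAL_TASK_STATUSES = {"succeeded", "failed"}


def save_request(visit_type: str, origin_url: str) -> urllib.request.Request:
    """Prepare a POST /api/1/origin/save/(visit_type)/url/(origin_url)/ request (hypothetical helper)."""
    if visit_type not in VISIT_TYPES:
        raise ValueError(f"unsupported visit type: {visit_type!r}")
    url = f"{API_ROOT}/origin/save/{visit_type}/url/{origin_url}/"
    return urllib.request.Request(url, headers={"Accept": "application/json"}, method="POST")


def is_final(save_request_json: dict) -> bool:
    """True once the request was rejected or its saving task has finished."""
    if save_request_json.get("save_request_status") == "rejected":
        return True
    return save_request_json.get("save_task_status") in FINAL_TASK_STATUSES
```

After the POST, a client could poll the returned request_url with GET until is_final returns True; the polling interval is left to the client.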
- POST /api/1/origin/save/webhook/bitbucket/
Webhook receiver for Bitbucket to request or update the archival of a repository when new commits are pushed to it.
To add such a webhook to one of your git repositories hosted on Bitbucket, please follow Bitbucket’s webhooks guide.
The expected content type for the webhook payload must be application/json.
Please note that, to avoid abusing the archival service offered by Software Heritage, at most one request per hour is created, so the effective loading of the repository into the archive might be delayed.
- Response JSON Object:
id (number) – the save request identifier
request_url (string) – Web API URL to follow up on that request
origin_url (string) – the url of the origin to save
visit_type (string) – the type of visit to perform
save_request_date (string) – the date (in iso format) the save request was issued
save_request_status (string) – the status of the save request, either accepted, rejected or pending
save_task_status (string) – the status of the origin saving task, either not created, not yet scheduled, scheduled, succeeded or failed
save_task_next_run (string) – the date and time from which the request is executed
- Status Codes:
200 OK – save request for repository has been successfully created from the webhook payload.
400 Bad Request – no save request has been created due to invalid POST request or missing data in webhook payload
- POST /api/1/origin/save/webhook/gitea/
Webhook receiver for Gitea to request or update the archival of a repository when new commits are pushed to it.
To add such a webhook to one of your git repositories hosted on Gitea, please follow Gitea’s webhooks guide.
The expected content type for the webhook payload must be application/json.
Please note that, to avoid abusing the archival service offered by Software Heritage, at most one request per hour is created, so the effective loading of the repository into the archive might be delayed.
- Response JSON Object:
id (number) – the save request identifier
request_url (string) – Web API URL to follow up on that request
origin_url (string) – the url of the origin to save
visit_type (string) – the type of visit to perform
save_request_date (string) – the date (in iso format) the save request was issued
save_request_status (string) – the status of the save request, either accepted, rejected or pending
save_task_status (string) – the status of the origin saving task, either not created, not yet scheduled, scheduled, succeeded or failed
save_task_next_run (string) – the date and time from which the request is executed
- Status Codes:
200 OK – save request for repository has been successfully created from the webhook payload.
400 Bad Request – no save request has been created due to invalid POST request or missing data in webhook payload
- POST /api/1/origin/save/webhook/github/
Webhook receiver for GitHub to request or update the archival of a repository when new commits are pushed to it.
To add such a webhook to one of your git repositories hosted on GitHub, please follow GitHub’s webhooks guide.
The expected content type for the webhook payload must be application/json.
Please note that, to avoid abusing the archival service offered by Software Heritage, at most one request per hour is created, so the effective loading of the repository into the archive might be delayed.
- Response JSON Object:
id (number) – the save request identifier
request_url (string) – Web API URL to follow up on that request
origin_url (string) – the url of the origin to save
visit_type (string) – the type of visit to perform
save_request_date (string) – the date (in iso format) the save request was issued
save_request_status (string) – the status of the save request, either accepted, rejected or pending
save_task_status (string) – the status of the origin saving task, either not created, not yet scheduled, scheduled, succeeded or failed
save_task_next_run (string) – the date and time from which the request is executed
- Status Codes:
200 OK – save request for repository has been successfully created from the webhook payload.
400 Bad Request – no save request has been created due to invalid POST request or missing data in webhook payload
- POST /api/1/origin/save/webhook/gitlab/
Webhook receiver for GitLab to request or update the archival of a repository when new commits are pushed to it.
To add such a webhook to one of your git repositories hosted on GitLab, please follow GitLab’s webhooks guide.
The expected content type for the webhook payload must be application/json.
Please note that, to avoid abusing the archival service offered by Software Heritage, at most one request per hour is created, so the effective loading of the repository into the archive might be delayed.
- Response JSON Object:
id (number) – the save request identifier
request_url (string) – Web API URL to follow up on that request
origin_url (string) – the url of the origin to save
visit_type (string) – the type of visit to perform
save_request_date (string) – the date (in iso format) the save request was issued
save_request_status (string) – the status of the save request, either accepted, rejected or pending
save_task_status (string) – the status of the origin saving task, either not created, not yet scheduled, scheduled, succeeded or failed
save_task_next_run (string) – the date and time from which the request is executed
- Status Codes:
200 OK – save request for repository has been successfully created from the webhook payload.
400 Bad Request – no save request has been created due to invalid POST request or missing data in webhook payload
- POST /api/1/origin/save/webhook/sourceforge/
Webhook receiver for SourceForge to request or update the archival of a repository when new commits are pushed to it.
To add such a webhook to one of your git, hg or svn repositories hosted on SourceForge, please follow SourceForge’s webhooks guide.
The expected content type for the webhook payload must be application/json.
Please note that, to avoid abusing the archival service offered by Software Heritage, at most one request per hour is created, so the effective loading of the repository into the archive might be delayed.
- Response JSON Object:
id (number) – the save request identifier
request_url (string) – Web API URL to follow up on that request
origin_url (string) – the url of the origin to save
visit_type (string) – the type of visit to perform
save_request_date (string) – the date (in iso format) the save request was issued
save_request_status (string) – the status of the save request, either accepted, rejected or pending
save_task_status (string) – the status of the origin saving task, either not created, not yet scheduled, scheduled, succeeded or failed
save_task_next_run (string) – the date and time from which the request is executed
- Status Codes:
200 OK – save request for repository has been successfully created from the webhook payload.
400 Bad Request – no save request has been created due to invalid POST request or missing data in webhook payload
Release
- GET /api/1/release/(sha1_git)/
Get information about a release in the archive. Releases are identified by sha1 checksums, compatible with Git tag identifiers. See swh.model.git_objects.release_git_object() in our data model module for details about how they are computed.
- Parameters:
sha1_git (string) – hexadecimal representation of the release sha1_git identifier
- Request Headers:
Accept – the requested response content type, either application/json (default) or application/yaml
- Response Headers:
Content-Type – this depends on Accept header of request
- Response JSON Object:
author (object) – information about the author of the release
date (string) – RFC3339 representation of the release date
id (string) – the release unique identifier
message (string) – the message associated to the release
name (string) – the name of the release
target (string) – the target identifier of the release
target_type (string) – the type of the target, which can be release, revision, content or directory
target_url (string) – a link to the adequate api url based on the target type
- Status Codes:
200 OK – no error
400 Bad Request – an invalid sha1_git value has been provided
404 Not Found – requested release cannot be found in the archive
Example:
https://archive.softwareheritage.org/api/1/release/208f61cc7a5dbc9879ae6e5c2f95891e270f09ef/
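As an illustration of the request described above, here is a minimal Python sketch using only the standard library. The helper names release_request and fetch_release are hypothetical; lowercasing the identifier and rejecting malformed values before sending are client-side choices, not documented server behavior.

```python
import json
import re
import urllib.request

API_ROOT = "https://archive.softwareheritage.org/api/1"
# A sha1_git identifier is 40 hexadecimal characters.
SHA1_GIT_RE = re.compile(r"[0-9a-f]{40}")


def release_request(sha1_git: str, accept: str = "application/json") -> urllib.request.Request:
    """Build (but do not send) a GET request for /api/1/release/(sha1_git)/."""
    sha1_git = sha1_git.lower()
    if not SHA1_GIT_RE.fullmatch(sha1_git):
        # The server would answer 400 Bad Request; fail before sending anything.
        raise ValueError(f"invalid sha1_git: {sha1_git!r}")
    return urllib.request.Request(
        f"{API_ROOT}/release/{sha1_git}/", headers={"Accept": accept}
    )


def fetch_release(sha1_git: str) -> dict:
    """Send the request and decode the JSON body (a 404 raises urllib.error.HTTPError)."""
    with urllib.request.urlopen(release_request(sha1_git)) as resp:
        return json.load(resp)
```

Passing accept="application/yaml" would request the YAML representation instead, in which case the body should not be decoded with json.load.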
Revision#
- GET /api/1/revision/(sha1_git)/#
Get information about a revision in the archive. Revisions are identified by sha1 checksums, compatible with Git commit identifiers. See swh.model.git_objects.revision_git_object() in our data model module for details about how they are computed.
- Parameters:
sha1_git (string) – hexadecimal representation of the revision sha1_git identifier
- Request Headers:
Accept – the requested response content type, either application/json (default) or application/yaml
- Response Headers:
Content-Type – this depends on Accept header of request
- Response JSON Object:
author (object) – information about the author of the revision
committer (object) – information about the committer of the revision
committer_date (string) – RFC3339 representation of the commit date
date (string) – RFC3339 representation of the revision date
directory (string) – the unique identifier of the directory the revision points to
directory_url (string) – link to GET /api/1/directory/(sha1_git)/[(path)/] to get information about the directory associated with the revision
id (string) – the revision unique identifier
merge (boolean) – whether or not the revision corresponds to a merge commit
message (string) – the message associated to the revision
parents (array) – the parents of the revision, i.e. the previous revisions that lead directly to it; each entry of that array contains a unique parent revision identifier and a link to GET /api/1/revision/(sha1_git)/ to get more information about it
type (string) – the type of the revision
- Status Codes:
200 OK – no error
400 Bad Request – an invalid sha1_git value has been provided
404 Not Found – requested revision cannot be found in the archive
Example:
https://archive.softwareheritage.org/api/1/revision/aafb16d69fd30ff58afdd69036a26047f3aebdc6/
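The merge flag and the parents array of a revision object carry related information. The sketch below condenses a decoded revision into one line; summarize_revision is a hypothetical helper, and it assumes each parents entry exposes its identifier under an id key, which the field list above does not name explicitly.

```python
def summarize_revision(revision: dict) -> str:
    """Return "<short id> <kind> parents=<n> <first message line>" for a revision object."""
    # Assumption: each parents entry stores the parent identifier under "id".
    parents = [entry["id"] for entry in revision["parents"]]
    kind = "merge" if revision["merge"] else "commit"
    # The message may be empty (or null), so guard before taking its first line.
    lines = (revision.get("message") or "").splitlines()
    summary = lines[0] if lines else ""
    return f"{revision['id'][:12]} {kind} parents={len(parents)} {summary}".rstrip()
```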
- GET /api/1/revision/(sha1_git)/directory/[(path)/]#
Get information about directory (entry) objects associated with revisions. Each revision is associated with a single “root” directory. This endpoint behaves like GET /api/1/directory/(sha1_git)/[(path)/], but operates on the root directory associated with a given revision.
- Parameters:
sha1_git (string) – hexadecimal representation of the revision sha1_git identifier
path (string) – optional parameter to get information about the directory entry pointed to by that relative path
- Request Headers:
Accept – the requested response content type, either application/json (default) or application/yaml
- Response Headers:
Content-Type – this depends on Accept header of request
- Response JSON Object:
content (array) – directory entries as returned by GET /api/1/directory/(sha1_git)/[(path)/]
path (string) – path of the directory relative to the revision root directory
revision (string) – the unique revision identifier
type (string) – the type of the directory
- Status Codes:
200 OK – no error
400 Bad Request – an invalid sha1_git value has been provided
404 Not Found – requested revision cannot be found in the archive
Example:
https://archive.softwareheritage.org/api/1/revision/f1b94134a4b879bc55c3dacdb496690c8ebdc03f/directory/
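Building the URL for the optional path form of this endpoint is mostly string assembly. A minimal sketch follows; the helper name revision_directory_url is hypothetical, and percent-encoding unusual file names (spaces, non-ASCII) is an assumption about how the server decodes paths.

```python
from urllib.parse import quote

API_ROOT = "https://archive.softwareheritage.org/api/1"


def revision_directory_url(sha1_git: str, path: str = "") -> str:
    """Build the URL of GET /api/1/revision/(sha1_git)/directory/[(path)/]."""
    url = f"{API_ROOT}/revision/{sha1_git}/directory/"
    path = path.strip("/")
    if path:
        # Percent-encode each path segment but keep "/" as the separator,
        # then add the trailing slash used by the documented route.
        url += quote(path, safe="/") + "/"
    return url
```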
- GET /api/1/revision/(sha1_git)/log/#
Get a list of all revisions leading to a given one; in other words, show the commit log.
The revisions are returned in breadth-first search (BFS) order while visiting the revision graph. The number of revisions returned is bounded by the limit query parameter.
Warning
To get the full BFS traversal of the revision graph when the total number of revisions is greater than 1000, it is up to the client to keep track of the multiple branches of history when there are merge revisions in the returned objects. In other words, the client must identify all the continuation points that need to be followed and follow them recursively to get the full history.
- Parameters:
sha1_git (string) – hexadecimal representation of the revision sha1_git identifier
- Query Parameters:
limit (int) – maximum number of revisions to return when performing BFS traversal on the revision graph (defaults to 10, cannot exceed 1000)
- Request Headers:
Accept – the requested response content type, either application/json (default) or application/yaml
- Response Headers:
Content-Type – this depends on Accept header of request
- Response JSON Array of Objects:
author (object) – information about the author of the revision
committer (object) – information about the committer of the revision
committer_date (string) – RFC3339 representation of the commit date
date (string) – RFC3339 representation of the revision date
directory (string) – the unique identifier of the directory the revision points to
directory_url (string) – link to GET /api/1/directory/(sha1_git)/[(path)/] to get information about the directory associated with the revision
id (string) – the revision unique identifier
merge (boolean) – whether or not the revision corresponds to a merge commit
message (string) – the message associated to the revision
parents (array) – the parents of the revision, i.e. the previous revisions that lead directly to it; each entry of that array contains a unique parent revision identifier and a link to GET /api/1/revision/(sha1_git)/ to get more information about it
type (string) – the type of the revision
- Status Codes:
200 OK – no error
400 Bad Request – an invalid sha1_git value has been provided
404 Not Found – head revision cannot be found in the archive
Example:
https://archive.softwareheritage.org/api/1/revision/e1a315fa3fa734e2a6154ed7b5b9ae0eb8987aad/log/
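The client-side bookkeeping described in the warning above can be sketched as follows: every parent that falls outside a returned page becomes a continuation point for a later log request. This is a minimal sketch, not the archive's own tooling; walk_history and http_fetch_log are hypothetical names, the fetcher is injectable so the logic can be tested offline, and it assumes that each page starts with the requested revision and that parents entries expose an id key.

```python
import json
import urllib.request
from collections import deque
from typing import Callable, Iterator

API_ROOT = "https://archive.softwareheritage.org/api/1"

# fetch_log(sha1_git, limit) must return the JSON array produced by
# GET /api/1/revision/(sha1_git)/log/?limit=<limit>, starting with sha1_git itself.
FetchLog = Callable[[str, int], list]


def http_fetch_log(sha1_git: str, limit: int) -> list:
    """Default fetcher that calls the archive over HTTP."""
    with urllib.request.urlopen(f"{API_ROOT}/revision/{sha1_git}/log/?limit={limit}") as resp:
        return json.load(resp)


def walk_history(head: str, fetch_log: FetchLog = http_fetch_log, limit: int = 1000) -> Iterator[dict]:
    """Yield each revision reachable from head exactly once."""
    seen: set = set()
    pending = deque([head])  # continuation points still to explore
    while pending:
        start = pending.popleft()
        if start in seen:
            continue  # already reached through an earlier page
        for rev in fetch_log(start, limit):
            if rev["id"] in seen:
                continue
            seen.add(rev["id"])
            yield rev
            # Assumption: each parents entry stores its identifier under "id".
            for parent in rev["parents"]:
                if parent["id"] not in seen:
                    pending.append(parent["id"])
```

Each page is in BFS order, but the overall sequence is only approximately breadth-first once several pages are stitched together; walking a large history also issues one request per continuation point.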
Snapshot#
- GET /api/1/snapshot/(snapshot_id)/#
Get information about a snapshot in the archive.
A snapshot is a set of named branches, which are pointers to objects at any level of the Software Heritage DAG. It represents a full picture of an origin at a given time.
As well as pointing to other objects in the Software Heritage DAG, branches can also be aliases, in which case their target is the name of another branch in the same snapshot, or dangling, in which case the target is unknown.
A snapshot identifier is a salted sha1. See swh.model.git_objects.snapshot_git_object() in our data model module for details about how they are computed.
- Parameters:
snapshot_id (sha1) – a snapshot identifier
- Query Parameters:
branches_from (str) – optional parameter used to skip branches whose name is less than the given value before returning them
branches_count (int) – optional parameter used to limit the number of returned branches (defaults to 1000)
target_types (str) – optional comma-separated list used to filter the target types of the branches to return (possible values in that list are content, directory, revision, release, snapshot or alias)
- Request Headers:
Accept – the requested response content type, either application/json (default) or application/yaml
- Response Headers:
Content-Type – this depends on Accept header of request
Link – indicates that a subsequent result page is available and contains the url pointing to it
- Response JSON Object:
branches (object) – object containing all branches associated with the snapshot; for each of them, the associated target type and id are given, along with a link to get information about that target
id (string) – the unique identifier of the snapshot
- Status Codes:
200 OK – no error
400 Bad Request – an invalid snapshot identifier has been provided
404 Not Found – requested snapshot cannot be found in the archive
Example:
https://archive.softwareheritage.org/api/1/snapshot/6a3a2cf0b2b90ce7ae1cf0a221ed68035b686f5a/
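Large snapshots are returned in pages, with the Link response header pointing to the next one. The sketch below builds the query string from the parameters above and follows pages until none is left. It assumes the Link header uses the RFC 8288 form with rel="next", which the text above does not spell out; snapshot_url, next_page and iter_snapshot_pages are hypothetical names.

```python
import json
import re
import urllib.request
from typing import Iterable, Iterator, Optional
from urllib.parse import urlencode

API_ROOT = "https://archive.softwareheritage.org/api/1"
# Matches one "<url>; rel=name" entry of an RFC 8288 Link header.
LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,]+)"?')


def snapshot_url(snapshot_id: str, branches_from: Optional[str] = None,
                 branches_count: Optional[int] = None,
                 target_types: Optional[Iterable[str]] = None) -> str:
    """Build the URL of GET /api/1/snapshot/(snapshot_id)/ with optional query parameters."""
    params = {}
    if branches_from is not None:
        params["branches_from"] = branches_from
    if branches_count is not None:
        params["branches_count"] = branches_count
    if target_types:
        params["target_types"] = ",".join(target_types)  # comma-separated list
    url = f"{API_ROOT}/snapshot/{snapshot_id}/"
    return f"{url}?{urlencode(params)}" if params else url


def next_page(link_header: Optional[str]) -> Optional[str]:
    """Return the rel="next" URL of a Link header, or None on the last page."""
    for target, rel in LINK_RE.findall(link_header or ""):
        if rel == "next":
            return target
    return None


def iter_snapshot_pages(snapshot_id: str, **params) -> Iterator[dict]:
    """Yield each page of a snapshot, following Link headers until none is left."""
    url: Optional[str] = snapshot_url(snapshot_id, **params)
    while url:
        with urllib.request.urlopen(url) as resp:
            page = json.load(resp)
            url = next_page(resp.headers.get("Link"))
        yield page
```

Merging the branches objects of all yielded pages gives the full branch list of the snapshot.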
Archive statistics#
- GET /api/1/stat/counters/#
Get statistics about the content of the archive.
- Response JSON Object:
content (number) – current number of content objects (aka files) in the archive
directory (number) – current number of directory objects in the archive
origin (number) – current number of software origins in the archive (an origin is a “place” where source code can be found, e.g. a git repository, a tarball, …)
origin_visit (number) – current number of visits of software origins performed to fill the archive
person (number) – current number of persons (source code authors or committers) in the archive
release (number) – current number of release objects in the archive
revision (number) – current number of revision objects (aka commits) in the archive
skipped_content (number) – current number of content objects (aka files) which were not inserted in the archive
snapshot (number) – current number of snapshot objects (aka sets of named branches) in the archive
- Request Headers:
Accept – the requested response content type, either application/json (default) or application/yaml
- Response Headers:
Content-Type – this depends on Accept header of request
- Status Codes:
200 OK – no error
Example:
https://archive.softwareheritage.org/api/1/stat/counters/
Vault#
- GET /api/1/vault/directory/(dir_id)/#
This endpoint was replaced by GET /api/1/vault/flat/(swhid)/
- GET /api/1/vault/directory/(dir_id)/raw/#
This endpoint was replaced by GET /api/1/vault/flat/(swhid)/raw/
- GET /api/1/vault/revision/(rev_id)/gitfast/#
This endpoint was replaced by GET /api/1/vault/gitfast/(swhid)/
- GET /api/1/vault/gitfast/(swhid)/raw/#
Fetch the cooked gitfast archive for a revision.
See GET /api/1/vault/gitfast/(swhid)/ to get more details on gitfast cooking.
- Parameters:
rev_id (string) – the revision’s sha1 identifier
- Response Headers:
Content-Type – application/gzip
- Status Codes:
200 OK – no error
404 Not Found – requested directory did not receive any cooking request yet (in case of GET) or cannot be found in the archive (in case of POST)
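Since the response is declared as application/gzip, a client would typically stream it to disk rather than load it into memory. A minimal sketch follows; save_cooked_bundle is a hypothetical helper, and checking the two-byte gzip signature is a client-side sanity check, not documented server behavior. It accepts any binary stream, for example the object returned by urllib.request.urlopen on the .../raw/ URL.

```python
from typing import BinaryIO

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def save_cooked_bundle(stream: BinaryIO, dest_path: str, chunk_size: int = 1 << 16) -> int:
    """Stream a cooked gitfast response body to dest_path and return the bytes written."""
    head = stream.read(2)
    if head != GZIP_MAGIC:
        # Content-Type should be application/gzip; anything else is unexpected.
        raise ValueError("response body is not gzip data")
    written = len(head)
    with open(dest_path, "wb") as out:
        out.write(head)
        # Copy in fixed-size chunks so large bundles never sit fully in memory.
        while chunk := stream.read(chunk_size):
            out.write(chunk)
            written += len(chunk)
    return written
```

As far as we understand the gitfast format, the decompressed file is a git fast-import stream, so it could then be replayed into an empty repository with git fast-import.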