swh-web API URLs

Content

GET /api/1/content/known/(sha1)[,(sha1), ...,(sha1)]/

Check whether one or more content objects (aka “blobs”) are present in the archive, based on their sha1 checksums.

Parameters:
  • sha1 (string) – hexadecimal representation of the sha1 checksum of the content whose existence should be checked. Multiple values can be provided, separated by ‘,’.
Request Headers:
 
  • Accept – the requested response content type, either application/json (default) or application/yaml
Response Headers:
 
  • Content-Type – this depends on the Accept header of the request
Response JSON Object:
 
  • search_res (array) – array holding the search result for each provided sha1
  • search_stats (object) – statistics about the number of sha1s provided and the percentage of them found in the archive

Allowed HTTP Methods: GET, HEAD, OPTIONS

Status Codes:

Example:

https://archive.softwareheritage.org/api/1/content/known/dc2830a9e72f23c1dfebef4413003221baa5fb62,0c3f19cb47ebfbe643fb19fa94c874d18fa62d12/
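A minimal Python sketch (3.9+) of how a client might prepare this request: local data is hashed with the standard hashlib module and the resulting sha1 values are joined with ‘,’ as described above. The helper names sha1_hex and known_url are hypothetical, and the sketch only builds the URL; sending it and reading search_res is left to the caller.

```python
import hashlib
import re

API_ROOT = "https://archive.softwareheritage.org/api/1"
SHA1_RE = re.compile(r"[0-9a-f]{40}")


def sha1_hex(data: bytes) -> str:
    """Return the hexadecimal sha1 checksum of a byte string."""
    return hashlib.sha1(data).hexdigest()


def known_url(sha1s: list[str]) -> str:
    """Build the /content/known/ URL for one or more sha1 checksums."""
    if not sha1s:
        raise ValueError("at least one sha1 is required")
    normalized = [value.lower() for value in sha1s]
    for value in normalized:
        # Each value must be exactly 40 hexadecimal characters.
        if not SHA1_RE.fullmatch(value):
            raise ValueError(f"not a hexadecimal sha1: {value!r}")
    return f"{API_ROOT}/content/known/{','.join(normalized)}/"


# Hypothetical usage: ask whether two local blobs are already archived.
print(known_url([sha1_hex(b"hello\n"), sha1_hex(b"")]))
```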
GET /api/1/content/[(hash_type):](hash)/

Get information about a content (aka a “blob”) object. In the archive, a content object is identified based on checksum values computed using various hashing algorithms.

Parameters:
  • hash_type (string) – optional parameter specifying which hashing algorithm has been used to compute the content checksum. It can be either sha1, sha1_git, sha256 or blake2s256. If that parameter is not provided, it is assumed that the hashing algorithm used is sha1.
  • hash (string) – hexadecimal representation of the checksum value computed with the specified hashing algorithm.
Request Headers:
 
  • Accept – the requested response content type, either application/json (default) or application/yaml
Response Headers:
 
  • Content-Type – this depends on the Accept header of the request
Response JSON Object:
 

Allowed HTTP Methods: GET, HEAD, OPTIONS

Status Codes:

Example:

curl -i https://archive.softwareheritage.org/api/1/content/sha1_git:fe95a46679d128ff167b7c55df5d02356c5a1ae1/
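Since the endpoint accepts four hash types, a client can compute whichever one it needs locally. The following Python sketch (3.9+, standard library only) relies on sha1_git being Git's blob identifier, i.e. a sha1 over a “blob <size>” header terminated by a NUL byte, followed by the raw bytes. content_checksums and content_url are hypothetical helper names.

```python
import hashlib

API_ROOT = "https://archive.softwareheritage.org/api/1"
HASH_TYPES = ("sha1", "sha1_git", "sha256", "blake2s256")


def content_checksums(data: bytes) -> dict[str, str]:
    """Compute every checksum type accepted as hash_type for a byte string."""
    # Git hashes a blob as sha1(b"blob <size>\0" + data).
    git_header = b"blob %d\x00" % len(data)
    return {
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha1_git": hashlib.sha1(git_header + data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        # BLAKE2s with its default 32-byte (256-bit) digest.
        "blake2s256": hashlib.blake2s(data).hexdigest(),
    }


def content_url(hash_value: str, hash_type: str = "sha1") -> str:
    """Build the /content/[(hash_type):](hash)/ URL with an explicit hash type."""
    if hash_type not in HASH_TYPES:
        raise ValueError(f"unsupported hash_type: {hash_type}")
    return f"{API_ROOT}/content/{hash_type}:{hash_value}/"
```

For example, content_url(content_checksums(data)["sha1_git"], "sha1_git") yields a URL of the same shape as the curl example.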
GET /api/1/content/[(hash_type):](hash)/raw/

Get the raw content of a content object (aka a “blob”), as a byte sequence.

Parameters:
  • hash_type (string) – optional parameter specifying which hashing algorithm has been used to compute the content checksum. It can be either sha1, sha1_git, sha256 or blake2s256. If that parameter is not provided, it is assumed that the hashing algorithm used is sha1.
  • hash (string) – hexadecimal representation of the checksum value computed with the specified hashing algorithm.
Query Parameters:
 
  • filename (string) – if provided, the downloaded content will get that filename
Response Headers:
 

Allowed HTTP Methods: GET, HEAD, OPTIONS

Status Codes:

Example:

https://archive.softwareheritage.org/api/1/content/sha1:dc2830a9e72f23c1dfebef4413003221baa5fb62/raw/
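Because the raw endpoint returns bytes rather than JSON, it is worth checking a download against the requested checksum. This Python sketch (3.8+) builds the URL, optionally with the filename query parameter, and verifies downloaded bytes; raw_url and matches_sha1 are hypothetical names, and the network call itself is not shown.

```python
import hashlib
from typing import Optional
from urllib.parse import urlencode

API_ROOT = "https://archive.softwareheritage.org/api/1"


def raw_url(sha1: str, filename: Optional[str] = None) -> str:
    """Build the /raw/ URL, optionally asking for a download filename."""
    url = f"{API_ROOT}/content/sha1:{sha1}/raw/"
    if filename is not None:
        # urlencode escapes spaces and other special characters.
        url += "?" + urlencode({"filename": filename})
    return url


def matches_sha1(data: bytes, expected_sha1: str) -> bool:
    """Check that downloaded bytes hash to the requested sha1."""
    return hashlib.sha1(data).hexdigest() == expected_sha1.lower()
```

A caller might read the bytes with urllib.request.urlopen(raw_url(sha1)).read() and discard them if matches_sha1 returns False.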
GET /api/1/content/[(hash_type):](hash)/filetype/

Get information about the detected MIME type of a content object.

Parameters:
  • hash_type (string) – optional parameter specifying which hashing algorithm has been used to compute the content checksum. It can be either sha1, sha1_git, sha256 or blake2s256. If that parameter is not provided, it is assumed that the hashing algorithm used is sha1.
  • hash (string) – hexadecimal representation of the checksum value computed with the specified hashing algorithm.
Response JSON Object:
 
  • content_url (object) – link to GET /api/1/content/[(hash_type):](hash)/ for getting information about the content
  • encoding (string) – the detected content encoding
  • id (string) – the sha1 identifier of the content
  • mimetype (string) – the detected MIME type of the content
  • tool (object) – information about the tool used to detect the content filetype
Request Headers:
 
  • Accept – the requested response content type, either application/json (default) or application/yaml
Response Headers:
 
  • Content-Type – this depends on the Accept header of the request

Allowed HTTP Methods: GET, HEAD, OPTIONS

Status Codes:

Example:

https://archive.softwareheritage.org/api/1/content/sha1:dc2830a9e72f23c1dfebef4413003221baa5fb62/filetype/
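The response format is selected with the Accept request header. A minimal Python sketch using urllib.request from the standard library prepares, without sending, a request for either the JSON or the YAML representation; filetype_request and the FORMATS table are hypothetical.

```python
from urllib.request import Request

API_ROOT = "https://archive.softwareheritage.org/api/1"
# The two response content types documented for the Accept header.
FORMATS = {"json": "application/json", "yaml": "application/yaml"}


def filetype_request(sha1: str, fmt: str = "json") -> Request:
    """Prepare (but do not send) a filetype request in the chosen format."""
    url = f"{API_ROOT}/content/sha1:{sha1}/filetype/"
    return Request(url, headers={"Accept": FORMATS[fmt]})
```

Passing the request to urllib.request.urlopen and decoding the JSON body would then expose the mimetype and encoding fields listed above.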
GET /api/1/content/[(hash_type):](hash)/language/

Get information about the programming language used in a content object.

Note: this endpoint currently returns no data.

Parameters:
  • hash_type (string) – optional parameter specifying which hashing algorithm has been used to compute the content checksum. It can be either sha1, sha1_git, sha256 or blake2s256. If that parameter is not provided, it is assumed that the hashing algorithm used is sha1.
  • hash (string) – hexadecimal representation of the checksum value computed with the specified hashing algorithm.
Response JSON Object:
 
  • content_url (object) – link to GET /api/1/content/[(hash_type):](hash)/ for getting information about the content
  • id (string) – the sha1 identifier of the content
  • lang (string) – the detected programming language if any
  • tool (object) – information about the tool used to detect the programming language
Request Headers:
 
  • Accept – the requested response content type, either application/json (default) or application/yaml
Response Headers:
 
  • Content-Type – this depends on the Accept header of the request

Allowed HTTP Methods: GET, HEAD, OPTIONS

Status Codes:

Example:

https://archive.softwareheritage.org/api/1/content/sha1:dc2830a9e72f23c1dfebef4413003221baa5fb62/language/
GET /api/1/content/[(hash_type):](hash)/license/

Get information about the license of a content object.

Parameters:
  • hash_type (string) – optional parameter specifying which hashing algorithm has been used to compute the content checksum. It can be either sha1, sha1_git, sha256 or blake2s256. If that parameter is not provided, it is assumed that the hashing algorithm used is sha1.
  • hash (string) – hexadecimal representation of the checksum value computed with the specified hashing algorithm.
Response JSON Object:
 
  • content_url (object) – link to GET /api/1/content/[(hash_type):](hash)/ for getting information about the content
  • id (string) – the sha1 identifier of the content
  • licenses (array) – array of strings containing the detected license names if any
  • tool (object) – information about the tool used to detect the license
Request Headers:
 
  • Accept – the requested response content type, either application/json (default) or application/yaml
Response Headers:
 
  • Content-Type – this depends on the Accept header of the request

Allowed HTTP Methods: GET, HEAD, OPTIONS

Status Codes:

Example:

https://archive.softwareheritage.org/api/1/content/sha1:dc2830a9e72f23c1dfebef4413003221baa5fb62/license/

Directory

GET /api/1/directory/(sha1_git)/[(path)/]

Get information about directory objects. Directories are identified by sha1 checksums, compatible with Git directory identifiers. See swh.model.identifiers.directory_identifier() in our data model module for details about how they are computed.

When given only a directory identifier, this endpoint returns information about the directory itself, i.e. its content (usually a list of directory entries). When given a directory identifier and a path, this endpoint returns information about the directory entry pointed to by the relative path, resolving the path starting from the given directory.

Parameters:
  • sha1_git (string) – hexadecimal representation of the directory sha1_git identifier
  • path (string) – optional parameter to get information about the directory entry pointed to by that relative path
Request Headers:
 
  • Accept – the requested response content type, either application/json (default) or application/yaml
Response Headers:
 
  • Content-Type – this depends on the Accept header of the request
Response JSON Array of Objects:
 
  • checksums (object) – object holding the computed checksum values for a directory entry (only for file entries)
  • dir_id (string) – sha1_git identifier of the requested directory
  • length (number) – length of a directory entry in bytes (only for file entries)
  • name (string) – the directory entry name
  • perms (number) – permissions for the directory entry
  • target (string) – sha1_git identifier of the directory entry
  • target_url (string) – link to GET /api/1/content/[(hash_type):](hash)/ or GET /api/1/directory/(sha1_git)/[(path)/] depending on the directory entry type
  • type (string) – the type of the directory entry, can be either dir, file or rev

Allowed HTTP Methods: GET, HEAD, OPTIONS

Status Codes:

Example:

https://archive.softwareheritage.org/api/1/directory/977fc4b98c0e85816348cebd3b12026407c368b6/
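Since a directory entry of type dir carries the sha1_git of the subdirectory in its target field, a client can walk a whole tree with one request per directory. A Python sketch (3.9+), assuming a hypothetical fetch callable that returns the decoded JSON array for a directory identifier (for instance backed by urllib.request and json, or by a local cache):

```python
from typing import Callable, Iterator

# Hypothetical fetcher: maps a directory sha1_git to the decoded JSON array
# returned by GET /api/1/directory/(sha1_git)/.
Fetch = Callable[[str], list]


def walk(dir_id: str, fetch: Fetch, prefix: str = "") -> Iterator[tuple[str, dict]]:
    """Yield (path, entry) pairs for every entry below a directory, depth first."""
    for entry in fetch(dir_id):
        path = prefix + entry["name"]
        yield path, entry
        if entry["type"] == "dir":
            # Recurse into the subdirectory identified by the entry's target.
            yield from walk(entry["target"], fetch, prefix=path + "/")
        # "file" and "rev" entries are leaves for the purpose of this walk.
```

Entries of type rev (typically Git submodules) point to revisions rather than directories, so the sketch does not descend into them. Deep or wide trees translate into many API calls.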

Persistent identifiers

GET /api/1/resolve/(swh_id)/

Resolve a Software Heritage persistent identifier.

Try to resolve the provided persistent identifier into a URL for browsing the archived object it points to. If the identifier is valid, the existence of the object in the archive is also checked.

Parameters:
  • swh_id (string) – a Software Heritage persistent identifier
Response JSON Object:
 
  • browse_url (string) – the URL for browsing the object pointed to by the identifier
  • metadata (object) – object holding optional parts of the persistent identifier
  • namespace (string) – the persistent identifier namespace
  • object_id (string) – the hash identifier of the pointed object
  • object_type (string) – the type of the pointed object
  • scheme_version (number) – the scheme version of the persistent identifier
Request Headers:
 
  • Accept – the requested response content type, either application/json (default) or application/yaml
Response Headers:
 
  • Content-Type – this depends on the Accept header of the request

Allowed HTTP Methods: GET, HEAD, OPTIONS

Status Codes:

Example:

https://archive.softwareheritage.org/api/1/resolve/swh:1:rev:96db9023b881d7cd9f379b0c154650d6c108e9a3;origin=https://github.com/openssl/openssl/
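Before calling the endpoint, a client can split an identifier into the same parts the response reports (namespace, scheme_version, object_type, object_id, metadata). The Python sketch below assumes the version 1 syntax swh:1:<type>:<40 hex digits> with optional ;key=value qualifiers and the core object types cnt, dir, rev, rel and snp. It checks syntax only, since only the endpoint can tell whether the object is archived. SWHID, parse_swhid and resolve_url are hypothetical names.

```python
from dataclasses import dataclass, field

API_ROOT = "https://archive.softwareheritage.org/api/1"
OBJECT_TYPES = {"cnt", "dir", "rev", "rel", "snp"}
HEX_DIGITS = set("0123456789abcdef")


@dataclass
class SWHID:
    namespace: str
    scheme_version: int
    object_type: str
    object_id: str
    metadata: dict = field(default_factory=dict)


def parse_swhid(text: str) -> SWHID:
    """Split a persistent identifier into its core parts and qualifiers."""
    core, *qualifiers = text.split(";")
    parts = core.split(":")
    if len(parts) != 4:
        raise ValueError(f"invalid persistent identifier: {text}")
    namespace, version, obj_type, obj_id = parts
    if (namespace != "swh" or not version.isdigit() or obj_type not in OBJECT_TYPES
            or len(obj_id) != 40 or not set(obj_id) <= HEX_DIGITS):
        raise ValueError(f"invalid persistent identifier: {text}")
    # Qualifiers are key=value pairs; splitting on the first '=' only keeps
    # values such as origin URLs intact.
    metadata = dict(q.split("=", 1) for q in qualifiers if q)
    return SWHID(namespace, int(version), obj_type, obj_id, metadata)


def resolve_url(swhid: str) -> str:
    """Build the /resolve/ URL for an identifier, as in the example URL."""
    return f"{API_ROOT}/resolve/{swhid}/"
```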

Origin

GET /api/1/origin/(origin_url)/get/

Get information about a software origin.

Parameters:
  • origin_url (string) – the origin url
Response JSON Object:
 
  • origin_visits_url (string) – link to GET /api/1/origin/(origin_url)/visits/ in order to get information about the visits for that origin
  • url (string) – the origin canonical url
  • type (string) – the type of software origin (deprecated value; types are now associated with visits instead of origins)
  • id (number) – the origin unique identifier (deprecated value; you should only refer to origins based on their URL)
Request Headers:
 
  • Accept – the requested response content type, either application/json (default) or application/yaml
Response Headers:
 
  • Content-Type – this depends on the Accept header of the request

Allowed HTTP Methods: GET, HEAD, OPTIONS
Status Codes:

Example:

https://archive.softwareheritage.org/api/1/origin/https://github.com/python/cpython/get/
GET /api/1/origin/(origin_id)/

Get information about a software origin.

Warning

All endpoints using an origin_id or an origin_type are deprecated and will be removed in the near future. Only those using an origin_url will remain available. You should use GET /api/1/origin/(origin_url)/get/ instead.

Parameters:
  • origin_id (int) – a software origin identifier
Response JSON Object:
 
  • origin_visits_url (string) – link to GET /api/1/origin/(origin_url)/visits/ in order to get information about the visits for that origin
  • url (string) – the origin canonical url
  • type (string) – the type of software origin (deprecated value; types are now associated with visits instead of origins)
  • id (number) – the origin unique identifier (deprecated value; you should only refer to origins based on their URL)
Request Headers:
 
  • Accept – the requested response content type, either application/json (default) or application/yaml
Response Headers:
 
  • Content-Type – this depends on the Accept header of the request

Allowed HTTP Methods: GET, HEAD, OPTIONS
Status Codes:

Example:

https://archive.softwareheritage.org/api/1/origin/1/
GET /api/1/origin/(origin_type)/url/(origin_url)/

Get information about a software origin.

Warning

All endpoints using an origin_id or an origin_type are deprecated and will be removed in the near future. Only those using an origin_url will remain available. You should use GET /api/1/origin/(origin_url)/get/ instead.

Parameters:
  • origin_type (string) – the origin type (possible values are git, svn, hg, deb, pypi, npm, ftp or deposit)
  • origin_url (string) – the origin url
Response JSON Object:
 
  • origin_visits_url (string) – link to GET /api/1/origin/(origin_url)/visits/ in order to get information about the visits for that origin
  • url (string) – the origin canonical url
  • type (string) – the type of software origin (deprecated value; types are now associated with visits instead of origins)
  • id (number) – the origin unique identifier (deprecated value; you should only refer to origins based on their URL)
Request Headers:
 
  • Accept – the requested response content type, either application/json (default) or application/yaml
Response Headers:
 
  • Content-Type – this depends on the Accept header of the request

Allowed HTTP Methods: GET, HEAD, OPTIONS
Status Codes:

Example:

https://archive.softwareheritage.org/api/1/origin/git/url/https://github.com/python/cpython/
GET /api/1/origin/search/(url_pattern)/

Search for software origins whose URLs contain a provided string pattern or match a provided regular expression. The search is case-insensitive.

Parameters:
  • url_pattern (string) – a string pattern or a regular expression
Query Parameters:
 
  • offset (int) – the number of found origins to skip before returning results
  • limit (int) – the maximum number of found origins to return
  • regexp (boolean) – if true, consider the provided pattern as a regular expression and search for origins whose URLs match it
  • with_visit (boolean) – if true, only return origins with at least one visit by Software Heritage
Response JSON Array of Objects:
 
  • origin_visits_url (string) – link to GET /api/1/origin/(origin_url)/visits/ in order to get information about the visits for that origin
  • url (string) – the origin canonical url
  • type (string) – the type of software origin (deprecated value; types are now associated with visits instead of origins)
  • id (number) – the origin unique identifier (deprecated value; you should only refer to origins based on their URL)
Request Headers:
 
  • Accept – the requested response content type, either application/json (default) or application/yaml
Response Headers:
 
  • Content-Type – this depends on the Accept header of the request

Allowed HTTP Methods: GET, HEAD, OPTIONS
Status Codes:

Example:

https://archive.softwareheritage.org/api/1/origin/search/python/?limit=2
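A Python sketch (3.8+) that assembles the search URL and its query parameters with urllib.parse. Two points are assumptions rather than documented behavior: that boolean parameters are accepted as the string true, and that a pattern containing characters such as ‘/’ can be passed percent-encoded in the path. search_url is a hypothetical name.

```python
from typing import Optional
from urllib.parse import quote, urlencode

API_ROOT = "https://archive.softwareheritage.org/api/1"


def search_url(pattern: str, limit: Optional[int] = None, offset: Optional[int] = None,
               regexp: bool = False, with_visit: bool = False) -> str:
    """Build an /origin/search/ URL with only the parameters that were given."""
    # Percent-encode the pattern so '/', '?' and similar characters are not
    # taken as path or query delimiters (assumed to be decoded server-side).
    url = f"{API_ROOT}/origin/search/{quote(pattern, safe='')}/"
    params = {}
    if offset is not None:
        params["offset"] = offset
    if limit is not None:
        params["limit"] = limit
    if regexp:
        params["regexp"] = "true"
    if with_visit:
        params["with_visit"] = "true"
    return url + ("?" + urlencode(params) if params else "")
```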
GET /api/1/origin/(origin_url)/visits/

Get information about all visits of a software origin. Visits are returned sorted in descending order according to their date.

Parameters:
  • origin_url (str) – a software origin URL
Query Parameters:
 
  • per_page (int) – specify the number of visits to list, for pagination purposes
  • last_visit (int) – visit to start listing from, for pagination purposes
Request Headers:
 
  • Accept – the requested response content type, either application/json (default) or application/yaml
Response Headers:
 
  • Content-Type – this depends on the Accept header of the request
  • Link – indicates that a subsequent result page is available and contains the URL pointing to it
Response JSON Array of Objects:
 
  • date (string) – ISO representation of the visit date (in UTC)
  • origin (str) – the origin canonical url
  • origin_url (string) – link to get information about the origin
  • status (string) – status of the visit (either full, partial or ongoing)
  • visit (number) – the unique identifier of the visit
  • id (number) – the unique identifier of the origin
  • origin_visit_url (string) – link to GET /api/1/origin/(origin_url)/visit/(visit_id)/ in order to get information about the visit
  • snapshot (string) – the snapshot identifier of the visit
  • snapshot_url (string) – link to GET /api/1/snapshot/(snapshot_id)/ in order to get information about the snapshot of the visit

Allowed HTTP Methods: GET, HEAD, OPTIONS
Status Codes:

Example:

https://archive.softwareheritage.org/api/1/origin/https://github.com/hylang/hy/visits/
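Results are paginated through the Link response header. A Python sketch (3.9+) that follows it until no further page is advertised. It assumes the header uses the usual RFC 8288 form <url>; rel="next", and fetch is a hypothetical callable returning the decoded JSON page together with the raw Link header (or None).

```python
import re
from typing import Callable, Iterator, Optional

# Hypothetical fetcher: GETs a URL and returns (decoded JSON page, Link header or None).
Fetch = Callable[[str], tuple[list, Optional[str]]]

# One link-value of the form <url>; rel="next" (quotes around next optional).
NEXT_LINK = re.compile(r'<([^>]+)>\s*;\s*rel="?next"?')


def next_page(link_header: Optional[str]) -> Optional[str]:
    """Return the next-page URL advertised in a Link header, or None."""
    if not link_header:
        return None
    match = NEXT_LINK.search(link_header)
    return match.group(1) if match else None


def iter_visits(first_url: str, fetch: Fetch) -> Iterator[dict]:
    """Yield visits from every page, following Link headers until none is left."""
    url: Optional[str] = first_url
    while url is not None:
        page, link = fetch(url)
        yield from page
        url = next_page(link)
```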
GET /api/1/origin/(origin_id)/visits/

Get information about all visits of a software origin. Visits are returned sorted in descending order according to their date.

Warning

All endpoints using an origin_id are deprecated and will be removed in the near future. Only those using an origin_url will remain available. Use GET /api/1/origin/(origin_url)/visits/ instead.

Parameters:
  • origin_id (int) – a software origin identifier
Query Parameters:
 
  • per_page (int) – specify the number of visits to list, for pagination purposes
  • last_visit (int) – visit to start listing from, for pagination purposes
Request Headers:
 
  • Accept – the requested response content type, either application/json (default) or application/yaml
Response Headers:
 
  • Content-Type – this depends on the Accept header of the request
  • Link – indicates that a subsequent result page is available and contains the URL pointing to it
Response JSON Array of Objects:
 
  • date (string) – ISO representation of the visit date (in UTC)
  • origin (str) – the origin canonical url
  • origin_url (string) – link to get information about the origin
  • status (string) – status of the visit (either full, partial or ongoing)
  • visit (number) – the unique identifier of the visit
  • id (number) – the unique identifier of the origin
  • origin_visit_url (string) – link to GET /api/1/origin/(origin_url)/visit/(visit_id)/ in order to get information about the visit
  • snapshot (string) – the snapshot identifier of the visit
  • snapshot_url (string) – link to GET /api/1/snapshot/(snapshot_id)/ in order to get information about the snapshot of the visit

Allowed HTTP Methods: GET, HEAD, OPTIONS
Status Codes:

Example:

https://archive.softwareheritage.org/api/1/origin/1/visits/
GET /api/1/origin/(origin_url)/visit/(visit_id)/

Get information about a specific visit of a software origin.

Parameters:
  • origin_url (str) – a software origin URL
  • visit_id (int) – a visit identifier
Request Headers:
 
  • Accept – the requested response content type, either application/json (default) or application/yaml
Response Headers:
 
  • Content-Type – this depends on the Accept header of the request
Response JSON Object:
 
  • date (string) – ISO representation of the visit date (in UTC)
  • origin (str) – the origin canonical url
  • origin_url (string) – link to get information about the origin
  • status (string) – status of the visit (either full, partial or ongoing)
  • visit (number) – the unique identifier of the visit
  • snapshot (string) – the snapshot identifier of the visit
  • snapshot_url (string) – link to GET /api/1/snapshot/(snapshot_id)/ in order to get information about the snapshot of the visit

Allowed HTTP Methods: GET, HEAD, OPTIONS
Status Codes:
  • 200 OK – no error
  • 404 Not Found – the requested origin or visit cannot be found in the archive

Example:

https://archive.softwareheritage.org/api/1/origin/https://github.com/hylang/hy/visit/1/
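The status field distinguishes complete visits from partial or ongoing ones, so a client looking for a snapshot to browse might keep only full visits that have a snapshot and pick the latest by date. A Python sketch (3.8+) over visit objects carrying the date, status and snapshot fields described above; latest_full_visit is a hypothetical name, and the sketch assumes every date carries an explicit UTC offset (or a trailing Z).

```python
from datetime import datetime
from typing import Iterable, Optional


def parse_date(value: str) -> datetime:
    """Parse an ISO 8601 date, accepting a trailing 'Z' for UTC."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def latest_full_visit(visits: Iterable[dict]) -> Optional[dict]:
    """Return the most recent visit with status "full" and a snapshot, if any."""
    candidates = [v for v in visits if v.get("status") == "full" and v.get("snapshot")]
    if not candidates:
        return None
    # Compare parsed datetimes rather than raw strings.
    return max(candidates, key=lambda v: parse_date(v["date"]))
```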
GET /api/1/origin/(origin_id)/visit/(visit_id)/

Get information about a specific visit of a software origin.

Warning

All endpoints using an origin_id are deprecated and will be removed in the near future. Only those using an origin_url will remain available. Use GET /api/1/origin/(origin_url)/visit/(visit_id)/ instead.

Parameters:
  • origin_id (int) – a software origin identifier
  • visit_id (int) – a visit identifier
Request Headers:
 
  • Accept – the requested response content type, either application/json (default) or application/yaml
Response Headers:
 
  • Content-Type – this depends on the Accept header of the request
Response JSON Object:
 
  • date (string) – ISO representation of the visit date (in UTC)
  • origin (str) – the origin canonical url
  • origin_url (string) – link to get information about the origin
  • status (string) – status of the visit (either full, partial or ongoing)
  • visit (number) – the unique identifier of the visit
  • snapshot (string) – the snapshot identifier of the visit
  • snapshot_url (string) – link to GET /api/1/snapshot/(snapshot_id)/ in order to get information about the snapshot of the visit

Allowed HTTP Methods: GET, HEAD, OPTIONS
Status Codes:
  • 200 OK – no error
  • 404 Not Found – the requested origin or visit cannot be found in the archive

Example:

https://archive.softwareheritage.org/api/1/origin/1500/visit/1/

Person

GET /api/1/person/(person_id)/

Get information about a person in the archive.

Parameters:
  • person_id (int) – a person identifier
Request Headers:
 
  • Accept – the requested response content type, either application/json (default) or application/yaml
Response Headers:
 
  • Content-Type – this depends on the Accept header of the request
Response JSON Object:
 
  • email (string) – the email of the person
  • fullname (string) – the full name of the person: a combination of their name and email
  • id (number) – the unique identifier of the person
  • name (string) – the name of the person
Allowed HTTP Methods: GET, HEAD, OPTIONS
Status Codes:

Example:

https://archive.softwareheritage.org/api/1/person/8275/

Release

GET /api/1/release/(sha1_git)/

Get information about a release in the archive. Releases are identified by sha1 checksums, compatible with Git tag identifiers. See swh.model.identifiers.release_identifier() in our data model module for details about how they are computed.

Parameters:
  • sha1_git (string) – hexadecimal representation of the release sha1_git identifier
Request Headers:
 
  • Accept – the requested response content type, either application/json (default) or application/yaml
Response Headers:
 
  • Content-Type – this depends on the Accept header of the request
Response JSON Object:
 
  • author (object) – information about the author of the release
  • author_url (string) – link to GET /api/1/person/(person_id)/ to get information about the author of the release
  • date (string) – ISO representation of the release date (in UTC)
  • id (string) – the release unique identifier
  • message (string) – the message associated with the release
  • name (string) – the name of the release
  • target (string) – the target identifier of the release
  • target_type (string) – the type of the target, one of release, revision, content or directory
  • target_url (string) – a link to the adequate api url based on the target type
Allowed HTTP Methods: GET, HEAD, OPTIONS
Status Codes:

Example:

https://archive.softwareheritage.org/api/1/release/208f61cc7a5dbc9879ae6e5c2f95891e270f09ef/
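The target_url field already points to the right endpoint, but the mapping is easy to reproduce when only target and target_type are at hand. Because release targets are sha1_git identifiers while the content endpoint defaults to sha1, the content case must name the hash type explicitly. A Python sketch; release_target_url is a hypothetical helper.

```python
API_ROOT = "https://archive.softwareheritage.org/api/1"


def release_target_url(release: dict) -> str:
    """Map a release's target and target_type to the matching API endpoint."""
    target, kind = release["target"], release["target_type"]
    if kind == "content":
        # The content endpoint assumes sha1 unless told otherwise.
        return f"{API_ROOT}/content/sha1_git:{target}/"
    if kind in ("release", "revision", "directory"):
        return f"{API_ROOT}/{kind}/{target}/"
    raise ValueError(f"unexpected target_type: {kind}")
```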

Revision

GET /api/1/revision/(sha1_git)/

Get information about a revision in the archive. Revisions are identified by sha1 checksums, compatible with Git commit identifiers. See swh.model.identifiers.revision_identifier() in our data model module for details about how they are computed.

Parameters:
  • sha1_git (string) – hexadecimal representation of the revision sha1_git identifier
Request Headers:
 
  • Accept – the requested response content type, either application/json (default) or application/yaml
Response Headers:
 
  • Content-Type – this depends on the Accept header of the request
Response JSON Object:
 
  • author (object) – information about the author of the revision
  • author_url (string) – link to GET /api/1/person/(person_id)/ to get information about the author of the revision
  • committer (object) – information about the committer of the revision
  • committer_url (string) – link to GET /api/1/person/(person_id)/ to get information about the committer of the revision
  • committer_date (string) – ISO representation of the commit date (in UTC)
  • date (string) – ISO representation of the revision date (in UTC)
  • directory (string) – the unique identifier of the directory that the revision points to
  • directory_url (string) – link to GET /api/1/directory/(sha1_git)/[(path)/] to get information about the directory associated with the revision
  • id (string) – the revision unique identifier
  • merge (boolean) – whether or not the revision corresponds to a merge commit
  • message (string) – the message associated with the revision
  • parents (array) – the parents of the revision, i.e. the previous revisions that lead directly to it; each entry of that array contains a unique parent revision identifier as well as a link to GET /api/1/revision/(sha1_git)/ to get more information about it
  • type (string) – the type of the revision
Allowed HTTP Methods: GET, HEAD, OPTIONS
Status Codes:

Example:

https://archive.softwareheritage.org/api/1/revision/aafb16d69fd30ff58afdd69036a26047f3aebdc6/
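The parents array lets a client walk history one revision at a time. The Python sketch below follows first parents only, ignoring the other parents of merge revisions. fetch is a hypothetical callable returning the decoded JSON of this endpoint, and the id key used to read each parent's identifier is an assumption, since the field name is not given above. For complete histories, the log endpoint avoids one request per revision.

```python
from typing import Callable, Iterator

# Hypothetical fetcher: returns the decoded JSON object of
# GET /api/1/revision/(sha1_git)/ for the given identifier.
Fetch = Callable[[str], dict]


def first_parent_history(sha1_git: str, fetch: Fetch, limit: int = 10) -> Iterator[dict]:
    """Follow first parents, yielding at most `limit` revisions."""
    current = sha1_git
    for _ in range(limit):
        revision = fetch(current)
        yield revision
        parents = revision.get("parents") or []
        if not parents:
            break  # reached a root revision
        # Assumption: each parent entry exposes its identifier under "id".
        current = parents[0]["id"]
```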
GET /api/1/revision/(sha1_git)/directory/[(path)/]

Get information about directory (entry) objects associated with revisions. Each revision is associated with a single “root” directory. This endpoint behaves like GET /api/1/directory/(sha1_git)/[(path)/], but operates on the root directory associated with a given revision.

Parameters:
  • sha1_git (string) – hexadecimal representation of the revision sha1_git identifier
  • path (string) – optional parameter to get information about the directory entry pointed to by that relative path
Request Headers:
 
  • Accept – the requested response content type, either application/json (default) or application/yaml
Response Headers:
 
  • Content-Type – this depends on the Accept header of the request
Response JSON Object:
 
  • content (array) – directory entries as returned by GET /api/1/directory/(sha1_git)/[(path)/]
  • path (string) – path of the directory, relative to the revision root directory
  • revision (string) – the unique revision identifier
  • type (string) – the type of the directory

Allowed HTTP Methods: GET, HEAD, OPTIONS

Status Codes:

Example:

https://archive.softwareheritage.org/api/1/revision/f1b94134a4b879bc55c3dacdb496690c8ebdc03f/directory/
GET /api/1/revision/(sha1_git)[/prev/(prev_sha1s)]/log/

Get a list of all revisions leading to a given one; in other words, show the commit log.

Parameters:
  • sha1_git (string) – hexadecimal representation of the revision sha1_git identifier
  • prev_sha1s (string) – optional parameter representing the navigation breadcrumbs (descendant revisions previously visited). If there are multiple values, use / as the delimiter. If provided, information about these revisions is added at the beginning of the returned list.
Query Parameters:
 
  • per_page (int) – number of elements in the returned list, for pagination purposes
Request Headers:
 
  • Accept – the requested response content type, either application/json (default) or application/yaml
Response Headers:
 
  • Content-Type – this depends on the Accept header of the request
  • Link – indicates that a subsequent result page is available and contains the URL pointing to it
Response JSON Array of Objects:
 
  • author (object) – information about the author of the revision
  • author_url (string) – link to GET /api/1/person/(person_id)/ to get information about the author of the revision
  • committer (object) – information about the committer of the revision
  • committer_url (string) – link to GET /api/1/person/(person_id)/ to get information about the committer of the revision
  • committer_date (string) – ISO representation of the commit date (in UTC)
  • date (string) – ISO representation of the revision date (in UTC)
  • directory (string) – the unique identifier of the directory that the revision points to
  • directory_url (string) – link to GET /api/1/directory/(sha1_git)/[(path)/] to get information about the directory associated with the revision
  • id (string) – the revision unique identifier
  • merge (boolean) – whether or not the revision corresponds to a merge commit
  • message (string) – the message associated with the revision
  • parents (array) – the parents of the revision, i.e. the previous revisions that lead directly to it; each entry of that array contains a unique parent revision identifier as well as a link to GET /api/1/revision/(sha1_git)/ to get more information about it
  • type (string) – the type of the revision

Allowed HTTP Methods: GET, HEAD, OPTIONS

Status Codes:

Example:

https://archive.softwareheritage.org/api/1/revision/e1a315fa3fa734e2a6154ed7b5b9ae0eb8987aad/log/
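A Python sketch (3.8+) assembling the log URL, including optional breadcrumbs joined with / and the per_page query parameter; log_url is a hypothetical helper name.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode

API_ROOT = "https://archive.softwareheritage.org/api/1"


def log_url(sha1_git: str, prev_sha1s: Sequence[str] = (), per_page: Optional[int] = None) -> str:
    """Build /revision/(sha1_git)[/prev/(prev_sha1s)]/log/ with optional paging."""
    url = f"{API_ROOT}/revision/{sha1_git}"
    if prev_sha1s:
        # Breadcrumbs use '/' as the documented delimiter.
        url += "/prev/" + "/".join(prev_sha1s)
    url += "/log/"
    if per_page is not None:
        url += "?" + urlencode({"per_page": per_page})
    return url
```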
GET /api/1/revision/origin/(origin_id)/[branch/(branch_name)/][ts/(timestamp)/]

Get information about a revision, searching for it based on software origin, branch name, and/or visit timestamp.

This endpoint behaves like GET /api/1/revision/(sha1_git)/, but operates on the revision that has been found at a given software origin, close to a given point in time, pointed to by a given branch.

Warning

All endpoints using an origin_id are deprecated and will be removed in the near future. Only those using an origin_url will remain available. You should instead use successively GET /api/1/origin/(origin_url)/visits/, GET /api/1/snapshot/(snapshot_id)/, and GET /api/1/revision/(sha1_git)/.

Parameters:
  • origin_id (int) – a software origin identifier
  • branch_name (string) – optional parameter specifying a fully-qualified branch name associated with the software origin, e.g., “refs/heads/master”. Defaults to the HEAD branch.
  • timestamp (string) – optional parameter specifying a timestamp close to which the revision pointed to by the given branch should be looked up. The timestamp can be expressed either as an ISO date or as a Unix timestamp (in UTC). Defaults to now.
Request Headers:
 
  • Accept – the requested response content type, either application/json (default) or application/yaml
Response Headers:
 
  • Content-Type – this depends on the Accept header of the request
Response JSON Object:
 
  • author (object) – information about the author of the revision
  • author_url (string) – link to GET /api/1/person/(person_id)/ to get information about the author of the revision
  • committer (object) – information about the committer of the revision
  • committer_url (string) – link to GET /api/1/person/(person_id)/ to get information about the committer of the revision
  • committer_date (string) – ISO representation of the commit date (in UTC)
  • date (string) – ISO representation of the revision date (in UTC)
  • directory (string) – the unique identifier of the directory that the revision points to
  • directory_url (string) – link to GET /api/1/directory/(sha1_git)/[(path)/] to get information about the directory associated with the revision
  • id (string) – the revision unique identifier
  • merge (boolean) – whether or not the revision corresponds to a merge commit
  • message (string) – the message associated with the revision
  • parents (array) – the parents of the revision, i.e. the previous revisions that lead directly to it; each entry of that array contains a unique parent revision identifier as well as a link to GET /api/1/revision/(sha1_git)/ to get more information about it
  • type (string) – the type of the revision

Allowed HTTP Methods: GET, HEAD, OPTIONS

Status Codes:
  • 200 OK – no error
  • 404 Not Found – no revision matching the given criteria could be found in the archive

Example:

https://archive.softwareheritage.org/api/1/revision/origin/13706355/branch/refs/heads/2.7/
GET /api/1/revision/origin/(origin_id)[/branch/(branch_name)][/ts/(timestamp)]/log/

Show the commit log for a revision, searching for it based on software origin, branch name, and/or visit timestamp.

This endpoint behaves like GET /api/1/revision/(sha1_git)[/prev/(prev_sha1s)]/log/, but operates on the revision that has been found at a given software origin, close to a given point in time, pointed to by a given branch.

Warning

All endpoints using an origin_id are deprecated and will be removed in the near future. Only those using an origin_url will remain available. You should instead use successively GET /api/1/origin/(origin_url)/visits/, GET /api/1/snapshot/(snapshot_id)/, and GET /api/1/revision/(sha1_git)[/prev/(prev_sha1s)]/log/.

Parameters:
  • origin_id (int) – a software origin identifier
  • branch_name (string) – optional parameter specifying a fully-qualified branch name associated with the software origin, e.g., “refs/heads/master”. Defaults to the HEAD branch.
  • timestamp (string) – optional parameter specifying a timestamp close to which the revision pointed to by the given branch should be looked up. The timestamp can be expressed either as an ISO date or as a Unix timestamp (in UTC). Defaults to now.
Request Headers:
 
  • Accept – the requested response content type, either application/json (default) or application/yaml
Response Headers:
 
  • Content-Type – this depends on the Accept header of the request
Response JSON Array of Objects:
 
  • author (object) – information about the author of the revision
  • author_url (string) – link to GET /api/1/person/(person_id)/ to get information about the author of the revision
  • committer (object) – information about the committer of the revision
  • committer_url (string) – link to GET /api/1/person/(person_id)/ to get information about the committer of the revision
  • committer_date (string) – ISO representation of the commit date (in UTC)
  • date (string) – ISO representation of the revision date (in UTC)
  • directory (string) – the unique identifier of the directory that the revision points to
  • directory_url (string) – link to GET /api/1/directory/(sha1_git)/[(path)/] to get information about the directory associated with the revision
  • id (string) – the revision unique identifier
  • merge (boolean) – whether or not the revision corresponds to a merge commit
  • message (string) – the message associated with the revision
  • parents (array) – the parents of the revision, i.e. the previous revisions that lead directly to it; each entry of that array contains a unique parent revision identifier and a link to GET /api/1/revision/(sha1_git)/ to get more information about it
  • type (string) – the type of the revision

Allowed HTTP Methods: GET, HEAD, OPTIONS

Status Codes:
  • 200 OK – no error
  • 404 Not Found – no revision matching the given criteria could be found in the archive

Example:

https://archive.softwareheritage.org/api/1/revision/origin/723566/ts/2016-01-17T00:00:00+00:00/log/
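The warning above recommends replacing this endpoint with three successive calls. A minimal Python sketch of that chain, using only the standard library, follows. It assumes that GET /api/1/origin/(origin_url)/visits/ returns a JSON array of visits carrying date and snapshot fields, that the origin URL can be placed verbatim in the path, and that each snapshot branch is an object with target_type and target keys; the helper names are hypothetical.

```python
import json
import urllib.request
from datetime import datetime

API_ROOT = "https://archive.softwareheritage.org/api/1"


def get_json(url):
    """GET a URL and decode its JSON body."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def latest_snapshot_id(visits):
    """Pick the snapshot of the most recent visit that produced one.

    Assumption: each visit has an ISO 8601 'date' and a 'snapshot' field
    that is null (None) when the visit recorded no snapshot.
    """
    candidates = [v for v in visits if v.get("snapshot")]
    if not candidates:
        return None
    newest = max(candidates, key=lambda v: datetime.fromisoformat(v["date"]))
    return newest["snapshot"]


def revision_log_for_origin(origin_url, branch_name):
    """visits -> snapshot -> revision log, for a fully-qualified branch name."""
    visits = get_json(f"{API_ROOT}/origin/{origin_url}/visits/")
    snapshot_id = latest_snapshot_id(visits)
    if snapshot_id is None:
        raise LookupError(f"no snapshot recorded for {origin_url}")
    branches = get_json(f"{API_ROOT}/snapshot/{snapshot_id}/")["branches"]
    target = branches.get(branch_name)
    # Only branches pointing directly at a revision are handled here.
    if not target or target["target_type"] != "revision":
        raise LookupError(f"{branch_name!r} does not point to a revision")
    return get_json(f"{API_ROOT}/revision/{target['target']}/log/")
```

This sketch only inspects the first page of snapshot branches, so a branch that falls beyond the default branches_count would be reported as missing. HEAD is typically an alias rather than a revision, so it would need to be resolved to the branch it names before calling revision_log_for_origin.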

Snapshot

GET /api/1/snapshot/(snapshot_id)/

Get information about a snapshot in the archive.

A snapshot is a set of named branches, which are pointers to objects at any level of the Software Heritage DAG. It represents a full picture of an origin at a given time.

As well as pointing to other objects in the Software Heritage DAG, branches can also be aliases, in which case their target is the name of another branch in the same snapshot, or dangling, in which case the target is unknown.
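To illustrate how aliases might be followed on the client side, here is a small Python sketch operating on an already decoded branches object. It assumes a hypothetical representation in which each branch maps to an object with target_type and target keys, an alias stores the name of another branch in target, and a dangling branch maps to null (None in Python).

```python
def resolve_branch(branches, name):
    """Follow alias branches until reaching a non-alias target.

    Returns the final target object, or None if the chain ends on a
    dangling branch (or on a name absent from this page of branches).
    Raises ValueError if the aliases form a cycle.
    """
    seen = set()
    while name not in seen:
        seen.add(name)
        target = branches.get(name)
        if target is None:
            return None
        if target["target_type"] != "alias":
            return target
        # An alias targets the name of another branch in the same snapshot.
        name = target["target"]
    raise ValueError(f"alias cycle involving {name!r}")
```

A typical use is resolving HEAD, which for git origins is usually an alias to a branch such as refs/heads/master. Tracking visited names guarantees termination even if a snapshot contains a cycle of aliases.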

A snapshot identifier is a salted sha1. See swh.model.identifiers.snapshot_identifier() in our data model module for details about how they are computed.

Parameters:
  • snapshot_id (sha1) – a snapshot identifier
Query Parameters:
 
  • branches_from (str) – optional parameter used to skip branches whose name sorts before the given value before returning them
  • branches_count (int) – optional parameter used to limit the number of returned branches (defaults to 1000)
  • target_types (str) – optional comma-separated list used to filter the target types of the branches to return (possible values that can be contained in that list are content, directory, revision, release, snapshot or alias)
Request Headers:
 
  • Accept – the requested response content type, either application/json (default) or application/yaml
Response Headers:
 
  • Content-Type – depends on the Accept header of the request
  • Link – indicates that a subsequent result page is available and contains the url pointing to it
Response JSON Object:
 
  • branches (object) – object containing all branches associated with the snapshot; for each of them, the target type and id are given, along with a link to get information about that target
  • id (string) – the unique identifier of the snapshot

Allowed HTTP Methods: GET, HEAD, OPTIONS

Status Codes:

Example:

https://archive.softwareheritage.org/api/1/snapshot/6a3a2cf0b2b90ce7ae1cf0a221ed68035b686f5a/
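When a snapshot has more branches than branches_count, the Link response header points to the next page. The Python sketch below builds the query string and follows those links until no further page is announced. It assumes the header uses the common <url>; rel="next" form, and resolves the link with urljoin in case it is relative; the function names are hypothetical.

```python
import json
import re
import urllib.request
from urllib.parse import urlencode, urljoin

API_ROOT = "https://archive.softwareheritage.org/api/1"


def snapshot_url(snapshot_id, branches_from=None, branches_count=None, target_types=None):
    """Build a snapshot URL carrying the optional query parameters."""
    params = {}
    if branches_from is not None:
        params["branches_from"] = branches_from
    if branches_count is not None:
        params["branches_count"] = branches_count
    if target_types:
        # Accept either "revision,release" or ["revision", "release"].
        if not isinstance(target_types, str):
            target_types = ",".join(target_types)
        params["target_types"] = target_types
    query = f"?{urlencode(params)}" if params else ""
    return f"{API_ROOT}/snapshot/{snapshot_id}/{query}"


def next_page_url(link_header):
    """Return the URL tagged rel="next" in a Link header value, or None."""
    if not link_header:
        return None
    match = re.search(r'<([^>]*)>\s*;\s*rel="?next"?', link_header)
    return match.group(1) if match else None


def iter_branches(snapshot_id, **params):
    """Yield (name, target) pairs, following Link headers across pages."""
    url = snapshot_url(snapshot_id, **params)
    while url:
        req = urllib.request.Request(url, headers={"Accept": "application/json"})
        with urllib.request.urlopen(req) as resp:
            page = json.load(resp)
            next_url = next_page_url(resp.headers.get("Link"))
        yield from page["branches"].items()
        url = urljoin(url, next_url) if next_url else None
```

For instance, sum(1 for _ in iter_branches(snapshot_id, target_types="revision")) would count the revision branches of a snapshot. Letting the server-provided link drive the loop avoids recomputing branches_from on the client side.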

Archive statistics

GET /api/1/stat/counters/

Get statistics about the content of the archive.

Response JSON Object:
 
  • content (number) – current number of content objects (aka files) in the archive
  • directory (number) – current number of directory objects in the archive
  • origin (number) – current number of software origins (an origin is a “place” where source code can be found, e.g. a git repository, a tarball, …) in the archive
  • origin_visit (number) – current number of visits to software origins performed to fill the archive
  • person (number) – current number of persons (source code authors or committers) in the archive
  • release (number) – current number of release objects in the archive
  • revision (number) – current number of revision objects (aka commits) in the archive
  • skipped_content (number) – current number of content objects (aka files) which were not inserted in the archive
  • snapshot (number) – current number of snapshot objects (aka sets of named branches) in the archive
Request Headers:
 
  • Accept – the requested response content type, either application/json (default) or application/yaml
Response Headers:
 
Allowed HTTP Methods: GET, HEAD, OPTIONS

Status Codes:

Example:

https://archive.softwareheritage.org/api/1/stat/counters/
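Every endpoint in this document selects its response format through the Accept header. A short Python sketch, using this counters endpoint as an example, shows how a client might set that header; the function names are illustrative only.

```python
import json
import urllib.request

COUNTERS_URL = "https://archive.softwareheritage.org/api/1/stat/counters/"


def counters_request(fmt="json"):
    """Build a request for the counters, asking for JSON or YAML via Accept."""
    if fmt not in ("json", "yaml"):
        raise ValueError("fmt must be 'json' or 'yaml'")
    return urllib.request.Request(COUNTERS_URL, headers={"Accept": f"application/{fmt}"})


def fetch_counters():
    """Return the counters decoded from the JSON response, as a dict."""
    with urllib.request.urlopen(counters_request("json")) as resp:
        return json.load(resp)
```

fetch_counters() would return a dict keyed by the names listed above (content, directory, origin, and so on); requesting YAML instead only changes the Accept header, not the URL.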

Vault

GET /api/1/vault/directory/(dir_id)/
POST /api/1/vault/directory/(dir_id)/

Request the cooking of an archive for a directory or check its cooking status.

This endpoint can be used to create a vault cooking task for a directory through a POST request, or to check the status of a previously created one through a GET request.

Once the cooking task has been executed, the resulting archive can be downloaded using the dedicated endpoint GET /api/1/vault/directory/(dir_id)/raw/.

Then, to extract the cooked directory into the current one, use:

$ tar xvf path/to/directory.tar.gz
Parameters:
  • dir_id (string) – the directory’s sha1 identifier
Query Parameters:
 
  • email (string) – e-mail to notify when the archive is ready
Request Headers:
 
  • Accept – the requested response content type, either application/json (default) or application/yaml
Response Headers:
 
Response JSON Object:
 
  • fetch_url (string) – the url from which to download the archive once it has been cooked (see GET /api/1/vault/directory/(dir_id)/raw/)
  • obj_type (string) – the type of object to cook (directory or revision)
  • progress_message (string) – message describing the cooking task progress
  • id (number) – the cooking task id
  • status (string) – the cooking task status (either new, pending, done or failed)
  • obj_id (string) – the identifier of the object to cook

Allowed HTTP Methods: GET, POST, HEAD, OPTIONS

Status Codes:

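The POST, poll, then download sequence described for this endpoint can be sketched in Python with the standard library. This is a minimal sketch under assumptions: the polling interval and limit are arbitrary, the helper names are hypothetical, and fetch_url is resolved with urljoin so that it works whether the API returns an absolute or a relative URL.

```python
import json
import time
import urllib.request
from urllib.parse import urljoin

API_ROOT = "https://archive.softwareheritage.org/api/1"


def vault_directory_url(dir_id):
    return f"{API_ROOT}/vault/directory/{dir_id}/"


def call_json(url, method="GET"):
    """Send a request and decode the JSON body."""
    req = urllib.request.Request(url, method=method, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_cooking(poll, interval=10.0, max_polls=90, sleep=time.sleep):
    """Call poll() until the task status is 'done' or 'failed'."""
    for _ in range(max_polls):
        task = poll()
        if task["status"] in ("done", "failed"):
            return task
        sleep(interval)
    raise TimeoutError("cooking task still not finished")


def cook_directory(dir_id, dest="directory.tar.gz"):
    url = vault_directory_url(dir_id)
    call_json(url, method="POST")  # create the cooking task
    task = wait_for_cooking(lambda: call_json(url))  # poll with GET
    if task["status"] != "done":
        raise RuntimeError(task.get("progress_message") or "cooking failed")
    urllib.request.urlretrieve(urljoin(url, task["fetch_url"]), dest)
    return dest
```

Calling cook_directory with a directory identifier would block until the task reaches done or failed. Injecting the sleep function into wait_for_cooking keeps the polling logic testable without network access.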
GET /api/1/vault/directory/(dir_id)/raw/

Fetch the cooked archive for a directory.

See GET /api/1/vault/directory/(dir_id)/ to get more details on directory cooking.

Parameters:
  • dir_id (string) – the directory’s sha1 identifier
Response Headers:
 
Allowed HTTP Methods: GET, HEAD, OPTIONS

Status Codes:

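A Python sketch of downloading the cooked tarball and extracting it, roughly equivalent to the tar xvf command shown for the cooking endpoint. It assumes the archive is a gzip-compressed tarball, as the directory.tar.gz name in that command suggests, and that the cooking task has already finished; the function names are hypothetical.

```python
import shutil
import tarfile
import urllib.request

API_ROOT = "https://archive.softwareheritage.org/api/1"


def download_cooked_directory(dir_id, dest_path):
    """Stream the cooked tarball to disk without loading it into memory."""
    url = f"{API_ROOT}/vault/directory/{dir_id}/raw/"
    with urllib.request.urlopen(url) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest_path


def extract_tarball(archive_path, target_dir="."):
    """Extract a .tar.gz into target_dir and return the member names."""
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: reject unsafe member paths.
            tar.extractall(target_dir, filter="data")
        else:
            tar.extractall(target_dir)
    return names
```

The response is streamed to disk with shutil.copyfileobj rather than read into memory, since cooked archives can be large, and the "data" extraction filter is used whenever the running Python provides it.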
GET /api/1/vault/revision/(rev_id)/gitfast/
POST /api/1/vault/revision/(rev_id)/gitfast/

Request the cooking of a gitfast archive for a revision or check its cooking status.

This endpoint can be used to create a vault cooking task for a revision through a POST request, or to check the status of a previously created one through a GET request.

Once the cooking task has been executed, the resulting gitfast archive can be downloaded using the dedicated endpoint GET /api/1/vault/revision/(rev_id)/gitfast/raw/.

Then, to import the revision into the current directory, use:

$ git init
$ zcat path/to/revision.gitfast.gz | git fast-import
$ git checkout HEAD
Parameters:
  • rev_id (string) – the revision’s sha1 identifier
Query Parameters:
 
  • email (string) – e-mail to notify when the gitfast archive is ready
Request Headers:
 
  • Accept – the requested response content type, either application/json (default) or application/yaml
Response Headers:
 
Response JSON Object:
 
  • fetch_url (string) – the url from which to download the archive once it has been cooked (see GET /api/1/vault/revision/(rev_id)/gitfast/raw/)
  • obj_type (string) – the type of object to cook (directory or revision)
  • progress_message (string) – message describing the cooking task progress
  • id (number) – the cooking task id
  • status (string) – the cooking task status (either new, pending, done or failed)
  • obj_id (string) – the identifier of the object to cook
Allowed HTTP Methods: GET, POST, HEAD, OPTIONS

Status Codes:

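When requesting cooking with the email query parameter, the address should be URL-encoded; otherwise a + in it would be decoded as a space. A minimal Python sketch of building the gitfast URLs and sending the POST request, with hypothetical helper names:

```python
import json
import urllib.request
from urllib.parse import urlencode

API_ROOT = "https://archive.softwareheritage.org/api/1"


def vault_gitfast_urls(rev_id, email=None):
    """Return (cook_url, raw_url) for a revision's gitfast archive.

    cook_url carries the optional, URL-encoded e-mail notification
    parameter; raw_url is the download endpoint.
    """
    base = f"{API_ROOT}/vault/revision/{rev_id}/gitfast/"
    cook_url = f"{base}?{urlencode({'email': email})}" if email else base
    return cook_url, f"{base}raw/"


def request_gitfast_cooking(rev_id, email=None):
    """POST the cooking request and return the decoded task description."""
    cook_url, _ = vault_gitfast_urls(rev_id, email)
    req = urllib.request.Request(cook_url, method="POST", headers={"Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The returned task description contains the fields listed above (status, fetch_url, and so on), so the same GET polling used for directories applies here before downloading from raw_url.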
GET /api/1/vault/revision/(rev_id)/gitfast/raw/

Fetch the cooked gitfast archive for a revision.

See GET /api/1/vault/revision/(rev_id)/gitfast/ to get more details on revision cooking.

Parameters:
  • rev_id (string) – the revision’s sha1 identifier
Response Headers:
 
Allowed HTTP Methods: GET, HEAD, OPTIONS

Status Codes: