API Specification

This is Software Heritage’s SWORD 2.0 Server implementation.

S.W.O.R.D (Simple Web-Service Offering Repository Deposit) is an interoperability standard for digital file deposit.

This implementation will permit interaction between a client (a repository) and a server (SWH repository) to push deposits of software source code archives with associated metadata.

Note:

  • In the following document, we use the terms archive and software source code archive interchangeably.
  • The supported archive formats are:
    • zip: common zip archive (no multi-disk zip files).
    • tar: tar archive, either uncompressed or compressed with one of the following algorithms: gzip (.tar.gz, .tgz), bzip2 (.tar.bz2), or lzma (.tar.lzma).
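
As a rough client-side pre-check, the list of formats above can be turned into a file name test. This is only a sketch: the function name is hypothetical, it looks at the file name alone, and the server remains the authority on what it accepts.

```python
# Suffixes transcribed from the list of supported archive formats above.
SUPPORTED_SUFFIXES = (".zip", ".tar", ".tar.gz", ".tgz", ".tar.bz2", ".tar.lzma")


def has_supported_suffix(filename: str) -> bool:
    """Return True if the file name ends with one of the supported suffixes.

    Hypothetical helper: it does not open the file, so it cannot detect
    multi-disk zip files or corrupted archives.
    """
    return filename.lower().endswith(SUPPORTED_SUFFIXES)
```

A client could call has_supported_suffix("project.tar.gz") before uploading to avoid an obviously doomed request.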

Collection

SWORD defines a collection concept. In SWH’s case, this collection refers to a group of deposits. A deposit is some form of software source code archive(s) associated with metadata. By default the client’s collection will have the client’s name.

Limitations

  • upload limit of 100 MiB
  • no mediation

API overview

API access is over HTTPS.

The API is protected through basic authentication.
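
For clients that do not rely on an HTTP library to handle authentication, a basic authentication header can be built by hand. The sketch below follows the standard scheme (base64 of name:password); the function name is hypothetical and the credentials are placeholders.

```python
import base64


def basic_auth_header(name: str, password: str) -> dict:
    """Build the Authorization header for HTTP basic authentication.

    Hypothetical helper; the credentials are the client's <name> and <pass>.
    """
    token = base64.b64encode(f"{name}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

Since basic authentication only encodes the credentials, it should only ever be sent over HTTPS, which is how the API is accessed.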

Endpoints

The API endpoints are rooted at https://deposit.softwareheritage.org/1/.
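
The endpoint paths described in this document can be derived from that root. The following is a minimal sketch with hypothetical helper names; it only covers the servicedocument, collection, media, metadata and content paths documented here.

```python
from urllib.parse import quote, urljoin

BASE_URL = "https://deposit.softwareheritage.org/1/"


def service_document_url(base: str = BASE_URL) -> str:
    """URL of the service document (SD-IRI)."""
    return urljoin(base, "servicedocument/")


def collection_url(collection: str, base: str = BASE_URL) -> str:
    """URL of a collection (COL-IRI)."""
    return urljoin(base, quote(collection, safe="") + "/")


def deposit_url(collection: str, deposit_id: int, kind: str, base: str = BASE_URL) -> str:
    """URL of a deposit sub-resource; kind is 'media', 'metadata' or 'content'."""
    if kind not in ("media", "metadata", "content"):
        raise ValueError(f"unknown deposit resource: {kind}")
    return urljoin(collection_url(collection, base), f"{deposit_id}/{kind}/")
```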

Data is sent and received as XML (as specified in the SWORD 2.0 specification).

Service document

GET /1/servicedocument/

This is the starting endpoint for the client to discover its initial collection. The answer to this query describes:

  • the server’s abilities
  • connected client’s collection information

Also known as: SD-IRI - The Service Document IRI

Parameters:
  • <name><pass> (text) – the client’s credentials
Status Codes:

Sample response

<?xml version="1.0" ?>
<service xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:sword="http://purl.org/net/sword/terms/"
    xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns="http://www.w3.org/2007/app">

    <sword:version>2.0</sword:version>
    <sword:maxUploadSize>20971520</sword:maxUploadSize>

    <workspace>
        <atom:title>The Software Heritage (SWH) archive</atom:title>
        <collection href="https://deposit.softwareheritage.org/1/hal/">
            <atom:title>SWH Software Archive</atom:title>
            <accept>application/zip</accept>
            <accept>application/x-tar</accept>
            <sword:collectionPolicy>Collection Policy</sword:collectionPolicy>
            <dcterms:abstract>Software Heritage Archive</dcterms:abstract>
            <sword:mediation>false</sword:mediation>
            <sword:metadataRelevantHeader>false</sword:metadataRelevantHeader>
            <sword:treatment>Collect, Preserve, Share</sword:treatment>
            <sword:acceptPackaging>http://purl.org/net/sword/package/SimpleZip</sword:acceptPackaging>
            <sword:service>https://deposit.softwareheritage.org/1/hal/</sword:service>
        </collection>
    </workspace>
</service>
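
A client will typically read the collection URL, the accepted media types and the upload limit from this response. The sketch below does so with Python's standard library; the function name and the shape of the returned dictionary are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespaces as declared in the sample service document.
NS = {
    "app": "http://www.w3.org/2007/app",
    "atom": "http://www.w3.org/2005/Atom",
    "sword": "http://purl.org/net/sword/terms/",
}


def parse_service_document(xml_text: str) -> dict:
    """Extract the upload limit and the collections from a service document."""
    root = ET.fromstring(xml_text)
    max_size = root.findtext("sword:maxUploadSize", namespaces=NS)
    collections = []
    for col in root.iterfind("app:workspace/app:collection", NS):
        collections.append({
            "href": col.get("href"),
            "title": col.findtext("atom:title", namespaces=NS),
            "accept": [a.text for a in col.iterfind("app:accept", NS)],
        })
    return {
        # Size in bytes, or None if the element is absent.
        "max_upload_size": int(max_size) if max_size else None,
        "collections": collections,
    }
```

The href of the returned collection is the URL to which deposits are then posted.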

Create deposit

POST /1/<collection-name>/

Create a deposit in a collection.

The client sends a deposit request to a specific collection with:

  • an archive holding the software source code (binary upload)

  • an envelope with metadata describing the deposit (atom entry deposit)

    Also known as: COL-IRI

Parameters:
  • <name><pass> (text) – the client’s credentials
  • Content-Type (text) – accepted mimetype
  • Content-Length (int) – tarball size
  • Content-MD5 (text) – hex-encoded MD5 checksum of the tarball
  • Content-Disposition (text) – attachment; filename=[filename]; the filename parameter must be text (ascii)
  • Content-Disposition (text) – for the metadata file, set the name parameter to ‘atom’.
  • In-Progress (bool) – true if not final; false when final request.
Status Codes:

Sample request

curl -i -u hal:<pass> \
    -F "file=@../deposit.json;type=application/zip;filename=payload" \
    -F "atom=@../atom-entry.xml;type=application/atom+xml;charset=UTF-8" \
    -H 'In-Progress: false' \
    -H 'Slug: some-external-id' \
    -XPOST https://deposit.softwareheritage.org/1/hal/
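
For an archive-only (binary) request, the header parameters listed above can be computed from the archive itself. This is a minimal sketch under the assumption that the whole archive fits in memory; the function name is hypothetical.

```python
import hashlib


def binary_deposit_headers(payload: bytes, filename: str,
                           content_type: str, in_progress: bool) -> dict:
    """Compute the headers describing an archive sent as a binary upload."""
    if not filename.isascii():
        # The filename parameter must be ASCII text.
        raise ValueError("filename must be ASCII")
    return {
        "Content-Type": content_type,
        "Content-Length": str(len(payload)),
        # Hex-encoded MD5 checksum of the archive.
        "Content-MD5": hashlib.md5(payload).hexdigest(),
        "Content-Disposition": f"attachment; filename={filename}",
        # "true" while more requests will follow, "false" on the final one.
        "In-Progress": "true" if in_progress else "false",
    }
```

Computing the length and checksum from the exact bytes being sent avoids the 412 (precondition failed) response returned when they do not match the archive.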

Sample response

HTTP/1.0 201 Created
Date: Tue, 26 Sep 2017 10:32:35 GMT
Server: WSGIServer/0.2 CPython/3.5.3
Vary: Accept, Cookie
Allow: GET, POST, PUT, DELETE, HEAD, OPTIONS
Location: /1/hal/10/metadata/
X-Frame-Options: SAMEORIGIN
Content-Type: application/xml

<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:sword="http://purl.org/net/sword/"
       xmlns:dcterms="http://purl.org/dc/terms/">
    <deposit_id>10</deposit_id>
    <deposit_date>Sept. 26, 2017, 10:32 a.m.</deposit_date>
    <deposit_archive>None</deposit_archive>
    <deposit_status>deposited</deposit_status>

    <!-- Edit-IRI -->
    <link rel="edit" href="/1/hal/10/metadata/" />
    <!-- EM-IRI -->
    <link rel="edit-media" href="/1/hal/10/media/"/>
    <!-- SE-IRI -->
    <link rel="http://purl.org/net/sword/terms/add" href="/1/hal/10/metadata/" />
    <!-- State-IRI -->
    <link rel="alternate" href="/1/hal/10/status/"/>

    <sword:packaging>http://purl.org/net/sword/package/SimpleZip</sword:packaging>
</entry>
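
The links in this receipt tell the client where to send follow-up requests. The sketch below collects them by their rel attribute; it assumes a well-formed receipt, and the function name and returned structure are hypothetical.

```python
import xml.etree.ElementTree as ET

# The receipt uses Atom as its default namespace.
ATOM = "{http://www.w3.org/2005/Atom}"


def parse_deposit_receipt(xml_text: str) -> dict:
    """Extract the deposit id, status and links from a deposit receipt."""
    entry = ET.fromstring(xml_text)
    # Map each link relation (edit, edit-media, ...) to its href.
    links = {link.get("rel"): link.get("href") for link in entry.iter(ATOM + "link")}
    return {
        "deposit_id": entry.findtext(ATOM + "deposit_id"),
        "deposit_status": entry.findtext(ATOM + "deposit_status"),
        "links": links,
    }
```

With the sample above, the edit-media link would give the path used to add or replace archives.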

Update content

POST /1/<collection-name>/<deposit-id>/media/

Add archive(s) to a deposit. Only possible if the deposit’s status is partial.

PUT /1/<collection-name>/<deposit-id>/media/

Replace all content by submitting a new archive. Only possible if the deposit’s status is partial.

Also known as: update iri (EM-IRI)

Parameters:
  • <name><pass> (text) – the client’s credentials
  • Content-Type (text) – accepted mimetype
  • Content-Length (int) – tarball size
  • Content-MD5 (text) – hex-encoded MD5 checksum of the tarball
  • Content-Disposition (text) – attachment; filename=[filename]; the filename parameter must be text (ascii)
  • In-Progress (bool) – true if not final; false when final request.
Status Codes:

Update metadata

POST /1/<collection-name>/<deposit-id>/metadata/

Add metadata to a deposit. Only possible if the deposit’s status is partial.

PUT /1/<collection-name>/<deposit-id>/metadata/

Replace all metadata by submitting a new metadata file. Only possible if the deposit’s status is partial.

Also known as: update iri (SE-IRI)

Parameters:
  • <name><pass> (text) – the client’s credentials
  • Content-Disposition (text) – attachment; filename=[filename]; the filename parameter must be text (ascii), with a name parameter set to ‘atom’.
  • In-Progress (bool) – true if not final; false when final request.
Status Codes:

Retrieve status

GET /1/<collection-name>/<deposit-id>/

Returns the deposit’s status.

The different statuses:

  • partial: multipart deposit is still ongoing
  • deposited: deposit completed, ready for checks
  • rejected: deposit failed the checks
  • verified: content and metadata verified, ready for loading
  • loading: loading in-progress
  • done: loading completed successfully
  • failed: the deposit loading has failed
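
Client code can model these statuses explicitly. In the sketch below, the rule that only partial deposits accept updates comes from the update endpoints described earlier; treating rejected, done and failed as final is an assumption inferred from their descriptions, and all names are hypothetical.

```python
from enum import Enum


class DepositStatus(Enum):
    PARTIAL = "partial"
    DEPOSITED = "deposited"
    REJECTED = "rejected"
    VERIFIED = "verified"
    LOADING = "loading"
    DONE = "done"
    FAILED = "failed"


# Assumption: no further processing happens once one of these is reached.
FINAL_STATUSES = {DepositStatus.REJECTED, DepositStatus.DONE, DepositStatus.FAILED}


def accepts_updates(status: DepositStatus) -> bool:
    """Content and metadata updates are only possible on partial deposits."""
    return status is DepositStatus.PARTIAL


def is_final(status: DepositStatus) -> bool:
    """Return True if polling the status further is presumably pointless."""
    return status in FINAL_STATUSES
```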

Also known as: STATE-IRI

Parameters:
  • <name><pass> (text) – the client’s credentials
Status Codes:

Rejected deposit

A deposit may be rejected. In that case, the deposit_status_detail entry explains which checks failed.

Many reasons are possible; here are some:

  • Deposit without software archive (the main goal of the deposit is to deposit software source code)
  • Deposit with malformed software archive (i.e. an archive within an archive)
  • Deposit with invalid software archive (a corrupted archive, although this should be detected during upload rather than during checks)
  • Deposit with unsupported archive format
  • Deposit with missing metadata

Sample response

Successful deposit:

<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:sword="http://purl.org/net/sword/"
       xmlns:dcterms="http://purl.org/dc/terms/">
    <deposit_id>160</deposit_id>
    <deposit_status>done</deposit_status>
    <deposit_status_detail>The deposit has been successfully loaded into the Software Heritage archive</deposit_status_detail>
    <deposit_swh_id>swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9</deposit_swh_id>
    <deposit_swh_id_context>swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9;origin=https://forge.softwareheritage.org/source/jesuisgpl/</deposit_swh_id_context>
    <deposit_swh_anchor_id>swh:1:rev:e76ea49c9ffbb7f73611087ba6e999b19e5d71eb</deposit_swh_anchor_id>
    <deposit_swh_anchor_id_context>swh:1:rev:e76ea49c9ffbb7f73611087ba6e999b19e5d71eb;origin=https://forge.softwareheritage.org/source/jesuisgpl/</deposit_swh_anchor_id_context>
</entry>

Rejected deposit:

<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:sword="http://purl.org/net/sword/"
       xmlns:dcterms="http://purl.org/dc/terms/">
    <deposit_id>148</deposit_id>
    <deposit_status>rejected</deposit_status>
    <deposit_status_detail>- At least one url field must be compatible with the client&#39;s domain name (codemeta:url)</deposit_status_detail>
</entry>
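
Both kinds of response can be read the same way, since every piece of information is carried by a deposit_* element. The following sketch gathers those elements into a dictionary; it assumes a well-formed response, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def read_deposit_status(xml_text: str) -> dict:
    """Collect every deposit_* child of a status response into a dict."""
    entry = ET.fromstring(xml_text)
    fields = {}
    for child in entry:
        name = child.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix
        if name.startswith("deposit_"):
            fields[name] = (child.text or "").strip()
    return fields
```

A client can then branch on fields["deposit_status"] and, for a rejected deposit, show fields["deposit_status_detail"] to the user.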

Display content

GET /1/<collection-name>/<deposit-id>/content/

Display information on the content’s representation in the SWORD server.

Also known as: CONT-FILE-IRI

Parameters:
  • <name><pass> (text) – the client’s credentials
Status Codes:

Possible errors:

  • common errors:
    • 401 (unauthenticated) if a client does not provide credentials or provides wrong ones
    • 403 (forbidden) if a client tries to access a collection it does not own
    • 404 (not found) if a client tries to access an unknown collection
    • 404 (not found) if a client tries to access an unknown deposit
    • 415 (unsupported media type) if a wrong media type is provided to the endpoint
  • archive/binary deposit:
    • 403 (forbidden) if the length of the archive exceeds the max size configured
    • 412 (precondition failed) if the length or hash provided does not match the archive
    • 415 (unsupported media type) if a wrong media type is provided
  • multipart deposit:
    • 412 (precondition failed) if the md5 hash provided does not match the archive
    • 415 (unsupported media type) if a wrong media type is provided
  • Atom entry deposit:
    • 400 (bad request) if the request’s body is empty (for creation only)
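
The list above can be kept as a small lookup table so that a client can turn a status code into a readable hint. This is only a sketch: the table transcribes the list, while the names and the wording of the hints are illustrative.

```python
# (context, HTTP status code, cause), transcribed from the list above.
POSSIBLE_ERRORS = [
    ("common", 401, "missing or wrong credentials"),
    ("common", 403, "collection not owned by the client"),
    ("common", 404, "unknown collection or unknown deposit"),
    ("common", 415, "wrong media type for the endpoint"),
    ("archive/binary deposit", 403, "archive larger than the configured maximum size"),
    ("archive/binary deposit", 412, "length or hash does not match the archive"),
    ("archive/binary deposit", 415, "wrong media type"),
    ("multipart deposit", 412, "md5 hash does not match the archive"),
    ("multipart deposit", 415, "wrong media type"),
    ("Atom entry deposit", 400, "empty request body (creation only)"),
]


def possible_causes(status_code: int) -> list:
    """Return the documented causes for a given HTTP status code."""
    return [
        f"{context}: {cause}"
        for context, code, cause in POSSIBLE_ERRORS
        if code == status_code
    ]
```

Several causes can share one code, so the function returns a list and leaves it to the caller to pick the one matching the request it sent.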