The sparse-deposit


A client wishes to transfer a tarball for which part of the content is already in the SWH archive.


To do so, a list of paths with targets must be provided in the metadata and the paths to the missing directories/content should not be included in the tarball. The list will be referred to as the manifest list using the entry name ‘bindings’ in the metadata.

path swh-id
path/to/file.txt swh:1:cnt:aaaaaaaaaaaaaaaaaaaaa…
path/to/dir/ swh:1:dir:aaaaaaaaaaaaaaaaaaaaa…

Note: the name of the file or the directory is given by the path and is not part of the identified object.

TODO: see if a trailing “/” is mandatory for implementation.

A concrete example

The manifest list is included in the metadata xml atomEntry under the swh namespace:

TODO: publish schema at

<?xml version="1.0"?>
  <entry xmlns=""
        <name>HAL</name>it mandatory to have a trailing "/",
      <codemeta:name>The assignment problem</codemeta:name>
      <codemeta:identifier>other identifier, DOI, ARK</codemeta:identifier>
          <codemeta:name> author1 </codemeta:name>
          <codemeta:affiliation> Inria </codemeta:affiliation>
          <codemeta:affiliation> UPMC </codemeta:affiliation>
          <codemeta:name> author2 </codemeta:name>
          <codemeta:affiliation> Inria </codemeta:affiliation>
          <codemeta:affiliation> UPMC </codemeta:affiliation>
        <swh:binding source="path/to/file.txt" destination="swh:1:cnt:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"/>
        <swh:binding source="path/to/second_file.txt destination="swh:1:cnt:bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"/>
        <swh:binding source="path/to/dir/destination="swh:1:dir:ddddddddddddddddddddddddddddddddd"/>


Deposit verification

After checking the integrity of the deposit content and metadata, the following checks should be added:

  1. validate the manifest list structure with a correct swh-id for each path (syntax check on the swh-id format)
  2. verify that the path name corresponds to the object type
  3. locate the identifiers in the SWH archive

Each failing check should return a different error with the deposit and result in a ‘rejected’ deposit.

Loading procedure

The injection procedure should include:

  • load the tarball new data
  • create new objects using the path name and create links from the path to the SWH object using the identifier
  • calculate identifier of the new objects at each level
  • return final swh-id of the new revision

Invariant: the same content should yield the same swh-id, that’s why a complete deposit with all the content and a sparse-deposit with the correct links will result with the same root directory swh-id. The same is expected with the revision swh-id if the metadata provided is identical.