This section complements the Use cases documentation by detailing how deposits are handled internally after clients have submitted them.
For every HTTP request sent by a client, the deposit API checks some simple properties, then creates a swh.deposit.models.DepositRequest object containing the data uploaded by the client verbatim (archive and/or metadata), and inserts it in the database. A swh.deposit.models.Deposit object is also created and inserted, if this is the initial request creating a deposit.
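The per-request handling described above can be sketched as follows. This is a simplified, hypothetical illustration (the function name is made up and in-memory structures stand in for the database), not the actual swh-deposit code:

```python
import itertools

_ids = itertools.count(1)
DEPOSITS = {}   # deposit id -> deposit record (stands in for the database)
REQUESTS = []   # uploaded data, stored verbatim (one entry per client request)

def handle_request(deposit_id=None, archive=None, metadata=None):
    """Record one client request; create the deposit on the initial request."""
    if deposit_id is None:                        # initial request: new deposit
        deposit_id = next(_ids)
        DEPOSITS[deposit_id] = {"status": "partial"}
    REQUESTS.append({"deposit": deposit_id,
                     "archive": archive,
                     "metadata": metadata})
    return deposit_id

dep = handle_request(archive=b"tarball bytes")        # creates the deposit
handle_request(dep, metadata={"title": "example"})    # follow-up request
print(dep, len(REQUESTS))   # -> 1 2
```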
Upon receiving the last request, identified by the lack of the In-Progress: true header, the deposit server either:

- updates the deposit status and schedules a checking task by querying swh-scheduler (for deposits containing an archive), or
- loads the metadata directly (for metadata-only deposits).
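The branching on the final request might look roughly like this. A hedged sketch: the function, the callback names, and the status strings other than those mentioned in this document are assumptions, not the real server code:

```python
def on_request_recorded(headers, deposit, schedule_check, load_metadata):
    """Decide what happens after a client request has been stored.

    Per the SWORD protocol, a request without an "In-Progress: true"
    header is the last one for this deposit.
    """
    if headers.get("In-Progress", "false").lower() == "true":
        return "partial"                 # client will send more requests
    if deposit.get("metadata_only"):
        load_metadata(deposit)           # metadata-only: nothing to check
        return "done"
    deposit["status"] = "deposited"
    schedule_check(deposit)              # regular deposit: verify it first
    return "deposited"

scheduled = []
dep = {"metadata_only": False, "status": "partial"}
print(on_request_recorded({}, dep, scheduled.append, lambda d: None))
# -> deposited
```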
For metadata-only deposits, this is the end of the story. The next section narrates what happens next for “normal” deposits.
As we saw above, the deposit API server's synchronous work ends after scheduling a checking task.
This task is, in essence, another call to the deposit API. This API call performs longer checks, which require inspecting the deposited archive (or archives, for clients depositing archives in multiple steps). This is why it is run by an asynchronous task instead of being performed immediately when the client sends a request.
When it is done, it sets the deposit's status to "verified" (so clients polling for the status know this step succeeded) and schedules a loading task.
Note that the check task is actually just a thin wrapper around an API call. While the checks could be done in the task itself, that would mean sending all archives from the deposit API to the Celery worker, which would be inefficient. And the gains would not be great, as checking tasks only need to decompress archives, which is not resource intensive. Instead, this long-running call to the API proved to be a simpler and more efficient solution at the current scale of the deposit.
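Clients polling for the deposit status, as mentioned above, can be modelled with a small loop. The fetch callback and the terminal status names other than "verified" are assumptions here, not the real deposit client API:

```python
import itertools
import time

TERMINAL = {"verified", "done", "failed"}   # assumed terminal states

def poll_status(fetch, deposit_id, interval=0.0):
    """Call fetch(deposit_id) until the deposit reaches a terminal status."""
    while True:
        status = fetch(deposit_id)
        if status in TERMINAL:
            return status
        time.sleep(interval)

# Stubbed responses standing in for the status endpoint:
replies = itertools.chain(["deposited", "deposited"], itertools.repeat("verified"))
print(poll_status(lambda _id: next(replies), 42))   # -> verified
```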
When the check task finishes, it schedules a load task. This loader is part of the swh.loader.package package rather than of swh.deposit, because its design is close to other package loaders:

1. fetch a tarball
2. use swh.model.from_disk to build SWH objects from it
3. load these objects in swh-storage
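The three steps above can be sketched as a small pipeline. The function names are hypothetical, and the stubs stand in for the deposit server, swh.model.from_disk, and swh-storage respectively:

```python
def load_deposit(fetch_tarball, build_objects, store):
    """Sketch of the package-loader flow: fetch, build, store."""
    tarball = fetch_tarball()          # 1. fetch the tarball
    objects = build_objects(tarball)   # 2. build SWH objects from it
    store(objects)                     # 3. load them into storage
    return objects

stored = []
objects = load_deposit(
    fetch_tarball=lambda: b"tarball bytes",                   # deposit server stub
    build_objects=lambda tb: [{"type": "content", "data": tb}],
    store=stored.extend,                                      # swh-storage stub
)
print(len(stored))   # -> 1
```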
The only difference in this process is that the tarball is fetched from the deposit server instead of from external repositories.
This tarball is returned by a deposit API endpoint, which creates it by aggregating all archives sent by the client (usually only one, but the SWORD protocol allows more).
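Aggregating several uploaded archives into a single tarball can be done along these lines (an illustrative in-memory version using the standard library, not the server's actual code):

```python
import io
import tarfile

def aggregate_archives(archives):
    """Bundle client-uploaded archives (name -> bytes) into one .tar.gz."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in sorted(archives.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

bundle = aggregate_archives({"first.zip": b"aaaa", "second.zip": b"bbbb"})
with tarfile.open(fileobj=io.BytesIO(bundle), mode="r:gz") as tar:
    print(tar.getnames())   # -> ['first.zip', 'second.zip']
```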
Finally, when loading is done, the loader updates the deposit status via the deposit API.