How to deploy a mirror

This section describes how to deploy a mirror using the software stack provided by Software Heritage.

A mirror deployment consists of running several components of the Software Heritage stack:

An instance of the storage (Software Heritage - Storage);
A backend database (PostgreSQL or Cassandra) for the storage;
An instance of the object storage (Software Heritage - Object storage);
A large storage system (ZFS or cloud storage) as the objstorage backend;
An instance of the frontend (Software Heritage - Web applications);
An instance of the search engine backend (Software Heritage - Search service);
An Elasticsearch instance as the swh-search backend;
The vault service and its support tooling (RabbitMQ, swh-scheduler, Software Heritage - Vault, …);
The replayer services:
- swh.storage.replay service (part of the Software Heritage - Storage package)
- swh.objstorage.replayer.replay service (from the Software Heritage - Object storage replayer package)

Each service consists of an HTTP-based RPC server run by a gunicorn WSGI server.
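Because each service is exposed over HTTP, a simple reachability probe can help confirm that the services of a test deployment are up. The following is a minimal sketch using only the Python standard library; the host names, ports, and the `is_reachable` and `check_services` helpers are hypothetical and are not part of the Software Heritage stack.

```python
import urllib.error
import urllib.request

# Hypothetical service URLs: host names and ports are placeholders,
# not values taken from the Software Heritage documentation.
SERVICES = {
    "storage": "http://storage.example.org:8001/",
    "objstorage": "http://objstorage.example.org:8002/",
}


def is_reachable(url, timeout=5.0):
    """Return True if an HTTP server answers at `url`, whatever the status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status (4xx/5xx), so it is up.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar errors.
        return False


def check_services(services, probe=is_reachable):
    """Map each service name to the result of probing its URL."""
    return {name: probe(url) for name, url in services.items()}
```

Calling `check_services(SERVICES)` returns a dictionary mapping each service name to a boolean. The `probe` parameter makes it possible to substitute another check, for example in tests.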

Note

Deploying each Software Heritage service individually is not recommended. Start instead from the example Docker-based deployment project linked below.

Docker-based deployment

A mirror involves many services to configure and orchestrate. To help bootstrap the configuration of a mirror, a Docker Swarm-based deployment is provided as a working example of the mirror stack:

https://gitlab.softwareheritage.org/swh/infra/swh-mirror

It is strongly recommended to start from there in a test environment before planning a production-like deployment.
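As a starting point, the example project can be fetched into a test environment with git. The sketch below is only illustrative: the `clone_command` and `fetch_example` helpers and the default destination directory are hypothetical, and it assumes that `git` is installed and that the repository can be cloned from the URL given above.

```python
import subprocess

# Repository URL as given in this section.
MIRROR_REPO_URL = "https://gitlab.softwareheritage.org/swh/infra/swh-mirror"


def clone_command(dest, url=MIRROR_REPO_URL):
    """Build the `git clone` argument list for fetching the example project."""
    return ["git", "clone", url, dest]


def fetch_example(dest="swh-mirror"):
    """Clone the example project into `dest` (a hypothetical default name)."""
    # check=True raises subprocess.CalledProcessError if the clone fails.
    subprocess.run(clone_command(dest), check=True)
```

Running `fetch_example()` clones the project into a local `swh-mirror` directory; the deployment steps themselves are described in that project.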
