From an end-user point of view, the Software Heritage platform consists of the archive, which can be accessed using the web interface or its REST API. Behind the scenes (and the web app) are several components that expose different aspects of the Software Heritage archive as internal REST APIs.
Each of these internal APIs has a dedicated (PostgreSQL) database.
A global view of this architecture looks like:
The front API components are:
The main components involved in this choreography are:
Listers: a lister is a type of task that scrapes a website, a forge, etc. to gather all the source code repositories it can find. For each source code repository found, a loader task is created.
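The lister behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual Software Heritage code: `LoaderTask`, `InMemoryScheduler`, `list_forge`, the `"load-git"` task type and the example URLs are all hypothetical names chosen for this sketch, and the scheduler is a plain in-memory list rather than a real service.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass(frozen=True)
class LoaderTask:
    """Hypothetical record describing one repository to be loaded."""
    task_type: str   # kind of loader to run (assumed name, e.g. "load-git")
    origin_url: str  # URL of the repository to import


@dataclass
class InMemoryScheduler:
    """Stand-in for a task scheduler; it only stores the tasks it is given."""
    tasks: List[LoaderTask] = field(default_factory=list)

    def create_task(self, task: LoaderTask) -> None:
        self.tasks.append(task)


def list_forge(repository_urls: Iterable[str], scheduler: InMemoryScheduler) -> int:
    """Create one loader task per repository found; return how many were created."""
    count = 0
    for url in repository_urls:
        scheduler.create_task(LoaderTask(task_type="load-git", origin_url=url))
        count += 1
    return count


# Repository URLs a lister might have discovered on a forge (made-up values).
found = [
    "https://gitlab.example.org/alice/project-a.git",
    "https://gitlab.example.org/bob/project-b.git",
]
scheduler = InMemoryScheduler()
created = list_forge(found, scheduler)
```

The point of the sketch is the one-to-one mapping: each repository the lister finds yields exactly one loader task, and the lister itself never imports any repository content.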
The following sequence diagram shows the interactions between these components when a new forge needs to be archived. This example depicts the case of a GitLab forge, but any other supported source type would be very similar.
As one might observe in this diagram, the lister creates two things:
it inserts one loader task for each source code repository; that task will be in charge of importing the content of that repository.
The sequence diagram below describes this second step of importing the content of a repository. Once again, we take the example of a git repository, but any other type of repository would be very similar.