Software Heritage - How Bazaar/Breezy works

In Bazaar, a repository is simply the store of revisions. It’s a storing backend and does not have to carry any semantic purpose for the project(s) it’s holding. What users are really dealing with are branches.

A branch is an ordered series of revisions that describes the history of a set of files. As in Git, it is a pointer to a single revision, its head. A branch corresponds to a folder on the file system and can only have a single head: if two clones of a branch diverge, the only way to reunite them is to merge one into the other. A branch needs a repository to store its revisions, but multiple branches can share the same repository.
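The relationship between repositories and branches can be sketched with a small toy model. This is not the Breezy API: the `Revision`, `Repository` and `Branch` classes below are hypothetical names used only to illustrate that a repository is a plain revision store, that a branch is a pointer to a single head, and that diverged branches are reunited by a merge.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    revision_id: str
    parent_ids: tuple = ()


@dataclass
class Repository:
    """A plain store of revisions with no project-level meaning of its own."""
    revisions: dict = field(default_factory=dict)

    def add(self, revision):
        self.revisions[revision.revision_id] = revision

    def ancestry(self, revision_id):
        """Return every stored revision id reachable through parent links."""
        seen, todo = set(), [revision_id]
        while todo:
            current = todo.pop()
            if current in seen or current not in self.revisions:
                continue
            seen.add(current)
            todo.extend(self.revisions[current].parent_ids)
        return seen


@dataclass
class Branch:
    """A pointer to a single head revision, backed by a repository."""
    repository: Repository
    head: str = None

    def commit(self, revision_id, extra_parents=()):
        parents = ((self.head,) if self.head else ()) + tuple(extra_parents)
        self.repository.add(Revision(revision_id, parents))
        self.head = revision_id

    def merge(self, other, revision_id):
        """Reunite with a diverged branch: the merge revision has both heads as parents."""
        self.commit(revision_id, extra_parents=(other.head,))


shared = Repository()
trunk = Branch(shared)
trunk.commit("rev-1")
feature = Branch(shared, head=trunk.head)  # a clone sharing the same repository
trunk.commit("rev-2a")
feature.commit("rev-2b")  # the two branches have now diverged
trunk.merge(feature, "rev-3")  # merging is the only way to reunite them
```

After the merge, `trunk` points at `rev-3` while `feature` still points at `rev-2b`, and both branches keep their revisions in the same shared store.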

Note: there isn’t a “Breezy” format, just the different Bazaar formats which are supported by Breezy, along with e.g. the Git format.

Bazaar does not have a very strong opinion on how it should be used and supports multiple different workflows, even a centralized one with bound branches. We need to pick the most “workflow-agnostic” way of saving Bazaar repositories… or rather branches.

For our purposes, we will treat each branch as a separate origin, since we have no way of knowing if branches inside a repository are related in bzr terms, and also because we de-duplicate on the SWH side:

  • Users will most likely search by branch. If they search by shared repository instead, their query will be a prefix of a branch, which should also work.

  • Since bzr branches do not have multiple heads, we don’t have to worry about any sort of mapping: we will simply have HEAD.

  • Tags are per-branch, so that also works.

  • Ghost revisions can be saved even if we don’t have the target revision, since the nixguix loader already does this.
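The points above can be sketched as a small planning function. It is a minimal illustration using plain dictionaries, not the actual loader or the SWH data model: the function name `plan_branch_archive`, the `tags/` naming scheme and the revision ids are all assumptions made for the example.

```python
def plan_branch_archive(head, tags, revisions):
    """Plan what to store for one Bazaar branch treated as its own origin.

    head: revision id of the branch tip.
    tags: mapping of tag name -> revision id (tags are per-branch).
    revisions: mapping of revision id -> tuple of parent ids, containing
        only the revisions actually present in the repository.
    """
    # A branch has a single head, so one HEAD entry is enough.
    snapshot = {"HEAD": head}
    for name in sorted(tags):
        snapshot[f"tags/{name}"] = tags[name]  # hypothetical naming scheme

    to_store, ghosts = {}, set()
    pending = list(snapshot.values())
    while pending:
        revision_id = pending.pop()
        if revision_id in to_store or revision_id in ghosts:
            continue
        if revision_id not in revisions:
            # Ghost: referenced but absent. Keep the reference, skip the content.
            ghosts.add(revision_id)
            continue
        to_store[revision_id] = revisions[revision_id]
        pending.extend(revisions[revision_id])
    return snapshot, to_store, ghosts


revisions = {
    "rev-1": (),
    "rev-2": ("rev-1", "ghost-x"),  # "ghost-x" is not in the repository
}
snapshot, to_store, ghosts = plan_branch_archive("rev-2", {"v1.0": "rev-1"}, revisions)
```

Here `rev-2` is stored with its reference to `ghost-x` intact, even though `ghost-x` itself cannot be stored.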

Not resolved yet:

  • Bazaar is able to store empty directories. Does SWH handle them? (T4201)

  • What do we do about multiple authors (they are stored one per line) in each commit? (T3887)
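To make the second question concrete, here is a minimal sketch of what reading such a value could look like. It assumes the authors arrive as a single string with one author per line; `split_authors` is a hypothetical helper, and the sketch only shows the shape of the data without deciding how SWH should represent several authors.

```python
def split_authors(authors_value):
    """Split a line-separated authors string into a list, dropping blank lines."""
    return [line.strip() for line in authors_value.split("\n") if line.strip()]


authors = split_authors("Jane Doe <jane@example.com>\nJohn Roe <john@example.com>\n")
```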