# Software Heritage Filesystem (SwhFS) — Design notes

The Software Heritage data model is a Directed Acyclic Graph (DAG) with nodes of different types that correspond to source code artifacts such as directories, commits, etc. Using this FUSE module (SwhFS for short) you can locally mount, and then navigate as a filesystem, parts of the archive identified by Software Heritage identifiers (SWHIDs).

To retrieve information about the source code artifacts, SwhFS interacts over the network with the Software Heritage archive via its Web API.

## Architecture

SwhFS in context (C4 context diagram):

Main components of SwhFS (C4 container diagram):

## Command-line interface

Todo

### Blob cache

cnt SWHID → bytes


The blob cache maps SWHIDs of type cnt to the bytes of their archived content.

In general, each SWHID that has an entry in the blob cache also has a matching entry in the metadata cache for other blob attributes (e.g., checksums and size).

The blob cache entry for a given content object is populated, at the latest, the first time the object is open()-d. It might be populated earlier on due to prefetching, e.g., when a directory pointing to the given content is listed for the first time.
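The population rule above (fill on open() at the latest, possibly earlier via prefetching) can be sketched as follows. This is a minimal illustration, not the SwhFS implementation: the `BlobCache` class, its method names, the in-memory dict storage and the `fetch` callable standing in for a Web API request are all assumptions made for the example, and the SWHID used is a placeholder.

```python
from typing import Callable, Dict, List


class BlobCache:
    """Hypothetical blob cache: cnt SWHID -> bytes (in-memory for simplicity)."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        # `fetch` is an assumed stand-in for retrieving content over the network.
        self._fetch = fetch
        self._blobs: Dict[str, bytes] = {}

    def prefetch(self, swhid: str) -> None:
        """Populate the entry early, e.g. while listing a parent directory."""
        if not swhid.startswith("swh:1:cnt:"):
            raise ValueError(f"not a content SWHID: {swhid}")
        if swhid not in self._blobs:
            self._blobs[swhid] = self._fetch(swhid)

    def open(self, swhid: str) -> bytes:
        """Return the bytes, populating the entry now if it is still missing."""
        self.prefetch(swhid)
        return self._blobs[swhid]


# Usage with a fake backend that records how often it is hit.
calls: List[str] = []


def fake_fetch(swhid: str) -> bytes:
    calls.append(swhid)
    return b"hello\n"


blob_cache = BlobCache(fake_fetch)
placeholder = "swh:1:cnt:" + "0" * 40  # placeholder identifier, not a real object
blob_cache.prefetch(placeholder)       # e.g. triggered by a directory listing
data = blob_cache.open(placeholder)    # served from cache, no second fetch
```

In this sketch a prefetch followed by an open() reaches the backend only once, which is the behavior the paragraph above describes.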

### Direntry cache

dir inode → directory entries


The direntry cache maps inodes representing directories to the entries they contain. Each entry comes with its name as well as file attributes (i.e., all that is needed to perform a detailed directory listing).

Additional attributes of each directory entry should be looked up on an entry-by-entry basis, possibly hitting the metadata cache.

The direntry cache for a given dir is populated, at the latest, when the content of the directory is listed. More aggressive prefetching might happen. For instance, when first opening a dir, a recursive listing of it can be retrieved from the remote backend and used to recursively populate the direntry cache for all (transitive) sub-directories.

Cache location: in-memory.
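
As a rough illustration of the recursive population described above, the sketch below fills an in-memory direntry cache from a nested listing. All names here (`DirEntry`, `DirentryCache`, `populate`) are hypothetical, the nested dict is an assumed stand-in for whatever the remote backend returns, and inode allocation through a simple counter is a simplification.

```python
import itertools
import stat
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional


@dataclass(frozen=True)
class DirEntry:
    name: str
    inode: int
    mode: int  # file type bits: stat.S_IFDIR or stat.S_IFREG


class DirentryCache:
    """Hypothetical in-memory cache: dir inode -> directory entries."""

    def __init__(self) -> None:
        self._entries: Dict[int, List[DirEntry]] = {}

    def get(self, dir_inode: int) -> Optional[List[DirEntry]]:
        """Return the cached entries, or None if the dir was never listed."""
        return self._entries.get(dir_inode)

    def populate(self, dir_inode: int, tree: dict, inodes: Iterator[int]) -> None:
        """Store entries for dir_inode and recurse into all sub-directories.

        `tree` maps entry names to either a nested dict (sub-directory)
        or None (regular file).
        """
        entries = []
        for name, subtree in tree.items():
            inode = next(inodes)
            is_dir = isinstance(subtree, dict)
            mode = stat.S_IFDIR if is_dir else stat.S_IFREG
            entries.append(DirEntry(name, inode, mode))
            if is_dir:
                self.populate(inode, subtree, inodes)
        self._entries[dir_inode] = entries


# Usage: one recursive listing populates the root and every sub-directory.
direntry_cache = DirentryCache()
listing = {"README": None, "src": {"main.c": None}}
direntry_cache.populate(1, listing, itertools.count(2))
root_entries = direntry_cache.get(1)
```

After the single `populate` call, listing `src` needs no further backend request, since its entries are already cached under the inode assigned to it.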