swh.fuse.cache module

class swh.fuse.cache.FuseCache(cache_conf: Dict[str, Any])

Bases: object

SwhFS retrieves both metadata and file contents from the Software Heritage archive via the network. To obtain reasonable performance, several caches are used to minimize network transfer.

Caches are stored on disk in SQLite databases located at $XDG_CACHE_HOME/swh/fuse/.

All caches are persistent (i.e., they survive the restart of the SwhFS process) and global (i.e., they are shared by concurrent SwhFS processes).

We assume that no cache invalidation is necessary, due to intrinsic properties of the Software Heritage archive, such as integrity verification and append-only archive changes. To clear the caches, simply remove the corresponding files from disk.
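As a minimal sketch of the cache location and manual clean-up described above, the snippet below resolves the cache directory and deletes the files in it. The helper names swhfs_cache_dir and clear_caches are hypothetical and not part of swh.fuse. The fallback to ~/.cache when XDG_CACHE_HOME is unset follows the XDG Base Directory convention and is an assumption about how SwhFS behaves.

```python
import os
from pathlib import Path
from typing import List


def swhfs_cache_dir() -> Path:
    """Resolve $XDG_CACHE_HOME/swh/fuse/ (hypothetical helper).

    Assumption: when XDG_CACHE_HOME is unset or empty, fall back to
    ~/.cache, as the XDG Base Directory convention suggests.
    """
    base = os.environ.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(base) / "swh" / "fuse"


def clear_caches(cache_dir: Path) -> List[str]:
    """Delete regular files directly under cache_dir and return their names."""
    removed = []
    if cache_dir.is_dir():
        for entry in sorted(cache_dir.iterdir()):
            if entry.is_file():
                entry.unlink()
                removed.append(entry.name)
    return removed
```

Calling clear_caches(swhfs_cache_dir()) while no SwhFS process is running is probably the safest way to use such a helper, since the caches are shared between concurrent processes.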

get_cached_swhids() → AsyncGenerator[swh.model.identifiers.SWHID, None]

Return an asynchronous generator over all previously cached SWHIDs.
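Because the return type is an AsyncGenerator, the result has to be consumed with async iteration rather than indexed like a list. The sketch below illustrates this with a hypothetical stand-in, fake_cached_swhids, which yields plain strings. The real method yields SWHID objects and needs a configured FuseCache instance.

```python
import asyncio
from typing import AsyncGenerator, List


async def fake_cached_swhids() -> AsyncGenerator[str, None]:
    # Hypothetical stand-in for FuseCache.get_cached_swhids();
    # the identifiers below are made up for illustration.
    for swhid in ("swh:1:cnt:" + "0" * 40, "swh:1:dir:" + "1" * 40):
        yield swhid


async def collect(gen: AsyncGenerator[str, None]) -> List[str]:
    # Drain the async generator into a regular list.
    return [item async for item in gen]


swhids = asyncio.run(collect(fake_cached_swhids()))
```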

class swh.fuse.cache.AbstractCache(conf: Dict[str, Any])

Bases: abc.ABC

Abstract cache implementation that shares common behavior between cache types (such as YAML config parsing and the SQLite context manager).

class swh.fuse.cache.MetadataCache(conf: Dict[str, Any])

Bases: swh.fuse.cache.AbstractCache

The metadata cache maps each SWHID to the complete metadata of the referenced object. This is analogous to what is available in the meta/<SWHID>.json files (and is generally used as the data source for returning the content of those files).
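To make the mapping concrete, here is a minimal synchronous sketch of a SWHID-to-metadata store backed by SQLite. ToyMetadataCache, its table name and its JSON serialization are assumptions made for illustration. The real class is asynchronous, and its actual schema is not documented here.

```python
import json
import sqlite3
from typing import Any, Optional


class ToyMetadataCache:
    """Illustrative stand-in, not the swh.fuse implementation."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        # Assumed schema: one row per SWHID, metadata stored as JSON text.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS metadata_cache "
            "(swhid TEXT PRIMARY KEY, metadata TEXT NOT NULL)"
        )

    def set(self, swhid: str, metadata: Any) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO metadata_cache VALUES (?, ?)",
            (swhid, json.dumps(metadata)),
        )
        self.conn.commit()

    def get(self, swhid: str) -> Optional[Any]:
        row = self.conn.execute(
            "SELECT metadata FROM metadata_cache WHERE swhid = ?", (swhid,)
        ).fetchone()
        # A miss returns None so the caller knows to query the archive.
        return json.loads(row[0]) if row else None
```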

async get(swhid: swh.model.identifiers.SWHID, typify: bool = True) → Any

async set(swhid: swh.model.identifiers.SWHID, metadata: Any) → None

class swh.fuse.cache.BlobCache(conf: Dict[str, Any])

Bases: swh.fuse.cache.AbstractCache

The blob cache maps SWHIDs of type cnt to the bytes of their archived content.

The blob cache entry for a given content object is populated, at the latest, the first time the object is read()-d. It might be populated earlier on due to prefetching, e.g., when a directory pointing to the given content is listed for the first time.
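The "populated at the latest on first read" behavior is essentially a read-through cache. The following sketch shows that pattern under stated assumptions: ToyBlobCache, read_content and the fetch callback are hypothetical names, the code is synchronous for brevity, and prefetching is not modeled.

```python
import sqlite3
from typing import Callable, Optional


class ToyBlobCache:
    """Illustrative stand-in, not the swh.fuse implementation."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS blob_cache "
            "(swhid TEXT PRIMARY KEY, blob BLOB NOT NULL)"
        )

    def get(self, swhid: str) -> Optional[bytes]:
        row = self.conn.execute(
            "SELECT blob FROM blob_cache WHERE swhid = ?", (swhid,)
        ).fetchone()
        return bytes(row[0]) if row else None

    def set(self, swhid: str, blob: bytes) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO blob_cache VALUES (?, ?)", (swhid, blob)
        )
        self.conn.commit()


def read_content(
    cache: ToyBlobCache, swhid: str, fetch: Callable[[str], bytes]
) -> bytes:
    """Return the blob, calling fetch (the network) only on a cache miss."""
    blob = cache.get(swhid)
    if blob is None:
        blob = fetch(swhid)
        cache.set(swhid, blob)
    return blob
```

With this shape, a second read of the same content object is served from disk and never reaches the fetch callback.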

async get(swhid: swh.model.identifiers.SWHID) → Optional[bytes]

async set(swhid: swh.model.identifiers.SWHID, blob: bytes) → None

class swh.fuse.cache.HistoryCache(conf: Dict[str, Any])

Bases: swh.fuse.cache.AbstractCache

The history cache maps each SWHID of type rev to the list of rev SWHIDs of all its ancestors, sorted in reverse topological order. Like the parents cache, the history cache is lazily populated and can be prefetched. To store the ancestor lists efficiently, the history cache represents ancestors as graph edges (pairs of SWHID nodes), meaning that the stored edges are shared among the histories of all revisions.
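The edge representation can be illustrated with a small SQLite sketch in which each row is one (revision, parent) edge and ancestors are recovered by following edges transitively. The table name history_graph, the helper functions and the short revision labels are assumptions made for this example. The sketch returns an unordered set and does not reproduce the reverse topological ordering.

```python
import sqlite3
from typing import Iterable, Set, Tuple


def make_history_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Assumed schema: one row per edge; the primary key deduplicates edges.
    conn.execute(
        "CREATE TABLE history_graph ("
        "src TEXT NOT NULL, dst TEXT NOT NULL, PRIMARY KEY (src, dst))"
    )
    return conn


def add_edges(conn: sqlite3.Connection, edges: Iterable[Tuple[str, str]]) -> None:
    # Edges already present are ignored, so shared history is stored once.
    conn.executemany("INSERT OR IGNORE INTO history_graph VALUES (?, ?)", edges)
    conn.commit()


def ancestors(conn: sqlite3.Connection, rev: str) -> Set[str]:
    # Follow edges transitively from rev with a recursive query.
    rows = conn.execute(
        """
        WITH RECURSIVE ancestors(swhid) AS (
            SELECT dst FROM history_graph WHERE src = ?
            UNION
            SELECT g.dst FROM history_graph AS g
            JOIN ancestors AS a ON g.src = a.swhid
        )
        SELECT swhid FROM ancestors
        """,
        (rev,),
    ).fetchall()
    return {row[0] for row in rows}
```

If two revisions C and D both have parent B, and B has parent A, the edge (B, A) is stored once and serves the ancestor queries of both C and D. That sharing is what makes the edge form more compact than one full list per revision.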

async get(swhid: swh.model.identifiers.SWHID) → Optional[List[swh.model.identifiers.SWHID]]

async set(history: str) → None