Software Heritage - Storage
Abstraction layer over the archive, providing access to all stored source code artifacts as well as their metadata.
The Software Heritage storage consists of a high-level storage layer
(swh.storage) that exposes a client/server API
(swh.storage.api). The API is exposed by a server
(swh.storage.api.server) and accessible via a client.
The low-level implementation of the storage is split between an object storage
(swh.objstorage), which stores all “blobs” (i.e., the
leaves of the Data model) and a SQL representation of the rest of the
First, note that
swh-storage is an internal API of Software Heritage. It
is only available to software running on the SWH infrastructure and to developers
running their own Software Heritage instance.
If you want to access the Software Heritage archive without running your own,
you should use the Web API instead.
swh-storage has multiple backends. It is instantiated via the
swh.storage.get_storage() function, which takes the backend type as its argument
(e.g., remote, if you already have access to a running swh-storage).
It returns an instance of a class implementing
swh.storage.interface.StorageInterface, which is mostly a set of key-value
stores, one for each object type.
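To make the "set of key-value stores, one for each object type" idea concrete, here is a minimal toy sketch. `ToyStorage`, its `add` and `get` methods, and the dict-based objects are all hypothetical names chosen for illustration; they are not the real StorageInterface methods, which are more numerous and take model objects.

```python
from typing import Dict, Iterable, List, Optional


class ToyStorage:
    """Hypothetical stand-in: one dict per object type, keyed by object id."""

    def __init__(self) -> None:
        self._stores: Dict[str, Dict[bytes, dict]] = {
            "content": {},
            "directory": {},
            "revision": {},
        }

    def add(self, object_type: str, objects: Iterable[dict]) -> int:
        """Insert objects, skipping ids already present; return how many were new."""
        store = self._stores[object_type]
        added = 0
        for obj in objects:
            if obj["id"] not in store:
                store[obj["id"]] = obj
                added += 1
        return added

    def get(self, object_type: str, ids: Iterable[bytes]) -> List[Optional[dict]]:
        """Return one entry per requested id, None where the id is unknown."""
        store = self._stores[object_type]
        return [store.get(object_id) for object_id in ids]


toy = ToyStorage()
added = toy.add("revision", [{"id": b"\x01", "message": b"initial commit"}])
found = toy.get("revision", [b"\x01", b"\x02"])
print(added, found)
```

In this sketch, adding an object whose id is already stored is a no-op. That is a design choice of the toy, not a statement about how any particular swh-storage backend behaves.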
Many of the arguments and return types are “model objects”, i.e., immutable objects
that are instances of the classes defined in
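As a rough illustration of what "immutable" means for such objects, the sketch below uses a frozen dataclass. `ToyOrigin` is a hypothetical class invented for this example; the real model classes may be implemented differently and carry more fields.

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyOrigin:
    """Hypothetical immutable model object, not a class from the real data model."""

    url: str


origin = ToyOrigin(url="https://github.com/torvalds/linux")

try:
    origin.url = "https://example.org/other"  # attribute assignment is rejected
    mutated = True
except dataclasses.FrozenInstanceError:
    mutated = False

# To "change" a field, build a new object and leave the original untouched.
other = dataclasses.replace(origin, url="https://example.org/other")
print(mutated, origin.url, other.url)
```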
Methods returning long lists of results are paginated: they return both a list
of results and an opaque token to get the next page of results.
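For example, to list all the visits of an origin using
origin_visit_get, ten visits at a time, you can do:

```python
from swh.storage import get_storage

storage = get_storage("remote", url="http://localhost:5002")

page_token = None  # None requests the first page
while True:
    page = storage.origin_visit_get(
        origin="https://github.com/torvalds/linux",
        page_token=page_token,
        limit=10,
    )
    for visit in page.results:
        print(visit)
    # Pass the opaque token back to fetch the following page.
    page_token = page.next_page_token
    if page_token is None:
        break
```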
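Or use swh.core.api.classes.stream_results() for convenience:

```python
from swh.core.api.classes import stream_results
from swh.storage import get_storage

storage = get_storage("remote", url="http://localhost:5002")

# stream_results follows the page tokens and yields results one by one.
visits = stream_results(
    storage.origin_visit_get, origin="https://github.com/torvalds/linux"
)
for visit in visits:
    print(visit)
```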
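To show what such a convenience helper has to do, the following self-contained sketch implements the token-following loop as a generator and runs it against a fake paginated endpoint. `ToyPage`, `toy_stream_results`, `fake_visit_get`, and the integer-as-string token are all assumptions made for illustration; this is not the swh.core implementation, and real page tokens should be treated as opaque.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterator, List, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ToyPage(Generic[T]):
    """Hypothetical page: a list of results plus a token for the next page."""

    results: List[T]
    next_page_token: Optional[str] = None


def toy_stream_results(
    fetch_page: Callable[..., ToyPage[T]], **kwargs
) -> Iterator[T]:
    """Yield every result by repeatedly calling fetch_page with the latest token."""
    page_token: Optional[str] = None
    while True:
        page = fetch_page(page_token=page_token, **kwargs)
        yield from page.results
        page_token = page.next_page_token
        if page_token is None:
            break


# Fake data and endpoint, standing in for a paginated storage method.
VISITS = [f"visit-{n}" for n in range(1, 6)]


def fake_visit_get(
    origin: str, page_token: Optional[str] = None, limit: int = 2
) -> ToyPage[str]:
    # In this toy only, the token encodes the offset of the next page.
    start = int(page_token) if page_token is not None else 0
    end = start + limit
    next_token = str(end) if end < len(VISITS) else None
    return ToyPage(results=VISITS[start:end], next_page_token=next_token)


all_visits = list(
    toy_stream_results(fake_visit_get, origin="https://example.org/repo")
)
print(all_visits)
```

Because the helper is a generator, pages are fetched lazily: a caller that stops iterating early never requests the remaining pages.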