Dataset

We provide the full graph dataset along with two “teaser” datasets that can be used for trying out smaller-scale experiments before using the full graph.

All the main URLs are relative to our dataset prefix: https://annex.softwareheritage.org/public/dataset/.
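A relative dataset path can be turned into a full download URL by resolving it against this prefix. The sketch below uses Python's standard `urllib.parse.urljoin`. The helper name `dataset_url` and the path `some/relative/path` are hypothetical and are used only for illustration. Actual file paths come from the dataset listings.

```python
from urllib.parse import urljoin

# Dataset prefix as given above. The trailing slash matters: urljoin()
# treats everything after the last "/" as a replaceable file name.
DATASET_PREFIX = "https://annex.softwareheritage.org/public/dataset/"


def dataset_url(relative_path: str) -> str:
    """Resolve a dataset-relative path against DATASET_PREFIX (hypothetical helper).

    Leading slashes are stripped, because urljoin() would otherwise resolve
    "/foo" against the host root and drop the "/public/dataset/" part.
    """
    return urljoin(DATASET_PREFIX, relative_path.lstrip("/"))


# Hypothetical relative path, for illustration only.
print(dataset_url("some/relative/path"))
```

Stripping leading slashes keeps a path such as `/some/relative/path` inside the dataset prefix instead of resolving it against the server root.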

The Software Heritage Graph Dataset contains a table representation of the full Software Heritage Graph. It is available in the following formats:

Teaser datasets

If the full dataset is too big, we also provide the following “teaser” datasets, which have a smaller footprint and can help you get started.