Use a dataset

Download a dataset

Requirements

Most of our datasets are currently hosted in an Amazon S3 bucket, so you will need to install either awscli or swh-datasets to download them.
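A quick way to confirm that awscli is available before going further is sketched below. The helper name `have_tool` is hypothetical, and the suggested `pip` command is only one common way to install awscli; check the awscli and swh-datasets documentation for the installation method that suits your system.

```shell
# Hypothetical helper: succeed if the given command is available on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool aws; then
    echo "awscli found: $(aws --version 2>&1)"
else
    # One common installation route; other package managers also provide awscli.
    echo "awscli not found; you could try: python3 -m pip install awscli"
fi
```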

Find a dataset

All the datasets published by Software Heritage are listed at datasets.softwareheritage.org.

Download the desired dataset

Once you have found the dataset you want, check its “Download” subsection, which gives the command to download it with either awscli or swh-datasets.

Example

For example, to download the compressed graph corresponding to the entire archive as of May 18, 2025, run the command listed in that dataset's “Download” subsection.
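As a rough sketch, a download with awscli could look like the following. The S3 path and the destination directory are assumptions made for illustration, so copy the exact command from the dataset's “Download” subsection instead. The sketch only prints the command unless the hypothetical `RUN_DOWNLOAD` variable is set to 1.

```shell
# ASSUMPTION: the bucket layout below is illustrative; use the path given
# on the dataset's page.
DATASET_DATE="2025-05-18"
S3_URI="s3://softwareheritage/graph/${DATASET_DATE}/compressed/"
DEST_DIR="./${DATASET_DATE}/compressed/"

# Print the command by default; set RUN_DOWNLOAD=1 to actually run it.
if [ "${RUN_DOWNLOAD:-0}" = "1" ]; then
    # --no-sign-request allows anonymous access to a public bucket;
    # --recursive copies every object under the prefix.
    aws s3 cp --no-sign-request --recursive "$S3_URI" "$DEST_DIR"
else
    echo "aws s3 cp --no-sign-request --recursive $S3_URI $DEST_DIR"
fi
```

Printing first makes it easy to compare the command against the one on the dataset page before committing to a multi-terabyte transfer.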

Warning

The dataset used in the example above is 14 TB, so make sure you have enough disk space, time, and bandwidth before you start downloading it.
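The disk-space part of this check can be scripted. The sketch below assumes that 14 TB means 14 × 10^12 bytes and that the download goes to the current directory; the helper name `has_free_space` is hypothetical.

```shell
# Hypothetical helper: succeed if the filesystem holding directory $1
# has at least $2 KiB available.
has_free_space() {
    # df -P prints one header line, then one line per filesystem;
    # column 4 is the available space, reported in KiB because of -k.
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$2" ]
}

# ASSUMPTION: 14 TB = 14 * 10^12 bytes = 13671875000 KiB.
REQUIRED_KB=13671875000

if has_free_space . "$REQUIRED_KB"; then
    echo "Enough free space for the download."
else
    echo "Not enough free space: about 14 TB is needed."
fi
```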

Advanced usage

Once you have downloaded a dataset, refer to the swh-graph and swh-datasets documentation to learn how to use it.