Software Heritage Datasets

This page lists the different public datasets and periodic data dumps of the archive produced and released by Software Heritage.

The Software Heritage Graph Dataset

The entire graph of Software Heritage, in a fully deduplicated Merkle DAG representation.

Contents:

  • Software Heritage Graph Dataset
    • Dataset
    • Relational schema
    • Setup on Amazon Athena
    • Setup on Azure Databricks
  • Exporting a dataset
  • Exporting a subdataset

© Copyright 2015-2023 The Software Heritage developers.
