Software Heritage - Indexer

Tools to compute multiple indexes on SWH’s raw contents.

An indexer is in charge of:

There are multiple indexers working on different object types:

content indexer: works with content sha1 hashes

revision indexer: works with revision sha1 hashes

origin indexer: works with origin identifiers
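To make the notion of a content sha1 hash concrete, here is a minimal sketch using Python's standard hashlib module. It assumes the content identifier is the plain SHA-1 digest of the raw content bytes; the helper name content_sha1 is hypothetical and is not part of swh-indexer.

```python
import hashlib


def content_sha1(raw: bytes) -> str:
    """Return the hex SHA-1 digest of raw content bytes.

    Assumption: the content indexer's key is the plain SHA-1 of the bytes.
    """
    return hashlib.sha1(raw).hexdigest()


if __name__ == "__main__":
    # The SHA-1 of empty input is a well-known constant.
    print(content_sha1(b""))  # da39a3ee5e6b4b0d3255bfef95601890afd80709
```

Revision and origin identifiers are built differently and are not covered by this sketch.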

Indexation procedure:

Current content indexers:

mimetype (queue swh_indexer_content_mimetype): detect the encoding and mimetype
fossology-license (queue swh_indexer_fossology_license): compute the license
metadata: translate files from ecosystem-specific formats to JSON-LD (using the schema.org/CodeMeta vocabulary)
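The metadata translation step can be pictured with a small sketch that turns an npm package.json file into a JSON-LD document. This is an illustration under stated assumptions, not the indexer's actual code: the function name translate_package_json, the key mapping, and the choice of the CodeMeta 2.0 context URL are all assumptions made for this example, and only a handful of string fields are handled.

```python
import json

# Assumption: the CodeMeta 2.0 JSON-LD context URL.
CODEMETA_CONTEXT = "https://doi.org/10.5063/schema/codemeta-2.0"

# Hypothetical, deliberately simplified mapping from package.json keys
# to CodeMeta/schema.org property names.
NPM_TO_CODEMETA = {
    "name": "name",
    "version": "version",
    "description": "description",
    "homepage": "url",
}


def translate_package_json(raw: bytes) -> dict:
    """Translate the raw bytes of a package.json into a JSON-LD dict."""
    package = json.loads(raw)
    document = {"@context": CODEMETA_CONTEXT, "@type": "SoftwareSourceCode"}
    for npm_key, codemeta_key in NPM_TO_CODEMETA.items():
        value = package.get(npm_key)
        # Only copy non-empty string values; unmapped keys are ignored.
        if isinstance(value, str) and value:
            document[codemeta_key] = value
    return document


if __name__ == "__main__":
    sample = b'{"name": "example", "version": "1.0.0", "private": true}'
    print(json.dumps(translate_package_json(sample), indent=2))
```

Keeping the mapping in a plain dictionary makes it easy to see which ecosystem-specific keys are carried over and which are dropped.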

Current origin indexers:

metadata: translate files from ecosystem-specific formats to JSON-LD (using the schema.org/CodeMeta and ForgeFed vocabularies)