Custom indexers and metadata mappings

Indexers

To add a new indexer, implement an indexer class and declare it in the swh.indexer.classes entry point group of your package’s pyproject.toml file, e.g.:

[project.entry-points."swh.indexer.classes"]
"my_indexer" = "swh.mypkg.indexer:MyIndexer"

An indexer class should inherit from the swh.indexer.indexer.BaseIndexer class, usually through one of its more specific variants: swh.indexer.indexer.ContentIndexer, swh.indexer.indexer.OriginIndexer or swh.indexer.indexer.DirectoryIndexer.

For a metadata indexer, you will probably only need to implement a metadata mapping (see below).

See Metadata workflow for a better view of the metadata handling architecture.

Metadata mappings

Metadata indexers use mappings to convert a given source metadata format to the internal metadata format, which is JSON-LD with the Codemeta and ForgeFed vocabularies.

A metadata mapping is a class inheriting from either swh.indexer.metadata_mapping.base.BaseExtrinsicMapping or swh.indexer.metadata_mapping.base.BaseIntrinsicMapping.

Each mapping class should be declared in the swh.indexer.metadata_mappings entry point group of the package’s pyproject.toml file, e.g.:

[project.entry-points."swh.indexer.metadata_mappings"]
"MyMapping" = "swh.mypkg.mymapping:MyMapping"

Intrinsic mappings

Intrinsic mappings are used by an intrinsic metadata indexer (currently the Origin-Head Indexer, the Directory and Content Metadata Indexers or the Origin Metadata Indexer). Adding intrinsic metadata mappings allows these indexers to handle more metadata formats.

An intrinsic mapping is a class inheriting from swh.indexer.metadata_mapping.base.BaseIntrinsicMapping and implementing at least the following methods:

  • swh.indexer.metadata_mapping.base.BaseIntrinsicMapping.detect_metadata_file(file_entries): a class method used to filter the files this mapping can handle.

  • swh.indexer.metadata_mapping.base.BaseIntrinsicMapping.translate(raw_content): the method that does the actual mapping from the original format to the known format (Codemeta).
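The two methods above can be sketched as follows. This is an illustrative, standalone class under stated assumptions, not code from swh.indexer: it does not inherit from BaseIntrinsicMapping so that it runs without the package installed, the project.json format and the ProjectJsonMapping name are hypothetical, and the shape of file_entries (dicts with a bytes "name" and a "sha1" identifier) as well as the return types are assumptions to check against the real base class.

```python
import json
from typing import Any, Dict, List, Optional

# JSON-LD context of the Codemeta 2.0 vocabulary.
CODEMETA_CONTEXT = "https://doi.org/10.5063/schema/codemeta-2.0"


class ProjectJsonMapping:
    """Hypothetical mapping for a made-up "project.json" metadata file.

    In a real package this class would inherit from BaseIntrinsicMapping;
    it is kept standalone here so the sketch is self-contained.
    """

    filename = b"project.json"

    @classmethod
    def detect_metadata_file(cls, file_entries: List[Dict[str, Any]]) -> List[bytes]:
        # Assumed entry shape: {"name": bytes, "sha1": bytes}.
        # Keep only the identifiers of files this mapping can handle.
        return [
            entry["sha1"]
            for entry in file_entries
            if entry["name"].lower() == cls.filename
        ]

    def translate(self, raw_content: bytes) -> Optional[Dict[str, Any]]:
        # Parse the source format; give up quietly on malformed input.
        try:
            data = json.loads(raw_content.decode("utf-8"))
        except ValueError:  # covers UnicodeDecodeError and JSONDecodeError
            return None
        if not isinstance(data, dict):
            return None

        # Copy over the string fields that have a same-named Codemeta term.
        doc: Dict[str, Any] = {
            "@context": CODEMETA_CONTEXT,
            "@type": "SoftwareSourceCode",
        }
        for key in ("name", "description", "version"):
            value = data.get(key)
            if isinstance(value, str):
                doc[key] = value
        return doc
```

Keeping detection as a class method means files can be filtered without instantiating the mapping, while translate only ever sees the raw bytes of a file that was already selected.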

Extrinsic mappings

Extrinsic mappings are used to convert extrinsic metadata, such as forge project metadata. Adding extrinsic metadata mappings allows the extrinsic metadata indexer to handle more extrinsic metadata formats.

An extrinsic mapping is a class inheriting from swh.indexer.metadata_mapping.base.BaseExtrinsicMapping and implementing at least the following methods: