Custom indexers and metadata mappings
Indexers
New indexers can be added by implementing a new indexer class and declaring it
in the swh.indexer.classes entry point group of your package’s pyproject.toml
file, e.g.:
[project.entry-points."swh.indexer.classes"]
"my_indexer" = "swh.mypkg.indexer:MyIndexer"
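To check that a declaration like the one above is visible after installing your package, the entry point group can be listed with the standard library. This is a minimal sketch: the helper name list_registered is hypothetical, and it only reads installed package metadata, so it says nothing about how swh.indexer itself loads these classes.

```python
from importlib.metadata import entry_points
from typing import Dict


def list_registered(group: str) -> Dict[str, str]:
    """Return {entry point name: "module:object"} for one entry point group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points are filtered with select().
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a plain dict keyed by group.
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


if __name__ == "__main__":
    # Prints nothing if no installed package declares an indexer.
    for name, target in sorted(list_registered("swh.indexer.classes").items()):
        print(f"{name} -> {target}")
```

The same helper works for any other group name, such as the metadata mapping group described further down.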
An indexer class should inherit from the
swh.indexer.indexer.BaseIndexer class, and usually more
specifically inherit from either
swh.indexer.indexer.ContentIndexer,
swh.indexer.indexer.OriginIndexer or
swh.indexer.indexer.DirectoryIndexer.
For a metadata indexer, you will probably only need to implement a metadata mapping (see below).
See Metadata workflow for a better view of the metadata handling architecture.
Metadata mappings
Metadata indexers use mappings to convert a given source metadata format to the internal metadata format, JSON-LD with Codemeta and ForgeFed vocabularies.
A metadata mapping is a class inheriting from either
swh.indexer.metadata_mapping.base.BaseExtrinsicMapping or
swh.indexer.metadata_mapping.base.BaseIntrinsicMapping.
Each mapping class should be declared in the swh.indexer.metadata_mappings
entry point group of your package’s pyproject.toml file, e.g.:
[project.entry-points."swh.indexer.metadata_mappings"]
"MyMapping" = "swh.mypkg.mymapping:MyMapping"
Intrinsic mappings
Intrinsic mappings are used by the intrinsic metadata indexers (currently the Origin-Head Indexer, the Directory and Content Metadata Indexers, and the Origin Metadata Indexer). Adding intrinsic metadata mappings allows these indexers to handle more metadata formats.
An intrinsic mapping is a class inheriting from
swh.indexer.metadata_mapping.base.BaseIntrinsicMapping and
implementing at least the following methods:
swh.indexer.metadata_mapping.base.BaseIntrinsicMapping.detect_metadata_file(file_entries): a class method used to filter the files this mapping can handle.
swh.indexer.metadata_mapping.base.BaseIntrinsicMapping.translate(raw_content): the method that does the actual mapping from the original format to the known format (Codemeta).
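The shape of these two methods can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: StandInIntrinsicMapping replaces the real base class so the example runs without swh.indexer installed, the file name mypkg.json and its fields are invented, and the assumption that each file entry is a dict with "name" and "sha1" keys should be checked against the BaseIntrinsicMapping documentation.

```python
import json
from typing import Any, Dict, List, Optional


class StandInIntrinsicMapping:
    """Hypothetical stand-in for BaseIntrinsicMapping (not the real class)."""

    @classmethod
    def detect_metadata_file(cls, file_entries: List[Dict[str, Any]]) -> List[bytes]:
        raise NotImplementedError

    def translate(self, raw_content: bytes) -> Optional[Dict[str, Any]]:
        raise NotImplementedError


class MyMapping(StandInIntrinsicMapping):
    # Invented file name that this example mapping claims to understand.
    filename = b"mypkg.json"

    @classmethod
    def detect_metadata_file(cls, file_entries):
        # Assumption: entries are dicts with a bytes "name" and a "sha1" id.
        # Keep the identifiers of the entries whose name matches.
        return [e["sha1"] for e in file_entries if e["name"].lower() == cls.filename]

    def translate(self, raw_content):
        # Return None when the content cannot be parsed as a JSON object.
        try:
            data = json.loads(raw_content.decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError):
            return None
        if not isinstance(data, dict):
            return None
        doc = {"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}
        # Copy over the fields that share a name with Codemeta terms.
        for key in ("name", "description", "version"):
            if key in data:
                doc[key] = data[key]
        return doc
```

Calling MyMapping.detect_metadata_file on a directory listing selects the candidate files, and translate is then applied to the raw bytes of each selected file.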
Extrinsic mappings
Extrinsic mappings are used to convert extrinsic metadata, like forge project metadata. Adding extrinsic metadata mappings allows the extrinsic metadata indexer to handle more extrinsic metadata formats.
An extrinsic mapping is a class inheriting from
swh.indexer.metadata_mapping.base.BaseExtrinsicMapping
and implementing at least the following methods:
swh.indexer.metadata_mapping.base.BaseExtrinsicMapping.extrinsic_metadata_formats(): a class method returning the extrinsic metadata formats supported by this class.
swh.indexer.metadata_mapping.base.BaseExtrinsicMapping.translate(raw_content): the method that does the actual mapping from the original format to the known format (Codemeta).
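A rough sketch of how the declared formats can be used to select a mapping is given below. Everything specific here is an assumption made for illustration: the class is standalone rather than a real BaseExtrinsicMapping subclass, the format identifier my-forge-project-json and the forge field full_name are invented, the tuple return type is a guess, and pick_mapping is a hypothetical helper rather than part of swh.indexer.

```python
import json
from typing import Any, Dict, Iterable, Optional, Tuple, Type


class MyForgeMapping:
    """Standalone sketch; a real mapping would inherit BaseExtrinsicMapping."""

    @classmethod
    def extrinsic_metadata_formats(cls) -> Tuple[str, ...]:
        # Invented format identifier, used only in this example.
        return ("my-forge-project-json",)

    def translate(self, raw_content: bytes) -> Optional[Dict[str, Any]]:
        # Return None when the content cannot be parsed as a JSON object.
        try:
            data = json.loads(raw_content.decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError):
            return None
        if not isinstance(data, dict):
            return None
        doc = {"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}
        # Invented forge field "full_name" mapped to the "name" term.
        if "full_name" in data:
            doc["name"] = data["full_name"]
        if "description" in data:
            doc["description"] = data["description"]
        return doc


def pick_mapping(metadata_format: str, mappings: Iterable[Type]) -> Optional[Type]:
    """Return the first mapping class that declares support for the format."""
    for mapping in mappings:
        if metadata_format in mapping.extrinsic_metadata_formats():
            return mapping
    return None
```

Declaring formats on the class, rather than on an instance, means a dispatcher of this kind can choose a mapping without instantiating any of them.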