swh.indexer.metadata_dictionary.base module

class swh.indexer.metadata_dictionary.base.BaseMapping(log_suffix='')[source]

Bases: object

Base class for mappings to inherit from

To implement a new mapping:

  • inherit from this class

  • override the translate method
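The two steps above can be sketched as follows. This is only an illustration, not the indexer's code: BaseMappingSketch is a simplified stand-in for BaseMapping so that the example runs without swh.indexer installed, and ToyMapping, its name, the README convention, and the shape of the file entries (dicts with "name" and "sha1" keys) are all assumptions made for the example.

```python
import abc


class BaseMappingSketch(abc.ABC):
    """Simplified stand-in for BaseMapping (an assumption, not the real class)."""

    def __init__(self, log_suffix=""):
        self.log_suffix = log_suffix

    @property
    @abc.abstractmethod
    def name(self):
        """Identifier of the mapping."""

    @classmethod
    @abc.abstractmethod
    def detect_metadata_files(cls, files):
        """Return the sha1s of files that may contain metadata."""

    @abc.abstractmethod
    def translate(self, file_content):
        """Turn raw file content into a metadata dict."""


class ToyMapping(BaseMappingSketch):
    """Hypothetical mapping: uses the first line of a file as a description."""

    # A plain class attribute is enough to satisfy the abstract property.
    name = "toy"

    @classmethod
    def detect_metadata_files(cls, files):
        # Assumes each entry is a dict with "name" and "sha1" keys.
        return [f["sha1"] for f in files if f["name"] == b"README"]

    def translate(self, file_content):
        lines = file_content.decode("utf-8", errors="replace").splitlines()
        first_line = lines[0].strip() if lines else ""
        return {"description": first_line}


mapping = ToyMapping()
metadata = mapping.translate(b"A toy project\nMore text")
```

Instantiating a subclass that leaves name, detect_metadata_files, or translate unimplemented raises TypeError, which is how the abstract markers in this page would surface in practice.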

abstract property name

The name of this mapping, used as an identifier in the indexer storage.

abstract classmethod detect_metadata_files(files)[source]

Detects files potentially containing metadata

Parameters

files (list) – list of files

Returns

list of sha1 (possibly empty)

Return type

list

abstract translate(file_content)[source]

normalize_translation(metadata)[source]

class swh.indexer.metadata_dictionary.base.SingleFileMapping(log_suffix='')[source]

Bases: swh.indexer.metadata_dictionary.base.BaseMapping

Base class for all mappings that use a single file as input.

abstract property filename

The name of the file to extract metadata from.

classmethod detect_metadata_files(file_entries)[source]

Detects files potentially containing metadata

Parameters

file_entries (list) – list of files

Returns

list of sha1 (possibly empty)

Return type

list
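For a single-file mapping, detection amounts to looking for one file name among the entries. The sketch below shows that idea as a standalone function. The entry shape (dicts with "name" and "sha1" keys, both bytes) and the helper name detect_single_file are assumptions for illustration, not part of the documented API.

```python
def detect_single_file(file_entries, filename):
    """Return a one-element list with the sha1 of `filename`, or [] if absent."""
    for entry in file_entries:
        if entry["name"] == filename:
            return [entry["sha1"]]
    return []


# Hypothetical directory listing.
entries = [
    {"name": b"README.md", "sha1": b"\x01" * 20},
    {"name": b"package.json", "sha1": b"\x02" * 20},
]

found = detect_single_file(entries, b"package.json")
missing = detect_single_file(entries, b"pom.xml")
```

Returning a list in both cases matches the documented return type, so callers can iterate over the result without checking for a missing file first.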

class swh.indexer.metadata_dictionary.base.DictMapping(log_suffix='')[source]

Bases: swh.indexer.metadata_dictionary.base.BaseMapping

Base class for mappings that take as input a file that is mostly a key-value store (e.g., a shallow JSON dict).

string_fields = []

List of fields that are simple strings, and don’t need any normalization.

abstract property mapping

A translation dict to map dict keys into a canonical name.

classmethod supported_terms()[source]

class swh.indexer.metadata_dictionary.base.JsonMapping(log_suffix='')[source]

Bases: swh.indexer.metadata_dictionary.base.DictMapping, swh.indexer.metadata_dictionary.base.SingleFileMapping

Base class for all mappings that use a JSON file as input.

translate(raw_content)[source]

Translates content by parsing a bytestring containing JSON data and translating the result with the appropriate mapping.

Parameters

raw_content (bytes) – raw content to translate

Returns

translated metadata in the JSON-friendly form needed by the indexer

Return type

dict
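The parse-then-translate flow described for translate can be sketched with the standard library alone. This is a rough illustration under stated assumptions, not the indexer's implementation: EXAMPLE_MAPPING and translate_json are hypothetical names, the key table is made up, and returning None for undecodable or non-dict input is a choice made for this example.

```python
import json

# Hypothetical translation table from source keys to canonical names.
EXAMPLE_MAPPING = {
    "name": "http://schema.org/name",
    "license": "http://schema.org/license",
}


def translate_json(raw_content, mapping):
    """Parse a JSON bytestring and rename its known keys using `mapping`."""
    try:
        content = json.loads(raw_content.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Assumption: unreadable input yields no metadata.
        return None
    if not isinstance(content, dict):
        return None
    # Keys without an entry in the mapping are dropped.
    return {mapping[key]: value for key, value in content.items() if key in mapping}


translated = translate_json(b'{"name": "demo", "private": true}', EXAMPLE_MAPPING)
```

Here translated keeps only the "name" entry, renamed to its canonical form; the "private" key has no entry in the table and is left out.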