swh.indexer.metadata_dictionary.codemeta module

class swh.indexer.metadata_dictionary.codemeta.CodemetaMapping(log_suffix='')

Bases: SingleFileIntrinsicMapping

Dedicated class for mapping and translating CodeMeta (codemeta.json) files.

name = 'codemeta'
filename: bytes | Pattern[bytes] = b'codemeta.json'
string_fields = None
classmethod supported_terms() -> List[str]
translate(content: bytes) -> Dict[str, Any] | None

Translates content by parsing it from a bytestring containing mapping-specific data, then translating it with the appropriate mapping to JSON-LD using the Codemeta and ForgeFed vocabularies.

Parameters:

content – raw content to translate

Returns:

Translated metadata in JSON-friendly form if the content is parseable, None otherwise.
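
The "None if not parseable" contract can be illustrated with a minimal, standalone sketch. It does not reproduce the translation to JSON-LD and is not the code behind CodemetaMapping.translate; the helper name parse_json_or_none is hypothetical, and treating non-object JSON as unparseable is an assumption made for the example.

```python
import json
from typing import Any, Dict, Optional


def parse_json_or_none(content: bytes) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: parse a JSON object from bytes, or return None."""
    try:
        parsed = json.loads(content.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Undecodable or malformed input is reported as None, not raised.
        return None
    # Assumption: only a JSON object can carry metadata properties.
    return parsed if isinstance(parsed, dict) else None
```

A caller can then distinguish "no usable metadata" from an empty result by testing the return value against None.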

class swh.indexer.metadata_dictionary.codemeta.SwordCodemetaMapping(log_suffix='')

Bases: BaseExtrinsicMapping

Dedicated class for mapping and translating JSON-LD statements embedded in SWORD documents, optionally using Codemeta contexts, as described in the Protocol reference.

name = 'sword-codemeta'
classmethod extrinsic_metadata_formats() -> Tuple[str, ...]

Returns the list of extrinsic metadata formats which can be translated by this mapping

classmethod supported_terms() -> List[str]
xml_to_jsonld(e: Element) -> str | Dict[str, Any]
translate(content: bytes) -> Dict[str, Any] | None

Translates content by parsing it from a bytestring containing mapping-specific data, then translating it with the appropriate mapping to JSON-LD using the Codemeta and ForgeFed vocabularies.

Parameters:

content – raw content to translate

Returns:

Translated metadata in JSON-friendly form if the content is parseable, None otherwise.

normalize_translation(metadata: Dict[str, Any]) -> Dict[str, Any]
swh.indexer.metadata_dictionary.codemeta.iter_keys(d)

Recursively iterates over dictionary keys.
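
A recursive key walk of this kind can be sketched as follows. This is an illustration only, not the source of iter_keys: the name iter_keys_sketch is hypothetical, and descending into lists is an assumption that the real function may or may not share.

```python
from typing import Any, Iterator


def iter_keys_sketch(d: Any) -> Iterator[Any]:
    """Yield every key found in d, descending into nested dicts and lists."""
    if isinstance(d, dict):
        for key, value in d.items():
            yield key
            # Keys of nested structures follow their parent key.
            yield from iter_keys_sketch(value)
    elif isinstance(d, list):
        # Assumption: lists are traversed so dicts inside them are visited.
        for item in d:
            yield from iter_keys_sketch(item)
    # Scalars (str, int, None, ...) carry no keys, so nothing is yielded.
```

Because it is a generator, the walk is lazy; wrap the call in list() or set() when all keys are needed at once.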

class swh.indexer.metadata_dictionary.codemeta.JsonSwordCodemetaMapping(log_suffix='')

Bases: SwordCodemetaMapping

Variant of SwordCodemetaMapping that reads the legacy sword-v2-atom-codemeta-v2-in-json format and converts it back to sword-v2-atom-codemeta-v2 XML

name = 'json-sword-codemeta'
classmethod extrinsic_metadata_formats() -> Tuple[str, ...]

Returns the list of extrinsic metadata formats which can be translated by this mapping

translate(content: bytes) -> Dict[str, Any] | None

Translates content by parsing it from a bytestring containing mapping-specific data, then translating it with the appropriate mapping to JSON-LD using the Codemeta and ForgeFed vocabularies.

Parameters:

content – raw content to translate

Returns:

Translated metadata in JSON-friendly form if the content is parseable, None otherwise.
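
To illustrate how the tuples returned by extrinsic_metadata_formats() could be used to pick a mapping for a given format, here is a small standalone sketch. The Sketch* classes and index_by_format are hypothetical stand-ins that do not import swh.indexer, and the format tuples they return are assumptions based on the format names mentioned above, not the values the real classes return.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple, Type


class SketchMapping:
    """Hypothetical stand-in for a mapping class; not part of swh.indexer."""

    name = "sketch"

    @classmethod
    def extrinsic_metadata_formats(cls) -> Tuple[str, ...]:
        return ()


class SketchSword(SketchMapping):
    name = "sword-codemeta"

    @classmethod
    def extrinsic_metadata_formats(cls) -> Tuple[str, ...]:
        # Assumed value, chosen for illustration.
        return ("sword-v2-atom-codemeta-v2",)


class SketchJsonSword(SketchSword):
    name = "json-sword-codemeta"

    @classmethod
    def extrinsic_metadata_formats(cls) -> Tuple[str, ...]:
        # Assumed value, chosen for illustration.
        return ("sword-v2-atom-codemeta-v2-in-json",)


def index_by_format(
    mappings: Iterable[Type[SketchMapping]],
) -> Dict[str, List[Type[SketchMapping]]]:
    """Group mapping classes by each format they declare."""
    by_format = defaultdict(list)
    for mapping in mappings:
        for fmt in mapping.extrinsic_metadata_formats():
            by_format[fmt].append(mapping)
    return dict(by_format)


FORMAT_INDEX = index_by_format([SketchSword, SketchJsonSword])
```

Looking up a format name in FORMAT_INDEX then yields the candidate mapping classes, and an unknown format is simply absent from the index.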

swh.indexer.metadata_dictionary.codemeta.load_and_compact_notification(content: bytes | str) -> dict[str, Any] | None

Load and compact a notification from the REMS.

Error logs are written if something goes wrong in the process.

Parameters:

content – the expanded COAR Notification

Returns:

The compacted form of the COAR Notification or None if we weren’t able to read it

swh.indexer.metadata_dictionary.codemeta.validate_mention(notification: dict[str, Any]) -> bool

Validate a notification’s minimal requirements before indexing.

Parameters:

notification – a compact form of a COAR Notification

Returns:

False if we can’t find the required properties in the notification
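
A required-property check of this shape can be sketched as below. The set of required properties is not listed in this reference, so REQUIRED_PROPS is an assumption chosen for illustration, and has_required_props is a hypothetical name rather than the implementation of validate_mention.

```python
from typing import Any, Dict

# Assumed for illustration only: the properties the real function requires
# are not documented here.
REQUIRED_PROPS = ("id", "type", "context")


def has_required_props(notification: Dict[str, Any]) -> bool:
    """Return False as soon as one assumed required property is missing."""
    for prop in REQUIRED_PROPS:
        # A missing key and an empty value are both treated as absent.
        if not notification.get(prop):
            return False
    return True
```

Running such a check before indexing keeps malformed notifications out of the translation step.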

class swh.indexer.metadata_dictionary.codemeta.CoarNotifyMentionMapping(log_suffix='')

Bases: BaseExtrinsicMapping

Map and translate a COAR Notify software mention into CodeMeta format.

COAR Notify mentions are received by swh-coarnotify and saved in expanded form. A mention contains metadata about a scientific paper that cites a piece of software.

name = 'coarnotify-mention-codemeta'
classmethod supported_terms() -> list[str]
classmethod extrinsic_metadata_formats() -> tuple[str, ...]

Returns the list of extrinsic metadata formats which can be translated by this mapping

translate(content: bytes) -> dict[str, Any] | None

Parse JSON and compact the payload to access the mention.

The whole context of the AnnounceRelationship notification will be indexed as it contains metadata about the scientific paper citing the software.

TODO: At some point we might need to fetch metadata from the paper URL as COAR Notifications are not made to contain all the metadata but to indicate where we should find them.

TODO: We will need to handle cancellation of a mention that was made by mistake. Maybe we could use the original notification id and an empty context to overwrite the previous citation when merging documents? It is with this in mind that the notification ID is added to the citation.

Parameters:

content – the raw expanded COAR Notification

Returns:

A CodeMeta citation if the notification was valid or None