swh.indexer.codemeta module

swh.indexer.codemeta.make_absolute_uri(local_name)

Looks up local_name in codemeta.jsonld and returns the @id (absolute URI) of the term it defines.

>>> make_absolute_uri("name")
'http://schema.org/name'
>>> make_absolute_uri("downloadUrl")
'http://schema.org/downloadUrl'
>>> make_absolute_uri("referencePublication")
'https://codemeta.github.io/terms/referencePublication'
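
The lookup above can be pictured as resolving a term against a JSON-LD context. The following is a minimal sketch, not the module's implementation: `TOY_CONTEXT` is a small hand-written context (the real codemeta.jsonld is much larger) and `resolve_term` is a hypothetical helper that only handles `prefix:suffix` compact IRIs.

```python
from typing import Any, Dict

# Hand-written toy context (an assumption for illustration only).
TOY_CONTEXT: Dict[str, Any] = {
    "schema": "http://schema.org/",
    "codemeta": "https://codemeta.github.io/terms/",
    "name": {"@id": "schema:name"},
    "downloadUrl": {"@id": "schema:downloadUrl", "@type": "@id"},
    "referencePublication": {"@id": "codemeta:referencePublication"},
}


def resolve_term(context: Dict[str, Any], local_name: str) -> str:
    """Return the absolute IRI that ``local_name`` maps to in ``context``."""
    definition = context[local_name]
    # A term is defined either by a plain string or by a dict carrying "@id".
    compact_iri = definition["@id"] if isinstance(definition, dict) else definition
    prefix, sep, suffix = compact_iri.partition(":")
    # Expand "prefix:suffix" only when the prefix is declared in the context.
    if sep and isinstance(context.get(prefix), str):
        return context[prefix] + suffix
    return compact_iri


print(resolve_term(TOY_CONTEXT, "name"))
print(resolve_term(TOY_CONTEXT, "referencePublication"))
```
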

swh.indexer.codemeta.read_crosstable(fd: TextIO) → Tuple[Set[str], Dict[str, Dict[str, URIRef]]]

Given a file-like object for a CodeMeta crosswalk table (either the main cross-table with all columns, or an auxiliary table with just the CodeMeta column and one ecosystem-specific column), returns the set of all CodeMeta terms and a dictionary {ecosystem: {ecosystem_term: codemeta_term}}.
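
To illustrate the shape of the returned pair, here is a simplified sketch of this kind of crosswalk parsing. It is not the module's code: `read_crosstable_sketch`, the column names in `NON_ECOSYSTEM_COLUMNS`, the "/" separator for multi-term cells, and the `EcoA`/`EcoB` table are all assumptions, and it keeps terms as plain local names rather than URIRef values.

```python
import csv
import io
from typing import Dict, Set, TextIO, Tuple

# Columns assumed to describe the CodeMeta term itself, not an ecosystem.
NON_ECOSYSTEM_COLUMNS = {"Parent Type", "Property", "Type", "Description"}


def read_crosstable_sketch(fd: TextIO) -> Tuple[Set[str], Dict[str, Dict[str, str]]]:
    """Parse a crosswalk CSV into (terms, {ecosystem: {ecosystem_term: term}})."""
    reader = csv.DictReader(fd)
    ecosystems = [c for c in (reader.fieldnames or []) if c not in NON_ECOSYSTEM_COLUMNS]
    terms: Set[str] = set()
    translation: Dict[str, Dict[str, str]] = {eco: {} for eco in ecosystems}
    for row in reader:
        term = (row.get("Property") or "").strip()
        if not term:
            continue  # skip rows that do not name a CodeMeta term
        terms.add(term)
        for eco in ecosystems:
            # Assumption: one cell may list several ecosystem terms, "/"-separated.
            for eco_term in (row.get(eco) or "").split("/"):
                if eco_term.strip():
                    translation[eco][eco_term.strip()] = term
    return terms, translation


# Hypothetical two-ecosystem table, for illustration only.
TABLE = io.StringIO(
    "Property,Description,EcoA,EcoB\n"
    "name,The name of the item,title,name\n"
    "license,The license of the item,licence / licenseName,license\n"
)
terms, translation = read_crosstable_sketch(TABLE)
print(sorted(terms))
print(translation["EcoA"])
```
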

swh.indexer.codemeta.compact(doc: Dict[str, Any], forgefed: bool, resolve_unknown_context_url: bool = False) → Dict[str, Any]

Same as pyld.jsonld.compact, but in the context of CodeMeta.

Parameters:
  • doc – parsed codemeta.json file

  • forgefed – Whether to add ForgeFed and ActivityStreams as compact URIs. This is typically used for extrinsic metadata documents, which frequently use properties from these namespaces.

  • resolve_unknown_context_url – if True, unknown JSON-LD context URLs are fetched using requests instead of raising an exception. Defaults to False, because enabling it can lead to sending requests to arbitrary URLs; use with caution.

Returns:

A compacted JSON-LD document.

swh.indexer.codemeta.expand(doc: Dict[str, Any], resolve_unknown_context_url: bool = False) → Dict[str, Any]

Same as pyld.jsonld.expand, but in the context of CodeMeta.

Parameters:
  • doc – parsed codemeta.json file

  • resolve_unknown_context_url – if True, unknown JSON-LD context URLs are fetched using requests instead of raising an exception. Defaults to False, because enabling it can lead to sending requests to arbitrary URLs; use with caution.

Returns:

An expanded JSON-LD document.
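
The trade-off behind resolve_unknown_context_url, which both compact and expand accept, can be sketched as a context loader that serves known contexts locally and reaches the network only when explicitly allowed. This is an illustration under assumptions, not the module's code: `make_document_loader`, `known_contexts`, and `fetch` are hypothetical names, the network call is replaced by a stub, and `ValueError` stands in for whatever exception is actually raised.

```python
from typing import Any, Callable, Dict, List


def make_document_loader(
    known_contexts: Dict[str, Dict[str, Any]],
    fetch: Callable[[str], Dict[str, Any]],
    resolve_unknown_context_url: bool = False,
) -> Callable[[str], Dict[str, Any]]:
    """Build a loader that only calls ``fetch`` when explicitly allowed."""

    def loader(url: str) -> Dict[str, Any]:
        if url in known_contexts:
            return known_contexts[url]  # served locally, no request sent
        if resolve_unknown_context_url:
            return fetch(url)  # opt-in: may contact an arbitrary URL
        raise ValueError(f"Refusing to fetch unknown context URL: {url}")

    return loader


# Stub standing in for an HTTP request; it records which URLs were requested.
requested: List[str] = []


def fake_fetch(url: str) -> Dict[str, Any]:
    requested.append(url)
    return {"@context": {}}


KNOWN = {"https://example.org/known.jsonld": {"@context": {"name": "http://schema.org/name"}}}
strict = make_document_loader(KNOWN, fake_fetch)
permissive = make_document_loader(KNOWN, fake_fetch, resolve_unknown_context_url=True)

permissive("https://example.org/other.jsonld")
print(requested)
```

With the default setting, an unknown URL fails fast and no request leaves the process, which is why the permissive mode has to be opted into.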

swh.indexer.codemeta.merge_documents(documents)

Takes a list of metadata dicts, each generated from a different metadata file, and merges them.

Removes duplicates, if any.
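
The merge-and-deduplicate behaviour described above can be sketched as follows. This is a simplified stand-in, not the module's implementation: `merge_documents_sketch` is a hypothetical helper that works on plain dicts, treats every value as a list, and skips any JSON-LD expansion or compaction step.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def merge_documents_sketch(documents: Iterable[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Merge metadata dicts key by key, keeping each distinct value once."""
    merged: Dict[str, List[Any]] = defaultdict(list)
    for document in documents:
        for key, value in document.items():
            # Normalise scalars to one-element lists so all values merge alike.
            values = value if isinstance(value, list) else [value]
            for item in values:
                if item not in merged[key]:  # drop duplicates, keep first-seen order
                    merged[key].append(item)
    return dict(merged)


# Hypothetical metadata extracted from two different files of one project.
docs = [
    {"name": "example-project", "license": ["MIT"]},
    {"name": "example-project", "author": "Jane Doe", "license": ["MIT", "Apache-2.0"]},
]
merged = merge_documents_sketch(docs)
print(merged)
```
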