swh.indexer.metadata_dictionary.python module#

class swh.indexer.metadata_dictionary.python.LinebreakPreservingEmailPolicy(**kw)[source]#

Bases: EmailPolicy

Create new Policy, possibly overriding some defaults.

See class docstring for a list of overridable attributes.

header_fetch_parse(name, value)[source]#

Given the header name and the value from the model, return the value to be returned to the application program that is requesting that header. The value passed in by the email package may contain surrogateescaped binary data if the lines were parsed by a BytesParser. The returned value should not contain any surrogateescaped data.

If the value has a ‘name’ attribute, it is returned to unmodified. Otherwise the name and the value with any linesep characters removed are passed to the header_factory method, and the resulting custom header object is returned. Any surrogateescaped bytes get turned into the unicode unknown-character glyph.

class swh.indexer.metadata_dictionary.python.PythonPkginfoMapping(log_suffix='')[source]#

Bases: DictMapping, SingleFileIntrinsicMapping

Dedicated class for Python’s PKG-INFO mapping and translation.

https://www.python.org/dev/peps/pep-0314/

name = 'pkg-info'#
filename: bytes | Pattern[bytes] = b'PKG-INFO'#
mapping = {'author': rdflib.term.URIRef('http://schema.org/author'), 'author-email': rdflib.term.URIRef('http://schema.org/email'), 'description': rdflib.term.URIRef('http://schema.org/description'), 'download-url': rdflib.term.URIRef('http://schema.org/downloadUrl'), 'home-page': rdflib.term.URIRef('http://schema.org/url'), 'keywords': rdflib.term.URIRef('http://schema.org/keywords'), 'license': rdflib.term.URIRef('http://schema.org/license'), 'name': rdflib.term.URIRef('http://schema.org/name'), 'summary': rdflib.term.URIRef('http://schema.org/description'), 'version': rdflib.term.URIRef('http://schema.org/version')}#
string_fields: List[str] = ['name', 'version', 'description', 'summary', 'author', 'author-email']#

List of fields that are simple strings, and don’t need any normalization.

translate(content)[source]#

Translates content by parsing content from a bytestring containing mapping-specific data and translating with the appropriate mapping to JSON-LD using the Codemeta and ForgeFed vocabularies.

Parameters:

raw_content – raw content to translate

Returns:

translated metadata in JSON friendly form needed for the content if parseable, None otherwise.

extra_translation(graph, root, d)[source]#

Called at the end of the translation process, and may add arbitrary triples to graph based on the input dictionary (passed as d).

normalize_home_page(urls)[source]#
normalize_keywords(keywords)[source]#
normalize_license(licenses)[source]#