swh.indexer.mimetype module

swh.indexer.mimetype.compute_mimetype_encoding(raw_content: bytes) → Dict[str, str]

Determine mimetype and encoding from the raw content.
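To illustrate the input and output shapes, the following sketch mimics the function using only the standard library. It is not the swh.indexer implementation: the helper name `sketch_mimetype_encoding`, the small signature table, and the result keys `mimetype` and `encoding` are assumptions made for this example.

```python
from typing import Dict

# Hypothetical, tiny table of magic-byte prefixes; a real detector knows far more.
_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\x1f\x8b", "application/gzip"),
]


def sketch_mimetype_encoding(raw_content: bytes) -> Dict[str, str]:
    """Guess a mimetype and an encoding from raw bytes (illustrative only)."""
    # Known binary formats are recognised by their leading bytes.
    for prefix, mimetype in _SIGNATURES:
        if raw_content.startswith(prefix):
            return {"mimetype": mimetype, "encoding": "binary"}

    # Otherwise, treat the content as text if it decodes cleanly.
    try:
        raw_content.decode("ascii")
        encoding = "us-ascii"
    except UnicodeDecodeError:
        try:
            raw_content.decode("utf-8")
            encoding = "utf-8"
        except UnicodeDecodeError:
            # Neither ASCII nor UTF-8: fall back to an opaque binary type.
            return {"mimetype": "application/octet-stream", "encoding": "binary"}
    return {"mimetype": "text/plain", "encoding": encoding}
```

The function takes the whole content as `bytes` and returns a flat dictionary of strings, matching the `Dict[str, str]` return annotation above.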

class swh.indexer.mimetype.MixinMimetypeIndexer(*args, **kwargs)

Mixin mimetype indexer.

index(id: ObjId, data: bytes | None = None, **kwargs) → List[ContentMimetypeRow]

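Index the content of the given sha1 and return the result.

Parameters:

id (ObjId): identifier of the content to index

data (bytes | None): raw content as bytes; defaults to None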

Returns:

the content’s mimetype, as a list of ContentMimetypeRow

Return type:

List[ContentMimetypeRow]

persist_index_computations(results: List[ContentMimetypeRow]) → Dict[str, int]

Persist the results in storage.
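The two methods of the mixin work as a pair: `index` produces rows for one content and `persist_index_computations` stores them and returns integer counts. Here is a minimal sketch of that flow, using an in-memory stand-in for storage. `RowSketch`, `InMemoryMimetypeStore`, `MimetypeIndexerSketch`, the row fields, the summary key `added`, and the use of plain `bytes` as identifier are all hypothetical and not taken from swh.indexer.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RowSketch:
    """Hypothetical stand-in for ContentMimetypeRow."""

    id: bytes
    mimetype: str
    encoding: str


class InMemoryMimetypeStore:
    """Hypothetical stand-in for the indexer storage."""

    def __init__(self) -> None:
        self.rows: Dict[bytes, RowSketch] = {}

    def add(self, rows: List[RowSketch]) -> Dict[str, int]:
        added = 0
        for row in rows:
            # Only count rows that were not stored before.
            if row.id not in self.rows:
                self.rows[row.id] = row
                added += 1
        return {"added": added}


class MimetypeIndexerSketch:
    """Illustrative class exposing the same two method names."""

    def __init__(self, store: InMemoryMimetypeStore) -> None:
        self.store = store

    def index(
        self, id: bytes, data: Optional[bytes] = None, **kwargs
    ) -> List[RowSketch]:
        if data is None:
            # Nothing to inspect, so no row is produced.
            return []
        # Placeholder detection: text if the bytes decode as UTF-8.
        try:
            data.decode("utf-8")
            mimetype, encoding = "text/plain", "utf-8"
        except UnicodeDecodeError:
            mimetype, encoding = "application/octet-stream", "binary"
        return [RowSketch(id=id, mimetype=mimetype, encoding=encoding)]

    def persist_index_computations(self, results: List[RowSketch]) -> Dict[str, int]:
        # Delegate to the store and pass its summary counts through.
        return self.store.add(results)
```

In this sketch, persisting the same rows twice reports one addition and then zero, which is one plausible reading of the `Dict[str, int]` summary.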

class swh.indexer.mimetype.MimetypeIndexer(*args, **kwargs)

Mimetype Indexer working on list of content identifiers.

It:

Prepare and check that the indexer is ready to run.

filter(ids: List[ObjId])

Filter out known sha1s and return only missing ones.
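The filtering step can be pictured as a set-membership check. The sketch below assumes identifiers are plain `bytes` and that the already-indexed ones are available as a set. The function name `filter_missing` and the `known` parameter are hypothetical; the actual method presumably queries the indexer storage instead.

```python
from typing import Iterable, Iterator, Set


def filter_missing(ids: Iterable[bytes], known: Set[bytes]) -> Iterator[bytes]:
    """Yield only the identifiers that are not already indexed."""
    for obj_id in ids:
        if obj_id not in known:
            yield obj_id


# Usage: b"b" is already known, so only b"a" and b"c" remain to be indexed.
missing = list(filter_missing([b"a", b"b", b"c"], known={b"b"}))
```

Yielding lazily keeps input order and avoids building an intermediate list when many identifiers are passed in.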