swh.indexer.mimetype module
- swh.indexer.mimetype.compute_mimetype_encoding(raw_content: bytes) → Dict[str, str]
Determine mimetype and encoding from the raw content.
- Parameters:
raw_content – content’s raw data
- Returns:
mimetype and encoding key and corresponding values.
- Return type:
Dict[str, str]
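The return shape described above can be illustrated with a small stand-in. This is only a sketch: `sketch_mimetype_encoding` is a hypothetical name, and its decoding heuristic is a deliberately crude substitute for whatever detection the real function performs. Only the `mimetype` and `encoding` keys are taken from the documentation; the specific values are assumptions.

```python
from typing import Dict


def sketch_mimetype_encoding(raw_content: bytes) -> Dict[str, str]:
    """Hypothetical stand-in mirroring the documented return shape.

    NOT the library's detection logic: it only checks whether the
    bytes decode as ASCII or UTF-8 and falls back to a binary type.
    """
    if not raw_content:
        # Assumed convention for empty input.
        return {"mimetype": "application/x-empty", "encoding": "binary"}
    try:
        raw_content.decode("ascii")
        return {"mimetype": "text/plain", "encoding": "us-ascii"}
    except UnicodeDecodeError:
        pass
    try:
        raw_content.decode("utf-8")
        return {"mimetype": "text/plain", "encoding": "utf-8"}
    except UnicodeDecodeError:
        # Bytes that are not valid UTF-8 are treated as opaque binary data.
        return {"mimetype": "application/octet-stream", "encoding": "binary"}
```

A caller would read both values from the returned dictionary, for example `result["mimetype"]` and `result["encoding"]`.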
- class swh.indexer.mimetype.MixinMimetypeIndexer(*args, **kwargs)
Bases: object
Mixin mimetype indexer.
See MimetypeIndexer.
- idx_storage: IndexerStorageInterface
- index(id: CompositeObjId, data: bytes | None = None, **kwargs) → List[ContentMimetypeRow]
Index sha1s’ content and store result.
- Parameters:
id – content’s identifier
data – raw content in bytes
- Returns:
content’s mimetype; dict keys being
id: content’s identifier (sha1)
mimetype: mimetype in bytes
encoding: encoding in bytes
- Return type:
List[ContentMimetypeRow]
- class swh.indexer.mimetype.MimetypeIndexer(*args, **kwargs)
Bases: MixinMimetypeIndexer, ContentIndexer[ContentMimetypeRow]
Mimetype Indexer working on list of content identifiers.
It:
- (optionally) filters out content already indexed (cf. filter())
- reads content from objstorage per the content’s id (sha1)
- computes {mimetype, encoding} from that content
- stores result in storage
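The four steps listed above can be sketched end to end with in-memory stand-ins. Everything here is hypothetical: `run_sketch` and `toy_detect` are illustrative names, plain dictionaries take the place of the object storage and the indexer storage, and the detector is a toy. The sketch shows the order of operations only, not the indexer's actual code.

```python
from typing import Dict, Iterable, List


def toy_detect(raw: bytes) -> Dict[str, str]:
    """Toy detector (assumption): UTF-8 decodable means text, else binary."""
    try:
        raw.decode("utf-8")
        return {"mimetype": "text/plain", "encoding": "utf-8"}
    except UnicodeDecodeError:
        return {"mimetype": "application/octet-stream", "encoding": "binary"}


def run_sketch(
    sha1s: Iterable[bytes],
    objstorage: Dict[bytes, bytes],
    idx_storage: Dict[bytes, Dict[str, str]],
    skip_indexed: bool = True,
) -> List[Dict[str, object]]:
    """Walk the documented steps over dict-based stand-in storages."""
    rows: List[Dict[str, object]] = []
    for sha1 in sha1s:
        # Step 1 (optional): skip content that is already indexed.
        if skip_indexed and sha1 in idx_storage:
            continue
        # Step 2: read the raw content by its sha1.
        raw = objstorage.get(sha1)
        if raw is None:
            continue  # content not available; nothing to index
        # Step 3: compute {mimetype, encoding} from the content.
        result = toy_detect(raw)
        # Step 4: store the result, keyed by the content's id.
        idx_storage[sha1] = result
        rows.append({"id": sha1, **result})
    return rows
```

With `skip_indexed=False`, the sketch recomputes and overwrites existing entries, which corresponds to leaving out the optional filtering step.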
Prepare and check that the indexer is ready to run.
- filter(ids: List[CompositeObjId])
Filter out known sha1s and return only missing ones.
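As a rough illustration of this filtering behaviour, the sketch below yields only the identifiers whose sha1 is not already known. It assumes that each identifier is a dict-like object with a `sha1` key and that the known sha1s are available as a set; both are assumptions made for the example, and `filter_missing` is a hypothetical name, not the method itself.

```python
from typing import Dict, Iterable, Iterator, Set


def filter_missing(
    ids: Iterable[Dict[str, bytes]],
    known_sha1s: Set[bytes],
) -> Iterator[Dict[str, bytes]]:
    """Lazily yield the ids whose sha1 is not in known_sha1s (hypothetical)."""
    for obj_id in ids:
        # Assumption: every id carries its sha1 under the "sha1" key.
        if obj_id["sha1"] not in known_sha1s:
            yield obj_id
```

Because the function is a generator, nothing is evaluated until the caller iterates, for example with `list(filter_missing(ids, known))`.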