swh.indexer.fossology_license module
- swh.indexer.fossology_license.compute_license(path) → Dict
Determine license from file at path.
- Parameters:
path – path of the file whose license is to be determined
- Returns:
A dict with the following keys:
licenses ([str]): licenses detected for the file at path
path (bytes): content filepath
- Return type:
Dict
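As an illustration of the documented return shape, here is a minimal sketch that turns a scanner's textual output into a dict with the licenses and path keys. The helper name parse_license_output and the "contains license(s)" output format are assumptions made for this example; they are not stated in this reference and may not match what compute_license does internally.

```python
from typing import Dict

# Assumption: the scanner prints one line of the form
# "File <name> contains license(s) <lic1>,<lic2>".
SEPARATOR = " contains license(s) "


def parse_license_output(output: str, path: bytes) -> Dict:
    """Hypothetical helper: build the documented result dict from scanner output."""
    output = output.strip()
    if SEPARATOR in output:
        # Everything after the separator is a comma-separated license list.
        raw = output.split(SEPARATOR, 1)[1]
        licenses = [lic.strip() for lic in raw.split(",") if lic.strip()]
    else:
        # No recognizable output: report no detected license.
        licenses = []
    return {"licenses": licenses, "path": path}


result = parse_license_output(
    "File demo.c contains license(s) GPL-2.0,MIT\n", b"/tmp/demo.c"
)
```

Keeping the parsing separate from the call to the external tool makes the dict shape easy to test without the scanner installed.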
- class swh.indexer.fossology_license.MixinFossologyLicenseIndexer(*args, **kwargs)
Bases: object
Mixin fossology license indexer.
See FossologyLicenseIndexer and FossologyLicensePartitionIndexer.
- idx_storage: IndexerStorageInterface
- index(id: bytes, data: Optional[bytes] = None, **kwargs) → List[ContentLicenseRow]
Index the content identified by the given sha1 and store the result.
- Parameters:
- Returns:
A dict, representing a content_license, with keys:
id (bytes): content’s identifier (sha1)
license (bytes): license in bytes
path (bytes): path
indexer_configuration_id (int): tool used to compute the output
- Return type:
List[ContentLicenseRow]
- class swh.indexer.fossology_license.FossologyLicenseIndexer(*args, **kwargs)
Bases: MixinFossologyLicenseIndexer, ContentIndexer[ContentLicenseRow]
Indexer in charge of:
filtering out content already indexed
reading content from objstorage per the content’s id (sha1)
computing {license, encoding} from that content
storing the result in storage
Prepare and check that the indexer is ready to run.
- idx_storage: IndexerStorageInterface
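The four steps FossologyLicenseIndexer is in charge of can be sketched with plain dictionaries standing in for the object storage and the indexer storage. Everything here (run_indexer, the dict-based storages, the toy detect function) is hypothetical and only mirrors the order of the steps; it is not the class's actual implementation.

```python
from typing import Callable, Dict, List


def run_indexer(
    ids: List[bytes],
    objstorage: Dict[bytes, bytes],
    idx_storage: Dict[bytes, dict],
    compute: Callable[[bytes], List[str]],
) -> List[dict]:
    """Hypothetical sketch of the filter / read / compute / store sequence."""
    # 1. Filter out content already indexed.
    todo = [content_id for content_id in ids if content_id not in idx_storage]
    rows = []
    for content_id in todo:
        # 2. Read the content from the object storage by its id (sha1).
        data = objstorage[content_id]
        # 3. Compute the licenses from that content.
        rows.append({"id": content_id, "licenses": compute(data)})
    # 4. Store the results.
    for row in rows:
        idx_storage[row["id"]] = row
    return rows


def detect(data: bytes) -> List[str]:
    # Toy stand-in for a real license scanner.
    return ["MIT"] if b"MIT License" in data else []


objstorage = {b"a" * 20: b"MIT License ...", b"b" * 20: b"no header"}
idx_storage = {b"b" * 20: {"id": b"b" * 20, "licenses": []}}
rows = run_indexer([b"a" * 20, b"b" * 20], objstorage, idx_storage, detect)
```

Only the first id is processed in this run, because the second one is already present in the stand-in indexer storage.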
- class swh.indexer.fossology_license.FossologyLicensePartitionIndexer(*args, **kwargs)
Bases: MixinFossologyLicenseIndexer, ContentPartitionIndexer[ContentLicenseRow]
FossologyLicense Range Indexer working on a range/partition of content identifiers. It:
filters out the non-textual content
(optionally) filters out content already indexed (cf. indexed_contents_in_partition())
reads content from objstorage per the content’s id (sha1)
computes {license, encoding} from that content
stores the result in storage
Prepare and check that the indexer is ready to run.
- indexed_contents_in_partition(partition_id: int, nb_partitions: int, page_token: Optional[str] = None) → Iterable[bytes]
Retrieve the ids of the contents already indexed within the given partition.
- Parameters:
partition_id – Index of the partition to fetch
nb_partitions – Total number of partitions to split into
page_token – opaque token used for pagination
- idx_storage: IndexerStorageInterface
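To show how partition_id, nb_partitions and page_token typically fit together, here is a self-contained sketch that walks one partition page by page over an in-memory list of ids. The helpers partition_of and fetch_page, the equal-range partitioning scheme, and the token format are all assumptions made for this example; the real storage backend may partition and paginate differently.

```python
from typing import Dict, Iterable, List, Optional

PAGE_SIZE = 2  # deliberately small so the example needs more than one page


def partition_of(content_id: bytes, nb_partitions: int) -> int:
    # Assumption: the id space is split into nb_partitions equal contiguous ranges.
    space = 1 << (8 * len(content_id))
    return int.from_bytes(content_id, "big") * nb_partitions // space


def fetch_page(
    indexed: List[bytes],
    partition_id: int,
    nb_partitions: int,
    page_token: Optional[str] = None,
) -> Dict:
    """Hypothetical backend call returning one page and the next opaque token."""
    ids = sorted(i for i in indexed if partition_of(i, nb_partitions) == partition_id)
    start = int(page_token) if page_token is not None else 0
    end = start + PAGE_SIZE
    next_token = str(end) if end < len(ids) else None
    return {"ids": ids[start:end], "next_page_token": next_token}


def indexed_ids_in_partition(
    indexed: List[bytes], partition_id: int, nb_partitions: int
) -> Iterable[bytes]:
    # Keep requesting pages until the backend stops returning a token.
    page_token = None
    while True:
        page = fetch_page(indexed, partition_id, nb_partitions, page_token)
        yield from page["ids"]
        page_token = page["next_page_token"]
        if page_token is None:
            break


indexed = [bytes([b]) * 20 for b in (0x01, 0x10, 0x7F, 0x80, 0xFF)]
first_half = list(indexed_ids_in_partition(indexed, 0, 2))
second_half = list(indexed_ids_in_partition(indexed, 1, 2))
```

The caller treats the token as opaque: it only passes back whatever the previous page returned and stops when no token is given.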