swh.storage.migrate_extrinsic_metadata module

This is an executable script to migrate extrinsic revision metadata from the revision table to the new extrinsic metadata storage.

The script is designed to be as conservative as possible, following this principle: for each revision it reads (in “handle_row”), it reads some of the fields, writes them directly to the metadata storage, and removes them. It then checks that all the remaining fields are in a hardcoded list of fields that are known not to require migration.

This means that every field that isn’t migrated was explicitly reviewed while writing this script.
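The principle above can be illustrated with a minimal sketch. The field names, the `write` callback, and the `migrate_fields` helper are hypothetical and chosen only for illustration; they are not taken from the script itself.

```python
from typing import Any, Callable, Dict

# Hypothetical field names, used only for illustration.
FIELDS_TO_MIGRATE = ("upstream_record",)
FIELDS_KNOWN_SAFE = frozenset({"build_note"})


def migrate_fields(
    metadata: Dict[str, Any], write: Callable[[str, Any], None]
) -> Dict[str, Any]:
    """Write migratable fields, remove them, then check what is left."""
    remaining = dict(metadata)  # work on a copy, leave the input untouched
    for field in FIELDS_TO_MIGRATE:
        if field in remaining:
            # Write the field to the target storage, then drop it.
            write(field, remaining.pop(field))
    # Anything still present must have been explicitly reviewed.
    unexpected = set(remaining) - FIELDS_KNOWN_SAFE
    assert not unexpected, f"unreviewed fields: {sorted(unexpected)}"
    return remaining


written = []
leftover = migrate_fields(
    {"upstream_record": {"id": 1}, "build_note": "ok"},
    lambda field, value: written.append((field, value)),
)
```

With this shape, a field that is neither migrated nor listed as reviewed fails loudly instead of being dropped silently.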

Additionally, this script contains many assertions to prevent false positives in its heuristics.

swh.storage.migrate_extrinsic_metadata.pypi_project_from_filename(filename)
swh.storage.migrate_extrinsic_metadata.pypi_origin_from_project_name(project_name: str) -> str
swh.storage.migrate_extrinsic_metadata.pypi_origin_from_filename(storage, rev_id: bytes, filename: str) -> str | None
swh.storage.migrate_extrinsic_metadata.cran_package_from_url(filename)
swh.storage.migrate_extrinsic_metadata.npm_package_from_source_url(package_source_url)
swh.storage.migrate_extrinsic_metadata.remove_atom_codemeta_metadata_with_xmlns(metadata)

Removes all known Atom and Codemeta metadata fields from the dict, assuming this is a dict generated by xmltodict without expanding namespaces.

swh.storage.migrate_extrinsic_metadata.remove_atom_codemeta_metadata_without_xmlns(metadata)

Removes all known Atom and Codemeta metadata fields from the dict, assuming this is a dict generated by xmltodict with expanded namespaces.
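As a rough illustration of what removing known keys from an xmltodict-style dict can look like, here is a minimal sketch. The `remove_known_keys` helper and both key lists are hypothetical, and the two key shapes (prefixed keys with `@xmlns` attributes versus keys carrying the full namespace URI) are assumptions about how xmltodict output typically looks with and without namespace expansion, not the lists the script actually uses. Unlike the functions documented here, the sketch returns a new dict rather than modifying its argument.

```python
from typing import Any, Dict, Iterable


def remove_known_keys(
    metadata: Dict[str, Any], known_keys: Iterable[str]
) -> Dict[str, Any]:
    """Return a copy of ``metadata`` without the keys in ``known_keys``."""
    known = set(known_keys)
    return {key: value for key, value in metadata.items() if key not in known}


# Assumed shape when namespaces are NOT expanded: prefixes and xmlns
# declarations are kept as-is in the keys.
prefixed = {
    "@xmlns": "http://www.w3.org/2005/Atom",
    "@xmlns:codemeta": "https://doi.org/10.5063/SCHEMA/CODEMETA-2.0",
    "title": "Example",
    "codemeta:author": "Jane",
    "custom_field": 1,
}
prefixed_known = ["@xmlns", "@xmlns:codemeta", "title", "codemeta:author"]

# Assumed shape when namespaces ARE expanded: each key carries the full
# namespace URI instead of a prefix.
expanded = {
    "http://www.w3.org/2005/Atom:title": "Example",
    "custom_field": 1,
}
expanded_known = ["http://www.w3.org/2005/Atom:title"]

left_prefixed = remove_known_keys(prefixed, prefixed_known)
left_expanded = remove_known_keys(expanded, expanded_known)
```

In both cases only the unrecognized `custom_field` entry remains, which is the part a conservative migration would then need to account for.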

swh.storage.migrate_extrinsic_metadata.debian_origins_from_row(row, storage)

Guesses a Debian origin from a row. May return an empty list if it cannot reliably guess it, but all results are guaranteed to be correct.

swh.storage.migrate_extrinsic_metadata.assert_origin_exists(storage, origin)
swh.storage.migrate_extrinsic_metadata.check_origin_exists(storage, origin)
swh.storage.migrate_extrinsic_metadata.load_metadata(storage, revision_id, directory_id, discovery_date: datetime, metadata: Dict[str, Any], format: str, authority: MetadataAuthority, origin: str | None, dry_run: bool)

Does the actual loading to swh-storage.

swh.storage.migrate_extrinsic_metadata.handle_deposit_row(row, discovery_date: datetime | None, origin, storage, deposit_cur, dry_run: bool)

Loads metadata from the deposit database, which is more reliable than the metadata on the revision object: some versions of the deposit loader were a bit lossy, and they used a very different format for the field in the revision table.

swh.storage.migrate_extrinsic_metadata.handle_row(row: Dict[str, Any], storage, deposit_cur, dry_run: bool)
swh.storage.migrate_extrinsic_metadata.create_fetchers(db)
swh.storage.migrate_extrinsic_metadata.iter_revision_rows(storage_dbconn: str, first_id: bytes)
swh.storage.migrate_extrinsic_metadata.main(storage_dbconn, storage_url, deposit_dbconn, first_id, limit, dry_run)