Using swh-alter#

Services provided by this component are available through the swh alter command line tool.

The key feature is available through the remove sub-command, which allows removal of data from the archive. Before this happens, a recovery bundle will be created that allows reverting the operation.

Because recovery bundles may hold sensitive data, their content is stored encrypted. The system is designed so that extracting content or restoring from a recovery bundle requires a pre-determined set of stakeholders to get together before proceeding.

Dependencies#

swh-alter requires the rage, rage-keygen and optionally age-plugin-yubikey commands to be available in the PATH.

See the respective documentation of rage and age-plugin-yubikey on how to install them.

age-plugin-yubikey also requires the pcscd service to be installed and running. On Debian systems, the service is available from the package with the same name.

Configuration#

The tools will not work without a configuration file. It can be created as ~/.config/swh/alter.yml containing for example:

storage:
  cls: remote
  url: https://storage-cassandra-ro.softwareheritage.org

graph:
  url: "http://granet.internal.softwareheritage.org:5009/graph"

restoration_storage:
  cls: remote
  url: https://storage-rw.softwareheritage.org

removal_searches:
  main:
    cls: elasticsearch
    hosts:
     - elasticsearch:9200

removal_storages:
  old_primary:
    cls: postgresql
    db: "service=swh"
  new_primary:
    cls: cassandra
    hosts:
    - cassandra-seed
    keyspace: swh

removal_objstorages:
  main:
    cls: remote
    url: https://objstorage.softwareheritage.org
  azure:
    cls: azure-prefixed
    accounts:
      "0":
        account_name: testswh0
        api_secret_key: supersecret
        container_name: contents

removal_journals:
  main_journal:
    cls: kafka
    brokers:
    - kafka1.internal.softwareheritage.org
    prefix: swh.journal.objects
    client_id: swh.alter.removals

recovery_bundles:
  secret_sharing:
    minimum_required_groups: 2
    groups:
      legal:
        minimum_required_shares: 1
        recipient_keys:
            DPO: age169k6jwg7e2jqjzzsfvqh5v06h56tkss9fx5vmp8xr400272zjq5qux74m5
            CLO: age1gdar6q9spzz5d3lul5ng5sf30xt7r2htsx8n5espl0pun6wvv4yqjapdma
      sysadmins:
        minimum_required_shares: 2
        recipient_keys:
            YubiKey serial 4245067 slot 1: |-
              age1yubikey1q0ucnwg558zcwrc752evk3620q2t4mkwz6a0lq9u3clsfmealsmlz330kz2
            YubiKey serial 5229836 slot 1: |-
              age1yubikey1qt2p377vq6qg58l8gaframp9yggvsysddraa72aehma5mw623r8rqk0mlgu
            YubiKey serial 5254231 slot 2: |-
              age1yubikey1q0ucnwg558zcwrc752evk3620q2t4mkwz6a0lq9u3clsfmealsmlz330kz2

See the configuration reference for general information about the Software Heritage configuration file. storage, restoration_storage and the entries in the removal_storages map use the storage configuration. For graph, see the graph section. The entries in the removal_searches map follow the format defined by swh-search. The entries in the removal_objstorages map are used by swh-objstorage. Finally, the entries in the removal_journals map follow the journal format.

In most cases, multiple storages have to be configured:

  • The storage section defines the storage from which information will be read. It is used to determine which objects can be removed from the archive and create recovery bundles. For the latter, it needs to be able to retrieve data from Content objects (through an objstorage).

  • The restoration_storage section defines the storage which will be written to in case recovery bundles need to be restored. Usually, this should be the same configuration as used for loaders. Write access is required. For the restoration to fully work, it also needs to be configured to write to an objstorage and a journal.

  • removal_storages contains storages (identified by an arbitrary key) from which objects will be removed (when using swh alter remove).

Likewise, removal_objstorages and removal_journals define the objstorages and journals from which data and messages will be removed by swh alter remove.

The graph section is used to determine which objects can be safely removed from the archive.

In addition, the organization of the secret sharing process needs to be defined in secret_sharing.

Note

The example above requires people from two groups to decrypt recovery bundles: the legal team and the system administration team. For the legal team, either the Data Protection Officer or the Chief Legal Officer will need to provide an identity file with their secret key. For system administrators, at least two of the specified YubiKeys will need to be present.

In the groups section, each group is keyed with an arbitrary identifier. In each group:

  • recipient_keys associates an identifier for the holder with an age public key.

  • minimum_required_shares sets the threshold of holders required for this group.

The minimum number of valid groups required to recover the decryption key is set in minimum_required_groups.
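The two thresholds combine as follows: a group counts as valid once enough of its holders are present, and recovery needs enough valid groups. A minimal Python sketch of that rule is shown below. It is an illustration only, not how swh-alter implements it; the function name can_recover is hypothetical, and the dictionary mirrors the holder names and thresholds of the example configuration with the public keys replaced by placeholders.

```python
from typing import Iterable, Mapping

# Holder names and thresholds mirror the example configuration above.
# Public keys are replaced by placeholders: only the holder names matter here.
SECRET_SHARING = {
    "minimum_required_groups": 2,
    "groups": {
        "legal": {
            "minimum_required_shares": 1,
            "recipient_keys": {"DPO": "<age public key>", "CLO": "<age public key>"},
        },
        "sysadmins": {
            "minimum_required_shares": 2,
            "recipient_keys": {
                "YubiKey serial 4245067 slot 1": "<age public key>",
                "YubiKey serial 5229836 slot 1": "<age public key>",
                "YubiKey serial 5254231 slot 2": "<age public key>",
            },
        },
    },
}


def can_recover(secret_sharing: Mapping, available_holders: Iterable[str]) -> bool:
    """Return True if the available holders satisfy both thresholds (hypothetical helper)."""
    available = set(available_holders)
    valid_groups = 0
    for group in secret_sharing["groups"].values():
        # Holders of this group that are actually available.
        present = available & set(group["recipient_keys"])
        if len(present) >= group["minimum_required_shares"]:
            valid_groups += 1
    return valid_groups >= secret_sharing["minimum_required_groups"]
```

With this configuration, the DPO together with any two of the listed YubiKeys is enough, while the DPO, the CLO and a single YubiKey are not.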

An age key pair can be created using the age-keygen or rage-keygen command (depending on your implementation), or by calling age-plugin-yubikey to store the private key on a YubiKey.

When using YubiKeys, the secret holder identifier needs to be specified in the form YubiKey serial ####### slot #. The required numbers are visible in the identity file created by age-plugin-yubikey or by running age-plugin-yubikey --list after plugging in the YubiKeys.
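The identifier format can be checked before writing it to the configuration. The Python sketch below is an illustration only: parse_yubikey_holder is a hypothetical helper, and it assumes that the serial and slot are plain decimal numbers, as in the examples on this page.

```python
import re
from typing import Optional, Tuple

# Assumption: serial and slot are plain decimal numbers, as in the examples above.
_YUBIKEY_HOLDER = re.compile(r"YubiKey serial (\d+) slot (\d+)")


def parse_yubikey_holder(identifier: str) -> Optional[Tuple[int, int]]:
    """Return (serial, slot) for a YubiKey holder identifier, or None otherwise."""
    match = _YUBIKEY_HOLDER.fullmatch(identifier)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Identifiers that do not follow this form, such as DPO, yield None and would be treated as regular holders whose key comes from an identity file.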

Hint

When using YubiKeys, swh alter does not need any external files to be stored on the system. Connecting the right YubiKey is all that is required.

Otherwise, the age secret key will need to be provided manually as an identity file. Such files should be stored with care. Being 74 characters long, age secret keys are fairly easy to archive on paper.
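When a key is typed back in from paper, a quick shape check can catch transcription slips. The following Python sketch is an assumption-laden illustration: looks_like_age_secret_key is a hypothetical helper that only checks the length, the AGE-SECRET-KEY-1 prefix and the upper-case ASCII form. It does not verify the Bech32 checksum, so it cannot prove that a key is valid.

```python
AGE_SECRET_KEY_PREFIX = "AGE-SECRET-KEY-1"
AGE_SECRET_KEY_LENGTH = 74


def looks_like_age_secret_key(text: str) -> bool:
    """Shape check only: length, prefix and upper-case ASCII (no checksum verification)."""
    key = text.strip()
    return (
        len(key) == AGE_SECRET_KEY_LENGTH
        and key.startswith(AGE_SECRET_KEY_PREFIX)
        and key.isascii()
        and key == key.upper()
    )
```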

Removing objects from the archive#

swh alter remove will remove a given set of origins, and any objects they reference (as long as these are not referenced elsewhere), from the archive.

$ export SWH_CONFIG_FILENAME=~/.config/swh/alter.yml
$ swh alter remove \
      --identifier "takedown-notice-2023-07-14-01" \
      --recovery-bundle tdn-2023-07-14-01.swh-recovery-bundle \
      https://gitlab.softwareheritage.org/swh/devel/swh-alter.git \
      https://gitlab.softwareheritage.org/swh/devel/swh-py-template.git

Objects will be removed from the entries in removal_searches, removal_storages, removal_journals and removal_objstorages defined in the configuration.

If a reference is added to one of the removed objects during the removal process, the process will be rolled back: the recovery bundle will be used to restore objects as they were to restoration_storage. This will also be the case if any error happens during the process. The recovery bundle will be left intact. The process can be retried using the swh alter recovery-bundle resume-removal command, using the decryption key printed on the output for this purpose.

Options:

--dry-run

Get a list of objects that would be removed and exit.

--identifier IDENTIFIER (required)

An arbitrary identifier for this removal operation. Stored in recovery bundles.

--recovery-bundle PATH (required)

Location of the recovery bundle that will be created before removing objects from the archive.

--reason REASON

Reason for this removal operation.

--expire YYYY-MM-DD

Date when the recovery bundle should be removed.
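When removals are driven from a script, the options above can be assembled programmatically. The Python sketch below is an illustration only: build_remove_command is a hypothetical helper, not part of swh-alter. It only uses the options documented in this section and formats the expiration date as YYYY-MM-DD.

```python
from datetime import date
from typing import List, Optional, Sequence


def build_remove_command(
    identifier: str,
    recovery_bundle: str,
    origins: Sequence[str],
    reason: Optional[str] = None,
    expire: Optional[date] = None,
) -> List[str]:
    """Build the argument list for `swh alter remove` (hypothetical helper)."""
    if not origins:
        raise ValueError("at least one origin is required")
    argv = [
        "swh", "alter", "remove",
        "--identifier", identifier,
        "--recovery-bundle", recovery_bundle,
    ]
    if reason is not None:
        argv += ["--reason", reason]
    if expire is not None:
        # date.isoformat() yields YYYY-MM-DD, the format expected by --expire.
        argv += ["--expire", expire.isoformat()]
    argv += list(origins)
    return argv
```

The resulting list can be handed to subprocess.run(argv, check=True); passing a list rather than a shell string avoids quoting problems with origin URLs.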

Resuming a removal from a recovery bundle#

swh alter recovery-bundle resume-removal will remove from the archive all objects contained in a recovery bundle. This can be useful after using swh alter remove --dry-run=stop-before-removal or in case of failures from external resources during the removal operation.

$ swh alter recovery-bundle resume-removal tdn-2023-07-14-01.swh-recovery-bundle

A prompt will ask for the decryption key if it has not been specified via the relevant environment variable or option.

Options:

--decryption-key AGE_SECRET_KEY

Use the given decryption key to access the objects stored in the bundle. The environment variable SWH_BUNDLE_DECRYPTION_KEY can be used instead.
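Passing the key through the environment keeps it out of the command line, which other local users can often see in process listings. The Python sketch below illustrates this under stated assumptions: resume_removal_invocation is a hypothetical helper that only prepares the argument list and environment, and leaves running the command to the caller.

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple


def resume_removal_invocation(
    bundle_path: str,
    decryption_key: str,
    base_env: Optional[Mapping[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Prepare argv and environment for resume-removal (hypothetical helper)."""
    # Copy the environment so the calling process is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    # The key travels in the environment instead of --decryption-key.
    env["SWH_BUNDLE_DECRYPTION_KEY"] = decryption_key
    argv = ["swh", "alter", "recovery-bundle", "resume-removal", bundle_path]
    return argv, env
```

A caller would then run subprocess.run(argv, env=env, check=True).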

Restoring from a recovery bundle#

swh alter recovery-bundle restore will restore all objects contained in a recovery bundle to the storage defined in restoration_storage. In order to proceed, this command requires enough shared secrets to be recovered. Alternatively, the bundle decryption key can be provided.

This command also requires the appropriate permissions needed to update Software Heritage storage, journal and object storage.

$ swh alter recovery-bundle restore tdn-2023-07-14-01.swh-recovery-bundle

Options:

--decryption-key AGE_SECRET_KEY

Use the given decryption key instead of the bundle shared secrets (see Operating recovery bundles remotely).

--secret MNEMONIC

Known shared secret. May be repeated.

--identity IDENTITY

Path to an age identity file holding a secret key. May be repeated.

Getting information from a recovery bundle#

swh alter recovery-bundle info will output information on a given recovery bundle.

This will display the identifier provided during the removal operation, the date of creation, the reason for the removal, the expiration date, the identifiers of the secret share holders, and the SWHIDs of stored objects.

$ swh alter recovery-bundle info tdn-2023-07-14-01.swh-recovery-bundle

Options:

--dump-manifest

Show raw manifest in YAML format.

--show-encrypted-secrets

Show encrypted secrets for each share holder. This allows for out of band recovery of the shared secret by providing the encrypted payload to the secret holder (see also Operating recovery bundles remotely).

Extracting content stored in a recovery bundle#

swh alter recovery-bundle extract-content will extract data from a Content object stored in a recovery bundle. In order to proceed, this command requires enough shared secrets to be recovered. Alternatively, the bundle decryption key can be provided.

See Getting information from a recovery bundle on how to get a list of objects stored in a recovery bundle.

$ swh alter recovery-bundle extract-content \
      --output requirements.txt \
      tdn-2023-07-14-01.swh-recovery-bundle \
      swh:1:cnt:3d65be4c62d36aac611260b47555ac9d51cd5515

Options:

--output PATH (required)

Path of the file that will be written with the extracted content. - can be used to print the content to the standard output.

--decryption-key AGE_SECRET_KEY

Use the given decryption key instead of the bundle shared secrets (see Operating recovery bundles remotely).

--secret MNEMONIC

Known shared secret. May be repeated.

--identity IDENTITY

Path to an age identity file holding a secret key. May be repeated.

Operating recovery bundles remotely#

Operations that need to decrypt objects from a recovery bundle all offer a --decryption-key option. It can be used to directly provide the age secret key that decrypts objects contained in the bundle.

This option enables remote operations. If not all secret share holders can physically work on the same computer, or if the system with the permissions needed to update the Software Heritage archive is only reachable remotely, the decryption key can first be recovered in one or more separate steps.

swh alter recovery-bundle recover-decryption-key will help to recover the secret key protected by the shared secrets. It supports several situations:

  • If all secret share holders can work on the same computer, then the decryption key can be recovered directly:

    $ swh alter recovery-bundle recover-decryption-key \
        --identity age-identity-dpo.txt \
        tdn-2023-07-14-01.swh-recovery-bundle
    
    🚸 The following secret shares will not be decrypted: CLO
    
    🔐 Please insert YubiKey serial 4245067 slot 1, YubiKey serial 5229836 slot 1
    or YubiKey serial 5254231 slot 2 and press Enter…
    
    🔧 Decrypting share using YubiKey serial 4245067 slot 1…
    💭 You might need to tap the right YubiKey when it blinks.
    
    🔧 Decrypting share using YubiKey serial 5254231 slot 2…
    💭 You might need to tap the right YubiKey when it blinks.
    
    🔓 Recovered decryption key:
    AGE-SECRET-KEY-15PQHAGKV59TFK9TCCWLQZZ7XVV0FADVX5TSCDWVZSEWZ4L2SMARSJAAR0W
    
  • If secret share holders are distributed, they will first need to separately recover their shared secret. With the example configuration given above, the DPO would run:

    $ swh alter recovery-bundle recover-decryption-key \
        --show-recovered-secrets \
        --identity age-identity-dpo.txt \
        tdn-2023-07-14-01.swh-recovery-bundle
    
    🔑 Recovered shared secret from DPO:
    [takedown-notice-2023-07-14-01] union echo acrobat easy actress desert decrease
    surprise armed force river insect pencil debut unhappy desktop lungs viral
    sister client ocean wisdom friar year formal knit mild endless breathe benefit
    obesity kidney decrease
    
    🚸 The following secret shares will not be decrypted: CLO
    
    🔐 Please insert YubiKey serial 4245067 slot 1, YubiKey serial 5229836 slot 1
    or YubiKey serial 5254231 slot 2 and press Enter…
    
    [Ctrl+C]
    

    It is also possible to decrypt the secret without requiring swh-alter. One can retrieve the encrypted payload of a shared secret holder by running:

    $ swh alter recovery-bundle info \
        --show-encrypted-secrets \
        tdn-2023-07-14-01.swh-recovery-bundle
    […]
    - DPO
    -----BEGIN AGE ENCRYPTED FILE-----
    YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBDNkRoR1FtSnNaRENpWTlP
    […]
    -----END AGE ENCRYPTED FILE-----
    

    After receiving the encrypted payload, the DPO can then run the following command on their own computer to recover their secret:

    $ rage --decrypt --identity age-identity-dpo.txt
    -----BEGIN AGE ENCRYPTED FILE-----
    YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBDNkRoR1FtSnNaRENpWTlP
    […]
    -----END AGE ENCRYPTED FILE-----
    [Ctrl+D]
    [takedown-notice-2023-07-14-01] union echo acrobat easy actress desert decrease
    surprise armed force river insect pencil debut unhappy desktop lungs viral
    sister client ocean wisdom friar year formal knit mild endless breathe benefit
    obesity kidney decrease
    

    The legal group only requires one secret, so this is enough. Meanwhile, two system administrators use their YubiKeys to recover the required amount of secrets for their group:

    $ swh alter recovery-bundle recover-decryption-key \
        --show-recovered-secrets \
        tdn-2023-07-14-01.swh-recovery-bundle
    
    🚸 The following secret shares will not be decrypted: DPO, CLO
    
    🔐 Please insert YubiKey serial 4245067 slot 1, YubiKey serial 5229836 slot 1
    or YubiKey serial 5254231 slot 2 and press Enter…
    
    🔧 Decrypting share using YubiKey serial 4245067 slot 1…
    💭 You might need to tap the right YubiKey when it blinks.
    🔑 Recovered shared secret from YubiKey serial 4245067 slot 1:
    union echo beard entrance alien photo cage mailman cleanup society petition
    craft script snapshot that step estate watch detailed dryer cause hanger
    deploy calcium idea sack venture bundle training famous endorse permit crowd
    
    🔧 Decrypting share using YubiKey serial 5254231 slot 2…
    💭 You might need to tap the right YubiKey when it blinks.
    🔑 Recovered shared secret from YubiKey serial 5254231 slot 2:
    union echo beard email anatomy install leader coal window pencil depict either
    kitchen decorate cylinder auction expect beam alien sympathy image failure diminish
    impact round bike mayor ting painting often zero manual enforce
    
    🔐 Please insert YubiKey serial 5229836 slot 1 and press Enter…
    
    [Ctrl+C]
    

    The decryption key can then be recovered by providing these secrets:

    $ swh alter recovery-bundle recover-decryption-key \
        --secret "union echo acrobat easy […] decrease" \
        --secret "union echo beard entrance […] crowd" \
        --secret "union echo beard email […] enforce" \
        tdn-2023-07-14-01.swh-recovery-bundle
    
    🚸 The following secret shares will not be decrypted: DPO, CLO
    
    🔓 Recovered decryption key:
    AGE-SECRET-KEY-15PQHAGKV59TFK9TCCWLQZZ7XVV0FADVX5TSCDWVZSEWZ4L2SMARSJAAR0W
    

    Note

    The shared secrets should be 33 words long. They have been elided here for clarity. All shared secrets should have the same first two words. All shared secrets from a given group should also have the same third word.
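The properties listed in the note above (33 words, identical first two words, identical third word within a group) can be checked before attempting recovery. The Python sketch below is an illustration only: check_shared_secrets is a hypothetical helper, not part of swh-alter, and it assumes each mnemonic is passed as a plain space-separated string without the bracketed identifier prefix that appears in some of the outputs above.

```python
from typing import List, Mapping, Sequence

EXPECTED_WORDS = 33


def check_shared_secrets(secrets_by_group: Mapping[str, Sequence[str]]) -> List[str]:
    """Return a list of problems found in the given mnemonics (empty if none)."""
    problems = []
    first_two = set()
    for group, mnemonics in secrets_by_group.items():
        third_words = set()
        for mnemonic in mnemonics:
            words = mnemonic.split()
            if len(words) != EXPECTED_WORDS:
                problems.append(
                    f"{group}: expected {EXPECTED_WORDS} words, got {len(words)}"
                )
                continue
            first_two.add(tuple(words[:2]))
            third_words.add(words[2])
        # Secrets of one group are expected to share their third word.
        if len(third_words) > 1:
            problems.append(f"{group}: third word differs between secrets")
    # All secrets, whatever their group, are expected to share the first two words.
    if len(first_two) > 1:
        problems.append("first two words differ between secrets")
    return problems
```

An empty result only means the mnemonics look consistent; it says nothing about whether they will actually recover the key.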

It is possible to both provide shared secrets on the command line and use identity files or YubiKeys for the missing ones. This applies to all commands needing data stored in a bundle. For example:

$ swh alter recovery-bundle recover-decryption-key \
      --secret "union echo beard entrance […] crowd" \
      --secret "union echo beard email […] enforce" \
      --identity age-identity-dpo.txt \
      tdn-2023-07-14-01.swh-recovery-bundle

Options for swh alter recovery-bundle recover-decryption-key:

--secret MNEMONIC

Known shared secret. May be repeated.

--identity IDENTITY

Path to an age identity file holding a secret key. May be repeated.

--show-recovered-secrets

Show recovered shared secrets from YubiKeys or identity files.

Shared secrets rollover#

swh alter recovery-bundle rollover switches existing recovery bundles to a new secret sharing configuration. First, define the new organization in the configuration. Then, the command can be used as follows:

$ swh alter recovery-bundle rollover \
      tdn-2023-07-14-01.swh-recovery-bundle \
      tdn-2023-08-15-01.swh-recovery-bundle

In order to proceed, this command requires enough shared secrets to be recovered. Alternatively, when operating on a single bundle, the decryption key can be provided. A confirmation will be required before proceeding as the recovery bundles are updated in place.

Options:

--decryption-key AGE_SECRET_KEY

Use the given decryption key instead of the bundle shared secrets (see Operating recovery bundles remotely). If used, only one recovery bundle should be provided at a time.

--secret MNEMONIC

Known shared secret. May be repeated.

--identity IDENTITY

Path to an age identity file holding a secret key. May be repeated.