Using swh-alter
Services provided by this component are available through the swh alter command-line tool.
The key feature is available through the remove sub-command and allows removal of data from the archive. Before this happens, a recovery bundle is created that allows the operation to be reverted.
Because of their potential sensitivity, data in recovery bundles is stored encrypted. The system is designed so that extracting content or restoring from a recovery bundle will require a pre-determined set of stakeholders to get together before proceeding.
Dependencies
swh-alter requires the rage, rage-keygen and optionally age-plugin-yubikey commands to be available in the PATH.
See their respective documentation on how to install them:
- rage installation (also provides rage-keygen)
age-plugin-yubikey also requires the pcscd service to be installed and running. On Debian systems, the service is available from the package with the same name.
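As a quick way to verify the requirement above before running the tool, the following sketch looks up each command in the PATH. It is not part of swh-alter; the helper name missing_commands is made up for this illustration, and only the command names come from the text above.

```python
import shutil

# Command names taken from the text above. age-plugin-yubikey is only
# needed when YubiKeys are used.
REQUIRED = ("rage", "rage-keygen")
OPTIONAL = ("age-plugin-yubikey",)


def missing_commands(names, which=shutil.which):
    """Return the commands from `names` that cannot be found in the PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    for name in missing_commands(REQUIRED):
        print(f"missing required command: {name}")
    for name in missing_commands(OPTIONAL):
        print(f"missing optional command: {name}")
```

The script prints nothing when every command is found. It does not check that the pcscd service is running.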
Configuration
The tools will not work without a configuration file. It can be created as ~/.config/swh/alter.yml, containing for example:
storage:
  cls: remote
  url: https://storage-cassandra-ro.softwareheritage.org

graph:
  url: "http://granet.internal.softwareheritage.org:5009/graph"

restoration_storage:
  cls: remote
  url: https://storage-rw.softwareheritage.org

removal_searches:
  main:
    cls: elasticsearch
    hosts:
    - elasticsearch:9200

removal_storages:
  old_primary:
    cls: postgresql
    db: "service=swh"
  new_primary:
    cls: cassandra
    hosts:
    - cassandra-seed
    keyspace: swh

removal_objstorages:
  main:
    cls: remote
    url: https://objstorage.softwareheritage.org
  azure:
    cls: azure-prefixed
    accounts:
      "0":
        account_name: testswh0
        api_secret_key: supersecret
        container_name: contents

removal_journals:
  main_journal:
    cls: kafka
    brokers:
    - kafka1.internal.softwareheritage.org
    prefix: swh.journal.objects
    client_id: swh.alter.removals

recovery_bundles:
  secret_sharing:
    minimum_required_groups: 2
    groups:
      legal:
        minimum_required_shares: 1
        recipient_keys:
          DPO: age169k6jwg7e2jqjzzsfvqh5v06h56tkss9fx5vmp8xr400272zjq5qux74m5
          CLO: age1gdar6q9spzz5d3lul5ng5sf30xt7r2htsx8n5espl0pun6wvv4yqjapdma
      sysadmins:
        minimum_required_shares: 2
        recipient_keys:
          YubiKey serial 4245067 slot 1: |-
            age1yubikey1q0ucnwg558zcwrc752evk3620q2t4mkwz6a0lq9u3clsfmealsmlz330kz2
          YubiKey serial 5229836 slot 1: |-
            age1yubikey1qt2p377vq6qg58l8gaframp9yggvsysddraa72aehma5mw623r8rqk0mlgu
          YubiKey serial 5254231 slot 2: |-
            age1yubikey1q0ucnwg558zcwrc752evk3620q2t4mkwz6a0lq9u3clsfmealsmlz330kz2
See the configuration reference for general information about the Software Heritage configuration file. storage, restoration_storage and the entries in the removal_storages map use the storage configuration. For graph, see the graph section. The entries in the removal_searches map follow the format defined by swh-search. The entries in the removal_objstorages map are used by swh-objstorage. Finally, the entries in the removal_journals map follow the journal format.
In most cases, multiple storages have to be configured:
- The storage section defines the storage from which information will be read. It is used to determine which objects can be removed from the archive and create recovery bundles. For the latter, it needs to be able to retrieve data from Content objects (through an objstorage).
- The restoration_storage section defines the storage which will be written to in case recovery bundles need to be restored. Usually, this should be the same configuration as used for loaders. Write access is required. For the restoration to fully work, it also needs to be configured to write to an objstorage and a journal.
- removal_storages contains storages (identified by an arbitrary key) from which objects will be removed (when using swh alter remove).
Likewise, removal_objstorages and removal_journals define the objstorages and journals from which data and messages will be removed by swh alter remove.
The graph section is used to determine which objects can be safely removed from the archive.
In addition, the organization of the secret sharing process needs to be defined in secret_sharing.
Note
The example above requires people from two groups to decrypt recovery bundles: the legal team and the system administration team. For the legal team, either the Data Protection Officer or the Chief Legal Officer will need to provide an identity file with their secret key. For system administrators, at least two of the specified YubiKeys will need to be present.
In the groups section, each group is keyed with an arbitrary identifier. In each group:
- recipient_keys associates an identifier for each holder with an age public key;
- minimum_required_shares sets the threshold of holders required for this group.
The minimum number of valid groups required to recover the decryption key is set in minimum_required_groups.
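The two-level threshold rule described above can be modelled in a few lines. The sketch below is only an illustration of that rule, not the code used by swh-alter: the function name can_recover_decryption_key is hypothetical, and the public keys of the example configuration are replaced by placeholders.

```python
def can_recover_decryption_key(secret_sharing, available_holders):
    """Tell whether the given holders satisfy the configured thresholds."""
    available = set(available_holders)
    satisfied_groups = 0
    for group in secret_sharing["groups"].values():
        # Holders of this group who are able to take part
        present = available & set(group["recipient_keys"])
        if len(present) >= group["minimum_required_shares"]:
            satisfied_groups += 1
    return satisfied_groups >= secret_sharing["minimum_required_groups"]


# Same layout as the example configuration; keys replaced by placeholders.
SECRET_SHARING = {
    "minimum_required_groups": 2,
    "groups": {
        "legal": {
            "minimum_required_shares": 1,
            "recipient_keys": {"DPO": "age1...", "CLO": "age1..."},
        },
        "sysadmins": {
            "minimum_required_shares": 2,
            "recipient_keys": {
                "YubiKey serial 4245067 slot 1": "age1yubikey1...",
                "YubiKey serial 5229836 slot 1": "age1yubikey1...",
                "YubiKey serial 5254231 slot 2": "age1yubikey1...",
            },
        },
    },
}
```

With this configuration, the DPO together with any two of the listed YubiKeys is enough. The DPO and CLO together are not, because only one group would be satisfied.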
An age public key can be created using the age-keygen or rage-keygen command (depending on your implementation), or by calling age-plugin-yubikey to store the private key on a YubiKey.
When using YubiKeys, the secret holder identifier needs to be specified in the form YubiKey serial ####### slot #. The required numbers are visible in the identity file created by age-plugin-yubikey or by running age-plugin-yubikey --list after plugging in the YubiKeys.
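To catch typos in holder identifiers before they end up in the configuration, the expected form can be checked with a regular expression. This is a sketch under assumptions: the function name is hypothetical, and the pattern accepts any number of digits for the serial and the slot, since the text above only gives the general form.

```python
import re

# Assumed pattern for "YubiKey serial ####### slot #"; the number of digits
# is not constrained here.
YUBIKEY_ID = re.compile(r"YubiKey serial (?P<serial>\d+) slot (?P<slot>\d+)")


def parse_yubikey_identifier(identifier):
    """Return (serial, slot) as integers, or None if the form does not match."""
    match = YUBIKEY_ID.fullmatch(identifier)
    if match is None:
        return None
    return int(match["serial"]), int(match["slot"])
```

Identifiers for holders using plain identity files, such as DPO in the example configuration, do not match the pattern and yield None.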
Hint
When using YubiKeys, swh alter does not need any external files to be stored on the system. Connecting the right YubiKey is all that is required.
Otherwise, the age secret key will need to be provided manually as an identity file. Such files should be stored with care. Being 74 characters long, age secret keys are fairly easy to archive on paper.
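When a secret key is typed back in from a paper copy, a shallow sanity check can catch truncated or mistyped input early. The sketch below only tests the prefix used by age secret keys and the 74-character length mentioned above. It does not verify the Bech32 checksum, and the function name is made up for this illustration.

```python
AGE_SECRET_KEY_PREFIX = "AGE-SECRET-KEY-1"
AGE_SECRET_KEY_LENGTH = 74  # length stated in the text above


def looks_like_age_secret_key(text):
    """Shallow check only: prefix and length, no checksum validation."""
    candidate = text.strip()
    return (
        candidate.startswith(AGE_SECRET_KEY_PREFIX)
        and len(candidate) == AGE_SECRET_KEY_LENGTH
    )
```

A key that passes this check can still be wrong; only an actual decryption attempt confirms it.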
Removing objects from the archive
swh alter remove will remove a given set of origins, and any objects they reference (as long as they are not referenced elsewhere), from the archive.
$ export SWH_CONFIG_FILENAME=~/config/swh.alter.yml
$ swh alter remove \
    --identifier "takedown-notice-2023-07-14-01" \
    --recovery-bundle tdn-2023-07-14-01.swh-recovery-bundle \
    https://gitlab.softwareheritage.org/swh/devel/swh-alter.git \
    https://gitlab.softwareheritage.org/swh/devel/swh-py-template.git
Objects will be removed from the entries in removal_searches, removal_storages, removal_journals and removal_objstorages defined in the configuration.
If a reference is added to one of the removed objects during the removal process, the process will be rolled back: the recovery bundle will be used to restore objects as they were to restoration_storage. This will also be the case if any error happens during the process. The recovery bundle will be left intact. The process can be retried with the swh alter recovery-bundle resume-removal command, using the decryption key printed on the output for this purpose.
Options:
--dry-run
Get a list of objects that would be removed and exit.
--identifier IDENTIFIER
(required) An arbitrary identifier for this removal operation. Stored in recovery bundles.
--recovery-bundle PATH
(required) Location of the recovery bundle that will be created before removing objects from the archive.
--reason REASON
Reason for this removal operation.
--expire YYYY-MM-DD
Date when the recovery bundle should be removed.
Resuming a removal from a recovery bundle
swh alter recovery-bundle resume-removal will remove from the archive all objects contained in a recovery bundle. This can be useful after using swh alter remove --dry-run=stop-before-removal or in case of failures from external resources during the removal operation.
$ swh alter recovery-bundle resume-removal tdn-2023-07-14-01.swh-recovery-bundle
A prompt will ask for the decryption key if it has not been specified via the relevant environment variable or option.
Options:
--decryption-key AGE_SECRET_KEY
Use the given decryption key to access the objects stored in the bundle. The environment variable SWH_BUNDLE_DECRYPTION_KEY can be used instead.
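When the command is driven from a script, passing the key through the environment variable keeps it out of the argument list, which on many systems is visible to other local users in process listings. The following sketch shows one way to do this. The helper name resume_removal_invocation is hypothetical; the sub-command and variable name are the ones documented above.

```python
import os
import subprocess


def resume_removal_invocation(bundle_path, decryption_key, environ=None):
    """Build the argument list and environment for resuming a removal.

    The key goes into SWH_BUNDLE_DECRYPTION_KEY rather than --decryption-key.
    """
    env = dict(os.environ if environ is None else environ)
    env["SWH_BUNDLE_DECRYPTION_KEY"] = decryption_key
    argv = ["swh", "alter", "recovery-bundle", "resume-removal", bundle_path]
    return argv, env


def resume_removal(bundle_path, decryption_key):
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    argv, env = resume_removal_invocation(bundle_path, decryption_key)
    subprocess.run(argv, env=env, check=True)
```

Calling resume_removal requires the swh command and a working configuration. The first function can be used on its own to inspect what would be run.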
Restoring from a recovery bundle
swh alter recovery-bundle restore will restore all objects contained in a recovery bundle to the storage defined in restoration_storage. In order to proceed, this command requires enough shared secrets to be recovered. Alternatively, the bundle decryption key can be provided.
This command also requires the appropriate permissions needed to update Software Heritage storage, journal and object storage.
$ swh alter recovery-bundle restore tdn-2023-07-14-01.swh-recovery-bundle
Options:
--decryption-key AGE_SECRET_KEY
Use the given decryption key instead of the bundle shared secrets (see Operating recovery bundles remotely).
--secret MNEMONIC
Known shared secret. May be repeated.
--identity IDENTITY
Path to an age identity file holding a secret key. May be repeated.
Getting information from a recovery bundle
swh alter recovery-bundle info will output information on a given recovery bundle.
This will display the identifier provided during the removal operation, the date of creation, the reason for the removal, the expiration date, the identifiers of the secret share holders, and the SWHIDs of stored objects.
$ swh alter recovery-bundle info tdn-2023-07-14-01.swh-recovery-bundle
Options:
--dump-manifest
Show raw manifest in YAML format.
--show-encrypted-secrets
Show encrypted secrets for each share holder. This allows for out-of-band recovery of the shared secret by providing the encrypted payload to the secret holder (see also Operating recovery bundles remotely).
Extracting content stored in a recovery bundle
swh alter recovery-bundle extract-content will extract data from a Content object stored in a recovery bundle. In order to proceed, this command requires enough shared secrets to be recovered. Alternatively, the bundle decryption key can be provided.
See Getting information from a recovery bundle on how to get a list of objects stored in a recovery bundle.
$ swh alter recovery-bundle extract-content \
    --output requirements.txt \
    tdn-2023-07-14-01.swh-recovery-bundle \
    swh:1:cnt:3d65be4c62d36aac611260b47555ac9d51cd5515
Options:
--output PATH
(required) Path of the file that will be written with the extracted content. - can be used to print the content to the standard output.
--decryption-key AGE_SECRET_KEY
Use the given decryption key instead of the bundle shared secrets (see Operating recovery bundles remotely).
--secret MNEMONIC
Known shared secret. May be repeated.
--identity IDENTITY
Path to an age identity file holding a secret key. May be repeated.
Operating recovery bundles remotely
Operations that need to decrypt objects from a recovery bundle all offer a --decryption-key option. It can be used to directly provide the age secret key that decrypts objects contained in the bundle.
This option enables remote operations. If not all secret share holders can physically work on the same computer, or if the system with the right permissions to update the Software Heritage archive is only available remotely, this decryption key can first be recovered in one or more separate steps.
swh alter recovery-bundle recover-decryption-key will help to recover the secret key protected by the shared secrets. It supports several situations:
If all secret share holders can work on the same computer, then the decryption key can be recovered directly:
$ swh alter recovery-bundle recover-decryption-key \
    --identity age-identity-dpo.txt \
    tdn-2023-07-14-01.swh-recovery-bundle
🚸 The following secret shares will not be decrypted: CFO
🔐 Please insert YubiKey serial 4245067 slot 1, YubiKey serial 5229836 slot 1 or YubiKey serial 5254231 slot 2 and press Enter…
🔧 Decrypting share using YubiKey serial 4245067 slot 1…
💭 You might need to tap the right YubiKey when it blinks.
🔧 Decrypting share using YubiKey serial 5254231 slot 2…
💭 You might need to tap the right YubiKey when it blinks.
🔓 Recovered decryption key: AGE-SECRET-KEY-15PQHAGKV59TFK9TCCWLQZZ7XVV0FADVX5TSCDWVZSEWZ4L2SMARSJAAR0W
If secret share holders are distributed, they will first need to separately recover their shared secret. For instance, with the example configuration given above, the DPO would run:
$ swh alter recovery-bundle recover-decryption-key \
    --show-recovered-secrets \
    --identity age-identity-dpo.txt \
    tdn-2023-07-14-01.swh-recovery-bundle
🔑 Recovered shared secret from DPO:
[takedown-notice-2023-07-14-01] union echo acrobat easy actress desert decrease surprise armed force river insect pencil debut unhappy desktop lungs viral sister client ocean wisdom friar year formal knit mild endless breathe benefit obesity kidney decrease
🚸 The following secret shares will not be decrypted: CFO
🔐 Please insert YubiKey serial 4245067 slot 1, YubiKey serial 5229836 slot 1 or YubiKey serial 5254231 slot 2 and press Enter…
[Ctrl+C]
It is also possible to decrypt the secret without requiring swh-alter. One can retrieve the encrypted payload of a shared secret holder by running:
$ swh alter recovery-bundle info \
    --show-encrypted-secrets \
    tdn-2023-07-14-01.swh-recovery-bundle
[…]
- DPO
-----BEGIN AGE ENCRYPTED FILE-----
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBDNkRoR1FtSnNaRENpWTlP
[…]
-----END AGE ENCRYPTED FILE-----
After receiving the encrypted payload, the DPO can then run the following command on their own computer to recover their secret:
$ rage --decrypt --identity age-identity-dpo.txt
-----BEGIN AGE ENCRYPTED FILE-----
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBDNkRoR1FtSnNaRENpWTlP
[…]
-----END AGE ENCRYPTED FILE-----
[Ctrl+D]
[takedown-notice-2023-07-14-01] union echo acrobat easy actress desert decrease surprise armed force river insect pencil debut unhappy desktop lungs viral sister client ocean wisdom friar year formal knit mild endless breathe benefit obesity kidney decrease
The legal group only requires one secret, so this is enough. Meanwhile, two system administrators use their YubiKeys to recover the required number of secrets for their group:
$ swh alter recovery-bundle recover-decryption-key \
    --show-recovered-secrets \
    tdn-2023-07-14-01.swh-recovery-bundle
🚸 The following secret shares will not be decrypted: DPO, CFO
🔐 Please insert YubiKey serial 4245067 slot 1, YubiKey serial 5229836 slot 1 or YubiKey serial 5254231 slot 2 and press Enter…
🔧 Decrypting share using YubiKey serial 4245067 slot 1…
💭 You might need to tap the right YubiKey when it blinks.
🔑 Recovered shared secret from YubiKey serial 4245067 slot 1:
union echo beard entrance alien photo cage mailman cleanup society petition craft script snapshot that step estate watch detailed dryer cause hanger deploy calcium idea sack venture bundle training famous endorse permit crowd
🔧 Decrypting share using YubiKey serial 5254231 slot 2…
💭 You might need to tap the right YubiKey when it blinks.
🔑 Recovered shared secret from YubiKey serial 5254231 slot 2:
union echo beard email anatomy install leader coal window pencil depict either kitchen decorate cylinder auction expect beam alien sympathy image failure diminish impact round bike mayor ting painting often zero manual enforce
🔐 Please insert YubiKey serial 5229836 slot 1 and press Enter…
[Ctrl+C]
The decryption key can then be recovered by providing these secrets:
$ swh alter recovery-bundle recover-decryption-key \
    --secret "union echo acrobat easy […] crowd" \
    --secret "union echo beard entrance […] crowd" \
    --secret "union echo beard email […] enforce" \
    tdn-2023-07-14-01.swh-recovery-bundle
🚸 The following secret shares will not be decrypted: DPO, CFO
🔓 Recovered decryption key: AGE-SECRET-KEY-15PQHAGKV59TFK9TCCWLQZZ7XVV0FADVX5TSCDWVZSEWZ4L2SMARSJAAR0W
Note
The shared secrets should be 33 words long. They have been elided here for clarity. All shared secrets should have the same first two words. All shared secrets from a given group should also share the same third word.
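The properties listed in this note lend themselves to a quick check before secrets collected from several people are passed to the command. The sketch below is an illustration only, with hypothetical function names. It assumes each secret is given as its 33 words alone, without any bracketed identifier in front, and it does not validate the words themselves.

```python
from collections import defaultdict

EXPECTED_WORDS = 33  # length stated in the note above


def check_shared_secrets(mnemonics):
    """Return a list of problems; an empty list means the basic checks pass."""
    problems = []
    split = [mnemonic.split() for mnemonic in mnemonics]
    for index, words in enumerate(split, start=1):
        if len(words) != EXPECTED_WORDS:
            problems.append(
                f"secret {index}: {len(words)} words instead of {EXPECTED_WORDS}"
            )
    # All secrets of one bundle should start with the same two words
    if len({tuple(words[:2]) for words in split}) > 1:
        problems.append("secrets do not all start with the same two words")
    return problems


def group_by_third_word(mnemonics):
    """Group secrets by their third word, which is shared within a group."""
    groups = defaultdict(list)
    for mnemonic in mnemonics:
        words = mnemonic.split()
        if len(words) >= 3:
            groups[words[2]].append(mnemonic)
    return dict(groups)
```

Applied to the secrets shown in this section, group_by_third_word would put the DPO secret under acrobat and both YubiKey secrets under beard, matching the legal and sysadmins groups.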
It is possible to both provide shared secrets on the command line and use identity files or YubiKeys for the missing ones. This applies to all commands needing data stored in a bundle. For example:
$ swh alter recovery-bundle recover-decryption-key \
    --secret "union echo beard entrance […] crowd" \
    --secret "union echo beard email […] enforce" \
    --identity age-identity-dpo.txt \
    tdn-2023-07-14-01.swh-recovery-bundle
Options for swh alter recovery-bundle recover-decryption-key:
--secret MNEMONIC
Known shared secret. May be repeated.
--identity IDENTITY
Path to an age identity file holding a secret key. May be repeated.
--show-recovered-secrets
Show recovered shared secrets from YubiKeys or identity files.