How to process takedown requests#
Intended audience
Operations/sysadmin staff members
Information#
The CLI used on this page is documented in the project documentation <https://docs.softwareheritage.org/devel/swh-alter/usage.html>.
Deployment#
Takedown processing runs in the main infrastructure, so it is deployed in Kubernetes.
The pods named alter-$UUID behave like toolbox pods.
An operator/sysadmin must connect to one of them to trigger the swh alter remove CLI call.
These pods use a Ceph persistent volume, so the output artifacts stored in /srv/recovery-bundles persist across restarts.
The configuration of swh-alter in the different environments is managed in the repository swh-charts.
An alter pod is deployed and ready to use in each environment.
The alter configuration uses dedicated ingress endpoints on which deletion is allowed.
How to perform a takedown request#
The following sections walk through a takedown request, from its reception to the response sent once the removal has been processed.
Prerequisite#
An email received in the tdn tech mailbox from management asking for the removal of:
- one or several origins
- a SWHID pointing to a specific object
A running swh-graph instance
A storage database (postgresql) with the reference tables populated since the last swh-graph update
Procedure for the SWH environment#
Before the actual removal, it is preferable to clone the pod. The removal process can take a long time, and working in a clone prevents the pod from being redeployed if a new version is rolled out in the infrastructure during the removal. The process is also interactive: it checks what needs to be removed and asks for your validation before triggering the removal.
Clone the current alter pod
CONTEXT=archive-production-rke2
NAMESPACE=swh-cassandra
CLONE_NAME=$(id -un)-alter
kubectl debug --context $CONTEXT -n $NAMESPACE \
$(kubectl --context $CONTEXT -n $NAMESPACE get pods -l app=alter -o name | head -1) \
--container=alter --copy-to=$CLONE_NAME -- sleep infinity
kubectl wait --timeout=3600s --context $CONTEXT -n $NAMESPACE --for=condition=Ready pod/$CLONE_NAME
kubectl --context $CONTEXT -n $NAMESPACE exec pod/$CLONE_NAME -it -- /bin/bash
Then connect to that pod with kubectl or k9s
Once connected, open a tmux session so you can disconnect from and reconnect to the pod without losing context
Activate the venv
source venv/bin/activate
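The clone name used above is derived from id -un. Kubernetes pod names only accept lowercase alphanumerics, '-' and '.', so a login containing uppercase letters or underscores would be rejected. A minimal sketch of a normalisation step follows. It assumes a conservative mapping of every other character to '-', and sanitize_pod_name is a hypothetical helper, not part of the documented procedure:

```shell
# Hypothetical helper (not part of swh or kubectl): normalise a login name
# so it can be used as a pod name prefix.
sanitize_pod_name() {
    # Lowercase, map anything outside [a-z0-9-] to '-', then trim the dashes
    # left at both ends.
    printf '%s' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | tr -c 'a-z0-9-' '-' \
        | sed -e 's/^-*//' -e 's/-*$//'
}

# Same shape as the CLONE_NAME definition above, with the login normalised
CLONE_NAME="$(sanitize_pod_name "$(id -un)")-alter"
echo "$CLONE_NAME"
```

If the login is already a valid name, the result is identical to $(id -un)-alter.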
Remove the content#
The commands are launched from the (cloned) alter pod:
Define the request identifier using requester-uniq-id, the UUID from the alteration requests UI. The request page URL follows the pattern https://archive.softwareheritage.org/admin/alteration/requester-uniq-id/
IDENTIFIER="YYYYDDMM-<requester-uniq-id>"
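Since the identifier is reused for the bundle, log and key file names, a typo here propagates everywhere. A minimal sketch of a format check follows. It assumes the identifier is an 8-digit date prefix followed by a UUID in the canonical lowercase 8-4-4-4-12 layout. check_identifier is a hypothetical helper and the sample value is made up:

```shell
# Hypothetical helper: succeed when the argument looks like
# <8 digits>-<UUID>. The canonical lowercase UUID layout is an assumption.
check_identifier() {
    printf '%s\n' "$1" | grep -Eq \
        '^[0-9]{8}-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

# Example with a made-up identifier
if check_identifier "20240503-123e4567-e89b-12d3-a456-426614174000"; then
    echo "identifier looks well-formed"
else
    echo "identifier does not match the expected pattern" >&2
fi
```

The check only validates the shape of the value; it does not verify that the request exists in the alteration requests UI.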
With only a few origins/SWHIDs, call:
swh --log-level swh:INFO --log-level azure.core.pipeline.policies.http_logging_policy:WARNING \
alter remove \
--identifier $IDENTIFIER \
--recovery-bundle /srv/recovery-bundles/$IDENTIFIER.zip \
--reason 'Request from copyright owner' \
<origin|swhid> <origin|swhid> ... | tee /srv/recovery-bundles/$IDENTIFIER.log
...
Proceed with removing of XXXX SWHIDs [y/N] ?
With many origins/SWHIDs, use an intermediate file, so the call becomes:
# With multiple origins, write origins/swhids to a file first to simplify the call
printf '%s\n' '<origin|swhid>' '<origin|swhid>' > $IDENTIFIER.origins
# Then reuse that file when executing the alter command
swh --log-level swh:INFO --log-level azure.core.pipeline.policies.http_logging_policy:WARNING \
alter remove \
--identifier $IDENTIFIER \
--recovery-bundle /srv/recovery-bundles/$IDENTIFIER.zip \
--reason 'Request from copyright owner' \
$(cat $IDENTIFIER.origins) | tee /srv/recovery-bundles/$IDENTIFIER.log
Proceed with removing of XXXX SWHIDs [y/N] ?
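When the list of targets is pasted from an email, the intermediate file can contain blank lines, Windows line endings or duplicates. A minimal sketch of a cleanup step that could run before the call follows. clean_targets is a hypothetical helper and the origins are made-up examples:

```shell
# Hypothetical helper: print the targets listed in a file, one per line,
# dropping carriage returns, blank lines and duplicates (order preserved).
clean_targets() {
    tr -d '\r' < "$1" | awk 'NF && !seen[$0]++'
}

# Example with made-up origins, including a CRLF line, a blank line
# and a duplicate
targets_file=$(mktemp)
printf 'https://example.org/a\r\n\nhttps://example.org/b\nhttps://example.org/a\n' > "$targets_file"

clean_targets "$targets_file"
echo "$(clean_targets "$targets_file" | wc -l | tr -d ' ') unique target(s)"
```

The cleaned list could then take the place of $(cat $IDENTIFIER.origins) in the call above, as $(clean_targets $IDENTIFIER.origins).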
The process will output an age key; copy it alongside the output bundle:
# Temporary during the test period
# Copy the key (logged in the output of the previous call) and save it close to the
# recovery-bundle
echo AGE-SECRET-KEY-XXXX > /srv/recovery-bundles/$IDENTIFIER.key
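Because the key decrypts the recovery bundle, it may be worth limiting who can read the file. A minimal sketch that writes the key with owner-only permissions follows. This restriction is an assumption rather than a documented requirement, and save_key is a hypothetical helper shown here with a placeholder key in a temporary directory:

```shell
# Hypothetical helper: write a key to a file readable by its owner only.
save_key() {
    # $1: key text, $2: destination file
    # The umask covers a newly created file, chmod covers an existing one.
    ( umask 077 && printf '%s\n' "$1" > "$2" ) && chmod 600 "$2"
}

# Example with a placeholder key and a temporary directory
demo_dir=$(mktemp -d)
save_key "AGE-SECRET-KEY-XXXX" "$demo_dir/example.key"

# Show the permission bits of the resulting file
ls -l "$demo_dir/example.key" | cut -c1-10
```

In the alter pod, the destination would be /srv/recovery-bundles/$IDENTIFIER.key, as in the command above.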
Note:
The number of SWHIDs is only informational. If no errors are logged during the object search, just proceed to the removal.
At the end of the process, a search for potential new references to the removed objects is done. If a new reference is detected (that is, an object pointing to one of the removed objects has been added to the archive), the bundle is restored and the removal must be restarted.
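Before writing the response, it can help to confirm that the files produced by the steps above are present. A minimal sketch follows. It assumes the three files share the identifier as their base name (.zip, .log and .key, as in the commands above), and check_artifacts is a hypothetical helper:

```shell
# Hypothetical helper: report which of the bundle, log and key files are
# missing or empty for a given identifier.
check_artifacts() {
    # $1: directory holding the artifacts, $2: request identifier
    ca_missing=0
    for ca_ext in zip log key; do
        if [ ! -s "$1/$2.$ca_ext" ]; then
            echo "missing or empty: $1/$2.$ca_ext" >&2
            ca_missing=1
        fi
    done
    return "$ca_missing"
}

# Example against a temporary directory with made-up content
demo_artifacts=$(mktemp -d)
for ext in zip log key; do
    echo "placeholder" > "$demo_artifacts/example-id.$ext"
done
check_artifacts "$demo_artifacts" "example-id" && echo "all artifacts present"
```

In the alter pod, the call would be check_artifacts /srv/recovery-bundles "$IDENTIFIER".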
Response#
Use the alteration requests UI: open the existing request page https://archive.softwareheritage.org/admin/alteration/<request-uuid>/
Then click on send a message, select Support, and describe what has been done:
Keep the summary of what has been processed relevant and minimal. You can drop the irrelevant mentions (e.g. if there are no blocked origins, that entry is not needed).
Other commands#
This page focuses on the takedown process. Other tools under the swh alter CLI are available; they are shown here for documentation purposes.
Unless specified otherwise, like the previous commands, they should be executed in the alter pod.
Test a recovery bundle#
swh alter recovery-bundle info /srv/recovery-bundles/$IDENTIFIER.zip
Restore a recovery bundle#
swh alter recovery-bundle restore \
--decryption-key $(cat /srv/recovery-bundles/$IDENTIFIER.key) \
/srv/recovery-bundles/$IDENTIFIER.zip
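The restore command reads the key from the .key file saved during the removal. A minimal sketch of a sanity check that could run before restoring follows. It assumes the file holds the key on its first line and that the key starts with the AGE-SECRET-KEY- prefix shown earlier. looks_like_age_key is a hypothetical helper:

```shell
# Hypothetical helper: succeed when the first line of the file starts with
# the age secret key prefix.
looks_like_age_key() {
    # $1: path to the key file
    [ -r "$1" ] && head -n 1 "$1" | grep -q '^AGE-SECRET-KEY-'
}

# Example with a placeholder key in a temporary file
demo_key=$(mktemp)
echo "AGE-SECRET-KEY-XXXX" > "$demo_key"
if looks_like_age_key "$demo_key"; then
    echo "key file looks usable"
else
    echo "key file is missing or malformed" >&2
fi
```

The check only looks at the prefix; it does not verify that the key actually decrypts the bundle.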
Blocking any future ingestion of an origin#
A couple of commands are available to interact with blocking requests.
The blocking commands are available in the swh-toolbox pod.
We can block origins while waiting for the takedown request to be validated by the data officer:
export SWH_CONFIG_FILENAME=/etc/swh/config-blocking.yml
swh storage blocking new-request $IDENTIFIER
If the blocking request is related to a takedown request, the same identifier can be used. A text editor opens to ask for a reason (usually provided in the alteration requests UI), for example ‘outdated personal information’ or ‘copyright violation’.
Updating a blocked origin#
swh storage blocking update-objects $IDENTIFIER blocked
Enter the list of origins to block on stdin and press CTRL+D to end. A “commit” message is then requested to explain the operation, for example “added origins”.
Unblocking an origin#
A request can be completely disabled with:
swh storage blocking clear-requests $IDENTIFIER
If a specific origin must be removed from a request:
swh storage blocking list-requests
swh storage blocking update-objects $IDENTIFIER non-blocked