Technical specifications¶
Requirements¶
one dedicated database to store the deposit’s state - swh-deposit
one dedicated temporary storage to store archives before loading
one client to test the communication with SWORD protocol
Deposit reception schema¶
SWORD imposes the use of basic authentication, so we need a way to authenticate client. Also, a client can access collections:
deposit_client table:
id (bigint): Client’s identifier
username (str): Client’s username
password (pass): Client’s encrypted password
collections ([id]): List of collections the client can access
Collections group deposits together:
deposit_collection table:
id (bigint): Collection’s identifier
name (str): Collection’s human readable name
A deposit is the main object the repository is all about:
deposit table:
id (bigint): deposit’s identifier
reception_date (date): First deposit’s reception date
complete_data (date): Date when the deposit is deemed complete and ready for loading
collection (id): The collection the deposit belongs to
external id (text): client’s internal identifier (e.g hal’s id, etc…).
client_id (id) : Client which did the deposit
swh_id (str) : swh identifier result once the loading is complete
status (enum): The deposit’s current status
As mentioned, a deposit can have a status, whose possible values are:
'partial', -- the deposit is new or partially received since it -- can be done in multiple requests 'expired', -- deposit has been there too long and is now deemed -- ready to be garbage collected 'deposited' -- deposit complete, it is ready to be checked to ensure data consistency 'verified', -- deposit is fully received, checked, and ready for loading 'loading', -- loading is ongoing on swh's side 'done', -- loading is successful 'failed' -- loading is a failure
A deposit is stateful and can be made in multiple requests:
deposit_request table:
id (bigint): identifier
type (id): deposit request’s type (possible values: ‘archive’, ‘metadata’)
deposit_id (id): deposit whose request belongs to
metadata: metadata associated to the request
date (date): date of the requests
Information sent along a request are stored in a
deposit_request
row.They can be either of type
metadata
(atom entry, multipart’s atom entry part) or of typearchive
(binary upload, multipart’s binary upload part).When the deposit is complete (status
deposited
), thosemetadata
andarchive
deposit requests will be read and aggregated. They will then be sent as parameters to the loading routine.During loading, some of those metadata are kept in the
origin_metadata
table and some other are stored in therevision
table (see metadata loading).The only update actions occurring on the deposit table are in regards of:
status changes (see figure below):
partial
-> {expired
/deposited
},deposited
-> {rejected
/verified
},verified
->loading
loading
-> {done
/failed
}
complete_date
when the deposit is finalized (when the status is changed todeposited
)swh-id
is populated once we have the loading result