swh.vault package


swh.vault.backend module

exception swh.vault.backend.NotFoundExc[source]

Bases: Exception

Bundle was not found.


class swh.vault.backend.VaultBackend(db, cache, scheduler, storage=None, **config)[source]

Bases: object

Backend for the Software Heritage vault.

__init__(db, cache, scheduler, storage=None, **config)[source]

Initialize self. See help(type(self)) for accurate signature.

task_info(obj_type, obj_id, db=None, cur=None)[source]

Fetch information about a bundle


_send_task

Send a cooking task to the Celery scheduler

create_task(obj_type, obj_id, sticky=False, db=None, cur=None)[source]

Create and send a cooking task

add_notif_email(obj_type, obj_id, email, db=None, cur=None)[source]

Add an e-mail address to notify when a given bundle is ready

cook_request(obj_type, obj_id, *, sticky=False, email=None, db=None, cur=None)[source]

Main entry point for cooking requests. This starts a cooking task if needed, and adds the given e-mail address to the notification list.
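The flow described for cook_request can be pictured with a small in-memory model. This is only a sketch under stated assumptions: ToyVault, its dictionaries, and the "new" status value are hypothetical stand-ins, not the actual VaultBackend code, which works against a database and a scheduler.

```python
class ToyVault:
    """Hypothetical in-memory stand-in for the cooking-request flow."""

    def __init__(self):
        self.bundles = {}     # (obj_type, obj_id) -> bundle info dict
        self.tasks_sent = []  # cooking tasks that would go to the scheduler

    def cook_request(self, obj_type, obj_id, *, sticky=False, email=None):
        key = (obj_type, obj_id)
        info = self.bundles.get(key)
        if info is None:
            # No bundle known yet: create it and "send" a cooking task.
            info = {"status": "new", "sticky": sticky, "emails": []}
            self.bundles[key] = info
            self.tasks_sent.append(key)
        if email is not None and email not in info["emails"]:
            # Register the address to notify once the bundle is ready.
            info["emails"].append(email)
        return info


vault = ToyVault()
vault.cook_request("directory", "abc123", email="a@example.org")
vault.cook_request("directory", "abc123", email="b@example.org")
```

In this model the second request reuses the existing bundle entry, so only one task is recorded while both addresses end up on the notification list.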

batch_cook(batch, db=None, cur=None)[source]

Cook a batch of bundles and return the cooking id.

batch_info(batch_id, db=None, cur=None)[source]

Fetch information about a batch of bundles

is_available(obj_type, obj_id, db=None, cur=None)[source]

Check whether a bundle is available for retrieval

fetch(obj_type, obj_id, db=None, cur=None)[source]

Retrieve a bundle from the cache

update_access_ts(obj_type, obj_id, db=None, cur=None)[source]

Update the last access timestamp of a bundle

set_status(obj_type, obj_id, status, db=None, cur=None)[source]

Set the cooking status of a bundle

set_progress(obj_type, obj_id, progress, db=None, cur=None)[source]

Set the cooking progress of a bundle

send_all_notifications(obj_type, obj_id, db=None, cur=None)[source]

Send all the e-mails in the notification list of a bundle

send_notification(n_id, email, obj_type, obj_id, status, progress_msg=None, db=None, cur=None)[source]

Send the notification of a bundle to a specific e-mail

_cache_expire(cond, *args, db=None, cur=None)[source]

Low-level expiration method, used by cache_expire_* methods

cache_expire_oldest(n=1, by='last_access', db=None, cur=None)[source]

Expire the n oldest bundles

cache_expire_until(date, by='last_access', db=None, cur=None)[source]

Expire all the bundles until a certain date
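The two expiration policies can be illustrated on a plain dictionary of timestamps. This is a minimal sketch, not the backend's implementation: the entries mapping and both helper names are hypothetical, and treating the date bound as inclusive is an assumption.

```python
from datetime import datetime, timedelta


def expire_oldest(entries, n=1, by="last_access"):
    """Remove the n entries with the smallest `by` timestamp; return their keys."""
    victims = sorted(entries, key=lambda key: entries[key][by])[:n]
    for key in victims:
        del entries[key]
    return victims


def expire_until(entries, date, by="last_access"):
    """Remove every entry whose `by` timestamp is <= date (inclusive by assumption)."""
    victims = [key for key, info in entries.items() if info[by] <= date]
    for key in victims:
        del entries[key]
    return victims


t0 = datetime(2018, 1, 1)
entries = {
    "a": {"last_access": t0},
    "b": {"last_access": t0 + timedelta(days=1)},
    "c": {"last_access": t0 + timedelta(days=2)},
}
first = expire_oldest(entries, n=1)
second = expire_until(entries, t0 + timedelta(days=1))
```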


swh.vault.cache module

class swh.vault.cache.VaultCache(**objstorage)[source]

Bases: object

The Vault cache is an object storage that stores Vault bundles.

This implementation computes sha1(‘<bundle_type>:<object_id>’) as the internal identifiers used in the underlying objstorage.
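The identifier scheme above can be sketched with the standard library. The helper name, the hexadecimal form of the object id, and returning a hex digest are assumptions made for illustration; the actual method may use a different representation.

```python
import hashlib


def internal_id(obj_type, obj_id_hex):
    """Hash '<bundle_type>:<object_id>' with SHA-1 (hypothetical helper)."""
    key = "{}:{}".format(obj_type, obj_id_hex)
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


example = internal_id("directory", "0" * 40)
```

Because the bundle type is part of the hashed string, two bundle types built from the same object id map to different cache entries.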


add(obj_type, obj_id, content)[source]
get(obj_type, obj_id)[source]
delete(obj_type, obj_id)[source]
add_stream(obj_type, obj_id, content_iter)[source]
get_stream(obj_type, obj_id)[source]
is_cached(obj_type, obj_id)[source]
_get_internal_id(obj_type, obj_id)[source]

swh.vault.cli module


swh.vault.cooking_tasks module

swh.vault.to_disk module

swh.vault.to_disk.get_filtered_files_content(storage, files_data)[source]

Retrieve the files specified by files_data and apply filters for skipped and missing contents.

Parameters:
  • storage – the storage from which to retrieve the objects
  • files_data – list of file entries as returned by directory_ls()

Returns:
The entries given in files_data, each with a new ‘content’ key that points to the file content in bytes.

The contents can be replaced by a specific message indicating that they could not be retrieved (either because of the privacy policy or because they were too large to archive).
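A simplified picture of this filtering is shown below. Everything here is hypothetical: the fetch_content callable stands in for the storage, the "target" key and the placeholder text are assumptions, and the real function may distinguish skipped and missing contents with different messages.

```python
# Hypothetical placeholder used when a content cannot be retrieved.
PLACEHOLDER = b"This content could not be retrieved.\n"


def add_filtered_content(fetch_content, files_data):
    """Yield a copy of each entry with a 'content' key added.

    fetch_content(target) is assumed to return bytes, or None when the
    content is unavailable.
    """
    for entry in files_data:
        data = fetch_content(entry["target"])
        enriched = dict(entry)  # leave the caller's entries untouched
        enriched["content"] = data if data is not None else PLACEHOLDER
        yield enriched


store = {"id1": b"hello"}
files = [
    {"name": "a.txt", "target": "id1"},
    {"name": "b.bin", "target": "id2"},  # not present in the store
]
result = list(add_filtered_content(store.get, files))
```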

swh.vault.to_disk.apply_chunked(func, input_list, chunk_size)[source]

Apply func on input_list divided in chunks of size chunk_size
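Chunked application can be sketched as follows. The sketch assumes that func receives one chunk (a list) at a time and returns an iterable whose items are yielded in order; whether the real function flattens results this way is not stated here.

```python
from itertools import islice


def apply_chunked_sketch(func, input_list, chunk_size):
    """Call func on successive chunks of input_list and yield its results."""
    iterator = iter(input_list)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        yield from func(chunk)


seen_sizes = []


def double_all(chunk):
    seen_sizes.append(len(chunk))  # record how the input was split
    return [2 * x for x in chunk]


doubled = list(apply_chunked_sketch(double_all, [1, 2, 3, 4, 5], 2))
```

Chunking like this keeps each call to func bounded in size, which is useful when func issues one storage request per chunk.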

class swh.vault.to_disk.DirectoryBuilder(storage, root, dir_id)[source]

Bases: object

Reconstructs the on-disk representation of a directory in the storage.

__init__(storage, root, dir_id)[source]

Initialize the directory builder.

Parameters:
  • storage – the storage object
  • root – the path where the directory should be reconstructed
  • dir_id – the identifier of the directory in the storage

build

Perform the reconstruction of the directory in the given root.


_create_tree

Create a directory tree from the given paths.

The tree is created under root, and each directory listed in directories is created.
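Creating the tree amounts to making the root and then every listed subdirectory. The helper below is a sketch that assumes relative string paths; the actual method may operate on bytes paths or a different entry format.

```python
import os
import tempfile


def create_tree(root, directories):
    """Create root, then each relative directory path below it."""
    os.makedirs(root, exist_ok=True)
    for rel_path in directories:
        os.makedirs(os.path.join(root, rel_path), exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "checkout")
    create_tree(root, ["src", "src/pkg", "docs"])
    top_level = sorted(os.listdir(root))
    nested_exists = os.path.isdir(os.path.join(root, "src", "pkg"))
```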


_create_files

Create the files in the tree and fetch their contents.


_create_revisions

Create the revisions in the tree as broken symlinks to the target identifier.
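A broken symlink is simply a link whose target does not exist on disk, so the link can carry the identifier as text. The following sketch uses a hypothetical helper name and a made-up 40-character identifier, and it assumes a platform where creating symlinks is permitted.

```python
import os
import tempfile


def create_revision_link(path, target_hex):
    """Create a symlink whose (non-existent) target is the identifier."""
    os.symlink(target_hex, path)


with tempfile.TemporaryDirectory() as tmp:
    link = os.path.join(tmp, "submodule")
    create_revision_link(link, "a" * 40)
    is_link = os.path.islink(link)    # the link itself exists
    resolves = os.path.exists(link)   # following it fails: target is absent
    target = os.readlink(link)        # the identifier is still readable
```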

_create_file(path, content, mode=33188)[source]

Create the given file and fill it with content.
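The default mode 33188 is the decimal form of the octal value 0o100644, that is, a regular file with permissions 644. The sketch below decodes it with the stat module and writes a file accordingly; the helper is illustrative and leaves out any special handling the real method may perform for other modes.

```python
import os
import stat
import tempfile

DEFAULT_MODE = 33188  # decimal form of 0o100644


def create_file(path, content, mode=DEFAULT_MODE):
    """Write content (bytes) to path and apply the permission bits of mode."""
    with open(path, "wb") as f:
        f.write(content)
    os.chmod(path, stat.S_IMODE(mode))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hello.txt")
    create_file(path, b"hello\n")
    with open(path, "rb") as f:
        written = f.read()
    perms = stat.S_IMODE(os.stat(path).st_mode)
```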


Module contents

swh.vault.get_vault(cls='remote', args={})[source]

Get a vault object of class cls with arguments args.

Parameters:
  • vault (dict) – dictionary with keys:
      • cls – vault’s class, either ‘remote’
      • args – dictionary with keys

Returns:
an instance of VaultBackend (either local or remote)

Raises:
ValueError if passed an unknown vault class.
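The dispatch-by-name behaviour described above follows a common factory pattern, sketched here in isolation. The two classes, the registry, and the url argument are all hypothetical; the sketch does not import or reproduce swh.vault itself.

```python
class LocalVaultSketch:
    """Hypothetical local implementation."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


class RemoteVaultSketch:
    """Hypothetical remote client."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


def get_vault_sketch(cls="remote", args=None):
    """Instantiate the class registered under cls with the given arguments."""
    registry = {"local": LocalVaultSketch, "remote": RemoteVaultSketch}
    if cls not in registry:
        raise ValueError("Unknown vault class: {!r}".format(cls))
    return registry[cls](**(args or {}))


# The URL is a made-up value used only for illustration.
client = get_vault_sketch("remote", {"url": "http://localhost:5005/"})
```

Using None as the default for args, rather than a shared dictionary, avoids accidental mutation of a default between calls in this sketch.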