swh.vault package

Submodules

swh.vault.backend module

exception swh.vault.backend.NotFoundExc

Bases: Exception

Bundle was not found.

swh.vault.backend.batch_to_bytes(batch)

class swh.vault.backend.VaultBackend(db, cache, scheduler, storage=None, **config)

Bases: object

Backend for the Software Heritage vault.

__init__(db, cache, scheduler, storage=None, **config)

get_db()
put_db(db)
task_info(obj_type, obj_id, db=None, cur=None)

Fetch information about a bundle.

_send_task(*args)

Send a cooking task to the Celery scheduler.

create_task(obj_type, obj_id, sticky=False, db=None, cur=None)

Create and send a cooking task.

add_notif_email(obj_type, obj_id, email, db=None, cur=None)

Add an e-mail address to notify when a given bundle is ready.

cook_request(obj_type, obj_id, *, sticky=False, email=None, db=None, cur=None)

Main entry point for cooking requests. This starts a cooking task if needed and adds the given e-mail address to the notification list.
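The two steps named above can be modelled with a small in-memory sketch. InMemoryVault and its attributes are hypothetical stand-ins for the database, cache and scheduler that VaultBackend actually uses; the 'new' status string is an assumption, and details the docstring does not cover (such as how sticky affects an existing task or how failed bundles are retried) are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

BundleKey = Tuple[str, str]  # (obj_type, obj_id); hypothetical key shape


@dataclass
class InMemoryVault:
    """Toy stand-in for VaultBackend: no database, cache or scheduler."""

    tasks: Dict[BundleKey, str] = field(default_factory=dict)  # key -> status
    sticky: Set[BundleKey] = field(default_factory=set)
    notif_emails: Dict[BundleKey, List[str]] = field(default_factory=dict)
    sent_tasks: List[BundleKey] = field(default_factory=list)

    def create_task(self, obj_type: str, obj_id: str, sticky: bool = False) -> None:
        key = (obj_type, obj_id)
        self.tasks[key] = "new"  # assumed initial status
        if sticky:
            self.sticky.add(key)
        self.sent_tasks.append(key)  # stands in for _send_task

    def add_notif_email(self, obj_type: str, obj_id: str, email: str) -> None:
        self.notif_emails.setdefault((obj_type, obj_id), []).append(email)

    def cook_request(self, obj_type: str, obj_id: str, *, sticky: bool = False,
                     email: Optional[str] = None) -> str:
        key = (obj_type, obj_id)
        # Only start a cooking task if none exists for this bundle yet.
        if key not in self.tasks:
            self.create_task(obj_type, obj_id, sticky)
        # Register the address so it is notified when the bundle is ready.
        if email is not None:
            self.add_notif_email(obj_type, obj_id, email)
        return self.tasks[key]
```

Calling cook_request twice for the same bundle sends a single task but records both e-mail addresses, which is the "if needed" behaviour the docstring describes.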

batch_cook(batch, db=None, cur=None)

Cook a batch of bundles and return the cooking ID.

batch_info(batch_id, db=None, cur=None)

Fetch information about a batch of bundles.

is_available(obj_type, obj_id, db=None, cur=None)

Check whether a bundle is available for retrieval.

fetch(obj_type, obj_id, db=None, cur=None)

Retrieve a bundle from the cache.

update_access_ts(obj_type, obj_id, db=None, cur=None)

Update the last access timestamp of a bundle.

set_status(obj_type, obj_id, status, db=None, cur=None)

Set the cooking status of a bundle.

set_progress(obj_type, obj_id, progress, db=None, cur=None)

Set the cooking progress of a bundle.

send_all_notifications(obj_type, obj_id, db=None, cur=None)

Send an e-mail to every address in the notification list of a bundle.

send_notification(n_id, email, obj_type, obj_id, status, progress_msg=None, db=None, cur=None)

Send the notification for a bundle to a specific e-mail address.

_smtp_send(msg)
_cache_expire(cond, *args, db=None, cur=None)

Low-level expiration method, used by the cache_expire_* methods.

cache_expire_oldest(n=1, by='last_access', db=None, cur=None)

Expire the n oldest bundles.

cache_expire_until(date, by='last_access', db=None, cur=None)

Expire all bundles up to a given date.
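A minimal sketch of how the three expiration methods relate, assuming each bundle record stores its timestamps in a dict keyed by field name (only 'last_access' comes from the signatures above). ExpiringCache is hypothetical, and the strict < comparison in cache_expire_until is an assumption about where the date boundary falls.

```python
from datetime import datetime
from typing import Callable, Dict, List, Tuple

BundleKey = Tuple[str, str]      # (obj_type, obj_id)
Record = Dict[str, datetime]     # timestamp field name -> value


class ExpiringCache:
    """Toy model of bundle expiration over an in-memory index."""

    def __init__(self) -> None:
        self.records: Dict[BundleKey, Record] = {}

    def _cache_expire(self, cond: Callable[[BundleKey, Record], bool]) -> List[BundleKey]:
        # Low-level helper: drop every bundle for which cond is true.
        expired = [k for k, rec in self.records.items() if cond(k, rec)]
        for key in expired:
            del self.records[key]
        return expired

    def cache_expire_oldest(self, n: int = 1, by: str = "last_access") -> List[BundleKey]:
        # Pick the n bundles with the smallest `by` timestamp.
        oldest = set(sorted(self.records, key=lambda k: self.records[k][by])[:n])
        return self._cache_expire(lambda k, rec: k in oldest)

    def cache_expire_until(self, date: datetime, by: str = "last_access") -> List[BundleKey]:
        # Drop every bundle whose `by` timestamp is strictly earlier than date.
        return self._cache_expire(lambda k, rec: rec[by] < date)
```

Both public methods only build a condition and delegate the deletion, which mirrors the docstring's description of _cache_expire as the shared low-level method.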

swh.vault.cache module

class swh.vault.cache.VaultCache(**objstorage)

Bases: object

The Vault cache is an object storage that stores Vault bundles.

This implementation uses sha1(‘<bundle_type>:<object_id>’) as the internal identifier of each bundle in the underlying objstorage.
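The key derivation can be sketched with hashlib. This assumes obj_type fills the <bundle_type> slot and that the binary obj_id is rendered as lowercase hex before hashing; get_internal_id is a hypothetical free-function version of _get_internal_id, and the example identifier is made up.

```python
import hashlib


def get_internal_id(obj_type: str, obj_id: bytes) -> bytes:
    """Return sha1('<bundle_type>:<object_id>') as raw bytes.

    Assumption: obj_id is a binary identifier that is hex-encoded
    before being embedded in the string.
    """
    key = "{}:{}".format(obj_type, obj_id.hex())
    return hashlib.sha1(key.encode()).digest()


# Example with a hypothetical 20-byte directory identifier.
dir_id = bytes.fromhex("34973274ccef6ab4dfaaf86599792fa9c3fe4689")
print(get_internal_id("directory", dir_id).hex())
```

Prefixing the bundle type means the same object identifier cooked as two different bundle types gets two distinct cache keys.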

__init__(**objstorage)

add(obj_type, obj_id, content)
get(obj_type, obj_id)
delete(obj_type, obj_id)
add_stream(obj_type, obj_id, content_iter)
get_stream(obj_type, obj_id)
is_cached(obj_type, obj_id)
_get_internal_id(obj_type, obj_id)
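To illustrate how these methods fit together, here is a sketch in which a plain dict replaces the underlying objstorage. DictVaultCache is hypothetical; it derives keys as sha1('<bundle_type>:<object_id>') with obj_id hex-encoded (an assumption), raises KeyError for missing bundles where the real objstorage may raise its own exception, and adds a hypothetical chunk_size parameter to get_stream.

```python
import hashlib
from typing import Dict, Iterable, Iterator


class DictVaultCache:
    """In-memory stand-in for VaultCache; a dict replaces the objstorage."""

    def __init__(self) -> None:
        self._store: Dict[bytes, bytes] = {}

    def _get_internal_id(self, obj_type: str, obj_id: bytes) -> bytes:
        # Assumed key scheme: sha1('<bundle_type>:<hex object id>').
        return hashlib.sha1("{}:{}".format(obj_type, obj_id.hex()).encode()).digest()

    def add(self, obj_type: str, obj_id: bytes, content: bytes) -> None:
        self._store[self._get_internal_id(obj_type, obj_id)] = content

    def get(self, obj_type: str, obj_id: bytes) -> bytes:
        return self._store[self._get_internal_id(obj_type, obj_id)]

    def delete(self, obj_type: str, obj_id: bytes) -> None:
        del self._store[self._get_internal_id(obj_type, obj_id)]

    def is_cached(self, obj_type: str, obj_id: bytes) -> bool:
        return self._get_internal_id(obj_type, obj_id) in self._store

    def add_stream(self, obj_type: str, obj_id: bytes,
                   content_iter: Iterable[bytes]) -> None:
        # Concatenate the chunks; a real objstorage could write them incrementally.
        self.add(obj_type, obj_id, b"".join(content_iter))

    def get_stream(self, obj_type: str, obj_id: bytes,
                   chunk_size: int = 4096) -> Iterator[bytes]:
        # chunk_size is hypothetical; it is not in the documented signature.
        data = self.get(obj_type, obj_id)
        for i in range(0, len(data), chunk_size):
            yield data[i:i + chunk_size]
```

Every public method goes through _get_internal_id, so callers only ever deal with (obj_type, obj_id) pairs and never see the hashed keys.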

swh.vault.cli module

swh.vault.cli.main()

swh.vault.cooking_tasks module

swh.vault.to_disk module

swh.vault.to_disk.get_filtered_files_content(storage, files_data)

Retrieve the files specified by files_data and apply filters for skipped and missing contents.

Parameters:
  • storage – the storage from which to retrieve the objects
  • files_data – list of file entries as returned by directory_ls()
Yields:

The entries given in files_data with a new ‘content’ key that points to the file content in bytes.

The contents can be replaced by a specific message indicating that they could not be retrieved (either because of the privacy policy or because they were too large for us to archive).
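The filtering can be sketched as follows. It assumes each directory_ls() entry carries a 'status' field ('visible', 'hidden' or 'absent') and a 'sha1' key; FakeStorage, its content_get method and both placeholder messages are hypothetical, not the real storage API or the real wording.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical placeholder messages; the real wording may differ.
HIDDEN_MESSAGE = b"This content is hidden by the archive's privacy policy."
SKIPPED_MESSAGE = b"This content was too large to be archived."


class FakeStorage:
    """Hypothetical storage stand-in mapping sha1 identifiers to raw bytes."""

    def __init__(self, contents: Dict[bytes, bytes]) -> None:
        self.contents = contents

    def content_get(self, sha1: bytes) -> bytes:
        return self.contents[sha1]


def get_filtered_files_content(storage: FakeStorage,
                               files_data: Iterable[dict]) -> Iterator[dict]:
    """Yield a copy of each entry with a new 'content' key holding bytes."""
    for file_data in files_data:
        status = file_data["status"]  # assumed: 'visible', 'hidden' or 'absent'
        if status == "visible":
            content = storage.content_get(file_data["sha1"])
        elif status == "hidden":      # withheld, e.g. for privacy reasons
            content = HIDDEN_MESSAGE
        elif status == "absent":      # skipped, e.g. because it was too large
            content = SKIPPED_MESSAGE
        else:
            raise ValueError("unknown content status: %r" % status)
        # Copy the entry so the caller's files_data is left untouched.
        yield {**file_data, "content": content}
```

Substituting a message rather than dropping the entry keeps the reconstructed directory complete: every listed file still exists on disk, even when its real bytes cannot be provided.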

swh.vault.to_disk.apply_chunked(func, input_list, chunk_size)

Apply func to input_list, split into chunks of size chunk_size.
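A minimal sketch, assuming func takes a list of items and returns an iterable of results that are flattened into one stream. This pattern is typically used to batch lookups so that no single call receives an unbounded list; it is an illustration, not the module's actual code.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def apply_chunked(func: Callable[[List[T]], Iterable[R]],
                  input_list: Iterable[T], chunk_size: int) -> Iterator[R]:
    """Call func once per chunk of at most chunk_size items; yield all results."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    it = iter(input_list)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield from func(chunk)
```

Because it is a generator, chunks are only processed as the caller consumes results, and the last chunk may be shorter than chunk_size.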

class swh.vault.to_disk.DirectoryBuilder(storage, root, dir_id)

Bases: object

Reconstructs the on-disk representation of a directory in the storage.

__init__(storage, root, dir_id)

Initialize the directory builder.

Parameters:
  • storage – the storage object
  • root – the path where the directory should be reconstructed
  • dir_id – the identifier of the directory in the storage

build()

Perform the reconstruction of the directory in the given root.

_create_tree(directories)

Create a directory tree from the given paths.

The tree is rooted at root, and each directory listed in directories is created under it.
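A sketch of the tree creation using os.makedirs, assuming directories holds paths relative to root. create_tree is a hypothetical free-function version of _create_tree; the real builder may work with bytes paths from the storage, while str is used here for simplicity.

```python
import os
import tempfile
from typing import Iterable


def create_tree(root: str, directories: Iterable[str]) -> None:
    """Create root, then every path in directories relative to it."""
    os.makedirs(root, exist_ok=True)
    for rel_path in directories:
        # makedirs also creates missing parents; exist_ok tolerates repeats.
        os.makedirs(os.path.join(root, rel_path), exist_ok=True)


# Usage: build a small tree inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    create_tree(os.path.join(tmp, "dir"), ["src", "src/lib", "docs"])
    print(sorted(os.listdir(os.path.join(tmp, "dir"))))
```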

_create_files(files_data)

Create the files in the tree and fetch their contents.

_create_revisions(revs_data)

Create the revisions in the tree as broken symlinks to the target identifier.
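Because a revision entry (for example a Git submodule) has no file content of its own, it can be represented by a symlink whose target is the revision's hex identifier, a path that does not exist on disk. The sketch assumes each entry has hypothetical 'path' and 'target' keys, with target as a binary identifier, and it needs a platform that allows creating symlinks (e.g. Linux or macOS).

```python
import os
import tempfile
from typing import Dict, List


def create_revision_links(root: str, revs_data: List[Dict]) -> None:
    """Create one deliberately broken symlink per revision entry."""
    for entry in revs_data:
        link_path = os.path.join(root, entry["path"])  # hypothetical key
        target = entry["target"].hex()                 # hypothetical key, binary id
        # The target is an identifier, not a real path, so the link dangles.
        os.symlink(target, link_path)


with tempfile.TemporaryDirectory() as tmp:
    create_revision_links(tmp, [{"path": "submodule", "target": bytes(20)}])
    link = os.path.join(tmp, "submodule")
    print(os.path.islink(link), os.path.exists(link), os.readlink(link))
```

os.path.exists follows the link and therefore reports False, while os.path.islink still sees the entry, which is how such a placeholder can be told apart from a missing file.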

_create_file(path, content, mode=33188)

Create the given file and fill it with content. The default mode 33188 is octal 0o100644: a regular file with rw-r--r-- permissions.
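The sketch below shows where 33188 comes from and one way such a mode might be applied. Only permission bits can be set with os.chmod, so the file-type bits are masked off with stat.S_IMODE. create_file is a hypothetical free-function version, and how the real method handles other Git modes (such as symlinks, 0o120000) is not covered.

```python
import os
import stat
import tempfile

DEFAULT_MODE = 0o100644  # == 33188: regular file, permissions rw-r--r--
assert DEFAULT_MODE == stat.S_IFREG | 0o644 == 33188


def create_file(path: str, content: bytes, mode: int = DEFAULT_MODE) -> None:
    """Write content to path, then apply the permission bits of mode."""
    with open(path, "wb") as f:
        f.write(content)
    # S_IMODE keeps only the permission bits; S_IFREG cannot be chmod'ed.
    os.chmod(path, stat.S_IMODE(mode))


with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "run.sh")
    create_file(script, b"#!/bin/sh\necho hi\n", mode=0o100755)
    print(oct(stat.S_IMODE(os.stat(script).st_mode)))
```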

Module contents

swh.vault.get_vault(cls='remote', args={})

Get a vault object of class cls, constructed with the arguments in args.

Parameters:
  • cls (str) – the vault’s class, e.g. ‘remote’
  • args (dict) – dictionary of arguments for the vault class
Returns:

an instance of VaultBackend (either local or remote)

Raises:

ValueError if passed an unknown vault class.
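A sketch of the factory dispatch, assuming a registry that maps class names to constructors. RemoteVault and its url/timeout parameters are hypothetical placeholders, not the real client. The sketch uses None rather than the {} default shown in the signature, which avoids sharing one mutable dict between calls.

```python
from typing import Any, Dict, Optional, Type


class RemoteVault:
    """Hypothetical stand-in for a remote vault client."""

    def __init__(self, url: str, timeout: float = 10.0) -> None:
        self.url = url
        self.timeout = timeout


# Registry of known vault classes; only 'remote' is documented.
VAULT_CLASSES: Dict[str, Type[Any]] = {"remote": RemoteVault}


def get_vault(cls: str = "remote", args: Optional[Dict[str, Any]] = None) -> Any:
    # None instead of {} so calls never share one mutable default dict.
    if args is None:
        args = {}
    try:
        vault_class = VAULT_CLASSES[cls]
    except KeyError:
        raise ValueError("Unknown vault class `%s`" % cls) from None
    return vault_class(**args)
```

Usage: get_vault('remote', {'url': 'http://localhost:5005/'}) would return a RemoteVault pointing at that (hypothetical) URL, while any unregistered class name raises ValueError.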