swh.perfecthash package

Submodules

Module contents

class swh.perfecthash.ShardCreator(path: str, object_count: int)

Bases: object

Create a Shard.

The file at path will be truncated if it already exists.

object_count must match the number of objects that will be added using the write() method. A RuntimeError will be raised on finalize() in case of inconsistencies.

Ideally this should be done using a with statement, as such:

with ShardCreator("shard", len(objects)) as shard:
    for key, object in objects.items():
        shard.write(key, object)

Otherwise, prepare(), write() and finalize() must be called in sequence.
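When a with statement is not practical, the explicit sequence could look like the sketch below. The helper name write_shard_manually is hypothetical and not part of swh.perfecthash; it relies only on the prepare(), write() and finalize() methods documented here, and it does not attempt any cleanup if one of them raises.

```python
def write_shard_manually(creator, objects):
    """Drive a ShardCreator-like object without a with statement.

    `creator` is assumed to expose prepare(), write() and finalize() as
    documented for ShardCreator; `objects` maps keys to bytes payloads.
    """
    creator.prepare()                  # initialize the shard first
    for key, obj in objects.items():
        creator.write(key, obj)        # one call per key/object pair
    creator.finalize()                 # write the index and hash table
```

It would be called as write_shard_manually(ShardCreator("shard", len(objects)), objects).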

Parameters:
  • path – path to the Shard file or device that will be written.

  • object_count – number of objects that will be written to the Shard.

prepare() → None

Initialize the shard.

Raises:

RuntimeError – something went wrong while creating the Shard.

finalize() → None

Finalize the Shard.

Write the index and the perfect hash table that will be used to find the content of the objects from their key.

Raises:

RuntimeError – if the number of written objects does not match object_count, or if something went wrong while saving.

write(key: Key, object: bytes) → None

Add the key/object pair to the Read Shard.

Parameters:
  • key – the unique key associated with the object.

  • object – the content of the object, as bytes.

Raises:
  • ValueError – if the key length is wrong, or if object_count objects have already been written.

  • RuntimeError – if something goes wrong while writing the object.
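
Because write() rejects keys of the wrong length, it can help to validate keys before opening the shard. The sketch below assumes that keys are bytes-like and that the expected length is the value reported by Shard.key_len(), taken here to be a number of bytes; the helper name check_keys is hypothetical.

```python
def check_keys(keys, expected_len):
    """Return the keys whose length differs from expected_len.

    Assumption: keys are bytes-like objects, and expected_len is the value
    reported by Shard.key_len().
    """
    return [key for key in keys if len(key) != expected_len]
```

For example, check_keys(objects, Shard.key_len()) would list the offending keys before any write() call is made.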

class swh.perfecthash.Shard(path: str)

Bases: object

Files storing objects indexed with a perfect hash table.

This class opens an existing Read Shard and looks up the content of an object when given its key. Read Shards are created by adding key/object pairs with ShardCreator.

This class can act as a context manager, like so:

with Shard("shard") as shard:
    return shard.lookup(key)

Open an existing Read Shard.

Parameters:

path – path to an existing Read Shard file or device

close() → None

static key_len()

lookup(key: Key) → bytes

Fetch the object matching the key in the Read Shard.

Fetching an object is O(1): one lookup in the index to obtain the offset of the object in the Read Shard and one read to get the payload.
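
The two-step lookup can be pictured with the toy model below. It is not the on-disk Shard format: a plain dict stands in for the perfect hash index, and each object is assumed to be stored as an 8-byte big-endian size followed by its data. All names are hypothetical.

```python
import io
import struct

SIZE_FMT = ">Q"                      # assumed 8-byte big-endian size prefix
SIZE_LEN = struct.calcsize(SIZE_FMT)


def build_toy_shard(objects):
    """Pack objects into one buffer and return (buffer, key -> offset index)."""
    buf = io.BytesIO()
    index = {}
    for key, data in objects.items():
        index[key] = buf.tell()                      # remember where it starts
        buf.write(struct.pack(SIZE_FMT, len(data)))  # size prefix
        buf.write(data)                              # payload
    return buf, index


def toy_lookup(buf, index, key):
    """Look up the offset in the index, then read the object at that offset."""
    offset = index[key]              # step 1: key -> offset
    buf.seek(offset)
    (size,) = struct.unpack(SIZE_FMT, buf.read(SIZE_LEN))
    return buf.read(size)            # step 2: read the payload
```

The cost of toy_lookup does not depend on how many objects the buffer holds, which is the property the O(1) claim describes.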

Parameters:

key – the key associated with the object to retrieve.

Returns:

the object as bytes.

Raises:

static delete(path: str, key: Key)

Open the Shard file and delete the given key.

The object size and data will be overwritten by zeros. The Shard file size and offsets are not changed for safety.
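
The effect of overwriting in place can be illustrated with a toy model. This is not the real Shard layout: the sketch assumes each object is stored as an 8-byte big-endian size followed by its data, and toy_delete is a hypothetical name.

```python
import struct

SIZE_FMT = ">Q"                      # assumed 8-byte big-endian size prefix
SIZE_LEN = struct.calcsize(SIZE_FMT)


def toy_delete(buf, offset):
    """Zero the size field and data of the object stored at offset.

    `buf` is a bytearray; its length and all other offsets stay the same.
    """
    (size,) = struct.unpack_from(SIZE_FMT, buf, offset)
    end = offset + SIZE_LEN + size
    buf[offset:end] = bytes(end - offset)   # same-length overwrite with zeros
```

Since the replacement has exactly the length of the region it overwrites, objects stored after the deleted one remain at their original offsets.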

Parameters:
  • path – path to the Shard file.

  • key – the key associated with the object to delete.

Raises: