swh.core.utils module

swh.core.utils.cwd(path)

Contextually change the working directory to path to do thy bidding, then go back to the original location on exit.

swh.core.utils.grouper(iterable, n)

Collect data into fixed-size blocks of n elements. The last block may contain fewer elements, as it holds only the remaining ones.

The invariant is that the total number of elements across all blocks yielded by this function equals the number of elements in the input iterable.

If iterable is an iterable of bytes or strings that you need to join later, iter_chunks() is preferable, as it avoids the join by slicing directly.

Parameters:
  • iterable (Iterable) – an iterable

  • n (int) – size of block to slice the iterable into

Yields:

fixed-length blocks as iterables; as noted above, the last block may contain fewer elements.
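A minimal sketch of the blocking behaviour, assuming blocks are materialized as tuples via itertools.islice (the library may yield other iterable types). The name grouper_sketch is hypothetical. On Python 3.12+, itertools.batched behaves the same way.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def grouper_sketch(iterable: Iterable[T], n: int) -> Iterator[Tuple[T, ...]]:
    """Hypothetical stand-in for grouper(): yield blocks of n items."""
    if n < 1:
        raise ValueError("n must be at least 1")
    iterator = iter(iterable)
    while True:
        # Take up to n items; the final block may be shorter.
        block = tuple(islice(iterator, n))
        if not block:  # input exhausted
            return
        yield block
```

For example, `list(grouper_sketch(range(7), 3))` gives `[(0, 1, 2), (3, 4, 5), (6,)]`, and the block sizes 3 + 3 + 1 add up to the 7 input elements, matching the invariant stated above.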

swh.core.utils.iter_chunks(iterable: Iterable[TStr], chunk_size: int, *, remainder: bool = False) → Iterable[TStr]

Reads bytes objects (resp. str objects) from the iterable, and yields them as chunks of exactly chunk_size bytes (resp. characters).

iterable is typically obtained by repeatedly calling a method like io.RawIOBase.read(), which only guarantees an upper bound on the size of each result, whereas this function returns chunks of exactly chunk_size.

Parameters:
  • iterable – the input data

  • chunk_size – the exact size of chunks to return

  • remainder – if True and the total length of the data stream from iterable is not a multiple of chunk_size, a final chunk strictly smaller than chunk_size may be returned

swh.core.utils.backslashescape_errors(exception)

swh.core.utils.encode_with_unescape(value)

Encode a Unicode string containing \x<hex> backslash escapes.
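A minimal sketch of the unescaping direction, under an assumed convention: \xHH stands for the raw byte 0xHH, \\ stands for one literal backslash, any other backslash sequence is rejected, and everything else is encoded as UTF-8. The names encode_with_unescape_sketch and _is_hex_pair are hypothetical; the description above only mentions \x<hex> escapes.

```python
import string


def _is_hex_pair(s: str) -> bool:
    """True if s is exactly two hexadecimal digits."""
    return len(s) == 2 and all(c in string.hexdigits for c in s)


def encode_with_unescape_sketch(value: str) -> bytes:
    """Hypothetical stand-in: turn \\xHH escapes back into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(value):
        if value[i] != "\\":
            out += value[i].encode("utf-8")  # ordinary character
            i += 1
        elif value.startswith("\\\\", i):
            out += b"\\"  # escaped backslash -> one literal backslash byte
            i += 2
        elif value.startswith("\\x", i) and _is_hex_pair(value[i + 2 : i + 4]):
            out += bytes.fromhex(value[i + 2 : i + 4])  # \xHH -> raw byte
            i += 4
        else:
            raise ValueError(f"invalid escape at position {i} in {value!r}")
    return bytes(out)
```

Under this convention an escaped backslash followed by `x41` stays the four literal bytes `\x41`, so genuine backslashes in the original data are never mistaken for byte escapes.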

swh.core.utils.decode_with_escape(value)

Decode a bytestring as UTF-8, escaping the bytes of invalid UTF-8 sequences as \x<hex value>. NUL bytes are also escaped, as they are invalid in JSON strings.
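A minimal sketch of this decoding, assuming Python's built-in "backslashreplace" error handler (which supports decoding since Python 3.5 and emits \xHH per undecodable byte) as a stand-in for the module's own backslashescape_errors. Pre-escaping existing backslashes as \\ is also an assumption, made so the output can be unescaped unambiguously. The name decode_with_escape_sketch is hypothetical.

```python
def decode_with_escape_sketch(value: bytes) -> str:
    """Hypothetical stand-in for decode_with_escape()."""
    # Escape existing backslashes first so they cannot be confused with
    # the \xHH escapes introduced below.
    value = value.replace(b"\\", b"\\\\")
    # NUL is valid UTF-8, so it has to be escaped explicitly.
    value = value.replace(b"\x00", b"\\x00")
    # Each byte of an invalid UTF-8 sequence becomes \xHH.
    return value.decode("utf-8", errors="backslashreplace")
```

For example, `b"a\xffb"` decodes to the five characters `a\xffb`, while valid UTF-8 such as `b"caf\xc3\xa9"` decodes normally to `café`.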

swh.core.utils.commonname(path0, path1, as_str=False)

Compute the commonname between path0 and path1.

swh.core.utils.numfile_sortkey(fname: str) → Tuple[int, str]

Simple sort key for filenames of the form:

nnxxx.ext

where nn is a number; such files are ordered by that number.

Returns a tuple (order, remaining), where ‘order’ is the numeric (int) value extracted from the file name, and ‘remaining’ is the remaining part of the file name.

Typically used to sort sql/nn-swh-xxx.sql files.

For file names that do not match this pattern, the order value is 999999.
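A minimal sketch of such a sort key, assuming the leading digits are matched with a regular expression. The name numfile_sortkey_sketch is hypothetical, and returning the whole name as the remaining part for unmatched names is an assumption the description does not specify. Plain string sorting would put "10-b.sql" before "2-a.sql"; keying on the integer avoids that.

```python
import re
from typing import Tuple

UNMATCHED_ORDER = 999999  # order value for names without a numeric prefix


def numfile_sortkey_sketch(fname: str) -> Tuple[int, str]:
    """Hypothetical stand-in for numfile_sortkey()."""
    match = re.match(r"(\d+)(.*)", fname, re.DOTALL)
    if match is None:
        # Assumption: unmatched names keep the whole name as the remainder.
        return (UNMATCHED_ORDER, fname)
    return (int(match.group(1)), match.group(2))
```

For example, `numfile_sortkey_sketch("30-swh-schema.sql")` returns `(30, "-swh-schema.sql")`.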

swh.core.utils.basename_sortkey(fname: str) → Tuple[int, str]

Like numfile_sortkey(), but applied to the basename of fname, ignoring any leading directory components.
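A minimal sketch, assuming the basename variant simply strips directories with os.path.basename before applying the numeric-prefix key. Both _numfile_key and basename_sortkey_sketch are hypothetical names, and the unmatched-name handling carries the same assumption as a plain numeric-prefix key would.

```python
import os
import re
from typing import Tuple


def _numfile_key(fname: str) -> Tuple[int, str]:
    """Hypothetical numeric-prefix key (999999 for unmatched names)."""
    match = re.match(r"(\d+)(.*)", fname, re.DOTALL)
    if match is None:
        return (999999, fname)  # assumption: whole name kept as remainder
    return (int(match.group(1)), match.group(2))


def basename_sortkey_sketch(fname: str) -> Tuple[int, str]:
    """Hypothetical stand-in for basename_sortkey()."""
    # Drop directory components so "sql/10-x.sql" is keyed like "10-x.sql".
    return _numfile_key(os.path.basename(fname))
```

This lets paths such as sql/nn-swh-xxx.sql be sorted directly, without stripping the directory first.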