swh.core.api.serializers module

swh.core.api.serializers.encode_datetime(dt: datetime) -> str

Wrapper of datetime.datetime.isoformat() that forbids naive datetimes.
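The behaviour described above can be sketched as follows. This is a minimal illustration rather than the library's own code: the name `encode_datetime_sketch` is hypothetical, and raising `TypeError` for naive values is an assumption, since the docstring only states that naive datetimes are forbidden.

```python
from datetime import datetime, timezone


def encode_datetime_sketch(dt: datetime) -> str:
    """Hypothetical stand-in for encode_datetime (exception type is assumed)."""
    # A datetime is naive when it has no tzinfo or its tzinfo yields no offset.
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise TypeError("naive datetimes cannot be serialized")
    return dt.isoformat()


# A timezone-aware value is passed through to isoformat().
aware = datetime(2020, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
encoded = encode_datetime_sketch(aware)
```

Rejecting naive values at encoding time avoids ambiguity about which timezone a serialized timestamp refers to.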

swh.core.api.serializers.exception_to_dict(exception: BaseException) -> Dict[str, Any]
swh.core.api.serializers.dict_to_exception(exc_dict: Dict[str, Any]) -> Exception
swh.core.api.serializers.encode_timedelta(td: timedelta) -> Dict[str, int]
swh.core.api.serializers.get_encoders(extra_encoders: List[Tuple[Type, str, Callable]] | None, with_json: bool = False) -> List[Tuple[Type, str, Callable]]
swh.core.api.serializers.get_decoders(extra_decoders: Dict[str, Callable] | None, with_json: bool = False) -> Dict[str, Callable]
class swh.core.api.serializers.MsgpackExtTypeCodes(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

swh.core.api.serializers.encode_data_client(data: Any, extra_encoders=None) -> bytes
swh.core.api.serializers.decode_response(response: Response, extra_decoders=None) -> Any
class swh.core.api.serializers.SWHJSONEncoder(extra_encoders=None, **kwargs)

Bases: JSONEncoder

JSON encoder for data structures generated by Software Heritage.

This JSON encoder extends the default Python JSON encoder and adds awareness for the following specific types:

  • bytes (get encoded as a Base85 string);

  • datetime.datetime (get encoded as an ISO8601 string).

Non-standard types get encoded as a dictionary with two keys:

  • swhtype with value ‘bytes’ or ‘datetime’;

  • d containing the encoded value.

SWHJSONEncoder also encodes arbitrary iterables as a list (allowing serialization of generators).

Caveats: Limitations in the JSONEncoder extension mechanism prevent us from “escaping” dictionaries that only contain the swhtype and d keys, and therefore arbitrary data structures can’t be round-tripped through SWHJSONEncoder and SWHJSONDecoder.
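The wire format and the round-trip caveat can be illustrated with the standard library alone. The sketch below does not use the real classes: `SketchEncoder` and `sketch_hook` are hypothetical names that only mimic the two-key wrapping described above.

```python
import base64
import json
from datetime import datetime


class SketchEncoder(json.JSONEncoder):
    """Hypothetical encoder mimicking the documented two-key wrapping."""

    def default(self, o):
        if isinstance(o, bytes):
            return {"swhtype": "bytes", "d": base64.b85encode(o).decode("ascii")}
        if isinstance(o, datetime):
            return {"swhtype": "datetime", "d": o.isoformat()}
        # Anything else is left to the base class, which raises TypeError.
        return super().default(o)


def sketch_hook(obj):
    """Hypothetical decoder hook that reverses the wrapping."""
    if set(obj) == {"swhtype", "d"}:
        if obj["swhtype"] == "bytes":
            return base64.b85decode(obj["d"])
        if obj["swhtype"] == "datetime":
            return datetime.fromisoformat(obj["d"])
    return obj


# A genuine bytes value round-trips as expected.
wire = json.dumps({"blob": b"\x00\xff"}, cls=SketchEncoder)
restored = json.loads(wire, object_hook=sketch_hook)

# The caveat: default() is never called for plain dicts, so a dict that
# merely looks like a wrapper cannot be escaped and comes back as bytes.
lookalike = {"swhtype": "bytes", "d": base64.b85encode(b"hi").decode("ascii")}
ambiguous = json.loads(json.dumps(lookalike, cls=SketchEncoder), object_hook=sketch_hook)
```

Here `ambiguous` is a `bytes` object even though the input was a `dict`, which is the loss of information the caveat describes.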

Constructor for JSONEncoder, with sensible defaults.

If skipkeys is false, then it is a TypeError to attempt encoding of keys that are not str, int, float or None. If skipkeys is True, such items are simply skipped.

If ensure_ascii is true, the output is guaranteed to be str objects with all incoming non-ASCII characters escaped. If ensure_ascii is false, the output can contain non-ASCII characters.

If check_circular is true, then lists, dicts, and custom encoded objects will be checked for circular references during encoding to prevent an infinite recursion (which would cause a RecursionError). Otherwise, no such check takes place.

If allow_nan is true, then NaN, Infinity, and -Infinity will be encoded as such. This behavior is not JSON specification compliant, but is consistent with most JavaScript based encoders and decoders. Otherwise, it will be a ValueError to encode such floats.

If sort_keys is true, then the output of dictionaries will be sorted by key; this is useful for regression tests to ensure that JSON serializations can be compared on a day-to-day basis.

If indent is a non-negative integer, then JSON array elements and object members will be pretty-printed with that indent level. An indent level of 0 will only insert newlines. None is the most compact representation.

If specified, separators should be an (item_separator, key_separator) tuple. The default is (', ', ': ') if indent is None and (',', ': ') otherwise. To get the most compact JSON representation, you should specify (',', ':') to eliminate whitespace.
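The effect of sort_keys, indent and separators can be seen with the standard library's `json.dumps`, which accepts the same keyword arguments as the `JSONEncoder` constructor. That SWHJSONEncoder forwards these options unchanged through `**kwargs` is an assumption based on its signature.

```python
import json

data = {"b": [1, 2], "a": None}

# With indent=None the default separators are (', ', ': ').
default_text = json.dumps(data, sort_keys=True)

# Most compact form: no whitespace after ',' or ':'.
compact_text = json.dumps(data, sort_keys=True, separators=(",", ":"))

# indent=0 inserts newlines only, with no leading spaces on any line.
newline_text = json.dumps(data, sort_keys=True, indent=0)
```

Sorting keys makes the output deterministic, which is what allows serializations to be compared byte for byte in regression tests.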

If specified, default is a function that gets called for objects that can’t otherwise be serialized. It should return a JSON encodable version of the object or raise a TypeError.

default(o: Any) -> Dict[str, Dict[str, int] | str] | list

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        # Not iterable: fall through to the base class below
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
class swh.core.api.serializers.SWHJSONDecoder(extra_decoders=None, **kwargs)

Bases: JSONDecoder

JSON decoder for data structures encoded with SWHJSONEncoder.

This JSON decoder extends the default Python JSON decoder, allowing the decoding of:

  • bytes (encoded as a Base85 string);

  • datetime.datetime (encoded as an ISO8601 string).

Non-standard types must be encoded as a dictionary with exactly two keys:

  • swhtype with value ‘bytes’ or ‘datetime’;

  • d containing the encoded value.

To limit the impact of our encoding, if the swhtype key doesn’t contain a known value, the dictionary is decoded as-is.
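The pass-through rule for unknown swhtype values can be sketched with a plain `object_hook`. The function name `sketch_object_hook` is hypothetical and handles only the bytes case; it illustrates the rule and is not the decoder's actual implementation.

```python
import base64
import json


def sketch_object_hook(obj):
    """Hypothetical hook: unwrap known wrappers, return everything else as-is."""
    if set(obj) == {"swhtype", "d"} and obj["swhtype"] == "bytes":
        return base64.b85decode(obj["d"])
    return obj


# A wrapper with a known swhtype is decoded to the original value.
payload = json.dumps({"swhtype": "bytes", "d": base64.b85encode(b"hi").decode("ascii")})
known = json.loads(payload, object_hook=sketch_object_hook)

# A dictionary with an unrecognized swhtype is left untouched.
unknown = json.loads('{"swhtype": "other", "d": "x"}', object_hook=sketch_object_hook)
```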

object_hook, if specified, will be called with the result of every JSON object decoded and its return value will be used in place of the given dict. This can be used to provide custom deserializations (e.g. to support JSON-RPC class hinting).

object_pairs_hook, if specified, will be called with the result of every JSON object decoded with an ordered list of pairs. The return value of object_pairs_hook will be used instead of the dict. This feature can be used to implement custom decoders. If object_hook is also defined, the object_pairs_hook takes priority.

parse_float, if specified, will be called with the string of every JSON float to be decoded. By default this is equivalent to float(num_str). This can be used to use another datatype or parser for JSON floats (e.g. decimal.Decimal).

parse_int, if specified, will be called with the string of every JSON int to be decoded. By default this is equivalent to int(num_str). This can be used to use another datatype or parser for JSON integers (e.g. float).

parse_constant, if specified, will be called with one of the following strings: -Infinity, Infinity, NaN. This can be used to raise an exception if invalid JSON numbers are encountered.

If strict is false (true is the default), then control characters will be allowed inside strings. Control characters in this context are those with character codes in the 0-31 range, including '\t' (tab), '\n', '\r' and '\0'.
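A few of these decoder options are shown below using the standard library's `json.loads`, which passes them on to `JSONDecoder`. The helper `reject_constant` is a hypothetical name introduced for the example.

```python
import json
from decimal import Decimal

# parse_float receives the raw text of every JSON float, so Decimal
# can keep the digits exactly as written.
prices = json.loads('{"price": 1.10}', parse_float=Decimal)


def reject_constant(name):
    """Hypothetical helper: refuse NaN, Infinity and -Infinity."""
    raise ValueError(f"invalid JSON number: {name}")


try:
    json.loads("[NaN]", parse_constant=reject_constant)
    rejected = False
except ValueError:
    rejected = True

# strict=False accepts a literal tab character inside a JSON string.
with_tab = json.loads('"a\tb"', strict=False)
```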

decode_data(o: Any) -> Any
raw_decode(s: str, idx: int = 0) -> Tuple[Any, int]

Decode a JSON document from s (a str beginning with a JSON document) and return a 2-tuple of the Python representation and the index in s where the document ended.

This can be used to decode a JSON document from a string that may have extraneous data at the end.
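The example below shows the returned 2-tuple using the standard library's `json.JSONDecoder`. That SWHJSONDecoder reports the end index in the same way is an assumption based on its signature.

```python
import json

decoder = json.JSONDecoder()
text = '{"id": 7} trailing text'

# raw_decode stops at the end of the first JSON document and
# reports the index where it stopped.
value, end = decoder.raw_decode(text)
rest = text[end:]
```

Unlike `decode`, `raw_decode` does not skip leading whitespace, so the document must begin exactly at `idx`.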

swh.core.api.serializers.json_dumps(data: Any, extra_encoders=None) -> str
swh.core.api.serializers.json_loads(data: str, extra_decoders=None) -> Any
swh.core.api.serializers.msgpack_dumps(data: Any, extra_encoders=None) -> bytes

Write data as a msgpack stream.

swh.core.api.serializers.msgpack_loads(data: bytes, extra_decoders=None) -> Any

Read data as a msgpack stream.


This function is used by swh.journal to decode the contents of the journal, so it must be kept backwards-compatible.