swh.dataset.exporter module

class swh.dataset.exporter.Exporter(config: Dict[str, Any], export_path, *args: Any, **kwargs: Any)

Bases: object

Base class for all the exporters.

An export can use several exporters at once: the journal is read a single time, and each object read is then exported in every requested format without having to be re-read.

Subclass this class to implement the behavior for a specific export format. You have to override process_object() so that it writes to the appropriate export files.

You can also put setup and teardown logic in __enter__ and __exit__; they will be called automatically.
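The subclassing pattern described above can be sketched as follows. This is a minimal illustration, not the library's own code: the `Exporter` class here is a stand-in that only mirrors the documented constructor and method signatures so the snippet runs without swh.dataset installed, and `JSONLinesExporter`, the `objects.jsonl` file name, and the record layout are all hypothetical. Real SWH objects may contain values (such as bytes) that need converting before JSON serialization.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


class Exporter:
    """Stand-in for swh.dataset.exporter.Exporter (assumption, not the real class)."""

    def __init__(self, config: Dict[str, Any], export_path, *args: Any, **kwargs: Any) -> None:
        self.config = config
        self.export_path = Path(export_path)

    def __enter__(self) -> "Exporter":
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        pass

    def process_object(self, object_type: str, obj: Dict[str, Any]) -> None:
        raise NotImplementedError


class JSONLinesExporter(Exporter):
    """Hypothetical exporter that writes each object as one JSON line."""

    def __enter__(self) -> "JSONLinesExporter":
        # Setup: create the export directory and open the output file.
        self.export_path.mkdir(parents=True, exist_ok=True)
        self._file = open(self.export_path / "objects.jsonl", "w", encoding="utf-8")
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # Teardown: close the file, even if processing raised an exception.
        self._file.close()

    def process_object(self, object_type: str, obj: Dict[str, Any]) -> None:
        # Keep the object type next to the object so readers can filter on it.
        record = {"object_type": object_type, "object": obj}
        self._file.write(json.dumps(record, sort_keys=True) + "\n")


def demo() -> List[str]:
    """Export a single made-up origin and return the lines that were written."""
    with tempfile.TemporaryDirectory() as tmp:
        with JSONLinesExporter({}, tmp) as exporter:
            exporter.process_object("origin", {"url": "https://example.org/repo"})
        return (Path(tmp) / "objects.jsonl").read_text(encoding="utf-8").splitlines()
```

In real code the stand-in would be replaced by an import of `Exporter` from swh.dataset.exporter; the `with` block is what triggers the setup and teardown hooks.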

process_object(object_type: str, obj: Dict[str, Any]) -> None

Process a SWH object to export.

Override this method in your custom exporter.

get_unique_file_id() -> str

Return a unique random file id for the current process.

If config['test_unique_file_id'] is set, it will be used instead.
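One way the documented behavior could work is sketched below. This is an assumption-laden illustration rather than the library's implementation: `FileIdSource` is a hypothetical class, and both the use of `uuid.uuid4()` and the caching of the generated value are guesses at what "unique random file id for the current process" means.

```python
import uuid
from typing import Any, Dict, Optional


class FileIdSource:
    """Hypothetical holder mimicking the documented get_unique_file_id() behavior."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config
        self._file_id: Optional[str] = None

    def get_unique_file_id(self) -> str:
        if self._file_id is None:
            # A configured test value takes precedence, giving stable file names in tests.
            override = self.config.get("test_unique_file_id")
            if override is not None:
                self._file_id = str(override)
            else:
                # Assumption: a random UUID serves as the identifier.
                self._file_id = str(uuid.uuid4())
        # The value is cached, so repeated calls return the same id.
        return self._file_id


fixed_id = FileIdSource({"test_unique_file_id": "test-id"}).get_unique_file_id()
random_source = FileIdSource({})
random_id = random_source.get_unique_file_id()
```

Such an id is typically useful for building per-process output file names, so that parallel workers do not overwrite each other's files.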

class swh.dataset.exporter.ExporterDispatch(config: Dict[str, Any], export_path, *args: Any, **kwargs: Any)

Bases: Exporter

Like Exporter, but dispatches each object type to a different method. For example, you can override process_origin(self, object) to process origins.

process_object(object_type: str, obj: Dict[str, Any]) -> None

Process a SWH object to export.

Override this with your custom exporter.
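The per-type dispatch can be illustrated with the sketch below. The `ExporterDispatch` class shown here is a stand-in written for this example, not the real class: looking up a `process_<object_type>` method with `getattr` and silently skipping object types without a handler are both assumptions. `OriginURLCollector` is a hypothetical subclass.

```python
from typing import Any, Dict, List


class ExporterDispatch:
    """Stand-in for the documented class (assumption): routes objects by type."""

    def __init__(self, config: Dict[str, Any], export_path, *args: Any, **kwargs: Any) -> None:
        self.config = config
        self.export_path = export_path

    def process_object(self, object_type: str, obj: Dict[str, Any]) -> None:
        # Look up a method named process_<object_type>, e.g. process_origin.
        handler = getattr(self, "process_" + object_type, None)
        if handler is not None:
            handler(obj)
        # Assumption: object types without a handler are ignored.


class OriginURLCollector(ExporterDispatch):
    """Hypothetical exporter that only cares about origins."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        self.urls: List[str] = []

    def process_origin(self, origin: Dict[str, Any]) -> None:
        self.urls.append(origin["url"])


# The export path is never touched by this in-memory example.
collector = OriginURLCollector({}, "/tmp/export")
collector.process_object("origin", {"url": "https://example.org/repo"})
collector.process_object("revision", {"id": "abc"})  # no process_revision: skipped here
```

With this pattern a subclass only implements handlers for the object types it exports and leaves process_object() itself untouched.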