swh.dataset.exporters.orc module
- class swh.dataset.exporters.orc.SWHTimestampConverter
Bases: object
This is an ORCConverter-compatible class that converts timestamps to and from ORC files.
Timestamps in Python are given as a pair (seconds, microseconds) and are serialized as a pair (seconds, nanoseconds) in the ORC file.
It is reimplemented because we do not want the Python objects stored as ORC timestamps to be Python datetime objects, since swh.model’s Timestamp cannot be converted to a Python datetime object without loss.
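The unit conversion described above can be illustrated with a minimal, standalone sketch. It does not reproduce the actual SWHTimestampConverter code and does not depend on any ORC library; the function names to_orc and from_orc, their signatures, and the handling of None are assumptions made for illustration only.

```python
from typing import Optional, Tuple

# Hypothetical helpers: names and signatures are illustrative assumptions,
# not the real SWHTimestampConverter interface.


def to_orc(ts: Optional[Tuple[int, int]]) -> Optional[Tuple[int, int]]:
    """Convert (seconds, microseconds) to (seconds, nanoseconds)."""
    if ts is None:
        # Assumption: a missing timestamp is passed through unchanged.
        return None
    seconds, microseconds = ts
    return (seconds, microseconds * 1000)


def from_orc(seconds: int, nanoseconds: int) -> Tuple[int, int]:
    """Convert (seconds, nanoseconds) back to (seconds, microseconds)."""
    # Integer division drops any sub-microsecond remainder.
    return (seconds, nanoseconds // 1000)
```

Because both directions work on plain integer pairs, a value written with to_orc and read back with from_orc is unchanged, with no intermediate datetime object involved.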
- class swh.dataset.exporters.orc.ORCExporter(*args, **kwargs)
Bases: ExporterDispatch
Implementation of an exporter which writes the entire graph dataset as ORC files. Useful for large-scale processing, notably on cloud platforms (e.g. BigQuery, Amazon Athena, Azure).