swh.dataset.athena module
This module implements the “athena” subcommands for the CLI. It can install and query a remote AWS Athena database.
- swh.dataset.athena.query(client, query_string, *, desc='Querying', delay_secs=0.5, silent=False)
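The `delay_secs` parameter suggests that the function polls Athena until the query finishes. The following is a minimal sketch of that pattern, not the module's implementation. `run_query_sketch` and its `output_location` parameter are hypothetical names; the client is assumed to behave like a boto3 Athena client, using its `start_query_execution`, `get_query_execution` and `get_query_results` calls.

```python
import time


def run_query_sketch(client, query_string, *, output_location, delay_secs=0.5):
    """Submit a query and poll until it reaches a terminal state.

    Hypothetical helper: `client` is assumed to expose the boto3 Athena
    client methods used below; `output_location` is an assumed S3 URI
    where Athena writes its result files.
    """
    started = client.start_query_execution(
        QueryString=query_string,
        ResultConfiguration={"OutputLocation": output_location},
    )
    execution_id = started["QueryExecutionId"]

    while True:
        info = client.get_query_execution(QueryExecutionId=execution_id)
        state = info["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            return client.get_query_results(QueryExecutionId=execution_id)
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"query {execution_id} ended in state {state}")
        # Still QUEUED or RUNNING: wait before asking again.
        time.sleep(delay_secs)
```

A short delay keeps the number of status requests low while the query is queued or running; the value 0.5 mirrors the default shown in the signature above.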
- swh.dataset.athena.create_tables(database_name, dataset_location, output_location=None, replace=False)
Create the Software Heritage Dataset tables on AWS Athena.
Athena works on external columnar data stored in S3, but requires a schema for each table to run queries. This creates all the necessary tables remotely by using the relational schemas in swh.dataset.relational.
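To illustrate what "a schema for each table" means in practice, here is a minimal sketch that builds an Athena `CREATE EXTERNAL TABLE` statement from a list of columns. It is not taken from the module: `create_table_statement` is a hypothetical helper, the `(name, type)` column format is assumed rather than the actual layout of swh.dataset.relational, and both the ORC storage format and the one-subdirectory-per-table layout are assumptions.

```python
def create_table_statement(database_name, table_name, columns, dataset_location):
    """Build the DDL for one external table.

    Hypothetical helper. `columns` is an assumed list of
    (column_name, athena_type) pairs.
    """
    column_defs = ",\n".join(
        f"    `{name}` {sql_type}" for name, sql_type in columns
    )
    # Assumption: each table's files live under <dataset_location>/<table_name>/.
    location = f"{dataset_location.rstrip('/')}/{table_name}/"
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database_name}.{table_name} (\n"
        f"{column_defs}\n"
        f")\n"
        f"STORED AS ORC\n"  # assumption: the columnar files are ORC
        f"LOCATION '{location}'"
    )


# Example with made-up names: a two-column table in a database called "swh".
print(
    create_table_statement(
        "swh",
        "origin",
        [("id", "string"), ("url", "string")],
        "s3://example-bucket/graph/orc",
    )
)
```

Because the tables are external, a statement like this only registers the schema and the S3 location with Athena; it does not copy or modify the data.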
- swh.dataset.athena.human_size(n, units=['bytes', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB'])
Return a human-readable string representation of a size in bytes.
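As an illustration, the sketch below shows one way such a conversion can work with the binary units listed in the signature. `human_size_sketch` is a hypothetical stand-in, not the module's code, and it assumes integer division by 1024 at each step (so fractions are truncated); the module's actual rounding behavior is not stated here.

```python
def human_size_sketch(n, units=("bytes", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB")):
    """Format a byte count using the largest unit that keeps the value >= 1.

    Hypothetical re-implementation: truncates with integer division
    rather than rounding.
    """
    for unit in units[:-1]:
        if n < 1024:
            return f"{n} {unit}"
        n //= 1024  # move up one unit, dropping the remainder
    # Anything that large is reported in the last available unit.
    return f"{n} {units[-1]}"


print(human_size_sketch(512))           # 512 bytes
print(human_size_sketch(2048))          # 2 KiB
print(human_size_sketch(5 * 1024**3))   # 5 GiB
```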