swh.scheduler.cli.origin_utils module
Defines the utility functions behind the <swh scheduler origin send-origins-from-file-to-celery> CLI command. The command reads a list of origins from standard input or from a file, turns them into scheduler tasks, and sends those tasks directly to a celery queue chosen according to the specified task type.
The list of origins is expected to have been extracted by other means (e.g. a sentry extract, a combination of various shell scripts, …). A human operator then provides the list to the CLI so that it is consumed by the standard swh queues (that is, the backend configured for the scheduler).
- swh.scheduler.cli.origin_utils.get_scheduler_task_type(scheduler: SchedulerInterface, task_type_name: str) → TaskType
Retrieve a TaskType instance for a task type name from the scheduler.
- Parameters:
scheduler – Scheduler instance to lookup data from
task_type_name – The task type name to lookup
- Raises:
ValueError – when neither task_type_name nor its fallback is found.
- Returns:
Information about the task type
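The lookup-with-fallback behaviour described above can be sketched as follows. This is only an illustration under stated assumptions, not the module's implementation: FakeScheduler, lookup_task_type and the fallback_name parameter are hypothetical names, and the page does not say how the real fallback name is derived. The sketch also assumes the scheduler exposes a get_task_type method that returns None for an unknown name.

```python
from typing import Dict, Optional


class FakeScheduler:
    """Hypothetical in-memory stand-in, not the real SchedulerInterface."""

    def __init__(self, task_types: Dict[str, dict]) -> None:
        self._task_types = task_types

    def get_task_type(self, task_type_name: str) -> Optional[dict]:
        # Assumption: an unknown name yields None rather than raising.
        return self._task_types.get(task_type_name)


def lookup_task_type(
    scheduler: FakeScheduler,
    task_type_name: str,
    fallback_name: Optional[str] = None,
) -> dict:
    """Return the task type for a name, trying a fallback before giving up."""
    task_type = scheduler.get_task_type(task_type_name)
    if task_type is None and fallback_name is not None:
        # Assumption: the fallback is passed in explicitly here, because the
        # documentation does not state how the real fallback is computed.
        task_type = scheduler.get_task_type(fallback_name)
    if task_type is None:
        raise ValueError(f"Unknown task type {task_type_name!r}")
    return task_type


scheduler = FakeScheduler({"load-git": {"type": "load-git"}})
print(lookup_task_type(scheduler, "load-git"))
```

Raising ValueError only after both lookups fail mirrors the documented contract: the caller gets a usable task type or an explicit error, never None.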
- swh.scheduler.cli.origin_utils.lines_to_task_args(lines: Iterable[str], columns: List[str] = ['url'], postprocess: Callable[[Dict[str, Any]], Dict[str, Any]] | None = None, **kwargs) → Iterator[Dict[str, Any]]
Iterate over the lines and convert them into celery tasks ready to be sent.
- Parameters:
lines – Lines read from a file or stdin
columns – structure of the lines to be read (usually only the url column)
postprocess – An optional callable that takes a task's arguments as a dict and returns the enriched dict
**kwargs – extra static arguments to enrich the task with
- Yields:
task ready to be sent to celery
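To make the data flow concrete, here is a minimal sketch of how lines might be turned into task arguments. It is written from the parameter descriptions above and is not the module's code: sketch_lines_to_task_args and add_visit_type are hypothetical names, and the whitespace-separated line format, the skipping of blank lines and the {"args": [], "kwargs": ...} shape of each yielded item are assumptions.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional


def sketch_lines_to_task_args(
    lines: Iterable[str],
    columns: Optional[List[str]] = None,
    postprocess: Optional[Callable[[Dict[str, Any]], Dict[str, Any]]] = None,
    **kwargs: Any,
) -> Iterator[Dict[str, Any]]:
    """Yield one task-argument dict per non-empty input line."""
    columns = ["url"] if columns is None else columns
    for line in lines:
        # Assumption: values on a line are separated by whitespace.
        values = line.strip().split()
        if not values:
            continue  # Assumption: blank lines are skipped.
        # Map each value to its column name, then add the static extras.
        task_kwargs: Dict[str, Any] = dict(zip(columns, values))
        task_kwargs.update(kwargs)
        if postprocess is not None:
            task_kwargs = postprocess(task_kwargs)
        # Assumption: a task is represented by positional and keyword arguments.
        yield {"args": [], "kwargs": task_kwargs}


def add_visit_type(task_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical postprocess callable adding one derived field."""
    return {**task_kwargs, "visit_type": "git"}


tasks = list(
    sketch_lines_to_task_args(
        ["https://example.org/repo.git\n", "\n"],
        postprocess=add_visit_type,
        priority="low",
    )
)
print(tasks)
```

In this sketch the static **kwargs are merged before postprocess runs, so the callable sees the full set of arguments and can override any of them.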