swh.scheduler.cli.origin_utils module

Defines utility functions for the <swh scheduler origin send-origins-from-file-to-celery> CLI command. The command reads a list of origins from standard input or from a file, converts them into scheduler tasks, and sends those tasks directly to a Celery queue chosen according to the specified task type.

The list of origins is extracted by other means (e.g. a Sentry extract, a combination of various shell scripts, …). A human operator then provides the list to the CLI so that it is consumed by the standard swh queues (that is, the backend configured for the scheduler).

swh.scheduler.cli.origin_utils.get_scheduler_task_info(scheduler: SchedulerInterface, task_type: str) → Dict

Retrieve information on task_type from the scheduler.

Parameters:
  • scheduler – Scheduler instance to look up data from

  • task_type – The task type to look up

Raises:

ValueError when neither task_type nor its fallback is found.

Returns:

Dict of information for the task type

swh.scheduler.cli.origin_utils.lines_to_task_args(lines: Iterable[str], columns: List[str] = ['url'], postprocess: Callable[[Dict[str, Any]], Dict[str, Any]] | None = None, **kwargs) → Iterator[Dict[str, Any]]

Iterate over the lines and convert them into celery tasks ready to be sent.

Parameters:
  • lines – Lines read from a file or stdin

  • columns – Structure of the lines to be read (usually only the url column)

  • postprocess – An optional callable that takes the task arguments as a dict and returns an enriched dict

  • **kwargs – Extra static arguments to enrich the task with

Yields:

task ready to be sent to celery
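
The conversion described above can be illustrated with a minimal sketch. This is not the module's implementation: the function name sketch_lines_to_task_args, the whitespace-separated column format, the skipping of blank lines, and the {"args": [], "kwargs": ...} shape of the yielded value are all assumptions made for illustration, as is the add_visit_type helper.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional


def sketch_lines_to_task_args(
    lines: Iterable[str],
    columns: Optional[List[str]] = None,
    postprocess: Optional[Callable[[Dict[str, Any]], Dict[str, Any]]] = None,
    **kwargs: Any,
) -> Iterator[Dict[str, Any]]:
    """Hypothetical stand-in mirroring the documented signature."""
    columns = columns or ["url"]
    for line in lines:
        # Assumption: columns are separated by whitespace.
        values = line.strip().split()
        if not values:
            continue  # assumption: blank lines are skipped
        # Map each column name to the value found at the same position.
        task_kwargs: Dict[str, Any] = dict(zip(columns, values))
        # Static extra arguments apply to every task.
        task_kwargs.update(kwargs)
        if postprocess is not None:
            task_kwargs = postprocess(task_kwargs)
        # Assumption: a task is described by positional and keyword arguments.
        yield {"args": [], "kwargs": task_kwargs}


def add_visit_type(task_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical postprocess callable adding one derived field."""
    return {**task_kwargs, "visit_type": "git"}


tasks = list(
    sketch_lines_to_task_args(
        ["https://example.org/repo.git\n", "\n"],
        postprocess=add_visit_type,
        debug=True,
    )
)
```

In this sketch the postprocess callable runs after the static **kwargs are merged, so it can see and override them; whether the real function applies them in that order is not stated in the documentation above.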