swh.storage.proxies.blocking.db module

class swh.storage.proxies.blocking.db.BlockingState(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

Value recording “how much” a URL associated with a blocking request is blocked


The origin url can be ingested/updated


Ingestion from origin url is temporarily blocked until the request is reviewed


Ingestion from origin url is permanently blocked

class swh.storage.proxies.blocking.db.BlockingStatus(state: BlockingState, request: UUID)

Bases: object

Return value when checking whether ingestion of an origin url is blocked

Method generated by attrs for class BlockingStatus.

class swh.storage.proxies.blocking.db.BlockingRequest(id: UUID, slug: str, date: datetime, reason: str)

Bases: object

A request for blocking a set of origins from being ingested

Method generated by attrs for class BlockingRequest.

id: UUID

Unique id for the request (will be returned to requesting clients)

slug: str

Unique, human-readable id for the request (for administrative interactions)

date: datetime

Date the request was received

reason: str

Why the request was made

class swh.storage.proxies.blocking.db.RequestHistory(request: UUID, date: datetime, message: str)

Bases: object

Method generated by attrs for class RequestHistory.

request: UUID

Id of the blocking request

date: datetime

Date the history entry was added

message: str

Free-form history information (e.g. “policy decision made”)

class swh.storage.proxies.blocking.db.BlockingLogEntry(url: str, url_match: str, request: UUID, date: datetime, state: BlockingState)

Bases: object

Method generated by attrs for class BlockingLogEntry.

url: str

Origin url that has been blocked

url_match: str

Url matching pattern that caused the origin url to be blocked

request: UUID

Id of the blocking request

date: datetime

Date the blocking event occurred

state: BlockingState

Blocking state responsible for the blocking event

class swh.storage.proxies.blocking.db.BlockedOrigin(request_slug: str, url_pattern: str, state: BlockingState)

Bases: object

Method generated by attrs for class BlockedOrigin.

class swh.storage.proxies.blocking.db.BlockingDb(*args, **kwargs)

Bases: BaseDb

Create a DB proxy

Parameters:
  • conn – psycopg2 connection to the SWH DB

  • pool – psycopg2 pool of connections

current_version = 1

swh.storage.proxies.blocking.db.get_urls_to_check(url: str) → Tuple[List[str], List[str]]

Get the entries to check in the database for the given url, in order.

Exact matching is done on the following strings, in order:
  • the url with any trailing slashes removed (the so-called “trimmed url”);

  • the url passed exactly;

  • if the trimmed url ends with a dot and one of the KNOWN_SUFFIXES, the url with this suffix stripped.

The prefix matching is done by splitting the path part of the URL on slashes, and successively removing the last elements.


Returns: A tuple with a list of exact matches, and a list of prefix matches
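The matching order described above can be illustrated with a minimal sketch. This is not the module's implementation: `sketch_urls_to_check` and `ASSUMED_KNOWN_SUFFIXES` are hypothetical names, the suffix values are placeholders for the real `KNOWN_SUFFIXES`, and the choice to start the prefix list at the full path and end it at the bare host is an assumption.

```python
from typing import List, Tuple
from urllib.parse import urlsplit

# Assumption: illustrative values only; the module's KNOWN_SUFFIXES may differ.
ASSUMED_KNOWN_SUFFIXES = ("git", "svn", "hg")


def sketch_urls_to_check(url: str) -> Tuple[List[str], List[str]]:
    """Hypothetical re-implementation of the documented matching order."""
    trimmed = url.rstrip("/")

    # Exact candidates: the trimmed url first, then the url as passed
    # (skipped here when it is identical to the trimmed url).
    exact = [trimmed]
    if url != trimmed:
        exact.append(url)
    # Then the trimmed url without its ".suffix", if it ends with a known one.
    for suffix in ASSUMED_KNOWN_SUFFIXES:
        if trimmed.endswith("." + suffix):
            exact.append(trimmed[: -len(suffix) - 1])
            break

    # Prefix candidates: split the path on slashes and drop the last
    # element one at a time (assumed to end at the bare host).
    parts = urlsplit(trimmed)
    base = f"{parts.scheme}://{parts.netloc}"
    segments = [s for s in parts.path.split("/") if s]
    prefixes = [
        base + "".join("/" + s for s in segments[:n])
        for n in range(len(segments), -1, -1)
    ]
    return exact, prefixes


exact, prefixes = sketch_urls_to_check("https://example.org/user/repo.git/")
```

With this input, `exact` holds the trimmed url, the url as passed, and the url without `.git`, in that order, while `prefixes` goes from the most specific path to the host.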

class swh.storage.proxies.blocking.db.BlockingAdmin(*args, **kwargs)

Bases: BlockingDb

Create a DB proxy

Parameters:
  • conn – psycopg2 connection to the SWH DB

  • pool – psycopg2 pool of connections

create_request(slug: str, reason: str) → BlockingRequest

Record a new blocking request

Parameters:
  • slug – human-readable unique identifier for the request

  • reason – free-form text recording why the request was made


Raises: DuplicateRequest when the slug already exists

find_request(slug: str) → BlockingRequest | None

Find a blocking request using its slug

Returns: None if a request with the given slug doesn’t exist

find_request_by_id(id: UUID) → BlockingRequest | None

Find a blocking request using its id

Returns: None if a request with the given id doesn’t exist
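The contract of `create_request`, `find_request` and `find_request_by_id` can be modelled in memory, as a rough sketch. The real class stores requests in PostgreSQL; `InMemoryRequests`, `Request` and the slug `ticket-1234` are hypothetical, and `DuplicateRequest` is redefined locally as a stand-in for the exception named above.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


class DuplicateRequest(Exception):
    """Local stand-in for the exception named in the documentation."""


@dataclass(frozen=True)
class Request:
    # Mirrors the documented BlockingRequest fields.
    id: uuid.UUID
    slug: str
    date: datetime
    reason: str


class InMemoryRequests:
    """Hypothetical in-memory model of the request-related methods."""

    def __init__(self) -> None:
        self._by_slug: Dict[str, Request] = {}

    def create_request(self, slug: str, reason: str) -> Request:
        # Slugs are unique: creating the same slug twice is an error.
        if slug in self._by_slug:
            raise DuplicateRequest(slug)
        request = Request(uuid.uuid4(), slug, datetime.now(timezone.utc), reason)
        self._by_slug[slug] = request
        return request

    def find_request(self, slug: str) -> Optional[Request]:
        # Returns None when no request has this slug.
        return self._by_slug.get(slug)

    def find_request_by_id(self, id: uuid.UUID) -> Optional[Request]:
        # Returns None when no request has this id.
        return next((r for r in self._by_slug.values() if r.id == id), None)


admin = InMemoryRequests()
created = admin.create_request("ticket-1234", "takedown notice received")
```

Both lookups return the stored request, or `None` when nothing matches, which lets callers tell "unknown request" apart from an error.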

get_requests(include_cleared_requests: bool = False) → List[Tuple[BlockingRequest, int]]

Get known requests

Parameters:
  • include_cleared_requests – also include requests with no associated blocking states

set_origins_state(request_id: UUID, new_state: BlockingState, urls: List[str])

Within the request with the given id, record the state of the given origin urls as new_state.

This creates entries or updates them as appropriate.

Raises: RequestNotFound if the request is not found.

get_states_for_request(request_id: UUID) → Dict[str, BlockingState]

Get the state of urls associated with the given request.

Raises: RequestNotFound if the request is not found.
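The "create or update" behaviour of `set_origins_state`, together with `get_states_for_request`, can be sketched with a plain dictionary. This is an in-memory illustration only: `InMemoryStates` and `State` are hypothetical, `RequestNotFound` is a local stand-in, and `BLOCKED` is a placeholder member name (only `NON_BLOCKING` is named on this page).

```python
import enum
import uuid
from typing import Dict, List


class RequestNotFound(Exception):
    """Local stand-in for the exception named in the documentation."""


class State(enum.Enum):
    NON_BLOCKING = enum.auto()
    BLOCKED = enum.auto()  # assumption: placeholder member name


class InMemoryStates:
    """Hypothetical model of per-request url states."""

    def __init__(self, known_requests: List[uuid.UUID]) -> None:
        self._states: Dict[uuid.UUID, Dict[str, State]] = {
            request_id: {} for request_id in known_requests
        }

    def set_origins_state(
        self, request_id: uuid.UUID, new_state: State, urls: List[str]
    ) -> None:
        if request_id not in self._states:
            raise RequestNotFound(request_id)
        for url in urls:
            # Creates the entry if it is missing, overwrites it otherwise.
            self._states[request_id][url] = new_state

    def get_states_for_request(self, request_id: uuid.UUID) -> Dict[str, State]:
        if request_id not in self._states:
            raise RequestNotFound(request_id)
        return dict(self._states[request_id])


request_id = uuid.uuid4()
db = InMemoryStates([request_id])
db.set_origins_state(
    request_id, State.BLOCKED, ["https://example.org/a", "https://example.org/b"]
)
# A second call for an already-recorded url updates its state in place.
db.set_origins_state(request_id, State.NON_BLOCKING, ["https://example.org/a"])
states = db.get_states_for_request(request_id)
```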

find_blocking_states(urls: List[str]) → List[BlockedOrigin]

Look up the blocking state and associated requests for the given urls (exact match).

delete_blocking_states(request_id: UUID) → None

Remove all blocking states for the given request.

Raises: RequestNotFound if the request is not found.

record_history(request_id: UUID, message: str) → RequestHistory

Add an entry to the history of the given request.

Raises: RequestNotFound if the request is not found.

get_history(request_id: UUID) → List[RequestHistory]

Get the history of a given request.

Raises: RequestNotFound if the request is not found.

get_log(request_id: UUID | None = None, url: str | None = None) → List[BlockingLogEntry]

class swh.storage.proxies.blocking.db.BlockingQuery(*args, **kwargs)

Bases: BlockingDb

Create a DB proxy

Parameters:
  • conn – psycopg2 connection to the SWH DB

  • pool – psycopg2 pool of connections

origins_are_blocked(urls: List[str], all_statuses=False) → Dict[str, BlockingStatus]

Return the blocking status for each origin url given in urls.

If all_statuses is False, do not return urls whose blocking status is defined as NON_BLOCKING (so only return actually blocked urls). Otherwise, return all matching blocking statuses.
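The effect of `all_statuses` can be shown on an already-computed mapping, as a small sketch. It leaves out the database lookup entirely: `filter_statuses`, `Status` and `State` are hypothetical names, and `BLOCKED` is a placeholder member name (only `NON_BLOCKING` is named on this page).

```python
import enum
import uuid
from dataclasses import dataclass
from typing import Dict


class State(enum.Enum):
    NON_BLOCKING = enum.auto()
    BLOCKED = enum.auto()  # assumption: placeholder member name


@dataclass(frozen=True)
class Status:
    # Mirrors the documented BlockingStatus fields.
    state: State
    request: uuid.UUID


def filter_statuses(
    matches: Dict[str, Status], all_statuses: bool = False
) -> Dict[str, Status]:
    """Drop NON_BLOCKING entries unless all_statuses is set."""
    if all_statuses:
        return dict(matches)
    return {
        url: status
        for url, status in matches.items()
        if status.state is not State.NON_BLOCKING
    }


request_id = uuid.uuid4()
matches = {
    "https://example.org/blocked": Status(State.BLOCKED, request_id),
    "https://example.org/cleared": Status(State.NON_BLOCKING, request_id),
}
only_blocked = filter_statuses(matches)
everything = filter_statuses(matches, all_statuses=True)
```

Here `only_blocked` keeps just the blocked url, while `everything` also contains the url that a rule explicitly marks as non-blocking.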

origin_is_blocked(url: str) → BlockingStatus | None

Checks if the origin URL should be blocked.

If the given url matches a set of registered blocking rules, return the most appropriate one. Otherwise, return None.

Log the blocking event in the database (only matching events are logged).