swh.storage.proxies.blocking.db module#

class swh.storage.proxies.blocking.db.BlockingState(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]#

Bases: Enum

Value recording “how much” a url associated with a blocking request is blocked

NON_BLOCKED = 1#

The origin url can be ingested/updated

DECISION_PENDING = 2#

Ingestion from origin url is temporarily blocked until the request is reviewed

BLOCKED = 3#

Ingestion from origin url is permanently blocked
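The three states can be mirrored in a small standalone sketch. The enum below is a local stand-in that copies the documented member names and values, and `ingestion_allowed` is a hypothetical helper introduced for illustration only; neither is imported from swh.storage.

```python
from enum import Enum


class BlockingState(Enum):
    """Local stand-in mirroring the documented members and values."""

    NON_BLOCKED = 1
    DECISION_PENDING = 2
    BLOCKED = 3


def ingestion_allowed(state: BlockingState) -> bool:
    """Hypothetical helper: only NON_BLOCKED lets ingestion proceed.

    DECISION_PENDING blocks temporarily and BLOCKED blocks permanently,
    so both are treated as "do not ingest" here.
    """
    return state is BlockingState.NON_BLOCKED
```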

class swh.storage.proxies.blocking.db.BlockingStatus(state: BlockingState, request: UUID)[source]#

Bases: object

Return value when checking whether ingestion of an origin url is blocked

Method generated by attrs for class BlockingStatus.

class swh.storage.proxies.blocking.db.BlockingRequest(id: UUID, slug: str, date: datetime, reason: str)[source]#

Bases: object

A request for blocking a set of origins from being ingested

Method generated by attrs for class BlockingRequest.

id#

Unique id for the request (will be returned to requesting clients)

slug#

Unique, human-readable id for the request (for administrative interactions)

date#

Date the request was received

reason#

Why the request was made

class swh.storage.proxies.blocking.db.RequestHistory(request: UUID, date: datetime, message: str)[source]#

Bases: object

Method generated by attrs for class RequestHistory.

request#

id of the blocking request

date#

Date the history entry was added

message#

Free-form history information (e.g. “policy decision made”)

class swh.storage.proxies.blocking.db.BlockingLogEntry(url: str, url_match: str, request: UUID, date: datetime, state: BlockingState)[source]#

Bases: object

Method generated by attrs for class BlockingLogEntry.

url#

origin url that has been blocked

url_match#

url matching pattern that caused the blocking of the origin url

request#

id of the blocking request

date#

Date the blocking event occurred

state#

Blocking state responsible for the blocking event

class swh.storage.proxies.blocking.db.BlockedOrigin(request_slug: str, url_pattern: str, state: BlockingState)[source]#

Bases: object

Method generated by attrs for class BlockedOrigin.

class swh.storage.proxies.blocking.db.BlockingDb(*args, **kwargs)[source]#

Bases: BaseDb

create a DB proxy

Parameters:
  • conn – psycopg2 connection to the SWH DB

  • pool – psycopg2 pool of connections

current_version = 1#

swh.storage.proxies.blocking.db.get_urls_to_check(url: str) → Tuple[List[str], List[str]][source]#

Get the entries to check in the database for the given url, in order.

Exact matching is done on the following strings, in order:
  • the url with any trailing slashes removed (the so-called “trimmed url”);

  • the url passed exactly;

  • if the trimmed url ends with a dot and one of the KNOWN_SUFFIXES, the url with this suffix stripped.

The prefix matching is done by splitting the path part of the URL on slashes, and successively removing the last elements.

Returns:

A tuple with a list of exact matches, and a list of prefix matches
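The matching order described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the module's actual code: the function name `urls_to_check_sketch` is hypothetical, the contents of `KNOWN_SUFFIXES` are assumed, and whether the prefix list stops at the first path element (as here) or also includes the bare host is an assumption.

```python
from typing import List, Tuple
from urllib.parse import urlsplit, urlunsplit

# Assumed value for illustration; the real KNOWN_SUFFIXES may differ.
KNOWN_SUFFIXES = ("git", "svn", "hg")


def urls_to_check_sketch(url: str) -> Tuple[List[str], List[str]]:
    """Hypothetical sketch of the exact/prefix candidate ordering."""
    trimmed = url.rstrip("/")

    # Exact candidates: trimmed url first, then the url as passed.
    exact = [trimmed]
    if url != trimmed:
        exact.append(url)

    # Then the trimmed url with a ".<suffix>" ending stripped, if any.
    for suffix in KNOWN_SUFFIXES:
        if trimmed.endswith("." + suffix):
            exact.append(trimmed[: -(len(suffix) + 1)])
            break

    # Prefix candidates: drop trailing path elements one at a time.
    parts = urlsplit(trimmed)
    segments = [s for s in parts.path.split("/") if s]
    prefixes = [
        urlunsplit(parts._replace(path="/" + "/".join(segments[:i])))
        for i in range(len(segments) - 1, 0, -1)
    ]
    return exact, prefixes


exact, prefixes = urls_to_check_sketch("https://example.org/user/repo.git/")
```

With the assumed suffix list, the call at the end yields three exact candidates (the trimmed url, the url as passed, and the url without `.git`) and a single prefix candidate, `https://example.org/user`.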

class swh.storage.proxies.blocking.db.BlockingAdmin(*args, **kwargs)[source]#

Bases: BlockingDb

create a DB proxy

Parameters:
  • conn – psycopg2 connection to the SWH DB

  • pool – psycopg2 pool of connections

create_request(slug: str, reason: str) → BlockingRequest[source]#

Record a new blocking request

Parameters:
  • slug – human-readable unique identifier for the request

  • reason – free-form text recording why the request was made

Raises:

DuplicateRequest when the slug already exists

find_request(slug: str) → BlockingRequest | None[source]#

Find a blocking request using its slug

Returns: None if a request with the given slug doesn’t exist

find_request_by_id(id: UUID) → BlockingRequest | None[source]#

Find a blocking request using its id

Returns: None if a request with the given id doesn’t exist

get_requests(include_cleared_requests: bool = False) → List[Tuple[BlockingRequest, int]][source]#

Get known requests

Parameters:
  • include_cleared_requests – also include requests with no associated blocking states

set_origins_state(request_id: UUID, new_state: BlockingState, urls: List[str])[source]#

Within the request with the given id, record the state of the given origin urls as new_state.

This creates entries or updates them as appropriate.

Raises: RequestNotFound if the request is not found.

get_states_for_request(request_id: UUID) → Dict[str, BlockingState][source]#

Get the state of urls associated with the given request.

Raises: RequestNotFound if the request is not found.
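The create-or-update behaviour of set_origins_state and the per-request lookup of get_states_for_request can be illustrated with an in-memory model. Everything below is a hypothetical stand-in (the real class stores its data in PostgreSQL): `InMemoryBlockingStates`, its `known_requests` attribute, and the local `BlockingState` and `RequestNotFound` definitions are assumptions made so the sketch runs on its own.

```python
from enum import Enum
from typing import Dict, List, Set, Tuple
from uuid import UUID, uuid4


class BlockingState(Enum):
    NON_BLOCKED = 1
    DECISION_PENDING = 2
    BLOCKED = 3


class RequestNotFound(Exception):
    """Stand-in for the exception named in the documentation."""


class InMemoryBlockingStates:
    """Hypothetical in-memory model of the two documented methods."""

    def __init__(self) -> None:
        self.known_requests: Set[UUID] = set()
        self._states: Dict[Tuple[UUID, str], BlockingState] = {}

    def set_origins_state(
        self, request_id: UUID, new_state: BlockingState, urls: List[str]
    ) -> None:
        if request_id not in self.known_requests:
            raise RequestNotFound(request_id)
        for url in urls:
            # Assigning to the key creates the entry or overwrites it.
            self._states[(request_id, url)] = new_state

    def get_states_for_request(self, request_id: UUID) -> Dict[str, BlockingState]:
        if request_id not in self.known_requests:
            raise RequestNotFound(request_id)
        return {
            url: state
            for (rid, url), state in self._states.items()
            if rid == request_id
        }


db = InMemoryBlockingStates()
rid = uuid4()
db.known_requests.add(rid)
db.set_origins_state(
    rid,
    BlockingState.DECISION_PENDING,
    ["https://example.org/a", "https://example.org/b"],
)
# A second call on an existing url updates its state in place.
db.set_origins_state(rid, BlockingState.BLOCKED, ["https://example.org/a"])
```

Keying the mapping on the (request id, url) pair is a design choice of this sketch; it lets the same url carry different states under different requests.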

find_blocking_states(urls: List[str]) → List[BlockedOrigin][source]#

Lookup the blocking state and associated requests for the given urls (exact match).

delete_blocking_states(request_id: UUID) → None[source]#

Remove all blocking states for the given request.

Raises: RequestNotFound if the request is not found.

record_history(request_id: UUID, message: str) → RequestHistory[source]#

Add an entry to the history of the given request.

Raises: RequestNotFound if the request is not found.

get_history(request_id: UUID) → List[RequestHistory][source]#

Get the history of a given request.

Raises: RequestNotFound if the request is not found.

get_log(request_id: UUID | None = None, url: str | None = None) → List[BlockingLogEntry][source]#

class swh.storage.proxies.blocking.db.BlockingQuery(*args, **kwargs)[source]#

Bases: BlockingDb

create a DB proxy

Parameters:
  • conn – psycopg2 connection to the SWH DB

  • pool – psycopg2 pool of connections

origins_are_blocked(urls: List[str], all_statuses=False) → Dict[str, BlockingStatus][source]#

Return the blocking status for each origin url given in urls

If all_statuses is False, do not return urls whose blocking status is NON_BLOCKED (so only actually blocked urls are returned). Otherwise, return all matching blocking statuses.
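The effect of the all_statuses flag can be sketched as a filter over already-matched statuses. This is a minimal illustration, not the method's implementation: `filter_statuses` is a hypothetical helper, and `BlockingState` and `BlockingStatus` are local stand-ins shaped like the documented classes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict
from uuid import UUID, uuid4


class BlockingState(Enum):
    NON_BLOCKED = 1
    DECISION_PENDING = 2
    BLOCKED = 3


@dataclass(frozen=True)
class BlockingStatus:
    """Stand-in with the two documented fields."""

    state: BlockingState
    request: UUID


def filter_statuses(
    matches: Dict[str, BlockingStatus], all_statuses: bool = False
) -> Dict[str, BlockingStatus]:
    """Hypothetical helper: drop NON_BLOCKED entries unless all_statuses is set."""
    if all_statuses:
        return dict(matches)
    return {
        url: status
        for url, status in matches.items()
        if status.state is not BlockingState.NON_BLOCKED
    }


request_id = uuid4()
matches = {
    "https://example.org/ok": BlockingStatus(BlockingState.NON_BLOCKED, request_id),
    "https://example.org/bad": BlockingStatus(BlockingState.BLOCKED, request_id),
}
only_blocked = filter_statuses(matches)
everything = filter_statuses(matches, all_statuses=True)
```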

origin_is_blocked(url: str) → BlockingStatus | None[source]#

Checks if the origin URL should be blocked.

If the given url matches a set of registered blocking rules, return the most appropriate one. Otherwise, return None.

Log the blocking event in the database (only matching events are logged).