swh.storage.proxies.blocking.db module
- class swh.storage.proxies.blocking.db.BlockingState(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: Enum
Value recording “how much” a URL associated with a blocking request is blocked.
- NON_BLOCKED = 1
The origin URL can be ingested or updated.
- DECISION_PENDING = 2
Ingestion from the origin URL is temporarily blocked until the request is reviewed.
- BLOCKED = 3
Ingestion from the origin URL is permanently blocked.
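As a minimal sketch of how a caller might interpret these states, the enum can be mirrored locally. `LocalBlockingState` and `blocks_ingestion` are hypothetical names, not part of `swh.storage`; the member values are the documented ones, and the helper encodes the reading above that both DECISION_PENDING and BLOCKED prevent ingestion, while only NON_BLOCKED allows it.

```python
from enum import Enum


class LocalBlockingState(Enum):
    """Hypothetical local mirror of BlockingState, using the documented values."""

    NON_BLOCKED = 1
    DECISION_PENDING = 2
    BLOCKED = 3


def blocks_ingestion(state: LocalBlockingState) -> bool:
    """Hypothetical helper: only NON_BLOCKED lets an origin be ingested or updated."""
    # DECISION_PENDING blocks temporarily, BLOCKED permanently; both refuse ingestion.
    return state is not LocalBlockingState.NON_BLOCKED
```

Because these are plain `Enum` members, `LocalBlockingState(2)` returns `LocalBlockingState.DECISION_PENDING`, which is how an integer value maps back to a member.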
- class swh.storage.proxies.blocking.db.BlockingStatus(state: BlockingState, request: UUID)
Bases: object
Return value when querying whether ingestion from an origin URL is blocked.
Method generated by attrs for class BlockingStatus.
- class swh.storage.proxies.blocking.db.BlockingRequest(id: UUID, slug: str, date: datetime, reason: str)
Bases: object
A request to block a set of origins from being ingested.
Method generated by attrs for class BlockingRequest.
- id
Unique id for the request (will be returned to requesting clients).
- slug
Unique, human-readable id for the request (for administrative interactions).
- date
Date the request was received.
- reason
Why the request was made.
- class swh.storage.proxies.blocking.db.RequestHistory(request: UUID, date: datetime, message: str)
Bases: object
Method generated by attrs for class RequestHistory.
- request
Id of the blocking request.
- date
Date the history entry was added.
- message
Free-form history information (e.g. “policy decision made”).
- class swh.storage.proxies.blocking.db.BlockingLogEntry(url: str, url_match: str, request: UUID, date: datetime, state: BlockingState)
Bases: object
Method generated by attrs for class BlockingLogEntry.
- url
Origin URL that was blocked.
- url_match
URL pattern that matched the origin URL and caused it to be blocked.
- request
Id of the blocking request.
- date
Date the blocking event occurred.
- state
Blocking state responsible for the blocking event.
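The distinction between url and url_match matters when a block comes from a broader rule: url is the origin that was actually refused, while url_match is the registered pattern it matched. The sketch below uses a hypothetical dataclass mirror (`LogEntrySketch`, with a local `State` enum) rather than the attrs class itself, and the URLs are made-up examples.

```python
import datetime
import uuid
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    # Hypothetical mirror of BlockingState
    NON_BLOCKED = 1
    DECISION_PENDING = 2
    BLOCKED = 3


@dataclass(frozen=True)
class LogEntrySketch:
    """Hypothetical mirror of the documented BlockingLogEntry fields."""

    url: str
    url_match: str
    request: uuid.UUID
    date: datetime.datetime
    state: State


request_id = uuid.uuid4()
entry = LogEntrySketch(
    url="https://example.org/group/project",  # the origin that was refused
    url_match="https://example.org/group",  # the pattern that caused the block
    request=request_id,
    date=datetime.datetime.now(tz=datetime.timezone.utc),
    state=State.DECISION_PENDING,
)
```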
- class swh.storage.proxies.blocking.db.BlockedOrigin(request_slug: str, url_pattern: str, state: BlockingState)
Bases: object
Method generated by attrs for class BlockedOrigin.
- class swh.storage.proxies.blocking.db.BlockingDb(*args, **kwargs)
Bases: BaseDb
Create a DB proxy.
- Parameters:
conn – psycopg2 connection to the SWH DB
pool – psycopg2 pool of connections
- current_version = 1
- swh.storage.proxies.blocking.db.get_urls_to_check(url: str) → Tuple[List[str], List[str]]
Get the entries to check in the database for the given URL, in order.
- Exact matching is done on the following strings, in order:
  - the URL with any trailing slashes removed (the so-called “trimmed URL”);
  - the URL exactly as passed;
  - if the trimmed URL ends with a dot followed by one of the KNOWN_SUFFIXES, the URL with this suffix stripped.
The prefix matching is done by splitting the path part of the URL on slashes and successively removing the last element.
- Returns:
A tuple of two lists: the strings to check for an exact match, and the prefixes to check for a prefix match.
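The candidate ordering described above can be sketched as follows. This is a minimal reimplementation for illustration, not the module's code. Several details are assumptions: the real contents of KNOWN_SUFFIXES are not listed here, so the tuple below is a placeholder; the passed URL is only added when it differs from the trimmed one; the stripped suffix includes its leading dot; and prefixes start from the full trimmed URL and stop before the bare host.

```python
from typing import List, Tuple
from urllib.parse import urlsplit, urlunsplit

# Assumption: placeholder value; the real KNOWN_SUFFIXES are not documented here.
KNOWN_SUFFIXES = ("git",)


def urls_to_check_sketch(url: str) -> Tuple[List[str], List[str]]:
    """Hypothetical re-creation of the get_urls_to_check ordering."""
    # 1. The "trimmed URL": trailing slashes removed.
    trimmed = url.rstrip("/")
    exact = [trimmed]
    # 2. The URL exactly as passed (skipped when identical to the trimmed URL).
    if url != trimmed:
        exact.append(url)
    # 3. The trimmed URL with a ".<suffix>" ending stripped, for known suffixes.
    for suffix in KNOWN_SUFFIXES:
        if trimmed.endswith("." + suffix):
            exact.append(trimmed[: -(len(suffix) + 1)])

    # Prefix candidates: split the path on "/" and drop the last element
    # repeatedly, so the longest prefix comes first.
    parsed = urlsplit(trimmed)
    segments = parsed.path.split("/")  # "/a/b" -> ["", "a", "b"]
    prefixes: List[str] = []
    while len(segments) > 1:
        prefixes.append(urlunsplit(parsed._replace(path="/".join(segments))))
        segments.pop()
    return exact, prefixes
```

Under these assumptions, `https://example.org/foo/bar.git/` yields the exact candidates `https://example.org/foo/bar.git`, `https://example.org/foo/bar.git/` and `https://example.org/foo/bar`, and the prefixes `https://example.org/foo/bar.git` and `https://example.org/foo`.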
- class swh.storage.proxies.blocking.db.BlockingAdmin(*args, **kwargs)
Bases: BlockingDb
Create a DB proxy.
- Parameters:
conn – psycopg2 connection to the SWH DB
pool – psycopg2 pool of connections
- create_request(slug: str, reason: str) → BlockingRequest
Record a new blocking request.
- Parameters:
slug – human-readable unique identifier for the request
reason – free-form text recording why the request was made
- Raises:
DuplicateRequest – when the slug already exists
- find_request(slug: str) → BlockingRequest | None
Find a blocking request using its slug.
Returns:
the matching request, or None if a request with the given slug doesn’t exist
- find_request_by_id(id: UUID) → BlockingRequest | None
Find a blocking request using its id.
Returns:
the matching request, or None if a request with the given id doesn’t exist
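To make the contract of these three methods concrete without a PostgreSQL database, here is an in-memory sketch. `InMemoryRequests`, `RequestSketch` and the local `DuplicateRequest` class are hypothetical stand-ins; only the method names and the documented behaviour (unique slugs, None for unknown lookups) come from the reference above. Stamping date with the current UTC time is an assumption.

```python
import datetime
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


class DuplicateRequest(Exception):
    """Local stand-in for the exception raised on a duplicate slug."""


@dataclass(frozen=True)
class RequestSketch:
    # Mirrors the documented BlockingRequest fields.
    id: uuid.UUID
    slug: str
    date: datetime.datetime
    reason: str


class InMemoryRequests:
    """Hypothetical in-memory model of the request-related BlockingAdmin methods."""

    def __init__(self) -> None:
        self._by_id: Dict[uuid.UUID, RequestSketch] = {}

    def create_request(self, slug: str, reason: str) -> RequestSketch:
        # Slugs must be unique across requests.
        if self.find_request(slug) is not None:
            raise DuplicateRequest(slug)
        request = RequestSketch(
            id=uuid.uuid4(),
            slug=slug,
            date=datetime.datetime.now(tz=datetime.timezone.utc),
            reason=reason,
        )
        self._by_id[request.id] = request
        return request

    def find_request(self, slug: str) -> Optional[RequestSketch]:
        # A linear scan keeps the sketch short.
        return next((r for r in self._by_id.values() if r.slug == slug), None)

    def find_request_by_id(self, id: uuid.UUID) -> Optional[RequestSketch]:
        return self._by_id.get(id)
```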
- get_requests(include_cleared_requests: bool = False) → List[Tuple[BlockingRequest, int]]
Get known requests.
- Parameters:
include_cleared_requests – also include requests that no longer have any associated blocking states
- set_origins_state(request_id: UUID, new_state: BlockingState, urls: List[str])
Within the request with the given id, record the state of the given URLs as new_state. This creates entries or updates them as appropriate.
Raises:
RequestNotFound – if the request is not found.
- get_states_for_request(request_id: UUID) → Dict[str, BlockingState]
Get the state of the URLs associated with the given request.
Raises:
RequestNotFound – if the request is not found.
- find_blocking_states(urls: List[str]) → List[BlockedOrigin]
Look up the blocking state and associated requests for the given URLs (exact match).
- delete_blocking_states(request_id: UUID) → None
Remove all blocking states for the given request.
Raises:
RequestNotFound – if the request is not found.
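The state-management methods above behave like a per-request upsert map. The in-memory sketch below models that contract; `InMemoryStates`, the local `State` enum and the local `RequestNotFound` class are hypothetical stand-ins. It assumes, consistent with the include_cleared_requests parameter of get_requests, that deleting the states keeps the request itself known.

```python
import uuid
from enum import Enum
from typing import Dict, List, Set


class State(Enum):
    # Hypothetical mirror of BlockingState
    NON_BLOCKED = 1
    DECISION_PENDING = 2
    BLOCKED = 3


class RequestNotFound(Exception):
    """Local stand-in for the exception raised for an unknown request id."""


class InMemoryStates:
    """Hypothetical model of the per-request state methods of BlockingAdmin."""

    def __init__(self, known_requests: Set[uuid.UUID]) -> None:
        self._known = known_requests
        # request id -> {url: state}
        self._states: Dict[uuid.UUID, Dict[str, State]] = {}

    def _check(self, request_id: uuid.UUID) -> None:
        if request_id not in self._known:
            raise RequestNotFound(request_id)

    def set_origins_state(
        self, request_id: uuid.UUID, new_state: State, urls: List[str]
    ) -> None:
        self._check(request_id)
        states = self._states.setdefault(request_id, {})
        for url in urls:
            # Creates the entry if missing, otherwise overwrites its state.
            states[url] = new_state

    def get_states_for_request(self, request_id: uuid.UUID) -> Dict[str, State]:
        self._check(request_id)
        return dict(self._states.get(request_id, {}))

    def delete_blocking_states(self, request_id: uuid.UUID) -> None:
        # The request stays known; only its states are removed.
        self._check(request_id)
        self._states.pop(request_id, None)
```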
- record_history(request_id: UUID, message: str) → RequestHistory
Add an entry to the history of the given request.
Raises:
RequestNotFound – if the request is not found.
- get_history(request_id: UUID) → List[RequestHistory]
Get the history of a given request.
Raises:
RequestNotFound – if the request is not found.
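A minimal in-memory sketch of the history methods: each call to record_history appends an entry stamped with a date, and get_history returns the entries for one request. `InMemoryHistory`, `HistorySketch` and the local `RequestNotFound` class are hypothetical; returning entries in insertion order is an assumption, since the ordering used by the real method is not documented here.

```python
import datetime
import uuid
from dataclasses import dataclass
from typing import Dict, List, Set


class RequestNotFound(Exception):
    """Local stand-in for the exception raised for an unknown request id."""


@dataclass(frozen=True)
class HistorySketch:
    # Mirrors the documented RequestHistory fields.
    request: uuid.UUID
    date: datetime.datetime
    message: str


class InMemoryHistory:
    """Hypothetical model of record_history / get_history."""

    def __init__(self, known_requests: Set[uuid.UUID]) -> None:
        self._known = known_requests
        self._entries: Dict[uuid.UUID, List[HistorySketch]] = {}

    def record_history(self, request_id: uuid.UUID, message: str) -> HistorySketch:
        if request_id not in self._known:
            raise RequestNotFound(request_id)
        entry = HistorySketch(
            request=request_id,
            date=datetime.datetime.now(tz=datetime.timezone.utc),
            message=message,
        )
        self._entries.setdefault(request_id, []).append(entry)
        return entry

    def get_history(self, request_id: uuid.UUID) -> List[HistorySketch]:
        if request_id not in self._known:
            raise RequestNotFound(request_id)
        # Assumption: insertion order.
        return list(self._entries.get(request_id, []))
```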
- class swh.storage.proxies.blocking.db.BlockingQuery(*args, **kwargs)
Bases: BlockingDb
Create a DB proxy.
- Parameters:
conn – psycopg2 connection to the SWH DB
pool – psycopg2 pool of connections
- origins_are_blocked(urls: List[str], all_statuses=False) → Dict[str, BlockingStatus]
Return the blocking status for each origin URL given in urls.
If all_statuses is False, do not return URLs whose blocking status is NON_BLOCKED, so that only actually blocked URLs are returned. Otherwise, return all matching blocking statuses.
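The effect of all_statuses amounts to a filter over the statuses found for the given URLs. A minimal sketch, assuming the matching step has already produced one status per matched URL and that URLs with no matching rule are absent in both modes; `StatusSketch`, the local `State` enum and `filter_statuses` are hypothetical names.

```python
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class State(Enum):
    # Hypothetical mirror of BlockingState
    NON_BLOCKED = 1
    DECISION_PENDING = 2
    BLOCKED = 3


@dataclass(frozen=True)
class StatusSketch:
    # Mirrors the documented BlockingStatus fields.
    state: State
    request: uuid.UUID


def filter_statuses(
    statuses: Dict[str, StatusSketch], all_statuses: bool = False
) -> Dict[str, StatusSketch]:
    """Hypothetical filter reproducing the documented all_statuses behaviour."""
    if all_statuses:
        # Keep every matching status, including NON_BLOCKED ones.
        return dict(statuses)
    # Default: drop URLs that matched a rule but are explicitly NON_BLOCKED.
    return {
        url: status
        for url, status in statuses.items()
        if status.state is not State.NON_BLOCKED
    }
```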
- origin_is_blocked(url: str) → BlockingStatus | None
Check whether the origin URL should be blocked.
If the given URL matches a set of registered blocking rules, return the most appropriate one. Otherwise, return None.
Log the blocking event in the database (only matching events are logged).
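One plausible reading of "most appropriate" is the first rule hit when walking the candidates in the order produced by get_urls_to_check, exact candidates first and then prefixes from longest to shortest. The sketch below encodes that reading. It is an assumption, not the documented algorithm. `origin_is_blocked_sketch`, `StatusSketch` and the single `rules` dictionary (pattern to status) are hypothetical, and a Python list stands in for the database log.

```python
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Tuple


class State(Enum):
    # Hypothetical mirror of BlockingState
    NON_BLOCKED = 1
    DECISION_PENDING = 2
    BLOCKED = 3


@dataclass(frozen=True)
class StatusSketch:
    # Mirrors the documented BlockingStatus fields.
    state: State
    request: uuid.UUID


def origin_is_blocked_sketch(
    url: str,
    candidates: Tuple[List[str], List[str]],
    rules: Dict[str, StatusSketch],
    log: List[Tuple[str, str, StatusSketch]],
) -> Optional[StatusSketch]:
    """Hypothetical lookup: first matching candidate wins; only matches are logged."""
    exact, prefixes = candidates
    # Assumption: exact candidates are tried before prefixes, each list in order.
    for candidate in [*exact, *prefixes]:
        status = rules.get(candidate)
        if status is not None:
            # Record (url, url_match, status), echoing BlockingLogEntry's fields.
            log.append((url, candidate, status))
            return status
    return None
```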