swh.web.common package

Submodules

swh.web.common.apps module

class swh.web.common.apps.SwhWebCommonConfig(app_name, app_module)[source]

Bases: django.apps.config.AppConfig

name = 'swh.web.common'
label = 'swh.web.common'

swh.web.common.converters module

swh.web.common.converters.fmap(f, data)[source]

Map f to data at each level.

This must keep the original data structure type:

  • map -> map
  • dict -> dict
  • list -> list
  • None -> None

Parameters:
  • f – function that expects one argument.
  • data – data to traverse to apply the f function. list, map, dict or bare value.
Returns:

The same data structure, with its values modified by the f function.
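
The type-preserving traversal described above can be sketched as follows. This is a minimal illustration written from the description only, not the library's implementation; the name fmap_sketch is hypothetical.

```python
def fmap_sketch(f, data):
    """Apply f to every leaf value while preserving container types."""
    if data is None:
        return None
    if isinstance(data, map):
        # A map stays a (lazy) map.
        return map(lambda item: fmap_sketch(f, item), data)
    if isinstance(data, list):
        return [fmap_sketch(f, item) for item in data]
    if isinstance(data, dict):
        return {key: fmap_sketch(f, value) for key, value in data.items()}
    # Bare value: apply f directly.
    return f(data)


nested = {"a": [1, 2, {"b": 3}], "c": None}
doubled = fmap_sketch(lambda x: x * 2, nested)
```

Here doubled keeps the shape of nested: the dict is still a dict, the list is still a list, and the None value is passed through untouched.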

swh.web.common.converters.from_swh(dict_swh, hashess={}, bytess={}, dates={}, blacklist={}, removables_if_empty={}, empty_dict={}, empty_list={}, convert={}, convert_fn=<function <lambda>>)[source]

Convert from a swh dictionary to something reasonably json serializable.

Parameters:
  • dict_swh – the original dictionary that needs to be transformed
  • hashess – list/set of keys representing hash values (sha1, sha256, sha1_git, etc…) as bytes. Those need to be converted to hexadecimal strings
  • bytess – list/set of keys representing bytes values which need to be decoded
  • blacklist – set of keys to filter out from the conversion
  • convert – set of keys whose associated values need to be converted using convert_fn
  • convert_fn – the conversion function to apply on the value of key in ‘convert’

The remaining keys are copied as is in the output.

Returns:dictionary equivalent as dict_swh only with its keys converted.
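
As a rough illustration of the conversion rules listed above, the sketch below handles only the hashess, bytess, blacklist and convert parameters. The function name from_swh_sketch, the UTF-8 decoding and the sample data are assumptions made for the example; the real function accepts more parameters and may behave differently.

```python
def from_swh_sketch(dict_swh, hashess=(), bytess=(), blacklist=(),
                    convert=(), convert_fn=lambda value: value):
    """Convert selected keys of a dict into JSON-friendly values."""
    result = {}
    for key, value in dict_swh.items():
        if key in blacklist:
            continue  # filtered out of the output
        if key in hashess and isinstance(value, bytes):
            result[key] = value.hex()  # bytes hash -> hexadecimal string
        elif key in bytess and isinstance(value, bytes):
            # Assumption: bytes values are UTF-8 text.
            result[key] = value.decode("utf-8", errors="replace")
        elif key in convert:
            result[key] = convert_fn(value)
        else:
            result[key] = value  # remaining keys are copied as is
    return result


release = {"id": bytes.fromhex("aa" * 20), "comment": b"v1.0",
           "name": "v1.0", "internal": 1}
converted = from_swh_sketch(release, hashess={"id"}, bytess={"comment"},
                            blacklist={"internal"})
```
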
swh.web.common.converters.from_origin(origin)[source]

Convert from a swh origin to an origin dictionary.

swh.web.common.converters.from_release(release)[source]

Convert from a swh release to a json serializable release dictionary.

Parameters:
  • release (dict) –

    dictionary with keys:

    • id: identifier of the release (sha1 in bytes)
    • revision: identifier of the revision the release points to (sha1 in bytes)
    • comment: release’s comment message (bytes)
    • name: release’s name (string)
    • author: release’s author identifier (swh’s id)
    • synthetic: the synthetic property (boolean)
Returns:

Release dictionary with the following keys:

  • id: hexadecimal sha1 (string)
  • revision: hexadecimal sha1 (string)
  • comment: release’s comment message (string)
  • name: release’s name (string)
  • author: release’s author identifier (swh’s id)
  • synthetic: the synthetic property (boolean)

Return type:

dict

class swh.web.common.converters.SWHMetadataEncoder(skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]

Bases: json.encoder.JSONEncoder

Special json encoder for metadata field which can contain bytes encoded value.

default(obj)[source]

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
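
For the bytes case that SWHMetadataEncoder is described as handling, a comparable encoder could look like the sketch below. The class name BytesAwareEncoder and the choice of UTF-8 decoding are assumptions for illustration; the source does not state how the real encoder decodes bytes.

```python
import json


class BytesAwareEncoder(json.JSONEncoder):
    """JSON encoder that also accepts bytes values (illustrative only)."""

    def default(self, o):
        if isinstance(o, bytes):
            # Assumption: bytes values hold UTF-8 text.
            return o.decode("utf-8", errors="replace")
        # Let the base class raise TypeError for anything else.
        return super().default(o)


encoded = json.dumps({"extra": b"value"}, cls=BytesAwareEncoder)
```

The encoder is passed through the cls argument of json.dumps, so only values the standard encoder cannot serialize reach default().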
swh.web.common.converters.convert_revision_metadata(metadata)[source]

Convert json specific dict to a json serializable one.

swh.web.common.converters.from_revision(revision)[source]

Convert from a swh revision to a json serializable revision dictionary.

Parameters:revision (dict) –

dict with keys:

  • id: identifier of the revision (sha1 in bytes)
  • directory: identifier of the directory the revision points to (sha1 in bytes)
  • author_name, author_email: author’s revision name and email
  • committer_name, committer_email: committer’s revision name and email
  • message: revision’s message
  • date, date_offset: revision’s author date
  • committer_date, committer_date_offset: revision’s commit date
  • parents: list of parents for such revision
  • synthetic: revision’s property nature
  • type: revision’s type (git, tar or dsc at the moment)
  • metadata: if the revision is synthetic, this can reference dynamic properties.
Returns:Revision dictionary with the same keys as inputs, except:
  • sha1s are in hexadecimal strings (id, directory)
  • bytes are decoded in string (author_name, committer_name, author_email, committer_email)

Remaining keys are left as is

Return type:dict
swh.web.common.converters.from_content(content)[source]

Convert swh content to serializable content dictionary.

swh.web.common.converters.from_person(person)[source]

Convert swh person to serializable person dictionary.

swh.web.common.converters.from_origin_visit(visit)[source]

Convert swh origin_visit to serializable origin_visit dictionary.

swh.web.common.converters.from_snapshot(snapshot)[source]

Convert swh snapshot to serializable snapshot dictionary.

swh.web.common.converters.from_directory_entry(dir_entry)[source]

Convert swh directory entry to serializable directory entry dictionary.

swh.web.common.converters.from_filetype(content_entry)[source]

Convert swh filetype to serializable filetype dictionary.

swh.web.common.exc module

exception swh.web.common.exc.BadInputExc[source]

Bases: ValueError

Wrong request to the api.

Example: Asking for a content with the wrong identifier format.

exception swh.web.common.exc.NotFoundExc[source]

Bases: Exception

Good request to the api but no result was found.

Example: Asking for a content with the right identifier format, but that content does not exist.

exception swh.web.common.exc.ForbiddenExc[source]

Bases: Exception

Good request to the api, but the result is forbidden to return due to an enforced policy.

Example: Asking for a raw content which exists but whose mimetype is not text.

swh.web.common.exc.swh_handle400(request)[source]

Custom Django HTTP error 400 handler for swh-web.

swh.web.common.exc.swh_handle403(request)[source]

Custom Django HTTP error 403 handler for swh-web.

swh.web.common.exc.swh_handle404(request)[source]

Custom Django HTTP error 404 handler for swh-web.

swh.web.common.exc.swh_handle500(request)[source]

Custom Django HTTP error 500 handler for swh-web.

swh.web.common.exc.handle_view_exception(request, exc, html_response=True)[source]

Function used to generate an error page when an exception was raised inside a swh-web browse view.
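
The exception descriptions and the handler names above suggest a correspondence between exception classes and HTTP status codes. The sketch below spells that correspondence out with local stand-in classes; the mapping itself and the helper name status_code_for are assumptions, not something this reference states.

```python
# Local stand-ins mirroring the documented class hierarchy.
class BadInputExc(ValueError):
    pass


class NotFoundExc(Exception):
    pass


class ForbiddenExc(Exception):
    pass


def status_code_for(exc):
    """Pick an HTTP status code for an exception (assumed mapping)."""
    if isinstance(exc, BadInputExc):
        return 400  # wrong request
    if isinstance(exc, ForbiddenExc):
        return 403  # good request, forbidden result
    if isinstance(exc, NotFoundExc):
        return 404  # good request, nothing found
    return 500  # anything else is treated as a server error


codes = [status_code_for(e) for e in
         (BadInputExc("bad"), ForbiddenExc("no"), NotFoundExc("?"), KeyError("x"))]
```
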

swh.web.common.highlightjs module

swh.web.common.highlightjs.get_hljs_language_from_filename(filename)[source]

Function that tries to associate a filename with a language supported by highlight.js.

Parameters:filename – input filename
Returns:highlight.js language id or None if no correspondence has been found
swh.web.common.highlightjs.get_hljs_language_from_mime_type(mime_type)[source]

Function that tries to associate a mime type with a language supported by highlight.js.

Parameters:mime_type – input mime type
Returns:highlight.js language id or None if no correspondence has been found
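
One plausible way to implement such a lookup is a pair of tables keyed by file name and by extension. The tables below are a tiny, purely illustrative subset, and guess_hljs_language is a hypothetical name; the real functions may use different data and rules.

```python
import os

# Illustrative subsets only; a real mapping would be much larger.
_FILENAME_TO_LANGUAGE = {"Makefile": "makefile", "Dockerfile": "dockerfile"}
_EXTENSION_TO_LANGUAGE = {".py": "python", ".js": "javascript",
                          ".rs": "rust", ".go": "go"}


def guess_hljs_language(filename):
    """Return a highlight.js language id for filename, or None."""
    basename = os.path.basename(filename)
    if basename in _FILENAME_TO_LANGUAGE:
        return _FILENAME_TO_LANGUAGE[basename]
    _, extension = os.path.splitext(basename)
    return _EXTENSION_TO_LANGUAGE.get(extension.lower())
```
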

swh.web.common.middlewares module

class swh.web.common.middlewares.HtmlPrettifyMiddleware(get_response)[source]

Bases: object

Django middleware for prettifying generated HTML in development mode.

class swh.web.common.middlewares.HtmlMinifyMiddleware(get_response=None)[source]

Bases: object

Django middleware for minifying generated HTML in production mode.

class swh.web.common.middlewares.ThrottlingHeadersMiddleware(get_response=None)[source]

Bases: object

Django middleware for inserting rate limiting related headers in HTTP response.

swh.web.common.models module

class swh.web.common.models.SaveAuthorizedOrigin(*args, **kwargs)[source]

Bases: django.db.models.base.Model

Model table holding origin urls authorized to be loaded into the archive.

url

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

exception DoesNotExist

Bases: django.core.exceptions.ObjectDoesNotExist

exception MultipleObjectsReturned

Bases: django.core.exceptions.MultipleObjectsReturned

id

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

objects = <django.db.models.manager.Manager object>
class swh.web.common.models.SaveUnauthorizedOrigin(*args, **kwargs)[source]

Bases: django.db.models.base.Model

Model table holding origin urls not authorized to be loaded into the archive.

url

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

exception DoesNotExist

Bases: django.core.exceptions.ObjectDoesNotExist

exception MultipleObjectsReturned

Bases: django.core.exceptions.MultipleObjectsReturned

id

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

objects = <django.db.models.manager.Manager object>
class swh.web.common.models.SaveOriginRequest(*args, **kwargs)[source]

Bases: django.db.models.base.Model

Model table holding all the save origin requests issued by users.

id

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

request_date

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

origin_type

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

origin_url

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

status

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

loading_task_id

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

visit_date

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

loading_task_status

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

exception DoesNotExist

Bases: django.core.exceptions.ObjectDoesNotExist

exception MultipleObjectsReturned

Bases: django.core.exceptions.MultipleObjectsReturned

get_loading_task_status_display(**morekwargs)
get_next_by_request_date(**morekwargs)
get_previous_by_request_date(**morekwargs)
get_status_display(**morekwargs)
objects = <django.db.models.manager.Manager object>

swh.web.common.origin_save module

swh.web.common.origin_save.get_origin_save_authorized_urls()[source]

Get the list of origin url prefixes authorized to be immediately loaded into the archive (whitelist).

Returns:The list of authorized origin url prefixes
Return type:list
swh.web.common.origin_save.get_origin_save_unauthorized_urls()[source]

Get the list of origin url prefixes forbidden to be loaded into the archive (blacklist).

Returns:The list of unauthorized origin url prefixes
Return type:list
swh.web.common.origin_save.can_save_origin(origin_url)[source]

Check if a software origin can be saved into the archive.

Based on the origin url, the save request will be either:

  • immediately accepted if the url is whitelisted
  • rejected if the url is blacklisted
  • put in pending state for manual review otherwise
Parameters:origin_url (str) – the software origin url to check
Returns:the origin save request status, either accepted, rejected or pending
Return type:str
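
The three-way decision described above can be sketched with plain prefix matching. The prefix lists are hypothetical, and checking the blacklist before the whitelist is an assumption of this sketch; the reference does not say which list wins when both match.

```python
def save_request_status(origin_url, authorized_prefixes, unauthorized_prefixes):
    """Return 'accepted', 'rejected' or 'pending' for an origin url."""
    # Assumption: a blacklisted prefix takes precedence over a whitelisted one.
    if any(origin_url.startswith(p) for p in unauthorized_prefixes):
        return "rejected"
    if any(origin_url.startswith(p) for p in authorized_prefixes):
        return "accepted"
    return "pending"  # left for manual review


# Hypothetical prefix lists for the example.
authorized = ["https://forge.example.org/"]
unauthorized = ["https://forge.example.org/private/"]

statuses = [
    save_request_status(url, authorized, unauthorized)
    for url in ("https://forge.example.org/team/project",
                "https://forge.example.org/private/secret",
                "https://elsewhere.example.com/project")
]
```
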
swh.web.common.origin_save.get_savable_origin_types()[source]
swh.web.common.origin_save.create_save_origin_request(origin_type, origin_url)[source]

Create a loading task to save a software origin into the archive.

This function aims to create a software origin loading task through the use of the swh-scheduler component.

First, some checks are performed to see if the origin type and url are valid and if the save request can be accepted. If those checks pass, the loading task is then created. Otherwise, the save request is put in pending or rejected state.

All the submitted save requests are logged into the swh-web database to keep track of them.

Parameters:
  • origin_type (str) – the type of origin to save (currently only git but svn and hg will soon be available)
  • origin_url (str) – the url of the origin to save
Raises:
  • BadInputExc – the origin type or url is invalid
  • ForbiddenExc – the provided origin url is blacklisted
Returns:

A dict describing the save request with the following keys:

  • origin_type: the type of the origin to save
  • origin_url: the url of the origin
  • save_request_date: the date the request was submitted
  • save_request_status: the request status, either accepted, rejected or pending
  • save_task_status: the origin loading task status, either not created, not yet scheduled, scheduled, succeed or failed

Return type:

dict

swh.web.common.origin_save.get_save_origin_requests_from_queryset(requests_queryset)[source]

Get all save requests from a SaveOriginRequest queryset.

Parameters:requests_queryset (django.db.models.QuerySet) – input SaveOriginRequest queryset
Returns:A list of save origin requests dict as described in swh.web.common.origin_save.create_save_origin_request()
Return type:list
swh.web.common.origin_save.get_save_origin_requests(origin_type, origin_url)[source]

Get all save requests for a given software origin.

Parameters:
  • origin_type (str) – the type of the origin
  • origin_url (str) – the url of the origin
Raises:

BadInputExc – the origin type or url is invalid

Returns:

A list of save origin requests dict as described in swh.web.common.origin_save.create_save_origin_request()

Return type:

list

swh.web.common.origin_visits module

swh.web.common.origin_visits.get_origin_visits(origin_info)[source]

Function that returns the list of visits for a swh origin. That list is put in cache in order to speed up navigation in the swh web browse ui.

Parameters:origin_info (dict) – a dict filled with origin information (id, url, type)
Returns:A list of dict describing the origin visits with the following keys:
  • date: UTC visit date in ISO format,
  • origin: the origin id
  • status: the visit status, either full, partial or ongoing
  • visit: the visit id
Return type:list
Raises:NotFoundExc – if the origin is not found
swh.web.common.origin_visits.get_origin_visit(origin_info, visit_ts=None, visit_id=None, snapshot_id=None)[source]

Function that returns information about a visit for a given origin. The visit is retrieved from a provided timestamp: the visit closest to that timestamp is selected.

Parameters:
  • origin_info (dict) – a dict filled with origin information (id, url, type)
  • visit_ts (int or str) – an ISO date string or Unix timestamp to parse
Returns:

A dict containing the visit info as described below:

{'origin': 2,
 'date': '2017-10-08T11:54:25.582463+00:00',
 'metadata': {},
 'visit': 25,
 'status': 'full'}
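
Selecting the visit closest to a timestamp can be sketched as below, using visit dicts shaped like the one above. The helper names and the treatment of naive dates as UTC are assumptions made for the example.

```python
from datetime import datetime, timezone


def to_utc_datetime(visit_ts):
    """Accept a Unix timestamp or an ISO date string; return an aware datetime."""
    if isinstance(visit_ts, (int, float)):
        return datetime.fromtimestamp(visit_ts, tz=timezone.utc)
    parsed = datetime.fromisoformat(visit_ts)
    if parsed.tzinfo is None:
        # Assumption of this sketch: naive dates are interpreted as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def closest_visit(visits, visit_ts):
    """Return the visit whose date is nearest to visit_ts."""
    target = to_utc_datetime(visit_ts)
    return min(visits,
               key=lambda visit: abs(to_utc_datetime(visit["date"]) - target))


visits = [
    {"origin": 2, "visit": 24, "status": "full",
     "date": "2017-09-01T10:00:00+00:00"},
    {"origin": 2, "visit": 25, "status": "full",
     "date": "2017-10-08T11:54:25.582463+00:00"},
]
nearest = closest_visit(visits, "2017-10-01T00:00:00+00:00")
```
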

swh.web.common.query module

swh.web.common.query.parse_hash(q)[source]

Detect the hash type of a user submitted query string.

Parameters:
  • q – query string with the following format: “[HASH_TYPE:]HEX_CHECKSUM”, where HASH_TYPE is optional, defaults to "sha1", and can be one of swh.model.hashutil.ALGORITHMS
Returns:

A pair (hash_algorithm, byte hash value)

Raises:
  • ValueError if the given query string does not correspond to a valid hash value
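
A simplified parser for the “[HASH_TYPE:]HEX_CHECKSUM” format could look like the sketch below. The ALGORITHMS set is an assumed stand-in for swh.model.hashutil.ALGORITHMS, the name parse_hash_sketch is hypothetical, and no per-algorithm length check is performed, which the real function may well do.

```python
# Assumed stand-in for swh.model.hashutil.ALGORITHMS.
ALGORITHMS = {"sha1", "sha1_git", "sha256", "blake2s256"}


def parse_hash_sketch(q):
    """Split a query into (hash_algorithm, checksum bytes)."""
    if ":" in q:
        algo, hex_checksum = q.split(":", 1)
    else:
        algo, hex_checksum = "sha1", q  # documented default
    if algo not in ALGORITHMS:
        raise ValueError("Unknown hash algorithm: %s" % algo)
    try:
        checksum = bytes.fromhex(hex_checksum)
    except ValueError:
        raise ValueError("Invalid hexadecimal checksum: %s" % hex_checksum)
    if not checksum:
        raise ValueError("Empty checksum")
    return algo, checksum


algo, checksum = parse_hash_sketch("sha256:" + "ab" * 32)
```
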
swh.web.common.query.parse_hash_with_algorithms_or_throws(q, accepted_algo, error_msg)[source]

Parse a query but only accepts accepted_algo. Otherwise, raise the exception with message error_msg.

Parameters:
  • q – query string with the following format: “[HASH_TYPE:]HEX_CHECKSUM”, where HASH_TYPE is optional, defaults to "sha1", and can be one of swh.model.hashutil.ALGORITHMS.
  • accepted_algo – array of strings representing the names of accepted algorithms.
  • error_msg – error message to raise as BadInputExc if the algo of the query does not match.
Returns:

A pair (hash_algorithm, byte hash value)

Raises:
  • BadInputExc when the input is invalid or does not validate against the accepted algorithms.
swh.web.common.query.parse_uuid4(uuid)[source]

Parse a UUID4 from a string.

Parameters:uuid – String representing a UUID.
Returns:The uuid as is if everything is ok.
Raises:BadInputExc – if the uuid is invalid.
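
A validation along these lines can be written with the standard uuid module. In the sketch below, BadInputExc is a local stand-in for the exception documented earlier, and parse_uuid4_sketch is a hypothetical name.

```python
import uuid


class BadInputExc(ValueError):
    """Local stand-in for swh.web.common.exc.BadInputExc."""


def parse_uuid4_sketch(value):
    """Return value unchanged if it is a valid version 4 UUID string."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        raise BadInputExc("%r is not a valid uuid" % (value,))
    # uuid.UUID() accepts any version, so check the version explicitly.
    if parsed.version != 4:
        raise BadInputExc("%r is not a version 4 uuid" % (value,))
    return value


valid = parse_uuid4_sketch("16fd2706-8baf-433b-82eb-8c7fada847da")
```

Note that passing version=4 to uuid.UUID() would overwrite the version bits instead of validating them, which is why the sketch inspects parsed.version.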

swh.web.common.service module

swh.web.common.service.lookup_multiple_hashes(hashes)[source]

Lookup the passed hashes in a single DB connection, using batch processing.

Parameters:hashes – An array of {filename: X, sha1: Y}, with string X and hex sha1 string Y.
Returns:The same array with elements updated with elem[‘found’] = true if the hash is present in storage, elem[‘found’] = false if not.
swh.web.common.service.lookup_expression(expression, last_sha1, per_page)[source]

Lookup expression in raw content.

Parameters:
  • expression (str) – An expression to look up through raw indexed content
  • last_sha1 (str) – Last sha1 seen
  • per_page (int) – Number of results per page
Yields:

ctags whose content match the expression

swh.web.common.service.lookup_hash(q)[source]

Checks if the storage contains a given content checksum

Args: query string of the form <hash_algo:hash>

Returns: Dict with key found containing the hash info if the hash is present, None if not.

swh.web.common.service.search_hash(q)[source]

Checks if the storage contains a given content checksum

Args: query string of the form <hash_algo:hash>

Returns: Dict with key found set to True or False, according to whether the checksum is present or not
swh.web.common.service.lookup_content_ctags(q)[source]

Return ctags information from a specified content.

Parameters:q – query string of the form <hash_algo:hash>
Yields:ctags information (dict) list if the content is found.
swh.web.common.service.lookup_content_filetype(q)[source]

Return filetype information from a specified content.

Parameters:q – query string of the form <hash_algo:hash>
Yields:filetype information (dict) list if the content is found.
swh.web.common.service.lookup_content_language(q)[source]

Return language information from a specified content.

Parameters:q – query string of the form <hash_algo:hash>
Yields:language information (dict) list if the content is found.
swh.web.common.service.lookup_content_license(q)[source]

Return license information from a specified content.

Parameters:q – query string of the form <hash_algo:hash>
Yields:license information (dict) list if the content is found.
swh.web.common.service.lookup_origin(origin)[source]

Return information about the origin matching dict origin.

Parameters:
  • origin – origin’s dict with keys either ‘id’ or (‘type’ AND ‘url’)
Returns:

origin information as dict.

swh.web.common.service.search_origin(url_pattern, offset=0, limit=50, regexp=False, with_visit=False)[source]

Search for origins whose urls contain a provided string pattern or match a provided regular expression.

Parameters:
  • url_pattern – the string pattern to search for in origin urls
  • offset – number of found origins to skip before returning results
  • limit – the maximum number of found origins to return
Returns:

list of origin information as dict.

swh.web.common.service.search_origin_metadata(fulltext, limit=50)[source]

Search for origins whose metadata match a provided string pattern.

Parameters:
  • fulltext – the string pattern to search for in origin metadata
  • offset – number of found origins to skip before returning results
  • limit – the maximum number of found origins to return
Returns:

list of origin metadata as dict.

swh.web.common.service.lookup_person(person_id)[source]

Return information about the person with id person_id.

Parameters:person_id – as string
Returns:person information as dict.
Raises:NotFoundExc if there is no person with the provided id.
swh.web.common.service.lookup_directory(sha1_git)[source]

Return information about the directory with id sha1_git.

Parameters:sha1_git – as string
Returns:directory information as dict.
swh.web.common.service.lookup_directory_with_path(sha1_git, path_string)[source]

Return directory information for the entry with path path_string, relative to the root directory pointed to by directory_sha1_git.

Parameters:
  • directory_sha1_git – sha1_git corresponding to the directory to which we append paths
  • path_string – the relative path to the entry, starting from the directory pointed to by directory_sha1_git
Raises:

NotFoundExc if the directory entry is not found

swh.web.common.service.lookup_release(release_sha1_git)[source]

Return information about the release with sha1 release_sha1_git.

Parameters:release_sha1_git – The release’s sha1 as hexadecimal
Returns:Release information as dict.
Raises:ValueError if the identifier provided is not of sha1 nature.
swh.web.common.service.lookup_release_multiple(sha1_git_list)[source]

Return information about the releases identified with their sha1_git identifiers.

Parameters:sha1_git_list – A list of release sha1_git identifiers
Returns:Release information as dict.
Raises:ValueError if the identifier provided is not of sha1 nature.
swh.web.common.service.lookup_revision(rev_sha1_git)[source]

Return information about the revision with sha1 revision_sha1_git.

Parameters:

revision_sha1_git – The revision’s sha1 as hexadecimal

Returns:

Revision information as dict.

Raises:
  • ValueError if the identifier provided is not of sha1 nature.
  • NotFoundExc if there is no revision with the provided sha1_git.
swh.web.common.service.lookup_revision_multiple(sha1_git_list)[source]

Return information about the revisions identified with their sha1_git identifiers.

Parameters:sha1_git_list – A list of revision sha1_git identifiers
Returns:Generator of revisions information as dict.
Raises:ValueError if the identifier provided is not of sha1 nature.
swh.web.common.service.lookup_revision_message(rev_sha1_git)[source]

Return the raw message of the revision with sha1 revision_sha1_git.

Parameters:

revision_sha1_git – The revision’s sha1 as hexadecimal

Returns:

Decoded revision message as dict {‘message’: <the_message>}

Raises:
  • ValueError if the identifier provided is not of sha1 nature.
  • NotFoundExc if the revision is not found, or if it has no message
swh.web.common.service.lookup_revision_by(origin_id, branch_name='HEAD', timestamp=None)[source]

Lookup revision by origin id, snapshot branch name and visit timestamp.

If branch_name is not provided, lookup using ‘HEAD’ as default. If timestamp is not provided, use the most recent.

Parameters:
  • origin_id (int) – origin of the revision
  • branch_name (str) – snapshot branch name
  • timestamp (str/int) – origin visit time frame
Returns:

The revision matching the criteria

Return type:

dict

Raises:

NotFoundExc if no revision corresponds to the criterion

swh.web.common.service.lookup_revision_log(rev_sha1_git, limit)[source]

Lookup revision log by revision id.

Parameters:
  • rev_sha1_git (str) – The revision’s sha1 as hexadecimal
  • limit (int) – the maximum number of revisions returned
Returns:

Revision log as list of revision dicts

Return type:

list

Raises:
  • ValueError – if the identifier provided is not of sha1 nature.
  • NotFoundExc – if there is no revision with the provided sha1_git.
swh.web.common.service.lookup_revision_log_by(origin_id, branch_name, timestamp, limit)[source]

Lookup revision by origin id, snapshot branch name and visit timestamp.

Parameters:
  • origin_id (int) – origin of the revision
  • branch_name (str) – snapshot branch
  • timestamp (str/int) – origin visit time frame
  • limit (int) – the maximum number of revisions returned
Returns:

Revision log as list of revision dicts

Return type:

list

Raises:

NotFoundExc – if no revision corresponds to the criterion

swh.web.common.service.lookup_revision_with_context_by(origin_id, branch_name, timestamp, sha1_git, limit=100)[source]

Return information about revision sha1_git, limited to the sub-graph of all transitive parents of sha1_git_root. sha1_git_root being resolved through the lookup of a revision by origin_id, branch_name and ts.

In other words, sha1_git is an ancestor of sha1_git_root.

Parameters:
  • origin_id (-) – origin of the revision.
  • branch_name (-) – revision’s branch.
  • timestamp (-) – revision’s time frame.
  • sha1_git (-) – one of sha1_git_root’s ancestors.
  • limit (-) – limit the lookup to 100 revisions back.
Returns:

Pair of (root_revision, revision). Information on sha1_git if it is an ancestor of sha1_git_root including children leading to sha1_git_root

Raises:
  • BadInputExc in case of unknown algo_hash or bad hash.
  • NotFoundExc if either revision is not found or if sha1_git is not an ancestor of sha1_git_root.
swh.web.common.service.lookup_revision_with_context(sha1_git_root, sha1_git, limit=100)[source]

Return information about revision sha1_git, limited to the sub-graph of all transitive parents of sha1_git_root.

In other words, sha1_git is an ancestor of sha1_git_root.

Parameters:
  • sha1_git_root – latest revision. The type is either a sha1 (as a hex string) or a non-converted dict.
  • sha1_git – one of sha1_git_root’s ancestors
  • limit – limit the lookup to 100 revisions back
Returns:

Information on sha1_git if it is an ancestor of sha1_git_root including children leading to sha1_git_root

Raises:
  • BadInputExc in case of unknown algo_hash or bad hash
  • NotFoundExc if either revision is not found or if sha1_git is not an ancestor of sha1_git_root
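
The ancestry check with a bounded walk can be sketched on an in-memory parent map. Everything here is illustrative: the graph, the names children_towards_root and NotFoundExc (a local stand-in), and the breadth-first order are assumptions, since the reference does not describe how the real lookup walks the history.

```python
from collections import deque


class NotFoundExc(Exception):
    """Local stand-in for swh.web.common.exc.NotFoundExc."""


def children_towards_root(parents, sha1_git_root, sha1_git, limit=100):
    """Return the children of sha1_git found while walking back from the root.

    At most `limit` revisions are visited, so a distant ancestor can be missed.
    """
    children = {}
    visited = set()
    enqueued = {sha1_git_root}
    queue = deque([sha1_git_root])
    while queue and len(visited) < limit:
        rev = queue.popleft()
        visited.add(rev)
        for parent in parents.get(rev, ()):
            children.setdefault(parent, []).append(rev)
            if parent not in enqueued:
                enqueued.add(parent)
                queue.append(parent)
    if sha1_git not in visited:
        raise NotFoundExc("%s is not a known ancestor of %s"
                          % (sha1_git, sha1_git_root))
    return children.get(sha1_git, [])


# Hypothetical history: d merges b and c, which both descend from a.
parents = {"d": ["b", "c"], "b": ["a"], "c": ["a"], "a": []}
found = children_towards_root(parents, "d", "a")
```

With a small limit the walk stops before reaching a, which is the kind of false negative a bounded lookup can produce.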
swh.web.common.service.lookup_directory_with_revision(sha1_git, dir_path=None, with_data=False)[source]

Return information on directory pointed by revision with sha1_git. If dir_path is not provided, display top level directory. Otherwise, display the directory pointed by dir_path (if it exists).

Parameters:
  • sha1_git – revision’s hash.
  • dir_path – optional directory pointed to by that revision.
  • with_data – boolean that indicates whether to retrieve the raw data if the path resolves to a content. Defaults to False
Returns:

Information on the directory pointed to by that revision.

Raises:
  • BadInputExc in case of unknown algo_hash or bad hash.
  • NotFoundExc either if the revision is not found or the path referenced does not exist.
  • NotImplementedError in case dir_path exists but does not reference a type ‘dir’ or ‘file’.
swh.web.common.service.lookup_content(q)[source]

Lookup the content designated by q.

Parameters:q – query string of the form <hash_algo:hash>
Raises:NotFoundExc if the requested content is not found
swh.web.common.service.lookup_content_raw(q)[source]

Lookup the content defined by q.

Parameters:

q – query string of the form <hash_algo:hash>

Returns:

dict with ‘sha1’ and ‘data’ keys. data representing its raw data decoded.

Raises:
  • NotFoundExc if the requested content is not found or if the content bytes are not available in the storage
swh.web.common.service.stat_counters()[source]

Return the stat counters for Software Heritage

Returns:A dict mapping textual labels to integer values.
swh.web.common.service.lookup_origin_visits(origin_id, last_visit=None, per_page=10)[source]

Yields the visits of the origin origin_id.

Parameters:origin_id – origin to list visits for
Yields:Dictionaries of origin_visit for that origin
swh.web.common.service.lookup_origin_visit(origin_id, visit_id)[source]

Return information about visit visit_id with origin origin_id.

Parameters:
  • origin_id – origin concerned by the visit
  • visit_id – the visit identifier to lookup
Yields:

The dict origin_visit concerned

swh.web.common.service.lookup_snapshot_size(snapshot_id)[source]

Count the number of branches in the snapshot with the given id

Parameters:snapshot_id (str) – sha1 identifier of the snapshot
Returns:A dict whose keys are the target types of branches and whose values are the corresponding branch counts
Return type:dict
swh.web.common.service.lookup_snapshot(snapshot_id, branches_from='', branches_count=1000, target_types=None)[source]

Return information about a snapshot, aka the list of named branches found during a specific visit of an origin.

Parameters:
  • snapshot_id (str) – sha1 identifier of the snapshot
  • branches_from (str) – optional parameter used to skip branches whose name sorts before it
  • branches_count (int) – optional parameter used to limit the number of returned branches
  • target_types (list) – optional parameter used to filter the target types of branch to return (possible values that can be contained in that list are ‘content’, ‘directory’, ‘revision’, ‘release’, ‘snapshot’, ‘alias’)
Returns:

A dict filled with the snapshot content.
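
How the three optional parameters interact can be sketched on a plain dict of branches. The branch layout (name mapped to a dict with target_type and target), the helper name select_branches, and applying the type filter before the count limit are all assumptions of this example.

```python
def select_branches(branches, branches_from="", branches_count=1000,
                    target_types=None):
    """Return up to branches_count branches, in name order."""
    selected = {}
    for name in sorted(branches):
        if name < branches_from:
            continue  # skip names sorting before branches_from
        target = branches[name]
        if target_types is not None and target["target_type"] not in target_types:
            continue  # filtered out by target type
        selected[name] = target
        if len(selected) >= branches_count:
            break
    return selected


# Hypothetical snapshot branches.
branches = {
    "HEAD": {"target_type": "alias", "target": "refs/heads/master"},
    "refs/heads/master": {"target_type": "revision", "target": "aa" * 20},
    "refs/tags/v1.0": {"target_type": "release", "target": "bb" * 20},
}
page = select_branches(branches, branches_from="refs/", branches_count=1,
                       target_types=["revision", "release"])
```
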

swh.web.common.service.lookup_latest_origin_snapshot(origin_id, allowed_statuses=None)[source]

Return information about the latest snapshot of an origin.

Warning

At most 1000 branches contained in the snapshot will be returned for performance reasons.

Parameters:
  • origin_id – integer identifier of the origin
  • allowed_statuses – list of visit statuses considered to find the latest snapshot for the visit. For instance, allowed_statuses=['full'] will only consider visits that have successfully run to completion.
Returns:

A dict filled with the snapshot content.

swh.web.common.service.lookup_revision_through(revision, limit=100)[source]

Retrieve a revision from the criterion stored in revision dictionary.

Parameters:
  • revision –

    Dictionary of criteria to look up the revision with. Here are the supported combinations of possible values:

    • origin_id, branch_name, ts, sha1_git
    • origin_id, branch_name, ts
    • sha1_git_root, sha1_git
    • sha1_git
Returns:

None if the revision is not found or the actual revision.

swh.web.common.service.lookup_directory_through_revision(revision, path=None, limit=100, with_data=False)[source]

Retrieve the directory information from the revision.

Parameters:
  • revision – dictionary of criterion representing a revision to lookup
  • path – directory’s path to lookup.
  • limit – optional query parameter to limit the revisions log (default to 100). For now, note that this limit could impede the transitivity conclusion about sha1_git not being an ancestor of.
  • with_data – indicate to retrieve the content’s raw data if path resolves to a content.
Returns:

The directory pointed to by the revision criteria at path.

swh.web.common.service.vault_cook(obj_type, obj_id, email=None)[source]

Cook a vault bundle.

swh.web.common.service.vault_fetch(obj_type, obj_id)[source]

Fetch a vault bundle.

swh.web.common.service.vault_progress(obj_type, obj_id)[source]

Get the current progress of a vault bundle.

swh.web.common.service.diff_revision(rev_id)[source]

Get the list of file changes (insertion / deletion / modification / renaming) for a particular revision.

swh.web.common.service.get_revisions_walker(rev_walker_type, rev_start, *args, **kwargs)[source]

Utility function to instantiate a revisions walker of a given type, see swh.storage.algos.revisions_walker.

Parameters:
  • rev_walker_type (str) – the type of revisions walker to return, possible values are: committer_date, dfs, dfs_post, bfs and path
  • rev_start (str) – hexadecimal representation of a revision identifier
  • args (list) – position arguments to pass to the revisions walker constructor
  • kwargs (dict) – keyword arguments to pass to the revisions walker constructor

swh.web.common.swh_templatetags module

class swh.web.common.swh_templatetags.NoHeaderHTMLTranslator(document)[source]

Bases: docutils.writers.html4css1.HTMLTranslator

Docutils translator subclass to customize the generation of HTML from reST-formatted docstrings

visit_bullet_list(node)[source]
swh.web.common.swh_templatetags.safe_docstring_display(docstring)[source]

Utility function to htmlize reST-formatted documentation in browsable api.

Utility function for decorating api links in browsable api.

Parameters:text – text whose content matching links should be transformed into contextual API or Browse html links.
Returns:The text transformed if any link is found. The text as is otherwise.

Utility function for decorating headers links in browsable api.

Parameters:text – text whose content contains a Link header value
Returns:The text transformed with html link if any link is found. The text as is otherwise.
swh.web.common.swh_templatetags.jsonify(obj)[source]

Utility function for converting a django template variable to JSON in order to use it in script tags.

Parameters:obj – any django template context variable
Returns:JSON representation of the variable.
swh.web.common.swh_templatetags.sub(value, arg)[source]

Django template filter for subtracting two numbers

Parameters:
  • value (int/float) – the value to subtract from
  • arg (int/float) – the value to subtract
Returns:

The subtraction result

Return type:

int/float

swh.web.common.swh_templatetags.mul(value, arg)[source]

Django template filter for multiplying two numbers

Parameters:
  • value (int/float) – the value to multiply
  • arg (int/float) – the value to multiply with
Returns:

The multiplication result

Return type:

int/float

swh.web.common.swh_templatetags.key_value(dict, key)[source]

Django template filter to get a value in a dictionary.

Parameters:
  • dict (dict) – a dictionary
  • key (str) – the key whose value to look up
Returns:

The requested value in the dictionary
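
As a rough illustration of the sub, mul and key_value filters described above, the sketch below writes them as plain Python functions. It assumes the real filters are registered on a django.template.Library instance with @register.filter; that registration is left out so the snippet runs without Django, and the bodies are only the simplest behaviour consistent with the descriptions (the parameter name mapping is a stand-in for dict).

```python
# Hypothetical stand-ins for the sub, mul and key_value template filters.
# In a Django project they would typically be decorated with
# @register.filter on a django.template.Library instance (omitted here).


def sub(value, arg):
    """Return value minus arg."""
    return value - arg


def mul(value, arg):
    """Return value multiplied by arg."""
    return value * arg


def key_value(mapping, key):
    """Return the value stored under key in mapping."""
    return mapping[key]
```

In a template, filters of this kind are applied with the usual Django syntax, for instance {{ total|sub:1 }} or {{ counters|key_value:"revision" }}, where total and counters are hypothetical context variables.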

swh.web.common.swh_templatetags.origin_type_savable(origin_type)[source]

Django template filter to check if a save request can be created for a given origin type.

Parameters:origin_type (str) – the type of software origin
Returns:Whether the origin type is savable or not

swh.web.common.throttling module

class swh.web.common.throttling.SwhWebRateThrottle[source]

Bases: rest_framework.throttling.ScopedRateThrottle

Custom request rate limiter for DRF that can exempt specific networks listed in the swh-web configuration.

Requests are grouped into scopes, which makes it possible to apply different rate limits depending on the scope name and on the type of the incoming HTTP request.

To associate a scope with requests, one must add a ‘throttle_scope’ attribute when using a class based view, or call the ‘throttle_scope’ decorator when using a function based view. By default, requests do not have an associated scope and are not rate limited.

Rate limiting can also be configured according to the type of the input HTTP requests for fine grained tuning.

For instance, the following YAML configuration section sets a rate of:
  • 1 per minute for POST requests
  • 60 per minute for other request types

for the ‘swh_api’ scope while exempting those coming from the 127.0.0.0/8 ip network.

throttling:
    scopes:
        swh_api:
            limiter_rate:
                default: 60/m
                POST: 1/m
            exempted_networks:
                - 127.0.0.0/8
scope = None
get_exempted_networks(scope_name)[source]
allow_request(request, view)[source]

Implement the check to see if the request should be throttled.

On success calls throttle_success. On failure calls throttle_failure.
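
To make the configuration above more concrete, here is a minimal sketch of how the per-method rate and the exempted networks could be resolved. It is not the SwhWebRateThrottle code: the THROTTLING dictionary is a hand-written mirror of the YAML section, and parse_rate, rate_for and is_exempted are hypothetical helper names. The count/period notation follows the convention DRF uses for its rate strings.

```python
import ipaddress

# Hand-written Python mirror of the YAML configuration section above.
THROTTLING = {
    "scopes": {
        "swh_api": {
            "limiter_rate": {"default": "60/m", "POST": "1/m"},
            "exempted_networks": ["127.0.0.0/8"],
        }
    }
}

# Seconds per period, keyed by the first letter of the period name.
_PERIODS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_rate(rate):
    """Turn a string such as '60/m' into (requests, seconds)."""
    count, period = rate.split("/")
    return int(count), _PERIODS[period[0]]


def rate_for(scope_name, method):
    """Pick the rate for an HTTP method, falling back to 'default'."""
    rates = THROTTLING["scopes"][scope_name]["limiter_rate"]
    return parse_rate(rates.get(method, rates["default"]))


def is_exempted(scope_name, client_ip):
    """Tell whether client_ip belongs to an exempted network of the scope."""
    networks = THROTTLING["scopes"][scope_name].get("exempted_networks", [])
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(net) for net in networks)
```

With this sketch, a POST request in the swh_api scope resolves to one request per 60 seconds, any other method to 60 requests per 60 seconds, and a client at 127.0.0.1 is exempted altogether.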

swh.web.common.throttling.throttle_scope(scope)[source]

Decorator that allows the throttle scope of a DRF function based view to be set:

@api_view(['GET', ])
@throttle_scope('scope')
def view(request):
    ...

swh.web.common.urlsindex module

class swh.web.common.urlsindex.UrlsIndex[source]

Bases: object

Simple helper class for centralizing url patterns of a Django web application.

Derived classes should override the ‘scope’ class attribute otherwise all declared patterns will be grouped under the default one.

scope = 'default'
classmethod add_url_pattern(url_pattern, view, view_name)[source]

Class method that adds a URL pattern to the current scope.

Parameters:
  • url_pattern – regex describing a Django url
  • view – function implementing the Django view
  • view_name – name of the view used to reverse the url
classmethod get_url_patterns()[source]

Class method that returns the list of URL patterns associated with the current scope.

Returns:The list of URL patterns associated with the current scope
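
The scope mechanism can be pictured with the following sketch. It is a hypothetical re-creation of the pattern rather than the swh-web source: UrlsIndexSketch, ApiUrls and origin_view are made-up names, and patterns are stored as plain tuples where a real implementation would presumably build Django URL objects.

```python
class UrlsIndexSketch:
    """Hypothetical re-creation of the pattern, not the swh-web source."""

    # Registry shared by all subclasses: scope name -> list of patterns.
    _urlpatterns = {}
    scope = "default"

    @classmethod
    def add_url_pattern(cls, url_pattern, view, view_name):
        # A plain tuple keeps the sketch free of any Django dependency.
        cls._urlpatterns.setdefault(cls.scope, []).append(
            (url_pattern, view, view_name)
        )

    @classmethod
    def get_url_patterns(cls):
        return cls._urlpatterns.get(cls.scope, [])


class ApiUrls(UrlsIndexSketch):
    # Overriding 'scope' keeps these patterns apart from the default ones.
    scope = "api"


def origin_view(request):
    """Hypothetical view used only to populate the index."""
    return "origin"


ApiUrls.add_url_pattern(r"^origin/$", origin_view, "api-origin")
```

Because ApiUrls overrides scope, its pattern is filed under "api" and the default scope stays empty.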

swh.web.common.utils module

swh.web.common.utils.reverse(viewname, url_args=None, query_params=None, current_app=None, urlconf=None)[source]

An override of Django’s reverse function that adds support for query parameters.

Parameters:
  • viewname (str) – the name of the django view from which to compute a url
  • url_args (dict) – dictionary of url arguments indexed by their names
  • query_params (dict) – dictionary of query parameters to append to the reversed url
  • current_app (str) – the name of the django app tied to the view
  • urlconf (str) – url configuration module
Returns:

the url of the requested view with processed arguments and query parameters

Return type:

str
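
The query parameter part can be sketched independently of Django. The helper below, append_query_params, is a hypothetical name and takes a URL that has already been reversed; dropping parameters whose value is None is an assumption made for this example, not documented behaviour.

```python
from urllib.parse import urlencode


def append_query_params(url, query_params=None):
    """Append query parameters to an already reversed URL."""
    if not query_params:
        return url
    # Assumption: parameters set to None are dropped rather than
    # rendered as the literal string 'None'.
    params = {k: v for k, v in query_params.items() if v is not None}
    if not params:
        return url
    return url + "?" + urlencode(params)
```

For example, append_query_params("/browse/origin/", {"limit": 10}) gives "/browse/origin/?limit=10".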

swh.web.common.utils.datetime_to_utc(date)[source]

Returns datetime in UTC without timezone info

Parameters:date (datetime.datetime) – input datetime with timezone info
Returns:datetime in UTC without timezone info
Return type:datetime.datetime
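
A minimal sketch of this conversion using only the standard library, assuming the input is timezone-aware as stated above (datetime_to_utc_sketch and the sample values are illustrative names):

```python
from datetime import datetime, timedelta, timezone


def datetime_to_utc_sketch(date):
    """Convert an aware datetime to UTC, then drop its timezone info."""
    return date.astimezone(timezone.utc).replace(tzinfo=None)


# A datetime expressed two hours ahead of UTC.
plus_two = timezone(timedelta(hours=2))
local = datetime(2017, 5, 4, 13, 27, 13, tzinfo=plus_two)
utc_naive = datetime_to_utc_sketch(local)
```

Here utc_naive is the naive datetime 2017-05-04 11:27:13.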
swh.web.common.utils.parse_timestamp(timestamp)[source]

Given a time or timestamp (as string), parse the result as UTC datetime.

Returns:a timezone-aware datetime representing the parsed value or None if the parsing fails.
Return type:datetime.datetime
Samples:
  • 2016-01-12
  • 2016-01-12T09:19:12+0100
  • Today is January 1, 2047 at 8:21:00AM
  • 1452591542
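
The sketch below handles three of the four samples above with the standard library only. It is an approximation under stated assumptions, not the function's implementation: parse_timestamp_sketch is a hypothetical name, a date without offset is assumed to be UTC, and the free-form sentence sample would need a fuzzy date parser, which is out of scope here.

```python
from datetime import datetime, timezone

# Formats tried in order for non-numeric input.
_FORMATS = ("%Y-%m-%dT%H:%M:%S%z", "%Y-%m-%d")


def parse_timestamp_sketch(timestamp):
    """Parse a string into an aware UTC datetime, or return None."""
    if timestamp is None:
        return None
    try:
        # Purely numeric input is read as seconds since the Unix epoch.
        return datetime.fromtimestamp(int(timestamp), tz=timezone.utc)
    except ValueError:
        pass
    for fmt in _FORMATS:
        try:
            parsed = datetime.strptime(timestamp, fmt)
        except ValueError:
            continue
        if parsed.tzinfo is None:
            # Assumption: a value without offset is already in UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    return None
```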
swh.web.common.utils.shorten_path(path)[source]

Shorten the given path: for each hash present, only return the first 8 characters followed by an ellipsis
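
One way to picture this is a regular expression substitution. The sketch assumes that a hash is a run of exactly 40 or 64 lowercase hexadecimal characters (the lengths of sha1 and sha256 digests) and that the ellipsis is written as three dots; both points are assumptions, and shorten_path_sketch is a hypothetical name.

```python
import re

# Keep the first 8 hex characters of a 64- or 40-character hash.
# The longer alternative is listed first so a sha256 is not cut short.
_HASH_RE = re.compile(r"\b([0-9a-f]{8})(?:[0-9a-f]{56}|[0-9a-f]{32})\b")


def shorten_path_sketch(path):
    """Replace each hash found in path by its first 8 characters and '...'."""
    return _HASH_RE.sub(r"\1...", path)
```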

swh.web.common.utils.format_utc_iso_date(iso_date, fmt='%d %B %Y, %H:%M UTC')[source]

Converts a string representation of an ISO 8601 date to UTC and formats it into a more human readable one.

For instance, from the following input string: ‘2017-05-04T13:27:13+02:00’ the following one is returned: ‘04 May 2017, 11:27 UTC’. A custom format string may also be provided as parameter.

Parameters:
  • iso_date (str) – a string representation of an ISO 8601 date
  • fmt (str) – optional date formatting string
Returns:

a formatted string representation of the input iso date

Return type:

str
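
The example from the description can be reproduced with a short standard library sketch. It assumes the offset is written with a colon, as in the sample, and that month names are rendered in the default C locale; format_utc_iso_date_sketch is a hypothetical name.

```python
from datetime import datetime, timezone


def format_utc_iso_date_sketch(iso_date, fmt="%d %B %Y, %H:%M UTC"):
    """Convert an ISO 8601 date string to UTC and format it with fmt."""
    # fromisoformat accepts offsets written with a colon, e.g. +02:00.
    date = datetime.fromisoformat(iso_date)
    return date.astimezone(timezone.utc).strftime(fmt)
```

Calling it on '2017-05-04T13:27:13+02:00' yields '04 May 2017, 11:27 UTC', matching the documented example.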

swh.web.common.utils.gen_path_info(path)[source]

Function to generate path data navigation for use with a breadcrumb in the swh web ui.

For instance, from a path /folder1/folder2/folder3, it returns the following list:

[{'name': 'folder1', 'path': 'folder1'},
 {'name': 'folder2', 'path': 'folder1/folder2'},
 {'name': 'folder3', 'path': 'folder1/folder2/folder3'}]
Parameters:path – a filesystem path
Returns:a list of path data for navigation as illustrated above.
Return type:list
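
A possible way to build that list is sketched here; gen_path_info_sketch is a hypothetical name, and skipping empty components (leading, trailing or doubled slashes) is an assumption drawn from the example output.

```python
def gen_path_info_sketch(path):
    """Return one {'name', 'path'} entry per component of path."""
    path_info = []
    # Ignore empty components produced by leading or trailing slashes.
    parts = [part for part in path.split("/") if part]
    for i, name in enumerate(parts):
        # Each entry links to the path accumulated up to this component.
        path_info.append({"name": name, "path": "/".join(parts[: i + 1])})
    return path_info
```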
swh.web.common.utils.get_swh_persistent_id(object_type, object_id, scheme_version=1)[source]

Returns the persistent identifier for a swh object based on:

  • the object type
  • the object id
  • the swh identifiers scheme version
Parameters:
  • object_type (str) – the swh object type (content/directory/release/revision/snapshot)
  • object_id (str) – the swh object id (hexadecimal representation of its hash value)
  • scheme_version (int) – the scheme version of the swh persistent identifiers
Returns:

the swh object persistent identifier

Return type:

str

Raises:

BadInputExc – if a valid identifier cannot be generated from the provided parameters
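
For orientation, Software Heritage persistent identifiers have the form swh:<scheme_version>:<type code>:<hexadecimal id>. The sketch below assembles one under stated assumptions: swh_persistent_id_sketch is a hypothetical name, ValueError stands in for BadInputExc, and validation is reduced to checking the object type and a 40-character hexadecimal id.

```python
import re

# Three-letter codes used in identifiers for each object type.
_TYPE_CODES = {
    "content": "cnt",
    "directory": "dir",
    "release": "rel",
    "revision": "rev",
    "snapshot": "snp",
}


def swh_persistent_id_sketch(object_type, object_id, scheme_version=1):
    """Build an identifier such as 'swh:1:rev:<40 hex characters>'."""
    if object_type not in _TYPE_CODES:
        # The documented function raises BadInputExc instead.
        raise ValueError("unknown object type: %s" % object_type)
    if re.fullmatch(r"[0-9a-f]{40}", object_id) is None:
        raise ValueError("invalid object id: %s" % object_id)
    return "swh:%d:%s:%s" % (scheme_version, _TYPE_CODES[object_type], object_id)
```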

swh.web.common.utils.resolve_swh_persistent_id(swh_id, query_params=None)[source]

Try to resolve a Software Heritage persistent id into a URL for browsing the pointed object.

Parameters:
  • swh_id (str) – a Software Heritage persistent identifier
  • query_params (django.http.QueryDict) – optional dict filled with query parameters to append to the browse url
Returns:

a dict with the following keys:

  • swh_id_parsed (swh.model.identifiers.PersistentId): the parsed identifier
  • browse_url (str): the url for browsing the pointed object

Return type:

dict

Raises:

BadInputExc – if the provided identifier cannot be parsed

swh.web.common.utils.parse_rst(text, report_level=2)[source]

Parse a reStructuredText string with docutils.

Parameters:
  • text (str) – string with reStructuredText markups in it
  • report_level (int) – level of docutils report messages to print (1: info, 2: warning, 3: error, 4: severe, 5: none)
Returns:

a parsed docutils document

Return type:

docutils.nodes.document

swh.web.common.utils.get_client_ip(request)[source]

Return the client IP address from an incoming HTTP request.

Parameters:request (django.http.HttpRequest) – the incoming HTTP request
Returns:The client IP address
Return type:str
swh.web.common.utils.is_recaptcha_valid(request, recaptcha_response)[source]

Verify if the response for Google reCAPTCHA is valid.

Parameters:
  • request (django.http.HttpRequest) – the incoming HTTP request
  • recaptcha_response (str) – the reCAPTCHA response
Returns:

Whether the reCAPTCHA response is valid or not

Return type:

bool

swh.web.common.utils.context_processor(request)[source]

Django context processor used to inject variables in all swh-web templates.

Module contents