swh.lister.cgit.lister module

class swh.lister.cgit.lister.CGitLister(url=None, instance=None, override_config=None)[source]

Bases: swh.lister.core.lister_base.ListerBase

Lister class for CGit repositories.

This lister will retrieve the list of published git repositories by parsing the HTML page(s) of the index retrieved at url.

For each git repository found, a request is made to the URL listed for it in this index to gather the published “Clone” URLs, one of which is used as the origin URL for that git repository.

If several “Clone” URLs are provided, the http/https one is preferred, if any; otherwise the lister falls back to the first one.
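The selection rule above can be sketched as follows. This is a minimal illustration, not the lister's actual code: the function name `pick_origin_url` and the example URLs are hypothetical.

```python
from typing import List, Optional
from urllib.parse import urlparse


def pick_origin_url(clone_urls: List[str]) -> Optional[str]:
    """Return the preferred clone URL, or None if the list is empty.

    Hypothetical helper: the name and signature are assumptions made
    for illustration, not part of swh.lister.
    """
    for url in clone_urls:
        # Prefer the first URL served over http or https.
        if urlparse(url).scheme in ("http", "https"):
            return url
    # No http(s) URL: fall back to the first one listed, if any.
    return clone_urls[0] if clone_urls else None
```

With `["git://git.example.org/foo.git", "https://git.example.org/foo.git"]` as input, the sketch returns the https URL; with only non-http URLs, it returns the first entry.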

A loader task is created for each git repository:

    Type: load-git
    Policy: recurring


MODEL

alias of swh.lister.cgit.models.CGitModel

DEFAULT_URL = 'https://git.savannah.gnu.org/cgit/'
LISTER_NAME = 'cgit'
url_prefix_present = True
run() → Dict[str, str][source]
get_repos() → Generator[str, None, None][source]

Generate the git ‘project’ URLs found on the current CGit server.
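As a rough illustration of how project URLs might be extracted from an index page, the sketch below parses a small hand-written HTML sample. It is not the lister's implementation: it uses the standard library's `html.parser` rather than BeautifulSoup so that it runs without extra dependencies, and it assumes that repository links sit in table cells whose class is `toplevel-repo` or `sublevel-repo`. The names `_RepoLinkParser`, `iter_repo_urls` and `SAMPLE_INDEX`, and the example host, are hypothetical.

```python
from html.parser import HTMLParser
from typing import Iterator, List
from urllib.parse import urljoin

# Hand-written sample resembling a CGit index table (assumed markup).
SAMPLE_INDEX = """
<table class="list">
  <tr><td class="toplevel-repo"><a href="/cgit/foo.git/">foo.git</a></td>
      <td><a href="/cgit/foo.git/">Foo project</a></td></tr>
  <tr><td class="sublevel-repo"><a href="/cgit/bar.git/">bar.git</a></td></tr>
</table>
"""


class _RepoLinkParser(HTMLParser):
    """Collect the href of links inside <td> cells whose class ends in '-repo'."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []
        self._in_repo_cell = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "td":
            # Track whether the current cell is a repository-name cell.
            self._in_repo_cell = (attrs.get("class") or "").endswith("-repo")
        elif tag == "a" and self._in_repo_cell and attrs.get("href"):
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_repo_cell = False


def iter_repo_urls(index_html: str, base_url: str) -> Iterator[str]:
    """Yield absolute project URLs found in the given index HTML."""
    parser = _RepoLinkParser()
    parser.feed(index_html)
    for href in parser.hrefs:
        # Links in the index may be relative, so resolve them against the base.
        yield urljoin(base_url, href)
```

Calling `list(iter_repo_urls(SAMPLE_INDEX, "https://git.example.org/cgit/"))` yields one absolute URL per repository row; the description link in the second cell of the first row is skipped because that cell has no `-repo` class.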

build_model(repo_url: str) → Optional[Dict[str, Any]][source]

Given the URL of a git repo project page on a CGit server, return the repo description (dict) suitable for insertion in the db.

get_and_parse(url: str) → bs4.BeautifulSoup[source]

Fetch the given URL and parse the retrieved HTML using BeautifulSoup.