swh.lister.core.indexing_lister module

class swh.lister.core.indexing_lister.IndexingLister(override_config=None)[source]

Bases: swh.lister.core.lister_base.ListerBase

Intermediate lister class for any service that follows this pattern:

  • The service must report at least one stable unique identifier, known herein as the UID value, for every listed repository.

  • If the service splits the list of repositories into sublists, it must report at least one stable and sorted index identifier for every listed repository, known herein as the indexable value, which can be used as part of the service endpoint query to request a sublist beginning from that index. This might be the UID if the UID is monotonic.

  • Client sends a request to list repositories starting from a given index.

  • Client receives structured (json/xml/etc) response with information about a sequential series of repositories starting from that index and, if necessary/available, some indication of the URL or index for fetching the next series of repository data.

See swh.lister.core.lister_base.ListerBase for more details.

This class cannot be instantiated directly. To create a new Lister for a source code listing service that follows the model described above, you must subclass this class and provide the required overrides, in addition to any unmet implementation/override requirements of this class’s base (see the parent class and member docstrings for details).

Required Overrides:

def get_next_target_from_response

flush_packet_db = 20

Number of iterations between write flushes of lister repositories to the db (see fn:run).

default_min_bound = ''

Default initialization value for the minimum boundary index to use when undefined (see fn:run).

abstract get_next_target_from_response(response)[source]

Find the next server endpoint identifier given the entire response.

Implementation of this method depends on the server API spec and the shape of the network response object returned by the transport_request method.

Parameters

response (transport response) – response page from the server

Returns

index of next page, possibly extracted from a next href url
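As an illustration only, here is a minimal sketch of how such an override might extract the next index from a "next" link. It assumes the response exposes a `links` mapping shaped like `requests.Response.links`, and that the service encodes the index in a `since` query parameter; both are assumptions, not part of this module, and the stand-in response objects are hypothetical.

```python
from types import SimpleNamespace
from urllib.parse import parse_qs, urlparse


def get_next_target_from_response(response):
    """Return the next index from a 'next' link, or None when there is none."""
    # Assumption: `response.links` maps a relation name to {'url': ...},
    # the shape that requests.Response.links exposes for Link headers.
    next_link = response.links.get("next")
    if next_link is None:
        return None
    # Assumption: the service encodes the index in a `since` query parameter.
    query = parse_qs(urlparse(next_link["url"]).query)
    values = query.get("since")
    return int(values[0]) if values else None


# Hypothetical stand-in responses, used only to exercise the function.
page = SimpleNamespace(links={"next": {"url": "https://example.org/repos?since=369"}})
last_page = SimpleNamespace(links={})
```

In a real subclass this would be a method taking `self`, and the parsing would follow whatever the target service's API actually returns.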

filter_before_inject(models_list)[source]

Overrides ListerBase.filter_before_inject

Bounds query results by this Lister’s set max_index.
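The bounding step can be pictured roughly as follows. This is a standalone sketch rather than the method's actual body: the `indexable` dictionary key, the free-function form, and the inclusive upper bound are assumptions made for illustration.

```python
def filter_by_max_index(models_list, max_index=None):
    """Keep only the models whose indexable value does not exceed max_index."""
    if max_index is None:
        # No upper bound configured: keep everything.
        return list(models_list)
    # Assumption: each model is a dict with an 'indexable' key and the
    # upper bound is inclusive.
    return [m for m in models_list if m["indexable"] <= max_index]


models = [{"indexable": 1}, {"indexable": 5}, {"indexable": 9}]
bounded = filter_by_max_index(models, max_index=5)
```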

db_query_range(start, end)[source]

Look in the db for a range of repositories with indexable values in the range [start, end].

Parameters
  • start (model indexable type) – start of desired indexable range

  • end (model indexable type) – end of desired indexable range

Returns

a list of sqlalchemy.ext.declarative.declarative_base objects with indexable values within the given range

db_partition_indices(partition_size)[source]

Describe an index-space compartmentalization of the db table in equal-sized chunks. This is used to describe min and max bounds for parallelizing fetch tasks.

Parameters

partition_size (int) – desired size to make each partition

Returns

a list of tuples (begin, end) of indexable values that declare approximately equal-sized ranges of existing repos
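To make the shape of the return value concrete, the following sketch partitions an in-memory, already sorted list of indexable values. It is an assumption-laden simplification: the real method works against the db table, and the function name and list input here are hypothetical.

```python
def partition_indices(sorted_indexables, partition_size):
    """Split sorted indexable values into (begin, end) bounds per chunk."""
    if partition_size <= 0:
        raise ValueError("partition_size must be positive")
    bounds = []
    for start in range(0, len(sorted_indexables), partition_size):
        chunk = sorted_indexables[start:start + partition_size]
        # Each tuple records the first and last indexable value of the chunk.
        bounds.append((chunk[0], chunk[-1]))
    return bounds


ranges = partition_indices([1, 2, 5, 8, 13, 21, 34], partition_size=3)
```

Each resulting tuple could then be handed to a separate fetch task as its min and max bounds.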

db_first_index()[source]

Look in the db for the smallest indexable value

Returns

the smallest indexable value of all repos in the db

db_last_index()[source]

Look in the db for the largest indexable value

Returns

the largest indexable value of all repos in the db

disable_deleted_repo_tasks(start, end, keep_these)[source]

Disable tasks for repos that no longer exist between start and end.

Parameters
  • start – beginning of range to disable

  • end – end of range to disable

  • keep_these (uid list) – do not disable repos with uids in this list

run(min_bound=None, max_bound=None)[source]

Main entry function. Sequentially fetches repository data from the service according to the basic outline in the class docstring, continually fetching sublists until either there is no next index reference given or the given next index is greater than the desired max_bound.

Parameters
  • min_bound (indexable type) – optional index to start from

  • max_bound (indexable type) – optional index to stop at

Returns

nothing
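The control flow described for run can be sketched as a plain loop. This is not the actual implementation: `fetch_page`, `flush`, and the in-memory `PAGES` service are hypothetical stand-ins, the default minimum bound is 0 here only because the demo uses integer indices, and the function returns what it collected purely so the behaviour can be inspected (the documented method returns nothing).

```python
def run_listing(fetch_page, min_bound=None, max_bound=None,
                default_min_bound=0, flush_every=20, flush=lambda: None):
    """Fetch sublists until there is no next index or it exceeds max_bound."""
    index = default_min_bound if min_bound is None else min_bound
    collected = []
    iterations = 0
    while True:
        # Hypothetical transport call: returns (repos, next_index or None).
        repos, next_index = fetch_page(index)
        collected.extend(repos)
        iterations += 1
        if iterations % flush_every == 0:
            flush()  # periodic write flush, in the spirit of flush_packet_db
        if next_index is None:
            break  # the service gave no next index reference
        if max_bound is not None and next_index > max_bound:
            break  # the next index is past the requested upper bound
        index = next_index
    flush()  # final flush of whatever is still pending
    return collected


# Hypothetical in-memory service: index -> (repos on that page, next index).
PAGES = {0: (["a", "b"], 2), 2: (["c", "d"], 4), 4: (["e"], None)}


def fake_fetch(index):
    return PAGES[index]
```

Calling `run_listing(fake_fetch, max_bound=3)` stops after the second page, because the next index reported by the service (4) is greater than the bound.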

class swh.lister.core.indexing_lister.IndexingHttpLister(url=None, override_config=None)[source]

Bases: swh.lister.core.lister_transports.ListerHttpTransport, swh.lister.core.indexing_lister.IndexingLister

Convenience class for ensuring the right lookup and init order when combining IndexingLister and ListerHttpTransport.