- swh.lister.nuget.lister module
- swh.lister.nuget.tasks module
The NuGet lister discovers origins from nuget.org. NuGet is the package manager for .NET. As .NET packages mostly contain binaries, we only keep track of packages that have a DVCS repository (Git, SVN, Mercurial…) URL usable as an origin.
The nuget.org/packages page lists 301,206 packages as of September 2022.
Origins retrieving strategy
Nuget.org provides an HTTP API with several endpoints to discover and list packages and versions.
The recommended way to retrieve all packages is to use the catalog API endpoint. It provides a catalog index endpoint that lists all available pages. We then iterate over these pages to get their content.
The lister is incremental and follows a cursor principle, based on the value of
commitTimeStamp from the catalog index endpoint. It retrieves only pages for which
commitTimeStamp is greater than lister.state.last_listing_date.
Each page returns a list of packages, which is the data of the HTTP response.
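The incremental page selection described above can be sketched as follows. This is a minimal illustration and not the lister's actual code: the helper names parse_commit_timestamp and pages_to_fetch are hypothetical, the sample index is made-up data shaped like a catalog index (an "items" list whose entries carry "@id" and "commitTimeStamp"), and timestamps are assumed to be UTC values ending in "Z".

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def parse_commit_timestamp(value: str) -> datetime:
    """Parse a UTC timestamp such as '2022-08-15T08:30:00.1234567Z' (assumed format)."""
    head, _, fraction = value.rstrip("Z").partition(".")
    # strptime's %f accepts at most 6 digits, so pad or truncate the fraction.
    fraction = (fraction + "000000")[:6]
    parsed = datetime.strptime(f"{head}.{fraction}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def pages_to_fetch(
    index: Dict[str, Any], last_listing_date: Optional[datetime]
) -> List[str]:
    """Return the URLs of pages committed after the cursor (hypothetical helper)."""
    urls = []
    for item in index["items"]:
        committed = parse_commit_timestamp(item["commitTimeStamp"])
        # On a first run there is no cursor yet, so every page is kept.
        if last_listing_date is None or committed > last_listing_date:
            urls.append(item["@id"])
    return urls


# Made-up sample data shaped like a catalog index response.
catalog_index = {
    "items": [
        {
            "@id": "https://api.nuget.org/v3/catalog0/page0.json",
            "commitTimeStamp": "2022-08-15T08:30:00.1234567Z",
        },
        {
            "@id": "https://api.nuget.org/v3/catalog0/page1.json",
            "commitTimeStamp": "2022-09-10T12:00:00Z",
        },
    ]
}

# Stands in for lister.state.last_listing_date.
last_listing_date = datetime(2022, 9, 1, tzinfo=timezone.utc)
print(pages_to_fetch(catalog_index, last_listing_date))
```

With this sample data only the second page is selected, because its commitTimeStamp is later than the cursor. Passing None as the cursor selects every page, which corresponds to a full first listing.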
Origins from page
For each entry in a page listing, we get related metadata through its package metadata HTTP API endpoint. It returns URIs for linked archives that contain binaries, not the original source code. Our strategy is then to get a related Git repository.
- We use another endpoint for each package to get its package manifest, a .nuspec file (XML
data), which may contain a Git repository URL. If we find one, it is used as the origin.
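Extracting a repository URL from a manifest could look like the sketch below. It is an illustration under assumptions, not the lister's implementation: the function name repository_url_from_nuspec and the sample manifest are hypothetical, and the code assumes the manifest declares its repository in a repository element with type and url attributes. Element names are matched without their XML namespace, since the namespace may differ between manifests.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def repository_url_from_nuspec(nuspec_xml: str) -> Optional[str]:
    """Return the Git repository URL declared in a .nuspec manifest, if any."""
    root = ET.fromstring(nuspec_xml)
    for element in root.iter():
        if not isinstance(element.tag, str):
            continue
        # Drop the '{namespace}' prefix so the lookup works for any schema version.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name != "repository":
            continue
        url = element.get("url")
        if url and element.get("type", "").lower() == "git":
            return url
    return None


# Made-up manifest used only for illustration.
SAMPLE_NUSPEC = """<package xmlns="http://schemas.microsoft.com/packaging/2013/05/nuspec.xsd">
  <metadata>
    <id>Example.Package</id>
    <version>1.0.0</version>
    <repository type="git" url="https://github.com/example/example.git" />
  </metadata>
</package>"""

print(repository_url_from_nuspec(SAMPLE_NUSPEC))
```

A manifest without a matching repository element yields None, in which case no origin would be recorded for that package.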
Activate the virtualenv and run from within the swh-lister directory:
pytest -s -vv --log-cli-level=DEBUG swh/lister/nuget/tests
Testing with Docker
Change directory to swh/docker, then launch the Docker environment:
docker compose up -d
Then schedule a nuget listing task:
docker compose exec swh-scheduler swh scheduler task add -p oneshot list-nuget
You can follow the lister execution by displaying the logs of the swh-lister service:
docker compose logs -f swh-lister