Citation workflow and architecture
Quick reminder on metadata objects
Metadata in Software Heritage is explained in detail in the document Metadata workflow and architecture. Two types of metadata are useful for citation: intrinsic and extrinsic. Each type can come from two sources: the archive itself (raw metadata) and the indexer (indexed metadata).
For each metadata type and metadata source, metadata can be extracted either for a specific object (snapshot, release, revision, directory, content), using its SWHID, or for a repository, using its URL (origin). In the latter case, the metadata returned is that of the latest version (latest visit snapshot) of the repository root directory on the main branch.
Citation use cases
ID | As a | I can | so that
---|---|---|---
UC1 (v1, v2) | Researcher | retrieve a citation or BibTeX export for a software artifact directly on the SWH interface | the software will be cited with correct attribution
UC2 (v1, v2) | Publisher (Episciences) | retrieve a citation or BibTeX export for a software artifact programmatically | I can expose the BibTeX
UC3 | Aggregator (OpenAIRE) | retrieve intrinsic metadata from SWH programmatically | the software record will be enriched
Citation v1: data flow
In this version, Software Heritage can generate a citation in BibTeX format from the raw intrinsic metadata available in the archive. The raw intrinsic metadata used for the citation is a codemeta.json file found in the repository or, failing that, a citation.cff file.
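The preference order described above (codemeta.json first, citation.cff as a fallback) can be sketched as follows. This is a minimal illustration, not the swh.web code: the function `pick_citation_source` is hypothetical, a directory is assumed to be given as a mapping from file names to raw contents, and the case-insensitive match (so that the conventional `CITATION.cff` spelling is also found) is an assumption.

```python
from typing import Mapping, Optional, Tuple

# Preference order: codemeta.json first, citation.cff as a fallback.
PREFERRED_FILES = ("codemeta.json", "citation.cff")


def pick_citation_source(
    root_entries: Mapping[str, bytes],
) -> Optional[Tuple[str, bytes]]:
    """Return (file name, raw content) of the metadata file to cite from.

    `root_entries` maps file names in the repository root directory to their
    raw content (hypothetical representation). Names are matched
    case-insensitively, an assumption so that CITATION.cff is also found.
    """
    by_lower_name = {name.lower(): name for name in root_entries}
    for wanted in PREFERRED_FILES:
        actual = by_lower_name.get(wanted)
        if actual is not None:
            return actual, root_entries[actual]
    # Neither file exists: there is no raw intrinsic metadata to cite from.
    return None


entries = {"README.md": b"...", "CITATION.cff": b"cff-version: 1.2.0\n"}
print(pick_citation_source(entries))  # ('CITATION.cff', b'cff-version: 1.2.0\n')
```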
As with metadata extraction:
- When given an origin URL, the citation is generated from the metadata of the latest version of the repository root directory on the main branch.
- When given a SWHID of type snapshot, release or revision, the citation is generated from the metadata of the repository root directory associated with that version.
- When given a directory SWHID qualified with an anchor (explained in the document SoftWare Heritage persistent IDentifiers (SWHIDs)), the citation is generated from the metadata of the repository root directory associated with the anchor version.
  Warning: if no anchor is specified, the citation is instead generated directly from the metadata found in that directory.
- When given a content SWHID qualified with an anchor, the citation is generated from the metadata of the repository root directory. If no anchor is specified, the citation cannot be generated, for lack of information.
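The rules above can be summarised as a small decision function. It is a sketch, not the swh.web implementation: the name `citation_metadata_source` and its returned description strings are hypothetical, and only the `anchor` qualifier is inspected, as in the text; other context qualifiers (such as `origin` or `visit`) are ignored here.

```python
from typing import Optional


def citation_metadata_source(target: str) -> Optional[str]:
    """Describe which directory's metadata a citation would be built from.

    `target` is either an origin URL or a SWHID, possibly followed by
    qualifiers such as ";anchor=swh:1:rev:...". Returns None when no
    citation can be generated.
    """
    if not target.startswith("swh:"):
        # Origin URL: root directory of the latest visit, main branch.
        return "root directory of the latest visit snapshot (main branch)"

    core, *qualifiers = target.split(";")
    object_type = core.split(":")[2]  # snp, rel, rev, dir or cnt
    params = dict(q.split("=", 1) for q in qualifiers)
    anchor = params.get("anchor")

    if object_type in ("snp", "rel", "rev"):
        return f"root directory of {core}"
    if object_type == "dir":
        # Without an anchor, the directory's own metadata is used.
        return f"root directory of {anchor}" if anchor else f"directory {core} itself"
    if object_type == "cnt":
        # Without an anchor there is not enough context to cite a content.
        return f"root directory of {anchor}" if anchor else None
    raise ValueError(f"unsupported SWHID object type: {object_type}")
```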
Citation v1: architecture
Software Heritage provides a web API (through swh.web) to generate a citation, given an origin URL or a qualified SWHID.
The corresponding API endpoints are:
- /api/1/raw-intrinsic-metadata/citation/origin/ (example: /api/1/raw-intrinsic-metadata/citation/origin/?citation_format=bibtex&origin_url=https://github.com/rdicosmo/parmap)
- /api/1/raw-intrinsic-metadata/citation/swhid/ (example: /api/1/raw-intrinsic-metadata/citation/swhid/?citation_format=bibtex&target_swhid=swh:1:dir:2dc0f462d191524530f5612d2935851505af41dd;origin=https://github.com/rdicosmo/parmap;visit=swh:1:snp:2128ed4f25f2d7ae7c8b7950a611d69cf4429063/)
Currently, the only allowed citation format is BibTeX (citation_format=bibtex).
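For programmatic use (UC2), a client mainly needs to build the query strings shown in the examples. The sketch below assumes the endpoints are served by the public archive under `https://archive.softwareheritage.org/api/1`; the helper names are hypothetical. `urlencode` percent-encodes the origin URL and the SWHID (including the `;` and `=` of its qualifiers) so that each travels as a single query parameter. The response body format is not described in this document, so `fetch_citation` simply returns it as text.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public deployment of the swh.web API.
API_ROOT = "https://archive.softwareheritage.org/api/1"
CITATION_PATH = "/raw-intrinsic-metadata/citation"


def citation_url_for_origin(origin_url: str, citation_format: str = "bibtex") -> str:
    """URL of the citation endpoint for an origin (repository URL)."""
    query = urlencode({"citation_format": citation_format, "origin_url": origin_url})
    return f"{API_ROOT}{CITATION_PATH}/origin/?{query}"


def citation_url_for_swhid(swhid: str, citation_format: str = "bibtex") -> str:
    """URL of the citation endpoint for a (possibly qualified) SWHID."""
    query = urlencode({"citation_format": citation_format, "target_swhid": swhid})
    return f"{API_ROOT}{CITATION_PATH}/swhid/?{query}"


def fetch_citation(url: str) -> str:
    """Perform the HTTP GET and return the body as text (requires network)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


print(citation_url_for_origin("https://github.com/rdicosmo/parmap"))
# https://archive.softwareheritage.org/api/1/raw-intrinsic-metadata/citation/origin/?citation_format=bibtex&origin_url=https%3A%2F%2Fgithub.com%2Frdicosmo%2Fparmap
```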
This API relies on intermediate utility methods:
- in swh.web, to retrieve raw intrinsic metadata given an origin URL or a qualified SWHID; these return the original codemeta.json and citation.cff files;
- in swh.indexer, to convert a codemeta.json or a citation.cff file into a BibTeX citation.
CodeMeta/citation.cff to BibTeX mapping
A citation.cff file is first converted into a codemeta.json document. The CFF to CodeMeta mapping can be found in the codemeta repository.
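To give an idea of what the CFF to CodeMeta conversion involves, the sketch below maps a few common fields. It is a partial, illustrative subset written for this example; the authoritative correspondences are those of the crosswalk in the codemeta repository. The input is assumed to be the citation.cff content already parsed from YAML into a dictionary with string values, and `cff_to_codemeta` is a hypothetical name, not the swh.indexer function.

```python
from typing import Any, Dict, List

# Illustrative subset of CFF key -> CodeMeta property correspondences.
CFF_TO_CODEMETA = {
    "title": "name",
    "version": "version",
    "date-released": "datePublished",
    "license": "license",
    "repository-code": "codeRepository",
    "abstract": "description",
    "keywords": "keywords",
}


def cff_to_codemeta(cff: Dict[str, Any]) -> Dict[str, Any]:
    """Convert an already-parsed citation.cff mapping into a CodeMeta dict."""
    doc: Dict[str, Any] = {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
    }
    # Copy the simple one-to-one fields that are present.
    for cff_key, codemeta_key in CFF_TO_CODEMETA.items():
        if cff_key in cff:
            doc[codemeta_key] = cff[cff_key]
    # CFF person entries become schema.org Person objects.
    authors: List[Dict[str, Any]] = []
    for person in cff.get("authors", []):
        author: Dict[str, Any] = {"@type": "Person"}
        if "given-names" in person:
            author["givenName"] = person["given-names"]
        if "family-names" in person:
            author["familyName"] = person["family-names"]
        if "affiliation" in person:
            author["affiliation"] = {"@type": "Organization", "name": person["affiliation"]}
        authors.append(author)
    if authors:
        doc["author"] = authors
    return doc
```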
The CodeMeta to BibTeX mapping used by the converter is currently under review.
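Since the CodeMeta to BibTeX mapping is still under review, the following is only a sketch of how a converter could produce an entry shaped like the generated BibTeX example in this document. The field choices (author, organization taken from author affiliations, license, date/year/month from datePublished, repository from codeRepository, title from name, swhid) are inferred from that example; `codemeta_to_bibtex` is a hypothetical helper, not the swh.indexer converter, and values are not escaped for BibTeX special characters.

```python
from typing import Any, Dict, List, Optional


def _bibtex_name(person: Dict[str, Any]) -> str:
    """BibTeX "Family, Given" form, falling back to a plain name."""
    family, given = person.get("familyName"), person.get("givenName")
    if family and given:
        return f"{family}, {given}"
    return family or given or person.get("name", "")


def codemeta_to_bibtex(doc: Dict[str, Any], entry_type: str = "software",
                       swhid: Optional[str] = None) -> str:
    """Render a CodeMeta dict as a BibTeX entry (no escaping is performed)."""
    authors = doc.get("author", [])
    if isinstance(authors, dict):
        authors = [authors]
    fields: Dict[str, str] = {}
    if authors:
        fields["author"] = " and ".join(_bibtex_name(a) for a in authors)
        # Collect distinct affiliation names, keeping first-seen order.
        orgs: List[str] = []
        for a in authors:
            aff = a.get("affiliation")
            name = aff.get("name") if isinstance(aff, dict) else aff
            if name and name not in orgs:
                orgs.append(name)
        if orgs:
            fields["organization"] = " and ".join(orgs)
    if "license" in doc:
        fields["license"] = doc["license"]
    date = doc.get("datePublished")  # assumed to be an ISO "YYYY-MM-DD" string
    if date:
        fields["date"] = date
        fields["year"] = date[:4]
        if len(date) >= 7:
            fields["month"] = date[5:7]
    if "codeRepository" in doc:
        fields["repository"] = doc["codeRepository"]
    if "name" in doc:
        fields["title"] = doc["name"]
    if swhid:
        fields["swhid"] = swhid
    body = ",\n".join(f'    {key} = "{value}"' for key, value in fields.items())
    # The citation key is left as a placeholder, as in the generated example.
    return f"@{entry_type}{{REPLACEME,\n{body}\n}}"
```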
Note on BibTeX @software, @softwareversion and @codefragment usage
The generated BibTeX citation can be of type @software, @softwareversion or @codefragment, according to the following rule:
- If no SWHID is specified:
  - if a version is specified, the type is @softwareversion;
  - otherwise, it is @software.
- If a SWHID is specified:
  - if it is of type snapshot, the type is @software;
  - if it is of type release, revision or directory, the type is @softwareversion;
  - if it is of type content, the type is @codefragment.
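The entry-type rule translates directly into code. In this sketch (the name `bibtex_entry_type` is hypothetical), the object type is read from the core SWHID, i.e. the part before any `;` qualifier, and the type is returned without the leading `@`.

```python
from typing import Optional


def bibtex_entry_type(swhid: Optional[str] = None, version: Optional[str] = None) -> str:
    """Pick software, softwareversion or codefragment per the rule above."""
    if swhid is None:
        # No SWHID: the presence of a version decides.
        return "softwareversion" if version else "software"
    # Object type of the core SWHID, ignoring qualifiers after ";".
    object_type = swhid.split(";", 1)[0].split(":")[2]
    if object_type == "snp":
        return "software"
    if object_type in ("rel", "rev", "dir"):
        return "softwareversion"
    if object_type == "cnt":
        return "codefragment"
    raise ValueError(f"unexpected SWHID object type: {object_type}")
```

For instance, the snapshot SWHID of the generated BibTeX example yields `software`, consistent with its `@software` entry type.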
A generated BibTeX example
@software{REPLACEME,
author = "Di Cosmo, Roberto and Danelutto, Marco",
organization = "Inria and University Paris Diderot and University of Pisa",
license = "LGPL-2.0-only",
date = "2011-07-18",
year = "2011",
month = "07",
repository = "git+https://github.com/rdicosmo/parmap.git",
title = "Parmap",
swhid = "swh:1:snp:01b2cc89f4c423f1bda4757edd86ae4013b919b0;origin=https://github.com/rdicosmo/parmap"
}
Citation v1: UI
Citation should be available in the web app through a new Citation tab under the Permalinks tab, which opens the Permalinks/Citation box.
Future
In the current v1, citations are generated only from raw intrinsic metadata, i.e. a codemeta.json or citation.cff file.
Figure: Metadata types and sources for citation generation v1 (quadrant chart with a raw-to-indexed x-axis and an extrinsic-to-intrinsic y-axis; codemeta.json and citation.cff both sit in the raw, intrinsic quadrant).
The next versions of the citation feature should include:
- New supported citation formats.
- Citation styles?
- On the API/backend side:
  - v2: generating citations from indexed intrinsic and extrinsic metadata (merging behaviour to be defined).
  - v3: authorities.