Roadmap 2021#
(Version 1.0, last modified 05/04/2021)
This document provides an overview of the technical roadmap of Software Heritage for 2021.
The Kanban board can be viewed on our forge.
Collect#
Faster and more reliable save code now#
tags: openscience
task: T3082
lead: ardumont
effort: 1 PM
Includes work:
set up dedicated fast track pipeline for save code now
improve save code now monitoring (user and admin)
Improve deposit integration, management and display#
tags: openscience
task: T3128
lead: moranegg
effort: 3 PM
Includes work:
Save forge now#
tags: expand
task: T1538
lead: ardumont
effort: 1 PM - tooling & process
Admin tooling for takedown notices (URLs)#
tags: contract, compliance
task: T3087
lead: anlambert
effort: 2 PM
Includes work:
admin interface
journal of operations
web page with list of accepted TDN
Preserve#
Complete and up-to-date archive copy on S3#
tags: stability
task: T3085
lead: douardda
effort: 1 PM
Includes work:
live update of the objects
regular dumps of the (anonymized) Merkle graph
Scale-out graph storage in production#
tags: scalability
task: T2214
lead: vlorentz
effort: 3 PM
Includes work:
Cassandra: T1892 (maybe with external help)
Scale-out object storage prototype#
tags: stability, scalability, externalized
task: T3054
lead: dachary
effort: 3 PM
Cold storage archive in Vitam instance at CINES#
tags: contract
task: T3113
lead: douardda
effort: 4 PM
Mirrors#
tags: stability, scalability
depends: scale-out object storage
task: T3116
lead: douardda
effort: 3 PM
Includes work:
get at least one mirror up and running
SWHID v2#
tags: stability, evolution, datamodel
task: T3134
lead: zack
effort: 6 PM
Includes work:
complete the spec on paper
align with new git hashes
including migration plan from v1
understand impact on internal microservice architecture
keep correspondence with v1 (there may be multiple v2 for one v1!)
reviewed by crypto experts
Integrity#
tags: stability, reliability
task: T3135
lead: olasd
effort: 2 PM
Includes work:
Organize#
Collect extrinsic metadata#
tags: compliance
task: T2202
lead: vlorentz
effort: 3 PM
Includes work:
working pipeline
at least 1 instance running ClearlyDefined
forge metadata (info on the main page, etc.)
Provenance in production#
tags: contract, feature
task: T3112
lead: zack
effort: 6 PM
Prior art#
tags: compliance
depends: provenance | swh-graph in production
task: T3136
lead: zack
effort: 3 PM
Includes work:
pinpoint origin of selected source code artifacts
possibly integrated with swh-scanner
Measurement#
Efficient archive counters (HyperLogLog)#
tags: measure, comm
task: T2912
lead: vsellier
effort: 1 PM
Distribution of origins by forge#
tags: measure, comm
task: T3127
lead: anlambert
effort: 1 PM
Stats on regular crawling by forge#
tags: measure, comm
task: T1363
lead: olasd
effort: 1 PM
Includes work:
lag, periodicity, # of changes since last visit, etc.
View deposits per user (admin and user)#
tags: measure, support
task: T3128
lead: ardumont
effort: 1 PM
Reliable user-level monitoring of services#
tags: stability
task: T3129
lead: vsellier
effort: 2 PM
Includes work:
status.softwareheritage.org
Documentation#
Write use case-specific documentation#
tags: comm, web, doc
task: T2234
lead: moranegg
effort: 2 PM
Includes FAQ for:
users
ambassadors
Improve quality of code documentation#
tags: doc, externalized
task: TODO
lead: TBD
effort: 2 PM
Includes work:
doc(string) audit
team training about doc writing
Documentation strategy#
tags: doc
task: T2624
lead: moranegg
effort: 1 PM
Includes work:
respective role of docs.s.o, wiki, www.s.o, etc.
Community#
Tooling for fundraising campaigns#
tags: web
task: T3077
lead: anlambert
effort: 1 PM
Dedicated page to list status of supported listers/loaders#
tags: web, doc
task: T3117
lead: anlambert
effort: 1 PM
Includes work:
design web page
process to keep the page up to date
make clearly visible and link to Sloan subgrants
Tooling#
Migration to GitLab#
tags: forge, development
task: T2225
lead: olasd
effort: 1 PM