Roadmap 2021
(Version 1.0, last modified 05/04/2021)
This document provides an overview of the technical roadmap of Software Heritage for 2021.
The corresponding Kanban board is available on our forge.
Collect
Faster and more reliable save code now
tags: openscience
task: T3082
lead: ardumont
effort: 1 PM
Includes work:
set up dedicated fast track pipeline for save code now
improve save code now monitoring (user and admin)
Improve deposit integration, management and display
tags: openscience
task: T3128
lead: moranegg
effort: 3 PM
Includes work:
Save forge now
tags: expand
task: T1538
lead: ardumont
effort: 1 PM - tooling & process
Admin tooling for takedown notices (URLs)
tags: contract, compliance
task: T3087
lead: anlambert
effort: 2 PM
Includes work:
admin interface
journal of operations
web page with list of accepted TDN
Preserve
Complete and up-to-date archive copy on S3
tags: stability
task: T3085
lead: douardda
effort: 1 PM
Includes work:
live update of the objects
regular dumps of the (anonymized) Merkle graph
Scale-out graph storage in production
tags: scalability
task: T2214
lead: vlorentz
effort: 3 PM
Includes work:
Cassandra: T1892 (maybe with external help)
Scale-out object storage prototype
tags: stability, scalability, externalized
task: T3054
lead: dachary
effort: 3 PM
Cold storage archive in Vitam instance at CINES
tags: contract
task: T3113
lead: douardda
effort: 4 PM
Mirrors
tags: stability, scalability
depends: scale-out object storage
task: T3116
lead: douardda
effort: 3 PM
Includes work:
get at least one mirror up and running
SWHID v2
tags: stability, evolution, datamodel
task: T3134
lead: zack
effort: 6 PM
Includes work:
complete on-paper spec
align with new git hashes
including migration plan from v1
understand impact on internal microservice architecture
keep correspondence with v1 (there may be multiple v2 for one v1!)
reviewed by crypto experts
Integrity
tags: stability, reliability
task: T3135
lead: olasd
effort: 2 PM
Includes work:
Organize
Collect extrinsic metadata
tags: compliance
task: T2202
lead: vlorentz
effort: 3 PM
Includes work:
working pipeline
at least 1 instance running ClearlyDefined
forge metadata (info on the main page, etc.)
Provenance in production
tags: contract, feature
task: T3112
lead: zack
effort: 6 PM
Prior art
tags: compliance
depends: provenance | swh-graph in production
task: T3136
lead: zack
effort: 3 PM
Includes work:
pinpoint origin of selected source code artifacts
possibly integrated with swh-scanner
Measurement
Efficient archive counters (HyperLogLog)
tags: measure, comm
task: T2912
lead: vsellier
effort: 1 PM
Distribution of origins by forge
tags: measure, comm
task: T3127
lead: anlambert
effort: 1 PM
Stats on regular crawling by forge
tags: measure, comm
task: T1363
lead: olasd
effort: 1 PM
Includes work:
lag, periodicity, # of changes since last visit, etc.
View deposits per user (admin and user)
tags: measure, support
task: T3128
lead: ardumont
effort: 1 PM
Reliable user-level monitoring of services
tags: stability
task: T3129
lead: vsellier
effort: 2 PM
Includes work:
status.softwareheritage.org
Documentation
Write use case-specific documentation
tags: comm, web, doc
task: T2234
lead: moranegg
effort: 2 PM
Includes FAQ for:
users
ambassadors
Improve quality of code documentation
tags: doc, externalized
task: TODO
lead: TBD
effort: 2 PM
Includes work:
doc(string) audit
team training about doc writing
Documentation strategy
tags: doc
task: T2624
lead: moranegg
effort: 1 PM
Includes work:
respective role of docs.s.o, wiki, www.s.o, etc.
Community
Tooling for fundraising campaigns
tags: web
task: T3077
lead: anlambert
effort: 1 PM
Dedicated page to list status of supported listers/loaders
tags: web, doc
task: T3117
lead: anlambert
effort: 1 PM
Includes work:
design web page
process to keep it up to date
make clearly visible and link to Sloan subgrants
Tooling
Migration to GitLab
tags: forge, development
task: T2225
lead: olasd
effort: 1 PM