Elasticsearch

Software Heritage uses an Elasticsearch cluster for long-term log storage.

Hardware implementation

  • 3x Xeon E3v6 (Skylake) servers, each with 32GB of RAM and 3x 2TB hard drives

  • 2x gigabit switches

List of nodes

  • esnode1.internal.softwareheritage.org.

  • esnode2.internal.softwareheritage.org.

  • esnode3.internal.softwareheritage.org.
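A minimal sketch of how the node list above could be used to check cluster health from a script. It assumes the nodes expose the Elasticsearch HTTP API unencrypted on the default port 9200, which this page does not state; the helper names health_url and cluster_status are hypothetical. The _cluster/health endpoint is part of the standard Elasticsearch REST API.

```python
import json
from urllib.request import urlopen

# Node names as listed above (fully qualified, with the trailing root dot).
NODES = [
    "esnode1.internal.softwareheritage.org.",
    "esnode2.internal.softwareheritage.org.",
    "esnode3.internal.softwareheritage.org.",
]


def health_url(node, port=9200):
    """Build the cluster health URL for one node.

    Assumption: plain HTTP on port 9200 (the Elasticsearch default).
    The trailing root dot of the FQDN is dropped for the URL.
    """
    host = node.rstrip(".")
    return f"http://{host}:{port}/_cluster/health"


def cluster_status(node, timeout=5):
    """Return the cluster status ("green", "yellow" or "red") as seen by one node."""
    with urlopen(health_url(node), timeout=timeout) as resp:
        return json.load(resp)["status"]
```

Calling cluster_status(NODES[0]) from a host on the internal network would return the status string; any node can answer, since health is a cluster-wide property.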

Architecture diagram

digraph {
	label="Elasticsearch relations"

	generic_host1 -> journalbeat
	generic_host2 -> journalbeat
	worker_host1 -> journalbeat
	generic_host1[label="generic_host", shape="box"]
	generic_host2[label="generic_host", shape="box"]

	logstash -> elasticsearch_cluster
	elasticsearch_cluster[shape="box",color="green"]

	moma_apache -> filebeat
	filebeat -> apache_logs_indices
	apache_logs_indices -> logstash
	moma_apache[shape="box"]

	journalbeat -> system_logs_indices
	journalbeat -> swh_workers_indices

	swh_scheduler -> swh_tasks_indices
	swh_tasks_indices -> elasticsearch_cluster

	system_logs_indices -> logstash
	swh_workers_indices -> logstash
	worker_host1[label="worker host", shape="box"]


	logstash[label="logstash\nlogstash0_vm"]
	elasticsearch_cluster -> kibana0

	apache_logs_indices[shape="note"]
	system_logs_indices[shape="note"]
	swh_tasks_indices[shape="note", label="swh-tasks indices"]
	swh_workers_indices[shape="note", label="swh_workers indices\nsystemd_unit = swh-worker "]

	{rank="same"; elasticsearch_cluster; kibana0}
	{rank="same"; swh_tasks_indices; apache_logs_indices}
}
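To make the data flow in the diagram easier to follow, the sketch below transcribes its edges into a Python dictionary and lists every route from a given source to the cluster. The edge list is copied from the diagram; the function name paths is only illustrative.

```python
# Edges transcribed from the "Elasticsearch relations" diagram above.
EDGES = {
    "generic_host1": ["journalbeat"],
    "generic_host2": ["journalbeat"],
    "worker_host1": ["journalbeat"],
    "moma_apache": ["filebeat"],
    "filebeat": ["apache_logs_indices"],
    "journalbeat": ["system_logs_indices", "swh_workers_indices"],
    "swh_scheduler": ["swh_tasks_indices"],
    "apache_logs_indices": ["logstash"],
    "system_logs_indices": ["logstash"],
    "swh_workers_indices": ["logstash"],
    "swh_tasks_indices": ["elasticsearch_cluster"],
    "logstash": ["elasticsearch_cluster"],
    "elasticsearch_cluster": ["kibana0"],
}


def paths(src, dst, graph=EDGES):
    """Return every path from src to dst (the graph is acyclic)."""
    if src == dst:
        return [[dst]]
    found = []
    for nxt in graph.get(src, []):
        for tail in paths(nxt, dst, graph):
            found.append([src] + tail)
    return found


for source in ("moma_apache", "worker_host1", "swh_scheduler"):
    for route in paths(source, "elasticsearch_cluster"):
        print(" -> ".join(route))
```

The output shows that Apache and journald logs all pass through logstash, while swh_scheduler data reaches the cluster through the swh-tasks indices without going through logstash.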

Per-node storage

  • one root hard drive with a small filesystem

  • 3x 2TB hard drives in RAID0

  • an XFS filesystem on the RAID0 volume, mounted on /srv/elasticsearch
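As a rough worked example of what this layout provides, the sketch below computes the raw capacity of the data volume. RAID0 stripes data across all member drives without redundancy, so capacity is the sum of the drives and the loss of a single drive loses the whole volume on that node. The figures are raw decimal terabytes, before filesystem overhead and any index replication; the function name is hypothetical.

```python
DRIVES_PER_NODE = 3   # data drives in the RAID0 array
DRIVE_SIZE_TB = 2     # size of each data drive
NODE_COUNT = 3        # esnode1..esnode3


def raid0_capacity_tb(drives, size_tb):
    """Raw RAID0 capacity: striping adds up all member drives, no parity."""
    return drives * size_tb


per_node_tb = raid0_capacity_tb(DRIVES_PER_NODE, DRIVE_SIZE_TB)
cluster_raw_tb = per_node_tb * NODE_COUNT

print(f"per node: {per_node_tb} TB raw, cluster: {cluster_raw_tb} TB raw")
```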

Remark

The root hard drive of each Elasticsearch node also holds a dedicated ext4 filesystem for Kafka, mounted on /srv/kafka.