Setup on a PostgreSQL instance
This tutorial guides you through the steps required to set up the Software Heritage Graph Dataset in a PostgreSQL database.
PostgreSQL local setup
You need access to a running PostgreSQL instance to load the dataset. This section explains how to set up PostgreSQL for the first time.
If you already have a PostgreSQL server running on your machine, you can skip to the next section.
For Ubuntu and Debian:
sudo apt install postgresql
For Arch Linux:
sudo pacman -S --needed postgresql
sudo -u postgres initdb -D '/var/lib/postgres/data'
sudo systemctl enable --now postgresql
Once PostgreSQL is running, you also need a user that can create databases and run queries. The easiest way is to create a PostgreSQL account with the same name as your system username and allow it to create databases:
sudo -u postgres createuser --createdb $USER
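To confirm that the account was created with the right privilege, you can ask the server directly. The following is a minimal sketch, assuming the server accepts local connections from your system user and that the default postgres maintenance database exists; the helper name check_role is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints the role you connect as and whether it may
# create databases, using the pg_roles system view.
check_role() {
    # -A: unaligned output, -t: rows only, -c: run one command and exit.
    psql -d postgres -At -c \
        "SELECT rolname, rolcreatedb FROM pg_roles WHERE rolname = current_user;"
}

if command -v psql >/dev/null 2>&1; then
    check_role || echo "Could not connect: is the server running and the role created?" >&2
else
    echo "psql not found: install the PostgreSQL client first" >&2
fi
```

If the role exists and can create databases, the second field of the printed line should be t.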
Retrieving the dataset
You need to download the dataset in SQL format. Use the following command on your machine, after making sure that it has enough available space for the dataset you chose:
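Before starting the download, the free-space check mentioned above can be sketched as follows. This only illustrates the check and is not the download command; the variables TARGET_DIR and REQUIRED_GIB and their default values are assumptions to replace with the directory you download into and the size of the dataset you chose.

```shell
#!/bin/sh
# Hypothetical defaults: adjust both values to your situation.
target_dir="${TARGET_DIR:-.}"
required_gib="${REQUIRED_GIB:-1}"

# df -Pk prints POSIX-format output in 1024-byte blocks; on the second line,
# the fourth column is the space available on the file system of target_dir.
available_kib=$(df -Pk "$target_dir" | awk 'NR == 2 { print $4 }')
available_gib=$((available_kib / 1024 / 1024))

if [ "$available_gib" -ge "$required_gib" ]; then
    echo "OK: ${available_gib} GiB available in ${target_dir}"
else
    echo "Not enough space: ${available_gib} GiB available, ${required_gib} GiB required" >&2
fi
```

Keep in mind that loading the dataset into PostgreSQL needs space as well, in addition to the downloaded files.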
Loading the dataset
Once you have retrieved the dataset of your choice, create a database that will contain it, and load the dataset into it:
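As a rough sketch of these two steps, assuming the dataset can be loaded by executing a single SQL file: the database name swh_graph, the file name dataset.sql, and the run helper are hypothetical, so substitute the names that match your download. The script only prints the commands by default; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Hypothetical names: replace them with your own database name and the
# SQL file that comes with the dataset you downloaded.
db_name="${DB_NAME:-swh_graph}"
sql_file="${SQL_FILE:-dataset.sql}"

# Print each command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create an empty database owned by the current user.
run createdb "$db_name"

# Execute the SQL file in that database, stopping at the first error
# instead of continuing with a partially loaded dataset.
run psql -v ON_ERROR_STOP=1 -d "$db_name" -f "$sql_file"
```

Printing the commands first is a cheap way to check the names before starting an import that may run for a long time.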
You can now run SQL queries on your database. Run psql <database_name> to start an interactive PostgreSQL console.