Database backup and restore command line interface
Infrahub provides a command-line interface tool to help backup and restore a full neo4j database. It is designed to be run from almost any host machine and has a much smaller set of requirements than running the full Infrahub application. We use docker containers to run commands against the Neo4j database to make this tool easier to install.
Please see this guide for concrete examples of running backup and restore commands.
Requirements
Machine running the CLI command
- Docker installed
- Python 3.8 or later
- Python docker SDK
Backup only
- Can connect to Infrahub database container (over port 6362 by default)
Restore only
- Can connect to the Docker daemon running the Infrahub database container.
- Database credentials specified in the
NEO4J_AUTH
environment variable in the format (username/password)
Infrahub database container
- Must be running a Neo4j Enterprise image
- Must have the following environment variables set
- "NEO4J_ACCEPT_LICENSE_AGREEMENT=yes"
- "NEO4J_server_backup_enabled=true"
# Can use a port other than 6362, but this is the default
- "NEO4J_server_backup_listen__address=0.0.0.0:6362"
Command-line tool
The command-line tool lives here. To use the tool locally, you can either clone the Infrahub repository or copy the content of the linked file into a local .py
file. If you choose the clone the repository, you can run the tool with either of the following commands
python -m utilities.db_backup neo4j [backup|restore] ...
python utilities/db_backup/__main__.py [backup|restore] ...
If you copied the content of the file locally, only the second method will work.
Backups
As long as your Infrahub database container meets the following requirements, you can run a backup at any time.
- Uses a
neo4j<version>-enterprise
image - Exposes port 6362 (or another port specified in the
NEO4J_server_backup_listen__address
environment variable)
You do not need to schedule downtime. The machine on which you are running this backup command must either be able to connect to the backup listening port of the Infrahub database container or the Docker daemon of the machine running the Infrahub database container. The --database-url
argument can be used to specify the host name or IP address of the machine hosting the Infrahub database container. If --database-url
is not set, this utility will attempt to connect to a Docker container with the label infrahub_role=database
using a Docker network.
If you choose to use a port other than 6362
for your database backup, you must specify the port both in the NEO4J_server_backup_listen__address
environment variable of the Infrahub database container and in the --database-backup-port
argument of this backup command.
Incremental backups
The Neo4j backup command, which is what this tool uses, can take an incremental backup and will track the ID of the latest transaction so that no data will be lost even if changes are being made during the backup process.
You must specify a directory to store the backups as an argument to the db_backup neo4j backup
command. This directory can contain previous .backup
files for your Infrahub Neo4j database. In fact, using the same directory for multiple backups over time allows the Neo4j backup
command to run more quickly because it will only take an incremental backup that covers the transactions from the end of the latest .backup
files to the latest transaction on the database for each Neo4j database.
By default, our backup command aggregates incremental backups into a single file each time it runs. If you prefer to keep separate timestamped backup files, you can disable the aggregation using the --no-aggregate
flag.
Backup in detail
This is extra detail on precisely how our backup utility runs for those who are interested. You do not need to read this to use the tool.
We pretty closely follow the guidance from Neo4j on how to run their backup tool through Docker.
- Start a helper Docker container based on the
neo4j-admin
image. This container uses a volume that links a directory within the container to thebackup_directory
specified in the command. - If the utility finds a running Docker container with a label of
infrahub_role=database
, then we assume that this is the Infrahub database container and connect the helper container to any networks that the database container is using. - The helper container runs the
neo4j-admin database backup
command with connection details specifying the address of the Infrahub database container. - By default, the helper container then runs the
neo4j-admin database aggregate-backup
command on thebackup_directory
to consolidate all backups present into a single backup per Neo4j database. This can be turned off using the--no-aggregate
flag on the command line. - The helper container is stopped and removed. This can be turned off using the
--keep-helper-container
option on the command line.
Restore
The db_backup neo4j restore
command is disruptive. It will stop and start all Neo4j databases (except for system
) on your Infrahub database container.
Unlike the backup
command, the restore
command requires a connection to the Docker daemon running the Infrahub database container because it needs to execute commands directly on that container. This means that you either need to execute the restore
command
- on the same machine that is running your Infrahub database container
- on a machine that can connect to the Docker daemon running the Infrahub database container
The first option is probably easier. The second option can be achieved by ensuring the dockerd
process on the remote machine is running on a TCP port that your local machine can access and that your local machine has the DOCKER_HOST
environment variable set to point at the Docker daemon port on the remote machine.
Backup files
You must specify the directory containing the backup files you wish to restore on the Infrahub database in the backup_directory
argument to the db_backup neo4j restore
command. This directory should contain .backup
files produced by the db_backup neo4j backup
command. The files must be of the format <database_name>-2024-02-07T22-12-16.backup
. Given the default database configuration, a backup will produce two .backup
files: one for the neo4j
database and one for the system
database.
The backup
utility produces a .backup
file for the system
database, but the restore
utility will not attempt to restore a system
database backup. This should not cause any problems for you unless your Infrahub database has suffered a catastrophic failure. Neo4j provides documentation on this process here, but it is beyond the scope of what this utility is intended to do.
Downtime
Restoring a database requires stopping the database, running the restore, starting the database, and then loading the metadata (user, roles, etc.) associated with the database. For a small database (less than 20,000 nodes), this whole process can take less than 30 seconds, but it could be much longer for a larger database. The Infrahub web server won't be able to handle any requests while the database is being restored and it could cause the restore to fail, so we recommend shutting down the Infrahub web application while restoring a database.
Restore in detail
This is a more detailed look at precisely how the restore
command works. It makes use of Neo4j's database restore command and uses a helper Docker container to manage connecting to the Infrahub database container.
- Start a helper Docker container based on the
neo4j-admin
image. This container needs to connect to the Docker volume that the Infrahub database container uses to store its data. It also needs to be able to connect to the Infrahub database container's cypher port (7687
by default). - For each
.backup
file in thebackup_directory
path specified in the command line arguments (excluding anysystem
backups) do the following:- Stop the database about to be restored using the
STOP DATABASE
cypher command. - Run the
neo4j-admin database restore
command. - Create the Neo4j database if it does not exist using the
CREATE DATABASE
cypher command. - Restart the database using the
START DATABASE
cypher command. - The
neo4j-admin database restore
command generates arestore_metadata.cypher
file that is used to restore various metadata about the restored database, such as users and roles. We attempt to execute this file automatically so that you do not need to deal with it manually.
- Stop the database about to be restored using the
- Stop and remove the helper container.