Skip to main content

How to backup and restore Infrahub

This guide shows you how to create comprehensive backups of your Infrahub deployment and restore them when needed. You'll learn to backup the Neo4j graph database, artifact storage, and task management data to ensure complete data recovery capabilities.

Prerequisites

  • Running Infrahub deployment (Docker Compose or Kubernetes)
  • Administrative access to the Neo4j database
  • Access to the artifact storage location (S3 or local filesystem)
  • Sufficient storage space for backup files
  • For cluster deployments: Understanding of your cluster topology

Create a full backup

Step 1: Install the backup tool

Install the infrahub-backup CLI tool:

curl https://infrahub.opsmill.io/ops/$(uname -s)/$(uname -m)/infrahub-backup -o infrahub-backup
chmod +x infrahub-backup

Step 2: Backup the databases

Create a backup of your running Infrahub instance:

./infrahub-backup create

The tool automatically:

  • Checks for running tasks before starting (use --force to skip)
  • Creates a timestamped backup archive (for example, infrahub_backup_20250129_153045.tar.gz)
  • Backs up Neo4j database with metadata (configurable with --neo4jmetadata)
  • Backs up Prefect/PostgreSQL task management database
  • Calculates SHA256 checksums for integrity verification
note

Artifact storage backup is planned for future versions and must currently be handled separately.

Step 3: Backup the artifact store

If using S3 for artifact storage, use AWS CLI or your preferred S3 backup tool:

# Sync S3 bucket to local backup directory
aws s3 sync s3://your-infrahub-bucket /backup/artifacts/

Restore from backup

Step 1: Prepare the environment

Ensure Infrahub services are running before starting the restore process. You can start from a scratch/blank deployment.

Restore from a backup archive:

./infrahub-backup restore infrahub_backup_20250129_153045.tar.gz

The tool automatically:

  • Validates backup integrity using checksums
  • Wipes cache and message queue data
  • Stops application containers
  • Restores PostgreSQL database first
  • Restores Neo4j database with metadata
  • Restarts all services in correct order

Step 2: Restore the databases

This is automatically handled by infrahub-backup.

Step 3: Restore the artifact store

# Restore S3 bucket from backup
aws s3 sync /backup/artifacts/ s3://your-infrahub-bucket

Step 4: Restart Infrahub services

This is automatically handled by infrahub-backup.

Validation

Verify your restoration was successful:

  1. Check database status:

    docker compose exec -T database cypher-shell -u neo4j \
    -c "SHOW DATABASES;"

    The Neo4j database should show as "online".

  2. Verify Infrahub API:

    curl http://localhost:8000/api/schema/summary

    You should receive a valid schema response.

  3. Check task manager:

    docker compose logs task-manager --tail 50

    Logs should show normal operation without errors.

  4. Test artifact retrieval: Access the Infrahub UI and verify that stored artifacts (transforms, queries) are accessible.

Backup and restore Neo4j clusters Enterprise Edition

If you're running Infrahub with a Neo4j cluster, follow these steps to backup from one node and restore to another while maintaining cluster integrity.

Prerequisites for cluster operations

  • Neo4j cluster with at least 3 nodes
  • Administrative access to all cluster nodes
  • Understanding of your cluster topology (leader and follower nodes)
caution

Always run backup and restore commands as the neo4j user inside containers to avoid permission issues with data files.

Example cluster topology

NodeRole
databaseLeader
database-core2Follower
database-core3Follower

Step 1: Create backup from a follower node

docker exec -it -u neo4j infrahub-database-core2-1 bash
mkdir -p backups
neo4j-admin database backup --to-path=backups/ neo4j
ls backups
# Output should include:
# neo4j-2025-03-24T19-57-18.backup

Step 2: Transfer backup to target node

# Copy from source container to local
docker cp infrahub-database-core2-1:/var/lib/neo4j/backups/neo4j-2025-03-24T19-57-18.backup .

# Copy from local to target container
docker cp neo4j-2025-03-24T19-57-18.backup \
infrahub-database-core3-1:/var/lib/neo4j/

Step 3: Drop database cluster-wide

Connect to any cluster node:

cypher-shell -d system -u neo4j
DROP DATABASE neo4j;
SHOW SERVERS;

drop database

Step 4: Clean target node data

Connect to the target container:

docker exec -it -u neo4j infrahub-database-core3-1 bash

Remove any existing data to avoid corruption:

rm -rf /data/databases/neo4j
rm -rf /data/transactions/neo4j

Then restart the container to ensure a clean state:

docker restart infrahub-database-core3-1

Step 5: Restore backup on target node

Reconnect to the container:

docker exec -it -u neo4j infrahub-database-core3-1 bash

Run the restore command:

neo4j-admin database restore \
--from-path=/var/lib/neo4j/neo4j-2025-03-24T19-57-18.backup neo4j

Restore database

Step 6: Identify seed instance id

Connect via Cypher shell (on the system database):

cypher-shell -d system -u neo4j

Run:

SHOW SERVERS;

Note the serverId for your target node (example: d05fce79-e63e-485a-9ce7-1abbf9d18fce).

Seed database

Step 7: Recreate database from seed

Run the following Cypher command:

CREATE DATABASE neo4j
TOPOLOGY 3 PRIMARIES
OPTIONS {
existingData: 'use',
existingDataSeedInstance: 'd05fce79-e63e-485a-9ce7-1abbf9d18fce'
};

Choose seed database

Step 8: Verify cluster sync

Check that the database is coming online:

SHOW DATABASES;

Online database

Then validate cluster sync status:

SHOW SERVERS;

Status sync servers

All nodes should eventually show the Neo4j database as online.

Troubleshooting
  • If nodes show as dirty or offline, check logs and verify /data/databases/neo4j/neostore exists
  • The CREATE DATABASE ... OPTIONS { existingData: 'use' } command is required to register restored data with the cluster

Restore cluster backup to standalone instance

If you need to analyze data from a production cluster in an isolated environment, follow these steps to restore a cluster backup to a standalone Neo4j instance.

Step 1: Create cluster backup

Create a backup from any cluster node:

neo4j-admin database backup --to-path=backups/ neo4j
# Resulting file: neo4j-2025-03-24T19-57-18.backup

Step 2: Transfer backup to standalone instance

docker cp neo4j-2025-03-24T19-57-18.backup \
infrahub-database-1:/var/lib/neo4j/

Step 3: Prepare standalone instance

Connect to the container:

docker exec -it -u neo4j infrahub-database-1 bash

Clean any existing Neo4j database (optional but recommended):

rm -rf /data/databases/neo4j
rm -rf /data/transactions/neo4j

Drop the Neo4j Database

cypher-shell -d system -u neo4j
DROP DATABASE neo4j;
SHOW SERVERS;

Choose seed database

Step 4: Restore the backup

Restore the backup file:

neo4j-admin database restore \
--from-path=/var/lib/neo4j/neo4j-2025-03-24T19-57-18.backup neo4j

Step 5: Create the database

Run the following Cypher command:

CREATE DATABASE neo4j

Step 6: Verify the status

Check that the database is coming online:

SHOW DATABASES;

Choose seed database

Then validate database status:

SHOW SERVERS;

Choose seed database

info

This process restores only data, not cluster roles, replication, or configuration settings.

Advanced usage

Using the Python-based backup utility

Legacy Tool

The Python-based utility (utilities/db_backup) is still available in the main Infrahub repository but is being replaced by infrahub-backup. Use it only if infrahub-backup doesn't meet your specific requirements.

Use non-default ports

If your deployment uses custom ports, specify them during backup and restore operations:

# Backup with custom backup port
python -m utilities.db_backup neo4j backup \
--database-backup-port=12345 \
/infrahub_backups

# Restore with custom Cypher port
python -m utilities.db_backup neo4j restore \
/infrahub_backups \
--database-cypher-port=9876

Run backup tool via Docker

If you don't have the repository cloned locally, run the backup tool directly from the Infrahub Docker image:

docker run --rm \
-v /var/run/docker.sock:/var/run/docker.sock \
registry.opsmill.io/opsmill/infrahub \
python -m utilities.db_backup