Understanding Git repositories in Infrahub
Infrahub integrates with external Git repositories to enable comprehensive version-controlled infrastructure management. This integration allows you to store and manage transformations, queries, generators, and other automation resources alongside your infrastructure data. This topic explains how Infrahub's Git integration works, the available repository types, and the architectural decisions behind these features.
Why Git integration matters
Modern infrastructure management requires both structured data (network configurations, device inventories) and code (automation scripts, templates, validation logic). By integrating with Git repositories, Infrahub bridges these two worlds, enabling:
- Unified version control: Both your infrastructure data and automation code follow the same branching and merging workflows
- Familiar developer workflows: Teams can leverage existing Git practices and tooling
- Separation of concerns: Infrastructure data resides in Infrahub's graph database while code remains in Git
- Enhanced collaboration: Multiple teams can work on different aspects of infrastructure without conflicts
Repository types and their design rationale
Infrahub supports two distinct repository connection types, each designed for specific use cases and operational requirements.
Repository: full bidirectional integration
The standard Repository type provides comprehensive, bidirectional integration between Infrahub and Git. This design choice fulfills the need for truly unified version control across infrastructure data and code.
Design principles:
- Branch parity: Every branch in the external repository will have a corresponding branch in Infrahub
- Automatic synchronization: Changes from Git are pulled automatically via background tasks
- Bidirectional flow: Changes can flow from Git to Infrahub and from Infrahub back to Git (via Proposed Changes)
How it works internally:
When you create a Repository connection, Infrahub:
- Clones the repository to local storage
- Creates Git worktrees for each branch to enable parallel operations
- Tracks all remote branches and creates corresponding Infrahub branches
- Runs background synchronization tasks every few seconds to detect changes
- Parses the
.infrahub.yml
file to determine which resources to import
The bidirectional nature means that when you merge a Proposed Change between two Infrahub branches that are both linked to Git branches, Infrahub creates a merge commit and automatically pushes it back to the external repository.
Read-only Repository: controlled unidirectional flow
The Read-only Repository type offers a simpler, unidirectional integration designed for scenarios where you need to consume resources from Git without modifying the external repository.
Design principles:
- Single reference tracking: Links one specific Git ref (branch, tag, or commit)
- Manual synchronization: Updates only happen when the
ref
property is changed - Protection of external resources: Guarantees no modifications to the external repository
How it works internally:
Read-only repositories:
- Track a single Git reference rather than all branches
- Pull changes only when the
ref
property is updated - Import resources according to the
.infrahub.yml
configuration - Maintain a simpler state model without complex branch synchronization logic
Architecture and data flow
Understanding how Infrahub manages Git repositories helps clarify the system's capabilities and constraints.
Component responsibilities
The architecture carefully separates concerns between components:
- Infrahub API Server: Never directly connects to Git repositories; handles GraphQL mutations and queries
- Task Workers: Manage all Git operations including cloning, fetching, and pushing
- Message Broker: Coordinates communication between API server and task workers
- File System Storage: Maintains local clones and worktrees in
git.repositories_directory
This separation ensures that Git operations don't block API requests and enables horizontal scaling of task workers for improved performance.
Synchronization mechanisms
For standard Repositories:
- Background tasks poll for changes multiple times per minute
- Each sync operation fetches from origin and compares local vs. remote state
- New branches automatically trigger Infrahub branch creation
- Updated branches trigger re-import of changed resources
For Read-only Repositories:
- No automatic synchronization occurs
- Updates require manual modification of the
ref
property - Each update fetches the specified commit and imports resources according to configuration
The .infrahub.yml
configuration file
The .infrahub.yml
file acts as a manifest, declaring which resources Infrahub should import from the repository. This declarative approach enables:
- Selective importing of repository contents based on specific needs
- Clear boundaries between Infrahub-managed and external resources
- Version-specific configurations (different branches can have different
.infrahub.yml
files)
For detailed information about .infrahub.yml
configuration and syntax, see Infrahub configuration file.
Status tracking and health monitoring
Repositories track their operational state through three distinct status dimensions, each serving a different monitoring purpose.
Admin status: lifecycle management
Tracks whether Infrahub is actively using the repository:
- Active: Repository is in production use
- Inactive: Repository is disabled or unused
- Staging: Repository is pending activation via Proposed Change
Operational status: connectivity health
Monitors the connection between Infrahub and the Git server:
- Online: Successfully connected and authenticated
- Error (credential): Authentication failure
- Error (connection): Network or connectivity issue
- Error: Other initialization or configuration problems
- Unknown: Status cannot be determined
Sync status: data consistency
Indicates the state of data synchronization:
- In sync: All repository resources are current in Infrahub
- Syncing: Synchronization operation in progress
- Error (import): Failed to import resources from repository
- Unknown: Synchronization state cannot be determined
Repository actions and operations
Beyond basic synchronization, repositories support administrative actions for troubleshooting and maintenance:
- Check connectivity: Validates network access and authentication without modifying data
- Reimport last commit: Forces re-processing of the current commit without fetching new changes, useful for recovering from import errors
Design trade-offs and constraints
The current architecture makes several important trade-offs:
Branch hierarchy limitations
Infrahub currently supports only single-level branch hierarchy (branches created from main). This simplification:
- Reduces complexity in conflict resolution
- Ensures predictable merge paths
- May limit advanced Git workflows
Synchronization frequency
The polling-based synchronization approach:
- Provides near-real-time updates (within seconds)
- Avoids webhook complexity and firewall issues
- May introduce slight delays compared to push-based systems
Integration with Infrahub workflows
Repository integration enhances several Infrahub features:
Proposed changes
When merging Proposed Changes where both source and destination branches are linked to Git:
- Infrahub creates a merge commit combining both branches
- The merge commit is pushed to the destination branch in Git
- This maintains consistency between Infrahub and Git history
Version control benefits
The integration provides:
- Full audit trail of who changed what and when
- Ability to roll back both data and code changes
- Parallel development through branching
- Code review workflows via Git pull requests
Further reading
- How to connect external Git repositories - Practical guide for setting up repositories
- Infrahub configuration file - Detailed
.infrahub.yml
reference - Proposed Changes - Understanding the merge and review process
- Version control in Infrahub - Broader context of Infrahub's version control features