GitOps represents a fundamental shift in how we manage infrastructure and applications. By using Git as the single source of truth for declarative infrastructure and applications, teams can leverage familiar development workflows for operations. After implementing GitOps across multiple production environments, I've learned what works, what doesn't, and how to adopt these practices successfully.
What is GitOps?
GitOps is a set of practices where Git repositories contain the declarative descriptions of infrastructure and applications, and automated processes ensure the actual state matches the desired state. The key principles are:
- Declarative: The whole system is described declaratively, not imperatively
- Versioned: All changes are tracked in version control
- Pulled: Agents in the cluster pull changes instead of a CI/CD system pushing them
- Automated: Desired state reconciliation happens automatically
This might sound abstract, so let's look at a concrete example.
Traditional CI/CD vs GitOps
Traditional CI/CD pipeline:
# .gitlab-ci.yml
deploy:
  stage: deploy
  script:
    - kubectl config use-context production
    - kubectl apply -f manifests/
    - kubectl set image deployment/app app=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
  only:
    - main
Problems with this approach:
- Cluster credentials in CI/CD system
- No audit trail of what's deployed
- Difficult to rollback
- State drift between Git and cluster
- No visibility into deployment status
GitOps approach:
# Repository structure
config-repo/
├── apps/
│   └── myapp/
│       ├── deployment.yaml
│       ├── service.yaml
│       └── kustomization.yaml
├── infrastructure/
│   └── namespaces/
│       └── production.yaml
└── system/
    └── flux/
        └── flux.yaml
The deployment manifest:
# apps/myapp/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: registry.example.com/myapp:v1.2.3
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
An operator (like Flux or Argo CD) watches this repository and applies changes automatically.
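To make the pull model less abstract, here is a toy reconciliation loop in shell. It is only a sketch of the idea, not how Flux or Argo CD are implemented: two plain directories stand in for the Git checkout (desired state) and the cluster (actual state), and the `reconcile` function name is made up for illustration.

```shell
# Toy reconciler: "desired" plays the Git checkout, "actual" plays the cluster.
# Both are plain directories here (an assumption made purely for illustration).
reconcile() {
  desired=$1
  actual=$2
  # Apply: copy every desired file that is missing or different in actual.
  for f in "$desired"/*; do
    [ -e "$f" ] || continue
    name=$(basename "$f")
    if ! cmp -s "$f" "$actual/$name" 2>/dev/null; then
      cp "$f" "$actual/$name"
      echo "applied $name"
    fi
  done
  # Prune: remove anything in actual that is no longer declared in desired.
  for f in "$actual"/*; do
    [ -e "$f" ] || continue
    name=$(basename "$f")
    if [ ! -e "$desired/$name" ]; then
      rm "$f"
      echo "pruned $name"
    fi
  done
}

# Demo with throwaway directories.
desired_dir=$(mktemp -d)
actual_dir=$(mktemp -d)
echo "replicas: 3" > "$desired_dir/deployment.yaml"
echo "replicas: 1" > "$actual_dir/deployment.yaml"    # drifted from Git
echo "kind: Service" > "$actual_dir/old-service.yaml" # deleted from Git
reconcile "$desired_dir" "$actual_dir"
```

Running `reconcile` a second time on the same directories prints nothing. That is the property you want from a reconciler: once actual matches desired, there is nothing left to do.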
Implementing GitOps with Flux
Flux is a popular GitOps operator. Installation:
# Install Flux CLI
curl -s https://fluxcd.io/install.sh | sudo bash
# Bootstrap Flux (expects a GitHub personal access token in GITHUB_TOKEN)
flux bootstrap github \
  --owner=myorg \
  --repository=fleet-infra \
  --branch=main \
  --path=clusters/production \
  --personal
This creates a repository structure and installs Flux in your cluster. Flux now watches this repository for changes.
Define a GitRepository source:
# clusters/production/sources/app-repo.yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: app-repo
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/myorg/app-config
  ref:
    branch: main
Create a Kustomization to deploy:
# clusters/production/apps/myapp.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: myapp
  namespace: flux-system
spec:
  interval: 5m
  path: ./apps/myapp
  prune: true
  sourceRef:
    kind: GitRepository
    name: app-repo
  healthChecks:
    - apiVersion: apps/v1
      kind: Deployment
      name: myapp
      namespace: production
Now, any changes to the app-repo repository are automatically applied to the cluster.
Implementing GitOps with Argo CD
Argo CD provides a more opinionated approach with a rich UI:
# Install Argo CD
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
# Access UI
kubectl port-forward svc/argocd-server -n argocd 8080:443
Define an application:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/app-config
    targetRevision: HEAD
    path: apps/myapp
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
Argo CD continuously monitors the Git repository and synchronizes the cluster state.
Environment Management
A common pattern is managing multiple environments:
config-repo/
├── base/
│   ├── deployment.yaml
│   ├── service.yaml
│   └── kustomization.yaml
└── overlays/
    ├── staging/
    │   ├── kustomization.yaml
    │   └── patches.yaml
    └── production/
        ├── kustomization.yaml
        └── patches.yaml
Base deployment:
# base/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: registry.example.com/myapp:latest
          resources:
            requests:
              cpu: 50m
              memory: 64Mi
Production overlay:
# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
# "resources" replaces the deprecated "bases" field
resources:
  - ../../base
namespace: production
replicas:
  - name: myapp
    count: 5
images:
  - name: registry.example.com/myapp
    newTag: v1.2.3
patches:
  - path: patches.yaml
# overlays/production/patches.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    spec:
      containers:
        - name: myapp
          resources:
            requests:
              cpu: 200m
              memory: 256Mi
            limits:
              cpu: 1000m
              memory: 1Gi
Automated Image Updates
One challenge with GitOps is updating container images. Manual approach:
# Developer workflow
git clone https://github.com/myorg/app-config
# kustomize edit must run in the directory that holds kustomization.yaml
cd app-config/apps/myapp
kustomize edit set image registry.example.com/myapp:v1.2.4
git add .
git commit -m "Update myapp to v1.2.4"
git push
Automated approach with Flux:
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImageRepository
metadata:
  name: myapp
  namespace: flux-system
spec:
  image: registry.example.com/myapp
  interval: 1m
---
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImagePolicy
metadata:
  name: myapp
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: myapp
  policy:
    semver:
      range: 1.2.x
---
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImageUpdateAutomation
metadata:
  name: myapp
  namespace: flux-system
spec:
  interval: 1m
  sourceRef:
    kind: GitRepository
    name: app-repo
  git:
    checkout:
      ref:
        branch: main
    commit:
      author:
        email: fluxcdbot@example.com
        name: fluxcdbot
      messageTemplate: |
        Update image to {{range .Updated.Images}}{{println .}}{{end}}
    push:
      branch: main
  update:
    path: ./apps/myapp
    strategy: Setters
Mark the image field for updates:
spec:
  containers:
    - name: myapp
      image: registry.example.com/myapp:v1.2.3 # {"$imagepolicy": "flux-system:myapp"}
Flux automatically updates the image tag when new versions matching the policy are available.
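Conceptually, the automation does two things: it picks the newest tag the policy allows, then rewrites the line carrying the setter marker. The shell sketch below imitates both steps for the `1.2.x` range. It is an illustration only, not Flux's implementation: `latest_1_2_x` and `set_image_tag` are hypothetical helper names, the tag list is fed in by hand instead of being scanned from a registry, pre-release tags are ignored, and the `sed` expression assumes an image reference without a registry port.

```shell
# Pick the highest tag in the 1.2.x range from a list of tags on stdin.
# The leading "v" is optional; anything that is not 1.2.<number> is dropped.
latest_1_2_x() {
  grep -E '^v?1\.2\.[0-9]+$' | sort -t. -k3,3n | tail -n 1
}

# Print the file with the tag replaced on lines that carry the setter
# marker for the given policy. Other image lines are left untouched.
set_image_tag() {
  file=$1
  policy=$2
  tag=$3
  sed -E "/\\\$imagepolicy\": \"$policy\"/ s|(image: [^:[:space:]]+):[^[:space:]]+|\\1:$tag|" "$file"
}

# Demo: choose a tag from a hand-written list.
printf '%s\n' v1.2.3 v1.2.10 v1.3.0 v1.2.9 latest | latest_1_2_x
```

The numeric sort on the third dot-separated field is what makes `v1.2.10` rank above `v1.2.9`; a plain lexical sort would get that wrong, which is one reason to prefer a semver policy over alphabetical ordering.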
Progressive Delivery with Flagger
Combine GitOps with progressive delivery:
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myapp
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  progressDeadlineSeconds: 60
  service:
    port: 8080
  analysis:
    interval: 1m
    threshold: 5
    maxWeight: 50
    stepWeight: 10
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500
        interval: 1m
    webhooks:
      - name: load-test
        url: http://loadtester.test/
        timeout: 5s
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://myapp-canary:8080/"
When you update the deployment in Git, Flagger automatically:
- Creates a canary deployment
- Gradually shifts traffic
- Monitors metrics
- Promotes or rolls back based on success criteria
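The interplay of `stepWeight`, `maxWeight`, and `threshold` is easier to see in a small simulation. The sketch below is a deliberately simplified model of the analysis loop, using the values from the Canary above; real Flagger also queries metrics, runs webhooks, and has extra promotion phases. The `canary_outcome` function is hypothetical, and each argument stands for the success rate (integer percent) observed in one interval.

```shell
# Simplified canary analysis: advance traffic on a passing check,
# count a failure otherwise, roll back once failures reach the threshold.
canary_outcome() {
  step_weight=10   # stepWeight
  max_weight=50    # maxWeight
  threshold=5      # failed checks tolerated before rollback
  min_success=99   # request-success-rate minimum
  weight=0
  failed=0
  for rate in "$@"; do
    if [ "$rate" -ge "$min_success" ]; then
      weight=$((weight + step_weight))
      echo "weight=$weight"
      if [ "$weight" -ge "$max_weight" ]; then
        echo "promote"
        return 0
      fi
    else
      failed=$((failed + 1))
      echo "failed checks=$failed"
      if [ "$failed" -ge "$threshold" ]; then
        echo "rollback"
        return 0
      fi
    fi
  done
  echo "in progress"
}

# Five healthy intervals walk the weight from 10 to 50, then promote.
canary_outcome 100 100 99 100 100
```

With these settings a healthy rollout needs at least five one-minute intervals, so the choice of `stepWeight` and `interval` directly sets how long every release takes.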
Secret Management
Plaintext secrets should never be committed to Git. Solutions:
Sealed Secrets:
# Install kubeseal
brew install kubeseal
# Create sealed secret (sealed for the namespace it will be deployed to)
kubectl create secret generic mysecret \
  --namespace production \
  --from-literal=password=supersecret \
  --dry-run=client -o yaml | \
  kubeseal -o yaml > sealed-secret.yaml
# Commit sealed secret to Git
git add sealed-secret.yaml
git commit -m "Add sealed secret"
The sealed secret can be decrypted only by the controller in the cluster:
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: mysecret
  namespace: production
spec:
  encryptedData:
    password: AgBy3i4OJSWK+PiTySYZZA9rO43cGDEq...
External Secrets Operator:
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: myapp-secrets
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: SecretStore
  target:
    name: myapp-secrets
    creationPolicy: Owner
  data:
    - secretKey: database-password
      remoteRef:
        key: prod/myapp/db-password
This fetches secrets from external systems (AWS Secrets Manager, HashiCorp Vault, etc.) without storing them in Git.
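Whichever tool you pick, it helps to have a guard that stops a plain `Secret` manifest from reaching the repository in the first place. The following is a minimal sketch of such a check, suitable for a pre-commit hook or a CI step. The `find_plain_secrets` name is hypothetical, and the match is a simple line-based heuristic: it only catches a top-level `kind: Secret` line and will miss secrets embedded in other ways.

```shell
# List files under a directory that declare a plain Kubernetes Secret.
# SealedSecret and ExternalSecret do not match because the whole value
# after "kind:" must be exactly "Secret".
# Exit status is 0 when at least one offending file is found.
find_plain_secrets() {
  grep -rlE '^kind:[[:space:]]*Secret[[:space:]]*$' "$1" 2>/dev/null
}

# Example use in a hook (the message wording is an assumption):
# if find_plain_secrets . ; then
#   echo "refusing to commit plaintext Secret manifests" >&2
#   exit 1
# fi
```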
Handling Terraform with GitOps
For infrastructure provisioning, combine Terraform with GitOps:
apiVersion: infra.contrib.fluxcd.io/v1alpha1
kind: Terraform
metadata:
  name: vpc
  namespace: flux-system
spec:
  interval: 10m
  # auto-approve plans so that apply runs without a manual step
  approvePlan: auto
  path: ./terraform/vpc
  sourceRef:
    kind: GitRepository
    name: infra-repo
  varsFrom:
    - kind: ConfigMap
      name: vpc-vars
  writeOutputsToSecret:
    name: vpc-outputs
The Terraform controller executes terraform apply automatically when configurations change.
Observability and Notifications
Monitor GitOps operations:
apiVersion: notification.toolkit.fluxcd.io/v1beta1
kind: Alert
metadata:
  name: on-sync-failure
  namespace: flux-system
spec:
  providerRef:
    name: slack
  eventSeverity: error
  eventSources:
    - kind: Kustomization
      name: '*'
  # summary is a static string; the failing object is named in the event itself
  summary: "Kustomization sync failure"
---
apiVersion: notification.toolkit.fluxcd.io/v1beta1
kind: Provider
metadata:
  name: slack
  namespace: flux-system
spec:
  type: slack
  channel: deployments
  secretRef:
    name: slack-webhook
Access Control and Security
GitOps requires careful access control:
Repository Access:
- Separate repositories for different teams/services
- Use branch protection and required reviews
- Implement CODEOWNERS for critical paths
Cluster Permissions:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flux
  namespace: flux-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: flux
rules:
  - apiGroups: ["*"]
    resources: ["*"]
    verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: flux
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flux
subjects:
  - kind: ServiceAccount
    name: flux
    namespace: flux-system
Limit scope to specific namespaces in production:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: flux
  namespace: production
rules:
  - apiGroups: ["apps", ""]
    resources: ["deployments", "services", "configmaps"]
    # delete is required for pruning removed resources
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
Disaster Recovery
GitOps makes disaster recovery straightforward:
- Restore cluster: Provision new cluster
- Bootstrap GitOps: Install Flux/Argo CD
- Point to Git: Configure repository sources
- Automatic reconciliation: Everything redeploys from Git
Document this process:
#!/bin/bash
# disaster-recovery.sh
set -euo pipefail  # stop at the first failed step
# 1. Create new cluster
eksctl create cluster -f cluster-config.yaml
# 2. Bootstrap Flux (expects a GitHub personal access token in GITHUB_TOKEN)
flux bootstrap github \
  --owner=myorg \
  --repository=fleet-infra \
  --branch=main \
  --path=clusters/production
# 3. Restore external state (databases, etc.)
./restore-databases.sh
# 4. Verify deployment
flux get kustomizations
kubectl get pods --all-namespaces
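Reconciliation is not instant after bootstrap, so the verification step usually needs to poll instead of checking once. A small retry helper is one way to do that. This is a generic sketch: `wait_for` is a hypothetical name, and the attempt count and delay in the commented example are placeholders to tune for your cluster.

```shell
# Run a command repeatedly until it succeeds or the attempts run out.
# Usage: wait_for <attempts> <delay-seconds> <command> [args...]
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example (assumes kubectl already points at the new cluster):
# wait_for 30 10 kubectl get deployment myapp -n production
```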
Best Practices
From production experience:
- Start small: Begin with non-critical applications
- Separate concerns: Different repos for apps, infrastructure, and system components
- Branch strategy: Use branches for environments or features
- Testing: Validate manifests in CI before merge
- Monitoring: Alert on sync failures and drift
- Documentation: Keep runbooks in the same repo as configs
- Gradual rollout: Use progressive delivery for changes
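The "Testing" item above can start as a plain shell check in CI. The sketch below reads rendered manifests on stdin and fails when an image is unpinned. It is a heuristic, not a full validator: `check_image_tags` is a hypothetical name, it assumes unquoted `image:` values, and it treats a digest (`@sha256:`) or any tag other than `latest` as pinned.

```shell
# Read rendered manifests on stdin and report image references that have
# no tag or use the mutable "latest" tag. Returns 1 if any are found.
check_image_tags() {
  unpinned=$(
    sed -n -E 's/^[[:space:]]*(- )?image:[[:space:]]*([^[:space:]]+).*$/\2/p' |
    awk '
      /@sha256:/          { next }         # pinned by digest
      /:latest$/          { print; next }  # mutable tag
      $0 !~ ":[^/:]+$"    { print }        # no tag at all
    '
  )
  if [ -n "$unpinned" ]; then
    echo "unpinned images:"
    echo "$unpinned"
    return 1
  fi
}

# Demo with a pinned image; in CI you would pipe in rendered output, e.g.
#   kustomize build overlays/production | check_image_tags
printf '    image: registry.example.com/myapp:v1.2.3\n' | check_image_tags
```

Checking the rendered output instead of the raw files matters here, because a base manifest may legitimately carry a placeholder tag that an overlay replaces.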
Common Pitfalls
Avoid these mistakes:
- Large monolithic repos: Split into manageable repositories
- No validation: Implement pre-commit hooks and CI checks
- Ignoring drift: Configure reconciliation to fix drift automatically
- Poor secret management: Never commit secrets, use sealed secrets or external stores
- Inadequate RBAC: Limit GitOps operator permissions appropriately
Conclusion
GitOps transforms operations by applying software development practices to infrastructure management. The benefits are substantial:
- Full audit trail of all changes
- Easy rollback to any previous state
- Disaster recovery through Git
- Consistent deployment process
- Reduced operational complexity
The initial investment in setting up GitOps workflows pays dividends in reliability, velocity, and peace of mind. Start with a pilot project, validate the approach, and expand gradually. Once teams experience the benefits of declarative, Git-driven operations, there's no going back.