GitOps represents a fundamental shift in how we manage infrastructure and applications. By using Git as the single source of truth for declarative infrastructure and applications, teams can leverage familiar development workflows for operations. After implementing GitOps across multiple production environments, I’ve learned what works, what doesn’t, and how to adopt these practices successfully.

What is GitOps?

GitOps is a set of practices where Git repositories contain the declarative descriptions of infrastructure and applications, and automated processes ensure the actual state matches the desired state. The key principles are:

  • Declarative: Everything is described declaratively, not imperatively
  • Versioned: All changes are tracked in version control
  • Pulled: Agents in the cluster pull changes instead of a CI/CD system pushing them
  • Automated: Reconciliation of the actual state to the desired state happens automatically

This might sound abstract, so let’s look at a concrete example.

Traditional CI/CD vs GitOps

Traditional CI/CD pipeline:

# .gitlab-ci.yml
deploy:
  stage: deploy
  script:
    - kubectl config use-context production
    - kubectl apply -f manifests/
    - kubectl set image deployment/app app=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
  only:
    - main

Problems with this approach:

  • Cluster credentials in CI/CD system
  • No audit trail of what’s deployed
  • Difficult to roll back
  • State drift between Git and cluster
  • No visibility into deployment status

GitOps approach:

# Repository structure
config-repo/
├── apps/
│   └── myapp/
│       ├── deployment.yaml
│       ├── service.yaml
│       └── kustomization.yaml
├── infrastructure/
│   └── namespaces/
│       └── production.yaml
└── system/
    └── flux/
        └── flux.yaml

The deployment manifest:

# apps/myapp/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: registry.example.com/myapp:v1.2.3
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi

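A GitOps operator such as Flux or Argo CD runs inside the cluster, watches this repository, and applies changes automatically. It can also revert drift when someone changes the cluster by hand.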
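
The loop these operators run can be modelled in a few lines. This is a minimal Python sketch, not Flux or Argo CD code: desired state (from Git) and live state (from the cluster) are assumed to be plain dicts keyed by (kind, namespace, name), and the function names `plan` and `reconcile` are hypothetical.

```python
"""Minimal model of one GitOps reconcile pass (illustrative, not operator code)."""
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]  # (kind, namespace, name)
State = Dict[Key, dict]     # object key -> simplified spec


def plan(desired: State, live: State, prune: bool = True) -> List[Tuple[str, Key]]:
    """Compute the actions that would make the live state match Git."""
    actions: List[Tuple[str, Key]] = []
    for key in sorted(desired):
        if key not in live:
            actions.append(("create", key))
        elif live[key] != desired[key]:
            actions.append(("update", key))  # new commit or manual drift
    if prune:
        for key in sorted(live):
            if key not in desired:
                actions.append(("delete", key))  # removed from Git
    return actions


def reconcile(desired: State, live: State, prune: bool = True) -> State:
    """Apply the plan to a copy of the live state and return the new state."""
    result = dict(live)
    for action, key in plan(desired, live, prune):
        if action == "delete":
            del result[key]
        else:
            result[key] = desired[key]
    return result


if __name__ == "__main__":
    desired = {("Deployment", "production", "myapp"): {"replicas": 3, "image": "myapp:v1.2.3"}}
    live = {
        ("Deployment", "production", "myapp"): {"replicas": 5, "image": "myapp:v1.2.3"},  # scaled by hand
        ("Service", "production", "old-svc"): {"port": 80},  # deleted from Git
    }
    print(plan(desired, live))
    # [('update', ('Deployment', 'production', 'myapp')), ('delete', ('Service', 'production', 'old-svc'))]
```

Running `plan` again after `reconcile` yields an empty list, which is what "in sync" means. Because the loop runs inside the cluster, the CI system never needs cluster credentials.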

Implementing GitOps with Flux

Flux is a popular GitOps operator. Installation:

# Install Flux CLI
curl -s https://fluxcd.io/install.sh | sudo bash

# Bootstrap Flux (authenticates with a GitHub token read from GITHUB_TOKEN)
flux bootstrap github \
  --owner=myorg \
  --repository=fleet-infra \
  --branch=main \
  --path=clusters/production

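This creates the fleet-infra repository if it does not exist, commits the Flux component manifests under clusters/production, installs Flux in your cluster, and configures it to sync from that path. Flux now watches this repository for changes.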

Define a GitRepository source:

# clusters/production/sources/app-repo.yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: app-repo
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/myorg/app-config
  ref:
    branch: main

Create a Flux Kustomization to deploy the app:

# clusters/production/apps/myapp.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: myapp
  namespace: flux-system
spec:
  interval: 5m
  path: ./apps/myapp
  prune: true
  sourceRef:
    kind: GitRepository
    name: app-repo
  healthChecks:
    - apiVersion: apps/v1
      kind: Deployment
      name: myapp
      namespace: production

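Now changes committed to apps/myapp on the main branch of app-config are applied to the cluster automatically: the source is fetched every minute, the Kustomization reconciles whenever a new revision arrives (and at least every five minutes), and prune: true deletes objects that were removed from Git.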

Implementing GitOps with Argo CD

Argo CD provides a more opinionated approach with a rich UI:

# Install Argo CD
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# Access UI
kubectl port-forward svc/argocd-server -n argocd 8080:443

Define an application:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/app-config
    targetRevision: HEAD
    path: apps/myapp
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

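Argo CD polls the repository (every three minutes by default, or immediately when a Git webhook is configured) and syncs the cluster to it. With automated sync, prune removes resources deleted from Git and selfHeal reverts manual changes made directly in the cluster.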

Environment Management

A common pattern for managing multiple environments is a shared base plus one Kustomize overlay per environment:

config-repo/
├── base/
│   ├── deployment.yaml
│   ├── service.yaml
│   └── kustomization.yaml
└── overlays/
    ├── staging/
    │   ├── kustomization.yaml
    │   └── patches.yaml
    └── production/
        ├── kustomization.yaml
        └── patches.yaml

Base deployment:

# base/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: registry.example.com/myapp:latest
          resources:
            requests:
              cpu: 50m
              memory: 64Mi

Production overlay:

# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
namespace: production
replicas:
  - name: myapp
    count: 5
images:
  - name: registry.example.com/myapp
    newTag: v1.2.3
patches:
  - path: patches.yaml

# overlays/production/patches.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    spec:
      containers:
        - name: myapp
          resources:
            requests:
              cpu: 200m
              memory: 256Mi
            limits:
              cpu: 1000m
              memory: 1Gi
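
To see what the production overlay produces, the merge can be approximated in Python. This is a simplified model, not Kustomize itself: it reimplements only the namespace, replicas, and images transformers plus the one strategic-merge rule that matters here (the containers list is merged by `name`), and the helpers `merge` and `build_overlay` are hypothetical. In practice, `kustomize build overlays/production` shows the real output.

```python
"""Simplified model of the production overlay (illustrative, not Kustomize)."""
import copy
from typing import Any, Dict

BASE: Dict[str, Any] = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "myapp"},
    "spec": {
        "replicas": 1,
        "template": {"spec": {"containers": [{
            "name": "myapp",
            "image": "registry.example.com/myapp:latest",
            "resources": {"requests": {"cpu": "50m", "memory": "64Mi"}},
        }]}},
    },
}

PROD_PATCH: Dict[str, Any] = {"spec": {"template": {"spec": {"containers": [{
    "name": "myapp",
    "resources": {
        "requests": {"cpu": "200m", "memory": "256Mi"},
        "limits": {"cpu": "1000m", "memory": "1Gi"},
    },
}]}}}}


def merge(base: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Recursive merge; the containers list is merged by container name."""
    out = copy.deepcopy(base)
    for key, value in patch.items():
        if key == "containers":
            by_name = {c["name"]: c for c in out.get(key, [])}
            for container in value:
                name = container["name"]
                by_name[name] = merge(by_name.get(name, {}), container)
            out[key] = list(by_name.values())
        elif isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = copy.deepcopy(value)
    return out


def build_overlay(base: Dict[str, Any], patch: Dict[str, Any], namespace: str,
                  replicas: int, image_name: str, new_tag: str) -> Dict[str, Any]:
    """Apply the patch, then the namespace, replicas, and images transformers."""
    out = merge(base, patch)
    out["metadata"]["namespace"] = namespace
    out["spec"]["replicas"] = replicas
    for container in out["spec"]["template"]["spec"]["containers"]:
        # Assumes every image has an explicit ":tag" after the repository name.
        if container["image"].rsplit(":", 1)[0] == image_name:
            container["image"] = f"{image_name}:{new_tag}"
    return out


if __name__ == "__main__":
    prod = build_overlay(BASE, PROD_PATCH, "production", 5,
                         "registry.example.com/myapp", "v1.2.3")
    print(prod["spec"]["replicas"], prod["spec"]["template"]["spec"]["containers"][0]["image"])
    # 5 registry.example.com/myapp:v1.2.3
```

The base is never modified, which mirrors why each overlay can start from the same files: staging and production differ only in what their own kustomization.yaml and patches add.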

Automated Image Updates

One challenge with GitOps is updating container images. Manual approach:

# Developer workflow
git clone https://github.com/myorg/app-config
cd app-config/overlays/production
kustomize edit set image registry.example.com/myapp:v1.2.4
git add kustomization.yaml
git commit -m "Update myapp to v1.2.4"
git push

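Automated approach with Flux's image automation controllers. They are not installed by default; add them at bootstrap with --components-extra=image-reflector-controller,image-automation-controller: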

apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImageRepository
metadata:
  name: myapp
  namespace: flux-system
spec:
  image: registry.example.com/myapp
  interval: 1m
---
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImagePolicy
metadata:
  name: myapp
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: myapp
  policy:
    semver:
      range: 1.2.x
---
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImageUpdateAutomation
metadata:
  name: myapp
  namespace: flux-system
spec:
  interval: 1m
  sourceRef:
    kind: GitRepository
    name: app-repo
  git:
    checkout:
      ref:
        branch: main
    commit:
      author:
        email: fluxcdbot@example.com
        name: fluxcdbot
      messageTemplate: |
        Update image to {{range .Updated.Images}}{{println .}}{{end}}
    push:
      branch: main
  update:
    path: ./apps/myapp
    strategy: Setters

Mark the image field in apps/myapp/deployment.yaml for updates:

spec:
  template:
    spec:
      containers:
        - name: myapp
          image: registry.example.com/myapp:v1.2.3 # {"$imagepolicy": "flux-system:myapp"}

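When a new tag matching the policy (here, the highest 1.2.x release) appears in the registry, Flux rewrites the marked line, commits the change to main, and the normal reconciliation deploys it.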
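
Conceptually, the automation does two things: select the newest tag allowed by the policy, then rewrite only the lines that carry the setter marker. The Python sketch below models both steps. It is not Flux's implementation (which uses a full semver library); it assumes tags of the form `vMAJOR.MINOR.PATCH`, treats `1.2.x` as "same major and minor version", and the function names are hypothetical.

```python
"""Illustrative model of Flux image automation (not Flux's code)."""
import re
from typing import Iterable, Optional, Tuple

TAG_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")  # plain release tags only
MARKER = '# {"$imagepolicy": "flux-system:myapp"}'


def parse(tag: str) -> Optional[Tuple[int, int, int]]:
    m = TAG_RE.match(tag)
    return (int(m[1]), int(m[2]), int(m[3])) if m else None


def latest_in_range(tags: Iterable[str], major: int, minor: int) -> Optional[str]:
    """Highest tag matching MAJOR.MINOR.x, compared numerically."""
    matching = []
    for tag in tags:
        version = parse(tag)
        if version is not None and version[:2] == (major, minor):
            matching.append((version, tag))
    return max(matching)[1] if matching else None


def apply_setter(manifest: str, image: str, new_tag: str, marker: str = MARKER) -> str:
    """Rewrite the tag only on lines that carry the policy marker."""
    pattern = re.compile(re.escape(image) + r":[^\s#]+")
    lines = []
    for line in manifest.splitlines(keepends=True):
        if marker in line:
            line = pattern.sub(lambda _: f"{image}:{new_tag}", line)
        lines.append(line)
    return "".join(lines)


if __name__ == "__main__":
    tags = ["v1.2.3", "v1.2.10", "v1.2.4", "v1.3.0", "latest"]
    newest = latest_in_range(tags, 1, 2)
    print(newest)  # v1.2.10 (numeric comparison, not string sort)
    line = "          image: registry.example.com/myapp:v1.2.3 " + MARKER + "\n"
    print(apply_setter(line, "registry.example.com/myapp", newest), end="")
```

Numeric comparison matters: a plain string sort would rank v1.2.4 above v1.2.10.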

Progressive Delivery with Flagger

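Flagger adds progressive delivery on top of GitOps. It needs a service mesh or ingress controller to shift traffic and a Prometheus-compatible metrics source for the built-in checks: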

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myapp
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  progressDeadlineSeconds: 60
  service:
    port: 8080
  analysis:
    interval: 1m
    threshold: 5
    maxWeight: 50
    stepWeight: 10
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500
        interval: 1m
  webhooks:
    - name: load-test
      url: http://loadtester.test/
      timeout: 5s
      metadata:
        cmd: "hey -z 1m -q 10 -c 2 http://myapp-canary.production:8080/"

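When Flux applies an updated deployment from Git, Flagger automatically:

  1. Scales up the new version as the canary, next to the stable myapp-primary deployment
  2. Shifts traffic to it in 10% steps, up to 50%
  3. Checks success rate (at least 99%) and latency (at most 500 ms) every minute
  4. Promotes the canary once it passes at 50%, or rolls back after 5 failed checks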
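
The analysis settings interact in a way that is easy to simulate. This Python sketch is a simplified model, not Flagger's code: it assumes one metric sample per interval, that the weight advances only after a passing check, that the canary is promoted after passing at maxWeight, and that failed checks are counted without being reset; `run_canary` is a hypothetical name.

```python
"""Simplified model of the Canary analysis above (not Flagger's code)."""
from typing import Iterable, List, Tuple


def run_canary(
    checks: Iterable[Tuple[float, float]],
    step_weight: int = 10,
    max_weight: int = 50,
    threshold: int = 5,
    min_success_rate: float = 99.0,
    max_duration_ms: float = 500.0,
) -> Tuple[str, List[int]]:
    """Simulate analysis runs.

    checks: one (success_rate_percent, p99_duration_ms) sample per interval.
    Returns the outcome and the canary traffic weight after each interval.
    """
    weight, failed, weights = 0, 0, []
    for success_rate, duration_ms in checks:
        healthy = success_rate >= min_success_rate and duration_ms <= max_duration_ms
        if not healthy:
            failed += 1  # the weight is held while checks fail
            if failed >= threshold:
                return "rolled back", weights + [0]  # all traffic back to primary
        elif weight >= max_weight:
            return "promoted", weights  # passed a check at max weight
        else:
            weight = min(weight + step_weight, max_weight)
        weights.append(weight)
    return "in progress", weights


if __name__ == "__main__":
    print(run_canary([(99.9, 120.0)] * 6))
    # ('promoted', [10, 20, 30, 40, 50])
    print(run_canary([(99.9, 120.0)] * 2 + [(97.0, 120.0)] * 5)[0])
    # rolled back
```

With these settings a healthy rollout takes about six minutes, and at most half of the traffic ever reaches an unverified version.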

Secret Management

Plaintext secrets should never be committed to Git. Two common solutions:

Sealed Secrets:

# Install kubeseal
brew install kubeseal

# Create sealed secret
kubectl create secret generic mysecret \
  --namespace production \
  --from-literal=password=supersecret \
  --dry-run=client -o yaml | \
  kubeseal -o yaml > sealed-secret.yaml

# Commit sealed secret to Git
git add sealed-secret.yaml
git commit -m "Add sealed secret"

Only the Sealed Secrets controller running in the cluster, which holds the private key, can decrypt the sealed secret:

apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: mysecret
  namespace: production
spec:
  encryptedData:
    password: AgBy3i4OJSWK+PiTySYZZA9rO43cGDEq...

External Secrets Operator:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: myapp-secrets
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: SecretStore
  target:
    name: myapp-secrets
    creationPolicy: Owner
  data:
    - secretKey: database-password
      remoteRef:
        key: prod/myapp/db-password

This fetches secrets from external systems (AWS Secrets Manager, HashiCorp Vault, etc.) without storing them in Git.

Handling Terraform with GitOps

For infrastructure provisioning, combine Terraform with GitOps:

apiVersion: infra.contrib.fluxcd.io/v1alpha1
kind: Terraform
metadata:
  name: vpc
  namespace: flux-system
spec:
  interval: 10m
  approvePlan: auto
  path: ./terraform/vpc
  sourceRef:
    kind: GitRepository
    name: infra-repo
  varsFrom:
    - kind: ConfigMap
      name: vpc-vars
  writeOutputsToSecret:
    name: vpc-outputs

With approvePlan: auto, the Terraform controller plans and runs terraform apply automatically when the configuration changes; without it, plans wait for manual approval.

Observability and Notifications

Send Flux events to Slack so that failed syncs get noticed:

apiVersion: notification.toolkit.fluxcd.io/v1beta1
kind: Alert
metadata:
  name: on-sync-failure
  namespace: flux-system
spec:
  providerRef:
    name: slack
  eventSeverity: error
  eventSources:
    - kind: Kustomization
      name: '*'
  summary: "Sync failure in {{ .InvolvedObject.name }}"
---
apiVersion: notification.toolkit.fluxcd.io/v1beta1
kind: Provider
metadata:
  name: slack
  namespace: flux-system
spec:
  type: slack
  channel: deployments
  secretRef:
    name: slack-webhook
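
The Alert acts as a filter over controller events. The following is a simplified Python model of that matching, assuming Flux's behaviour that an `info` alert forwards all events while an `error` alert forwards only errors, and that `'*'` matches every object of the given kind. Other selectors are not modelled, and the names `Event` and `alert_matches` are hypothetical.

```python
"""Simplified model of Flux Alert event matching (illustrative only)."""
from dataclasses import dataclass
from typing import List, Tuple

SEVERITY_RANK = {"info": 0, "error": 1}


@dataclass
class Event:
    kind: str      # e.g. "Kustomization"
    name: str      # object name
    severity: str  # "info" or "error"
    message: str


def alert_matches(event: Event, event_severity: str, sources: List[Tuple[str, str]]) -> bool:
    """True if an Alert with these settings would forward the event."""
    if SEVERITY_RANK[event.severity] < SEVERITY_RANK[event_severity]:
        return False  # an "error" alert drops info events
    return any(
        event.kind == kind and name in ("*", event.name)
        for kind, name in sources
    )


if __name__ == "__main__":
    sources = [("Kustomization", "*")]
    failed = Event("Kustomization", "myapp", "error", "health check failed")
    applied = Event("Kustomization", "myapp", "info", "reconciliation finished")
    print(alert_matches(failed, "error", sources), alert_matches(applied, "error", sources))
    # True False
```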

Access Control and Security

GitOps requires careful access control:

Repository Access:

  • Separate repositories for different teams/services
  • Use branch protection and required reviews
  • Implement CODEOWNERS for critical paths

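Cluster Permissions:

A wildcard ClusterRole such as the following gives the operator full cluster-admin rights, which is convenient but far broader than most deployments need: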

apiVersion: v1
kind: ServiceAccount
metadata:
  name: flux
  namespace: flux-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: flux
rules:
  - apiGroups: ["*"]
    resources: ["*"]
    verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: flux
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flux
subjects:
  - kind: ServiceAccount
    name: flux
    namespace: flux-system

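In production, limit the operator to the namespaces and resources it actually manages:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: flux
  namespace: production
# Bind this Role to the operator's ServiceAccount with a RoleBinding in the same namespace
rules:
  - apiGroups: ["apps", ""]
    resources: ["deployments", "services", "configmaps"]
    # delete is required for pruning resources that were removed from Git
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]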
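
The difference between the two permission sets is easiest to see by evaluating requests against them. This Python sketch is a simplified model of RBAC rule matching, not the API server's code: it ignores resourceNames, subresources, and namespaces (a Role's rules apply only inside its own namespace), it assumes the scoped Role also grants delete so that pruning works, and `is_allowed` is a hypothetical name.

```python
"""Simplified model of Kubernetes RBAC rule matching (illustrative only)."""
from typing import Dict, List

Rule = Dict[str, List[str]]


def _matches(allowed: List[str], requested: str) -> bool:
    return "*" in allowed or requested in allowed


def is_allowed(rules: List[Rule], api_group: str, resource: str, verb: str) -> bool:
    """RBAC is additive: a request is allowed if any single rule covers it."""
    return any(
        _matches(rule["apiGroups"], api_group)
        and _matches(rule["resources"], resource)
        and _matches(rule["verbs"], verb)
        for rule in rules
    )


CLUSTER_WIDE: List[Rule] = [{"apiGroups": ["*"], "resources": ["*"], "verbs": ["*"]}]
SCOPED: List[Rule] = [{
    "apiGroups": ["apps", ""],  # "" is the core API group
    "resources": ["deployments", "services", "configmaps"],
    "verbs": ["get", "list", "watch", "create", "update", "patch", "delete"],
}]

if __name__ == "__main__":
    requests = [("apps", "deployments", "patch"), ("", "secrets", "get"),
                ("rbac.authorization.k8s.io", "clusterroles", "create")]
    for group, resource, verb in requests:
        print(resource, verb,
              is_allowed(CLUSTER_WIDE, group, resource, verb),
              is_allowed(SCOPED, group, resource, verb))
```

Under the scoped rules, a compromised operator or a malicious commit can still change Deployments in production, but it cannot read Secrets or grant itself new roles.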

Disaster Recovery

GitOps makes disaster recovery straightforward:

  1. Restore cluster: Provision new cluster
  2. Bootstrap GitOps: Install Flux/Argo CD
  3. Point to Git: Configure repository sources
  4. Automatic reconciliation: Everything declared in Git redeploys; data that lives outside Git (databases, persistent volumes) must be restored separately

Document this process:

#!/bin/bash
# disaster-recovery.sh
set -euo pipefail

# flux bootstrap github authenticates with a token read from GITHUB_TOKEN
: "${GITHUB_TOKEN:?GITHUB_TOKEN must be set}"

# 1. Create new cluster
eksctl create cluster -f cluster-config.yaml

# 2. Bootstrap Flux
flux bootstrap github \
  --owner=myorg \
  --repository=fleet-infra \
  --branch=main \
  --path=clusters/production

# 3. Restore external state (databases, etc.)
./restore-databases.sh

# 4. Verify deployment
flux get kustomizations
kubectl get pods --all-namespaces

Best Practices

From production experience:

  1. Start small: Begin with non-critical applications
  2. Separate concerns: Different repos for apps, infrastructure, and system components
  3. Branch strategy: Use branches for environments or features
  4. Testing: Validate manifests in CI before merge
  5. Monitoring: Alert on sync failures and drift
  6. Documentation: Keep runbooks in the same repo as configs
  7. Gradual rollout: Use progressive delivery for changes
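
For the testing practice, a pre-merge check can be small. The Python sketch below assumes CI has already rendered each overlay (for example with `kustomize build overlays/production`) and parsed the YAML into dicts (for example with PyYAML's `yaml.safe_load_all`); that step is left out to keep it standard-library only. The two policies and the function names are illustrative, not a standard.

```python
"""Illustrative pre-merge policy check for rendered manifests."""
from typing import Any, Dict, Iterable, List


def check_deployment(doc: Dict[str, Any]) -> List[str]:
    name = doc.get("metadata", {}).get("name", "<unnamed>")
    pod_spec = doc.get("spec", {}).get("template", {}).get("spec", {})
    problems = []
    for container in pod_spec.get("containers", []):
        where = f"{name}/{container.get('name', '?')}"
        image = container.get("image", "")
        ref = image.rsplit("/", 1)[-1]  # drop registry host:port so a port is not read as a tag
        if "@" not in ref and (":" not in ref or ref.endswith(":latest")):
            problems.append(f"{where}: image {image!r} is untagged or uses :latest")
        resources = container.get("resources", {})
        for section in ("requests", "limits"):
            if not resources.get(section):
                problems.append(f"{where}: missing resources.{section}")
    return problems


def validate(docs: Iterable[Dict[str, Any]]) -> List[str]:
    """Collect policy violations from all Deployments; CI fails if non-empty."""
    problems: List[str] = []
    for doc in docs:
        if doc and doc.get("kind") == "Deployment":
            problems.extend(check_deployment(doc))
    return problems
```

Running it on rendered output rather than raw files matters: the base above uses :latest and has no limits, and only the production overlay makes it compliant.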

Common Pitfalls

Avoid these mistakes:

  • Large monolithic repos: Split them into manageable repositories
  • No validation: Implement pre-commit hooks and CI checks
  • Ignoring drift: Configure reconciliation to fix drift automatically
  • Poor secret management: Never commit secrets; use sealed secrets or external stores
  • Inadequate RBAC: Limit GitOps operator permissions appropriately

Conclusion

GitOps transforms operations by applying software development practices to infrastructure management. The benefits are substantial:

  • Full audit trail of all changes
  • Easy rollback to any previous state
  • Disaster recovery through Git
  • Consistent deployment process
  • Reduced operational complexity

The initial investment in setting up GitOps workflows pays dividends in reliability, velocity, and peace of mind. Start with a pilot project, validate the approach, and expand gradually. Once teams experience the benefits of declarative, Git-driven operations, there’s no going back.