Cloud costs can spiral without proper controls. After a Kubernetes cost-optimization effort that cut our spend by 40%, these are the FinOps strategies that proved effective.

Right-Sizing Resources

Use the Vertical Pod Autoscaler (VPA) in recommendation mode to see what your workloads actually need:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # Recommendations only; don't auto-apply yet

Cluster Autoscaling

Scale nodes up and down with demand. The Cluster Autoscaler is configured through flags on its own Deployment rather than a standalone ConfigMap (the node-group name below is illustrative):

# Excerpt from the cluster-autoscaler Deployment in kube-system
command:
- ./cluster-autoscaler
- --cloud-provider=aws
- --nodes=3:20:my-node-group          # min:max:node-group
- --scale-down-delay-after-add=10m
- --scale-down-unneeded-time=10m

Spot Instances

Use spot/preemptible for fault-tolerant workloads:

apiVersion: v1
kind: Pod
spec:
  nodeSelector:
    node.kubernetes.io/lifecycle: spot  # label depends on how your spot node group is labeled
  tolerations:
    - key: "spot"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
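
The toleration only matters if the spot nodes carry a matching taint; many node groups can apply one at creation, or it can be added by hand (node name is a placeholder):

kubectl taint nodes <spot-node-name> spot=true:NoSchedule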

Cost Monitoring

Track spending per team/service:

# CPU requests per namespace, weighted by per-node CPU cost
# (node_cpu_hourly_cost is exported by Kubecost; substitute your own cost metric)
sum by (namespace) (
  kube_pod_container_resource_requests{resource="cpu"}
  * on (node) group_left()
  avg by (node) (node_cpu_hourly_cost)
)

Optimization Checklist

  • Right-size pods based on actual usage
  • Use horizontal and vertical autoscaling (see the HPA sketch below)
  • Leverage spot instances for stateless workloads
  • Set resource quotas per namespace
  • Monitor and alert on cost anomalies
  • Delete unused resources regularly
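
Vertical sizing is covered by the VPA above; the horizontal side is a standard HorizontalPodAutoscaler. A minimal sketch (names and thresholds are illustrative):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70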

Reserved Instances and Savings Plans

Commit to long-term usage for discounts:

# Cost analysis for reserved capacity
reserved_instances:
  strategy: "Analyze 30-day usage patterns"
  steps:
    - "Identify steady-state workloads"
    - "Calculate break-even point (typically 6-12 months)"
    - "Purchase 1-year or 3-year reservations"
    - "Use convertible RIs for flexibility"

savings_plans:
  compute_savings_plan:
    commitment: "$100/hour for 1 year"
    discount: "Up to 66% vs on-demand"
    flexibility: "Any instance family, size, region"

  instance_savings_plan:
    commitment: "$50/hour for 3 years"
    discount: "Up to 72% vs on-demand"
    flexibility: "Specific instance family in region"

Container Resource Optimization

Right-size container requests and limits:

# Before optimization
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: api
    resources:
      requests:
        cpu: "1000m"       # Overprovisioned
        memory: "2Gi"      # Overprovisioned
      limits:
        cpu: "2000m"
        memory: "4Gi"

# After optimization (based on VPA recommendations)
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: api
    resources:
      requests:
        cpu: "250m"        # Actual usage: 150-200m
        memory: "512Mi"    # Actual usage: 300-400Mi
      limits:
        cpu: "500m"
        memory: "1Gi"

Use VPA to recommend optimal sizes:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: api
  updateMode: "Recommend"  # Or "Auto" for automatic updates
  resourcePolicy:
    containerPolicies:
    - containerName: api
      minAllowed:
        cpu: 100m
        memory: 256Mi
      maxAllowed:
        cpu: 2000m
        memory: 4Gi
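
Once the recommender has data, the suggested values appear in the VPA status:

kubectl describe vpa api-vpa

# Or just the numbers:
kubectl get vpa api-vpa -o jsonpath='{.status.recommendation.containerRecommendations}'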

Storage Cost Optimization

Optimize persistent volume usage:

# Use a cost-effective storage class
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-storage
spec:
  storageClassName: gp3  # Use cost-effective gp3 instead of io2
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
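
---
# The gp3 class referenced above may not exist by default; a minimal StorageClass
# for the AWS EBS CSI driver (assumed here) looks like this
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
reclaimPolicy: Delete
allowVolumeExpansion: true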

---
# Automated backup retention
apiVersion: v1
kind: ConfigMap
metadata:
  name: backup-policy
data:
  retention.yaml: |
    daily:
      keep: 7
      delete_after_days: 7
    weekly:
      keep: 4
      delete_after_days: 28
    monthly:
      keep: 12
      delete_after_days: 365

Delete unused volumes:

# Find released (unclaimed) volumes
kubectl get pv | grep Released

#!/bin/bash
# Automation script: delete PVs that have been Released and are older than 7 days
# (Kubernetes does not record the release time, so creation time is a rough proxy)
kubectl get pv -o json | jq -r '
  .items[] |
  select(.status.phase == "Released") |
  select((now - (.metadata.creationTimestamp | fromdateiso8601)) > 604800) |
  .metadata.name
' | xargs -r -I {} kubectl delete pv {}

Network Cost Optimization

Reduce data transfer costs:

# Use an internal load balancer so traffic stays on the private network
apiVersion: v1
kind: Service
metadata:
  name: database
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  selector:
    app: database

---
# Implement caching to reduce external API calls
apiVersion: v1
kind: ConfigMap
metadata:
  name: cache-config
data:
  redis.conf: |
    maxmemory 2gb
    maxmemory-policy allkeys-lru
    save ""  # Disable persistence for cache

Colocate chatty services in the same zone to minimize cross-AZ data transfer:

# Pod affinity to keep related services together
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  template:
    spec:
      affinity:
        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - database
              topologyKey: "topology.kubernetes.io/zone"
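
Topology-aware routing complements pod affinity by keeping Service traffic inside the caller's zone where possible. A sketch using the Kubernetes 1.27+ annotation (older versions use service.kubernetes.io/topology-aware-hints); names and ports are illustrative:

apiVersion: v1
kind: Service
metadata:
  name: api
  annotations:
    service.kubernetes.io/topology-mode: Auto
spec:
  selector:
    app: api
  ports:
  - port: 80
    targetPort: 8080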

Monitoring and Alerting

Track costs with Prometheus and alert on anomalies:

# Cost monitoring rules
groups:
  - name: cost-alerts
    rules:
      - alert: HighNodeCount
        expr: count(kube_node_info) > 50
        for: 10m
        annotations:
          summary: "Cluster has {{ $value }} nodes"
          description: "Node count exceeds budget threshold"

      - alert: UnusedResources
        expr: |
          sum(kube_pod_container_resource_requests{resource="cpu"})
          /
          sum(kube_node_status_allocatable{resource="cpu"})
          < 0.50
        for: 1h
        annotations:
          summary: "CPU utilization below 50%"
          description: "Consider downsizing cluster"

      - alert: CostAnomaly
        expr: |
          rate(cloud_cost_total[1h]) >
          1.2 * avg_over_time(rate(cloud_cost_total[1h])[7d:1h])
        for: 30m
        annotations:
          summary: "Cost increased by >20% vs 7-day average"

Kubernetes Resource Quotas

Prevent cost overruns with quotas:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "100"
    requests.memory: "200Gi"
    requests.storage: "1Ti"
    persistentvolumeclaims: "50"
    pods: "100"
    services.loadbalancers: "5"

---
# Limit ranges for default constraints
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-a
spec:
  limits:
  - max:
      cpu: "4"
      memory: "8Gi"
    min:
      cpu: "100m"
      memory: "128Mi"
    default:
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:
      cpu: "250m"
      memory: "256Mi"
    type: Container
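
To see current consumption against the quota:

kubectl describe resourcequota team-quota -n team-a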

Scheduled Scaling

Scale down non-production environments during off-hours:

# CronJob to scale down at night
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-dev
  namespace: development
spec:
  schedule: "0 19 * * 1-5"  # 7 PM weekdays
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler
          containers:
          - name: kubectl
            image: bitnami/kubectl
            command:
            - /bin/sh
            - -c
            - |
              kubectl scale deployment --all --replicas=0 -n development
              # HPAs stop managing a workload once it is scaled to zero, so no HPA change is needed
          restartPolicy: OnFailure

---
# CronJob to scale up in morning
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-up-dev
  namespace: development
spec:
  schedule: "0 8 * * 1-5"  # 8 AM weekdays
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler
          containers:
          - name: kubectl
            image: bitnami/kubectl
            command:
            - /bin/sh
            - -c
            - |
              kubectl scale deployment api --replicas=3 -n development
              kubectl scale deployment worker --replicas=2 -n development
              kubectl patch hpa --all -p '{"spec":{"minReplicas":2}}' -n development
          restartPolicy: OnFailure
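
The scaler service account referenced by both CronJobs is assumed to exist; a minimal RBAC sketch for it:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: scaler
  namespace: development

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: scaler
  namespace: development
rules:
- apiGroups: ["apps"]
  resources: ["deployments", "deployments/scale"]
  verbs: ["get", "list", "patch", "update"]
- apiGroups: ["autoscaling"]
  resources: ["horizontalpodautoscalers"]
  verbs: ["get", "list", "patch"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: scaler
  namespace: development
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: scaler
subjects:
- kind: ServiceAccount
  name: scaler
  namespace: development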

Kubecost Integration

Use Kubecost for comprehensive cost visibility:
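
Kubecost itself is typically installed with Helm (chart and repo per the upstream docs; verify current values before running):

helm install kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost --create-namespace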

# Kubecost namespace and pricing configuration
apiVersion: v1
kind: Namespace
metadata:
  name: kubecost

---
# Kubecost configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubecost-config
  namespace: kubecost
data:
  config.json: |
    {
      "cloudProviderConfig": {
        "provider": "aws",
        "pricing": {
          "spotLabel": "lifecycle",
          "spotValue": "spot"
        }
      },
      "currencyCode": "USD",
      "savingsRecommendations": true
    }

Query Kubecost API for cost data:

# Get cost by namespace
curl "http://kubecost:9090/model/allocation?window=7d&aggregate=namespace"

# Get cost by deployment
curl "http://kubecost:9090/model/allocation?window=7d&aggregate=deployment"

# Get savings recommendations
curl "http://kubecost:9090/model/savings"

Cost Allocation and Chargeback

Implement showback/chargeback:

# Label resources for cost attribution
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  labels:
    app: api
    team: platform
    project: customer-portal
    environment: production
    cost-center: eng-platform
spec:
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
        team: platform
        project: customer-portal
        environment: production
        cost-center: eng-platform
    spec:
      containers:
      - name: api
        image: registry.example.com/api:1.0  # illustrative

Generate cost reports:

# Cost by team (label_team requires kube-state-metrics to export pod labels via
# --metric-labels-allowlist; node_cost_per_* are placeholder metrics from your cost exporter)
sum by (label_team) (
  (
    sum by (namespace, pod) (
      kube_pod_container_resource_requests{resource="cpu"}
      * on (node) group_left() avg by (node) (node_cost_per_cpu_hour)
    )
    +
    sum by (namespace, pod) (
      kube_pod_container_resource_requests{resource="memory"} / 2^30
      * on (node) group_left() avg by (node) (node_cost_per_memory_gb_hour)
    )
  )
  * on (namespace, pod) group_left(label_team) kube_pod_labels
)

# Cost by project: the same query grouped by (label_project) instead of (label_team)

FinOps Best Practices

Establish FinOps culture:

  1. Visibility: Make costs visible to all engineers
  2. Accountability: Team ownership of costs
  3. Optimization: Continuous cost improvement
  4. Forecasting: Predict future costs based on trends
  5. Governance: Policies to prevent waste

A lightweight team charter makes these responsibilities explicit:

# FinOps team charter
responsibilities:
  - "Cost visibility dashboards"
  - "Monthly cost reviews with teams"
  - "Optimization recommendations"
  - "Budget tracking and forecasting"
  - "Policy enforcement (quotas, limits)"
  - "Training teams on cost-aware development"

metrics:
  - "Cost per transaction"
  - "Cost per user"
  - "Infrastructure efficiency ratio"
  - "Waste percentage (unused resources)"
  - "Savings from optimization initiatives"

Conclusion

Effective Kubernetes cost optimization requires:

  1. Visibility through comprehensive monitoring and reporting
  2. Right-sizing based on actual usage patterns
  3. Autoscaling to match capacity with demand
  4. Spot instances for fault-tolerant workloads
  5. Resource quotas to prevent overprovisioning
  6. Storage optimization and lifecycle management
  7. Network efficiency to reduce data transfer costs
  8. Scheduled scaling for non-production environments
  9. Cost allocation for accountability
  10. FinOps culture with continuous improvement

Start by establishing cost visibility, then systematically address the largest cost drivers. Automate optimization where possible, and create feedback loops so teams see the cost impact of their decisions. Cost optimization is not a one-time project but an ongoing practice that requires tooling, processes, and cultural change.

The organizations that excel at cost optimization treat it as an engineering problem, invest in automation and observability, and align incentives so teams are motivated to optimize costs while maintaining reliability and performance.