Cloud costs can spiral quickly without proper controls. After a Kubernetes cost-optimization effort that cut our spend by 40%, these are the FinOps strategies I've found most effective.
Right-Sizing Resources
Use VPA for recommendations:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off" # Recommendation-only; don't auto-apply yet
Cluster Autoscaling
Scale nodes based on demand. The Cluster Autoscaler takes its cost-relevant settings as command-line flags on its own Deployment; the ones that matter most are the node-count bounds and the scale-down timing (the node-group name below is illustrative):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - name: cluster-autoscaler
        command:
        - ./cluster-autoscaler
        - --nodes=3:20:my-node-group # min:max:node-group
        - --scale-down-delay-after-add=10m
        - --scale-down-unneeded-time=10m
Spot Instances
Use spot/preemptible for fault-tolerant workloads:
apiVersion: v1
kind: Pod
spec:
  # The toleration only permits scheduling onto tainted spot nodes;
  # the selector is what actually places the pod there (label and taint
  # keys are examples; use whatever your node groups apply).
  nodeSelector:
    spot: "true"
  tolerations:
  - key: "spot"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
Cost Monitoring
Track spending per team/service:
# Requested CPU per namespace, weighted by per-node CPU cost.
# node_cost_per_cpu_hour is assumed to come from your cost exporter
# (Kubecost/OpenCost expose similar per-node cost metrics); the join must
# happen before aggregation so the node label is still present.
sum by (namespace) (
  kube_pod_container_resource_requests{resource="cpu"}
  * on (node) group_left()
    node_cost_per_cpu_hour
)
Optimization Checklist
- Right-size pods based on actual usage
- Use horizontal and vertical autoscaling (a minimal HPA sketch follows this checklist; VPA is shown above)
- Leverage spot instances for stateless workloads
- Set resource quotas per namespace
- Monitor and alert on cost anomalies
- Delete unused resources regularly
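For the autoscaling item above, a minimal HorizontalPodAutoscaler to pair with the VPA shown earlier (assumes a Deployment named my-app with CPU requests set; avoid letting HPA and VPA both act on the same CPU metric for one workload):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70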
Reserved Instances and Savings Plans
Commit to long-term usage for discounts:
# Cost analysis for reserved capacity
reserved_instances:
  strategy: "Analyze 30-day usage patterns"
  steps:
    - "Identify steady-state workloads"
    - "Calculate break-even point (typically 6-12 months)"
    - "Purchase 1-year or 3-year reservations"
    - "Use convertible RIs for flexibility"
savings_plans:
  compute_savings_plan:
    commitment: "$100/hour for 1 year"
    discount: "Up to 66% vs on-demand"
    flexibility: "Any instance family, size, region"
  instance_savings_plan:
    commitment: "$50/hour for 3 years"
    discount: "Up to 72% vs on-demand"
    flexibility: "Specific instance family in region"
Container Resource Optimization
Right-size container requests and limits:
# Before optimization
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: api
    resources:
      requests:
        cpu: "1000m" # Overprovisioned
        memory: "2Gi" # Overprovisioned
      limits:
        cpu: "2000m"
        memory: "4Gi"

# After optimization (based on VPA recommendations)
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: api
    resources:
      requests:
        cpu: "250m" # Actual usage: 150-200m
        memory: "512Mi" # Actual usage: 300-400Mi
      limits:
        cpu: "500m"
        memory: "1Gi"
Use VPA to recommend optimal sizes:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Off" # Recommendation-only; use "Auto" to apply updates automatically
  resourcePolicy:
    containerPolicies:
    - containerName: api
      minAllowed:
        cpu: 100m
        memory: 256Mi
      maxAllowed:
        cpu: 2000m
        memory: 4Gi
Storage Cost Optimization
Optimize persistent volume usage:
# Implement storage lifecycle
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-storage
spec:
  storageClassName: gp3 # Use cost-effective gp3 instead of io2
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
---
# Automated backup retention
apiVersion: v1
kind: ConfigMap
metadata:
  name: backup-policy
data:
  retention.yaml: |
    daily:
      keep: 7
      delete_after_days: 7
    weekly:
      keep: 4
      delete_after_days: 28
    monthly:
      keep: 12
      delete_after_days: 365
Delete unused volumes:
# Find unattached volumes
kubectl get pv | grep Released

# Automation script
#!/bin/bash
# Delete Released volumes created more than 7 days ago
# (PV status does not record when the claim was released,
#  so creationTimestamp is used as a conservative proxy)
kubectl get pv -o json | jq -r '
  .items[]
  | select(.status.phase == "Released")
  | select((now - (.metadata.creationTimestamp | fromdateiso8601)) > 604800)
  | .metadata.name
' | xargs -r -I {} kubectl delete pv {}
Network Cost Optimization
Reduce data transfer costs:
# Use an internal load balancer so traffic stays on the private network
# (avoids public egress and NAT charges)
apiVersion: v1
kind: Service
metadata:
  name: database
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  selector:
    app: database
---
# Implement caching to reduce external API calls
apiVersion: v1
kind: ConfigMap
metadata:
  name: cache-config
data:
  redis.conf: |
    maxmemory 2gb
    maxmemory-policy allkeys-lru
    save "" # Disable persistence for cache
Colocate chatty services to minimize cross-zone traffic:
# Pod affinity to keep related services together
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  template:
    spec:
      affinity:
        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - database
              topologyKey: "topology.kubernetes.io/zone"
Monitoring and Alerting
Track costs with Prometheus and alert on anomalies:
# Cost monitoring rules (cloud_cost_total is assumed to be exported by your
# cost tooling; swap in whatever cost metric you actually have)
groups:
- name: cost-alerts
  rules:
  - alert: HighNodeCount
    expr: count(kube_node_info) > 50
    for: 10m
    annotations:
      summary: "Cluster has {{ $value }} nodes"
      description: "Node count exceeds budget threshold"
  - alert: UnusedResources
    expr: |
      sum(kube_pod_container_resource_requests{resource="cpu"})
      /
      sum(kube_node_status_allocatable{resource="cpu"})
      < 0.50
    for: 1h
    annotations:
      summary: "CPU requests below 50% of allocatable capacity"
      description: "Consider downsizing the cluster"
  - alert: CostAnomaly
    expr: |
      rate(cloud_cost_total[1h]) >
      1.2 * avg_over_time(rate(cloud_cost_total[1h])[7d:1h])
    for: 30m
    annotations:
      summary: "Cost increased by >20% vs 7-day average"
Kubernetes Resource Quotas
Prevent cost overruns with quotas:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "100"
    requests.memory: "200Gi"
    requests.storage: "1Ti"
    persistentvolumeclaims: "50"
    pods: "100"
    services.loadbalancers: "5"
---
# Limit ranges for default constraints
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-a
spec:
  limits:
  - max:
      cpu: "4"
      memory: "8Gi"
    min:
      cpu: "100m"
      memory: "128Mi"
    default:
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:
      cpu: "250m"
      memory: "256Mi"
    type: Container
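To see how much of the quota a team is actually consuming:
kubectl describe resourcequota team-quota -n team-a
# Shows each resource with its Used and Hard values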
Scheduled Scaling
Scale down non-production environments during off-hours:
# CronJob to scale down at night
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-dev
  namespace: development
spec:
  schedule: "0 19 * * 1-5" # 7 PM weekdays (controller's timezone, usually UTC)
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler
          containers:
          - name: kubectl
            image: bitnami/kubectl
            command:
            - /bin/sh
            - -c
            - |
              # minReplicas below 1 requires the HPAScaleToZero feature gate;
              # without it, delete or suspend the HPAs instead of patching them
              kubectl scale deployment --all --replicas=0 -n development
              kubectl patch hpa --all -p '{"spec":{"minReplicas":0}}' -n development
          restartPolicy: OnFailure
---
# CronJob to scale up in the morning
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-up-dev
  namespace: development
spec:
  schedule: "0 8 * * 1-5" # 8 AM weekdays
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler
          containers:
          - name: kubectl
            image: bitnami/kubectl
            command:
            - /bin/sh
            - -c
            - |
              kubectl scale deployment api --replicas=3 -n development
              kubectl scale deployment worker --replicas=2 -n development
              kubectl patch hpa --all -p '{"spec":{"minReplicas":2}}' -n development
          restartPolicy: OnFailure
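The scaler service account referenced above needs RBAC to scale deployments and patch HPAs; a minimal sketch (names match the CronJobs above):
apiVersion: v1
kind: ServiceAccount
metadata:
  name: scaler
  namespace: development
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: scaler
  namespace: development
rules:
- apiGroups: ["apps"]
  resources: ["deployments", "deployments/scale"]
  verbs: ["get", "list", "patch"]
- apiGroups: ["autoscaling"]
  resources: ["horizontalpodautoscalers"]
  verbs: ["get", "list", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: scaler
  namespace: development
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: scaler
subjects:
- kind: ServiceAccount
  name: scaler
  namespace: development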
Kubecost Integration
Use Kubecost for comprehensive cost visibility:
# Kubecost namespace (the chart itself is typically installed with Helm; see below)
apiVersion: v1
kind: Namespace
metadata:
  name: kubecost
---
# Illustrative pricing settings; check the Kubecost docs for the exact keys
# your version expects (most options are normally supplied as Helm values)
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubecost-config
  namespace: kubecost
data:
  config.json: |
    {
      "cloudProviderConfig": {
        "provider": "aws",
        "pricing": {
          "spotLabel": "lifecycle",
          "spotValue": "spot"
        }
      },
      "currencyCode": "USD",
      "savingsRecommendations": true
    }
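A typical install uses the Kubecost Helm chart (repo URL and chart name as documented by Kubecost; verify against the current install instructions):
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm upgrade --install kubecost kubecost/cost-analyzer \
  --namespace kubecost --create-namespace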
Query Kubecost API for cost data:
# Get cost by namespace
curl "http://kubecost:9090/model/allocation?window=7d&aggregate=namespace"
# Get cost by deployment
curl "http://kubecost:9090/model/allocation?window=7d&aggregate=deployment"
# Get savings recommendations
curl "http://kubecost:9090/model/savings"
Cost Allocation and Chargeback
Implement showback/chargeback:
# Label resources for cost attribution
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  labels:
    team: platform
    project: customer-portal
    environment: production
    cost-center: eng-platform
spec:
  template:
    metadata:
      labels:
        team: platform
        project: customer-portal
        environment: production
        cost-center: eng-platform
Generate cost reports:
# Cost by team. Assumes node_cost_per_cpu_hour and node_cost_per_memory_gb_hour
# come from your cost exporter, and that kube-state-metrics exposes the team/project
# pod labels (e.g. --metric-labels-allowlist=pods=[team,project]).
sum by (label_team) (
  (
    (
      kube_pod_container_resource_requests{resource="cpu"}
      * on (node) group_left() node_cost_per_cpu_hour
    )
    + ignoring (resource, unit)
    (
      kube_pod_container_resource_requests{resource="memory"} / 2^30
      * on (node) group_left() node_cost_per_memory_gb_hour
    )
  )
  * on (namespace, pod) group_left(label_team) kube_pod_labels
)

# Cost by project: the same query with label_project in place of label_team
FinOps Best Practices
Establish FinOps culture:
- Visibility: Make costs visible to all engineers
- Accountability: Team ownership of costs
- Optimization: Continuous cost improvement
- Forecasting: Predict future costs based on trends
- Governance: Policies to prevent waste
# FinOps team charter
responsibilities:
  - "Cost visibility dashboards"
  - "Monthly cost reviews with teams"
  - "Optimization recommendations"
  - "Budget tracking and forecasting"
  - "Policy enforcement (quotas, limits)"
  - "Training teams on cost-aware development"
metrics:
  - "Cost per transaction"
  - "Cost per user"
  - "Infrastructure efficiency ratio"
  - "Waste percentage (unused resources)"
  - "Savings from optimization initiatives"
Conclusion
Effective Kubernetes cost optimization requires:
- Visibility through comprehensive monitoring and reporting
- Right-sizing based on actual usage patterns
- Autoscaling to match capacity with demand
- Spot instances for fault-tolerant workloads
- Resource quotas to prevent overprovisioning
- Storage optimization and lifecycle management
- Network efficiency to reduce data transfer costs
- Scheduled scaling for non-production environments
- Cost allocation for accountability
- FinOps culture with continuous improvement
Start by establishing cost visibility, then systematically address the largest cost drivers. Automate optimization where possible, and create feedback loops so teams see the cost impact of their decisions. Cost optimization is not a one-time project but an ongoing practice that requires tooling, processes, and cultural change.
The organizations that excel at cost optimization treat it as an engineering problem, invest in automation and observability, and align incentives so teams are motivated to optimize costs while maintaining reliability and performance.