2018 marked a turning point for cloud-native technologies. Kubernetes crossed into mainstream adoption, service mesh moved from experimental to production-ready, and serverless computing matured beyond simple functions. As someone deeply involved in building and operating cloud-native systems this year, I want to reflect on what changed, what we learned, and where the ecosystem is headed.

Kubernetes Becomes Boring (In a Good Way)

The most significant shift in 2018 was Kubernetes becoming the default choice for container orchestration. The “orchestration wars” are over—Kubernetes won. But more importantly, Kubernetes became boring infrastructure. That’s a compliment.

What Changed:

  • Major cloud providers launched managed Kubernetes (GKE matured, EKS launched, AKS became generally available)
  • Enterprise adoption accelerated dramatically
  • Focus shifted from “should we use Kubernetes?” to “how do we run it well?”
  • The ecosystem consolidated around Kubernetes as the platform

What We Learned: Running Kubernetes well is harder than installing it. The real challenges are:

  • Multi-tenancy and resource isolation (see the sketch after this list)
  • Network policy and security
  • Monitoring and observability at scale
  • Upgrade and lifecycle management
  • Cost optimization
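
To make the resource isolation point concrete, here is a minimal sketch using the official Kubernetes Python client to apply a per-namespace ResourceQuota. The `team-a` namespace and the specific limits are placeholders rather than the exact values we run with.

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running in a pod

# A quota for one tenant namespace: caps aggregate CPU and memory requests
# and the total pod count so a single team cannot starve everyone else.
quota = client.V1ResourceQuota(
    metadata=client.V1ObjectMeta(name="team-a-quota"),
    spec=client.V1ResourceQuotaSpec(hard={
        "requests.cpu": "10",
        "requests.memory": "20Gi",
        "pods": "50",
    }),
)

client.CoreV1Api().create_namespaced_resource_quota(namespace="team-a", body=quota)
```

In practice the quota manifest lives in Git alongside everything else; the client call is just the shortest way to show the shape.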

Key Takeaway: Kubernetes is infrastructure now. The value is in what you build on top of it, not in running Kubernetes itself. If you’re not in the platform business, use managed Kubernetes.

Service Mesh Goes Production

2018 was the year service mesh graduated from interesting technology to production deployment. Istio 1.0 launched in July, Linkerd 2.0 arrived with a focus on simplicity, and AWS announced App Mesh.

What We Deployed:

  • Mutual TLS between all services without application changes
  • Traffic shifting for canary deployments (sketched below)
  • Distributed tracing across services
  • Fine-grained authorization policies
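
As an illustration of the traffic-shifting piece, the sketch below shows an Istio VirtualService that splits traffic 90/10 between a stable and a canary subset, applied through the Kubernetes Python client. The `checkout` service, the subset names, and the weights are placeholders; the resource shape follows Istio 1.0’s `networking.istio.io/v1alpha3` API and assumes a matching DestinationRule defines the subsets.

```python
from kubernetes import client, config

config.load_kube_config()

# Hypothetical canary: 90% of traffic to the stable subset, 10% to the new one.
virtual_service = {
    "apiVersion": "networking.istio.io/v1alpha3",
    "kind": "VirtualService",
    "metadata": {"name": "checkout", "namespace": "default"},
    "spec": {
        "hosts": ["checkout"],
        "http": [{
            "route": [
                {"destination": {"host": "checkout", "subset": "stable"}, "weight": 90},
                {"destination": {"host": "checkout", "subset": "canary"}, "weight": 10},
            ],
        }],
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="networking.istio.io", version="v1alpha3",
    namespace="default", plural="virtualservices", body=virtual_service,
)
```

Promoting the canary is then just a weight change, which fits naturally with the GitOps workflow described later.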

What Surprised Us: The complexity was higher than expected. Every debugging session gains an extra layer to rule out once the mesh sits in the request path, and the operational overhead of running Istio in production is significant: you need dedicated expertise.

What Worked:

  • Starting with observability only (no policy enforcement)
  • Gradual rollout service by service
  • Comprehensive testing in staging
  • Strong Prometheus and Grafana setup before adding mesh

What Didn’t:

  • Trying to mesh everything at once
  • Enabling mTLS strict mode without thorough testing
  • Underestimating resource overhead
  • Insufficient monitoring of the mesh itself

Key Takeaway: Service mesh solves real problems (observability, security, traffic management) but introduces complexity. Adopt incrementally and ensure your team has the expertise to operate it.

Serverless Matures

Serverless computing in 2018 moved beyond simple event-driven functions to supporting complex applications.

Major Developments:

  • AWS Lambda added layers and custom runtimes
  • Google Cloud Functions reached general availability and the Azure Functions 2.0 runtime went GA
  • Knative launched, providing Kubernetes-native serverless
  • Serverless frameworks matured significantly

What We Built:

  • API backends entirely on Lambda (see the sketch after this list)
  • Event-driven data pipelines
  • Scheduled jobs and cron replacements
  • GraphQL servers on serverless
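
For the API backends, “entirely on Lambda” means handlers roughly like the sketch below: API Gateway proxy integration in front, a small Python function behind it. The route and the response payload are made up for illustration.

```python
import json

def handler(event, context):
    """Minimal API Gateway (proxy integration) handler."""
    # API Gateway passes the HTTP request as `event`; query parameters may be None.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": "hello, {}".format(name)}),
    }
```

The same pattern, triggered by SQS or scheduled CloudWatch Events instead of API Gateway, covers the event-driven pipelines and cron replacements.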

What We Learned: Serverless works well for:

  • Irregular or unpredictable workloads
  • Event-driven architectures
  • Rapid prototyping
  • Applications with long idle periods

Serverless struggles with:

  • Predictable, constant load (cheaper to run containers)
  • Very low latency requirements (cold starts matter)
  • Long-running processes
  • Complex state management

Key Takeaway: Serverless is excellent for the right use cases. It’s not a replacement for all computing—it’s another tool in the toolbox. Start with event-driven workloads and expand from there.

GitOps Gains Momentum

GitOps emerged as the preferred way to manage Kubernetes deployments. Flux and Argo CD both saw significant adoption.

Why It Clicked:

  • Git as single source of truth is intuitive for developers
  • Pull-based deployment is more secure than CI/CD pushing to clusters (a toy reconciler follows this list)
  • Declarative configuration enables easy rollback
  • Audit trail comes for free
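
To illustrate the pull model, here is a toy reconciler, nothing like the real internals of Flux or Argo CD, that shows why the approach is attractive: the agent runs inside the cluster, pulls desired state from Git, and applies it, so no external CI system ever holds cluster credentials. The repo path and sync interval are placeholders.

```python
import subprocess
import time

REPO_DIR = "/var/lib/gitops/deploy-config"  # local clone of the manifests repo (placeholder)
SYNC_INTERVAL = 60                          # seconds between reconciliation passes

def reconcile():
    # Fetch the latest desired state from Git...
    subprocess.run(["git", "-C", REPO_DIR, "pull", "--ff-only"], check=True)
    # ...and apply it to the cluster this agent runs in. `kubectl apply` is
    # declarative and idempotent, so unchanged manifests are a no-op and
    # manual edits to fields declared in Git are reverted on the next pass.
    subprocess.run(["kubectl", "apply", "--recursive", "-f", REPO_DIR], check=True)

if __name__ == "__main__":
    while True:
        reconcile()
        time.sleep(SYNC_INTERVAL)
```

Rollback becomes a `git revert` followed by the next sync, which is where the “easy rollback” and audit-trail benefits come from.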

Implementation Patterns: We standardized on:

  • Separate repositories for application code and deployment manifests
  • Kustomize for environment-specific overlays
  • Automatic image updates via Flux
  • Sealed Secrets for GitOps-friendly secret management

Challenges:

  • Secret management (largely addressed with Sealed Secrets and External Secrets)
  • Learning curve for teams used to imperative deployments
  • Initial complexity of setting up the pipeline

Key Takeaway: GitOps reduces cognitive load by using familiar tools (Git) for operations. The investment in setup pays off quickly in reduced deployment errors and faster incident recovery.

Observability Becomes Essential

The shift from monoliths to microservices made traditional monitoring inadequate. 2018 was the year observability—metrics, logs, and traces together—became standard practice.

The Stack:

  • Prometheus for metrics (with Thanos for long-term storage)
  • EFK or ELK for logs (with Fluentd becoming more common than Logstash)
  • Jaeger or Zipkin for distributed tracing
  • Grafana for visualization

What Changed:

  • Distributed tracing moved from optional to essential
  • SLO-based alerting replaced threshold-based alerts
  • Correlation between metrics, logs, and traces became expected
  • Observability shifted left: services were instrumented during development, not after deployment

Lessons Learned:

  • Start with the three pillars (metrics, logs, traces) from day one
  • Instrument code as you write it (see the sketch after this list)
  • Define SLOs early
  • Alert on SLO violations, not arbitrary thresholds
  • Correlation via trace IDs is invaluable for debugging
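
“Instrument code as you write it” looks, in practice, something like the sketch below using the `prometheus_client` library; the metric names and the simulated work are placeholders. Once metrics like these exist, SLOs and SLO-based alerts can be defined on top of them instead of on arbitrary thresholds.

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter(
    "http_requests_total", "Total HTTP requests handled", ["path", "status"])
LATENCY = Histogram(
    "http_request_duration_seconds", "Request latency in seconds", ["path"])

def handle(path):
    # Time the request and count it by path and status code.
    with LATENCY.labels(path=path).time():
        time.sleep(random.uniform(0.01, 0.1))  # stand-in for real work
        status = "200"
    REQUESTS.labels(path=path, status=status).inc()

if __name__ == "__main__":
    start_http_server(8000)  # exposes /metrics for Prometheus to scrape
    while True:
        handle("/api/orders")
```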

Key Takeaway: Observability isn’t optional for distributed systems. Build it in from the start, and invest in correlating signals across metrics, logs, and traces.

Security Shifts Left

2018 saw security moving earlier in the development lifecycle rather than being a gate before production.

Key Developments:

  • Image scanning in CI/CD pipelines became standard
  • Pod Security Policies gained adoption
  • Service mesh enabled zero-trust networking
  • Secret management tools (Vault, Sealed Secrets) matured

Practices That Worked:

  • Scanning images in CI/CD and failing builds on critical vulnerabilities
  • Running containers as non-root by default
  • Network policies denying all traffic by default (sketched below)
  • Using dedicated secrets management instead of environment variables
  • Runtime security monitoring with Falco
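
The default-deny network policy is small enough to show in full. Below is a sketch using the Kubernetes Python client; the `team-a` namespace is a placeholder, and per-service allow policies are layered on top of it.

```python
from kubernetes import client, config

config.load_kube_config()

# An empty pod selector matches every pod in the namespace. Listing both
# policy types with no ingress or egress rules means nothing is allowed
# until more specific policies open it up.
deny_all = client.V1NetworkPolicy(
    metadata=client.V1ObjectMeta(name="default-deny-all"),
    spec=client.V1NetworkPolicySpec(
        pod_selector=client.V1LabelSelector(),
        policy_types=["Ingress", "Egress"],
    ),
)

client.NetworkingV1Api().create_namespaced_network_policy(
    namespace="team-a", body=deny_all)
```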

Common Pitfalls:

  • Treating security as a checkbox
  • Implementing policies without proper tooling
  • Blocking developers without providing alternatives
  • Ignoring secret rotation

Key Takeaway: Security is everyone’s responsibility. Provide developers with secure defaults and tools that make security easy, not something that slows them down.

The Rise of Platform Teams

Organizations realized that expecting every team to become Kubernetes experts doesn’t scale. Platform teams emerged to provide self-service infrastructure.

What Platform Teams Built:

  • Golden paths for common use cases
  • Self-service deployment pipelines
  • Standardized observability
  • Reusable Helm charts or Kustomize bases
  • Internal developer portals

Benefits:

  • Application teams move faster
  • Consistency across services
  • Centralized expertise for complex infrastructure
  • Reduced duplication of effort

Anti-Patterns to Avoid:

  • Platform teams becoming gatekeepers
  • Building platforms nobody wants to use
  • Not involving users in platform design
  • Over-engineering solutions

Key Takeaway: Platform teams should enable developer productivity, not control deployments. Build platforms with users, not for them.

Cost Optimization Becomes Critical

As cloud-native adoption grew, so did cloud bills. 2018 was the year organizations got serious about cost optimization.

What We Implemented:

  • Resource requests and limits on all pods (see the sketch after this list)
  • Cluster autoscaling
  • Pod disruption budgets for safe node draining
  • Spot/preemptible instances for non-critical workloads
  • Namespace resource quotas
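
For reference, “requests and limits on all pods” means every container spec carries a `resources` block like the sketch below (image, namespace, and numbers are placeholders): requests are what the scheduler reserves, and therefore what you pay for on underutilized nodes, while limits are the ceiling enforced at runtime.

```python
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="api", labels={"app": "api"}),
    spec=client.V1PodSpec(containers=[
        client.V1Container(
            name="api",
            image="registry.example.com/api:1.0",  # placeholder image
            resources=client.V1ResourceRequirements(
                requests={"cpu": "100m", "memory": "128Mi"},  # reserved by the scheduler
                limits={"cpu": "500m", "memory": "256Mi"},    # enforced ceiling at runtime
            ),
        ),
    ]),
)

client.CoreV1Api().create_namespaced_pod(namespace="team-a", body=pod)
```

In practice this lives in a Deployment template in Git rather than being created imperatively; the point here is the `resources` block.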

Tools That Helped:

  • Kubernetes metrics for right-sizing
  • Cloud provider cost analysis tools
  • Custom dashboards for cost per service
  • Automated recommendations for optimization

Biggest Savings:

  • Right-sizing pods (many were overprovisioned by 50%+)
  • Using spot instances for batch workloads
  • Autoscaling to match demand
  • Deleting unused resources

Key Takeaway: Cloud-native doesn’t mean cost-efficient by default. Implement resource limits, use autoscaling, and monitor costs as closely as you monitor performance.

The Challenges We’re Still Solving

Despite the progress, significant challenges remain:

Complexity: The cloud-native ecosystem is sprawling. The CNCF landscape has hundreds of projects. Choosing the right tools and integrating them is challenging.

Learning Curve: The barrier to entry is high. Understanding Kubernetes, service mesh, observability stacks, and security requires significant investment.

Multi-Cluster Management: Running multiple clusters across regions and clouds remains operationally complex.

Developer Experience: The day-to-day developer experience often suffers: local development against Kubernetes is painful, and debugging distributed systems is hard.

Standards: Too many competing standards for similar problems. Service mesh interfaces, cloud-native storage, monitoring solutions—fragmentation remains.

Predictions for 2019

Based on 2018’s trajectory, here’s what I expect:

Service Mesh Standardization: Service Mesh Interface (SMI) will gain traction. Organizations will want to avoid lock-in to specific mesh implementations.

Serverless on Kubernetes: Knative and similar projects will make Kubernetes a viable serverless platform, reducing the need for separate FaaS platforms.

Multi-Cluster Becomes Normal: Running applications across multiple clusters for resilience and regional compliance will become standard practice.

Platform Engineering Professionalization: Platform engineering will emerge as a distinct discipline with its own tools, practices, and career paths.

GitOps Everywhere: GitOps will expand beyond Kubernetes to manage other infrastructure through declarative configuration in Git.

Advice for 2019

Based on what we learned in 2018:

Start Simple: Don’t adopt every new technology. Start with Kubernetes, add observability, then carefully evaluate what else you need.

Invest in Platform: Build internal platforms that make the right thing the easy thing for developers.

Observability First: You can’t operate what you can’t observe. Instrument everything from day one.

Security as Code: Treat security policies as code. Automate enforcement and scanning.

Learn from Others: The cloud-native community is vibrant. Engage with others through conferences, meetups, and open source.

Focus on Fundamentals: Distributed systems principles matter more than specific tools. Understand consensus, consistency, and availability tradeoffs.

Conclusion

2018 was transformative for cloud-native technologies. What was cutting-edge became mainstream. What was experimental became production-ready.

The ecosystem matured significantly:

  • Kubernetes became boring infrastructure
  • Service mesh proved its value
  • Serverless found its place
  • Observability became essential
  • Security shifted left
  • Platform teams emerged

But we’re still in the early days. The complexity is real, the learning curve is steep, and best practices are still emerging. The organizations that will succeed are those that:

  • Adopt incrementally rather than all at once
  • Invest in platform teams and developer experience
  • Focus on fundamentals over shiny new tools
  • Build expertise gradually
  • Learn from the community

2019 will bring new challenges and new solutions. The pace of change won’t slow down. But the foundation we built in 2018—Kubernetes, service mesh, observability, GitOps—gives us a solid platform to build on.

Here’s to another year of building resilient, scalable, cloud-native systems.