As 2023 draws to a close, it’s worth looking back on a transformative year for infrastructure and platform engineering. From the explosion of AI workloads to the maturation of technologies like eBPF and Rust, the landscape has shifted dramatically. This post reflects on the major trends, challenges, and lessons from the year.

The AI Infrastructure Boom

2023 will be remembered as the year AI moved from research to production at scale. The infrastructure requirements for AI workloads have driven innovation across the stack.

Key Developments

Vector Databases Went Mainstream

  • Moved from niche to critical infrastructure
  • Pinecone, Weaviate, and Qdrant saw rapid adoption
  • HNSW and IVF algorithms became table stakes
  • Hybrid search (vector + keyword) emerged as best practice
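Hybrid search needs some way to merge a vector ranking with a keyword ranking. One widely used option is Reciprocal Rank Fusion (RRF). The sketch below assumes each retriever already returns an ordered list of document IDs. The function name `rrf_fuse` and the sample IDs are hypothetical, and `k = 60` is a commonly cited default rather than a tuned value. Individual databases may fuse results differently.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    where rank is 1-based. A larger k flattens the advantage of top positions.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["doc3", "doc1", "doc7"]   # hypothetical ANN results
keyword_hits = ["doc1", "doc9", "doc3"]  # hypothetical BM25 results
print(rrf_fuse([vector_hits, keyword_hits]))  # ['doc1', 'doc3', 'doc9', 'doc7']
```

RRF uses only ranks, not raw scores. That sidesteps the problem that cosine similarities and keyword relevance scores live on incomparable scales.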

GPU Shortages and Optimization

  • NVIDIA H100 GPUs became harder to acquire than unicorns
  • Led to creative optimization strategies
  • Quantization, distillation, and model compression became essential
  • Alternative accelerators (Cerebras, Graphcore) attracted renewed interest
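Quantization is the easiest of these techniques to show in a few lines. The sketch below covers only the simplest variant, symmetric per-tensor int8 quantization, and is written in plain Python for clarity. Production schemes typically add per-channel scales, calibration data, and vectorized kernels. The function names and weight values are hypothetical.

```python
def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: q = round(x / scale)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Map the largest magnitude to 127; fall back to scale 1.0 for all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # round() uses round-half-to-even; clamp to the symmetric range [-127, 127].
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Approximate reconstruction of the original floats."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.003, 1.0]  # hypothetical weight values
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Per-element reconstruction error is bounded by about scale / 2.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q)  # [50, -127, 0, 100]
```

Storing one byte per value instead of four cuts memory roughly 4x compared with float32. The cost is the rounding error bounded above.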

Embedding Models Evolved

  • From BERT to specialized embedding models (E5, Instructor)
  • Multilingual and multimodal embeddings matured
  • Dimensionality reduction with minimal accuracy loss improved

What We Learned

The infrastructure for AI is fundamentally different from traditional web workloads:

  1. Stateful by Nature: Vector databases require careful sharding and replication
  2. Memory-Bound: High-dimensional vectors stress memory bandwidth
  3. Cost-Intensive: GPU and storage costs dwarf traditional compute
  4. Latency-Sensitive: Real-time inference often requires sub-100ms responses
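To put “memory-bound” in concrete terms, here is a back-of-the-envelope estimate of raw vector storage. The corpus size and dimensionality are hypothetical. The figure deliberately excludes index structures (such as HNSW neighbor lists), metadata, and replicas, all of which add to it.

```python
def raw_vector_gb(num_vectors: int, dims: int, bytes_per_dim: int = 4) -> float:
    """Raw storage for dense vectors in decimal gigabytes (1 GB = 1e9 bytes).

    Excludes index structures (e.g. HNSW neighbor lists), metadata, and replicas.
    """
    return num_vectors * dims * bytes_per_dim / 1e9


corpus = 10_000_000  # hypothetical corpus size
print(raw_vector_gb(corpus, 768))                   # 30.72 (float32)
print(raw_vector_gb(corpus, 768, bytes_per_dim=1))  # 7.68 (int8-quantized)
```

A brute-force scan would have to stream that entire footprint through memory for every query. That is why approximate indexes and quantization matter so much for these workloads.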

My biggest learning: Don’t underestimate the operational complexity of AI infrastructure. A vector database isn’t just Postgres with vectors—it requires specialized operational expertise, monitoring, and optimization.

Rust’s Continued Rise

Rust adoption accelerated in 2023, particularly for infrastructure and systems programming.

Where Rust Shined

High-Performance Data Paths

  • Packet processing at line rate
  • Zero-copy parsers
  • Lock-free data structures
  • SIMD optimization

Safety in Critical Systems

  • Memory safety without garbage collection
  • Fearless concurrency
  • Type-safe configurations
  • Compile-time guarantees

Developer Productivity

  • Cargo ecosystem matured significantly
  • Async runtime maturity (Tokio 1.x, stable since 2020)
  • Better IDE support and tooling
  • Growing talent pool

Challenges Remain

Not everything is perfect in Rust-land:

  • Compile times can still be painful for large projects
  • Learning curve remains steep for teams
  • Async ecosystem still has rough edges
  • Binary sizes can be larger than equivalent C/C++

My take: Rust is the right choice for new systems-level infrastructure, but it requires team investment in training and tooling. The safety and performance benefits are worth it.

eBPF Goes Production

eBPF moved from “interesting technology” to “production-critical” in 2023.

Major Use Cases

Observability Without Instrumentation

  • Continuous profiling (Parca, Pyroscope)
  • Network flow monitoring
  • Security event collection
  • Performance analysis

Network Performance

  • XDP for DDoS mitigation
  • Service mesh data planes (Cilium)
  • Load balancing at line rate
  • Connection tracking

Security Monitoring

  • Runtime security (Falco, Tetragon)
  • File integrity monitoring
  • Privilege escalation detection
  • Malware detection

Lessons Learned

eBPF is powerful but has limitations:

  • Kernel version dependencies still cause issues
  • Verifier rejections can be frustrating
  • Debugging is hard compared to user-space
  • Map size limits require careful planning

Best practice: Start with well-tested eBPF frameworks (Cilium, Falco) rather than writing raw eBPF. Graduate to custom programs only when needed.

Multi-Cloud Reality Check

The promise of multi-cloud was portability and resilience. The reality is more nuanced.

What Actually Worked

Strategic Multi-Cloud

  • Different clouds for different workloads
  • Leverage cloud-specific strengths
  • Avoid single-vendor lock-in for critical services

Edge + Cloud Hybrid

  • Edge for low latency
  • Cloud for capacity and analytics
  • Clear data flow and ownership

What Didn’t Work

Cloud-Agnostic Abstractions

  • Lowest common denominator
  • Miss cloud-specific optimizations
  • Complexity without clear benefits

Active-Active Across Clouds

  • Cross-cloud latency still significant
  • Data consistency challenges
  • Cost of data egress prohibitive

My conclusion: Multi-cloud by default is wrong. Multi-cloud for strategic reasons (resilience, M&A, regulations) makes sense. Multi-cloud for portability is often premature optimization.

Platform Engineering Maturity

2023 saw platform engineering emerge as a recognized discipline with clear patterns.

What Worked

Developer Self-Service

  • Reduce time from idea to production
  • Service catalogs and templates
  • API-driven everything
  • Clear golden paths

Policy as Code

  • Enforce security and compliance
  • Cost management automation
  • Centralized guardrails
  • Shift-left security
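Policy-as-code engines such as Open Policy Agent (with Rego) or Kyverno express rules declaratively. The Python sketch below only illustrates the underlying idea: evaluate a manifest against a list of rules and collect violations. The rules, the `cost-center` label key, and the function names are hypothetical.

```python
from typing import Callable

Manifest = dict
Rule = Callable[[Manifest], list[str]]


def require_cost_center(m: Manifest) -> list[str]:
    """Hypothetical rule: every workload must carry a cost-center label."""
    labels = m.get("metadata", {}).get("labels", {})
    return [] if "cost-center" in labels else ["missing label 'cost-center'"]


def deny_privileged(m: Manifest) -> list[str]:
    """Hypothetical rule: no container may run in privileged mode."""
    containers = m.get("spec", {}).get("containers", [])
    return [
        f"container '{c.get('name', '?')}' runs privileged"
        for c in containers
        if c.get("securityContext", {}).get("privileged", False)
    ]


POLICIES: list[Rule] = [require_cost_center, deny_privileged]


def evaluate(m: Manifest) -> list[str]:
    """Return all violations; an empty list means the manifest is admitted."""
    return [v for rule in POLICIES for v in rule(m)]


pod = {
    "metadata": {"name": "api", "labels": {"team": "payments"}},
    "spec": {"containers": [{"name": "app", "securityContext": {"privileged": True}}]},
}
for violation in evaluate(pod):
    print("DENY:", violation)
```

Keeping each rule as a small, independently testable function is what makes centralized guardrails reviewable like any other code.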

Cost Visibility

  • Per-team/per-service cost allocation
  • Automated optimization recommendations
  • Showback and chargeback
  • Resource utilization tracking
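At its core, showback means grouping billing line items by an ownership tag. The sketch below assumes a billing export already parsed into dictionaries with a `cost` field and a `tags` mapping. The field names and data are hypothetical.

```python
from collections import defaultdict


def showback(line_items: list[dict], tag: str = "team") -> dict[str, float]:
    """Sum cost per owning team; untagged spend goes to an explicit bucket."""
    totals: dict[str, float] = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag, "unallocated")
        totals[owner] += item["cost"]
    return dict(totals)


# Hypothetical billing export rows.
items = [
    {"resource": "i-123", "cost": 120.0, "tags": {"team": "search"}},
    {"resource": "db-7", "cost": 300.0, "tags": {"team": "payments"}},
    {"resource": "bucket-x", "cost": 45.5, "tags": {}},
    {"resource": "i-456", "cost": 80.0, "tags": {"team": "search"}},
]
print(showback(items))  # {'search': 200.0, 'payments': 300.0, 'unallocated': 45.5}
```

Surfacing untagged spend as its own line is a deliberate design choice. Spreading it silently across teams would hide the tagging gaps that make allocation inaccurate.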

What Needs Improvement

Developer Experience

  • Too many tools, not enough integration
  • Cognitive load still high
  • Documentation often outdated
  • Onboarding remains painful

Platform Operations

  • Platform teams often understaffed
  • Treating the platform as a product requires organizational maturity
  • Measuring platform ROI is challenging
  • Balancing standardization vs. flexibility

The key insight: Platform engineering is product development, not just infrastructure. Treat developers as customers, measure satisfaction, iterate based on feedback.

Performance and Efficiency

Performance work in 2023 focused on efficiency in an economic downturn.

Major Themes

Right-Sizing Everything

  • Automated resource optimization
  • Spot instances and reserved capacity
  • Graviton adoption (Arm in the cloud)
  • Function-level cost attribution
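A common starting point for automated right-sizing is to set a resource request from a high percentile of observed usage plus some headroom. The sketch below assumes CPU usage samples in millicores. The p95 percentile, the 20% headroom, and the sample data are hypothetical. Tools such as the Kubernetes Vertical Pod Autoscaler use richer models than this.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile for p in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def recommend_cpu_millicores(usage: list[float], p: float = 95.0,
                             headroom_pct: int = 20) -> int:
    """Suggest a CPU request: p-th percentile of observed usage plus headroom."""
    return math.ceil(percentile(usage, p) * (100 + headroom_pct) / 100)


# Hypothetical 5-minute usage samples for a pod that currently requests 2000m.
usage = [310, 290, 450, 380, 520, 300, 610, 340, 400, 360]
print(recommend_cpu_millicores(usage))  # 732
```

The suggested 732m is far below the hypothetical 2000m request, which is the kind of gap right-sizing is meant to surface. Whether p95 is the right percentile depends on how bursty the workload is.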

Latency Reduction

  • Zero-copy wherever possible
  • Lock-free data structures
  • CPU cache optimization
  • SIMD for data processing
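Zero-copy parsing is not specific to Rust. Below is a minimal Python sketch using `memoryview`, which assumes a hypothetical wire format: each record is a 2-byte big-endian length followed by that many payload bytes. The function name and format are illustrative.

```python
import struct
from typing import Iterator


def iter_records(buf: bytes) -> Iterator[memoryview]:
    """Yield payloads from a buffer of [u16 big-endian length][payload] records.

    Slicing a memoryview creates a view onto the same memory rather than
    copying bytes, so payloads are not duplicated while parsing.
    """
    view = memoryview(buf)
    offset = 0
    while offset < len(view):
        if offset + 2 > len(view):
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack_from(">H", view, offset)
        start, end = offset + 2, offset + 2 + length
        if end > len(view):
            raise ValueError("truncated payload")
        yield view[start:end]
        offset = end


wire = b"\x00\x05hello\x00\x03abc"
# bytes() copies here only so the payloads can be printed.
print([bytes(r) for r in iter_records(wire)])  # [b'hello', b'abc']
```

The caller decides when, or whether, to materialize a copy. A consumer that only hashes or forwards payloads can work on the views directly.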

Observability Overhead Reduction

  • Sampling strategies
  • Efficient metrics aggregation
  • eBPF for low-overhead collection
  • Tail-based tracing
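Tail-based tracing defers the sampling decision until a trace is complete, so the sampler can keep the interesting traces. Below is a minimal sketch of that decision. The `Span` type, the 500 ms threshold, and the 5% baseline rate are hypothetical. In practice, collectors such as the OpenTelemetry Collector provide this as a configurable processor rather than hand-written code.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str
    duration_ms: float
    error: bool = False


def keep_trace(spans: list[Span], latency_slo_ms: float = 500.0,
               baseline_rate: float = 0.05) -> bool:
    """Tail-based decision, made once all spans of a trace have arrived."""
    if not spans:
        return False
    if any(s.error for s in spans):
        return True  # always keep failed requests
    if max(s.duration_ms for s in spans) > latency_slo_ms:
        return True  # always keep traces containing a slow span
    # Otherwise keep a deterministic fraction, keyed on the trace ID so every
    # collector instance makes the same choice for the same trace.
    digest = hashlib.sha256(spans[0].trace_id.encode()).digest()
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53  # uniform in [0, 1)
    return bucket < baseline_rate


ok = [Span("t1", 40.0), Span("t1", 12.0)]
failed = [Span("t2", 35.0), Span("t2", 8.0, error=True)]
slow = [Span("t3", 900.0)]
print(keep_trace(failed), keep_trace(slow))  # True True
```

The trade-off is memory. A tail sampler must buffer every span of a trace until the trace finishes, whereas a head sampler decides up front and buffers nothing.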

The common thread: Every microsecond and every dollar matters. Performance optimization isn’t premature—it’s good engineering in a resource-constrained world.

Security Evolution

Security in 2023 was about shifting left and reducing attack surface.

Zero Trust Everywhere

  • Mutual TLS by default
  • Identity-based access control
  • Continuous verification
  • Hardware attestation

Supply Chain Security

  • SBOM (Software Bill of Materials) adoption
  • Container image signing (Sigstore)
  • Dependency vulnerability scanning
  • Provenance verification
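Tools such as Sigstore handle signatures and provenance attestations. A basic building block underneath is checking an artifact’s content digest against a pinned value. The sketch below covers only that step (no signature verification). The function names and the `sha256:<hex>` pinning convention are illustrative assumptions.

```python
import hashlib
import hmac


def sha256_digest(data: bytes) -> str:
    """Content digest in the common 'sha256:<hex>' form."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def verify_pinned(data: bytes, expected: str) -> bool:
    """Check content against a pinned digest recorded at build time."""
    # Constant-time comparison: a cheap habit for security-relevant checks.
    return hmac.compare_digest(sha256_digest(data), expected)


artifact = b"example release tarball bytes"  # hypothetical artifact content
pinned = sha256_digest(artifact)             # in practice, recorded at build time
print(verify_pinned(artifact, pinned), verify_pinned(artifact + b"!", pinned))  # True False
```

A digest proves integrity but not origin. Signing and provenance attestations are what tie the pinned value back to a trusted build.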

Runtime Security

  • eBPF for threat detection
  • File integrity monitoring
  • Anomaly detection with ML
  • Automated incident response

The mindset shift: Security is not a gate; it’s a continuous process. Automate, detect, and respond, rather than relying on prevention alone.

Technology Predictions for 2024

Looking ahead to 2024, here’s what I’m watching:

LLMs in Production

  • RAG (Retrieval-Augmented Generation) will become standard
  • Vector database importance will grow
  • Prompt engineering will professionalize
  • Fine-tuning will become more accessible
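RAG’s retrieval step is conceptually small: embed the question, find the nearest chunks, and place them in the prompt. The sketch below uses hand-written toy 3-dimensional “embeddings” in place of a real embedding model and brute-force cosine search in place of a vector database index. All names and data are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float],
             corpus: dict[str, tuple[list[float], str]], k: int = 2) -> list[str]:
    """Return the text of the k chunks most similar to the query (brute force)."""
    ranked = sorted(corpus.values(), key=lambda item: cosine(query_vec, item[0]),
                    reverse=True)
    return [text for _, text in ranked[:k]]


def build_prompt(question: str, chunks: list[str]) -> str:
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


corpus = {  # toy vectors standing in for real embeddings
    "a": ([1.0, 0.0, 0.0], "HNSW builds a layered proximity graph."),
    "b": ([0.0, 1.0, 0.0], "Tokio is an async runtime for Rust."),
    "c": ([0.9, 0.1, 0.0], "IVF partitions vectors into clusters."),
}
query = [1.0, 0.05, 0.0]  # pretend embedding of the question below
print(build_prompt("How do ANN indexes work?", retrieve(query, corpus)))
```

Replacing the brute-force `sorted` with an approximate index is exactly the job vector databases take over as the corpus grows.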

WebAssembly at the Edge

  • WASM for edge functions
  • Portable, secure, fast
  • Better language support
  • Standardized Component Model

Platform Engineering Evolution

  • AI-assisted platform operations
  • Automated optimization
  • Predictive scaling
  • Natural language interfaces
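Predictive scaling does not have to start with machine learning. The sketch below is a deliberately simple, non-AI baseline: it fits a least-squares linear trend to recent request rates, extrapolates one interval ahead, and converts the forecast into a replica count. The per-replica capacity, the minimum of two replicas, and the sample data are hypothetical.

```python
import math


def forecast_next(history: list[float]) -> float:
    """Least-squares linear trend over the window, extrapolated one step ahead."""
    n = len(history)
    if n < 2:
        return float(history[-1]) if history else 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(history) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(range(n), history))
    slope = sxy / sxx
    return y_mean + slope * (n - x_mean)


def replicas_needed(history: list[float], rps_per_replica: float,
                    min_replicas: int = 2) -> int:
    """Replicas required to serve the forecast load, never below the floor."""
    forecast = max(forecast_next(history), 0.0)
    return max(min_replicas, math.ceil(forecast / rps_per_replica))


rps = [800, 900, 1000, 1100, 1200]  # hypothetical requests/sec, one sample per minute
print(forecast_next(rps), replicas_needed(rps, rps_per_replica=250))  # 1300.0 6
```

A baseline like this is useful even when a smarter model is the goal. It gives any AI-assisted predictor something concrete to beat.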

Rust Ecosystem Growth

  • More critical infrastructure in Rust
  • Better async ecosystem
  • Improved compile times
  • Enterprise adoption increases

Observability 2.0

  • OpenTelemetry becomes standard
  • eBPF-based observability mainstream
  • AI-powered anomaly detection
  • Automated root cause analysis

Key Lessons from 2023

Technical Lessons

  1. Complexity is the enemy - Simple systems win over clever ones
  2. Measure everything - You can’t optimize what you don’t measure
  3. Design for failure - Resilience > perfection
  4. Operational excellence matters - Great tech with poor ops fails
  5. Developer experience is paramount - Make the right thing easy

Organizational Lessons

  1. Platform teams are strategic - Investment in platforms pays dividends
  2. Security can’t be bolted on - Build it in from the start
  3. Cost optimization is continuous - Not a one-time project
  4. Documentation is infrastructure - Treat it as first-class
  5. Feedback loops are critical - Short cycles, rapid iteration

Personal Lessons

  1. Depth beats breadth - Master the fundamentals
  2. Writing clarifies thinking - Document to understand
  3. Community accelerates learning - Engage, share, contribute
  4. Balance is essential - Sustainable pace wins marathons
  5. Stay curious - Technology never stops evolving

Looking Forward

2023 was a year of rapid change and evolution. AI moved from experimental to production-critical. Rust continued its ascent. eBPF became mainstream. Platform engineering matured. Security shifted left.

2024 promises to be even more interesting. LLMs will reshape how we interact with systems. WebAssembly will bring new deployment models. Platform engineering will incorporate more AI. Observability will become more automated.

The constant in all this change: Good engineering fundamentals never go out of style. Focus on reliability, performance, security, and developer experience. Master the basics. Build on solid foundations.

Here’s to an exciting 2024 ahead!

Further Reading

If you want to go deeper on topics from this year:

AI Infrastructure

  • “Building LLM Applications for Production” by Chip Huyen
  • “Designing Data-Intensive Applications” by Martin Kleppmann (still relevant!)

Rust

  • “Rust for Rustaceans” by Jon Gjengset
  • “Zero to Production in Rust” by Luca Palmieri

eBPF

  • “Learning eBPF” by Liz Rice
  • “BPF Performance Tools” by Brendan Gregg

Platform Engineering

  • “Team Topologies” by Matthew Skelton and Manuel Pais
  • “Platform Engineering on Kubernetes” (Komodor guide)

Distributed Systems

  • “Understanding Distributed Systems” by Roberto Vitillo
  • “Database Internals” by Alex Petrov

Thanks for following along this year. See you in 2024!