As 2023 draws to a close, it’s worth looking back on a transformative year for infrastructure and platform engineering. From the explosion of AI workloads to the maturation of eBPF and Rust, the landscape shifted dramatically. This post reflects on the year’s major trends, challenges, and lessons.
The AI Infrastructure Boom
2023 will be remembered as the year AI moved from research to production at scale. The infrastructure requirements for AI workloads have driven innovation across the stack.
Key Developments
Vector Databases Went Mainstream
- Moved from niche to critical infrastructure
- Pinecone, Weaviate, Qdrant saw massive adoption
- HNSW and IVF indexes became table stakes
- Hybrid search (vector + keyword) emerged as best practice
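One common way to combine vector and keyword results is Reciprocal Rank Fusion (RRF). The sketch below is a minimal illustration, not how any particular database implements hybrid search. The document IDs and the two result lists are made up, and `k=60` is simply a frequently used default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a vector index and a keyword index.
vector_hits = ["doc3", "doc1", "doc7"]
keyword_hits = ["doc1", "doc9", "doc3"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, which is the behavior hybrid search is after. RRF only needs ranks, so it avoids having to normalize similarity scores and keyword scores onto a common scale.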
GPU Shortages and Optimization
- NVIDIA H100 GPUs became harder to acquire than unicorns
- Led to creative optimization strategies
- Quantization, distillation, and model compression became essential
- Alternative accelerators (Cerebras, Graphcore) drew increased attention
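To show what quantization means in its simplest form, here is a sketch of symmetric int8 quantization over a plain list of floats. Production toolchains use per-channel scales, calibration data, and packed tensors. The weights below are invented for illustration.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats into the range [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map int8 codes back to approximate float values."""
    return [q * scale for q in quantized]


# Hypothetical weights; real models have millions or billions of them.
weights = [0.02, -1.27, 0.5, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(codes, scale, max_error)
```

Each value drops from 4 bytes to 1, and the rounding error per value is bounded by half the scale. That trade of a small, bounded error for a 4x memory saving is why quantization became a standard response to scarce GPU memory.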
Embedding Models Evolved
- From BERT to specialized models (E5, INSTRUCTOR)
- Multilingual and multimodal embeddings matured
- Dimensionality reduction with minimal accuracy loss improved
What We Learned
The infrastructure for AI is fundamentally different from traditional web workloads:
- Stateful by Nature: Vector databases require careful sharding and replication
- Memory-Bound: High-dimensional vectors stress memory bandwidth
- Cost-Intensive: GPU and storage costs dwarf traditional compute
- Latency-Sensitive: Real-time inference often needs sub-100ms responses
My biggest lesson: don’t underestimate the operational complexity of AI infrastructure. A vector database isn’t just Postgres with vectors; it requires specialized operational expertise, monitoring, and optimization.
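A quick back-of-the-envelope calculation makes the memory-bound point concrete. The corpus size and dimension below are assumptions chosen for illustration, and the estimate covers raw vectors only. Index structures such as HNSW graphs, metadata, and replicas add to it.

```python
GIB = 1024 ** 3


def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Bytes needed to hold the raw vectors alone (float32 by default)."""
    return num_vectors * dimensions * bytes_per_value


# Assumed workload: 100 million vectors with 768 dimensions each.
float32_gib = raw_vector_bytes(100_000_000, 768) / GIB
int8_gib = raw_vector_bytes(100_000_000, 768, bytes_per_value=1) / GIB
print(f"float32: {float32_gib:.1f} GiB, int8: {int8_gib:.1f} GiB")
```

At roughly 286 GiB for float32, this hypothetical corpus no longer fits in the memory of a single commodity node, which is where the sharding and replication concerns start.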
Rust’s Continued Rise
Rust adoption accelerated in 2023, particularly for infrastructure and systems programming.
Where Rust Shone
High-Performance Data Paths
- Packet processing at line rate
- Zero-copy parsers
- Lock-free data structures
- SIMD optimization
Safety in Critical Systems
- Memory safety without garbage collection
- Fearless concurrency
- Type-safe configurations
- Compile-time guarantees
Developer Productivity
- Cargo ecosystem matured significantly
- A stable async runtime to build on (Tokio 1.x)
- Better IDE support and tooling
- Growing talent pool
Challenges Remain
Not everything is perfect in Rust-land:
- Compile times can still be painful for large projects
- Learning curve remains steep for teams
- Async ecosystem still has rough edges
- Binary sizes can be larger than equivalent C/C++
My take: Rust is the right choice for new systems-level infrastructure, but it requires team investment in training and tooling. The safety and performance benefits are worth it.
eBPF Goes Production
eBPF moved from “interesting technology” to “production-critical” in 2023.
Major Use Cases
Observability Without Instrumentation
- Continuous profiling (Parca, Pyroscope)
- Network flow monitoring
- Security event collection
- Performance analysis
Network Performance
- XDP for DDoS mitigation
- Service mesh data planes (Cilium)
- Load balancing at line rate
- Connection tracking
Security Monitoring
- Runtime security (Falco, Tetragon)
- File integrity monitoring
- Privilege escalation detection
- Malware detection
Lessons Learned
eBPF is powerful but has limitations:
- Kernel version dependencies still cause issues
- Verifier rejections can be frustrating
- Debugging is harder than in user space
- Map size limits require careful planning
Best practice: Start with well-tested eBPF frameworks (Cilium, Falco) rather than writing raw eBPF. Graduate to custom programs only when needed.
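For the map size planning mentioned above, a rough estimate up front helps. The sketch below assumes keys and values are padded to 8-byte boundaries and uses a guessed fixed per-entry overhead. The 48-byte overhead and the connection-tracking map are assumptions, so check real usage on your target kernel rather than trusting these numbers.

```python
def round_up(n, multiple=8):
    """Round n up to the next multiple (8-byte alignment by default)."""
    return (n + multiple - 1) // multiple * multiple


def estimate_hash_map_bytes(max_entries, key_size, value_size,
                            per_entry_overhead=48):
    """Approximate memory for a fully populated hash map.

    per_entry_overhead is an assumed figure for bookkeeping per element,
    not a value taken from any specific kernel version.
    """
    entry = round_up(key_size) + round_up(value_size) + per_entry_overhead
    return max_entries * entry


# Hypothetical connection-tracking map: 13-byte 5-tuple key, 16-byte value.
estimate = estimate_hash_map_bytes(1_000_000, key_size=13, value_size=16)
print(f"{estimate / 1024 ** 2:.1f} MiB")
```

Even with small keys and values, a million-entry map lands in the tens of megabytes per node under these assumptions, so `max_entries` is worth sizing deliberately.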
Multi-Cloud Reality Check
The promise of multi-cloud was portability and resilience. The reality is more nuanced.
What Actually Worked
Strategic Multi-Cloud
- Different clouds for different workloads
- Leverage cloud-specific strengths
- Avoid single-vendor lock-in for critical services
Edge + Cloud Hybrid
- Edge for low latency
- Cloud for capacity and analytics
- Clear data flow and ownership
What Didn’t Work
Cloud-Agnostic Abstractions
- Reduce features to the lowest common denominator
- Miss cloud-specific optimizations
- Add complexity without clear benefits
Active-Active Across Clouds
- Cross-cloud latency still significant
- Data consistency challenges
- Cost of data egress prohibitive
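The egress point is easy to check with simple arithmetic. The replication volume and the per-GB price below are placeholders, not quotes from any provider. Plug in your own traffic and the current published rates.

```python
def monthly_egress_cost(gb_per_day, price_per_gb, days=30):
    """Cost of moving data out of one cloud over a month."""
    return gb_per_day * days * price_per_gb


# Assumed: 5 TB/day replicated across clouds at a hypothetical $0.09 per GB.
cost = monthly_egress_cost(gb_per_day=5_000, price_per_gb=0.09)
print(f"${cost:,.0f} per month")
```

Under these assumptions, replication alone costs about $13,500 a month in each direction, before any compute or storage. That is why active-active designs need a cost model before they need an architecture diagram.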
My conclusion: Multi-cloud by default is wrong. Multi-cloud for strategic reasons (resilience, M&A, regulations) makes sense. Multi-cloud for portability is often premature optimization.
Platform Engineering Maturity
2023 saw platform engineering emerge as a recognized discipline with clear patterns.
What Worked
Developer Self-Service
- Reduce time from idea to production
- Service catalogs and templates
- API-driven everything
- Clear golden paths
Policy as Code
- Enforce security and compliance
- Cost management automation
- Centralized guardrails
- Shift-left security
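As a small illustration of policy as code, the sketch below checks a deployment description against a few guardrails. The spec shape, the required labels, and the rules themselves are invented for this example. Real platforms usually express such rules in a dedicated policy engine rather than ad hoc functions.

```python
# Hypothetical organization-wide requirement.
REQUIRED_LABELS = {"team", "cost-center"}


def check_deployment(spec):
    """Return a list of policy violations for a deployment spec (a dict)."""
    violations = []
    missing = REQUIRED_LABELS - set(spec.get("labels", {}))
    if missing:
        violations.append(f"missing labels: {sorted(missing)}")
    for container in spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        if container.get("image", "").endswith(":latest"):
            violations.append(f"{name}: image tag ':latest' is not allowed")
        if "memory_limit" not in container:
            violations.append(f"{name}: no memory limit set")
        if container.get("privileged", False):
            violations.append(f"{name}: privileged mode is not allowed")
    return violations


bad_spec = {
    "labels": {"team": "payments"},
    "containers": [
        {"name": "api", "image": "registry.example.com/api:latest",
         "privileged": True},
    ],
}
for violation in check_deployment(bad_spec):
    print(violation)
```

Running a check like this in CI is the shift-left part: the developer sees the violation in the pull request instead of at deploy time.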
Cost Visibility
- Per-team/per-service cost allocation
- Automated optimization recommendations
- Showback and chargeback
- Resource utilization tracking
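Per-team allocation usually comes down to grouping billing line items by tag and deciding what to do with untagged spend. This sketch splits untagged cost in proportion to each team's direct spend. The line-item shape and the proportional rule are assumptions, and other splits (even, usage-based) are just as defensible.

```python
from collections import defaultdict


def allocate_costs(line_items):
    """Sum cost per team and spread untagged cost proportionally."""
    direct = defaultdict(float)
    shared = 0.0
    for item in line_items:
        team = item.get("team")
        if team:
            direct[team] += item["cost"]
        else:
            shared += item["cost"]
    total_direct = sum(direct.values())
    if total_direct == 0:
        # Nothing to base a proportional split on.
        return {"unallocated": shared}
    return {
        team: cost + shared * cost / total_direct
        for team, cost in direct.items()
    }


# Hypothetical billing export.
items = [
    {"team": "payments", "cost": 600.0},
    {"team": "search", "cost": 400.0},
    {"cost": 200.0},  # untagged shared infrastructure
]
print(allocate_costs(items))
```

The same totals can be used for showback (informational reports) or chargeback (actual budget transfers). The allocation logic is identical and only the consequences differ.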
What Needs Improvement
Developer Experience
- Too many tools, not enough integration
- Cognitive load still high
- Documentation often outdated
- Onboarding remains painful
Platform Operations
- Platform teams often understaffed
- Treating the platform as a product is still an immature practice
- Measuring platform ROI is challenging
- Balancing standardization vs. flexibility
The key insight: Platform engineering is product development, not just infrastructure. Treat developers as customers, measure satisfaction, iterate based on feedback.
Performance Optimization Trends
Performance work in 2023 focused on efficiency in an economic downturn.
Major Themes
Right-Sizing Everything
- Automated resource optimization
- Spot instances and reserved capacity
- Graviton adoption (Arm in the cloud)
- Function-level cost attribution
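Automated right-sizing often boils down to a percentile of observed usage plus headroom. The sketch below uses a nearest-rank percentile. The usage samples, the 90th percentile, and the 25% headroom are illustrative choices, not recommendations.

```python
def percentile(samples, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point edge cases.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def recommend_request(usage_samples, pct=95, headroom=1.2):
    """Suggest a resource request from observed usage."""
    return percentile(usage_samples, pct) * headroom


# Hypothetical CPU usage in millicores, including one spike.
cpu_samples = [120, 150, 130, 400, 140, 135, 160, 145, 155, 125]
suggested = recommend_request(cpu_samples, pct=90, headroom=1.25)
print(f"suggested request: {suggested:.0f}m")
```

Using a percentile rather than the maximum keeps a single spike (the 400m sample here) from dictating the request. Whether that spike should instead be absorbed by a limit or by autoscaling is a separate decision.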
Latency Reduction
- Zero-copy wherever possible
- Lock-free data structures
- CPU cache optimization
- SIMD for data processing
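Zero-copy is easiest to see in a small parser. The sketch below uses `memoryview` to hand out a slice of a buffer without copying the payload. The frame layout (2-byte type, 4-byte length, then payload) is invented for this example.

```python
import struct

# Assumed wire format: 2-byte message type, 4-byte payload length,
# both in network byte order, followed by the payload.
HEADER = struct.Struct("!HI")


def split_frame(buffer):
    """Return (msg_type, payload_view) without copying the payload bytes."""
    view = memoryview(buffer)
    msg_type, length = HEADER.unpack_from(view, 0)
    payload = view[HEADER.size:HEADER.size + length]
    return msg_type, payload


frame = bytearray(HEADER.pack(7, 5) + b"hello" + b"trailing")
msg_type, payload = split_frame(frame)

# Mutating the underlying buffer is visible through the view,
# which shows that no copy was made.
frame[HEADER.size] = ord("J")
print(msg_type, bytes(payload))
```

The same idea applies in any language: parse headers in place and pass around references to the original buffer. The cost is that the view's lifetime is tied to the buffer, which is the kind of constraint Rust's borrow checker enforces at compile time.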
Observability Overhead Reduction
- Sampling strategies
- Efficient metrics aggregation
- eBPF for low-overhead collection
- Tail-based tracing
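Tail-based sampling means deciding whether to keep a trace after it has finished, once errors and total latency are known. The sketch below is a simplified decision function. The span shape, the 500 ms threshold, and the 1% baseline rate are assumptions for illustration.

```python
import zlib


def keep_trace(trace_id, spans, latency_threshold_ms=500, baseline_rate=0.01):
    """Decide whether to keep a completed trace.

    Keeps every trace with an error or high latency, plus a small
    deterministic sample of the rest.
    """
    if any(span.get("error") for span in spans):
        return True
    duration = (max(s["end_ms"] for s in spans)
                - min(s["start_ms"] for s in spans))
    if duration >= latency_threshold_ms:
        return True
    # Hash the trace ID so every collector makes the same decision.
    bucket = zlib.crc32(trace_id.encode()) % 10_000
    return bucket < baseline_rate * 10_000


# Hypothetical traces.
fast_ok = [{"start_ms": 0, "end_ms": 40}]
slow_ok = [{"start_ms": 0, "end_ms": 300}, {"start_ms": 300, "end_ms": 900}]
failed = [{"start_ms": 0, "end_ms": 20, "error": True}]

print(keep_trace("trace-a", slow_ok), keep_trace("trace-b", failed))
```

Compared with head-based sampling, this keeps the interesting traces at the cost of buffering all spans until the trace completes, so the overhead moves from storage to the collector.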
The common thread: Every microsecond and every dollar matters. Performance optimization isn’t premature—it’s good engineering in a resource-constrained world.
Security Evolution
Security in 2023 was about shifting left and reducing attack surface.
Zero Trust Everywhere
- Mutual TLS by default
- Identity-based access control
- Continuous verification
- Hardware attestation
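On the server side, "mutual TLS by default" comes down to refusing connections from clients that cannot present a trusted certificate. Here is a minimal sketch using Python's standard `ssl` module. The file path parameters are hypothetical, and they are optional here only so the context can be built without real certificates on disk.

```python
import ssl


def build_mtls_server_context(cert_file=None, key_file=None,
                              client_ca_file=None):
    """Build a server-side TLS context that requires client certificates."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # Reject any client that does not present a certificate we can verify.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file:
        # The server's own identity.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        # The CA bundle that client certificates must chain to.
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx


ctx = build_mtls_server_context()
print(ctx.verify_mode)
```

In a service mesh this configuration is typically handled by the sidecar or data plane rather than application code. The sketch is only meant to show what is being enforced.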
Supply Chain Security
- SBOM (Software Bill of Materials) adoption
- Container image signing (Sigstore)
- Dependency vulnerability scanning
- Provenance verification
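The simplest building block of artifact verification is comparing a digest against a pinned value. The sketch below does only that. It is not signature or provenance verification of the kind Sigstore provides, and the artifact bytes are made up.

```python
import hashlib
import hmac


def verify_artifact(data: bytes, expected_sha256: str) -> bool:
    """Check that data matches a pinned SHA-256 digest (hex string)."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(actual, expected_sha256.lower())


# Hypothetical artifact and its pinned digest.
artifact = b"release-1.0 contents"
pinned = hashlib.sha256(artifact).hexdigest()

print(verify_artifact(artifact, pinned))
print(verify_artifact(artifact + b" tampered", pinned))
```

A digest check proves the bytes are the ones you pinned, not who produced them. Signing and provenance attestations address that second question.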
Runtime Security
- eBPF for threat detection
- File integrity monitoring
- Anomaly detection with ML
- Automated incident response
The mindset shift: Security is not a gate; it’s a continuous process. Prevention alone is not enough, so automate detection and response as well.
Technology Predictions for 2024
Looking ahead to 2024, here’s what I’m watching:
LLMs in Production
- RAG (Retrieval-Augmented Generation) will become standard
- Vector database importance will grow
- Prompt engineering will professionalize
- Fine-tuning will become more accessible
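The retrieval half of RAG is small enough to sketch in a few lines: rank stored passages by similarity to the query embedding, then place the best ones in the prompt. The three-dimensional embeddings and the corpus below are toy values. In a real system the vectors come from an embedding model and the search runs in a vector database.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, corpus, top_k=2):
    """Return the top_k passages most similar to the query embedding."""
    ranked = sorted(corpus,
                    key=lambda doc: cosine(query_vec, doc["embedding"]),
                    reverse=True)
    return ranked[:top_k]


def build_prompt(question, passages):
    """Assemble a prompt that grounds the model in retrieved context."""
    context = "\n".join(f"- {p['text']}" for p in passages)
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")


# Toy corpus with hand-written embeddings.
corpus = [
    {"text": "XDP drops packets before the kernel network stack.",
     "embedding": [0.9, 0.1, 0.0]},
    {"text": "Tokio is an async runtime for Rust.",
     "embedding": [0.0, 0.2, 0.9]},
    {"text": "Cilium uses eBPF for its data plane.",
     "embedding": [0.8, 0.3, 0.1]},
]
query_vec = [1.0, 0.2, 0.0]  # hypothetical embedding of the question
passages = retrieve(query_vec, corpus)
prompt = build_prompt("How does XDP mitigate DDoS?", passages)
print(prompt)
```

The resulting prompt is what gets sent to the language model. Everything upstream of that call is ordinary search infrastructure, which is why vector database operations matter so much for LLM workloads.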
WebAssembly at the Edge
- WASM for edge functions
- Portable, secure, fast
- Better language support
- Standard component model
Platform Engineering Evolution
- AI-assisted platform operations
- Automated optimization
- Predictive scaling
- Natural language interfaces
Rust Ecosystem Growth
- More critical infrastructure in Rust
- Better async ecosystem
- Improved compile times
- Enterprise adoption increases
Observability 2.0
- OpenTelemetry becomes standard
- eBPF-based observability mainstream
- AI-powered anomaly detection
- Automated root cause analysis
Key Lessons from 2023
Technical Lessons
- Complexity is the enemy: Simple systems win over clever ones
- Measure everything: You can’t optimize what you don’t measure
- Design for failure: Resilience > perfection
- Operational excellence matters: Great tech with poor ops fails
- Developer experience is paramount: Make the right thing easy
Organizational Lessons
- Platform teams are strategic: Investment in platforms pays dividends
- Security can’t be bolted on: Build it in from the start
- Cost optimization is continuous: Not a one-time project
- Documentation is infrastructure: Treat it as first-class
- Feedback loops are critical: Short cycles, rapid iteration
Personal Lessons
- Depth beats breadth: Master the fundamentals
- Writing clarifies thinking: Document to understand
- Community accelerates learning: Engage, share, contribute
- Balance is essential: Sustainable pace wins marathons
- Stay curious: Technology never stops evolving
Looking Forward
2023 was a year of rapid change and evolution. AI moved from experimental to production-critical. Rust continued its ascent. eBPF became mainstream. Platform engineering matured. Security shifted left.
2024 promises to be even more interesting. LLMs will reshape how we interact with systems. WebAssembly will bring new deployment models. Platform engineering will incorporate more AI. Observability will become more automated.
The constant in all this change: Good engineering fundamentals never go out of style. Focus on reliability, performance, security, and developer experience. Master the basics. Build on solid foundations.
Here’s to an exciting 2024 ahead!
Recommended Reading
If you want to go deeper on topics from this year:
AI Infrastructure
- “Building LLM Applications for Production” by Chip Huyen
- “Designing Data-Intensive Applications” by Martin Kleppmann (still relevant!)
Rust
- “Rust for Rustaceans” by Jon Gjengset
- “Zero to Production in Rust” by Luca Palmieri
eBPF
- “Learning eBPF” by Liz Rice
- “BPF Performance Tools” by Brendan Gregg
Platform Engineering
- “Team Topologies” by Matthew Skelton and Manuel Pais
- “Platform Engineering on Kubernetes” (Komodor guide)
Distributed Systems
- “Understanding Distributed Systems” by Roberto Vitillo
- “Database Internals” by Alex Petrov
Thanks for following along this year. See you in 2024!