Distributed AI Training: Scaling Model Development
January 21, 2026
Practical patterns for distributed training of large models, from data parallelism to pipeline parallelism and efficient collective communication.
January 19, 2026
Achieving sub-millisecond AI inference latency through model optimization, batching strategies, and hardware acceleration techniques.
January 15, 2026
Strategies for deploying AI models to edge devices, from mobile phones to IoT sensors, with WebAssembly and optimized runtimes.
January 13, 2026
Exploring the mature Rust ecosystem in 2026, from web services to distributed systems, with practical patterns for production deployments.
January 9, 2026
Strategies for deploying reasoning-focused AI models at scale, balancing compute costs, latency requirements, and quality objectives.
July 14, 2024
Deep dive into the architectural decisions and trade-offs that enabled reducing system latency by 5x in a production security platform.
June 23, 2024
Architectural patterns for deploying WebAssembly at the edge, balancing security isolation, cold start performance, and operational complexity.
May 19, 2024
Building machine learning systems for security analytics that can detect threats in real time across massive data streams
January 14, 2024
Practical guide to deploying and operating Large Language Models in production environments, including infrastructure, optimization, and reliability patterns
October 8, 2023
Architectural patterns for designing robust control planes that manage distributed infrastructure at scale
September 11, 2023
Deep dive into optimizing data path performance for high-throughput, low-latency systems with practical techniques and measurements
July 19, 2023
Designing and operating highly available systems across multiple cloud providers with practical patterns and real-world trade-offs
June 14, 2023
Deploying eBPF programs for production observability, security monitoring, and network optimization at scale
May 20, 2023
A practical exploration of adopting Rust for high-performance systems programming, including real-world migration patterns and lessons learned
April 22, 2023
A comprehensive guide to vector databases, from fundamentals to production deployment for AI-powered applications
March 18, 2023
Deep dive into designing and implementing bot detection systems using behavioral analysis, fingerprinting, and machine learning
February 12, 2023
Practical insights on deploying ML models for real-time threat detection, including feature engineering, model selection, and performance optimization
December 28, 2022
A year-end reflection on architectural lessons learned from operating large-scale distributed systems, managing 60+ microservices, and optimizing systems processing hundreds of millions of events.
November 18, 2022
Architectural patterns and design decisions for building systems that process hundreds of millions of events daily, covering scalability, reliability, and performance optimization.
July 14, 2022
Architectural approaches to implementing distributed tracing at scale, covering design decisions, trade-offs, and patterns for observability in microservices architectures.
April 14, 2022
A detailed walkthrough of systematic performance optimization that achieved 8x latency improvement through measurement, analysis, and targeted fixes.
January 20, 2022
Advanced patterns and best practices for building reliable, high-throughput event streaming platforms based on real-world experience at massive scale.
September 16, 2021
Architectural patterns and implementation strategies for deploying applications across multiple regions while maintaining consistency, performance, and availability.
August 19, 2021
Exploring eBPF technology for deep system observability, performance monitoring, and network analysis without kernel modifications or application changes.
July 14, 2021
Exploring edge computing architectures, CDN integration, and strategies for distributing computation to reduce latency and improve user experience.
April 18, 2021
A detailed walkthrough of performance optimization techniques that achieved an 8x latency reduction in a high-scale distributed system.
April 14, 2016
Why we chose Go for performance-critical key management services and lessons learned from rewriting Java services in Go
October 18, 2015
Why Go has become my language of choice for building cloud-native security services, with practical examples of concurrency patterns and performance characteristics.
August 14, 2014
Practical code optimization techniques that delivered real performance improvements in production systems
July 18, 2014
Leveraging Spark for analyzing massive volumes of flow data and gaining insights into storage network behavior
May 20, 2014
Early exploration of the upcoming MDS 9700 platform and architectural changes needed to leverage its capabilities
February 12, 2014
Practical guide to implementing and using lock-free data structures in FC-Redirect, including ring buffers, queues, and hash tables
January 16, 2014
Deep dive into platform-specific optimizations for FC-Redirect on the N7000, leveraging its unique architecture for 30% better performance
December 20, 2013
Reflecting on a year of scaling FC-Redirect from 1K to 12K flows, achieving 20% performance improvements, and lessons learned along the way
October 8, 2013
A systematic approach to performance optimization based on lessons from scaling FC-Redirect, including tools, techniques, and mental models
June 20, 2013
How implementing asynchronous processing patterns improved FC-Redirect throughput by 40% while maintaining correctness guarantees
May 15, 2013
Deep dive into the challenges and solutions for migrating FC-Redirect to the MDS 9250i platform while maintaining backward compatibility
April 22, 2013
A war story about debugging an intermittent flow corruption issue that only appeared in production under specific load patterns
March 18, 2013
How choosing the right data structures improved FC-Redirect performance by 10x and reduced memory footprint
February 20, 2013
How implementing intelligent message batching reduced network overhead by 80% and improved FC-Redirect performance
October 16, 2012
Systematic approaches to storage capacity planning that prevent both over-provisioning waste and under-provisioning crises
July 25, 2012
Understanding different LUN provisioning approaches and their impact on capacity management and performance
June 19, 2012
Understanding NoSQL database storage architectures and how they differ from traditional relational databases
May 22, 2012
Understanding the unique storage requirements of Hadoop and how they differ from traditional enterprise storage
March 19, 2012
Practical best practices for designing and optimizing storage infrastructure for VMware environments
February 14, 2012
Deep dive into flash storage technology, architecture, and how SSDs are changing storage design patterns
December 20, 2011
Methodologies and techniques for diagnosing and resolving complex storage network issues
September 27, 2011
Deep dive into network protocol optimization techniques for maximizing storage network performance
August 23, 2011
Exploring the principles behind distributed storage systems like GFS and their influence on modern storage architecture
May 17, 2011
Advanced techniques for optimizing SAN performance and troubleshooting common bottlenecks in storage networks
March 18, 2011
Why iSCSI has become the practical choice for mid-market storage networking and when it makes sense over Fibre Channel