As 2013 draws to a close, I’m reflecting on an incredible year of technical challenges and growth. FC-Redirect has evolved dramatically, and I’ve learned more about distributed systems, performance optimization, and production debugging than I thought possible. Here’s my year in review.
The Big Numbers
Let me start with the quantitative achievements:
Scale:
- Flow capacity: 1,000 → 12,000 (12x increase)
- Deployment size: Largest customer now runs 25-node clusters
- Traffic: Handling 2.9M packets/sec (up from 2.1M)
Performance:
- Overall throughput: +40% improvement
- Latency: P99 reduced from 4.2μs to 2.0μs
- CPU utilization: 35% lower at the same load
- Memory footprint: Only +15% despite 12x flow capacity
Reliability:
- Uptime: 99.999% across all deployments
- Zero data loss events
- Mean time to recovery: 3 minutes
- Zero unplanned outages
These numbers tell a story of successful scaling, but the real story is in how we achieved them.
Key Technical Achievements
1. Data Structure Revolution
The single most impactful change was replacing our O(n) flow lookup with an O(1) hash table implementation. This wasn’t just a performance optimization; it fundamentally changed what scale was achievable.
The hash table work taught me that algorithmic complexity isn’t academic theory. In production systems at scale, O(n) vs O(1) is the difference between success and failure. I’ll never take data structure choice lightly again.
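To make the point concrete, here’s a minimal sketch of what O(1) flow lookup looks like in C. The structure names, key fields, and hash function are illustrative stand-ins, not the actual FC-Redirect code:

```c
/* Minimal sketch of an O(1) flow lookup table (illustrative only). */
#include <stdint.h>
#include <string.h>

#define FLOW_BUCKETS 16384            /* power of two for cheap masking */

struct flow_key   { uint64_t src_wwpn, dst_wwpn; };
struct flow_entry {
    struct flow_key    key;
    void              *state;         /* per-flow redirect state */
    struct flow_entry *next;          /* chain for collisions */
};

struct flow_table { struct flow_entry *buckets[FLOW_BUCKETS]; };

static uint64_t flow_hash(const struct flow_key *k)
{
    /* simple 64-bit mix; see the hash-function lesson later in this post */
    uint64_t h = k->src_wwpn ^ (k->dst_wwpn * 0x9E3779B97F4A7C15ULL);
    h ^= h >> 33;
    h *= 0xFF51AFD7ED558CCDULL;
    h ^= h >> 33;
    return h;
}

/* O(1) expected lookup: hash, mask, walk a (short) chain.
 * The old path walked the entire flow list on every packet. */
static struct flow_entry *flow_lookup(struct flow_table *t,
                                      const struct flow_key *k)
{
    struct flow_entry *e = t->buckets[flow_hash(k) & (FLOW_BUCKETS - 1)];
    for (; e; e = e->next)
        if (memcmp(&e->key, k, sizeof(*k)) == 0)
            return e;
    return NULL;
}
```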
2. Asynchronous Architecture
Decoupling fast-path operations from slow-path work through asynchronous processing was transformative. By batching and deferring non-critical operations, we:
- Reduced network traffic by 80%
- Improved fast-path latency by 30%
- Enabled smooth handling of load spikes
This architectural pattern is now fundamental to how I think about system design. Whenever I see mixed-latency operations, I immediately consider async decoupling.
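Here’s roughly what that decoupling looks like in miniature. This is an illustrative sketch, not the production code: a real fast path would use a lock-free per-core ring rather than a mutex, and the update fields are hypothetical.

```c
/* Sketch of deferring slow-path work off the fast path (illustrative).
 * The fast path only enqueues; a background worker drains in batches,
 * collapsing many per-packet updates into far fewer control messages. */
#include <pthread.h>
#include <stdint.h>

#define BATCH_MAX 256

struct deferred_update { uint64_t flow_id; uint64_t bytes; };

static struct deferred_update queue[BATCH_MAX];
static int             queue_len;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

/* Fast path: O(1), no network I/O, just record the work. */
void defer_update(uint64_t flow_id, uint64_t bytes)
{
    pthread_mutex_lock(&queue_lock);
    if (queue_len < BATCH_MAX) {
        queue[queue_len].flow_id = flow_id;
        queue[queue_len].bytes   = bytes;
        queue_len++;
    }   /* else: apply backpressure; never drop silently (see below) */
    pthread_mutex_unlock(&queue_lock);
}

/* Slow path: called periodically, sends one batched message. */
void flush_updates(void (*send_batch)(struct deferred_update *, int))
{
    pthread_mutex_lock(&queue_lock);
    if (queue_len > 0) {
        send_batch(queue, queue_len);
        queue_len = 0;
    }
    pthread_mutex_unlock(&queue_lock);
}
```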
3. Platform Migration to MDS 9250i
Migrating FC-Redirect to the new MDS 9250i platform forced me to rethink many assumptions. Moving from ASIC-based processing to x86 required:
- SIMD optimizations using AVX2
- Cache-conscious data layout
- Multi-core parallelism with flow affinity
- Power management and dynamic frequency scaling
This migration taught me that porting code isn’t just about making it compile. It’s about rearchitecting for the target platform’s strengths. The x86 version is now actually faster than the ASIC version for many workloads.
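As a small example of what cache-conscious layout and flow affinity mean in practice, here is a sketch with hypothetical field names: the idea is simply to keep the fields touched on every packet in one cache line and to keep a given flow on one core.

```c
/* Illustrative cache-conscious layout: the fields the fast path touches
 * on every packet live in a single 64-byte cache line; rarely used
 * fields (stats, config) are kept out of it.  Field names are
 * hypothetical, not the real FC-Redirect structures. */
#include <stdint.h>

struct flow_hot {
    uint64_t src_wwpn;
    uint64_t dst_wwpn;
    uint32_t redirect_target;
    uint32_t flags;
    uint64_t last_seen_tsc;
} __attribute__((aligned(64)));       /* alignment pads to a full line */

struct flow_cold {
    uint64_t pkt_count, byte_count;   /* stats updated off the fast path */
    uint64_t created_at;
    char     description[64];
};

/* Flow affinity: the same flow always maps to the same worker core,
 * so its hot line stays warm in that core's cache. */
static inline unsigned flow_core(uint64_t src_wwpn, uint64_t dst_wwpn,
                                 unsigned n_cores)
{
    return (unsigned)((src_wwpn ^ (dst_wwpn * 0x9E3779B97F4A7C15ULL))
                      % n_cores);
}
```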
4. High Availability at Scale
Achieving 99.999% uptime required more than redundancy. It required:
- Quorum-based replication
- Fast failure detection (sub-second)
- Automated recovery procedures
- Graceful degradation under overload
- Rolling upgrades with zero downtime
The HA work taught me that reliability is a system property, not a component property. Every layer must be designed for failure.
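Stripped to its essentials, the failure-detection and quorum logic looks something like the sketch below. The 500 ms timeout and structure names are illustrative, not the production values.

```c
/* Minimal sketch of two HA building blocks: sub-second heartbeat-based
 * failure detection and a simple majority quorum check (illustrative). */
#include <stdbool.h>
#include <stdint.h>

#define HEARTBEAT_TIMEOUT_MS 500      /* sub-second failure detection */

struct peer {
    uint64_t last_heartbeat_ms;       /* updated when a heartbeat arrives */
    bool     alive;
};

/* Called periodically with the current time; marks silent peers dead. */
void detect_failures(struct peer *peers, int n, uint64_t now_ms)
{
    for (int i = 0; i < n; i++)
        peers[i].alive = (now_ms - peers[i].last_heartbeat_ms)
                         <= HEARTBEAT_TIMEOUT_MS;
}

/* A write commits only when a majority of replicas acknowledge it. */
bool have_quorum(int acks, int cluster_size)
{
    return acks >= cluster_size / 2 + 1;
}
```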
Lessons from Production
Some of my best learning came from debugging production issues:
The Race Condition That Wasn’t Atomic
Debugging the intermittent corruption issue taught me that atomicity doesn’t compose. Just because individual operations are atomic doesn’t mean sequences are. Read-modify-write sequences require atomic RMW operations, not separate atomic reads and writes.
This was humbling because we’d been careful about atomics, but we’d made a subtle error. It reminded me that concurrent programming is genuinely hard, and you can’t be too careful.
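Here is a small C11 example of the trap, using a hypothetical reference counter rather than the actual FC-Redirect code:

```c
/* Lesson: a separate atomic load followed by an atomic store is NOT an
 * atomic read-modify-write. */
#include <stdatomic.h>
#include <stdint.h>

atomic_uint_fast64_t flow_refcount;

/* BROKEN: both operations are individually atomic, but another thread
 * can run between the load and the store, losing an update. */
void ref_get_broken(void)
{
    uint_fast64_t v = atomic_load(&flow_refcount);
    atomic_store(&flow_refcount, v + 1);
}

/* CORRECT: a single atomic read-modify-write operation. */
void ref_get(void)
{
    atomic_fetch_add(&flow_refcount, 1);
}

/* For non-trivial updates, a compare-and-swap loop gives the same
 * guarantee: the update only lands if nobody raced in between. */
void set_if_greater(atomic_uint_fast64_t *target, uint_fast64_t new_val)
{
    uint_fast64_t cur = atomic_load(target);
    while (cur < new_val &&
           !atomic_compare_exchange_weak(target, &cur, new_val))
        ;   /* cur is reloaded by the failed CAS; retry */
}
```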
The Hash Function That Failed at Scale
The performance degradation issue revealed that hash functions must be tested with real-world data patterns. Our hash function worked fine with random data but had terrible distribution for sequential WWPNs, which are common in real deployments.
This taught me that synthetic benchmarks aren’t sufficient. You must test with actual customer workloads and data patterns.
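A toy program makes the failure mode obvious. Neither hash below is the one we actually ship; the point is only that a hash which ignores the changing low bits collapses sequential WWPNs into one bucket, while a proper mixing function spreads them.

```c
/* Toy demonstration, not the actual FC-Redirect hash. */
#include <stdint.h>
#include <stdio.h>

#define BUCKETS 1024

/* Weak: sequential WWPNs share the same high/vendor bits, so they all
 * collide in one bucket. */
static unsigned weak_hash(uint64_t wwpn) { return (wwpn >> 32) % BUCKETS; }

/* Better: a splitmix64-style finalizer that mixes every bit. */
static unsigned mix_hash(uint64_t wwpn)
{
    wwpn ^= wwpn >> 30; wwpn *= 0xBF58476D1CE4E5B9ULL;
    wwpn ^= wwpn >> 27; wwpn *= 0x94D049BB133111EBULL;
    wwpn ^= wwpn >> 31;
    return wwpn % BUCKETS;
}

int main(void)
{
    unsigned weak[BUCKETS] = {0}, mixed[BUCKETS] = {0};
    uint64_t base = 0x2000000025B50000ULL;     /* hypothetical WWPN range */

    for (uint64_t i = 0; i < 10000; i++) {     /* sequential WWPNs */
        weak[weak_hash(base + i)]++;
        mixed[mix_hash(base + i)]++;
    }

    unsigned weak_max = 0, mixed_max = 0;
    for (int b = 0; b < BUCKETS; b++) {
        if (weak[b]  > weak_max)  weak_max  = weak[b];
        if (mixed[b] > mixed_max) mixed_max = mixed[b];
    }
    printf("longest chain: weak=%u  mixed=%u\n", weak_max, mixed_max);
    return 0;
}
```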
The Silent Failure That Caused Data Loss
The retry queue that silently dropped updates taught me to never fail silently. If you must fail, fail loudly: log it, alert on it, increment a metric. Better yet, implement backpressure to prevent failure.
Silent failures are insidious because they appear as mysterious downstream issues, not clear failures at the source.
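In sketch form, the fix looks like this. The queue, the log message, and the metric are all illustrative; the point is that a full queue is reported and pushed back on, never swallowed.

```c
/* "Never fail silently" applied to a bounded retry queue (illustrative):
 * when the queue is full, log it, count it, and push back on the caller
 * instead of discarding the update. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define RETRY_QUEUE_MAX 1024

struct retry_queue {
    void    *items[RETRY_QUEUE_MAX];
    int      len;
    uint64_t rejected;                /* exported as a metric, alerted on */
};

/* Returns false so the caller can slow down (backpressure) instead of
 * assuming the update was accepted. */
bool retry_enqueue(struct retry_queue *q, void *update)
{
    if (q->len == RETRY_QUEUE_MAX) {
        q->rejected++;                                     /* metric */
        fprintf(stderr, "retry queue full, rejecting update (rejected=%llu)\n",
                (unsigned long long)q->rejected);          /* loud log */
        return false;                                      /* backpressure */
    }
    q->items[q->len++] = update;
    return true;
}
```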
Technologies and Trends
2013 was an exciting year in the broader technology landscape:
Docker’s Emergence
Docker’s release in March has huge implications for storage networking. The containerization trend will create demand for:
- Dynamic storage provisioning
- Storage mobility across hosts
- Performance isolation in multi-tenant environments
- API-driven infrastructure
I’ve started thinking about how FC-Redirect can support containerized workloads. The convergence of stateless containers and stateful storage is a challenge we’ll need to solve.
Software-Defined Everything
SDN principles are spreading beyond networking to storage, security, and infrastructure generally. The separation of control plane and data plane, centralized control, and programmability are powerful patterns.
I’ve been applying SDN thinking to FC-Redirect, treating it as an SDN application for storage networks. This has led to better APIs, more dynamic behavior, and easier automation.
The Cloud Continues Growing
AWS and cloud providers continue maturing. As more workloads move to the cloud, traditional storage networking must evolve. The challenge is bridging on-premises storage infrastructure with cloud resources.
Personal Growth
Beyond technical skills, I’ve grown in several ways:
Systems Thinking
I’ve become better at seeing systems as wholes, not just collections of components. Understanding emergent behavior, feedback loops, and system-level properties has made me more effective at designing and debugging complex systems.
Communication
I’ve gotten better at explaining technical concepts to non-technical stakeholders. Whether writing documentation, presenting to customers, or discussing with product managers, clear communication is as important as technical skills.
Debugging Discipline
I’ve developed a systematic debugging approach: gather data, form hypotheses, test them, iterate. No more random code changes hoping to fix issues. Methodical investigation is faster and more effective.
Performance Methodology
My performance optimization workflow (measure, understand, optimize, validate) has become second nature. Profile first, optimize the critical path, validate improvements. This discipline prevents wasted effort on unimportant optimizations.
What Didn’t Go Well
Not everything was smooth:
Over-Engineering
Early in the year, I spent two weeks building a complex adaptive load balancer that we ultimately didn’t need. Simple round-robin would have sufficed. I learned to start simple and add complexity only when needed.
Insufficient Testing
Several bugs made it to production that better testing would have caught. I’ve since improved our test coverage and added stress tests for concurrency issues.
Documentation Debt
I prioritized code over documentation, leading to knowledge silos and onboarding difficulties. Going forward, documentation gets written alongside code, not afterward.
Looking Ahead to 2014
Several exciting projects are on the horizon:
Continued Scaling
We have customers who want to scale beyond 12K flows. I’m exploring approaches to reach 50K or even 100K flows:
- Hierarchical flow tables
- Flow aggregation and summarization
- More aggressive caching
- Distributed flow processing
Cloud Integration
Building bridges between on-premises FC infrastructure and cloud storage:
- Hybrid storage architectures
- Cloud-based replication
- Bursting to cloud for peak loads
Container Support
Making storage networking work seamlessly with Docker and containers:
- Dynamic volume provisioning
- Container-aware QoS
- Storage mobility for container migration
Platform Expansion
Bringing FC-Redirect to more platforms:
- N7000 optimization
- MDS 9700 support (when it ships)
- Virtual appliance version
Gratitude
This year’s achievements weren’t solo efforts. Thanks to:
- My team at Cisco for collaboration and support
- Customers who pushed us to scale and reported issues
- The broader systems engineering community for sharing knowledge
- My family for supporting late nights debugging production issues
Reflections
Looking back, 2013 was transformative professionally. I started the year knowing distributed systems theoretically. I’m ending it having built, scaled, debugged, and optimized a production distributed system serving mission-critical workloads.
The gap between theory and practice is vast. Textbooks teach algorithms and protocols, but they don’t teach:
- How to debug a Heisenbug that only appears at 3 AM in production
- How to optimize for real hardware with real performance characteristics
- How to balance engineering perfection with shipping deadlines
- How to design for failures you haven’t imagined yet
These lessons come only from building real systems.
The most important lesson: building reliable, high-performance distributed systems is hard. Really hard. But it’s also incredibly rewarding. Every challenge overcome, every bottleneck eliminated, every customer issue resolved makes the system better and makes me a better engineer.
Goals for 2014
As I look ahead:
- Scale FC-Redirect to 50K flows while maintaining performance
- Achieve 99.999% uptime across all deployments (again)
- Support containerized workloads with dynamic storage provisioning
- Expand to new platforms (N7000, MDS 9700)
- Build better monitoring and observability
- Reduce technical debt through refactoring and documentation
- Mentor junior engineers and share knowledge
- Continue learning about distributed systems, performance, and reliability
Conclusion
2013 was a year of tremendous growth, both for FC-Redirect and for me personally. We scaled 12x, improved performance by 40%, achieved five-nines uptime, and learned countless lessons along the way.
But we’re just getting started. The challenges ahead are even more exciting: larger scales, new platforms, containerization, cloud integration. Storage networking is evolving rapidly, and I’m privileged to be working on the technologies that will define its future.
Here’s to 2014 and the challenges it will bring. I can’t wait to see what we’ll build.
Thanks for reading, and happy holidays!