Performance tuning is one of the most challenging and rewarding aspects of storage networking. At Cisco, I spend a lot of time analyzing performance issues and optimizing FC-Redirect for various workloads. Let me share some techniques that go beyond the obvious.

Understanding Storage Performance

Before optimizing anything, you need to understand what “performance” means for storage. It’s not just about bandwidth. The key metrics are:

IOPS: Input/Output operations per second—how many I/O requests can be completed.

Throughput: The actual data transfer rate, measured in MB/s or GB/s.

Latency: The time from issuing an I/O request to receiving the response.

Queue Depth: How many outstanding I/O operations are in flight simultaneously.

Different workloads care about different metrics. Database OLTP workloads are IOPS-intensive with small block sizes. Streaming video is throughput-intensive with large sequential I/O. Understanding your workload is the first step in optimization.
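These metrics are not independent. Little's Law ties them together: the average number of outstanding I/Os equals IOPS multiplied by average latency. A minimal sketch of the relationship, with illustrative numbers rather than measurements from any real array:

```python
# Little's Law for storage: outstanding I/Os = IOPS x average latency.
# All numbers below are illustrative, not from a real array.

def iops(queue_depth: int, latency_s: float) -> float:
    """Achievable IOPS at a given concurrency and per-I/O latency."""
    return queue_depth / latency_s

def throughput_mb_s(io_rate: float, block_bytes: int) -> float:
    """Throughput implied by an I/O rate and block size."""
    return io_rate * block_bytes / 1e6

# OLTP-style: 8 KB blocks, 5 ms latency, queue depth 32
oltp = iops(32, 0.005)
print(oltp, throughput_mb_s(oltp, 8192))        # ~6400 IOPS, ~52 MB/s

# Streaming: 1 MB blocks, 20 ms latency, queue depth 8
seq = iops(8, 0.020)
print(seq, throughput_mb_s(seq, 1_048_576))     # ~400 IOPS, ~419 MB/s
```

Notice how the same relationship yields very different profiles: the OLTP case is IOPS-heavy at modest throughput, while the streaming case gets high throughput from far fewer, larger I/Os.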

The I/O Path

To optimize performance, you need to understand the complete I/O path from application to storage:

  1. Application issues I/O request
  2. OS file system layer
  3. Volume manager
  4. Device driver (HBA)
  5. HBA firmware
  6. FC fabric
  7. Storage array port
  8. Array cache
  9. Storage controller
  10. Disk subsystem

Each layer can become a bottleneck. Effective troubleshooting requires instrumentation at multiple layers to identify where time is being spent.
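A toy latency budget makes the point. The per-layer times below are made up for illustration, but the shape is typical for a cache-miss read: host and fabric contribute microseconds while the disk contributes milliseconds.

```python
# Illustrative per-layer latency budget (ms) for one cache-miss read.
# These numbers are invented for illustration; measure your own stack.
layers_ms = {
    "host: file system + volume manager + driver": 0.05,
    "fabric: switching + propagation, round trip": 0.01,
    "array: front-end port + controller":          0.50,
    "backend: disk seek + rotation + transfer":    6.00,
}
total_ms = sum(layers_ms.values())
print(f"total: {total_ms:.2f} ms")
for name, ms in layers_ms.items():
    print(f"{name}: {ms / total_ms:.1%}")
```

With a budget like this, shaving fabric time accomplishes little; the backend dominates, which is exactly what instrumentation at each layer is meant to reveal.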

HBA Tuning

The Host Bus Adapter is often overlooked but can significantly impact performance. Key parameters to tune:

Queue Depth: The number of outstanding commands the HBA will issue. Modern HBAs support queue depths of 128 or more. For workloads with lots of parallelism, increasing queue depth can improve throughput significantly.

However, deeper queues increase latency for individual operations. For latency-sensitive workloads, shallower queues may be better. You need to match queue depth to your workload characteristics.
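A crude saturation model shows the trade-off. Below, IOPS scales with queue depth until a hypothetical device tops out at max_iops; past that point, extra depth buys no throughput and only inflates latency (Little's Law again, rearranged). The service time and saturation point are assumptions for illustration:

```python
# Crude saturation model: IOPS grows with queue depth until the device
# tops out; beyond that, added depth just becomes queuing delay.

def achieved(qd: int, service_latency_s: float, max_iops: float):
    io_rate = min(qd / service_latency_s, max_iops)
    avg_latency_ms = qd / io_rate * 1000    # Little's Law, rearranged
    return io_rate, avg_latency_ms

# Hypothetical device: 1 ms service time, saturates at 50,000 IOPS
for qd in (1, 8, 32, 128):
    io_rate, lat = achieved(qd, 0.001, 50_000)
    print(f"qd={qd:4d}  iops={io_rate:8.0f}  latency={lat:.2f} ms")
```

In this model the device scales linearly at 1 ms until saturation; at queue depth 128 it still delivers 50,000 IOPS, but per-I/O latency has more than doubled.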

Interrupt Coalescing: HBAs can batch interrupts to reduce CPU overhead. This improves efficiency but adds latency. Tuning the coalescing parameters requires balancing throughput and latency.

Link Speed and Flow Control: Ensure the HBA is negotiating the correct link speed (8 Gbps for modern FC). Verify flow control is working correctly—imbalances can cause performance degradation.

Fabric Optimization

The FC fabric itself can become a bottleneck. Here are areas to examine:

Oversubscription: If you have many hosts sharing uplinks to storage, you create oversubscription. Consider a 24-port switch with 8 Gbps ports: 192 Gbps of aggregate port bandwidth. If two of those ports serve as uplinks (16 Gbps), the remaining 22 host-facing ports represent 176 Gbps, an 11:1 oversubscription ratio. Under load, this creates congestion.

The solution is to add uplinks, use port channels, or implement intelligent traffic distribution.
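The ratio itself is trivial to compute, which makes it easy to sanity-check a design before deployment. A small helper (the port counts and speeds in the example are hypothetical):

```python
def oversubscription(host_ports: int, host_gbps: float,
                     uplink_ports: int, uplink_gbps: float) -> float:
    """Host-facing bandwidth divided by uplink bandwidth."""
    return (host_ports * host_gbps) / (uplink_ports * uplink_gbps)

# e.g. 16 hosts at 8 Gbps behind two 8 Gbps uplinks -> 8:1
print(oversubscription(16, 8, 2, 8))   # 8.0
```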

Buffer Credits: FC uses a credit-based flow control mechanism. Each link has a number of buffer credits that limits how many frames can be in flight. For long-distance links, you need more credits to keep the pipe full.

Insufficient credits limit throughput, especially for large sequential I/O. Most switches allow tuning buffer credits per-port.
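A standard sizing rule: provision enough credits to cover all the frames in flight during one round trip. The sketch below assumes roughly 5 microseconds of propagation per kilometer of fiber and full-size FC frames (2,148 bytes on the wire including headers and delimiters); treat the result as a floor, not an exact answer:

```python
import math

# Rule-of-thumb BB_credit sizing for a long-distance FC link: enough
# credits to cover all frames in flight during one round trip.

def bb_credits_needed(distance_km: float, link_gbps: float,
                      frame_bytes: int = 2148) -> int:
    rtt_s = 2 * distance_km * 5e-6       # ~5 us/km propagation, each way
    data_bytes_s = link_gbps * 1e9 / 10  # 8b/10b coding: 10 bits per byte
    return math.ceil(rtt_s * data_bytes_s / frame_bytes)

print(bb_credits_needed(50, 8))   # 50 km at 8 Gbps: on the order of 190
```

Smaller frames need proportionally more credits for the same distance, which is why mixed workloads over long links deserve extra headroom.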

ISL Congestion: Inter-switch links (ISLs) can become congested, especially in larger fabrics. Monitor ISL utilization and add links as needed. Port channels can aggregate multiple links for more bandwidth.

BB_Credits: Buffer-to-buffer credits are fundamental enough to FC flow control that we allocate them carefully on MDS switches, sizing per-port credit pools for different link types and distances to keep performance optimal.

Storage Array Optimization

The storage array is often the ultimate bottleneck. Key areas to optimize:

Cache Sizing: Array cache dramatically improves performance by absorbing bursts and coalescing small writes. Ensure your array has adequate cache for your workload. For write-intensive workloads, more cache is better.
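The effect of cache is easy to quantify: average read latency is a weighted blend of cache-hit latency and backend-miss latency. The latencies below are assumed for illustration, not measured:

```python
# Average read latency as a blend of cache hits and disk misses.
# cache_ms and disk_ms are illustrative assumptions.

def avg_latency_ms(hit_rate: float, cache_ms: float = 0.2,
                   disk_ms: float = 6.0) -> float:
    return hit_rate * cache_ms + (1 - hit_rate) * disk_ms

for h in (0.5, 0.9, 0.99):
    print(f"hit rate {h:.0%}: {avg_latency_ms(h):.2f} ms")
```

With these assumptions, going from a 50% to a 99% hit rate cuts average latency from about 3.1 ms to about 0.26 ms, which is why cache hit rate is one of the first array metrics worth checking.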

RAID Configuration: RAID level significantly impacts performance. RAID 5 has poor write performance due to the read-modify-write penalty. RAID 10 provides better performance but less usable capacity. RAID 6 provides better protection than RAID 5 but worse write performance.

Match RAID level to workload characteristics. OLTP databases often benefit from RAID 10, while sequential workloads may be fine with RAID 5 or RAID 6.
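The write penalty folds neatly into a capacity estimate. Each host write costs 2 backend I/Os on RAID 10, 4 on RAID 5 (read data, read parity, write data, write parity), and 6 on RAID 6. A sketch with an assumed per-disk IOPS figure:

```python
# Host-visible IOPS from a disk group, accounting for RAID write penalty.
WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}

def effective_iops(disks: int, per_disk_iops: int,
                   write_fraction: float, raid: str) -> float:
    backend = disks * per_disk_iops
    penalty = WRITE_PENALTY[raid]
    # each read costs 1 backend I/O, each write costs `penalty`
    return backend / ((1 - write_fraction) + write_fraction * penalty)

# 20 disks at ~180 IOPS each (a 15K rpm assumption), 70% writes
for raid in ("raid10", "raid5", "raid6"):
    print(raid, round(effective_iops(20, 180, 0.7, raid)))
```

For this write-heavy workload, the same 20 disks deliver roughly 2,100 host IOPS as RAID 10 but only about 800 as RAID 6: the read-modify-write penalty made visible.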

Disk Type and Spindle Count: More spindles mean more IOPS capability, even if you don’t need the capacity. A LUN striped across 20 disks will outperform one striped across 5 disks, assuming the array has sufficient backend bandwidth.

SSD is increasingly common for performance-critical workloads. A single SSD can deliver 10,000+ IOPS, equivalent to 100+ traditional disks.

Block Size Alignment: Misaligned block sizes cause extra I/O operations. Ensure file system block sizes align with array stripe sizes. For VMware, ensure guest OS partitions start on proper boundaries.
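Checking alignment is a one-line modulo test. The classic offender is a legacy MBR partition starting at sector 63, an offset no power-of-two stripe size divides evenly:

```python
# A partition is aligned when its byte offset is a multiple of the
# array stripe size; otherwise some I/Os straddle two stripes and
# turn one backend operation into two.

def is_aligned(offset_bytes: int, stripe_bytes: int) -> bool:
    return offset_bytes % stripe_bytes == 0

print(is_aligned(63 * 512, 64 * 1024))     # False: legacy MBR default
print(is_aligned(2048 * 512, 64 * 1024))   # True: 1 MB boundary
```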

Multipathing Configuration

Multipathing provides both redundancy and load balancing. Common policies include:

Round-Robin: Distributes I/O across all paths. This provides the best throughput for many workloads.

Least Queue Depth: Routes I/O to the path with the fewest outstanding commands. This can help balance load dynamically.

Least Blocks: Routes based on the number of blocks transferred. Useful for mixed workloads.

The optimal policy depends on your workload and array capabilities. Some arrays perform better with certain policies. Always test.
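To make the policies concrete, here is a minimal sketch of least-queue-depth selection; the path names are hypothetical, and a real multipath driver would of course track completions and path health as well:

```python
# Least-queue-depth path selection: each new I/O goes to the path with
# the fewest outstanding commands. Path names are illustrative.

class Path:
    def __init__(self, name: str):
        self.name = name
        self.outstanding = 0

def pick_path(paths):
    return min(paths, key=lambda p: p.outstanding)

paths = [Path("hba0->ctrlA"), Path("hba0->ctrlB"), Path("hba1->ctrlA")]
for _ in range(6):
    pick_path(paths).outstanding += 1   # a completion would decrement this

print([(p.name, p.outstanding) for p in paths])  # load spreads evenly
```

With uniform paths this behaves like round-robin; its advantage appears when one path slows down and its queue stops draining, at which point new I/O naturally flows to the healthier paths.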

Identifying Bottlenecks

When performance is poor, systematic troubleshooting is essential:

Start at the Application: Is the application itself the bottleneck? High CPU usage or inefficient code can make storage appear slow.

Check Queue Depths: Low queue depths indicate the application isn’t generating enough parallel I/O to saturate the storage.

Monitor HBA Utilization: Look at frame rates, bandwidth utilization, and error counters. High error rates indicate link quality issues.

Examine Fabric Statistics: Check for congestion, discards, or timeout errors. These indicate fabric-level problems.

Analyze Array Metrics: Look at cache hit rates, controller CPU, backend disk utilization. Low cache hit rates or saturated backend buses indicate array bottlenecks.

Common Performance Pitfalls

Here are issues I see frequently:

Misconfigured RAID: Using RAID 5 for write-intensive databases is a common mistake that cripples performance.

Insufficient Spindle Count: Buying a large array with too few disks. You have the capacity but not the IOPS.

Single Path: Not using multipathing means you have no redundancy and, in a typical dual-fabric design, you’re leaving half your available bandwidth unused.

Default Settings: Never tuning any parameters. Default settings are rarely optimal for production workloads.

Block Misalignment: Especially common with virtualization. Misaligned partitions can reduce performance by 30% or more.

Advanced Optimization Techniques

For extreme performance requirements, consider:

Read/Write Separation: Put write-intensive logs on different LUNs than read-intensive data. This allows optimizing each separately.

Tiering: Place hot data on SSD, warm data on fast disks, cold data on capacity disks. Storage virtualization can automate this.
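At its core, a tiering engine is a classification policy over access statistics. The thresholds below are invented for illustration; real implementations tune them per array and typically operate on sub-LUN extents:

```python
# Toy tiering policy: place an extent by its access frequency over the
# last measurement window. Thresholds are illustrative only.

def pick_tier(accesses_per_hour: float) -> str:
    if accesses_per_hour >= 1000:
        return "ssd"
    if accesses_per_hour >= 50:
        return "fast_disk"        # e.g. 15K rpm drives
    return "capacity_disk"        # e.g. 7.2K rpm drives

extents = {"ext0": 5000, "ext1": 200, "ext2": 3}
print({name: pick_tier(rate) for name, rate in extents.items()})
```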

Caching Appliances: Products like EMC FAST Cache extend array cache with SSD. This can dramatically improve performance for read-heavy workloads.

Application-Level Optimization: Sometimes the biggest gains come from optimizing the application itself—better indexing, query optimization, or algorithm improvements.

Measurement and Monitoring

You can’t optimize what you don’t measure. Essential monitoring includes:

Baseline Performance: Know what normal looks like. Without baselines, you can’t identify degradation.

Real-Time Monitoring: Track key metrics continuously. Set thresholds and alerts for anomalies.

End-to-End Visibility: Monitor the entire I/O path, not just individual components.

Correlation: When performance degrades, correlate across metrics to identify the root cause.
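Even the baseline comparison can start simple: flag a metric when it drifts several standard deviations from its history. A minimal stdlib-only sketch with made-up latency samples:

```python
import statistics

# Flag a metric sample that falls well outside its historical range.

def is_anomalous(history, current, n_sigmas: float = 3.0) -> bool:
    mean = statistics.mean(history)
    sigma = statistics.stdev(history)
    return abs(current - mean) > n_sigmas * sigma

baseline_latency_ms = [5.1, 4.9, 5.0, 5.2, 4.8, 5.0, 5.1]
print(is_anomalous(baseline_latency_ms, 5.3))    # False: normal jitter
print(is_anomalous(baseline_latency_ms, 12.0))   # True: investigate
```

Production monitoring needs more nuance (time-of-day seasonality, rolling windows), but even this crude test catches the gross degradations that baselines exist to expose.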

We’ve built extensive monitoring capabilities into MDS switches to help with this. The right instrumentation makes troubleshooting dramatically easier.

The 80/20 Rule

In my experience, 80% of performance improvements come from 20% of optimizations:

  • Proper RAID configuration
  • Adequate spindle count
  • Multipathing with appropriate policy
  • Sufficient HBA queue depth
  • Eliminating misalignment

Get these basics right before pursuing exotic optimizations.

Conclusion

SAN performance optimization is part science, part art. You need to understand the technology deeply, measure carefully, and think systematically about the I/O path.

Working on FC-Redirect has taught me that performance is often about eliminating bottlenecks rather than making fast things faster. Finding and fixing that one slow component can transform overall system performance.

The key is systematic troubleshooting, comprehensive monitoring, and deep understanding of every layer in the stack. Master these skills, and you’ll be able to diagnose and fix performance issues that others find mysterious.

Storage performance is a fascinating field that rewards deep expertise. I hope these insights help you optimize your own storage environments.