Capacity planning is both an art and a science. Plan too conservatively and you waste money. Plan too aggressively and you run out of capacity at the worst time. After a year of working with FC-Redirect at Cisco, I’ve developed approaches to capacity planning that I want to share.
Why Capacity Planning Matters
Poor capacity planning leads to:
Emergency Purchases: Rushed procurement at premium prices without competitive bidding.
Service Outages: Running out of capacity can take systems down.
Wasted Resources: Over-provisioning means paying for unused capacity.
Project Delays: Waiting for storage procurement delays new projects.
Performance Problems: Operating near full capacity often degrades performance.
Good capacity planning avoids these problems.
Planning Horizons
Plan for different time horizons:
Immediate (0-3 months): Detailed planning based on known projects and growth rates.
Near-Term (3-12 months): Based on trends and anticipated projects.
Long-Term (1-3 years): Strategic planning for major refreshes or architectural changes.
Different horizons require different levels of detail and carry different degrees of uncertainty.
Data Collection
Capacity planning starts with data:
Current Utilization
Total Capacity: How much storage you have.
Used Capacity: How much is actually used.
Allocated Capacity: How much is allocated (may differ from used with thin provisioning).
Free Capacity: Available for growth.
Per-Array Breakdown: Utilization varies across arrays.
Per-Application Breakdown: Which applications consume most storage?
Historical Growth
Monthly Growth: Track capacity consumption month by month.
Growth Rate: Calculate percentage growth per month or year.
Trends: Is growth accelerating, decelerating, or steady?
Seasonal Patterns: Some workloads have seasonal variations.
Projects: Major projects that caused capacity spikes.
Historical data is the best predictor of future needs.
Performance Data
IOPS: Current IOPS consumption and peak IOPS.
Throughput: Current throughput and peak throughput.
Latency: Current latency and whether it’s acceptable.
Queue Depth: Indicates if you’re approaching performance limits.
You may hit performance limits before capacity limits.
Growth Modeling
Model different growth scenarios:
Linear Growth
Assumption: Growth continues at current rate.
Calculation: Projected usage = current usage + (growth rate × time period).
When Appropriate: Stable environments with predictable growth.
Limitations: Doesn’t account for major projects or changing patterns.
Simple but often inaccurate for long-term planning.
Exponential Growth
Assumption: Growth accelerates over time.
Calculation: Projected usage = current usage × (1 + growth rate)^time.
When Appropriate: Rapidly growing environments (new services, data warehouses).
Limitations: May over-predict if growth slows.
Common in early-stage growth but often transitions to linear.
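Here’s a minimal sketch comparing the linear model above with the exponential one; the starting capacity and growth figures are made up for illustration:

```python
def linear_projection(current_tb, monthly_growth_tb, months):
    """Project usage assuming constant absolute growth per month."""
    return current_tb + monthly_growth_tb * months

def exponential_projection(current_tb, monthly_growth_rate, months):
    """Project usage assuming growth compounds monthly."""
    return current_tb * (1 + monthly_growth_rate) ** months

# Illustrative: 100 TB today, growing 4 TB/month (about 4%/month).
for m in (6, 12, 24):
    print(f"{m:>2} months: linear {linear_projection(100, 4, m):6.1f} TB, "
          f"exponential {exponential_projection(100, 0.04, m):6.1f} TB")
```

By month 24 the two projections differ by about 60 TB on a 100 TB base, which is why model choice matters over long horizons.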
Step Function
Assumption: Capacity needs increase in steps (new projects, acquisitions).
Calculation: Model each known project’s capacity needs.
When Appropriate: When major projects are known.
Limitations: Doesn’t account for base growth.
Combine with linear growth for a complete picture.
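A sketch of that combination, using a hypothetical project list:

```python
def step_plus_linear(current_tb, monthly_growth_tb, projects, months):
    """Combine steady base growth with step increases from known projects.

    projects: list of (month_number, added_tb) for projects with known
    capacity needs; the figures here are hypothetical.
    """
    capacity = current_tb + monthly_growth_tb * months
    capacity += sum(tb for month, tb in projects if month <= months)
    return capacity

# Hypothetical: a 20 TB warehouse in month 3, a 10 TB ERP upgrade in month 9.
projects = [(3, 20), (9, 10)]
for m in (3, 6, 12):
    print(f"month {m:>2}: {step_plus_linear(100, 4, projects, m):.0f} TB")
```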
Monte Carlo Simulation
Approach: Model uncertainty with probability distributions.
Benefits: Provides confidence intervals, not just point estimates.
Challenges: More complex, requires statistical knowledge.
When Appropriate: High-stakes decisions with significant uncertainty.
Sophisticated but valuable for major investments.
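You don’t need statistics software to start; here’s a toy Monte Carlo sketch using only Python’s standard library, with an illustrative growth distribution and pool size:

```python
import random

def months_until_full(used_tb, total_tb, mean_growth_tb, sd_growth_tb, horizon=60):
    """Simulate one growth path; return the month the pool fills up."""
    used = used_tb
    for month in range(1, horizon + 1):
        used += max(0.0, random.gauss(mean_growth_tb, sd_growth_tb))
        if used >= total_tb:
            return month
    return horizon  # did not fill within the horizon

# Illustrative: 100 TB used of 160 TB, growing 4 +/- 2 TB/month.
runs = sorted(months_until_full(100, 160, 4, 2) for _ in range(10_000))
p10, p50, p90 = (runs[int(len(runs) * q)] for q in (0.10, 0.50, 0.90))
print(f"Median exhaustion: {p50} months; 80% interval: {p10}-{p90} months")
```

The spread between the 10th and 90th percentiles is exactly what a single point estimate hides.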
Thin Provisioning Implications
Thin provisioning complicates capacity planning:
Virtual vs. Physical: Track both allocated (virtual) and consumed (physical) capacity.
Allocation Rate: How quickly is virtual capacity being allocated?
Consumption Rate: How quickly is physical capacity being consumed?
Over-Provisioning Ratio: Ratio of virtual to physical capacity.
Future Consumption: As VMs grow, virtual capacity becomes physical consumption.
Model future physical consumption based on virtual allocation and expected growth.
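A minimal sketch of that model, assuming volumes fill toward their virtual ceiling at a constant rate (all figures illustrative):

```python
def projected_physical_tb(allocated_tb, consumed_tb, monthly_fill_tb, months):
    """Estimate future physical consumption under thin provisioning.

    Assumes consumed capacity grows toward the allocated (virtual)
    ceiling at a constant monthly fill rate.
    """
    unfilled = allocated_tb - consumed_tb
    fill = min(unfilled, monthly_fill_tb * months)
    return consumed_tb + fill

# Hypothetical pool: 300 TB allocated, 120 TB consumed, filling 5 TB/month.
for m in (6, 12, 36):
    tb = projected_physical_tb(300, 120, 5, m)
    print(f"month {m:>2}: ~{tb:.0f} TB physical "
          f"(over-provisioning ratio {300 / tb:.2f}:1)")
```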
Headroom and Safety Margin
Never plan to 100% capacity:
Operating Headroom: Leave 15-25% free for performance. Arrays perform poorly when >85% full.
Growth Buffer: Accommodate growth between procurement cycles.
Snapshot Space: Reserve capacity for snapshots.
Unexpected Needs: Buffer for unplanned requirements.
Procurement Lead Time: Enough capacity to last through procurement and installation.
I recommend planning to ~75% capacity, leaving 25% headroom.
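To turn that threshold into an order-by date, a small sketch with hypothetical numbers, including procurement lead time:

```python
import math

def months_to_threshold(used_tb, total_tb, monthly_growth_tb, threshold=0.75):
    """Months until utilization crosses the planning threshold."""
    target_tb = total_tb * threshold
    if used_tb >= target_tb:
        return 0
    return math.floor((target_tb - used_tb) / monthly_growth_tb)

# Illustrative: 130 TB used of 200 TB, growing 3 TB/month, 3-month lead time.
hit = months_to_threshold(130, 200, 3)  # months until 75% full
print(f"75% threshold in ~{hit} months; order within {hit - 3} months")
```

If the order-by number comes out negative, you’re already in emergency-purchase territory.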
Performance Capacity Planning
Capacity isn’t just GB—it’s also IOPS and throughput:
IOPS Capacity: Can your storage deliver required IOPS?
Throughput Capacity: Can it deliver required bandwidth?
Latency Requirements: Does it meet latency SLAs?
Mixed Workloads: Different workloads compete for performance.
Peak vs. Average: Plan for peak loads, not just average.
You may need more spindles for performance even if you have sufficient GB.
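As a rough illustration: the per-drive IOPS numbers below are common rules of thumb, not vendor specs, and the RAID 10 write penalty of 2 is an assumption; adjust both for your arrays.

```python
import math

# Ballpark per-drive IOPS figures; vary by workload and vendor.
IOPS_PER_DRIVE = {"15k": 180, "10k": 130, "7.2k": 80}

def drives_needed(peak_iops, drive_type, raid_write_penalty=2, write_fraction=0.3):
    """Estimate the spindle count needed to serve a peak IOPS target."""
    effective_iops = (peak_iops * (1 - write_fraction)
                      + peak_iops * write_fraction * raid_write_penalty)
    return math.ceil(effective_iops / IOPS_PER_DRIVE[drive_type])

# Hypothetical workload: 12,000 peak IOPS, 30% writes, on 15K drives.
print(f"~{drives_needed(12_000, '15k')} x 15K spindles for performance alone")
```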
Tiering Strategy
Multiple storage tiers with different costs:
Tier 0: All-flash, very expensive, ultra-high performance.
Tier 1: Fast spinning disk (15K RPM) or hybrid, expensive, high performance.
Tier 2: Moderate disk (10K RPM), moderate cost and performance.
Tier 3: Capacity disk (7.2K RPM), cheap, adequate performance for archives.
Cloud/Tape: Very cheap, very slow, for archives and backups.
Model capacity needs per tier based on workload requirements.
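A toy per-tier model, with a hypothetical data split and made-up per-GB costs:

```python
# Hypothetical split of a 200 TB requirement across tiers.
tiers = {
    #           (fraction of data, $/GB, both illustrative)
    "tier0_flash": (0.05, 30.0),
    "tier1_15k":   (0.15, 10.0),
    "tier2_10k":   (0.30,  5.0),
    "tier3_7.2k":  (0.50,  1.5),
}

total_tb = 200
for name, (fraction, cost_per_gb) in tiers.items():
    tb = total_tb * fraction
    print(f"{name:12s}: {tb:5.1f} TB, ~${tb * 1024 * cost_per_gb:,.0f}")
```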
Automated Tiering Impact
Automated tiering changes capacity planning:
Right-Sizing: Can provision less Tier 0/1 if automated tiering moves cold data to cheaper tiers.
Efficiency: Better utilization of expensive tiers.
Complexity: Need to model how much data stays hot vs. goes cold.
Performance: Ensure sufficient high-performance tier for working set.
Factor tiering efficiency into capacity models.
Virtualization Impact
Virtualization affects capacity planning:
VM Sprawl: Easy VM creation leads to rapid growth.
Snapshots: VMware snapshots consume capacity—often more than expected.
Thin Provisioning: VMs often use thin provisioning, complicating planning.
Cloning: VM cloning multiplies capacity needs.
Density: More VMs per datastore increases growth rate.
Plan for higher growth rates in virtualized environments.
Deduplication and Compression
Data reduction technologies affect capacity:
Deduplication Ratio: Can achieve 10:1 or more for certain workloads (VDI, backups).
Compression Ratio: Typically 2:1 to 3:1.
Workload Dependent: Ratios vary dramatically by workload.
Ongoing vs. Initial: Dedupe ratio often decreases over time as unique data accumulates.
Model conservatively—assume lower dedupe/compression ratios than vendor claims.
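A quick sketch of why conservative ratios matter; the ratios here are illustrative, not measurements:

```python
def effective_tb(raw_tb, dedupe_ratio, compression_ratio):
    """Logical capacity a raw pool can hold at given reduction ratios."""
    return raw_tb * dedupe_ratio * compression_ratio

raw = 50  # TB of physical capacity, illustrative
print(f"Vendor claim (5:1 dedupe, 3:1 compression): {effective_tb(raw, 5, 3):.0f} TB")
print(f"Conservative plan (2:1 dedupe, 1.5:1):      {effective_tb(raw, 2, 1.5):.0f} TB")
```

The vendor-claim line holds five times as much data as the conservative one, and that gap is exactly the kind that turns into an emergency purchase.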
Cloud and Hybrid Models
Cloud complicates on-premises capacity planning:
Cloud Tiering: Cold data moves to cloud, reducing on-premises needs.
Cloud Bursting: Use cloud for temporary capacity needs.
Hybrid Applications: Some data in cloud, some on-premises.
Cost Trade-offs: Balance on-premises capex vs. cloud opex.
Factor cloud integration into long-term capacity models.
Procurement Planning
Align capacity planning with procurement:
Lead Times: Factor in vendor lead times (often 6-12 weeks).
Budget Cycles: Align capacity needs with budget cycles.
Volume Discounts: Larger purchases may get better pricing.
Refresh Cycles: Plan for replacing aging equipment.
Maintenance Contracts: Factor in maintenance costs.
Plan procurement 6-12 months ahead to avoid emergency purchases.
Monitoring and Alerting
Continuous monitoring validates planning:
Capacity Trending: Automated tracking of capacity consumption.
Threshold Alerts: Alert at 70%, 80%, 90% capacity.
Growth Rate Alerts: Alert on unexpected growth acceleration.
Forecast Reports: Regular reports on projected capacity exhaustion.
Performance Monitoring: Ensure you’re not hitting performance limits.
Monitoring provides early warning when reality diverges from plan.
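A minimal threshold-alert sketch using the 70/80/90 levels above:

```python
def capacity_alert(used_tb, total_tb, thresholds=(0.70, 0.80, 0.90)):
    """Return the highest capacity threshold crossed, if any."""
    utilization = used_tb / total_tb
    crossed = [t for t in thresholds if utilization >= t]
    if crossed:
        return f"ALERT: {utilization:.0%} used, crossed {max(crossed):.0%} threshold"
    return None

print(capacity_alert(170, 200))  # -> ALERT: 85% used, crossed 80% threshold
```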
Scenario Planning
Model different scenarios:
Best Case: Lower growth than expected.
Expected Case: Growth matches current trends.
Worst Case: Higher growth or major unexpected projects.
Disaster Scenario: What if you suddenly need 2x capacity?
Understanding the range of possibilities helps you make robust decisions.
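Reusing the exponential model from earlier with illustrative best/expected/worst growth rates:

```python
def projected_tb(current_tb, monthly_rate, months):
    """Compound monthly growth, as in the exponential model above."""
    return current_tb * (1 + monthly_rate) ** months

scenarios = {"best": 0.02, "expected": 0.04, "worst": 0.07}  # illustrative rates
for name, rate in scenarios.items():
    print(f"{name:8s}: {projected_tb(100, rate, 12):6.1f} TB after 12 months")
```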
Application-Level Planning
Different applications have different patterns:
Databases: Steady growth, spiky during batch processes.
File Services: Continuous growth, rarely shrinks.
Email: Per-user growth multiplied by user count.
Backup: Growth tied to primary storage growth.
VDI: Jumps when new users onboard.
Big Data: Can grow explosively with new analytics projects.
Model each application class separately, then aggregate.
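A toy aggregation with hypothetical per-class 12-month projections; email is modeled as per-user growth times user count, per the pattern above:

```python
users, tb_per_user = 5_000, 0.005          # 5 GB per mailbox, illustrative
per_app_tb = {
    "databases":     45,
    "file_services": 60,
    "email":         users * tb_per_user,  # per-user growth x user count
    "backup":        80,
    "vdi":           30,
}
print(f"Aggregate 12-month need: {sum(per_app_tb.values()):.0f} TB")
```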
Documentation
Document capacity planning:
Current State: Baseline capacity and utilization.
Growth Assumptions: What assumptions underlie projections?
Scenarios: Different scenarios modeled.
Recommendations: What capacity to procure and when.
Review Schedule: When to revisit plan.
Actuals vs. Plan: Track how reality compares to plan.
Good documentation makes it possible to review and refine the plan over time.
Review and Adjustment
Capacity planning is iterative:
Monthly Review: Compare actual growth to projected.
Quarterly Adjustment: Update projections based on actual trends.
Annual Re-baseline: Major revision of long-term plans.
Post-Project Review: Learn from major projects.
Continuously refine planning based on experience.
Common Mistakes
Capacity planning mistakes I see:
Planning to 100%: No headroom for growth or performance.
Ignoring Performance: Planning GB but ignoring IOPS/throughput needs.
Using Vendor Claims: Believing optimistic dedupe/compression ratios.
No Safety Margin: No buffer for unexpected needs.
Ignoring Snapshots: Not accounting for snapshot capacity.
Stale Data: Planning based on old utilization data.
Not Tracking: No monitoring of actual vs. planned.
Avoid these and your planning will be much more accurate.
Tools and Automation
Leverage tools:
Capacity Management Software: Tools that track utilization and project future needs.
Reporting: Automated capacity reports.
Alerting: Automated alerts on capacity thresholds.
Trending: Visualization of capacity trends.
Forecasting: Predictive analytics for capacity exhaustion dates.
Good tools make capacity planning easier and more accurate.
FC-Redirect and Capacity Planning
Storage virtualization affects capacity planning:
Pooling: Capacity pooled across heterogeneous arrays.
Thin Provisioning: Centralized thin provisioning simplifies planning.
Mobility: Easy data movement enables better utilization.
Visibility: Centralized view of capacity across all arrays.
Virtualization can improve capacity utilization and simplify planning.
Cost Modeling
Capacity planning includes cost:
Hardware Costs: Arrays, disks, switches, etc.
Maintenance: Annual maintenance fees (typically 15-20% of purchase price).
Power and Cooling: Ongoing operational costs.
Administration: Staff time to manage storage.
Software Licenses: Management software, replication, etc.
Let Total Cost of Ownership (TCO) over 3-5 years guide decisions, not purchase price alone.
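A rough TCO sketch; the maintenance percentage follows the 15-20% range above, and every other figure is a placeholder:

```python
def five_year_tco(hardware, annual_maint_pct=0.18, annual_power=20_000,
                  annual_admin=40_000, software=50_000, years=5):
    """Rough TCO: purchase price plus recurring costs over the period."""
    recurring = (hardware * annual_maint_pct + annual_power + annual_admin) * years
    return hardware + software + recurring

print(f"5-year TCO: ${five_year_tco(500_000):,.0f}")  # -> $1,300,000
```

In this toy example the recurring costs exceed the hardware purchase itself, which is typical, and it’s why TCO rather than sticker price should drive the decision.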
Conclusion
Effective capacity planning requires:
- Accurate data collection
- Historical trend analysis
- Multiple scenario modeling
- Performance considerations
- Adequate safety margins
- Continuous monitoring and adjustment
The goal is balance: enough capacity to avoid crises without wasting money on unused resources.
Key principles:
- Plan for ~75% utilization, leaving 25% headroom
- Model both capacity (GB) and performance (IOPS/throughput)
- Include safety margins for unexpected needs
- Factor in procurement lead times
- Monitor continuously and adjust plans based on reality
- Document assumptions and review regularly
Working with FC-Redirect has shown me how storage virtualization can improve capacity planning through better visibility, pooling, and utilization. Centralized capacity management across heterogeneous arrays simplifies what would otherwise be complex per-array planning.
Good capacity planning seems invisible—you always have enough capacity but aren’t wasting money. Poor capacity planning is very visible—either running out of storage or vast amounts of idle capacity.
Invest time in capacity planning. The return—avoiding crises and optimizing spending—far exceeds the effort.