More customers are adopting hybrid cloud strategies: keeping core workloads on-premise while leveraging AWS for burst capacity, disaster recovery, and new applications. This raises a hard question: how do we bridge high-performance on-premise Fibre Channel (FC) storage with AWS cloud storage?
The Hybrid Challenge
Customers want to:
- Keep existing FC storage investments
- Leverage AWS for flexibility and scale
- Move data seamlessly between environments
- Manage both environments consistently
But on-premise and cloud storage are fundamentally different:
| On-Premise FC | AWS |
| --- | --- |
| Low latency (<1 ms) | Higher latency (10-50 ms) |
| High bandwidth (32 Gbps) | Variable bandwidth |
| Predictable performance | Shared resources |
| CapEx model | OpEx model |
| Block storage (LUNs) | Object storage (S3) |
Bridging these worlds requires careful architecture.
Use Case 1: Cloud Bursting
Burst to AWS when on-premise capacity is exhausted:
Architecture
On-Premise:
┌─────────────────────────────────┐
│   Production VMs (FC Storage)   │
│   ┌────┐  ┌────┐  ┌────┐        │
│   │VM1 │  │VM2 │  │VM3 │        │
│   └──┬─┘  └──┬─┘  └──┬─┘        │
│      │       │       │          │
│   ┌──▼───────▼───────▼──────┐   │
│   │    FC Storage Array     │   │
│   └─────────────────────────┘   │
└─────────────────────────────────┘

When capacity fills:
                 │
            VPN Tunnel
                 │
                 ▼
AWS:
┌─────────────────────────────────┐
│    Burst VMs (EBS Storage)      │
│   ┌────┐  ┌────┐                │
│   │VM4 │  │VM5 │                │
│   └──┬─┘  └──┬─┘                │
│      │       │                  │
│   ┌──▼───────▼─────┐            │
│   │  EBS Volumes   │            │
│   └────────────────┘            │
└─────────────────────────────────┘
Implementation
from boto3 import client
import fc_redirect_client  # FC-Redirect SDK: provides the on-premise FC array client

class HybridStorageOrchestrator:
    def __init__(self, fc_client, aws_client):
        self.fc = fc_client
        self.ec2 = aws_client

    def check_capacity(self):
        fc_stats = self.fc.get_capacity_stats()
        utilization = fc_stats['used'] / fc_stats['total']
        return {
            'utilization': utilization,
            'should_burst': utilization > 0.85  # Burst at 85%
        }

    def create_burst_vm(self, vm_spec):
        # Create an EBS data volume for the burst VM
        volume = self.ec2.create_volume(
            Size=vm_spec['disk_size_gb'],
            VolumeType='gp2',  # SSD
            AvailabilityZone='us-east-1a'
        )

        # Launch the EC2 instance (root volume comes from the AMI)
        reservation = self.ec2.run_instances(
            ImageId=vm_spec['ami_id'],
            InstanceType=vm_spec['instance_type'],
            MinCount=1,
            MaxCount=1,
            Placement={'AvailabilityZone': 'us-east-1a'}
        )
        instance_id = reservation['Instances'][0]['InstanceId']

        # Attach the data volume once the instance is running
        self.ec2.get_waiter('instance_running').wait(InstanceIds=[instance_id])
        self.ec2.attach_volume(
            VolumeId=volume['VolumeId'],
            InstanceId=instance_id,
            Device='/dev/sdf'
        )

        # Register with load balancer
        self.register_with_lb(instance_id)
        return reservation

    def migrate_data_to_aws(self, fc_source_location_arn, aws_destination_location_arn):
        # Create a DataSync task for a one-time migration.
        # Both locations must be registered with DataSync first
        # (e.g. create_location_nfs for an NFS export of the FC volume,
        # create_location_s3 or create_location_efs for the AWS side).
        datasync = client('datasync')
        task = datasync.create_task(
            SourceLocationArn=fc_source_location_arn,
            DestinationLocationArn=aws_destination_location_arn,
            Options={
                'VerifyMode': 'POINT_IN_TIME_CONSISTENT',
                'PreserveDeletedFiles': 'PRESERVE',
                'TransferMode': 'ALL'
            }
        )

        # Start the transfer
        datasync.start_task_execution(TaskArn=task['TaskArn'])
        return task
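To show how these pieces fit together, here's a minimal control-loop sketch that polls capacity and bursts when the 85% threshold trips. It assumes an already-initialized FC-Redirect client handle and placeholder VM spec values; none of this is a documented API.

import time
import boto3

def burst_control_loop(fc_client, vm_spec, poll_seconds=300):
    """Poll FC capacity and burst to AWS when utilization crosses the threshold.

    `fc_client` is assumed to be an initialized FC-Redirect client; the vm_spec
    values in the usage comment below are placeholders.
    """
    orchestrator = HybridStorageOrchestrator(
        fc_client=fc_client,
        aws_client=boto3.client('ec2')
    )
    while True:
        capacity = orchestrator.check_capacity()
        if capacity['should_burst']:
            print(f"FC utilization at {capacity['utilization']:.0%}, bursting to AWS")
            orchestrator.create_burst_vm(vm_spec)
        time.sleep(poll_seconds)  # re-evaluate every few minutes

# Example (placeholder values):
# burst_control_loop(fc_client, {'disk_size_gb': 200,
#                                'ami_id': 'ami-0123456789abcdef0',
#                                'instance_type': 'm5.large'})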
Data Synchronization
Keep data synchronized between on-premise and AWS:
class HybridDataSync:
    def __init__(self, fc_client, s3_client):
        self.fc = fc_client
        self.s3 = s3_client

    def setup_continuous_sync(self, fc_volume, s3_bucket):
        # Replication policy from FC to S3
        replication_config = {
            'source': f'fc://{fc_volume}',
            'destination': f's3://{s3_bucket}',
            'mode': 'continuous',
            'bandwidth_limit_mbps': 100,  # Don't saturate the VPN
            'schedule': {
                'type': 'change_detection',
                'interval_seconds': 60
            }
        }

        # Monitor the FC volume for changes and push them to S3
        self.fc.watch_volume_changes(
            volume_id=fc_volume,
            callback=lambda changes: self.replicate_changes(changes, s3_bucket)
        )
        return replication_config

    def replicate_changes(self, changes, s3_bucket):
        if len(changes) < 100:
            # Small batch: upload each change individually
            for change in changes:
                self.s3.put_object(
                    Bucket=s3_bucket,
                    Key=change['path'],
                    Body=change['data']
                )
        else:
            # Large batch: bundle into one archive and use a managed
            # (multipart-capable) upload; create_tar_archive is a helper not shown here
            self.s3.upload_fileobj(
                Fileobj=create_tar_archive(changes),
                Bucket=s3_bucket,
                Key='batch.tar.gz'
            )
Use Case 2: Disaster Recovery
Use AWS as DR target for on-premise FC storage:
Architecture
Primary Site (On-Premise):
┌─────────────────────────────────┐
│       Production (Active)       │
│  ┌───────────────────────────┐  │
│  │   FC Storage (Primary)    │  │
│  └─────────────┬─────────────┘  │
└────────────────┼────────────────┘
                 │
          Async Replication
                 │
                 ▼
DR Site (AWS):
┌─────────────────────────────────┐
│        Standby (Passive)        │
│  ┌───────────────────────────┐  │
│  │    S3 + EBS (DR Copy)     │  │
│  └───────────────────────────┘  │
└─────────────────────────────────┘
Implementation
class DisasterRecovery:
    def __init__(self, fc_client, s3_client, ec2_client):
        self.fc = fc_client
        self.s3 = s3_client
        self.ec2 = ec2_client  # needed for failover_to_aws()

    def setup_dr_replication(self, fc_volume, s3_bucket):
        # Create the snapshot schedule
        snapshot_schedule = {
            'frequency': 'hourly',
            'retention': {
                'hourly': 24,   # Keep 24 hourly
                'daily': 7,     # Keep 7 daily
                'weekly': 4,    # Keep 4 weekly
                'monthly': 12   # Keep 12 monthly
            }
        }

        # Set up automated snapshots, replicating each one to S3
        self.fc.create_snapshot_schedule(
            volume_id=fc_volume,
            schedule=snapshot_schedule,
            callback=lambda snapshot: self.replicate_snapshot_to_s3(
                snapshot, s3_bucket
            )
        )

    def replicate_snapshot_to_s3(self, snapshot, s3_bucket):
        # Export the snapshot to block format
        snapshot_data = self.fc.export_snapshot(snapshot['id'])

        # Compress (helper not shown)
        compressed = compress_snapshot(snapshot_data)

        # Upload to S3 with metadata
        self.s3.put_object(
            Bucket=s3_bucket,
            Key=f'snapshots/{snapshot["timestamp"]}.snap.gz',
            Body=compressed,
            Metadata={
                'source_volume': snapshot['volume_id'],
                'timestamp': str(snapshot['timestamp']),
                'size_bytes': str(snapshot['size'])
            },
            StorageClass='STANDARD_IA'  # Infrequent Access (cheaper)
        )

    def failover_to_aws(self, s3_bucket):
        # List available snapshots
        snapshots = self.s3.list_objects_v2(
            Bucket=s3_bucket,
            Prefix='snapshots/'
        )

        # Get the latest snapshot
        latest = sorted(
            snapshots['Contents'],
            key=lambda x: x['LastModified']
        )[-1]

        # Download it
        snapshot_data = self.s3.get_object(
            Bucket=s3_bucket,
            Key=latest['Key']
        )

        # Decompress (helper not shown)
        decompressed = decompress_snapshot(snapshot_data['Body'].read())

        # Create an EBS volume from the imported snapshot
        volume = self.ec2.create_volume(
            Size=calculate_size_gb(decompressed),
            AvailabilityZone='us-east-1a',
            SnapshotId=self.import_snapshot_to_ebs(decompressed)
        )

        # Launch DR instances
        self.launch_dr_instances(volume['VolumeId'])

        # Update DNS to point to the DR site
        self.update_dns_to_dr()
        return volume
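For completeness, here's how I'd wire the DR class up in practice: replication is configured once, and failover is only ever invoked when the primary site is declared down. The client handles, bucket, and volume names below are placeholders, not real resources.

import boto3

def setup_dr_example(fc_client):
    """Illustrative DR wiring; `fc_client` is an assumed FC-Redirect client handle."""
    dr = DisasterRecovery(
        fc_client=fc_client,
        s3_client=boto3.client('s3'),
        ec2_client=boto3.client('ec2')
    )

    # Day 0: start hourly snapshot replication to the DR bucket
    dr.setup_dr_replication(fc_volume='prod-vol-01', s3_bucket='example-dr-snapshots')

    # Only when the primary site is declared down:
    # dr.failover_to_aws(s3_bucket='example-dr-snapshots')
    return dr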
Use Case 3: Tiered Storage
Automatically tier cold data to AWS S3:
Implementation
import time  # used to poll the Glacier restore status below

class StorageTiering:
    def __init__(self, fc_client, s3_client):
        self.fc = fc_client
        self.s3 = s3_client

    def analyze_access_patterns(self, fc_volume):
        # Get access statistics for the last 90 days
        stats = self.fc.get_volume_access_stats(
            volume_id=fc_volume,
            period_days=90
        )

        # Identify cold data (not accessed in 60 days)
        cold_data = []
        for block in stats['blocks']:
            if block['last_access_days'] > 60:
                cold_data.append(block)
        return cold_data

    def tier_to_s3(self, fc_volume, s3_bucket):
        cold_data = self.analyze_access_patterns(fc_volume)

        # Move cold blocks to S3
        for block in cold_data:
            # Read the block from FC
            data = self.fc.read_block(fc_volume, block['id'])

            # Upload to S3 Glacier (cheapest tier)
            self.s3.put_object(
                Bucket=s3_bucket,
                Key=f'cold/{fc_volume}/{block["id"]}',
                Body=data,
                StorageClass='GLACIER'
            )

            # Replace the block with a stub on FC
            self.fc.replace_with_stub(
                volume_id=fc_volume,
                block_id=block['id'],
                stub_metadata={
                    's3_bucket': s3_bucket,
                    's3_key': f'cold/{fc_volume}/{block["id"]}'
                }
            )

    def recall_from_s3(self, fc_volume, block_id):
        # Get the stub metadata
        stub = self.fc.get_stub_metadata(fc_volume, block_id)

        # Initiate a Glacier restore (standard retrievals take hours;
        # Expedited completes in minutes but costs more)
        self.s3.restore_object(
            Bucket=stub['s3_bucket'],
            Key=stub['s3_key'],
            RestoreRequest={
                'Days': 1,
                'GlacierJobParameters': {
                    'Tier': 'Expedited'  # typically 1-5 minutes
                }
            }
        )

        # Poll until the restored copy is available
        while True:
            response = self.s3.head_object(
                Bucket=stub['s3_bucket'],
                Key=stub['s3_key']
            )
            if 'ongoing-request="false"' in response.get('Restore', ''):
                break
            time.sleep(60)

        # Download from S3
        obj = self.s3.get_object(
            Bucket=stub['s3_bucket'],
            Key=stub['s3_key']
        )

        # Write the block back to the FC volume and remove the stub
        self.fc.write_block(fc_volume, block_id, obj['Body'].read())
        self.fc.remove_stub(fc_volume, block_id)
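A quick sketch of how the tiering class might be driven: demotion runs as a scheduled job, recall happens when a read lands on a stub. Again, the FC client handle, volume, bucket, and block names are illustrative placeholders.

import boto3

def nightly_tiering_job(fc_client):
    """Illustrative scheduled job; `fc_client` is an assumed FC-Redirect client handle."""
    tiering = StorageTiering(fc_client=fc_client, s3_client=boto3.client('s3'))
    # Demote blocks that have not been read in 60+ days
    tiering.tier_to_s3(fc_volume='prod-vol-01', s3_bucket='example-cold-tier')
    return tiering

# On the read path, a stub hit would trigger
# StorageTiering.recall_from_s3('prod-vol-01', 'blk-000123'),
# which blocks until the Glacier restore completes.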
Performance Considerations
Hybrid storage requires careful performance management:
class HybridPerformanceManager:
    def __init__(self):
        self.latency_targets = {
            'fc': 1,            # 1ms
            'ebs': 10,          # 10ms
            's3': 100,          # 100ms
            'glacier': 3600000  # Hours
        }

    def route_io_request(self, request):
        # Latency-sensitive: keep on FC
        if request['latency_requirement_ms'] < 5:
            return 'fc'
        # Moderate latency: EBS acceptable
        elif request['latency_requirement_ms'] < 50:
            # Check whether the data is on EBS already
            if self.is_on_ebs(request['block_id']):
                return 'ebs'
            else:
                return 'fc'  # Don't migrate for one access
        # High latency tolerable: S3 acceptable
        else:
            return 's3'

    def optimize_data_placement(self, workload_stats):
        # Frequently accessed: keep on FC
        hot_threshold = 100  # accesses per day
        # Occasionally accessed: tier to EBS
        warm_threshold = 10
        # Rarely accessed: tier to S3/Glacier
        cold_threshold = 1

        for block in workload_stats['blocks']:
            accesses_per_day = block['accesses'] / workload_stats['period_days']

            if accesses_per_day >= hot_threshold:
                if block['current_tier'] != 'fc':
                    self.migrate_to_fc(block)
            elif accesses_per_day >= warm_threshold:
                if block['current_tier'] not in ['fc', 'ebs']:
                    self.migrate_to_ebs(block)
            elif accesses_per_day >= cold_threshold:
                if block['current_tier'] not in ['fc', 'ebs', 's3']:
                    self.migrate_to_s3(block)
            else:
                if block['current_tier'] != 'glacier':
                    self.migrate_to_glacier(block)
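Here's a tiny usage example of the router above. Since is_on_ebs() isn't defined in the sketch, I stub it out for illustration:

manager = HybridPerformanceManager()
manager.is_on_ebs = lambda block_id: False  # stub: pretend nothing is on EBS yet

print(manager.route_io_request({'latency_requirement_ms': 2,   'block_id': 'b1'}))  # -> 'fc'
print(manager.route_io_request({'latency_requirement_ms': 20,  'block_id': 'b2'}))  # -> 'fc' (not on EBS yet)
print(manager.route_io_request({'latency_requirement_ms': 500, 'block_id': 'b3'}))  # -> 's3'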
Cost Optimization
Hybrid storage enables cost optimization:
class HybridCostOptimizer:
    def __init__(self):
        self.costs = {
            'fc': {
                'storage_per_gb_month': 0.50,   # Expensive
                'iops': 0.0001
            },
            'ebs': {
                'storage_per_gb_month': 0.10,   # Medium
                'iops': 0.00002
            },
            's3': {
                'storage_per_gb_month': 0.023,  # Cheap
                'requests_per_1000': 0.005
            },
            'glacier': {
                'storage_per_gb_month': 0.004,  # Cheapest
                'retrieval_per_gb': 0.09
            }
        }

    def calculate_optimal_placement(self, block_stats):
        # Calculate the monthly cost for each tier
        costs = {}
        for tier in ['fc', 'ebs', 's3', 'glacier']:
            storage_cost = (
                block_stats['size_gb'] *
                self.costs[tier]['storage_per_gb_month']
            )

            if tier in ['fc', 'ebs']:
                access_cost = (
                    block_stats['accesses_per_month'] *
                    self.costs[tier]['iops']
                )
            elif tier == 's3':
                access_cost = (
                    block_stats['accesses_per_month'] / 1000 *
                    self.costs[tier]['requests_per_1000']
                )
            else:  # glacier
                access_cost = (
                    block_stats['accesses_per_month'] *
                    block_stats['size_gb'] *
                    self.costs[tier]['retrieval_per_gb']
                )

            costs[tier] = storage_cost + access_cost

        # Return the cheapest tier
        return min(costs, key=costs.get)
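To make this concrete, here's a worked example using the (illustrative) prices above: a 100 GB block read 200 times a month costs about $0.40/month to store in Glacier, but the retrievals add roughly $1,800, so S3 at about $2.30/month wins.

optimizer = HybridCostOptimizer()

# Illustrative workload: a 100 GB block read 200 times per month
block_stats = {'size_gb': 100, 'accesses_per_month': 200}

print(optimizer.calculate_optimal_placement(block_stats))
# -> 's3': Glacier storage is ~$0.40/month, but 200 retrievals of 100 GB
#    at $0.09/GB add ~$1,800, while S3 totals ~$2.30/month.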
Lessons Learned
Building hybrid storage taught me:
- Latency matters: Cloud storage is 10-100x slower than FC. Design for this.
- Bandwidth is limited: VPN/Direct Connect bandwidth is finite. Don't saturate it.
- Cost optimization is complex: The cheapest storage tier isn't always the cheapest total cost.
- Data gravity is real: Moving data between environments is expensive. Minimize movement.
- Automation is essential: Manual hybrid management doesn't scale.
Looking Forward
Hybrid storage is the future for enterprises:
- Keep performance-critical data on-premise (FC)
- Tier warm data to cloud (EBS)
- Archive cold data to cheap cloud storage (S3/Glacier)
- Burst to cloud for temporary capacity
The challenge is making this seamless. FC-Redirect's hybrid capabilities enable transparent tiering and cloud bursting.
As AWS adds more storage options (FSx, Storage Gateway, etc.), hybrid architectures become more powerful.
The future isn't all-cloud or all-on-premise. It's hybrid, with workloads and data placed optimally across environments.
Hybrid storage: best of both worlds.