More customers are adopting hybrid cloud strategies: keeping core workloads on-premise while leveraging AWS for burst capacity, disaster recovery, and new applications. This raises a hard question: how do we bridge high-performance on-premise Fibre Channel (FC) storage with AWS cloud storage?
The Hybrid Challenge
Customers want to:
- Keep existing FC storage investments
- Leverage AWS for flexibility and scale
- Move data seamlessly between environments
- Manage both environments consistently
But on-premise and cloud storage are fundamentally different:
| On-Premise FC | AWS |
| --- | --- |
| Low latency (<1 ms) | Higher latency (10-50 ms) |
| High bandwidth (32 Gbps) | Variable bandwidth |
| Predictable performance | Shared resources |
| CapEx model | OpEx model |
| Block storage (LUNs) | Object storage (S3) |
Bridging these worlds requires careful architecture.
Use Case 1: Cloud Bursting
Burst to AWS when on-premise capacity is exhausted:
Architecture
On-Premise:
┌─────────────────────────────────┐
│   Production VMs (FC Storage)   │
│   ┌────┐  ┌────┐  ┌────┐        │
│   │VM1 │  │VM2 │  │VM3 │        │
│   └──┬─┘  └──┬─┘  └──┬─┘        │
│      │       │       │          │
│   ┌──▼───────▼───────▼──────┐   │
│   │    FC Storage Array     │   │
│   └─────────────────────────┘   │
└─────────────────────────────────┘

When capacity fills:
                 │
            VPN Tunnel
                 │
                 ▼
AWS:
┌─────────────────────────────────┐
│    Burst VMs (EBS Storage)      │
│   ┌────┐  ┌────┐                │
│   │VM4 │  │VM5 │                │
│   └──┬─┘  └──┬─┘                │
│      │       │                  │
│   ┌──▼───────▼─────┐            │
│   │  EBS Volumes   │            │
│   └────────────────┘            │
└─────────────────────────────────┘
Implementation
from boto3 import client
import fc_redirect_client  # FC-Redirect SDK: provides the on-premise FC array client

class HybridStorageOrchestrator:
    def __init__(self, fc_client, aws_client):
        self.fc = fc_client
        self.ec2 = aws_client

    def check_capacity(self):
        fc_stats = self.fc.get_capacity_stats()
        utilization = fc_stats['used'] / fc_stats['total']
        return {
            'utilization': utilization,
            'should_burst': utilization > 0.85  # Burst at 85%
        }

    def create_burst_vm(self, vm_spec):
        # Create an EBS data volume for the burst VM
        volume = self.ec2.create_volume(
            Size=vm_spec['disk_size_gb'],
            VolumeType='gp2',  # SSD
            AvailabilityZone='us-east-1a'
        )

        # Launch the EC2 instance (root volume comes from the AMI)
        reservation = self.ec2.run_instances(
            ImageId=vm_spec['ami_id'],
            InstanceType=vm_spec['instance_type'],
            MinCount=1,
            MaxCount=1,
            Placement={'AvailabilityZone': 'us-east-1a'}
        )
        instance_id = reservation['Instances'][0]['InstanceId']

        # Attach the data volume once the instance is running
        self.ec2.get_waiter('instance_running').wait(InstanceIds=[instance_id])
        self.ec2.attach_volume(
            VolumeId=volume['VolumeId'],
            InstanceId=instance_id,
            Device='/dev/sdf'
        )

        # Register with load balancer
        self.register_with_lb(instance_id)
        return reservation

    def migrate_data_to_aws(self, fc_source_location_arn, aws_destination_location_arn):
        # Create a DataSync task for a one-time migration.
        # Both locations must be registered with DataSync first
        # (e.g. create_location_nfs for an NFS export of the FC volume,
        # create_location_s3 or create_location_efs for the AWS side).
        datasync = client('datasync')
        task = datasync.create_task(
            SourceLocationArn=fc_source_location_arn,
            DestinationLocationArn=aws_destination_location_arn,
            Options={
                'VerifyMode': 'POINT_IN_TIME_CONSISTENT',
                'PreserveDeletedFiles': 'PRESERVE',
                'TransferMode': 'ALL'
            }
        )

        # Start the transfer
        datasync.start_task_execution(TaskArn=task['TaskArn'])
        return task
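To show how these pieces fit together, here's a minimal control-loop sketch that polls capacity and bursts when the 85% threshold trips. It assumes an already-initialized FC-Redirect client handle and placeholder VM spec values; none of this is a documented API.

import time
import boto3

def burst_control_loop(fc_client, vm_spec, poll_seconds=300):
    """Poll FC capacity and burst to AWS when utilization crosses the threshold.

    `fc_client` is assumed to be an initialized FC-Redirect client; the vm_spec
    values in the usage comment below are placeholders.
    """
    orchestrator = HybridStorageOrchestrator(
        fc_client=fc_client,
        aws_client=boto3.client('ec2')
    )
    while True:
        capacity = orchestrator.check_capacity()
        if capacity['should_burst']:
            print(f"FC utilization at {capacity['utilization']:.0%}, bursting to AWS")
            orchestrator.create_burst_vm(vm_spec)
        time.sleep(poll_seconds)  # re-evaluate every few minutes

# Example (placeholder values):
# burst_control_loop(fc_client, {'disk_size_gb': 200,
#                                'ami_id': 'ami-0123456789abcdef0',
#                                'instance_type': 'm5.large'})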
Data Synchronization
Keep data synchronized between on-premise and AWS:
class HybridDataSync:
    def __init__(self, fc_client, s3_client):
        self.fc = fc_client
        self.s3 = s3_client

    def setup_continuous_sync(self, fc_volume, s3_bucket):
        # Replication policy from FC to S3
        replication_config = {
            'source': f'fc://{fc_volume}',
            'destination': f's3://{s3_bucket}',
            'mode': 'continuous',
            'bandwidth_limit_mbps': 100,  # Don't saturate the VPN
            'schedule': {
                'type': 'change_detection',
                'interval_seconds': 60
            }
        }

        # Monitor the FC volume for changes and push them to S3
        self.fc.watch_volume_changes(
            volume_id=fc_volume,
            callback=lambda changes: self.replicate_changes(changes, s3_bucket)
        )
        return replication_config

    def replicate_changes(self, changes, s3_bucket):
        if len(changes) < 100:
            # Small batch: upload each change individually
            for change in changes:
                self.s3.put_object(
                    Bucket=s3_bucket,
                    Key=change['path'],
                    Body=change['data']
                )
        else:
            # Large batch: bundle into one archive and use a managed
            # (multipart-capable) upload; create_tar_archive is a helper not shown here
            self.s3.upload_fileobj(
                Fileobj=create_tar_archive(changes),
                Bucket=s3_bucket,
                Key='batch.tar.gz'
            )
Use Case 2: Disaster Recovery
Use AWS as DR target for on-premise FC storage:
Architecture
Primary Site (On-Premise):
┌─────────────────────────────────┐
│       Production (Active)       │
│  ┌───────────────────────────┐  │
│  │   FC Storage (Primary)    │  │
│  └─────────────┬─────────────┘  │
└────────────────┼────────────────┘
                 │
          Async Replication
                 │
                 ▼
DR Site (AWS):
┌─────────────────────────────────┐
│        Standby (Passive)        │
│  ┌───────────────────────────┐  │
│  │    S3 + EBS (DR Copy)     │  │
│  └───────────────────────────┘  │
└─────────────────────────────────┘
Implementation
class DisasterRecovery:
    def __init__(self, fc_client, s3_client, ec2_client):
        self.fc = fc_client
        self.s3 = s3_client
        self.ec2 = ec2_client  # needed for failover_to_aws()

    def setup_dr_replication(self, fc_volume, s3_bucket):
        # Create the snapshot schedule
        snapshot_schedule = {
            'frequency': 'hourly',
            'retention': {
                'hourly': 24,   # Keep 24 hourly
                'daily': 7,     # Keep 7 daily
                'weekly': 4,    # Keep 4 weekly
                'monthly': 12   # Keep 12 monthly
            }
        }

        # Set up automated snapshots, replicating each one to S3
        self.fc.create_snapshot_schedule(
            volume_id=fc_volume,
            schedule=snapshot_schedule,
            callback=lambda snapshot: self.replicate_snapshot_to_s3(
                snapshot, s3_bucket
            )
        )

    def replicate_snapshot_to_s3(self, snapshot, s3_bucket):
        # Export the snapshot to block format
        snapshot_data = self.fc.export_snapshot(snapshot['id'])

        # Compress (helper not shown)
        compressed = compress_snapshot(snapshot_data)

        # Upload to S3 with metadata
        self.s3.put_object(
            Bucket=s3_bucket,
            Key=f'snapshots/{snapshot["timestamp"]}.snap.gz',
            Body=compressed,
            Metadata={
                'source_volume': snapshot['volume_id'],
                'timestamp': str(snapshot['timestamp']),
                'size_bytes': str(snapshot['size'])
            },
            StorageClass='STANDARD_IA'  # Infrequent Access (cheaper)
        )

    def failover_to_aws(self, s3_bucket):
        # List available snapshots
        snapshots = self.s3.list_objects_v2(
            Bucket=s3_bucket,
            Prefix='snapshots/'
        )

        # Get the latest snapshot
        latest = sorted(
            snapshots['Contents'],
            key=lambda x: x['LastModified']
        )[-1]

        # Download it
        snapshot_data = self.s3.get_object(
            Bucket=s3_bucket,
            Key=latest['Key']
        )

        # Decompress (helper not shown)
        decompressed = decompress_snapshot(snapshot_data['Body'].read())

        # Create an EBS volume from the imported snapshot
        volume = self.ec2.create_volume(
            Size=calculate_size_gb(decompressed),
            AvailabilityZone='us-east-1a',
            SnapshotId=self.import_snapshot_to_ebs(decompressed)
        )

        # Launch DR instances
        self.launch_dr_instances(volume['VolumeId'])

        # Update DNS to point to the DR site
        self.update_dns_to_dr()
        return volume
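For completeness, here's how I'd wire the DR class up in practice: replication is configured once, and failover is only ever invoked when the primary site is declared down. The client handles, bucket, and volume names below are placeholders, not real resources.

import boto3

def setup_dr_example(fc_client):
    """Illustrative DR wiring; `fc_client` is an assumed FC-Redirect client handle."""
    dr = DisasterRecovery(
        fc_client=fc_client,
        s3_client=boto3.client('s3'),
        ec2_client=boto3.client('ec2')
    )

    # Day 0: start hourly snapshot replication to the DR bucket
    dr.setup_dr_replication(fc_volume='prod-vol-01', s3_bucket='example-dr-snapshots')

    # Only when the primary site is declared down:
    # dr.failover_to_aws(s3_bucket='example-dr-snapshots')
    return dr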
Use Case 3: Tiered Storage
Automatically tier cold data to AWS S3:
Implementation
import time  # used to poll the Glacier restore status below

class StorageTiering:
    def __init__(self, fc_client, s3_client):
        self.fc = fc_client
        self.s3 = s3_client

    def analyze_access_patterns(self, fc_volume):
        # Get access statistics for the last 90 days
        stats = self.fc.get_volume_access_stats(
            volume_id=fc_volume,
            period_days=90
        )

        # Identify cold data (not accessed in 60 days)
        cold_data = []
        for block in stats['blocks']:
            if block['last_access_days'] > 60:
                cold_data.append(block)
        return cold_data

    def tier_to_s3(self, fc_volume, s3_bucket):
        cold_data = self.analyze_access_patterns(fc_volume)

        # Move cold blocks to S3
        for block in cold_data:
            # Read the block from FC
            data = self.fc.read_block(fc_volume, block['id'])

            # Upload to S3 Glacier (cheapest tier)
            self.s3.put_object(
                Bucket=s3_bucket,
                Key=f'cold/{fc_volume}/{block["id"]}',
                Body=data,
                StorageClass='GLACIER'
            )

            # Replace the block with a stub on FC
            self.fc.replace_with_stub(
                volume_id=fc_volume,
                block_id=block['id'],
                stub_metadata={
                    's3_bucket': s3_bucket,
                    's3_key': f'cold/{fc_volume}/{block["id"]}'
                }
            )

    def recall_from_s3(self, fc_volume, block_id):
        # Get the stub metadata
        stub = self.fc.get_stub_metadata(fc_volume, block_id)

        # Initiate a Glacier restore (standard retrievals take hours;
        # Expedited completes in minutes but costs more)
        self.s3.restore_object(
            Bucket=stub['s3_bucket'],
            Key=stub['s3_key'],
            RestoreRequest={
                'Days': 1,
                'GlacierJobParameters': {
                    'Tier': 'Expedited'  # typically 1-5 minutes
                }
            }
        )

        # Poll until the restored copy is available
        while True:
            response = self.s3.head_object(
                Bucket=stub['s3_bucket'],
                Key=stub['s3_key']
            )
            if 'ongoing-request="false"' in response.get('Restore', ''):
                break
            time.sleep(60)

        # Download from S3
        obj = self.s3.get_object(
            Bucket=stub['s3_bucket'],
            Key=stub['s3_key']
        )

        # Write the block back to the FC volume and remove the stub
        self.fc.write_block(fc_volume, block_id, obj['Body'].read())
        self.fc.remove_stub(fc_volume, block_id)
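A quick sketch of how the tiering class might be driven: demotion runs as a scheduled job, recall happens when a read lands on a stub. Again, the FC client handle, volume, bucket, and block names are illustrative placeholders.

import boto3

def nightly_tiering_job(fc_client):
    """Illustrative scheduled job; `fc_client` is an assumed FC-Redirect client handle."""
    tiering = StorageTiering(fc_client=fc_client, s3_client=boto3.client('s3'))
    # Demote blocks that have not been read in 60+ days
    tiering.tier_to_s3(fc_volume='prod-vol-01', s3_bucket='example-cold-tier')
    return tiering

# On the read path, a stub hit would trigger
# StorageTiering.recall_from_s3('prod-vol-01', 'blk-000123'),
# which blocks until the Glacier restore completes.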
Performance Considerations
Hybrid storage requires careful performance management:
class HybridPerformanceManager:
    def __init__(self):
        self.latency_targets = {
            'fc': 1,            # 1ms
            'ebs': 10,          # 10ms
            's3': 100,          # 100ms
            'glacier': 3600000  # Hours
        }

    def route_io_request(self, request):
        # Latency-sensitive: keep on FC
        if request['latency_requirement_ms'] < 5:
            return 'fc'
        # Moderate latency: EBS acceptable
        elif request['latency_requirement_ms'] < 50:
            # Check whether the data is on EBS already
            if self.is_on_ebs(request['block_id']):
                return 'ebs'
            else:
                return 'fc'  # Don't migrate for one access
        # High latency tolerable: S3 acceptable
        else:
            return 's3'

    def optimize_data_placement(self, workload_stats):
        # Frequently accessed: keep on FC
        hot_threshold = 100  # accesses per day
        # Occasionally accessed: tier to EBS
        warm_threshold = 10
        # Rarely accessed: tier to S3/Glacier
        cold_threshold = 1

        for block in workload_stats['blocks']:
            accesses_per_day = block['accesses'] / workload_stats['period_days']

            if accesses_per_day >= hot_threshold:
                if block['current_tier'] != 'fc':
                    self.migrate_to_fc(block)
            elif accesses_per_day >= warm_threshold:
                if block['current_tier'] not in ['fc', 'ebs']:
                    self.migrate_to_ebs(block)
            elif accesses_per_day >= cold_threshold:
                if block['current_tier'] not in ['fc', 'ebs', 's3']:
                    self.migrate_to_s3(block)
            else:
                if block['current_tier'] != 'glacier':
                    self.migrate_to_glacier(block)
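Here's a tiny usage example of the router above. Since is_on_ebs() isn't defined in the sketch, I stub it out for illustration:

manager = HybridPerformanceManager()
manager.is_on_ebs = lambda block_id: False  # stub: pretend nothing is on EBS yet

print(manager.route_io_request({'latency_requirement_ms': 2,   'block_id': 'b1'}))  # -> 'fc'
print(manager.route_io_request({'latency_requirement_ms': 20,  'block_id': 'b2'}))  # -> 'fc' (not on EBS yet)
print(manager.route_io_request({'latency_requirement_ms': 500, 'block_id': 'b3'}))  # -> 's3'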
Cost Optimization
Hybrid storage enables cost optimization:
class HybridCostOptimizer:
    def __init__(self):
        self.costs = {
            'fc': {
                'storage_per_gb_month': 0.50,   # Expensive
                'iops': 0.0001
            },
            'ebs': {
                'storage_per_gb_month': 0.10,   # Medium
                'iops': 0.00002
            },
            's3': {
                'storage_per_gb_month': 0.023,  # Cheap
                'requests_per_1000': 0.005
            },
            'glacier': {
                'storage_per_gb_month': 0.004,  # Cheapest
                'retrieval_per_gb': 0.09
            }
        }

    def calculate_optimal_placement(self, block_stats):
        # Calculate the monthly cost for each tier
        costs = {}
        for tier in ['fc', 'ebs', 's3', 'glacier']:
            storage_cost = (
                block_stats['size_gb'] *
                self.costs[tier]['storage_per_gb_month']
            )

            if tier in ['fc', 'ebs']:
                access_cost = (
                    block_stats['accesses_per_month'] *
                    self.costs[tier]['iops']
                )
            elif tier == 's3':
                access_cost = (
                    block_stats['accesses_per_month'] / 1000 *
                    self.costs[tier]['requests_per_1000']
                )
            else:  # glacier
                access_cost = (
                    block_stats['accesses_per_month'] *
                    block_stats['size_gb'] *
                    self.costs[tier]['retrieval_per_gb']
                )

            costs[tier] = storage_cost + access_cost

        # Return the cheapest tier
        return min(costs, key=costs.get)
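To make this concrete, here's a worked example using the (illustrative) prices above: a 100 GB block read 200 times a month costs about $0.40/month to store in Glacier, but the retrievals add roughly $1,800, so S3 at about $2.30/month wins.

optimizer = HybridCostOptimizer()

# Illustrative workload: a 100 GB block read 200 times per month
block_stats = {'size_gb': 100, 'accesses_per_month': 200}

print(optimizer.calculate_optimal_placement(block_stats))
# -> 's3': Glacier storage is ~$0.40/month, but 200 retrievals of 100 GB
#    at $0.09/GB add ~$1,800, while S3 totals ~$2.30/month.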
Lessons Learned
Building hybrid storage taught me:
- Latency matters: Cloud storage is 10-100x slower than FC. Design for this.
- Bandwidth is limited: VPN/Direct Connect bandwidth is finite. Don't saturate it.
- Cost optimization is complex: The cheapest storage tier isn't always the cheapest total cost.
- Data gravity is real: Moving data between environments is expensive. Minimize movement.
- Automation is essential: Manual hybrid management doesn't scale.
Looking Forward
Hybrid storage is the future for enterprises:
- Keep performance-critical data on-premise (FC)
- Tier warm data to cloud (EBS)
- Archive cold data to cheap cloud storage (S3/Glacier)
- Burst to cloud for temporary capacity
The challenge is making this seamless. FC-Redirect's hybrid capabilities enable transparent tiering and cloud bursting.
As AWS adds more storage options (FSx, Storage Gateway, etc.), hybrid architectures become more powerful.
The future isn't all-cloud or all-on-premise. It's hybrid, with workloads and data placed optimally across environments.
Hybrid storage: best of both worlds.