Microservices are gaining traction in the industry, and I've been thinking about how these patterns apply to storage networking infrastructure. While FC-Redirect isn't a traditional web application, many microservices principles translate surprisingly well to our domain.
What Are Microservices?
The microservices approach advocates building systems as collections of small, independent services that:
- Do one thing well
- Communicate via well-defined APIs
- Can be deployed independently
- Are loosely coupled
- Own their own data
Contrast this with our traditional monolithic architecture where FC-Redirect was a single large process handling all concerns.
Decomposing FC-Redirect
I've been exploring how to decompose FC-Redirect into services:
Service Boundaries
Traditional Monolith:
┌─────────────────────────────────┐
│       FC-Redirect Process       │
│  ┌───────────────────────────┐  │
│  │ Flow Management           │  │
│  │ Policy Engine             │  │
│  │ Statistics Collection     │  │
│  │ Monitoring                │  │
│  │ Configuration Management  │  │
│  │ Storage Layer             │  │
│  └───────────────────────────┘  │
└─────────────────────────────────┘
Microservices Decomposition:
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│ Flow Manager │  │Policy Engine │  │Stats Service │
│   Service    │  │   Service    │  │              │
└──────┬───────┘  └──────┬───────┘  └──────┬───────┘
       │                 │                 │
       └─────────────────┼─────────────────┘
                         │
              ┌──────────▼──────────┐
              │  Message Bus (IPC)  │
              └──────────┬──────────┘
                         │
       ┌─────────────────┼─────────────────┐
       │                 │                 │
┌──────▼────────┐ ┌──────▼────────┐ ┌──────▼────────┐
│Config Service │ │Monitor Service│ │Storage Service│
└───────────────┘ └───────────────┘ └───────────────┘
Service Responsibilities
Flow Manager Service:
// Flow management API
typedef struct flow_manager_api {
flow_id_t (*create_flow)(flow_spec_t *spec);
bool (*delete_flow)(flow_id_t id);
flow_info_t (*get_flow)(flow_id_t id);
flow_list_t (*list_flows)(filter_t *filter);
} flow_manager_api_t;
Policy Engine Service:
// Policy evaluation API
typedef struct policy_engine_api {
policy_decision_t (*evaluate)(flow_context_t *ctx);
bool (*update_policy)(policy_id_t id, policy_t *policy);
policy_list_t (*list_policies)();
} policy_engine_api_t;
Statistics Service:
// Statistics collection API
typedef struct stats_service_api {
void (*record_packet)(flow_id_t flow, packet_info_t *info);
flow_stats_t (*get_flow_stats)(flow_id_t flow);
aggregate_stats_t (*get_aggregate_stats)(time_range_t *range);
} stats_service_api_t;
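To make these API tables concrete, here is a minimal sketch of how a caller might use the Flow Manager interface once it has a pointer to the table. The lookup_flow_manager_api() helper and redirect_new_flow() wrapper are illustrative names I'm assuming for this example, not part of the actual code.
// Hypothetical example of calling through the flow_manager_api_t table.
// lookup_flow_manager_api() is an assumed helper that resolves the
// service's API table (e.g. via the registry described later).
void redirect_new_flow(flow_spec_t *spec) {
    flow_manager_api_t *fm = lookup_flow_manager_api();
    if (fm == NULL) {
        return;  // Flow Manager service not available
    }
    // Create the flow, then read back its metadata
    flow_id_t id = fm->create_flow(spec);
    flow_info_t info = fm->get_flow(id);
    (void)info;  // Hand off to the data path as needed
}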
Inter-Service Communication
Services need efficient, reliable communication:
Shared Memory IPC
For low-latency local communication:
typedef struct ipc_channel {
// Shared memory ring buffer
void *shared_mem;
size_t size;
// Synchronization
atomic_uint64_t read_index;
atomic_uint64_t write_index;
// Service metadata
service_id_t producer;
service_id_t consumer;
} ipc_channel_t;
// Create channel between services
// (CHANNEL_SLOTS is the ring capacity, counted in messages)
ipc_channel_t* create_ipc_channel(service_id_t producer,
                                  service_id_t consumer) {
    ipc_channel_t *channel = malloc(sizeof(ipc_channel_t));
    if (channel == NULL) {
        return NULL;
    }
    // Create shared memory segment sized to hold the whole ring
    channel->shared_mem = create_shm("/fc_redirect_ipc",
                                     CHANNEL_SLOTS * sizeof(message_t));
    channel->size = CHANNEL_SLOTS * sizeof(message_t);
    channel->producer = producer;
    channel->consumer = consumer;
    atomic_store(&channel->read_index, 0);
    atomic_store(&channel->write_index, 0);
    return channel;
}
// Send message (single-producer ring: only the producer advances write_index)
bool ipc_send(ipc_channel_t *channel, message_t *msg) {
    uint64_t write_idx = atomic_load(&channel->write_index);
    uint64_t read_idx = atomic_load(&channel->read_index);
    // Full when the producer is a whole ring ahead of the consumer
    if (write_idx - read_idx >= CHANNEL_SLOTS) {
        return false;
    }
    // Copy the message into its slot
    size_t offset = (write_idx % CHANNEL_SLOTS) * sizeof(message_t);
    memcpy((char *)channel->shared_mem + offset, msg, sizeof(message_t));
    // Publish the new write index so the consumer sees the message
    atomic_store(&channel->write_index, write_idx + 1);
    return true;
}
Shared memory IPC gives us sub-microsecond latency between services.
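For completeness, here is a minimal sketch of the consumer side of the ring, under the same single-producer/single-consumer assumption and the same CHANNEL_SLOTS capacity used in ipc_send(); an ipc_recv() with this exact shape isn't shown above and is illustrative.
// Receive message (single consumer: only the consumer advances read_index)
bool ipc_recv(ipc_channel_t *channel, message_t *msg_out) {
    uint64_t read_idx = atomic_load(&channel->read_index);
    uint64_t write_idx = atomic_load(&channel->write_index);
    // Empty when the consumer has caught up with the producer
    if (read_idx == write_idx) {
        return false;
    }
    // Copy the message out of its slot
    size_t offset = (read_idx % CHANNEL_SLOTS) * sizeof(message_t);
    memcpy(msg_out, (char *)channel->shared_mem + offset, sizeof(message_t));
    // Release the slot back to the producer
    atomic_store(&channel->read_index, read_idx + 1);
    return true;
}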
Service Discovery
Services need to find each other:
typedef struct service_registry {
struct {
service_id_t id;
char name[64];
char endpoint[256];
service_status_t status;
timestamp_t last_heartbeat;
} services[MAX_SERVICES];
int num_services;
pthread_rwlock_t lock;
} service_registry_t;
// Register service
bool register_service(service_registry_t *registry,
                      const char *name,
                      const char *endpoint) {
    pthread_rwlock_wrlock(&registry->lock);
    if (registry->num_services >= MAX_SERVICES) {
        pthread_rwlock_unlock(&registry->lock);
        return false;
    }
    int idx = registry->num_services++;
    registry->services[idx].id = generate_service_id();
    // Copy with explicit termination (strncpy does not guarantee it)
    strncpy(registry->services[idx].name, name,
            sizeof(registry->services[idx].name) - 1);
    registry->services[idx].name[sizeof(registry->services[idx].name) - 1] = '\0';
    strncpy(registry->services[idx].endpoint, endpoint,
            sizeof(registry->services[idx].endpoint) - 1);
    registry->services[idx].endpoint[sizeof(registry->services[idx].endpoint) - 1] = '\0';
    registry->services[idx].status = SERVICE_RUNNING;
    registry->services[idx].last_heartbeat = time_ms();
    pthread_rwlock_unlock(&registry->lock);
    return true;
}
// Discover service
char* discover_service(service_registry_t *registry,
                       const char *name) {
    pthread_rwlock_rdlock(&registry->lock);
    for (int i = 0; i < registry->num_services; i++) {
        if (strcmp(registry->services[i].name, name) == 0 &&
            registry->services[i].status == SERVICE_RUNNING) {
            char *endpoint = strdup(registry->services[i].endpoint);
            pthread_rwlock_unlock(&registry->lock);
            return endpoint;
        }
    }
    pthread_rwlock_unlock(&registry->lock);
    return NULL;
}
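Each entry carries a last_heartbeat timestamp, so registered services have to refresh it periodically. A rough sketch of that refresh, reusing the registry layout above; heartbeat_service() is an illustrative name, not existing code.
// Refresh a service's heartbeat timestamp in the registry
bool heartbeat_service(service_registry_t *registry, service_id_t id) {
    pthread_rwlock_wrlock(&registry->lock);
    for (int i = 0; i < registry->num_services; i++) {
        if (registry->services[i].id == id) {
            registry->services[i].last_heartbeat = time_ms();
            registry->services[i].status = SERVICE_RUNNING;
            pthread_rwlock_unlock(&registry->lock);
            return true;
        }
    }
    pthread_rwlock_unlock(&registry->lock);
    return false;
}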
Health Checking
Services must monitor each other:
typedef struct health_check {
service_id_t target;
health_check_fn check_fn;
uint32_t interval_ms;
uint32_t timeout_ms;
uint32_t failure_threshold;
} health_check_t;
void* health_check_thread(void *arg) {
health_check_t *hc = (health_check_t*)arg;
uint32_t consecutive_failures = 0;
while (running) {
bool healthy = hc->check_fn(hc->target, hc->timeout_ms);
if (healthy) {
consecutive_failures = 0;
mark_service_healthy(hc->target);
} else {
consecutive_failures++;
if (consecutive_failures >= hc->failure_threshold) {
mark_service_unhealthy(hc->target);
trigger_failover(hc->target);
}
}
sleep_ms(hc->interval_ms);
}
return NULL;
}
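The check_fn itself can be as simple as looking at the registry's heartbeat timestamps. Here is a sketch of one such check; find_service_entry() and HEARTBEAT_MAX_AGE_MS are assumptions made for this example, not existing code.
// Illustrative health_check_fn: a service is healthy if its registry
// heartbeat is recent enough. find_service_entry() and
// HEARTBEAT_MAX_AGE_MS are assumed helpers for this sketch.
bool heartbeat_check(service_id_t target, uint32_t timeout_ms) {
    (void)timeout_ms;  // Unused for a registry-based check
    service_entry_t *entry = find_service_entry(target);
    if (entry == NULL) {
        return false;
    }
    return (time_ms() - entry->last_heartbeat) < HEARTBEAT_MAX_AGE_MS;
}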
Benefits We've Seen
Independent Scaling
Different services have different resource needs:
- Flow Manager: CPU-bound, scales horizontally
- Policy Engine: CPU-bound, but mostly read-only (caching helps)
- Stats Service: Write-heavy, benefits from batching
- Storage Service: I/O-bound, needs SSD storage
With microservices, we can scale each independently:
# Scale flow managers horizontally
start_service flow_manager --instance 1 --cores 0,1
start_service flow_manager --instance 2 --cores 2,3
start_service flow_manager --instance 3 --cores 4,5
# Stats service on different hardware
start_service stats_service --storage ssd --batch-size 1000
# Single policy engine (mostly read-only, cached)
start_service policy_engine --cache-size 10000
Independent Deployment
We can update services independently:
# Update stats service without affecting flow processing
stop_service stats_service
deploy_service stats_service --version 2.1.0
start_service stats_service
# Flow processing continues uninterrupted
# (stats buffered during service restart)
Failure Isolation
Service failures don't cascade:
// If stats service fails, flow processing continues
void process_packet_resilient(packet_t *pkt) {
// Critical path: flow management
flow_entry_t *flow = flow_manager_lookup(pkt->flow_key);
policy_decision_t decision = policy_engine_evaluate(flow);
apply_decision(pkt, decision);
// Non-critical: statistics (best-effort)
if (stats_service_available()) {
stats_service_record(pkt);
} else {
// Buffer for later or drop (stats are not critical)
buffer_stats_update(pkt);
}
forward_packet(pkt);
}
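The buffer_stats_update() call above is doing the real work of failure isolation. One way to sketch it is a small fixed-size buffer that gets drained when the stats service comes back; the names and types below (stats_update_t, make_stats_update(), flush_buffered_stats(), STATS_BUFFER_SIZE) are illustrative assumptions, not the production code.
#define STATS_BUFFER_SIZE 4096

// Deferred stats updates, held locally while the stats service is down
static stats_update_t stats_buffer[STATS_BUFFER_SIZE];
static uint32_t stats_buffered = 0;

void buffer_stats_update(packet_t *pkt) {
    if (stats_buffered < STATS_BUFFER_SIZE) {
        stats_buffer[stats_buffered++] = make_stats_update(pkt);
    }
    // On overflow we simply drop: stats are best-effort and must never
    // block the packet-forwarding path
}

// Called when the stats service becomes reachable again
void flush_buffered_stats(void) {
    for (uint32_t i = 0; i < stats_buffered; i++) {
        stats_service_record_update(&stats_buffer[i]);
    }
    stats_buffered = 0;
}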
Challenges and Solutions
Challenge 1: Latency Overhead
Service boundaries add latency:
Solution: Shared Memory and Batching
// Batch requests to reduce IPC overhead
typedef struct batch_request {
request_t requests[BATCH_SIZE];
uint32_t count;
} batch_request_t;
response_t* call_service_batched(service_id_t service,
                                 request_t *req) {
    static __thread batch_request_t batch = {0};
    // Remember this request's slot, then add it to the thread-local batch
    uint32_t my_index = batch.count;
    batch.requests[batch.count++] = *req;
    // Send when the batch is full or the batching window has expired
    if (batch.count >= BATCH_SIZE || batch_timeout_expired()) {
        response_t *responses = ipc_call_batch(service, &batch);
        batch.count = 0;
        return &responses[my_index];
    }
    // Otherwise block until a later call (or the timeout) flushes the batch
    return wait_for_batch_response(my_index);
}
Challenge 2: Data Consistency
Services have separate data:
Solution: Event Sourcing
typedef struct event {
event_type_t type;
uint64_t sequence_number;
timestamp_t timestamp;
char data[EVENT_DATA_SIZE];
} event_t;
// Append-only event log
typedef struct event_log {
int fd;
atomic_uint64_t sequence;
} event_log_t;
void append_event(event_log_t *log, event_type_t type,
                  const void *data, size_t size) {
    event_t event = {
        .type = type,
        .sequence_number = atomic_fetch_add(&log->sequence, 1),
        .timestamp = time_us()
    };
    // Copy the payload, truncating anything larger than the fixed event slot
    memcpy(event.data, data, size < EVENT_DATA_SIZE ? size : EVENT_DATA_SIZE);
    // Append to log (durable)
    write(log->fd, &event, sizeof(event));
    fsync(log->fd);
    // Publish to subscribers
    publish_event(&event);
}
// Services consume events to build their views
void stats_service_event_handler(event_t *event) {
switch (event->type) {
case EVENT_FLOW_CREATED:
create_stats_entry(event->data);
break;
case EVENT_PACKET_PROCESSED:
update_packet_stats(event->data);
break;
case EVENT_FLOW_DELETED:
delete_stats_entry(event->data);
break;
}
}
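A useful side effect of the append-only log is recovery: a service can rebuild its entire view after a restart by replaying the log from the beginning. A minimal sketch, assuming events are the fixed-size records defined above; replay_event_log() and the example path are illustrative.
#include <fcntl.h>   // open
#include <unistd.h>  // read, close

// Rebuild a service's view by replaying the event log from the start
void replay_event_log(const char *path,
                      void (*handler)(event_t *event)) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return;
    }
    event_t event;
    // Events are fixed-size records, so read them back to back
    while (read(fd, &event, sizeof(event)) == sizeof(event)) {
        handler(&event);
    }
    close(fd);
}

// Example: the stats service could rebuild its counters on startup with
// replay_event_log("/path/to/event.log", stats_service_event_handler);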
Challenge 3: Operational Complexity
More services = more complexity:
Solution: Service Orchestration
# docker-compose.yml style configuration
services:
  flow_manager:
    replicas: 3
    resources:
      cpus: 2
      memory: 4GB
    health_check:
      interval: 5s
      timeout: 2s
  policy_engine:
    replicas: 1
    resources:
      cpus: 1
      memory: 2GB
    depends_on:
      - config_service
  stats_service:
    replicas: 2
    resources:
      cpus: 1
      memory: 4GB
    storage: ssd
When Not to Use Microservices
Microservices aren't always appropriate:
Don't use microservices if:
- Latency is critical: Service boundaries add overhead
- Team is small: Operational complexity outweighs benefits
- System is simple: Microservices add unnecessary complexity
- No clear boundaries: Forced decomposition creates tight coupling
For FC-Redirect, a hybrid approach makes sense:
- Core packet processing: Monolith (latency-critical)
- Supporting services: Microservices (stats, monitoring, config)
┌─────────────────────────────────┐
│   Fast Path (Monolithic Core)   │
│  ┌───────────────────────────┐  │
│  │ Packet Processing         │  │
│  │ Flow Lookup               │  │
│  │ Policy Application        │  │
│  └───────────────────────────┘  │
└────────────────┬────────────────┘
                 │
      ┌──────────┼──────────┐
      │          │          │
┌─────▼────┐ ┌───▼─────┐ ┌──▼──────┐
│Stats Svc │ │Monitor  │ │Config   │
└──────────┘ └─────────┘ └─────────┘
     Microservices (Non-Critical)
Lessons Learned
After six months of experimenting with microservices:
- Start monolithic, decompose later: Don't start with microservices. Let service boundaries emerge.
- Service boundaries are hard: Bad boundaries create more problems than monoliths.
- IPC performance matters: Shared memory IPC is orders of magnitude faster than network RPC.
- Observability is critical: More services = more complexity. Invest in monitoring.
- Operational tooling is essential: Without good tooling, microservices become operational nightmares.
Looking Forward
Microservices patterns will increasingly influence infrastructure software. The benefits (independent scaling, deployment, failure isolation) are compelling for complex systems.
For FC-Redirect, we're adopting a hybrid approach: monolithic core for the critical path, microservices for supporting functionality. This balances performance with modularity.
The key is pragmatism. Use microservices where they provide clear value, but donβt force them everywhere. Sometimes a well-designed monolith is the right answer.
As our industry continues evolving toward containerized, distributed architectures, understanding microservices patterns becomes increasingly important. Not because we should use them everywhere, but because they represent valuable thinking about modularity, boundaries, and system design.
The future of infrastructure is distributed. Microservices give us patterns to manage that complexity effectively.