API gateways serve as the edge layer between external clients and internal microservices, consolidating cross-cutting concerns while enabling independent service evolution. The architectural decisions around gateway topology, routing patterns, and responsibility distribution fundamentally shape system complexity, performance characteristics, and operational flexibility.
The Gateway Problem Space
In distributed microservices architectures, clients face several challenges when interacting directly with services. Each service exposes different protocols, authentication mechanisms, and API contracts. Network chattiness increases as clients make multiple round-trips to compose responses. Security concerns multiply as each service must implement authentication, authorization, and rate limiting independently.
API gateways address these challenges by providing a unified entry point, but this consolidation creates new architectural decisions about responsibility placement, scaling patterns, and failure mode handling.
Gateway Topology Patterns
The physical deployment topology of gateways significantly impacts system behavior under various conditions.
Centralized Gateway
A single gateway instance (or horizontally scaled cluster) handles all ingress traffic, routing to backend services based on request characteristics.
# Centralized gateway topology
topology:
  gateway:
    deployment: cluster
    replicas: auto-scaled
    min: 3
    max: 50
    metrics:
      - type: cpu
        target: 70
      - type: requests-per-second
        target: 10000
  routing:
    - path: /api/users/*
      backend: user-service.internal:8080
      timeout: 5s
      retry: 3
    - path: /api/orders/*
      backend: order-service.internal:8080
      timeout: 10s
      retry: 2
Architectural implications: Centralized gateways simplify network topology and consolidate observability, but they create a single failure domain. All traffic flows through the gateway, making it a critical-path component; an outage or performance regression at the gateway affects every service behind it.
The centralized pattern works well for moderate-scale systems where operational simplicity outweighs concerns about a single point of failure. Load balancing across gateway instances provides horizontal scaling, but resource contention (CPU, memory, network bandwidth) eventually limits throughput.
Gateway per Service Team
Deploy separate gateway instances per service team or bounded context, allowing teams independent control over their edge layer.
# Team-based gateway topology
gateways:
  - name: user-gateway
    team: identity
    paths: [/api/users/*, /api/auth/*]
    backends: [user-service, auth-service]
    policies:
      rate_limit: 1000/min per client
      authentication: oauth2
  - name: commerce-gateway
    team: commerce
    paths: [/api/orders/*, /api/payments/*]
    backends: [order-service, payment-service]
    policies:
      rate_limit: 500/min per client
      authentication: api_key
Trade-offs: This pattern provides team autonomy and failure isolation: problems in one gateway don't cascade to others. However, it introduces operational complexity. Multiple gateway deployments require coordination on cross-cutting concerns like authentication token validation, observability standards, and client SDK patterns.
Backend for Frontend (BFF)
Create specialized gateways per client type (web, mobile, partner APIs), each optimized for specific client needs.
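A minimal topology sketch in the style of the earlier examples; the gateway names, backends, and the response_shape/contract fields are illustrative assumptions rather than any specific gateway's schema:
# BFF topology sketch (names and fields are illustrative)
gateways:
  - name: web-bff
    client: web
    backends: [user-service, order-service, content-service]
    response_shape: denormalized    # fewer round-trips for browser pages
  - name: mobile-bff
    client: mobile
    backends: [user-service, order-service]
    response_shape: minimal         # smaller payloads over mobile networks
  - name: partner-gateway
    client: partner
    backends: [order-service]
    contract: versioned             # stable external contract while internals evolve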
Architectural characteristics: BFFs allow optimization per client type. Mobile clients receive minimal payloads to reduce data transfer. Web clients get denormalized responses to minimize round-trips. Partner APIs expose stable contracts while internal services evolve freely.
The cost is additional gateway deployments and potential code duplication across BFFs. Teams must balance optimization benefits against operational overhead of maintaining multiple gateways.
Routing Strategies
Gateways route requests to backend services using various strategies, each with distinct behavioral characteristics.
Path-Based Routing
Route based on URL path patterns, the most common approach.
# Path-based routing configuration
routing:
  rules:
    - match:
        path:
          prefix: /api/v1/users
      route:
        destination: user-service-v1
        rewrite: /users
    - match:
        path:
          prefix: /api/v2/users
      route:
        destination: user-service-v2
        rewrite: /users
Trade-offs: Path-based routing is simple and predictable; clients can see how URL structure maps to services. However, this approach couples API structure to service decomposition. Refactoring services requires client-visible URL changes unless careful path rewriting maintains backward compatibility.
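As a sketch of that kind of rewriting, assuming a hypothetical split in which user profiles move to a new account-service while the old public path stays live:
# Backward-compatible rewrite sketch (account-service and both routes are hypothetical)
routing:
  rules:
    - match:
        path:
          prefix: /api/v1/accounts     # new public path
      route:
        destination: account-service
        rewrite: /accounts
    - match:
        path:
          prefix: /api/v1/users        # legacy path kept for existing clients
      route:
        destination: account-service
        rewrite: /accounts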
Header-Based Routing
Route based on request headers, enabling feature flags, A/B testing, and canary deployments.
# Header-based routing for experimentation
routing:
  rules:
    - match:
        headers:
          X-Feature-Flag: new-checkout
      route:
        destination: checkout-service-v2
        weight: 100
    - match:
        headers:
          X-Client-Version: ^1\..*
      route:
        destination: legacy-api-gateway
    - default:
        destination: checkout-service-v1
Architectural implications: Header-based routing decouples deployment from user-visible changes. Teams deploy new service versions, gradually shifting traffic using headers. This enables sophisticated release strategies: canary releases, blue-green deployments, per-user feature flags.
The complexity emerges in state management. If routing changes mid-session, users might see inconsistent behavior. Sessions must be sticky to specific service versions, or services must handle version transitions gracefully.
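One common mitigation is to pin traffic splitting to a stable identifier so a given user keeps hitting the same version for the duration of an experiment. The split and hash_by fields below are assumptions about gateway capabilities, not any specific product's syntax:
# Sticky traffic-splitting sketch (field names are illustrative)
routing:
  rules:
    - match:
        path:
          prefix: /api/checkout
      route:
        split:
          - destination: checkout-service-v2
            weight: 10
          - destination: checkout-service-v1
            weight: 90
        hash_by: header:X-User-ID   # same user, same version across requests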
Authentication and Authorization
Gateways typically handle authentication, but authorization placement remains a nuanced architectural decision.
Gateway Authentication with Backend Authorization
Gateway validates identity; services make authorization decisions.
# Authentication at gateway
authentication:
  provider: oauth2
  token_validation:
    jwks_uri: https://auth.example.com/.well-known/jwks.json
    cache_duration: 5m
    algorithms: [RS256]
  claims_to_headers:
    - claim: sub
      header: X-User-ID
    - claim: email
      header: X-User-Email
    - claim: tenant_id
      header: X-Tenant-ID
Architectural rationale: Authentication is cross-cutting; every request needs identity validation. Centralizing at the gateway prevents duplication across services. Authorization, however, is often domain-specific. The order service understands order access rules; the user service knows profile visibility policies.
This separation allows services to evolve authorization logic independently while maintaining centralized authentication. Services receive authenticated user context via headers and make contextual authorization decisions.
Policy-Based Authorization at Gateway
Define authorization policies declaratively at gateway layer.
# OPA-based authorization at gateway
authorization:
  engine: open-policy-agent
  policies:
    - name: require-admin-for-delete
      rule: |
        allow {
          input.method == "DELETE"
          input.user.roles[_] == "admin"
        }
    - name: user-access-own-data
      rule: |
        # Assumes the gateway supplies the request path and extracted path params in input
        allow {
          glob.match("/api/users/*", ["/"], input.path)
          input.user.id == input.params.user_id
        }
Trade-offs: Centralized authorization simplifies security auditing and enables consistent policy enforcement. However, it requires encoding domain logic in gateway policies. As policies grow complex, the gateway becomes coupled to business rules better handled in services.
This pattern works well for coarse-grained authorization: role-based access, tenant isolation, API quota enforcement. Fine-grained authorization (can this user edit this specific order?) belongs in services with domain context.
Rate Limiting Strategies
Rate limiting protects backend services from overload and ensures fair resource allocation across clients.
Client-Based Rate Limiting
Limit requests per client identifier: API key, user ID, or IP address.
# Client-based rate limiting
rate_limits:
  - identifier: api_key
    limits:
      - period: 1m
        requests: 1000
      - period: 1h
        requests: 50000
  - identifier: user_id
    limits:
      - period: 1m
        requests: 100
      - period: 1h
        requests: 5000
Architectural considerations: Client-based limiting requires persistent storage to track request counts across gateway instances. Redis commonly serves this purpose, providing atomic increment operations and expiration for time windows.
The gateway becomes stateful through its dependency on Redis. Gateway instances share state, allowing horizontal scaling, but Redis becomes a critical dependency: its unavailability forces a choice between failing open (no rate limiting) and failing closed (rejecting all requests).
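A sketch of how that dependency and failure-mode choice might be declared; the storage block and failure_mode field are illustrative assumptions, not a particular gateway's schema:
# Rate-limit counter storage sketch (fields are illustrative)
rate_limit_storage:
  backend: redis
  address: redis-ratelimit.internal:6379
  key_ttl: 1h                  # counters expire with the longest window
  failure_mode: fail_open      # admit traffic if Redis is unreachable; fail_closed would reject instead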
Endpoint-Based Rate Limiting
Limit requests per API endpoint to protect specific services.
# Endpoint-based rate limiting
rate_limits:
  - endpoint: /api/search
    backend_capacity: 1000/s
    limits:
      - period: 1s
        requests: 800  # 80% of capacity
    queue_excess: true
  - endpoint: /api/reports/generate
    backend_capacity: 10/s
    limits:
      - period: 1s
        requests: 8
    reject_excess: true
Trade-offs: Endpoint limiting protects expensive operations. Report generation, search queries, and data exports often consume disproportionate resources. Limiting these endpoints prevents resource exhaustion from affecting other operations.
The challenge is setting appropriate limits. Backend capacity varies with load, data characteristics, and infrastructure health. Static limits might be too conservative during low load or too permissive during degradation.
Response Aggregation
Gateways can aggregate multiple backend calls into single client responses, reducing network round-trips.
REST Aggregation Endpoints
Create specialized endpoints that aggregate backend calls.
# REST aggregation endpoint
endpoints:
  - path: /api/dashboard
    method: GET
    aggregation:
      parallel:
        - name: user_profile
          service: user-service
          path: /users/{user_id}
          timeout: 1s
        - name: recent_orders
          service: order-service
          path: /orders?user_id={user_id}&limit=10
          timeout: 2s
        - name: recommendations
          service: recommendation-service
          path: /recommendations/{user_id}
          timeout: 3s
          optional: true
Trade-offs: REST aggregation reduces client round-trips but makes the gateway responsible for composition logic. The gateway must handle partial failures: what if recommendations time out but the other calls succeed?
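One way to make that policy explicit, extending the dashboard example above; the on_error, fallbacks, and response_metadata fields are assumptions about the aggregation layer rather than a standard schema:
# Partial-failure policy sketch for the /api/dashboard aggregation (fields are illustrative)
aggregation:
  on_error:
    user_profile: fail_request       # core data: fail the whole response
    recent_orders: fail_request
    recommendations: fallback        # optional call: degrade gracefully
  fallbacks:
    recommendations: '{"recommendations": []}'
  response_metadata:
    include_partial_flag: true       # tell clients which sections are missing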
The pattern works well for specific high-traffic use cases where performance justifies maintenance cost. Creating aggregation endpoints for every possible data combination becomes unsustainable.
Caching Strategies
Gateways cache responses to reduce backend load and improve latency.
Response Caching
Cache complete responses based on request characteristics.
# Response caching configuration
caching:
  backend: redis-cluster
  policies:
    - paths: [/api/products/*, /api/categories/*]
      ttl: 5m
      cache_key: path + query_params
      invalidate_on:
        - header: Cache-Control
          value: no-cache
    - paths: [/api/users/*/profile]
      ttl: 1m
      cache_key: path + X-User-ID header
      vary_by: [Accept-Language]
Architectural considerations: Caching introduces consistency challenges. Cached responses become stale as backend state changes. Cache invalidation strategies (time-based expiration, event-driven purging) determine consistency guarantees.
Time-based caching accepts bounded staleness. Product catalogs cached for 5 minutes might show outdated inventory, but the performance benefit justifies eventual consistency.
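Event-driven purging can tighten that bound where staleness matters. This sketch assumes the catalog service publishes change events to a broker that the gateway (or a small purge worker) consumes; the broker address, topic, and field names are hypothetical:
# Event-driven cache invalidation sketch (topic and fields are hypothetical)
cache_invalidation:
  source:
    broker: kafka.internal:9092
    topic: catalog.product-updated
  actions:
    - match_event: product-updated
      purge_paths:
        - /api/products/{product_id}
        - /api/categories/{category_id}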
Observability Integration
Gateways provide natural observability points for request/response patterns.
Distributed Tracing
Initiate traces at gateway, propagating context to backend services.
# Tracing configuration
tracing:
  provider: opentelemetry
  sampling_rate: 0.1  # 10% of requests
  propagation:
    inject_headers:
      - traceparent
      - tracestate
  attributes:
    - http.method
    - http.url
    - http.status_code
    - user.id
    - tenant.id
  export:
    endpoint: otel-collector:4317
    batch_size: 512
Architectural implications: Gateways start the distributed trace, creating parent spans for each request. Backend services create child spans, building complete request trees.
This provides end-to-end visibility into request flow: time spent in the gateway, network latency to services, service processing time, database queries. Performance bottlenecks become visible through trace analysis.
Failure Handling Patterns
Gateways must handle backend failures gracefully, preventing cascading failures.
Circuit Breaker
Prevent calling unhealthy backends, failing fast instead of waiting for timeouts.
# Circuit breaker configuration
circuit_breaker:
  - backend: recommendation-service
    failure_threshold: 50%
    evaluation_window: 10s
    open_duration: 30s
    half_open_requests: 3
    fallback:
      type: static_response
      status: 200
      body: '{"recommendations": []}'
Trade-offs: Circuit breakers prevent resource exhaustion from repeatedly calling failing services. Instead of waiting seconds for timeouts, gateways fail fast when circuits open.
The challenge is tuning thresholds. Too sensitive causes false positives during transient errors. Too lenient allows prolonged degradation before opening.
Retry Logic
Retry failed requests with exponential backoff.
# Retry configuration
retries:
  - backends: [user-service, order-service]
    max_attempts: 3
    backoff: exponential
    initial_delay: 100ms
    max_delay: 2s
    retry_on:
      - connection_error
      - timeout
      - http_5xx
  - backends: [payment-service]
    max_attempts: 1  # Don't retry non-idempotent operations
Architectural considerations: Retries improve reliability for transient failures but can amplify load during outages. If a backend is overloaded, retries worsen the problem.
Idempotency is critical. Retrying payment operations without idempotency keys causes duplicate charges. Services must design for retries or gateways must avoid retrying non-idempotent operations.
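One way to encode that rule is to retry write operations only when the client supplies an idempotency key; the conditions block and require_header field below are illustrative assumptions, not a standard gateway option:
# Idempotency-aware retry sketch (conditions/require_header are illustrative fields)
retries:
  - backends: [payment-service]
    max_attempts: 3
    conditions:
      methods: [POST]
      require_header: Idempotency-Key   # retry only requests the service can deduplicate
    retry_on:
      - connection_error
      - timeout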
Conclusion
API gateway architecture involves continuous trade-offs between centralization and distribution, performance and flexibility, simplicity and sophistication. Successful gateway strategies start minimal, with basic routing and authentication, then evolve toward aggregation, caching, and advanced traffic management as needs emerge.
The most resilient gateway architectures treat the gateway as infrastructure, not application logic. Complex business rules belong in services. Gateways handle cross-cutting concerns (authentication, rate limiting, observability) while delegating domain decisions to services with appropriate context.
Organizations building gateway layers benefit from platform thinking. Providing teams shared gateway frameworks with sensible defaults reduces inconsistency while allowing customization for specific needs. This balance between standardization and flexibility determines long-term maintainability of the edge layer.