API gateways serve as the edge layer between external clients and internal microservices, consolidating cross-cutting concerns while enabling independent service evolution. The architectural decisions around gateway topology, routing patterns, and responsibility distribution fundamentally shape system complexity, performance characteristics, and operational flexibility.

The Gateway Problem Space

In distributed microservices architectures, clients face several challenges when interacting directly with services. Each service exposes different protocols, authentication mechanisms, and API contracts. Network chattiness increases as clients make multiple round-trips to compose responses. Security concerns multiply as each service must implement authentication, authorization, and rate limiting independently.

API gateways address these challenges by providing a unified entry point, but this consolidation creates new architectural decisions about responsibility placement, scaling patterns, and failure mode handling.

Gateway Topology Patterns

The physical deployment topology of gateways significantly impacts system behavior under various conditions.

Centralized Gateway

A single gateway instance (or horizontally scaled cluster) handles all ingress traffic, routing to backend services based on request characteristics.

# Centralized gateway topology
topology:
  gateway:
    deployment: cluster
    replicas: auto-scaled
    min: 3
    max: 50
    metrics:
      - type: cpu
        target: 70
      - type: requests-per-second
        target: 10000

  routing:
    - path: /api/users/*
      backend: user-service.internal:8080
      timeout: 5s
      retry: 3

    - path: /api/orders/*
      backend: order-service.internal:8080
      timeout: 10s
      retry: 2

Architectural implications: Centralized gateways simplify network topology and consolidate observability, but they create a single failure domain. All traffic flows through the gateway, making it a critical-path component: a gateway outage or performance degradation affects every service behind it.

The centralized pattern works well for moderate-scale systems where operational simplicity outweighs concerns about single points of failure. Load balancing across gateway instances provides horizontal scaling, but resource contention—CPU, memory, network bandwidth—eventually limits throughput.

Gateway per Service Team

Deploy separate gateway instances per service team or bounded context, allowing teams independent control over their edge layer.

# Team-based gateway topology
gateways:
  - name: user-gateway
    team: identity
    paths: [/api/users/*, /api/auth/*]
    backends: [user-service, auth-service]
    policies:
      rate_limit: 1000/min per client
      authentication: oauth2

  - name: commerce-gateway
    team: commerce
    paths: [/api/orders/*, /api/payments/*]
    backends: [order-service, payment-service]
    policies:
      rate_limit: 500/min per client
      authentication: api_key

Trade-offs: This pattern provides team autonomy and failure isolation—problems in one gateway don’t cascade to others. However, it introduces operational complexity. Multiple gateway deployments require coordination on cross-cutting concerns like authentication token validation, observability standards, and client SDK patterns.

Backend for Frontend (BFF)

Create specialized gateways per client type—web, mobile, partner APIs—each optimized for specific client needs.
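
To make the pattern concrete, here is a sketch in the same configuration style as the topologies above; the gateway names and response-shaping options are illustrative, not any specific product's syntax.

# Client-specific BFF topology (illustrative)
gateways:
  - name: mobile-bff
    clients: [ios, android]
    paths: [/mobile/api/*]
    response_shaping:
      strip_fields: [audit_metadata, internal_links]  # trim payloads for constrained networks
      max_payload: 50kb

  - name: web-bff
    clients: [browser]
    paths: [/web/api/*]
    response_shaping:
      denormalize: true   # embed related resources to cut round-trips

  - name: partner-bff
    clients: [partner-integrations]
    paths: [/partner/v1/*]
    contract: stable      # versioned surface that shields partners from internal change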

Architectural characteristics: BFFs allow optimization per client type. Mobile clients receive minimal payloads to reduce data transfer. Web clients get denormalized responses to minimize round-trips. Partner APIs expose stable contracts while internal services evolve freely.

The cost is additional gateway deployments and potential code duplication across BFFs. Teams must balance the optimization benefits against the operational overhead of maintaining multiple gateways.

Routing Strategies

Gateways route requests to backend services using various strategies, each with distinct behavioral characteristics.

Path-Based Routing

Route based on URL path patterns, the most common approach.

# Path-based routing configuration
routing:
  rules:
    - match:
        path:
          prefix: /api/v1/users
      route:
        destination: user-service-v1
        rewrite: /users

    - match:
        path:
          prefix: /api/v2/users
      route:
        destination: user-service-v2
        rewrite: /users

Trade-offs: Path-based routing is simple and predictable: clients can see how URL structure maps to services. However, this approach couples API structure to service decomposition. Refactoring services requires client-visible URL changes unless careful path rewriting preserves backward compatibility.

Header-Based Routing

Route based on request headers, enabling feature flags, A/B testing, and canary deployments.

# Header-based routing for experimentation
routing:
  rules:
    - match:
        headers:
          X-Feature-Flag: new-checkout
      route:
        destination: checkout-service-v2
        weight: 100

    - match:
        headers:
          X-Client-Version:
            regex: ^1\..*  # regex match for 1.x client versions
      route:
        destination: legacy-api-gateway

    - default:
        destination: checkout-service-v1

Architectural implications: Header-based routing decouples deployment from user-visible changes. Teams deploy new service versions, gradually shifting traffic using headers. This enables sophisticated release strategies—canary releases, blue-green deployments, per-user feature flags.

The complexity emerges in state management. If routing changes mid-session, users might see inconsistent behavior. Sessions must be sticky to specific service versions, or services must handle version transitions gracefully.
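
One hedge against mid-session inconsistency is session affinity at the routing layer. A minimal sketch, assuming the gateway supports cookie-based stickiness (the cookie name and sticky flag are illustrative):

# Session affinity for versioned routing (illustrative)
routing:
  session_affinity:
    cookie: gw-route-version   # hypothetical cookie pinning the chosen backend version
    ttl: 30m
  rules:
    - match:
        headers:
          X-Feature-Flag: new-checkout
      route:
        destination: checkout-service-v2
        sticky: true           # once a session lands on v2, keep it there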

Authentication and Authorization

Gateways typically handle authentication, but authorization placement remains a nuanced architectural decision.

Gateway Authentication with Backend Authorization

Gateway validates identity; services make authorization decisions.

# Authentication at gateway
authentication:
  provider: oauth2
  token_validation:
    jwks_uri: https://auth.example.com/.well-known/jwks.json
    cache_duration: 5m
    algorithms: [RS256]

  claims_to_headers:
    - claim: sub
      header: X-User-ID
    - claim: email
      header: X-User-Email
    - claim: tenant_id
      header: X-Tenant-ID

Architectural rationale: Authentication is cross-cutting—every request needs identity validation. Centralizing at the gateway prevents duplication across services. Authorization, however, is often domain-specific. The order service understands order access rules; the user service knows profile visibility policies.

This separation allows services to evolve authorization logic independently while maintaining centralized authentication. Services receive authenticated user context via headers and make contextual authorization decisions.

Policy-Based Authorization at Gateway

Define authorization policies declaratively at gateway layer.

# OPA-based authorization at gateway
authorization:
  engine: open-policy-agent
  policies:
    - name: require-admin-for-delete
      rule: |
        allow {
          input.method == "DELETE"
          input.user.roles[_] == "admin"
        }

    - name: user-access-own-data
      rule: |
        # assumes input.path carries the matched route template and
        # input.params the extracted path parameters
        allow {
          input.path == "/api/users/:user_id"
          input.user.id == input.params.user_id
        }

Trade-offs: Centralized authorization simplifies security auditing and enables consistent policy enforcement. However, it requires encoding domain logic in gateway policies. As policies grow complex, the gateway becomes coupled to business rules better handled in services.

This pattern works well for coarse-grained authorization—role-based access, tenant isolation, API quota enforcement. Fine-grained authorization—can this user edit this specific order—belongs in services with domain context.
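
Tenant isolation is a representative coarse-grained case. A sketch in the same OPA policy style, assuming the gateway places the token's tenant claim and the request's tenant path parameter into input:

# Coarse-grained tenant isolation (illustrative)
authorization:
  policies:
    - name: tenant-isolation
      rule: |
        allow {
          input.user.tenant_id == input.params.tenant_id
        }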

Rate Limiting Strategies

Rate limiting protects backend services from overload and ensures fair resource allocation across clients.

Client-Based Rate Limiting

Limit requests per client identifier—API key, user ID, IP address.

# Client-based rate limiting
rate_limits:
  - identifier: api_key
    limits:
      - period: 1m
        requests: 1000
      - period: 1h
        requests: 50000

  - identifier: user_id
    limits:
      - period: 1m
        requests: 100
      - period: 1h
        requests: 5000

Architectural considerations: Client-based limiting requires persistent storage to track request counts across gateway instances. Redis commonly serves this purpose, providing atomic increment operations and expiration for time windows.

The gateway becomes stateful through its dependency on Redis. Gateway instances share counter state, which preserves horizontal scaling, but Redis becomes a critical dependency: when it is unavailable, the gateway must either fail open (serve traffic with no rate limiting) or fail closed (reject requests it cannot account for).
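
That decision can be made explicit in configuration rather than left to incident-time judgment. A sketch, with illustrative field names:

# Rate limiter failure mode (illustrative)
rate_limiting:
  store: redis-cluster
  on_store_unavailable:
    mode: fail_open                  # keep serving traffic rather than rejecting it
    local_fallback:
      enabled: true                  # approximate limits with per-instance in-memory counters
      limit_divisor: replica_count   # split the shared budget across gateway replicas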

Endpoint-Based Rate Limiting

Limit requests per API endpoint to protect specific services.

# Endpoint-based rate limiting
rate_limits:
  - endpoint: /api/search
    backend_capacity: 1000/s
    limits:
      - period: 1s
        requests: 800  # 80% of capacity
    queue_excess: true

  - endpoint: /api/reports/generate
    backend_capacity: 10/s
    limits:
      - period: 1s
        requests: 8
    reject_excess: true

Trade-offs: Endpoint limiting protects expensive operations. Report generation, search queries, and data exports often consume disproportionate resources. Limiting these endpoints prevents resource exhaustion from affecting other operations.

The challenge is setting appropriate limits. Backend capacity varies with load, data characteristics, and infrastructure health. Static limits might be too conservative during low load or too permissive during degradation.
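
Some gateways address this by adjusting limits from live backend signals. A sketch of what adaptive limiting might look like; the adaptive block and its fields are assumptions, not a standard feature:

# Adaptive endpoint limiting (illustrative)
rate_limits:
  - endpoint: /api/search
    adaptive:
      signal: backend_p99_latency
      target: 250ms            # tighten the limit when p99 latency exceeds this
      min_requests: 200/s      # floor: never throttle below this
      max_requests: 1000/s     # ceiling: tied to provisioned capacity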

Response Aggregation

Gateways can aggregate multiple backend calls into single client responses, reducing network round-trips.

REST Aggregation Endpoints

Create specialized endpoints that aggregate backend calls.

# REST aggregation endpoint
endpoints:
  - path: /api/dashboard
    method: GET
    aggregation:
      parallel:
        - name: user_profile
          service: user-service
          path: /users/{user_id}
          timeout: 1s

        - name: recent_orders
          service: order-service
          path: /orders?user_id={user_id}&limit=10
          timeout: 2s

        - name: recommendations
          service: recommendation-service
          path: /recommendations/{user_id}
          timeout: 3s
          optional: true

Trade-offs: REST aggregation reduces client round-trips but makes the gateway responsible for composition logic. The gateway must handle partial failures: what if the recommendations call times out while the other calls succeed?
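
One way to answer that question is a per-call failure policy, making the degradation behavior explicit. A sketch in the same configuration style (the on_error fields are illustrative):

# Partial-failure policy for the dashboard aggregation (illustrative)
aggregation:
  on_error:
    recommendations:
      action: substitute
      value: {"recommendations": []}   # degrade gracefully; response stays 200
    user_profile:
      action: fail_request             # core data; without it the dashboard is useless
      status: 502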

The pattern works well for specific high-traffic use cases where performance justifies maintenance cost. Creating aggregation endpoints for every possible data combination becomes unsustainable.

Caching Strategies

Gateways cache responses to reduce backend load and improve latency.

Response Caching

Cache complete responses based on request characteristics.

# Response caching configuration
caching:
  backend: redis-cluster
  policies:
    - paths: [/api/products/*, /api/categories/*]
      ttl: 5m
      cache_key: path + query_params
      invalidate_on:
        - header: Cache-Control
          value: no-cache

    - paths: [/api/users/*/profile]
      ttl: 1m
      cache_key: path + X-User-ID header
      vary_by: [Accept-Language]

Architectural considerations: Caching introduces consistency challenges. Cached responses become stale as backend state changes. Cache invalidation strategies—time-based expiration, event-driven purging—determine consistency guarantees.

Time-based caching accepts bounded staleness. Product catalogs cached for 5 minutes might show outdated inventory, but the performance benefit justifies eventual consistency.
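
When bounded staleness is not acceptable, event-driven purging can supplement TTLs. A minimal sketch, assuming backend services publish change events to a broker (the topic and event names are hypothetical):

# Event-driven cache invalidation (illustrative)
caching:
  invalidation:
    source: kafka
    topic: catalog-events            # hypothetical topic emitted by the product service
    rules:
      - event: product.updated
        purge: /api/products/{product_id}
      - event: category.updated
        purge: /api/categories/*     # coarse purge when precise keys are unknown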

Observability Integration

Gateways provide natural observability points for request/response patterns.

Distributed Tracing

Initiate traces at gateway, propagating context to backend services.

# Tracing configuration
tracing:
  provider: opentelemetry
  sampling_rate: 0.1  # 10% of requests

  propagation:
    inject_headers:
      - traceparent
      - tracestate

  attributes:
    - http.method
    - http.url
    - http.status_code
    - user.id
    - tenant.id

  export:
    endpoint: otel-collector:4317
    batch_size: 512

Architectural implications: Gateways start the distributed trace, creating parent spans for each request. Backend services create child spans, building complete request trees.

This provides end-to-end visibility into request flow—time spent in gateway, network latency to services, service processing time, database queries. Performance bottlenecks become visible through trace analysis.

Failure Handling Patterns

Gateways must handle backend failures gracefully, preventing cascading failures.

Circuit Breaker

Prevent calling unhealthy backends, failing fast instead of waiting for timeouts.

# Circuit breaker configuration
circuit_breaker:
  - backend: recommendation-service
    failure_threshold: 50%
    evaluation_window: 10s
    open_duration: 30s
    half_open_requests: 3

    fallback:
      type: static_response
      status: 200
      body: {"recommendations": []}

Trade-offs: Circuit breakers prevent resource exhaustion from repeatedly calling failing services. Instead of waiting seconds for timeouts, gateways fail fast when circuits open.

The challenge is tuning thresholds. Thresholds that are too sensitive trip the circuit on transient errors; thresholds that are too lenient allow prolonged degradation before the circuit opens.

Retry Logic

Retry failed requests with exponential backoff.

# Retry configuration
retries:
  - backends: [user-service, order-service]
    max_attempts: 3
    backoff: exponential
    initial_delay: 100ms
    max_delay: 2s
    retry_on:
      - connection_error
      - timeout
      - http_5xx

  - backends: [payment-service]
    max_attempts: 1  # Don't retry non-idempotent operations

Architectural considerations: Retries improve reliability for transient failures but can amplify load during outages. If a backend is overloaded, retries worsen the problem.

Idempotency is critical. Retrying payment operations without idempotency keys causes duplicate charges. Services must design for retries or gateways must avoid retrying non-idempotent operations.
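
One pragmatic compromise is to retry only requests that carry an idempotency key, since the backend can then deduplicate safely. A sketch, with the retry_if condition as an assumed capability:

# Idempotency-aware retries (illustrative)
retries:
  - backends: [payment-service]
    retry_if:
      header_present: Idempotency-Key   # retry only when the client supplied a key
    max_attempts: 3
    retry_on: [connection_error, timeout]

  - backends: [payment-service]
    max_attempts: 1                     # no key: never retry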

Conclusion

API gateway architecture involves continuous trade-offs between centralization and distribution, performance and flexibility, simplicity and sophistication. Successful gateway strategies start minimal—basic routing and authentication—then evolve toward aggregation, caching, and advanced traffic management as needs emerge.

The most resilient gateway architectures treat the gateway as infrastructure, not application logic. Complex business rules belong in services. Gateways handle cross-cutting concerns—authentication, rate limiting, observability—while delegating domain decisions to services with appropriate context.

Organizations building gateway layers benefit from platform thinking. Providing teams shared gateway frameworks with sensible defaults reduces inconsistency while allowing customization for specific needs. This balance between standardization and flexibility determines long-term maintainability of the edge layer.