Envoy Proxy Deep Dive: The Foundation of Modern Service Mesh

Envoy has emerged as the de facto standard for service mesh data planes, powering Istio, AWS App Mesh, and many custom implementations. Understanding how Envoy works is essential for anyone operating modern microservices architectures. This post explores Envoy’s architecture, configuration patterns, and real-world usage.

Why Envoy?

Before Envoy, proxies like NGINX and HAProxy dominated. So what makes Envoy special?

HTTP/2 and gRPC native: First-class support for modern protocols
Dynamic configuration: Configuration updates without restarts
Advanced load balancing: Multiple algorithms with health checking
Observability: Rich metrics, logging, and tracing out of the box
Extensibility: Filter chain architecture for customization
Modern architecture: Built for cloud-native environments

Envoy Architecture

Envoy is an L3/L4 and L7 proxy. Listeners accept downstream connections, filter chains process them, and the cluster manager picks an upstream endpoint:

┌─────────────────────────────────────────┐
│           Downstream (Client)            │
└──────────────────┬──────────────────────┘
                   │
┌──────────────────▼──────────────────────┐
│         Listener (0.0.0.0:80)           │
│  ┌────────────────────────────────────┐ │
│  │        Filter Chain                │ │
│  │  ┌──────────────────────────────┐  │ │
│  │  │  HTTP Connection Manager    │  │ │
│  │  │    ┌──────────────────────┐  │  │ │
│  │  │    │   HTTP Filters       │  │  │ │
│  │  │    │  - JWT Auth         │  │  │ │
│  │  │    │  - Rate Limit       │  │  │ │
│  │  │    │  - Router (last)    │  │  │ │
│  │  │    └──────────────────────┘  │  │ │
│  │  └──────────────────────────────┘  │ │
│  └────────────────────────────────────┘ │
└──────────────────┬──────────────────────┘
                   │
┌──────────────────▼──────────────────────┐
│            Cluster Manager              │
│  ┌──────────────┐  ┌──────────────┐    │
│  │   Cluster A  │  │   Cluster B  │    │
│  │              │  │              │    │
│  │  Endpoint 1  │  │  Endpoint 1  │    │
│  │  Endpoint 2  │  │  Endpoint 2  │    │
│  └──────────────┘  └──────────────┘    │
└──────────────────┬──────────────────────┘
                   │
┌──────────────────▼──────────────────────┐
│         Upstream (Backend)              │
└─────────────────────────────────────────┘

Basic Configuration

Envoy uses YAML or JSON configuration. Here’s a minimal example:

static_resources:
  listeners:
    - name: main_listener
      address:
        socket_address:
          address: 0.0.0.0
          port_value: 10000
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                codec_type: AUTO
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: backend
                      domains: ["*"]
                      routes:
                        - match:
                            prefix: "/"
                          route:
                            cluster: backend_service
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

  clusters:
    - name: backend_service
      connect_timeout: 5s
      type: STRICT_DNS
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: backend_service
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: backend
                      port_value: 8080

Dynamic Configuration with xDS

The real power of Envoy is dynamic configuration through the xDS APIs:

LDS (Listener Discovery Service): Discover listeners
RDS (Route Discovery Service): Discover routes
CDS (Cluster Discovery Service): Discover clusters
EDS (Endpoint Discovery Service): Discover endpoints
SDS (Secret Discovery Service): Discover TLS certificates

Example bootstrap that discovers clusters and listeners from a control plane over gRPC:

node:
  id: node1
  cluster: service

dynamic_resources:
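  cds_config:
    resource_api_version: V3
    api_config_source:
      api_type: GRPC
      transport_api_version: V3
      grpc_services:
        - envoy_grpc:
            cluster_name: xds_cluster

  lds_config:
    resource_api_version: V3
    api_config_source:
      api_type: GRPC
      transport_api_version: V3
      grpc_services:
        - envoy_grpc:
            cluster_name: xds_cluster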

static_resources:
  clusters:
    - name: xds_cluster
      connect_timeout: 1s
      type: STRICT_DNS
      lb_policy: ROUND_ROBIN
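      # xDS runs over gRPC, so the control-plane cluster must speak HTTP/2
      typed_extension_protocol_options:
        envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
          "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
          explicit_http_config:
            http2_protocol_options: {}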
      load_assignment:
        cluster_name: xds_cluster
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: control-plane
                      port_value: 18000

Load Balancing Strategies

Envoy supports multiple load balancing algorithms:

clusters:
  - name: service_cluster
    type: STRICT_DNS
    # Round robin (default)
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: service_cluster
      endpoints:
        - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: service1
                    port_value: 8080
            - endpoint:
                address:
                  socket_address:
                    address: service2
                    port_value: 8080

Least request load balancing:

clusters:
  - name: service_cluster
    type: STRICT_DNS
    lb_policy: LEAST_REQUEST
    least_request_lb_config:
      choice_count: 2  # Select best of 2 random endpoints
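
The "best of 2 random endpoints" rule can be sketched in a few lines of Python. This illustrates the power-of-two-choices idea rather than Envoy's actual implementation: the `pick_least_request` helper and the in-flight request counts are hypothetical, and sampling with replacement is a simplifying assumption.

```python
import random

def pick_least_request(active_requests, choice_count=2, rng=random):
    """Sample `choice_count` hosts at random and return the least loaded one.

    `active_requests` maps a host name to its number of in-flight requests.
    """
    hosts = list(active_requests)
    # Sampling with replacement: the same host may be drawn twice.
    candidates = [rng.choice(hosts) for _ in range(choice_count)]
    return min(candidates, key=lambda host: active_requests[host])

# Hypothetical in-flight request counts per endpoint.
active = {"service1": 12, "service2": 3, "service3": 7}
picked = pick_least_request(active, choice_count=2, rng=random.Random(0))
```

Comparing only a couple of random candidates avoids scanning every endpoint on each request while still steering traffic away from the busiest hosts.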

Ring hash for consistent hashing:

clusters:
  - name: service_cluster
    type: STRICT_DNS
    lb_policy: RING_HASH
    ring_hash_lb_config:
      minimum_ring_size: 1024
      hash_function: XX_HASH

# Route configuration for hash-based routing
route_config:
  virtual_hosts:
    - name: service
      domains: ["*"]
      routes:
        - match:
            prefix: "/"
          route:
            cluster: service_cluster
            hash_policy:
              - header:
                  header_name: x-user-id

Health Checking

Active health checking:

clusters:
  - name: service_cluster
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: service_cluster
      endpoints:
        - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: service1
                    port_value: 8080
    health_checks:
      - timeout: 1s
        interval: 5s
        unhealthy_threshold: 2
        healthy_threshold: 2
        http_health_check:
          path: /health
          expected_statuses:
            - start: 200
              end: 300  # end is exclusive, so this matches 200-299

Outlier detection for passive health checking:

clusters:
  - name: service_cluster
    outlier_detection:
      consecutive_5xx: 5
      interval: 10s
      base_ejection_time: 30s
      max_ejection_percent: 50
      enforcing_consecutive_5xx: 100
      enforcing_success_rate: 100
      success_rate_minimum_hosts: 5
      success_rate_request_volume: 100
      success_rate_stdev_factor: 1900
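
How long a host stays ejected grows with repeated ejections. The sketch below models that arithmetic in Python, assuming the ejection time is base_ejection_time multiplied by the number of times the host has been ejected, capped by max_ejection_time. The 300s cap is Envoy's documented default and is not set in the config above; the `ejection_seconds` helper is hypothetical, and the model ignores how the multiplier decays once a host stays healthy.

```python
def ejection_seconds(times_ejected, base_ejection_time_s=30, max_ejection_time_s=300):
    """Return how long a host is ejected on its n-th ejection (simplified model)."""
    return min(base_ejection_time_s * times_ejected, max_ejection_time_s)

durations = [ejection_seconds(n) for n in (1, 2, 3, 20)]
print(durations)  # [30, 60, 90, 300]
```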

Circuit Breaking

Protect upstream services from overload:

clusters:
  - name: service_cluster
    circuit_breakers:
      thresholds:
        - priority: DEFAULT
          max_connections: 1000
          max_pending_requests: 1000
          max_requests: 1000
          max_retries: 3
        - priority: HIGH
          max_connections: 2000
          max_pending_requests: 2000
          max_requests: 2000
          max_retries: 5
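
Each threshold acts as a cap on a counter: once the limit is reached, new work is rejected immediately instead of queueing behind a struggling backend. A toy Python model of a single threshold such as max_requests, with the hypothetical `RequestLimiter` class standing in for Envoy's internal bookkeeping:

```python
class RequestLimiter:
    """Toy model of one circuit-breaker threshold such as max_requests."""

    def __init__(self, max_requests):
        self.max_requests = max_requests
        self.in_flight = 0
        self.overflow = 0  # requests rejected because the limit was reached

    def try_start(self):
        # Reject right away when the limit is hit; nothing is queued.
        if self.in_flight >= self.max_requests:
            self.overflow += 1
            return False
        self.in_flight += 1
        return True

    def finish(self):
        self.in_flight -= 1

limiter = RequestLimiter(max_requests=2)
results = [limiter.try_start() for _ in range(3)]  # third request is rejected
limiter.finish()                                   # one request completes
results.append(limiter.try_start())                # capacity is available again
print(results)  # [True, True, False, True]
```

Unlike a classic circuit breaker with open and half-open states, this kind of limit recovers as soon as in-flight work drains below the threshold.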

Retry and Timeout Configuration

route_config:
  virtual_hosts:
    - name: backend
      domains: ["*"]
      routes:
        - match:
            prefix: "/api"
          route:
            cluster: backend_service
            timeout: 15s
            retry_policy:
              retry_on: "5xx,reset,connect-failure,refused-stream"
              num_retries: 3
              per_try_timeout: 5s
              retry_host_predicate:
                - name: envoy.retry_host_predicates.previous_hosts
              host_selection_retry_max_attempts: 5
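
The route-level timeout covers the whole request, retries included, so it interacts with per_try_timeout and num_retries. A quick back-of-the-envelope check in Python, using the values from the config above; the `retry_budget` helper is hypothetical and the model ignores retry backoff:

```python
def retry_budget(timeout_s, per_try_timeout_s, num_retries):
    """Return (worst-case latency, attempts that can run to their full per-try timeout)."""
    attempts = num_retries + 1  # the original request plus the retries
    worst_case_s = min(timeout_s, attempts * per_try_timeout_s)
    full_attempts = min(attempts, timeout_s // per_try_timeout_s)
    return worst_case_s, full_attempts

worst_case_s, full_attempts = retry_budget(timeout_s=15, per_try_timeout_s=5, num_retries=3)
print(worst_case_s, full_attempts)  # 15 3
```

With these numbers, four attempts at 5s each would need 20s, but the 15s route timeout fires first. If every attempt hangs, the third retry never gets a full try, so either raise timeout or lower per_try_timeout if all three retries should be usable.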

Rate Limiting

Global rate limiting:

http_filters:
  - name: envoy.filters.http.ratelimit
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
      domain: backend_service
      rate_limit_service:
        grpc_service:
          envoy_grpc:
            cluster_name: ratelimit_service
        transport_api_version: V3

route_config:
  virtual_hosts:
    - name: backend
      domains: ["*"]
      routes:
        - match:
            prefix: "/api"
          route:
            cluster: backend_service
            rate_limits:
              - actions:
                  - request_headers:
                      header_name: x-user-id
                      descriptor_key: user_id

Local rate limiting:

http_filters:
  - name: envoy.filters.http.local_ratelimit
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
      stat_prefix: http_local_rate_limiter
      token_bucket:
        max_tokens: 100
        tokens_per_fill: 10
        fill_interval: 1s
      filter_enabled:
        runtime_key: local_rate_limit_enabled
        default_value:
          numerator: 100
          denominator: HUNDRED
      filter_enforced:
        runtime_key: local_rate_limit_enforced
        default_value:
          numerator: 100
          denominator: HUNDRED
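
The token_bucket settings translate to a burst of up to 100 requests and a sustained rate of 10 requests per second. The following Python sketch simulates that behavior under simple assumptions: the `TokenBucket` class is hypothetical, the bucket starts full, each request costs one token, and time is passed in explicitly rather than read from a clock.

```python
class TokenBucket:
    def __init__(self, max_tokens, tokens_per_fill, fill_interval_s):
        self.max_tokens = max_tokens
        self.tokens_per_fill = tokens_per_fill
        self.fill_interval_s = fill_interval_s
        self.tokens = max_tokens  # assumption: the bucket starts full
        self._last_fill_s = 0.0

    def allow(self, now_s):
        # Add tokens for every whole fill interval that has elapsed.
        fills = int((now_s - self._last_fill_s) // self.fill_interval_s)
        if fills > 0:
            self.tokens = min(self.max_tokens, self.tokens + fills * self.tokens_per_fill)
            self._last_fill_s += fills * self.fill_interval_s
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False  # out of tokens: the request would be rate limited

bucket = TokenBucket(max_tokens=100, tokens_per_fill=10, fill_interval_s=1.0)
burst = sum(bucket.allow(0.0) for _ in range(150))       # 150 requests at t=0
next_second = sum(bucket.allow(1.0) for _ in range(50))  # 50 requests at t=1
print(burst, next_second)  # 100 10
```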

TLS Configuration

Downstream TLS (serving HTTPS):

listeners:
  - name: https_listener
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 443
    filter_chains:
      - filters:
          - name: envoy.filters.network.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              stat_prefix: ingress_https
              route_config:
                name: local_route
                virtual_hosts:
                  - name: backend
                    domains: ["*"]
                    routes:
                      - match:
                          prefix: "/"
                        route:
                          cluster: backend_service
              http_filters:
                - name: envoy.filters.http.router
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
        transport_socket:
          name: envoy.transport_sockets.tls
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
            common_tls_context:
              tls_certificates:
                - certificate_chain:
                    filename: "/etc/envoy/certs/server.crt"
                  private_key:
                    filename: "/etc/envoy/certs/server.key"

Upstream TLS (connecting to HTTPS backends):

clusters:
  - name: backend_service
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: backend_service
      endpoints:
        - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: backend
                    port_value: 443
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
        common_tls_context:
          validation_context:
            trusted_ca:
              filename: "/etc/envoy/certs/ca.crt"

Observability

Metrics:

admin:
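  access_log:
    - name: envoy.access_loggers.file
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
        path: /tmp/admin_access.log
  address:
    socket_address:
      # The admin API is unauthenticated; prefer 127.0.0.1 unless the port is firewalled
      address: 0.0.0.0
      port_value: 9901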

stats_sinks:
  - name: envoy.stat_sinks.statsd
    typed_config:
      "@type": type.googleapis.com/envoy.config.metrics.v3.StatsdSink
      tcp_cluster_name: statsd_cluster
      prefix: envoy

Access logging:

# The HTTP connection manager is a network filter, so it lives under a filter chain's filters
filters:
  - name: envoy.filters.network.http_connection_manager
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
      access_log:
        - name: envoy.access_loggers.file
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
            path: "/var/log/envoy/access.log"
            log_format:
              json_format:
                start_time: "%START_TIME%"
                method: "%REQ(:METHOD)%"
                path: "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
                protocol: "%PROTOCOL%"
                response_code: "%RESPONSE_CODE%"
                response_flags: "%RESPONSE_FLAGS%"
                bytes_received: "%BYTES_RECEIVED%"
                bytes_sent: "%BYTES_SENT%"
                duration: "%DURATION%"
                upstream_service_time: "%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%"
                x_forwarded_for: "%REQ(X-FORWARDED-FOR)%"
                user_agent: "%REQ(USER-AGENT)%"
                request_id: "%REQ(X-REQUEST-ID)%"
                authority: "%REQ(:AUTHORITY)%"
                upstream_host: "%UPSTREAM_HOST%"

Distributed tracing:

# Inside the HttpConnectionManager typed_config
tracing:
  provider:
    name: envoy.tracers.zipkin
    typed_config:
      "@type": type.googleapis.com/envoy.config.trace.v3.ZipkinConfig
      collector_cluster: jaeger  # Must match a cluster defined under clusters
      collector_endpoint: "/api/v2/spans"
      collector_endpoint_version: HTTP_JSON
  random_sampling:
    value: 100.0  # Trace 100% of requests

Custom Filters

Extend Envoy with Lua:

http_filters:
  - name: envoy.filters.http.lua
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
      inline_code: |
        function envoy_on_request(request_handle)
          -- Add custom header
          request_handle:headers():add("x-custom-header", "value")

          -- Log request
          request_handle:logInfo("Processing request: " .. request_handle:headers():get(":path"))

          -- Check auth
          local auth = request_handle:headers():get("authorization")
          if auth == nil then
            request_handle:respond(
              {[":status"] = "401"},
              "Unauthorized"
            )
          end
        end

        function envoy_on_response(response_handle)
          -- Modify response
          response_handle:headers():add("x-processed-by", "envoy")
        end
  - name: envoy.filters.http.router
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

Running Envoy

Docker deployment:

# docker-compose.yml
version: '3'
services:
  envoy:
    image: envoyproxy/envoy:v1.24-latest
    ports:
      - "10000:10000"
      - "9901:9901"
    volumes:
      - ./envoy.yaml:/etc/envoy/envoy.yaml
      - ./certs:/etc/envoy/certs
    command: ["-c", "/etc/envoy/envoy.yaml", "--log-level", "info"]

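  backend:
    image: nginx:alpine
    # nginx listens on port 80 inside the Compose network, so the Envoy cluster
    # endpoint must target backend:80. The mapping below only exposes it on the host.
    ports:
      - "8080:80"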

Kubernetes deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: envoy
spec:
  replicas: 3
  selector:
    matchLabels:
      app: envoy
  template:
    metadata:
      labels:
        app: envoy
    spec:
      containers:
        - name: envoy
          image: envoyproxy/envoy:v1.24-latest
          ports:
            - containerPort: 10000
              name: http
            - containerPort: 9901
              name: admin
          volumeMounts:
            - name: config
              mountPath: /etc/envoy
          command:
            - /usr/local/bin/envoy
            - -c
            - /etc/envoy/envoy.yaml
            - --log-level
            - info
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 1000m
              memory: 512Mi
      volumes:
        - name: config
          configMap:
            name: envoy-config

Performance Tuning

Worker threads and connection buffers:

static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address:
          address: 0.0.0.0
          port_value: 10000
      per_connection_buffer_limit_bytes: 32768

# Worker thread count is set on the command line, not in the config file.
# It defaults to the number of hardware threads on the machine.
# --concurrency 4

Connection pooling:

clusters:
  - name: backend_service
    connect_timeout: 5s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    circuit_breakers:
      thresholds:
        - max_connections: 1000
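    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        explicit_http_config:
          http2_protocol_options:
            max_concurrent_streams: 100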

Debugging

Enable debug logging:

# Start with debug logging
envoy -c envoy.yaml --log-level debug

# Or change at runtime via admin API
curl -X POST "http://localhost:9901/logging?level=debug"

Admin interface:

# Get stats
curl http://localhost:9901/stats

# Get config dump
curl http://localhost:9901/config_dump

# Get clusters
curl http://localhost:9901/clusters
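
The /stats endpoint returns plain text with one `name: value` pair per line, which makes quick checks easy to script. A small Python sketch that parses such output and computes a 5xx ratio; the `parse_stats` helper and the sample values are hypothetical, and in practice the text would come from the curl command above:

```python
def parse_stats(text):
    """Parse `name: value` lines into a dict, keeping only integer-valued stats."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        # Histogram lines have non-numeric values and are skipped.
        if sep and value.strip().isdigit():
            stats[name] = int(value)
    return stats

# Hypothetical sample in the format returned by /stats.
sample = """cluster.backend_service.upstream_rq_total: 120
cluster.backend_service.upstream_rq_5xx: 6
cluster.backend_service.upstream_cx_active: 4
"""

stats = parse_stats(sample)
error_rate = (
    stats["cluster.backend_service.upstream_rq_5xx"]
    / stats["cluster.backend_service.upstream_rq_total"]
)
print(f"{error_rate:.1%}")  # 5.0%
```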

Conclusion

Envoy’s architecture and rich feature set make it ideal for service mesh implementations. Key takeaways:

Dynamic configuration via xDS APIs enables zero-downtime updates
Rich load balancing and health checking for reliability
Circuit breaking and retries prevent cascading failures
Observability built-in with metrics, logging, and tracing
Extensibility through filters and Lua scripting

Start with static configuration to understand the basics, then move to dynamic configuration for production deployments. Whether you’re using Envoy standalone or as part of a service mesh, understanding its capabilities is essential for building resilient microservices architectures.