Envoy has emerged as the de facto standard for service mesh data planes, powering Istio, AWS App Mesh, and many custom implementations. Understanding how Envoy works is essential for anyone operating modern microservices architectures. This post explores Envoy's architecture, configuration patterns, and real-world usage.
Why Envoy?
Before Envoy, proxies like NGINX and HAProxy dominated. So what makes Envoy special?
- HTTP/2 and gRPC native: First-class support for modern protocols
- Dynamic configuration: Configuration updates without restarts
- Advanced load balancing: Multiple algorithms with health checking
- Observability: Rich metrics, logging, and tracing out-of-the-box
- Extensibility: Filter chain architecture for customization
- Modern architecture: Built for cloud-native environments
Envoy Architecture
Envoy operates as an L4/L7 proxy. Listeners accept downstream connections, filter chains process them, and the cluster manager picks an upstream endpoint:
┌─────────────────────────────────────────┐
│           Downstream (Client)           │
└────────────────────┬────────────────────┘
                     │
┌────────────────────▼────────────────────┐
│          Listener (0.0.0.0:80)          │
│  ┌───────────────────────────────────┐  │
│  │ Filter Chain                      │  │
│  │  ┌─────────────────────────────┐  │  │
│  │  │ HTTP Connection Manager     │  │  │
│  │  │  ┌───────────────────────┐  │  │  │
│  │  │  │ HTTP Filters          │  │  │  │
│  │  │  │  - Router             │  │  │  │
│  │  │  │  - JWT Auth           │  │  │  │
│  │  │  │  - Rate Limit         │  │  │  │
│  │  │  └───────────────────────┘  │  │  │
│  │  └─────────────────────────────┘  │  │
│  └───────────────────────────────────┘  │
└────────────────────┬────────────────────┘
                     │
┌────────────────────▼────────────────────┐
│             Cluster Manager             │
│  ┌──────────────┐     ┌──────────────┐  │
│  │ Cluster A    │     │ Cluster B    │  │
│  │              │     │              │  │
│  │ Endpoint 1   │     │ Endpoint 1   │  │
│  │ Endpoint 2   │     │ Endpoint 2   │  │
│  └──────────────┘     └──────────────┘  │
└────────────────────┬────────────────────┘
                     │
┌────────────────────▼────────────────────┐
│           Upstream (Backend)            │
└─────────────────────────────────────────┘
Basic Configuration
Envoy uses YAML or JSON configuration. Here's a minimal example:
static_resources:
  listeners:
  - name: main_listener
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          codec_type: AUTO
          route_config:
            name: local_route
            virtual_hosts:
            - name: backend
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: backend_service
          http_filters:
          - name: envoy.filters.http.router
            # Recent Envoy releases require an explicit typed_config for each filter
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: backend_service
    connect_timeout: 5s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: backend_service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: backend
                port_value: 8080
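The route table above has a single catch-all prefix, but once more routes are added their order matters, because Envoy evaluates routes in order and uses the first match. The following Python sketch illustrates that first-match behavior for prefix routes only. The `match_route` function and the `api_service` cluster name are hypothetical and are not part of Envoy.

```python
def match_route(routes, path):
    """Return the cluster of the first route whose prefix matches `path`.

    `routes` is an ordered list of (prefix, cluster) pairs, mirroring the
    order of entries under `routes:` in a virtual host.
    """
    for prefix, cluster in routes:
        if path.startswith(prefix):
            return cluster
    return None  # no route matched; Envoy would answer 404 in this case


if __name__ == "__main__":
    # Specific prefixes go first, the catch-all "/" goes last.
    routes = [("/api", "api_service"), ("/", "backend_service")]
    print(match_route(routes, "/api/users"))
    print(match_route(routes, "/health"))
```

Placing the `/` route first would shadow every later route, which is a common source of confusing routing behavior.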
Dynamic Configuration with xDS
The real power of Envoy is dynamic configuration through the xDS APIs:
- LDS (Listener Discovery Service): Discover listeners
- RDS (Route Discovery Service): Discover routes
- CDS (Cluster Discovery Service): Discover clusters
- EDS (Endpoint Discovery Service): Discover endpoints
- SDS (Secret Discovery Service): Discover TLS certificates
Example dynamic cluster discovery:
node:
  id: node1
  cluster: service
dynamic_resources:
  cds_config:
    api_config_source:
      api_type: GRPC
      grpc_services:
      - envoy_grpc:
          cluster_name: xds_cluster
  lds_config:
    api_config_source:
      api_type: GRPC
      grpc_services:
      - envoy_grpc:
          cluster_name: xds_cluster
static_resources:
  clusters:
  - name: xds_cluster
    connect_timeout: 1s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    http2_protocol_options: {}
    load_assignment:
      cluster_name: xds_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: control-plane
                port_value: 18000
Load Balancing Strategies
Envoy supports multiple load balancing algorithms:
clusters:
- name: service_cluster
  type: STRICT_DNS
  # Round robin (default)
  lb_policy: ROUND_ROBIN
  load_assignment:
    cluster_name: service_cluster
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: service1
              port_value: 8080
      - endpoint:
          address:
            socket_address:
              address: service2
              port_value: 8080
Least request load balancing:
clusters:
- name: service_cluster
  type: STRICT_DNS
  lb_policy: LEAST_REQUEST
  least_request_lb_config:
    choice_count: 2 # Select best of 2 random endpoints
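To make `choice_count: 2` concrete, here is a minimal Python sketch of the "power of two choices" idea: sample a few hosts at random and send the request to the one with the fewest active requests. The function name and the dictionary of active-request counts are hypothetical, sampling with replacement is an assumption, and host weights are ignored, so treat this as an illustration rather than Envoy's implementation.

```python
import random


def pick_least_request(active_requests, choice_count=2, rng=random):
    """Sample `choice_count` hosts at random and return the least loaded one.

    `active_requests` maps host name -> number of in-flight requests.
    Sampling is with replacement (an assumption made for simplicity).
    """
    hosts = list(active_requests)
    candidates = [rng.choice(hosts) for _ in range(choice_count)]
    return min(candidates, key=lambda host: active_requests[host])


if __name__ == "__main__":
    load = {"service1": 12, "service2": 3}
    print(pick_least_request(load))
```

Comparing only a small random sample avoids scanning every host on each request while still steering most traffic away from busy hosts.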
Ring hash for consistent hashing:
clusters:
- name: service_cluster
  type: STRICT_DNS
  lb_policy: RING_HASH
  ring_hash_lb_config:
    minimum_ring_size: 1024
    hash_function: XX_HASH
# Route configuration for hash-based routing
route_config:
  virtual_hosts:
  - name: service
    domains: ["*"]
    routes:
    - match:
        prefix: "/"
      route:
        cluster: service_cluster
        hash_policy:
        - header:
            header_name: x-user-id
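The value of ring hashing is that a given `x-user-id` keeps landing on the same host, and removing a host only remaps the keys that host owned. The Python sketch below shows the general technique. The `HashRing` class and the `replicas` parameter are hypothetical, and SHA-256 stands in for xxHash only because it ships with the standard library.

```python
import bisect
import hashlib


def _hash(key):
    # Stand-in hash: Envoy's XX_HASH is xxHash, SHA-256 is used here for portability.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, hosts, replicas=64):
        # Each host is placed on the ring at `replicas` pseudo-random points.
        self._points = sorted(
            (_hash(f"{host}#{i}"), host) for host in hosts for i in range(replicas)
        )
        self._keys = [point for point, _ in self._points]

    def pick(self, key):
        # Walk clockwise to the first point at or after the key's hash,
        # wrapping around to the start of the ring if needed.
        idx = bisect.bisect_left(self._keys, _hash(key)) % len(self._keys)
        return self._points[idx][1]


if __name__ == "__main__":
    ring = HashRing(["service1", "service2", "service3"])
    print(ring.pick("user-42"), ring.pick("user-42"))  # same host both times
```

A larger ring (more points per host) spreads keys more evenly at the cost of memory, which is the trade-off `minimum_ring_size` controls.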
Health Checking
Active health checking:
clusters:
- name: service_cluster
  type: STRICT_DNS
  lb_policy: ROUND_ROBIN
  load_assignment:
    cluster_name: service_cluster
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: service1
              port_value: 8080
  health_checks:
  - timeout: 1s
    interval: 5s
    unhealthy_threshold: 2
    healthy_threshold: 2
    http_health_check:
      path: /health
      expected_statuses:
      - start: 200
        end: 300 # end is exclusive, so this range covers 200-299
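The two thresholds act as hysteresis: a host changes state only after several consecutive results contradict its current state. A simplified Python sketch of that state machine follows. The `HealthTracker` class is hypothetical, and it ignores Envoy's special cases, such as how the very first check is handled.

```python
class HealthTracker:
    """Tracks one host's health from a stream of health check status codes."""

    def __init__(self, unhealthy_threshold=2, healthy_threshold=2):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.healthy = True
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, status_code):
        ok = 200 <= status_code < 300
        if ok == self.healthy:
            self._streak = 0  # result agrees with current state, reset the streak
        else:
            self._streak += 1
            threshold = self.healthy_threshold if ok else self.unhealthy_threshold
            if self._streak >= threshold:
                self.healthy = ok
                self._streak = 0
        return self.healthy


if __name__ == "__main__":
    tracker = HealthTracker()
    for code in (500, 500, 200, 200):
        print(code, tracker.record(code))
```

With both thresholds set to 2 and a 5s interval, a failing host is removed from rotation roughly 10 seconds after it starts failing, and returns about 10 seconds after it recovers.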
Outlier detection for passive health checking:
clusters:
- name: service_cluster
  outlier_detection:
    consecutive_5xx: 5
    interval: 10s
    base_ejection_time: 30s
    max_ejection_percent: 50
    enforcing_consecutive_5xx: 100
    enforcing_success_rate: 100
    success_rate_minimum_hosts: 5
    success_rate_request_volume: 100
    success_rate_stdev_factor: 1900
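The consecutive-5xx part of this configuration can be sketched in Python as follows. The `OutlierDetector` class is hypothetical and simplified: it models only `consecutive_5xx`, `base_ejection_time`, and `max_ejection_percent`, assumes the ejection time grows linearly with the number of times a host has been ejected, and leaves out success-rate detection and the enforcement percentages.

```python
class OutlierDetector:
    def __init__(self, hosts, consecutive_5xx=5, base_ejection_time=30.0,
                 max_ejection_percent=50):
        self.hosts = list(hosts)
        self.consecutive_5xx = consecutive_5xx
        self.base_ejection_time = base_ejection_time
        self.max_ejection_percent = max_ejection_percent
        self._streak = {h: 0 for h in self.hosts}
        self._ejections = {h: 0 for h in self.hosts}
        self._ejected_until = {}  # host -> time at which the ejection ends

    def ejected(self, now):
        return {h for h, until in self._ejected_until.items() if until > now}

    def record(self, host, status_code, now):
        """Record a response; return True if this response ejects the host."""
        if status_code < 500:
            self._streak[host] = 0
            return False
        self._streak[host] += 1
        if self._streak[host] < self.consecutive_5xx:
            return False
        self._streak[host] = 0
        current = self.ejected(now)
        if host in current:
            return False
        # Refuse to eject if that would exceed the allowed share of hosts.
        if (len(current) + 1) * 100 > self.max_ejection_percent * len(self.hosts):
            return False
        self._ejections[host] += 1
        self._ejected_until[host] = now + self.base_ejection_time * self._ejections[host]
        return True


if __name__ == "__main__":
    detector = OutlierDetector(["a", "b", "c", "d"])
    for _ in range(5):
        detector.record("a", 503, now=0.0)
    print(detector.ejected(now=1.0))
```

The `max_ejection_percent` guard is what keeps a widespread failure from ejecting the entire cluster.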
Circuit Breaking
Protect upstream services from overload:
clusters:
- name: service_cluster
  circuit_breakers:
    thresholds:
    - priority: DEFAULT
      max_connections: 1000
      max_pending_requests: 1000
      max_requests: 1000
      max_retries: 3
    - priority: HIGH
      max_connections: 2000
      max_pending_requests: 2000
      max_requests: 2000
      max_retries: 5
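Unlike circuit breakers that trip open after errors, these thresholds are concurrency limits: once the number of in-flight requests (or connections, or retries) reaches the limit, additional work is rejected immediately instead of being queued. A minimal Python sketch of that idea, using a hypothetical `CircuitBreaker` class that models only `max_requests`:

```python
class CircuitBreaker:
    """Rejects new requests once `max_requests` are already in flight."""

    def __init__(self, max_requests=1000):
        self.max_requests = max_requests
        self.active = 0
        self.overflow = 0  # number of rejected requests, comparable to an overflow stat

    def try_acquire(self):
        if self.active >= self.max_requests:
            self.overflow += 1
            return False  # the caller should fail fast, for example with a 503
        self.active += 1
        return True

    def release(self):
        # Called when a request completes, successfully or not.
        if self.active > 0:
            self.active -= 1


if __name__ == "__main__":
    breaker = CircuitBreaker(max_requests=2)
    print([breaker.try_acquire() for _ in range(3)])
```

Failing fast at the proxy keeps a slow upstream from accumulating an unbounded backlog of requests.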
Retry and Timeout Configuration
route_config:
  virtual_hosts:
  - name: backend
    domains: ["*"]
    routes:
    - match:
        prefix: "/api"
      route:
        cluster: backend_service
        timeout: 15s
        retry_policy:
          retry_on: "5xx,reset,connect-failure,refused-stream"
          num_retries: 3
          per_try_timeout: 5s
          retry_host_predicate:
          - name: envoy.retry_host_predicates.previous_hosts
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.retry.host.previous_hosts.v3.PreviousHostsPredicate
          host_selection_retry_max_attempts: 5
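The route-level `timeout` covers the whole request including retries, so it interacts with `per_try_timeout` and `num_retries`. The small Python sketch below works through that arithmetic for the values above. The helper names are hypothetical, and backoff delays between retries are ignored for simplicity.

```python
def worst_case_latency(timeout_s, per_try_timeout_s, num_retries):
    """Upper bound on how long a client waits before Envoy gives up."""
    attempts = 1 + num_retries  # the original try plus the retries
    return min(timeout_s, attempts * per_try_timeout_s)


def full_attempts_that_fit(timeout_s, per_try_timeout_s, num_retries):
    """How many attempts can each run to their full per-try timeout."""
    return min(1 + num_retries, int(timeout_s // per_try_timeout_s))


if __name__ == "__main__":
    # Values from the route above: timeout 15s, per_try_timeout 5s, num_retries 3
    print(worst_case_latency(15, 5, 3))
    print(full_attempts_that_fit(15, 5, 3))
```

With these values, four attempts of 5s each would need 20s, so the 15s overall timeout cuts the sequence short after three fully timed-out attempts. Raising `timeout` or lowering `per_try_timeout` lets all retries run.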
Rate Limiting
Global rate limiting:
http_filters:
- name: envoy.filters.http.ratelimit
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
    domain: backend_service
    rate_limit_service:
      grpc_service:
        envoy_grpc:
          cluster_name: ratelimit_service
      transport_api_version: V3
route_config:
  virtual_hosts:
  - name: backend
    domains: ["*"]
    routes:
    - match:
        prefix: "/api"
      route:
        cluster: backend_service
        rate_limits:
        - actions:
          - request_headers:
              header_name: x-user-id
              descriptor_key: user_id
Local rate limiting:
http_filters:
- name: envoy.filters.http.local_ratelimit
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
    stat_prefix: http_local_rate_limiter
    token_bucket:
      max_tokens: 100
      tokens_per_fill: 10
      fill_interval: 1s
    filter_enabled:
      runtime_key: local_rate_limit_enabled
      default_value:
        numerator: 100
        denominator: HUNDRED
    filter_enforced:
      runtime_key: local_rate_limit_enforced
      default_value:
        numerator: 100
        denominator: HUNDRED
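The `token_bucket` settings describe a classic token bucket: the bucket holds at most `max_tokens`, each request consumes one token, and `tokens_per_fill` tokens are added every `fill_interval`. A minimal Python sketch of the technique follows. The `TokenBucket` class is hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    def __init__(self, max_tokens=100, tokens_per_fill=10, fill_interval=1.0, now=0.0):
        self.max_tokens = max_tokens
        self.tokens_per_fill = tokens_per_fill
        self.fill_interval = fill_interval
        self.tokens = max_tokens  # start with a full bucket
        self._last_fill = now

    def allow(self, now):
        # Add tokens for every whole fill interval that has elapsed.
        intervals = int((now - self._last_fill) // self.fill_interval)
        if intervals > 0:
            refill = intervals * self.tokens_per_fill
            self.tokens = min(self.max_tokens, self.tokens + refill)
            self._last_fill += intervals * self.fill_interval
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False  # over the limit; the filter would respond with 429


if __name__ == "__main__":
    bucket = TokenBucket()
    allowed = sum(bucket.allow(now=0.0) for _ in range(150))
    print(allowed)
```

With the values above, a burst of up to 100 requests is admitted at once, after which the sustained rate settles at 10 requests per second.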
TLS Configuration
Downstream TLS (serving HTTPS):
listeners:
- name: https_listener
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 443
  filter_chains:
  - filters:
    - name: envoy.filters.network.http_connection_manager
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
        stat_prefix: ingress_https
        route_config:
          name: local_route
          virtual_hosts:
          - name: backend
            domains: ["*"]
            routes:
            - match:
                prefix: "/"
              route:
                cluster: backend_service
        http_filters:
        - name: envoy.filters.http.router
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
    # transport_socket belongs to the filter chain, next to `filters`
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
        common_tls_context:
          tls_certificates:
          - certificate_chain:
              filename: "/etc/envoy/certs/server.crt"
            private_key:
              filename: "/etc/envoy/certs/server.key"
Upstream TLS (connecting to HTTPS backends):
clusters:
- name: backend_service
  type: STRICT_DNS
  lb_policy: ROUND_ROBIN
  load_assignment:
    cluster_name: backend_service
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: backend
              port_value: 443
  transport_socket:
    name: envoy.transport_sockets.tls
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
      common_tls_context:
        validation_context:
          trusted_ca:
            filename: "/etc/envoy/certs/ca.crt"
Observability
Metrics:
admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 9901
stats_sinks:
- name: envoy.stat_sinks.statsd
  typed_config:
    "@type": type.googleapis.com/envoy.config.metrics.v3.StatsdSink
    tcp_cluster_name: statsd_cluster
    prefix: envoy
Access logging:
# The HTTP connection manager is a network filter, so it lives under `filters`
filters:
- name: envoy.filters.network.http_connection_manager
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
    access_log:
    - name: envoy.access_loggers.file
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
        path: "/var/log/envoy/access.log"
        log_format:
          json_format:
            start_time: "%START_TIME%"
            method: "%REQ(:METHOD)%"
            path: "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
            protocol: "%PROTOCOL%"
            response_code: "%RESPONSE_CODE%"
            response_flags: "%RESPONSE_FLAGS%"
            bytes_received: "%BYTES_RECEIVED%"
            bytes_sent: "%BYTES_SENT%"
            duration: "%DURATION%"
            upstream_service_time: "%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%"
            x_forwarded_for: "%REQ(X-FORWARDED-FOR)%"
            user_agent: "%REQ(USER-AGENT)%"
            request_id: "%REQ(X-REQUEST-ID)%"
            authority: "%REQ(:AUTHORITY)%"
            upstream_host: "%UPSTREAM_HOST%"
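One benefit of the JSON format is that each log line can be parsed without custom regular expressions. As a rough illustration, the Python sketch below summarizes status codes and the slowest request from log lines shaped like the format above. The `summarize` function and the sample lines are hypothetical. Values are passed through `int()` so the code works whether the numbers arrive as JSON numbers or as strings, and it assumes both fields are always numeric.

```python
import json
from collections import Counter


def summarize(lines):
    """Count responses per status code and find the slowest request (ms)."""
    by_status = Counter()
    durations = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        entry = json.loads(line)
        by_status[int(entry["response_code"])] += 1
        durations.append(int(entry["duration"]))
    slowest = max(durations) if durations else None
    return {"by_status": dict(by_status), "max_duration_ms": slowest}


if __name__ == "__main__":
    sample = [
        '{"method": "GET", "path": "/api/users", "response_code": 200, "duration": 12}',
        '{"method": "GET", "path": "/api/orders", "response_code": 503, "duration": 250}',
    ]
    print(summarize(sample))
```

In practice the lines would come from `/var/log/envoy/access.log` or from a log shipper rather than an in-memory list.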
Distributed tracing:
# In the v3 API the tracer is configured on the HTTP connection manager,
# inside its typed_config:
tracing:
  provider:
    name: envoy.tracers.zipkin
    typed_config:
      "@type": type.googleapis.com/envoy.config.trace.v3.ZipkinConfig
      collector_cluster: jaeger
      collector_endpoint: "/api/v2/spans"
      collector_endpoint_version: HTTP_JSON
  random_sampling:
    value: 100.0 # Trace 100% of requests
Custom Filters
Extend Envoy with Lua:
http_filters:
- name: envoy.filters.http.lua
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
    inline_code: |
      function envoy_on_request(request_handle)
        -- Add custom header
        request_handle:headers():add("x-custom-header", "value")
        -- Log request
        request_handle:logInfo("Processing request: " .. request_handle:headers():get(":path"))
        -- Check auth
        local auth = request_handle:headers():get("authorization")
        if auth == nil then
          request_handle:respond(
            {[":status"] = "401"},
            "Unauthorized"
          )
        end
      end
      function envoy_on_response(response_handle)
        -- Modify response
        response_handle:headers():add("x-processed-by", "envoy")
      end
# The router must be the last HTTP filter in the chain
- name: envoy.filters.http.router
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
Running Envoy
Docker deployment:
# docker-compose.yml
version: '3'
services:
  envoy:
    image: envoyproxy/envoy:v1.24-latest
    ports:
    - "10000:10000"
    - "9901:9901"
    volumes:
    - ./envoy.yaml:/etc/envoy/envoy.yaml
    - ./certs:/etc/envoy/certs
    command: ["-c", "/etc/envoy/envoy.yaml", "--log-level", "info"]
  backend:
    image: nginx:alpine
    ports:
    - "8080:80"
Kubernetes deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: envoy
spec:
  replicas: 3
  selector:
    matchLabels:
      app: envoy
  template:
    metadata:
      labels:
        app: envoy
    spec:
      containers:
      - name: envoy
        image: envoyproxy/envoy:v1.24-latest
        ports:
        - containerPort: 10000
          name: http
        - containerPort: 9901
          name: admin
        volumeMounts:
        - name: config
          mountPath: /etc/envoy
        command:
        - /usr/local/bin/envoy
        - -c
        - /etc/envoy/envoy.yaml
        - --log-level
        - info
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 1000m
            memory: 512Mi
      volumes:
      - name: config
        configMap:
          name: envoy-config
Performance Tuning
Worker threads:
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000
    per_connection_buffer_limit_bytes: 32768
# Set via command line
# --concurrency 4 # Number of worker threads
Connection pooling:
clusters:
- name: backend_service
  connect_timeout: 5s
  type: STRICT_DNS
  lb_policy: ROUND_ROBIN
  circuit_breakers:
    thresholds:
    - max_connections: 1000
  http2_protocol_options:
    max_concurrent_streams: 100
Debugging
Enable debug logging:
# Start with debug logging
envoy -c envoy.yaml --log-level debug
# Or change at runtime via admin API
curl -X POST "http://localhost:9901/logging?level=debug"
Admin interface:
# Get stats
curl http://localhost:9901/stats
# Get config dump
curl http://localhost:9901/config_dump
# Get clusters
curl http://localhost:9901/clusters
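The `/stats` endpoint returns plain text with one `name: value` pair per line, which makes it easy to script against. As an illustration, the Python sketch below parses counters and gauges from such text and skips histogram lines, whose values are not plain integers. The `parse_stats` function and the sample values are hypothetical. In practice the text would come from the admin endpoint.

```python
def parse_stats(text):
    """Parse `name: value` lines into a dict, keeping integer-valued stats only."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        value = value.strip()
        if sep and value.isdigit():
            stats[name] = int(value)
    return stats


if __name__ == "__main__":
    sample = (
        "cluster.backend_service.upstream_rq_total: 42\n"
        "cluster.backend_service.upstream_rq_5xx: 3\n"
        "cluster.backend_service.upstream_rq_time: P0(nan,0) P50(nan,2)\n"
    )
    stats = parse_stats(sample)
    total = stats["cluster.backend_service.upstream_rq_total"]
    errors = stats["cluster.backend_service.upstream_rq_5xx"]
    print(f"5xx ratio: {errors / total:.2%}")
```

A script like this is handy for quick checks during debugging. For continuous monitoring, a stats sink or a metrics scraper is the better fit.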
Conclusion
Envoy's architecture and rich feature set make it ideal for service mesh implementations. Key takeaways:
- Dynamic configuration via xDS APIs enables zero-downtime updates
- Rich load balancing and health checking for reliability
- Circuit breaking and retries prevent cascading failures
- Observability built-in with metrics, logging, and tracing
- Extensibility through filters and Lua scripting
Start with static configuration to understand the basics, then move to dynamic configuration for production deployments. Whether you're using Envoy standalone or as part of a service mesh, understanding its capabilities is essential for building resilient microservices architectures.