AWS Lambda and serverless computing have been getting a lot of attention lately. The promise of running code without managing servers is appealing, especially for event-driven workloads. But can serverless work for security-critical key management services? I’ve been evaluating Lambda for parts of our platform, and the answer is nuanced.
The Serverless Promise
Serverless (specifically AWS Lambda) offers compelling benefits:
No server management: AWS handles scaling, patching, and availability. We write functions instead of managing infrastructure.
Pay per use: Charged only for execution time, not idle capacity. For services with variable load, this could reduce costs significantly.
Automatic scaling: Lambda scales from zero to thousands of concurrent executions automatically.
Event-driven: Natural fit for event-driven architectures. An S3 upload triggers a function, an SNS message triggers a function, and so on.
For certain workloads, these benefits are game-changing. But key management has specific requirements that challenge the serverless model.
Stateless by Design
Lambda functions are ephemeral and stateless. Each invocation might run on different infrastructure, so state can't be relied upon to persist in memory between invocations.
For many applications, this is fine - read input, process, write output. But key management services often maintain state:
HSM connections: Establishing an HSM connection requires authentication and a handshake, taking 100-200ms. We cache connections across requests to avoid this overhead.
Decrypted keys: Data encryption keys decrypted from master keys are cached in memory for reuse. Eliminates repeated HSM calls.
Connection pools: Database and service connection pools improve performance by reusing connections.
Warming: The JVM's JIT compiler and some libraries optimize as code paths warm up. Cold processes run slower.
Lambda's execution model works against these optimizations. State held outside the handler does survive on a warm execution environment, but reuse is best-effort: every invocation potentially pays cold start and connection establishment costs.
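The standard mitigation is to initialize expensive resources in static fields, outside the handler, so warm invocations reuse them. A minimal sketch of the pattern; `HsmClient` and its methods are hypothetical stand-ins for a real PKCS#11 wrapper, not an actual library:

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

public class KeyOperationHandler implements RequestHandler<String, String> {

    // Initialized once per execution environment, during the cold start.
    // Warm invocations reuse the same connection, but reuse is best-effort,
    // so real code must tolerate the connection having gone stale.
    private static final HsmClient hsm = HsmClient.connect(); // hypothetical wrapper

    @Override
    public String handleRequest(String keyId, Context context) {
        // Hot path: no connection setup on warm invocations.
        return hsm.wrapKey(keyId); // hypothetical operation
    }
}
```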
Cold Start Performance
Lambda cold starts occur when a function is invoked for the first time or after sitting idle. AWS must provision a container, load the function code, and initialize the runtime.
For Java (which most of our services use), cold starts are painful:
- JVM initialization: 1-2 seconds
- Loading classes: 500ms - 1s
- Application initialization: 200-500ms
Total cold start: 2-4 seconds for Java Lambda functions.
For Python or Node.js functions, cold starts are much faster (100-300ms). For compiled languages like Go, somewhere in between (500ms-1s).
When customers expect sub-100ms latency for key operations, 2-4 second cold starts are unacceptable. Even 500ms is problematic.
Warming Strategies
To mitigate cold starts, we’ve explored warming strategies:
Scheduled invocations: CloudWatch Events rules invoke functions every 5 minutes to keep them warm (the sketch after this list shows how a function recognizes these pings).
Provisioned concurrency: AWS recently introduced this - maintain a minimum number of pre-initialized function instances.
Request pipelining: Keep functions busy with continuous traffic to prevent them from going cold.
These work but add complexity and cost. Scheduled invocations mean paying for compute we don't need. Provisioned concurrency reintroduces a fixed baseline cost, eroding the pay-per-use savings that made serverless attractive in the first place.
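For scheduled warming, the function needs to recognize warm-up pings so they return before doing real work. A minimal sketch; the `{"warmup": true}` payload shape is our own convention, not an AWS standard:

```java
import java.util.Map;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

public class WarmableHandler implements RequestHandler<Map<String, Object>, String> {

    @Override
    public String handleRequest(Map<String, Object> event, Context context) {
        // CloudWatch Events delivers whatever constant payload the rule defines.
        // We send {"warmup": true} and return before touching real dependencies.
        if (Boolean.TRUE.equals(event.get("warmup"))) {
            return "warmed";
        }
        return doRealWork(event);
    }

    private String doRealWork(Map<String, Object> event) {
        // ... actual business logic ...
        return "ok";
    }
}
```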
HSM Integration Challenges
Our services integrate with HSMs using PKCS#11 libraries. Lambda introduces several challenges:
Native libraries: PKCS#11 libraries are compiled C/C++ and must be packaged with the Lambda deployment artifact. This works but complicates deployment.
Network access: HSMs are network appliances, so Lambda functions need VPC configuration to reach them. VPC-configured functions pay an additional cold start penalty.
Connection management: We can't maintain guaranteed persistent HSM connections across Lambda invocations. Each invocation may establish a new connection, adding latency and HSM load (see the sketch below).
Session limits: HSMs have finite concurrent sessions. Lambda’s auto-scaling could overwhelm HSM session capacity.
For services making occasional HSM calls, Lambda could work. For high-throughput HSM operations, traditional services are better suited.
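To amortize connection cost across warm invocations, the provider can be loaded with the same static-initialization pattern shown earlier. A sketch using the JDK's built-in SunPKCS11 provider (Java 9+ syntax; the config path is illustrative, and the .cfg file would name the vendor's PKCS#11 shared library bundled in the deployment package):

```java
import java.security.KeyStore;
import java.security.Provider;
import java.security.Security;

public class HsmSessionHolder {

    // Loaded once per execution environment. /var/task is where Lambda
    // unpacks the deployment package; the .cfg names the vendor's .so file.
    private static final Provider PKCS11 =
            Security.getProvider("SunPKCS11").configure("/var/task/hsm-pkcs11.cfg");

    static {
        Security.addProvider(PKCS11);
    }

    public static KeyStore openSession(char[] pin) throws Exception {
        // Logs in to the HSM. On a warm container the provider is already
        // loaded, so only the session login cost is paid per cold start.
        KeyStore ks = KeyStore.getInstance("PKCS11", PKCS11);
        ks.load(null, pin);
        return ks;
    }
}
```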
Secrets Management
Lambda functions need secrets (database passwords, HSM credentials, API keys). Options:
Environment variables: Simple, but secrets are visible in the Lambda configuration. Not suitable for highly sensitive credentials.
Parameter Store/Secrets Manager: AWS services for secret storage. Lambda fetches secrets at startup. Adds latency but more secure.
KMS encryption: Encrypt secrets and decrypt in Lambda function. Requires KMS permissions and adds complexity.
For our security requirements, environment variables aren’t sufficient. Fetching from Secrets Manager adds cold start latency but is necessary.
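Fetching once during static initialization keeps that latency out of the per-request path. A sketch with the AWS SDK for Java v2; the secret name is illustrative, and the function's IAM role would need secretsmanager:GetSecretValue on it:

```java
import software.amazon.awssdk.services.secretsmanager.SecretsManagerClient;
import software.amazon.awssdk.services.secretsmanager.model.GetSecretValueRequest;

public class SecretHolder {

    // Fetched once per execution environment during the cold start;
    // warm invocations read the cached value.
    private static final String HSM_CREDENTIALS = fetchSecret("prod/hsm-credentials");

    private static String fetchSecret(String secretId) {
        try (SecretsManagerClient client = SecretsManagerClient.create()) {
            return client.getSecretValue(
                    GetSecretValueRequest.builder().secretId(secretId).build()
            ).secretString();
        }
    }

    public static String hsmCredentials() {
        return HSM_CREDENTIALS;
    }
}
```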
Use Cases Where Lambda Makes Sense
Despite challenges, some key management use cases fit Lambda well:
Audit log processing: A Lambda function triggered by new audit log entries processes, enriches, and forwards them to long-term storage. Event-driven, doesn't need low latency (sketched after this list).
Compliance reporting: Scheduled Lambda functions generate compliance reports from audit data. Runs weekly, performance not critical.
Key rotation: Scheduled Lambda triggers key rotation workflows. Happens daily or weekly, not latency-sensitive.
Webhook handlers: External systems send webhooks into our platform. Lambda functions receive them, validate, and forward to a processing queue.
These are event-driven, not latency-sensitive, and don’t require persistent state. Lambda is a good fit.
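For illustration, a skeleton of the audit log processor, assuming the entries arrive via an SQS queue; the enrichment and forwarding steps are stubbed:

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.SQSEvent;

public class AuditLogProcessor implements RequestHandler<SQSEvent, Void> {

    @Override
    public Void handleRequest(SQSEvent event, Context context) {
        for (SQSEvent.SQSMessage message : event.getRecords()) {
            String entry = message.getBody();
            // Enrich (e.g., resolve principal names) and forward to long-term
            // storage. A multi-second cold start is irrelevant here.
            forwardToArchive(enrich(entry));
        }
        return null;
    }

    private String enrich(String entry) { return entry; }      // stub
    private void forwardToArchive(String entry) { /* stub */ } // stub
}
```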
What We’re Not Using Lambda For
Based on evaluation, we’re not using Lambda for:
Hot path key operations: Encrypt, decrypt, sign, verify operations that customers invoke directly. Latency and throughput requirements don’t fit Lambda’s model.
HSM proxy service: Requires persistent HSM connections and sub-millisecond latency.
Policy evaluation: Happens on every key operation. Even minor Lambda overhead adds up.
Real-time monitoring: Continuous processing of metrics and logs requires persistent processes.
These remain in containerized services on Kubernetes, where we control scaling and connection management and get predictable performance.
Cost Analysis
For services we could run on Lambda, the cost comparison comes down to utilization:
Traditional container: Fixed cost. Pay for capacity whether used or not. Predictable.
Lambda: Variable cost. Pay only for execution time. Lower at low utilization, potentially higher at very high utilization.
For our audit log processing (runs sporadically), Lambda is cheaper. For always-on services, containers are more cost-effective.
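A back-of-the-envelope comparison makes the crossover concrete. Assume a 512MB function averaging 200ms per invocation, and illustrative figures close to AWS's published rates ($0.20 per million requests, roughly $0.0000167 per GB-second):

- Each invocation uses 0.5 GB × 0.2s = 0.1 GB-seconds, about $0.00000167 of compute.
- At 1 million invocations/month: ~$1.67 compute + $0.20 requests, roughly $2/month. Far cheaper than any always-on instance.
- At 100 million invocations/month: roughly $187/month, at which point a couple of fixed-size containers handling the same load start winning.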
We’ve moved several low-traffic services to Lambda, saving on infrastructure costs.
Development and Testing
Lambda changes development workflow:
Local testing: We can't run Lambda exactly as it runs in AWS, so we use frameworks (SAM, Serverless Framework) that simulate the Lambda environment.
Deployment: Deploy code to AWS to test in real Lambda environment. Slower iteration than local containers.
Debugging: Debugging production Lambda issues requires CloudWatch Logs analysis. No SSH access to investigate.
Versioning: Lambda supports versions and aliases. A new version can receive a small percentage of traffic via a weighted alias, similar to a canary deployment (sketched below).
The workflow is different from that of containerized services. Not worse, just different. The team had to learn new patterns.
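A sketch of the weighted-alias canary using the AWS SDK for Java v2, shifting 10% of traffic to a new version; the function name and version numbers are illustrative:

```java
import java.util.Map;
import software.amazon.awssdk.services.lambda.LambdaClient;
import software.amazon.awssdk.services.lambda.model.AliasRoutingConfiguration;
import software.amazon.awssdk.services.lambda.model.UpdateAliasRequest;

public class CanaryShift {
    public static void main(String[] args) {
        try (LambdaClient lambda = LambdaClient.create()) {
            // The "live" alias keeps pointing at version 41 while routing
            // 10% of invocations to version 42; widen the weight as
            // confidence in the new version grows.
            lambda.updateAlias(UpdateAliasRequest.builder()
                    .functionName("audit-log-processor")
                    .name("live")
                    .functionVersion("41")
                    .routingConfig(AliasRoutingConfiguration.builder()
                            .additionalVersionWeights(Map.of("42", 0.10))
                            .build())
                    .build());
        }
    }
}
```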
Monitoring and Observability
Lambda monitoring relies on CloudWatch:
Logs: Function output goes to CloudWatch Logs. Can search and set alarms.
Metrics: Invocations, duration, errors, throttles automatically tracked in CloudWatch Metrics.
X-Ray: AWS X-Ray provides distributed tracing for Lambda functions.
We forward Lambda logs to our Elasticsearch cluster for unified visibility across Lambda and non-Lambda services.
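Alarm creation can be scripted alongside deployment. As one example, a minimal sketch with the AWS SDK for Java v2 that alarms on a function's Errors metric; names and thresholds are illustrative:

```java
import software.amazon.awssdk.services.cloudwatch.CloudWatchClient;
import software.amazon.awssdk.services.cloudwatch.model.ComparisonOperator;
import software.amazon.awssdk.services.cloudwatch.model.Dimension;
import software.amazon.awssdk.services.cloudwatch.model.PutMetricAlarmRequest;
import software.amazon.awssdk.services.cloudwatch.model.Statistic;

public class LambdaErrorAlarm {
    public static void main(String[] args) {
        try (CloudWatchClient cw = CloudWatchClient.create()) {
            // Alarm if the function reports any errors in a 5-minute window.
            cw.putMetricAlarm(PutMetricAlarmRequest.builder()
                    .alarmName("audit-log-processor-errors")
                    .namespace("AWS/Lambda")
                    .metricName("Errors")
                    .dimensions(Dimension.builder()
                            .name("FunctionName")
                            .value("audit-log-processor")
                            .build())
                    .statistic(Statistic.SUM)
                    .period(300)
                    .evaluationPeriods(1)
                    .threshold(1.0)
                    .comparisonOperator(
                            ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD)
                    .build());
        }
    }
}
```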
Security Considerations
Lambda has interesting security properties:
Isolation: Each function execution runs in an isolated environment. No state is shared between invocations unless we cache it deliberately.
IAM integration: Fine-grained IAM roles control what each function can access.
Temporary compute: Once the execution environment is recycled, nothing persists. Reduces attack surface.
Automatic patching: AWS handles OS and runtime patching, removing that slice of vulnerability management from our plate.
But also challenges:
Cold start libraries: Functions load their dependencies at cold start. A compromised dependency could affect the function.
Limited visibility: Can’t inspect running Lambda containers like we can with Kubernetes pods.
Shared infrastructure: Lambda runs on multi-tenant infrastructure. We have to trust AWS's isolation boundaries.
Overall, Lambda’s security posture is reasonable for our use cases.
Looking Forward
Serverless is evolving rapidly:
Faster cold starts: AWS is improving cold start performance.
Better connection management: Features like RDS Proxy help Lambda functions reuse database connections.
Provisioned concurrency: Addresses cold start issue for latency-sensitive functions (at additional cost).
Step Functions: Orchestrate multiple Lambda functions for complex workflows.
Container support: AWS announced container image support for Lambda, providing more control over the environment.
We’ll continue evaluating as Lambda matures. For certain use cases, it’s already viable.
Key Takeaways
For teams considering serverless for security/key management:
- Cold starts are a real issue for latency-sensitive workloads
- Stateless model doesn’t fit services that benefit from persistent connections
- Great for event-driven, sporadic workloads
- Not suitable for hot path cryptographic operations
- Cost savings depend heavily on usage patterns
- Development workflow differs from containerized services
- Security properties are reasonable but require different mindset
Serverless isn’t a replacement for all services. It’s another tool in the toolbox, appropriate for specific use cases. We’re using it where it fits and avoiding it where it doesn’t.
The key is understanding your requirements deeply and being pragmatic about technology choices. Serverless hype is real, but so are its limitations. Use it where it makes sense, not everywhere just because it’s new and exciting.