Encrypting data is easy. Encrypting millions of records per second while maintaining security, performance, and operational simplicity is hard.
We started with a simple encryption service: encrypt data, store it, decrypt on demand. As we scaled, we hit performance bottlenecks, operational complexity, and key management nightmares.
After rebuilding our encryption architecture three times, I've learned what actually works at scale.
Here's how to build encryption systems that scale.
The Naive Approach (And Why It Fails)
The simplest encryption architecture:
type SimpleEncryptionService struct {
    masterKey []byte
}

func (s *SimpleEncryptionService) Encrypt(plaintext []byte) ([]byte, error) {
    // Create cipher
    block, err := aes.NewCipher(s.masterKey)
    if err != nil {
        return nil, err
    }
    // Use GCM mode
    gcm, err := cipher.NewGCM(block)
    if err != nil {
        return nil, err
    }
    // Generate nonce
    nonce := make([]byte, gcm.NonceSize())
    if _, err := rand.Read(nonce); err != nil {
        return nil, err
    }
    // Encrypt
    ciphertext := gcm.Seal(nonce, nonce, plaintext, nil)
    return ciphertext, nil
}
func (s *SimpleEncryptionService) Decrypt(ciphertext []byte) ([]byte, error) {
    block, err := aes.NewCipher(s.masterKey)
    if err != nil {
        return nil, err
    }
    gcm, err := cipher.NewGCM(block)
    if err != nil {
        return nil, err
    }
    nonceSize := gcm.NonceSize()
    if len(ciphertext) < nonceSize {
        return nil, errors.New("ciphertext too short")
    }
    nonce, ciphertext := ciphertext[:nonceSize], ciphertext[nonceSize:]
    plaintext, err := gcm.Open(nil, nonce, ciphertext, nil)
    if err != nil {
        return nil, err
    }
    return plaintext, nil
}
This works at small scale. At large scale, the problems pile up:
Problem 1: Single Key for Everything
All data encrypted with one key. If that key is compromised, all data is exposed. Key rotation means re-encrypting everything.
Problem 2: No Key Versioning
Can't rotate keys without breaking old ciphertexts.
Problem 3: No Access Control
Anyone with the master key can decrypt anything.
Problem 4: Performance Bottleneck
All encryption operations go through one service. Doesn't scale horizontally.
Problem 5: Key Management Nightmare
Where do you store the master key securely? How do you backup? How do you share across services?
We need a better architecture.
Envelope Encryption Pattern
Instead of encrypting data directly with a master key, use envelope encryption:
┌──────────────────────────────────────┐
│       Master Key (in HSM or KMS)     │
└──────────────────┬───────────────────┘
                   │
                   │ encrypts
                   ▼
┌──────────────────────────────────────┐
│      Data Encryption Key (DEK)       │
│      (unique per record or batch)    │
└──────────────────┬───────────────────┘
                   │
                   │ encrypts
                   ▼
┌──────────────────────────────────────┐
│             Actual Data              │
└──────────────────────────────────────┘
How it works:
- Generate unique DEK for each record (or batch)
- Encrypt data with DEK
- Encrypt DEK with master key
- Store encrypted data + encrypted DEK together
Implementation:
type EnvelopeEncryptionService struct {
    kms KeyManagementService
}

type EncryptedData struct {
    Ciphertext   []byte
    EncryptedDEK []byte
    KeyID        string
    Algorithm    string
}

func (s *EnvelopeEncryptionService) Encrypt(plaintext []byte, keyID string) (*EncryptedData, error) {
    // Generate random DEK
    dek := make([]byte, 32) // 256-bit key
    if _, err := rand.Read(dek); err != nil {
        return nil, err
    }
    // Encrypt data with DEK
    ciphertext, err := encryptAESGCM(plaintext, dek)
    if err != nil {
        return nil, err
    }
    // Encrypt DEK with master key (via KMS)
    encryptedDEK, err := s.kms.Encrypt(keyID, dek)
    if err != nil {
        return nil, err
    }
    return &EncryptedData{
        Ciphertext:   ciphertext,
        EncryptedDEK: encryptedDEK,
        KeyID:        keyID,
        Algorithm:    "AES-256-GCM",
    }, nil
}
func (s *EnvelopeEncryptionService) Decrypt(data *EncryptedData) ([]byte, error) {
    // Decrypt DEK using KMS
    dek, err := s.kms.Decrypt(data.KeyID, data.EncryptedDEK)
    if err != nil {
        return nil, err
    }
    defer zero(dek) // Clear key from memory
    // Decrypt data with DEK
    plaintext, err := decryptAESGCM(data.Ciphertext, dek)
    if err != nil {
        return nil, err
    }
    return plaintext, nil
}
func encryptAESGCM(plaintext, key []byte) ([]byte, error) {
    block, err := aes.NewCipher(key)
    if err != nil {
        return nil, err
    }
    gcm, err := cipher.NewGCM(block)
    if err != nil {
        return nil, err
    }
    nonce := make([]byte, gcm.NonceSize())
    if _, err := rand.Read(nonce); err != nil {
        return nil, err
    }
    return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// decryptAESGCM is the counterpart used by Decrypt above.
func decryptAESGCM(ciphertext, key []byte) ([]byte, error) {
    block, err := aes.NewCipher(key)
    if err != nil {
        return nil, err
    }
    gcm, err := cipher.NewGCM(block)
    if err != nil {
        return nil, err
    }
    if len(ciphertext) < gcm.NonceSize() {
        return nil, errors.New("ciphertext too short")
    }
    nonce, body := ciphertext[:gcm.NonceSize()], ciphertext[gcm.NonceSize():]
    return gcm.Open(nil, nonce, body, nil)
}

func zero(b []byte) {
    for i := range b {
        b[i] = 0
    }
}
Benefits:
- Performance: Data encryption/decryption is fast (local AES), only key operations use KMS
- Key rotation: Change master key without re-encrypting all data
- Isolation: Different records use different DEKs
- Revocation: Revoke access to master key, old data becomes inaccessible
This is the foundation. Now let's optimize further.
Key Hierarchy
Don't use the master key directly. Build a hierarchy:
               ┌───────────────────────────────┐
               │         Root Key (HSM)        │
               └───────────────┬───────────────┘
                               │
               ┌───────────────┼───────────────┐
               ▼               ▼               ▼
         ┌───────────┐   ┌───────────┐   ┌───────────┐
         │Region Key │   │Region Key │   │Region Key │
         │  US-East  │   │  US-West  │   │    EU     │
         └─────┬─────┘   └─────┬─────┘   └─────┬─────┘
               │               │               │
            ┌──┴──┐         ┌──┴──┐         ┌──┴──┐
            ▼     ▼         ▼     ▼         ▼     ▼
        Service Service Service Service Service Service
          Key     Key     Key     Key     Key     Key
           │       │       │       │       │       │
           ▼       ▼       ▼       ▼       ▼       ▼
         Data    Data    Data    Data    Data    Data
Implementation:
type KeyHierarchy struct {
    rootKey     *HSMKey
    regionKeys  map[string]*RegionKey
    serviceKeys map[string]*ServiceKey
}

type RegionKey struct {
    ID           string
    EncryptedKey []byte
    Region       string
    CreatedAt    time.Time
}

type ServiceKey struct {
    ID           string
    EncryptedKey []byte
    Service      string
    Region       string
    CreatedAt    time.Time
}

func (kh *KeyHierarchy) GetServiceKey(service, region string) (*ServiceKey, error) {
    keyID := fmt.Sprintf("%s-%s", service, region)
    // Check cache
    if key, ok := kh.serviceKeys[keyID]; ok {
        return key, nil
    }
    // Get region key
    regionKey, err := kh.getRegionKey(region)
    if err != nil {
        return nil, err
    }
    // Decrypt service key using region key
    serviceKey, err := kh.loadServiceKey(keyID, regionKey)
    if err != nil {
        return nil, err
    }
    // Cache
    kh.serviceKeys[keyID] = serviceKey
    return serviceKey, nil
}
func (kh *KeyHierarchy) RotateRootKey() error {
    // Generate new root key
    newRootKey, err := kh.generateRootKey()
    if err != nil {
        return err
    }
    // Re-encrypt all region keys with the new root key
    for region, regionKey := range kh.regionKeys {
        // Decrypt with old root key
        decryptedKey, err := kh.rootKey.Decrypt(regionKey.EncryptedKey)
        if err != nil {
            return err
        }
        // Encrypt with new root key
        encryptedKey, err := newRootKey.Encrypt(decryptedKey)
        zero(decryptedKey) // wipe plaintext key material
        if err != nil {
            return err
        }
        // Update region key
        regionKey.EncryptedKey = encryptedKey
        kh.regionKeys[region] = regionKey
        // Persist to database
        if err := db.Save(regionKey); err != nil {
            return err
        }
    }
    // Switch to new root key
    kh.rootKey = newRootKey
    return nil
}
Benefits:
- Geographic isolation: EU data uses EU keys, stays in EU
- Service isolation: Compromise of one service doesn't expose all data
- Efficient rotation: Rotate root key by re-encrypting region keys (not all data)
- Access control: Grant service access to specific region keys only
Caching and Performance
KMS calls are expensive (latency and cost). Cache aggressively.
DEK Caching
Cache decrypted DEKs for frequent access:
type DEKCache struct {
    cache *lru.Cache
    ttl   time.Duration
    mu    sync.Mutex
}

type CachedDEK struct {
    Key       []byte
    ExpiresAt time.Time
}

func NewDEKCache(size int, ttl time.Duration) *DEKCache {
    cache, _ := lru.New(size)
    return &DEKCache{
        cache: cache,
        ttl:   ttl,
    }
}

func (c *DEKCache) Get(encryptedDEK []byte) ([]byte, bool) {
    // Full lock, not a read lock: expired entries are evicted here
    c.mu.Lock()
    defer c.mu.Unlock()
    key := string(encryptedDEK)
    if val, ok := c.cache.Get(key); ok {
        cached := val.(*CachedDEK)
        if time.Now().Before(cached.ExpiresAt) {
            return cached.Key, true
        }
        // Expired, remove
        c.cache.Remove(key)
    }
    return nil, false
}

func (c *DEKCache) Put(encryptedDEK, dek []byte) {
    c.mu.Lock()
    defer c.mu.Unlock()
    key := string(encryptedDEK)
    cached := &CachedDEK{
        Key:       dek,
        ExpiresAt: time.Now().Add(c.ttl),
    }
    c.cache.Add(key, cached)
}

// Use in encryption service
func (s *EnvelopeEncryptionService) DecryptWithCache(data *EncryptedData) ([]byte, error) {
    // Try cache first
    if dek, ok := s.dekCache.Get(data.EncryptedDEK); ok {
        return decryptAESGCM(data.Ciphertext, dek)
    }
    // Cache miss, decrypt via KMS
    dek, err := s.kms.Decrypt(data.KeyID, data.EncryptedDEK)
    if err != nil {
        return nil, err
    }
    // Cache for future use
    s.dekCache.Put(data.EncryptedDEK, dek)
    return decryptAESGCM(data.Ciphertext, dek)
}
Performance impact:
- Without cache: 10-50ms per decrypt (KMS call)
- With cache: <1ms per decrypt (local operation)
Cache hit rate >95% for hot data.
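To back a claim like ">95%" with real numbers, track hits and misses explicitly and export the ratio as a gauge. A minimal sketch, assuming a hypothetical `HitRateTracker` type (the Prometheus wiring is omitted):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// HitRateTracker counts cache hits and misses so the ratio can be
// exported as a gauge to a metrics system.
type HitRateTracker struct {
	hits   atomic.Int64
	misses atomic.Int64
}

func (t *HitRateTracker) Hit()  { t.hits.Add(1) }
func (t *HitRateTracker) Miss() { t.misses.Add(1) }

// Rate returns the fraction of lookups served from cache.
func (t *HitRateTracker) Rate() float64 {
	h, m := t.hits.Load(), t.misses.Load()
	if h+m == 0 {
		return 0
	}
	return float64(h) / float64(h+m)
}

func main() {
	var t HitRateTracker
	for i := 0; i < 95; i++ {
		t.Hit()
	}
	for i := 0; i < 5; i++ {
		t.Miss()
	}
	fmt.Printf("hit rate: %.2f\n", t.Rate()) // hit rate: 0.95
}
```

Call `Hit()` on the cache-hit path of `DecryptWithCache` and `Miss()` on the KMS fallback path, then alert when the rate drops.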
Batch Encryption
Encrypt multiple records with the same DEK:
type BatchEncryption struct {
    DEK          []byte // plaintext DEK, in-memory only; never serialize this field
    EncryptedDEK []byte
    Records      []*EncryptedRecord
}

type EncryptedRecord struct {
    ID         string
    Ciphertext []byte
}

func (s *EnvelopeEncryptionService) EncryptBatch(records [][]byte, keyID string) (*BatchEncryption, error) {
    // Generate one DEK for the batch
    dek := make([]byte, 32)
    if _, err := rand.Read(dek); err != nil {
        return nil, err
    }
    // Encrypt DEK with master key
    encryptedDEK, err := s.kms.Encrypt(keyID, dek)
    if err != nil {
        return nil, err
    }
    batch := &BatchEncryption{
        EncryptedDEK: encryptedDEK,
        Records:      make([]*EncryptedRecord, len(records)),
    }
    // Encrypt all records with the same DEK
    for i, plaintext := range records {
        ciphertext, err := encryptAESGCM(plaintext, dek)
        if err != nil {
            return nil, err
        }
        batch.Records[i] = &EncryptedRecord{
            ID:         uuid.New().String(),
            Ciphertext: ciphertext,
        }
    }
    zero(dek)
    return batch, nil
}
Benefits:
- One KMS call for many records
- Throughput increases 10-100x for bulk operations
Use for batch ETL jobs, data imports, log ingestion.
Client-Side Encryption
Move encryption to clients for maximum security and performance:
┌──────────┐
│  Client  │
└────┬─────┘
     │ 1. Request DEK
     ▼
┌─────────────┐
│ KMS Server  │
└────┬────────┘
     │ 2. Return encrypted DEK
     ▼
┌──────────┐
│  Client  │───┐
└────┬─────┘   │ 3. Decrypt DEK locally
     │         │ 4. Encrypt data with DEK
     │ 5. Store encrypted data + encrypted DEK
     ▼
┌──────────┐
│ Database │
└──────────┘
Implementation:
// Client-side encryption library
type ClientEncryption struct {
kmsClient KMSClient
dekCache *DEKCache
}
func (c *ClientEncryption) Encrypt(plaintext []byte) (*EncryptedData, error) {
// Get DEK from KMS (with caching)
dek, encryptedDEK, err := c.getDEK()
if err != nil {
return nil, err
}
// Encrypt locally
ciphertext, err := encryptAESGCM(plaintext, dek)
if err != nil {
return nil, err
}
return &EncryptedData{
Ciphertext: ciphertext,
EncryptedDEK: encryptedDEK,
KeyID: c.keyID,
}, nil
}
func (c *ClientEncryption) getDEK() ([]byte, []byte, error) {
    // Check cache
    if dek := c.dekCache.GetCurrent(); dek != nil {
        return dek.Plaintext, dek.Encrypted, nil
    }
    // Request new DEK from KMS
    resp, err := c.kmsClient.GenerateDataKey(c.keyID, 32)
    if err != nil {
        return nil, nil, err
    }
    // Cache DEK
    c.dekCache.SetCurrent(&DEK{
        Plaintext: resp.Plaintext,
        Encrypted: resp.Encrypted,
        ExpiresAt: time.Now().Add(1 * time.Hour),
    })
    return resp.Plaintext, resp.Encrypted, nil
}
Benefits:
- Security: Data never transmitted unencrypted
- Performance: No server-side encryption overhead
- Scalability: Encryption load distributed to clients
Tradeoffs:
- Clients need access to KMS
- Key management complexity on client side
- Need to trust client implementation
Key Rotation
Rotate keys regularly without downtime.
Versioned Keys
type VersionedKey struct {
    ID        string
    Version   int
    Key       []byte
    Active    bool
    CreatedAt time.Time
}

type KeyManager struct {
    keys map[string][]*VersionedKey // keyID -> versions
}

func (km *KeyManager) Encrypt(keyID string, plaintext []byte) (*EncryptedData, error) {
    // Get active version
    activeKey := km.getActiveKey(keyID)
    // Encrypt
    ciphertext, err := encryptAESGCM(plaintext, activeKey.Key)
    if err != nil {
        return nil, err
    }
    return &EncryptedData{
        Ciphertext: ciphertext,
        KeyID:      keyID,
        KeyVersion: activeKey.Version,
    }, nil
}

func (km *KeyManager) Decrypt(data *EncryptedData) ([]byte, error) {
    // Get specific version
    key := km.getKeyVersion(data.KeyID, data.KeyVersion)
    if key == nil {
        return nil, errors.New("key version not found")
    }
    return decryptAESGCM(data.Ciphertext, key.Key)
}
func (km *KeyManager) RotateKey(keyID string) error {
    // Generate new version
    newVersion := &VersionedKey{
        ID:        keyID,
        Version:   km.getNextVersion(keyID),
        Key:       generateKey(32),
        Active:    true,
        CreatedAt: time.Now(),
    }
    // Deactivate old version: new encryptions use the new version,
    // old ciphertexts still decrypt with their recorded version
    if oldKey := km.getActiveKey(keyID); oldKey != nil {
        oldKey.Active = false
    }
    // Add new version
    km.keys[keyID] = append(km.keys[keyID], newVersion)
    return nil
}
Background Re-encryption
Gradually re-encrypt data with new key:
type ReencryptionJob struct {
    keyID      string
    oldVersion int
    newVersion int
}

func (job *ReencryptionJob) Run() error {
    // Find records encrypted with the old key version
    var records []EncryptedRecord
    db.Find(&records, "key_id = ? AND key_version = ?", job.keyID, job.oldVersion)
    for _, record := range records {
        // Decrypt with old key
        plaintext, err := decrypt(record.Ciphertext, job.keyID, job.oldVersion)
        if err != nil {
            log.Error("Decryption failed", "record", record.ID, "error", err)
            continue
        }
        // Encrypt with new key
        ciphertext, err := encrypt(plaintext, job.keyID, job.newVersion)
        if err != nil {
            log.Error("Encryption failed", "record", record.ID, "error", err)
            continue
        }
        // Update record
        record.Ciphertext = ciphertext
        record.KeyVersion = job.newVersion
        db.Save(&record)
        // Rate limit to avoid overload
        time.Sleep(10 * time.Millisecond)
    }
    return nil
}
Run as background job. Old data gradually migrates to new key.
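The job above loads every matching record in one query; at scale you would page through with a keyset cursor and persist a checkpoint so a crashed job can resume where it left off. A sketch of that loop, using an in-memory slice as a stand-in for the database (`Record` and `fetchPage` are illustrative, not our production schema):

```go
package main

import "fmt"

// Record stands in for a database row keyed by an increasing ID.
type Record struct {
	ID         int
	KeyVersion int
}

// fetchPage returns up to limit records with ID > afterID that still use
// oldVersion, simulating a keyset-paginated query
// (WHERE id > ? AND key_version = ? ORDER BY id LIMIT ?).
func fetchPage(all []Record, afterID, oldVersion, limit int) []Record {
	var page []Record
	for _, r := range all {
		if r.ID > afterID && r.KeyVersion == oldVersion {
			page = append(page, r)
			if len(page) == limit {
				break
			}
		}
	}
	return page
}

func main() {
	store := make([]Record, 10)
	for i := range store {
		store[i] = Record{ID: i + 1, KeyVersion: 1}
	}
	checkpoint := 0 // last processed ID; persist this to resume after a crash
	for {
		page := fetchPage(store, checkpoint, 1, 3)
		if len(page) == 0 {
			break
		}
		for _, r := range page {
			store[r.ID-1].KeyVersion = 2 // decrypt, re-encrypt, bump version
			checkpoint = r.ID
		}
	}
	fmt.Println(store[0].KeyVersion, store[9].KeyVersion) // 2 2
}
```

The rate limiting from the job above still applies between pages; the checkpoint just makes the migration restartable.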
Access Control and Audit
Encryption is useless without access control.
Policy-Based Access
type EncryptionPolicy struct {
    KeyID        string
    AllowedRoles []string
    AllowedIPs   []string
    RequiresMFA  bool
}

type AccessRequest struct {
    Principal string
    Roles     []string
    IPAddress string
    MFAToken  string
}

func (p *EncryptionPolicy) Authorize(req *AccessRequest) error {
    // Check role: at least one of the caller's roles must be allowed
    if !hasAnyRole(p.AllowedRoles, req.Roles) {
        return errors.New("role not authorized")
    }
    // Check IP allowlist (an empty list means no IP restriction)
    if len(p.AllowedIPs) > 0 && !containsString(p.AllowedIPs, req.IPAddress) {
        return errors.New("IP not authorized")
    }
    // Check MFA
    if p.RequiresMFA && req.MFAToken == "" {
        return errors.New("MFA required")
    }
    return nil
}

func hasAnyRole(allowed, roles []string) bool {
    for _, r := range roles {
        if containsString(allowed, r) {
            return true
        }
    }
    return false
}

func containsString(list []string, s string) bool {
    for _, v := range list {
        if v == s {
            return true
        }
    }
    return false
}
// Use in KMS
func (kms *KMS) Decrypt(req *DecryptRequest) ([]byte, error) {
    // Load policy
    policy := kms.getPolicy(req.KeyID)
    // Authorize
    if err := policy.Authorize(req.AccessRequest); err != nil {
        kms.auditLog(req, "DENIED", err.Error())
        return nil, err
    }
    // Decrypt
    plaintext, err := kms.decrypt(req.KeyID, req.Ciphertext)
    // Audit
    kms.auditLog(req, "ALLOWED", "")
    return plaintext, err
}
Audit Logging
Log every encryption operation:
type AuditLog struct {
    Timestamp time.Time
    Operation string // "ENCRYPT", "DECRYPT", "ROTATE"
    KeyID     string
    Principal string
    IPAddress string
    Result    string // "SUCCESS", "DENIED", "ERROR"
    ErrorMsg  string
}

func (kms *KMS) auditLog(req *Request, result, errorMsg string) {
    entry := &AuditLog{
        Timestamp: time.Now(),
        Operation: req.Operation,
        KeyID:     req.KeyID,
        Principal: req.Principal,
        IPAddress: req.IPAddress,
        Result:    result,
        ErrorMsg:  errorMsg,
    }
    // Write to secure audit log
    kms.auditStore.Write(entry)
    // Send to SIEM for monitoring
    kms.siem.Send(entry)
}
Audit logs must be immutable and tamper-evident.
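One common way to make a log tamper-evident is hash chaining: each entry commits to the hash of its predecessor, so rewriting or deleting any entry breaks every later link. A minimal sketch with simplified string payloads (our production format carries the full `AuditLog` fields):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// ChainedEntry links each audit record to the hash of the previous one.
type ChainedEntry struct {
	Payload  string
	PrevHash string
	Hash     string
}

// appendEntry hashes the previous entry's hash together with the new payload.
func appendEntry(chain []ChainedEntry, payload string) []ChainedEntry {
	prev := ""
	if len(chain) > 0 {
		prev = chain[len(chain)-1].Hash
	}
	sum := sha256.Sum256([]byte(prev + payload))
	return append(chain, ChainedEntry{payload, prev, hex.EncodeToString(sum[:])})
}

// verify recomputes every hash and fails on the first broken link.
func verify(chain []ChainedEntry) bool {
	prev := ""
	for _, e := range chain {
		sum := sha256.Sum256([]byte(prev + e.Payload))
		if e.PrevHash != prev || e.Hash != hex.EncodeToString(sum[:]) {
			return false
		}
		prev = e.Hash
	}
	return true
}

func main() {
	var chain []ChainedEntry
	chain = appendEntry(chain, "DECRYPT key-1 alice ALLOWED")
	chain = appendEntry(chain, "DECRYPT key-1 mallory DENIED")
	fmt.Println(verify(chain)) // true
	chain[1].Payload = "DECRYPT key-1 mallory ALLOWED" // tamper with an entry
	fmt.Println(verify(chain))                         // false
}
```

Periodically anchoring the latest hash somewhere external (a WORM store, a different account) also catches wholesale truncation of the log.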
Geographic Compliance
Keep data in specific regions for compliance (GDPR, data residency).
type RegionalKMS struct {
    regions map[string]*KMS
}

func (r *RegionalKMS) Encrypt(region string, plaintext []byte) (*EncryptedData, error) {
    kms, ok := r.regions[region]
    if !ok {
        return nil, errors.New("region not supported")
    }
    // Ensure KMS is in the correct region
    if kms.Region != region {
        return nil, errors.New("KMS region mismatch")
    }
    return kms.Encrypt(plaintext)
}

// Usage
regionalKMS := &RegionalKMS{
    regions: map[string]*KMS{
        "eu-west-1": NewKMS("eu-west-1", "https://kms.eu-west-1.amazonaws.com"),
        "us-east-1": NewKMS("us-east-1", "https://kms.us-east-1.amazonaws.com"),
    },
}

// EU data stays in EU
euData, _ := regionalKMS.Encrypt("eu-west-1", userData)
Monitoring and Alerting
Track encryption system health:
type EncryptionMetrics struct {
    EncryptionLatency prometheus.Histogram
    DecryptionLatency prometheus.Histogram
    KMSCallLatency    prometheus.Histogram
    CacheHitRate      prometheus.Gauge
    ErrorRate         prometheus.Counter
    KeyRotations      prometheus.Counter
}

func (s *EnvelopeEncryptionService) EncryptWithMetrics(plaintext []byte) (*EncryptedData, error) {
    start := time.Now()
    data, err := s.Encrypt(plaintext)
    duration := time.Since(start).Seconds()
    s.metrics.EncryptionLatency.Observe(duration)
    if err != nil {
        s.metrics.ErrorRate.Inc()
    }
    return data, err
}
Alert on anomalies:
- High error rate
- Increased latency
- Low cache hit rate
- Unusual access patterns
Production Architecture
Putting it all together:
┌─────────────────────────────────────────────┐
│   Application Servers (Multiple Regions)    │
│  ┌───────────────────────────────────────┐  │
│  │     Client Encryption Library         │  │
│  │     - DEK caching                     │  │
│  │     - Batch encryption                │  │
│  │     - Retry logic                     │  │
│  └───────────────────────────────────────┘  │
└─────────────────────┬───────────────────────┘
                      │
                      │ HTTPS + mTLS
                      ▼
┌─────────────────────────────────────────────┐
│            Regional KMS Services            │
│  ┌─────────────┬─────────────┬───────────┐  │
│  │  EU-West-1  │  US-East-1  │  AP-SE-1  │  │
│  │             │             │           │  │
│  │  ┌───────┐  │  ┌───────┐  │ ┌───────┐ │  │
│  │  │Region │  │  │Region │  │ │Region │ │  │
│  │  │Key    │  │  │Key    │  │ │Key    │ │  │
│  │  └───────┘  │  └───────┘  │ └───────┘ │  │
│  └─────────────┴─────────────┴───────────┘  │
└─────────────────────┬───────────────────────┘
                      │
                      │ HSM Connection
                      ▼
           ┌──────────────────┐
           │   HSM Cluster    │
           │   (Root Keys)    │
           └──────────────────┘
Components:
- Client Library: Handles encryption/decryption with caching
- Regional KMS: Multi-region deployment for performance and compliance
- HSM Cluster: Hardware security modules for root key protection
- Audit System: Comprehensive logging and monitoring
Performance Numbers
Our production metrics (based on cloud KMS services):
Envelope Encryption:
- Encryption: 50,000 ops/sec/instance (with DEK caching)
- Decryption: 100,000 ops/sec/instance (with DEK caching)
- KMS calls: ~100-500/sec (cache hit rate >99%)
Batch Encryption:
- Throughput: 500,000 records/sec
- KMS calls: 10/sec (50,000 records per DEK)
Latency:
- Local encryption/decryption: <1ms (p99)
- KMS call: 10-50ms (p99)
- End-to-end with cache: <2ms (p99)
Key Takeaways
- Use envelope encryption: Don't encrypt directly with master keys
- Build key hierarchies: Layer keys for isolation and rotation
- Cache aggressively: DEK caching reduces KMS load by 99%
- Version keys: Enable rotation without breaking old data
- Audit everything: Log all encryption operations
- Regional deployment: Keep data in required jurisdictions
- Client-side when possible: Reduce server load and improve security
Start simple, optimize based on actual performance requirements.
In my next post, I'll do a year-end review of 2016: the biggest trends in cloud-native security and what to expect in 2017.
Encryption at scale is possible. Build it right from the start.