Encrypting data is easy. Encrypting millions of records per second while maintaining security, performance, and operational simplicity is hard.

We started with a simple encryption service: encrypt data, store it, decrypt on demand. As we scaled, we hit performance bottlenecks, operational complexity, and key management nightmares.

After rebuilding our encryption architecture three times, I’ve learned what actually works at scale.

Here’s how to build encryption systems that scale.

The Naive Approach (And Why It Fails)

The simplest encryption architecture:

type SimpleEncryptionService struct {
    masterKey []byte
}

func (s *SimpleEncryptionService) Encrypt(plaintext []byte) ([]byte, error) {
    // Create cipher
    block, err := aes.NewCipher(s.masterKey)
    if err != nil {
        return nil, err
    }

    // Use GCM mode
    gcm, err := cipher.NewGCM(block)
    if err != nil {
        return nil, err
    }

    // Generate nonce
    nonce := make([]byte, gcm.NonceSize())
    if _, err := rand.Read(nonce); err != nil {
        return nil, err
    }

    // Encrypt
    ciphertext := gcm.Seal(nonce, nonce, plaintext, nil)
    return ciphertext, nil
}

func (s *SimpleEncryptionService) Decrypt(ciphertext []byte) ([]byte, error) {
    block, err := aes.NewCipher(s.masterKey)
    if err != nil {
        return nil, err
    }

    gcm, err := cipher.NewGCM(block)
    if err != nil {
        return nil, err
    }

    nonceSize := gcm.NonceSize()
    if len(ciphertext) < nonceSize {
        return nil, errors.New("ciphertext too short")
    }
    nonce, ciphertext := ciphertext[:nonceSize], ciphertext[nonceSize:]

    plaintext, err := gcm.Open(nil, nonce, ciphertext, nil)
    if err != nil {
        return nil, err
    }

    return plaintext, nil
}

This works at small scale. At large scale, the problems pile up:

Problem 1: Single Key for Everything

All data encrypted with one key. If that key is compromised, all data is exposed. Key rotation means re-encrypting everything.

Problem 2: No Key Versioning

Can’t rotate keys without breaking old ciphertexts.

Problem 3: No Access Control

Anyone with the master key can decrypt anything.

Problem 4: Performance Bottleneck

All encryption operations go through one service. Doesn’t scale horizontally.

Problem 5: Key Management Nightmare

Where do you store the master key securely? How do you backup? How do you share across services?

We need a better architecture.

Envelope Encryption Pattern

Instead of encrypting data directly with a master key, use envelope encryption:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Master Key (in HSM or KMS)         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
               β”‚
               β”‚ encrypts
               β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Data Encryption Key (DEK)          β”‚
β”‚  (unique per record or batch)       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
               β”‚
               β”‚ encrypts
               β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Actual Data                        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

How it works:

  1. Generate unique DEK for each record (or batch)
  2. Encrypt data with DEK
  3. Encrypt DEK with master key
  4. Store encrypted data + encrypted DEK together

Implementation:

type EnvelopeEncryptionService struct {
    kms KeyManagementService
}

type EncryptedData struct {
    Ciphertext   []byte
    EncryptedDEK []byte
    KeyID        string
    KeyVersion   int
    Algorithm    string
}

func (s *EnvelopeEncryptionService) Encrypt(plaintext []byte, keyID string) (*EncryptedData, error) {
    // Generate random DEK
    dek := make([]byte, 32) // 256-bit key
    if _, err := rand.Read(dek); err != nil {
        return nil, err
    }

    // Encrypt data with DEK
    ciphertext, err := encryptAESGCM(plaintext, dek)
    if err != nil {
        return nil, err
    }

    // Encrypt DEK with master key (via KMS)
    encryptedDEK, err := s.kms.Encrypt(keyID, dek)
    if err != nil {
        return nil, err
    }

    return &EncryptedData{
        Ciphertext:   ciphertext,
        EncryptedDEK: encryptedDEK,
        KeyID:        keyID,
        Algorithm:    "AES-256-GCM",
    }, nil
}

func (s *EnvelopeEncryptionService) Decrypt(data *EncryptedData) ([]byte, error) {
    // Decrypt DEK using KMS
    dek, err := s.kms.Decrypt(data.KeyID, data.EncryptedDEK)
    if err != nil {
        return nil, err
    }
    defer zero(dek) // Clear key from memory

    // Decrypt data with DEK
    plaintext, err := decryptAESGCM(data.Ciphertext, dek)
    if err != nil {
        return nil, err
    }

    return plaintext, nil
}

func encryptAESGCM(plaintext, key []byte) ([]byte, error) {
    block, err := aes.NewCipher(key)
    if err != nil {
        return nil, err
    }

    gcm, err := cipher.NewGCM(block)
    if err != nil {
        return nil, err
    }

    nonce := make([]byte, gcm.NonceSize())
    if _, err := rand.Read(nonce); err != nil {
        return nil, err
    }

    return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

func decryptAESGCM(ciphertext, key []byte) ([]byte, error) {
    block, err := aes.NewCipher(key)
    if err != nil {
        return nil, err
    }

    gcm, err := cipher.NewGCM(block)
    if err != nil {
        return nil, err
    }

    if len(ciphertext) < gcm.NonceSize() {
        return nil, errors.New("ciphertext too short")
    }
    nonce, data := ciphertext[:gcm.NonceSize()], ciphertext[gcm.NonceSize():]

    return gcm.Open(nil, nonce, data, nil)
}

func zero(b []byte) {
    for i := range b {
        b[i] = 0
    }
}

Benefits:

  • Performance: Data encryption/decryption is fast (local AES); only key operations hit the KMS
  • Key rotation: Change the master key without re-encrypting all data
  • Isolation: Different records use different DEKs
  • Revocation: Revoke access to the master key, and every DEK (and all data) encrypted under it becomes unreachable

This is the foundation. Now let’s optimize further.

Key Hierarchy

Don’t use the master key directly. Build a hierarchy:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Root Key (HSM)               β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
               β”‚
        β”Œβ”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β–Ό             β–Ό              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚Region Key β”‚  β”‚Region Key β”‚  β”‚Region Key β”‚
β”‚  US-East  β”‚  β”‚  US-West  β”‚  β”‚   EU      β”‚
β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜
      β”‚              β”‚              β”‚
   β”Œβ”€β”€β”΄β”€β”€β”        β”Œβ”€β”€β”΄β”€β”€β”        β”Œβ”€β”€β”΄β”€β”€β”
   β–Ό     β–Ό        β–Ό     β–Ό        β–Ό     β–Ό
Service Service Service Service Service Service
  Key     Key     Key     Key     Key     Key
   β”‚       β”‚       β”‚       β”‚       β”‚       β”‚
   β–Ό       β–Ό       β–Ό       β–Ό       β–Ό       β–Ό
  Data    Data    Data    Data    Data    Data

Implementation:

type KeyHierarchy struct {
    rootKey     *HSMKey
    regionKeys  map[string]*RegionKey
    serviceKeys map[string]*ServiceKey
}

type RegionKey struct {
    ID           string
    EncryptedKey []byte
    Region       string
    CreatedAt    time.Time
}

type ServiceKey struct {
    ID           string
    EncryptedKey []byte
    Service      string
    Region       string
    CreatedAt    time.Time
}

func (kh *KeyHierarchy) GetServiceKey(service, region string) (*ServiceKey, error) {
    keyID := fmt.Sprintf("%s-%s", service, region)

    // Check cache
    if key, ok := kh.serviceKeys[keyID]; ok {
        return key, nil
    }

    // Get region key
    regionKey, err := kh.getRegionKey(region)
    if err != nil {
        return nil, err
    }

    // Decrypt service key using region key
    serviceKey, err := kh.loadServiceKey(keyID, regionKey)
    if err != nil {
        return nil, err
    }

    // Cache
    kh.serviceKeys[keyID] = serviceKey

    return serviceKey, nil
}

func (kh *KeyHierarchy) RotateRootKey() error {
    // Generate new root key
    newRootKey, err := kh.generateRootKey()
    if err != nil {
        return err
    }

    // Re-encrypt all region keys with new root key
    for region, regionKey := range kh.regionKeys {
        // Decrypt with old root key
        decryptedKey, err := kh.rootKey.Decrypt(regionKey.EncryptedKey)
        if err != nil {
            return err
        }

        // Encrypt with new root key
        encryptedKey, err := newRootKey.Encrypt(decryptedKey)
        if err != nil {
            return err
        }

        // Update region key
        regionKey.EncryptedKey = encryptedKey
        kh.regionKeys[region] = regionKey

        // Persist to database; abort the rotation on failure
        if err := db.Save(regionKey); err != nil {
            return err
        }
    }

    // Switch to new root key
    kh.rootKey = newRootKey

    return nil
}

Benefits:

  • Geographic isolation: EU data uses EU keys, stays in EU
  • Service isolation: Compromise of one service doesn’t expose all data
  • Efficient rotation: Rotate root key by re-encrypting region keys (not all data)
  • Access control: Grant service access to specific region keys only

Caching and Performance

KMS calls are expensive (latency and cost). Cache aggressively.

DEK Caching

Cache decrypted DEKs for frequent access:

type DEKCache struct {
    cache *lru.Cache
    ttl   time.Duration
    mu    sync.RWMutex
}

type CachedDEK struct {
    Key       []byte
    ExpiresAt time.Time
}

func NewDEKCache(size int, ttl time.Duration) *DEKCache {
    cache, _ := lru.New(size)
    return &DEKCache{
        cache: cache,
        ttl:   ttl,
    }
}

func (c *DEKCache) Get(encryptedDEK []byte) ([]byte, bool) {
    c.mu.Lock() // full lock: a lookup may evict an expired entry
    defer c.mu.Unlock()

    key := string(encryptedDEK)
    if val, ok := c.cache.Get(key); ok {
        cached := val.(*CachedDEK)
        if time.Now().Before(cached.ExpiresAt) {
            return cached.Key, true
        }
        // Expired, remove
        c.cache.Remove(key)
    }

    return nil, false
}

func (c *DEKCache) Put(encryptedDEK, dek []byte) {
    c.mu.Lock()
    defer c.mu.Unlock()

    key := string(encryptedDEK)
    cached := &CachedDEK{
        Key:       dek,
        ExpiresAt: time.Now().Add(c.ttl),
    }
    c.cache.Add(key, cached)
}

// Use in encryption service
func (s *EnvelopeEncryptionService) DecryptWithCache(data *EncryptedData) ([]byte, error) {
    // Try cache first
    if dek, ok := s.dekCache.Get(data.EncryptedDEK); ok {
        return decryptAESGCM(data.Ciphertext, dek)
    }

    // Cache miss, decrypt via KMS
    dek, err := s.kms.Decrypt(data.KeyID, data.EncryptedDEK)
    if err != nil {
        return nil, err
    }

    // Cache for future use
    s.dekCache.Put(data.EncryptedDEK, dek)

    return decryptAESGCM(data.Ciphertext, dek)
}

Performance impact:

  • Without cache: 10-50ms per decrypt (KMS call)
  • With cache: <1ms per decrypt (local operation)

Cache hit rate >95% for hot data.

Batch Encryption

Encrypt multiple records with the same DEK:

type BatchEncryption struct {
    DEK          []byte
    EncryptedDEK []byte
    Records      []*EncryptedRecord
}

type EncryptedRecord struct {
    ID         string
    Ciphertext []byte
}

func (s *EnvelopeEncryptionService) EncryptBatch(records [][]byte, keyID string) (*BatchEncryption, error) {
    // Generate one DEK for the batch
    dek := make([]byte, 32)
    if _, err := rand.Read(dek); err != nil {
        return nil, err
    }

    // Encrypt DEK with master key
    encryptedDEK, err := s.kms.Encrypt(keyID, dek)
    if err != nil {
        return nil, err
    }

    batch := &BatchEncryption{
        EncryptedDEK: encryptedDEK,
        Records:      make([]*EncryptedRecord, len(records)),
    }

    // Encrypt all records with same DEK
    for i, plaintext := range records {
        ciphertext, err := encryptAESGCM(plaintext, dek)
        if err != nil {
            return nil, err
        }

        batch.Records[i] = &EncryptedRecord{
            ID:         uuid.New().String(),
            Ciphertext: ciphertext,
        }
    }

    zero(dek)

    return batch, nil
}

Benefits:

  • One KMS call for many records
  • Throughput increases 10-100x for bulk operations

Use for batch ETL jobs, data imports, log ingestion.

Client-Side Encryption

Move encryption to clients for maximum security and performance:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Client  β”‚
β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜
     β”‚ 1. Request DEK
     β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  KMS Server  β”‚
β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
     β”‚ 2. Return encrypted DEK
     β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Client  │──┐
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚ 3. Decrypt DEK locally
              β”‚ 4. Encrypt data with DEK
              β”‚ 5. Store encrypted data + encrypted DEK
              β–Ό
         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
         β”‚ Database β”‚
         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Implementation:

// Client-side encryption library. The cache here holds a single
// "current" DEK (GetCurrent/SetCurrent), a variant of the LRU
// DEK cache shown earlier.
type ClientEncryption struct {
    kmsClient KMSClient
    dekCache  *DEKCache
    keyID     string // master key this client encrypts under
}

type DEK struct {
    Plaintext []byte
    Encrypted []byte
    ExpiresAt time.Time
}

func (c *ClientEncryption) Encrypt(plaintext []byte) (*EncryptedData, error) {
    // Get DEK from KMS (with caching)
    dek, encryptedDEK, err := c.getDEK()
    if err != nil {
        return nil, err
    }

    // Encrypt locally
    ciphertext, err := encryptAESGCM(plaintext, dek)
    if err != nil {
        return nil, err
    }

    return &EncryptedData{
        Ciphertext:   ciphertext,
        EncryptedDEK: encryptedDEK,
        KeyID:        c.keyID,
    }, nil
}

func (c *ClientEncryption) getDEK() ([]byte, []byte, error) {
    // Check cache
    if dek := c.dekCache.GetCurrent(); dek != nil {
        return dek.Plaintext, dek.Encrypted, nil
    }

    // Request new DEK from KMS
    resp, err := c.kmsClient.GenerateDataKey(c.keyID, 32)
    if err != nil {
        return nil, nil, err
    }

    // Cache DEK
    c.dekCache.SetCurrent(&DEK{
        Plaintext: resp.Plaintext,
        Encrypted: resp.Encrypted,
        ExpiresAt: time.Now().Add(1 * time.Hour),
    })

    return resp.Plaintext, resp.Encrypted, nil
}

Benefits:

  • Security: Data never transmitted unencrypted
  • Performance: No server-side encryption overhead
  • Scalability: Encryption load distributed to clients

Tradeoffs:

  • Clients need access to KMS
  • Key management complexity on client side
  • Need to trust client implementation

Key Rotation

Rotate keys regularly without downtime.

Versioned Keys

type VersionedKey struct {
    ID        string
    Version   int
    Key       []byte
    Active    bool
    CreatedAt time.Time
}

type KeyManager struct {
    keys map[string][]*VersionedKey // keyID -> versions
}

func (km *KeyManager) Encrypt(keyID string, plaintext []byte) (*EncryptedData, error) {
    // Get active version
    activeKey := km.getActiveKey(keyID)
    if activeKey == nil {
        return nil, errors.New("no active key version")
    }

    // Encrypt
    ciphertext, err := encryptAESGCM(plaintext, activeKey.Key)
    if err != nil {
        return nil, err
    }

    return &EncryptedData{
        Ciphertext: ciphertext,
        KeyID:      keyID,
        KeyVersion: activeKey.Version,
    }, nil
}

func (km *KeyManager) Decrypt(data *EncryptedData) ([]byte, error) {
    // Get specific version
    key := km.getKeyVersion(data.KeyID, data.KeyVersion)
    if key == nil {
        return nil, errors.New("key version not found")
    }

    return decryptAESGCM(data.Ciphertext, key.Key)
}

func (km *KeyManager) RotateKey(keyID string) error {
    // Generate new version
    newVersion := &VersionedKey{
        ID:        keyID,
        Version:   km.getNextVersion(keyID),
        Key:       generateKey(32),
        Active:    true,
        CreatedAt: time.Now(),
    }

    // Deactivate old version, if one exists
    if oldKey := km.getActiveKey(keyID); oldKey != nil {
        oldKey.Active = false
    }

    // Add new version
    km.keys[keyID] = append(km.keys[keyID], newVersion)

    // New encryptions use new version
    // Old decryptions still work with old version

    return nil
}

Background Re-encryption

Gradually re-encrypt data with new key:

type ReencryptionJob struct {
    keyID      string
    oldVersion int
    newVersion int
}

func (job *ReencryptionJob) Run() error {
    // Find records encrypted with old key version
    var records []EncryptedRecord
    db.Find(&records, "key_id = ? AND key_version = ?", job.keyID, job.oldVersion)

    for _, record := range records {
        // Decrypt with old key
        plaintext, err := decrypt(record.Ciphertext, job.keyID, job.oldVersion)
        if err != nil {
            log.Error("Decryption failed", "record", record.ID, "error", err)
            continue
        }

        // Encrypt with new key
        ciphertext, err := encrypt(plaintext, job.keyID, job.newVersion)
        if err != nil {
            log.Error("Encryption failed", "record", record.ID, "error", err)
            continue
        }

        // Update record
        record.Ciphertext = ciphertext
        record.KeyVersion = job.newVersion
        db.Save(&record)

        // Rate limit to avoid overload
        time.Sleep(10 * time.Millisecond)
    }

    return nil
}

Run as background job. Old data gradually migrates to new key.

Access Control and Audit

Encryption is useless without access control.

Policy-Based Access

type EncryptionPolicy struct {
    KeyID         string
    AllowedRoles  []string
    AllowedIPs    []string
    RequiresMFA   bool
}

type AccessRequest struct {
    Principal string
    Roles     []string
    IPAddress string
    MFAToken  string
}

func (p *EncryptionPolicy) Authorize(req *AccessRequest) error {
    // Check role: at least one of the caller's roles must be allowed
    if !intersects(p.AllowedRoles, req.Roles) {
        return errors.New("role not authorized")
    }

    // Check IP allowlist (an empty list allows any IP)
    if len(p.AllowedIPs) > 0 && !intersects(p.AllowedIPs, []string{req.IPAddress}) {
        return errors.New("IP not authorized")
    }

    // Check MFA
    if p.RequiresMFA && req.MFAToken == "" {
        return errors.New("MFA required")
    }

    return nil
}

// intersects reports whether any element of actual appears in allowed
func intersects(allowed, actual []string) bool {
    for _, a := range actual {
        for _, b := range allowed {
            if a == b {
                return true
            }
        }
    }
    return false
}

// Use in KMS
func (kms *KMS) Decrypt(req *DecryptRequest) ([]byte, error) {
    // Load policy
    policy := kms.getPolicy(req.KeyID)

    // Authorize
    if err := policy.Authorize(req.AccessRequest); err != nil {
        kms.auditLog(req, "DENIED", err.Error())
        return nil, err
    }

    // Decrypt
    plaintext, err := kms.decrypt(req.KeyID, req.Ciphertext)
    if err != nil {
        kms.auditLog(req, "ERROR", err.Error())
        return nil, err
    }

    // Audit
    kms.auditLog(req, "ALLOWED", "")

    return plaintext, nil
}

Audit Logging

Log every encryption operation:

type AuditLog struct {
    Timestamp   time.Time
    Operation   string // "ENCRYPT", "DECRYPT", "ROTATE"
    KeyID       string
    Principal   string
    IPAddress   string
    Result      string // "SUCCESS", "DENIED", "ERROR"
    ErrorMsg    string
}

func (kms *KMS) auditLog(req *Request, result, errorMsg string) {
    entry := &AuditLog{
        Timestamp: time.Now(),
        Operation: req.Operation,
        KeyID:     req.KeyID,
        Principal: req.Principal,
        IPAddress: req.IPAddress,
        Result:    result,
        ErrorMsg:  errorMsg,
    }

    // Write to secure audit log
    kms.auditStore.Write(entry)

    // Send to SIEM for monitoring
    kms.siem.Send(entry)
}

Audit logs are immutable and tamper-evident.
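One common way to get tamper evidence (a sketch of the idea, not our exact store): hash-chain the entries, so each record commits to everything before it, and any edit, reorder, or deletion breaks verification.

```go
import "crypto/sha256"

// chainedEntry links each audit record to its predecessor by hash.
type chainedEntry struct {
	Payload  string   // serialized AuditLog
	PrevHash [32]byte // hash of the previous entry (zero for the first)
	Hash     [32]byte // SHA-256 over PrevHash || Payload
}

func appendEntry(chain []chainedEntry, payload string) []chainedEntry {
	var prev [32]byte
	if n := len(chain); n > 0 {
		prev = chain[n-1].Hash
	}
	e := chainedEntry{Payload: payload, PrevHash: prev}
	e.Hash = sha256.Sum256(append(prev[:], payload...))
	return append(chain, e)
}

// verifyChain recomputes every hash; any edited, reordered, or
// deleted entry makes it return false.
func verifyChain(chain []chainedEntry) bool {
	var prev [32]byte
	for _, e := range chain {
		if e.PrevHash != prev ||
			e.Hash != sha256.Sum256(append(prev[:], e.Payload...)) {
			return false
		}
		prev = e.Hash
	}
	return true
}
```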

Geographic Compliance

Keep data in specific regions for compliance (GDPR, data residency).

type RegionalKMS struct {
    regions map[string]*KMS
}

func (r *RegionalKMS) Encrypt(region string, plaintext []byte) (*EncryptedData, error) {
    kms, ok := r.regions[region]
    if !ok {
        return nil, errors.New("region not supported")
    }

    // Ensure KMS is in the correct region
    if kms.Region != region {
        return nil, errors.New("KMS region mismatch")
    }

    return kms.Encrypt(plaintext)
}

// Usage
regionalKMS := &RegionalKMS{
    regions: map[string]*KMS{
        "eu-west-1": NewKMS("eu-west-1", "https://kms.eu-west-1.amazonaws.com"),
        "us-east-1": NewKMS("us-east-1", "https://kms.us-east-1.amazonaws.com"),
    },
}

// EU data stays in EU
euData, _ := regionalKMS.Encrypt("eu-west-1", userData)

Monitoring and Alerting

Track encryption system health:

type EncryptionMetrics struct {
    EncryptionLatency   prometheus.Histogram
    DecryptionLatency   prometheus.Histogram
    KMSCallLatency      prometheus.Histogram
    CacheHitRate        prometheus.Gauge
    ErrorRate           prometheus.Counter
    KeyRotations        prometheus.Counter
}

func (s *EnvelopeEncryptionService) EncryptWithMetrics(plaintext []byte) (*EncryptedData, error) {
    start := time.Now()

    data, err := s.Encrypt(plaintext)

    duration := time.Since(start).Seconds()
    s.metrics.EncryptionLatency.Observe(duration)

    if err != nil {
        s.metrics.ErrorRate.Inc()
    }

    return data, err
}

Alert on anomalies:

  • High error rate
  • Increased latency
  • Low cache hit rate
  • Unusual access patterns
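A minimal sketch of the evaluation behind those alerts (thresholds and names are illustrative; in production these would live in Prometheus alert rules rather than application code):

```go
// reading is a point-in-time snapshot of the metrics above.
type reading struct {
	ErrorRate    float64 // errors per second
	P99LatencyMs float64 // end-to-end decrypt latency
	CacheHitRate float64 // 0.0 to 1.0
}

// evaluateAlerts returns the names of any thresholds currently breached.
func evaluateAlerts(r reading) []string {
	var alerts []string
	if r.ErrorRate > 1.0 {
		alerts = append(alerts, "high error rate")
	}
	if r.P99LatencyMs > 50 {
		alerts = append(alerts, "increased latency")
	}
	if r.CacheHitRate < 0.95 {
		alerts = append(alerts, "low cache hit rate")
	}
	return alerts
}
```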

Production Architecture

Putting it all together:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Application Servers (Multiple Regions)     β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚  Client Encryption Library          β”‚   β”‚
β”‚  β”‚  - DEK caching                      β”‚   β”‚
β”‚  β”‚  - Batch encryption                 β”‚   β”‚
β”‚  β”‚  - Retry logic                      β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                   β”‚
                   β”‚ HTTPS + mTLS
                   β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Regional KMS Services                       β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚  EU-West-1  β”‚  US-East-1   β”‚  AP-SE-1  β”‚ β”‚
β”‚  β”‚             β”‚              β”‚           β”‚ β”‚
β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β” β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”β”‚ β”‚
β”‚  β”‚  β”‚Region  β”‚ β”‚  β”‚Region  β”‚ β”‚ β”‚Region  β”‚β”‚ β”‚
β”‚  β”‚  β”‚Key     β”‚ β”‚  β”‚Key     β”‚ β”‚ β”‚Key     β”‚β”‚ β”‚
β”‚  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                   β”‚
                   β”‚ HSM Connection
                   β–Ό
         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
         β”‚  HSM Cluster      β”‚
         β”‚  (Root Keys)      β”‚
         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Components:

  1. Client Library: Handles encryption/decryption with caching
  2. Regional KMS: Multi-region deployment for performance and compliance
  3. HSM Cluster: Hardware security modules for root key protection
  4. Audit System: Comprehensive logging and monitoring

Performance Numbers

Our production metrics (based on cloud KMS services):

Envelope Encryption:

  • Encryption: 50,000 ops/sec/instance (with DEK caching)
  • Decryption: 100,000 ops/sec/instance (with DEK caching)
  • KMS calls: ~100-500/sec (cache hit rate >99%)

Batch Encryption:

  • Throughput: 500,000 records/sec
  • KMS calls: 10/sec (50,000 records per DEK)

Latency:

  • Local encryption/decryption: <1ms (p99)
  • KMS call: 10-50ms (p99)
  • End-to-end with cache: <2ms (p99)

Key Takeaways

  1. Use envelope encryption: Don’t encrypt directly with master keys
  2. Build key hierarchies: Layer keys for isolation and rotation
  3. Cache aggressively: DEK caching reduces KMS load by 99%
  4. Version keys: Enable rotation without breaking old data
  5. Audit everything: Log all encryption operations
  6. Regional deployment: Keep data in required jurisdictions
  7. Client-side when possible: Reduce server load and improve security

Start simple, optimize based on actual performance requirements.

In my next post, I’ll do a year-end review of 2016β€”the biggest trends in cloud-native security and what to expect in 2017.

Encryption at scale is possible. Build it right from the start.