Encryption is straightforward in theory: take plaintext, apply a cryptographic algorithm with a key, get ciphertext. In practice, managing encryption at scale is complex. The hard part isn’t the encryption itself—it’s managing the keys.
I’ve spent considerable time building encryption systems for distributed infrastructure, and key management is always the most challenging aspect. Today, I want to share what I’ve learned about key rotation, hierarchical key management, and practical patterns for encrypting data at scale.
The Key Management Problem
When you encrypt data, you create a dependency: you must keep the key safe forever, or the data becomes inaccessible. This creates several challenges:
Long-lived keys are risky: The longer a key exists, the greater the chance it’s compromised. Former employees might have had access. It might be in old backups. A vulnerability might have leaked it.
Key distribution is hard: How do you securely give keys to the services that need them? Configuration files? Environment variables? These aren’t secure at scale.
Rotation is painful: If you encrypt terabytes of data with one key, rotating that key means re-encrypting everything. This can take days and risks availability.
Compliance requirements: Regulations and standards often mandate or expect key rotation. PCI DSS requires defined cryptoperiods for keys, while GDPR and HIPAA call for safeguards that in practice include sound key management.
The solution is a combination of patterns: envelope encryption, automatic rotation, and hierarchical key management.
Envelope Encryption
The core pattern for scalable encryption is envelope encryption. Instead of encrypting all data with a single master key, you use a hierarchy:
- A master key (or Key Encryption Key - KEK) lives in a hardened system
- For each piece of data, generate a unique Data Encryption Key (DEK)
- Encrypt the data with the DEK
- Encrypt the DEK with the KEK
- Store the encrypted DEK alongside the encrypted data
This solves several problems at once:
- Rotating the KEK doesn’t require re-encrypting data, only re-encrypting DEKs
- Each piece of data has a unique key, limiting blast radius
- The master key never leaves the key management system
Here’s a practical implementation:
package encryption
import (
"crypto/aes"
"crypto/cipher"
"crypto/rand"
"encoding/binary"
"errors"
"fmt"
"io"
"log"
"time"
)
// Envelope contains an encrypted DEK and ciphertext
type Envelope struct {
KEKVersion uint32 // Which KEK version was used
EncryptedDEK []byte // DEK encrypted with KEK
Ciphertext []byte // Data encrypted with DEK
}
type EnvelopeEncryptor struct {
kms KeyManagementService
}
func (e *EnvelopeEncryptor) Encrypt(plaintext []byte, kekID string) (*Envelope, error) {
// Generate a random DEK
dek := make([]byte, 32) // 256-bit key
if _, err := io.ReadFull(rand.Reader, dek); err != nil {
return nil, err
}
defer zeroBytes(dek) // Clear DEK from memory when done
// Encrypt plaintext with DEK using AES-GCM
block, err := aes.NewCipher(dek)
if err != nil {
return nil, err
}
gcm, err := cipher.NewGCM(block)
if err != nil {
return nil, err
}
nonce := make([]byte, gcm.NonceSize())
if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
return nil, err
}
ciphertext := gcm.Seal(nonce, nonce, plaintext, nil)
// Encrypt DEK with KEK via KMS
kekVersion, encryptedDEK, err := e.kms.Encrypt(kekID, dek)
if err != nil {
return nil, err
}
return &Envelope{
KEKVersion: kekVersion,
EncryptedDEK: encryptedDEK,
Ciphertext: ciphertext,
}, nil
}
func (e *EnvelopeEncryptor) Decrypt(envelope *Envelope, kekID string) ([]byte, error) {
// Decrypt DEK using KMS
dek, err := e.kms.Decrypt(kekID, envelope.KEKVersion, envelope.EncryptedDEK)
if err != nil {
return nil, err
}
defer zeroBytes(dek)
// Decrypt ciphertext with DEK
block, err := aes.NewCipher(dek)
if err != nil {
return nil, err
}
gcm, err := cipher.NewGCM(block)
if err != nil {
return nil, err
}
nonceSize := gcm.NonceSize()
if len(envelope.Ciphertext) < nonceSize {
return nil, errors.New("ciphertext too short")
}
nonce, ciphertext := envelope.Ciphertext[:nonceSize], envelope.Ciphertext[nonceSize:]
plaintext, err := gcm.Open(nil, nonce, ciphertext, nil)
if err != nil {
return nil, err
}
return plaintext, nil
}
func zeroBytes(b []byte) {
for i := range b {
b[i] = 0
}
}
The beauty of this approach: when you rotate the KEK, you just re-encrypt the DEKs, not the actual data. Even for a terabyte database, rotation touches only the wrapped DEKs, typically tens to a few hundred bytes each, so the work scales with the number of records rather than the volume of data.
Key Rotation Strategies
There are two types of rotation: proactive (scheduled) and reactive (emergency).
Proactive Rotation
Rotate keys on a schedule to limit the exposure window. For example, rotate KEKs every 90 days:
type KeyRotationScheduler struct {
kms KeyManagementService
rotationPeriod time.Duration
}
func (s *KeyRotationScheduler) RotateIfNeeded(kekID string) error {
metadata, err := s.kms.GetKeyMetadata(kekID)
if err != nil {
return err
}
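// NOTE: this assumes CreatedAt is the creation time of the current primary
// version. If it were the creation time of the key itself, every check after
// the first rotation period would trigger another rotation.
age := time.Since(metadata.CreatedAt)
if age < s.rotationPeriod {
return nil // Not time to rotate yet
}
log.Printf("Rotating KEK %s (age: %v)", kekID, age)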
// Create new version of the KEK
newVersion, err := s.kms.CreateKeyVersion(kekID)
if err != nil {
return err
}
// Update default version for new encryptions
if err := s.kms.SetPrimaryVersion(kekID, newVersion); err != nil {
return err
}
// Schedule re-encryption of existing DEKs (can be async)
go s.reencryptDEKs(kekID, newVersion)
log.Printf("KEK %s rotated to version %d", kekID, newVersion)
return nil
}
func (s *KeyRotationScheduler) reencryptDEKs(kekID string, newVersion uint32) {
// Find all envelopes encrypted with old versions
envelopes, err := s.findEnvelopesForKey(kekID)
if err != nil {
log.Printf("Failed to find envelopes: %v", err)
return
}
reencrypted := 0
for _, env := range envelopes {
if env.KEKVersion == newVersion {
continue // Already using latest version
}
// Decrypt DEK with old version, re-encrypt with new version
if err := s.reencryptEnvelope(env, kekID, newVersion); err != nil {
log.Printf("Failed to re-encrypt envelope: %v", err)
continue // Continue with others, log failures for retry
}
reencrypted++
}
log.Printf("Re-encrypted %d of %d envelopes to KEK version %d", reencrypted, len(envelopes), newVersion)
}
This approach rotates keys automatically without downtime. Old versions remain available for decryption while you gradually migrate data.
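The scheduler calls reencryptEnvelope without showing it. Here is a minimal sketch of that re-wrap step. The in-memory `memKMS` type and the `rewrap` helper are hypothetical names invented for this example; a real KMS keeps KEK material server-side and would be reached through the KeyManagementService interface instead.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"io"
)

// Envelope mirrors the struct used earlier in this article.
type Envelope struct {
	KEKVersion   uint32
	EncryptedDEK []byte
	Ciphertext   []byte
}

// memKMS is a hypothetical in-memory stand-in for a real KMS.
// It holds one AES-256 key per version; a real KMS never exposes these.
type memKMS struct {
	versions map[uint32][]byte
	primary  uint32
}

func newMemKMS() *memKMS {
	return &memKMS{versions: map[uint32][]byte{}}
}

// CreateKeyVersion generates a fresh KEK version and makes it primary.
func (m *memKMS) CreateKeyVersion() (uint32, error) {
	key := make([]byte, 32)
	if _, err := io.ReadFull(rand.Reader, key); err != nil {
		return 0, err
	}
	m.primary++
	m.versions[m.primary] = key
	return m.primary, nil
}

func (m *memKMS) gcm(version uint32) (cipher.AEAD, error) {
	key, ok := m.versions[version]
	if !ok {
		return nil, errors.New("unknown key version")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Encrypt wraps plaintext with the primary version and prepends the nonce.
func (m *memKMS) Encrypt(plaintext []byte) (uint32, []byte, error) {
	aead, err := m.gcm(m.primary)
	if err != nil {
		return 0, nil, err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return 0, nil, err
	}
	return m.primary, aead.Seal(nonce, nonce, plaintext, nil), nil
}

// Decrypt unwraps data produced by Encrypt under the given version.
func (m *memKMS) Decrypt(version uint32, wrapped []byte) ([]byte, error) {
	aead, err := m.gcm(version)
	if err != nil {
		return nil, err
	}
	if len(wrapped) < aead.NonceSize() {
		return nil, errors.New("wrapped key too short")
	}
	nonce, ct := wrapped[:aead.NonceSize()], wrapped[aead.NonceSize():]
	return aead.Open(nil, nonce, ct, nil)
}

// rewrap decrypts the DEK with its old KEK version and re-encrypts it with
// the current primary version. The data ciphertext is never touched.
func rewrap(kms *memKMS, env *Envelope) error {
	dek, err := kms.Decrypt(env.KEKVersion, env.EncryptedDEK)
	if err != nil {
		return err
	}
	defer func() {
		for i := range dek {
			dek[i] = 0 // clear the plaintext DEK when done
		}
	}()
	version, wrapped, err := kms.Encrypt(dek)
	if err != nil {
		return err
	}
	env.KEKVersion, env.EncryptedDEK = version, wrapped
	return nil
}
```

Because only KEKVersion and EncryptedDEK change, the updated envelope can be written back without rewriting the data it protects.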
Reactive Rotation
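When a key might be compromised, rotate immediately. Keep in mind that re-wrapping DEKs only protects against future use of the old KEK; if an attacker may already hold both the KEK and the wrapped DEKs, the affected data needs fresh DEKs as well: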
func (s *KeyRotationScheduler) EmergencyRotation(kekID string, reason string) error {
log.Printf("EMERGENCY ROTATION: KEK %s - Reason: %s", kekID, reason)
// Create new version immediately
newVersion, err := s.kms.CreateKeyVersion(kekID)
if err != nil {
return err
}
// Switch to new version for all new operations
if err := s.kms.SetPrimaryVersion(kekID, newVersion); err != nil {
return err
}
// Re-encrypt all DEKs synchronously (this is urgent). This must happen
// before old versions are disabled: re-wrapping a DEK requires decrypting
// it with the old version first.
if err := s.reencryptAllDEKsSync(kekID, newVersion); err != nil {
return err
}
// Disable old versions to prevent their further use
oldVersions, err := s.kms.ListKeyVersions(kekID)
if err != nil {
return err
}
for _, v := range oldVersions {
if v != newVersion {
if err := s.kms.DisableKeyVersion(kekID, v); err != nil {
log.Printf("Failed to disable version %d: %v", v, err)
}
}
}
// Send alerts
s.sendSecurityAlert("Key Rotation", fmt.Sprintf(
"Emergency rotation completed for %s. Reason: %s", kekID, reason,
))
return nil
}
Hierarchical Key Management
For large systems, use a key hierarchy:
Root Key (hardware security module)
├─ Region KEK (us-east-1)
│  ├─ Service KEK (payment-service)
│  │  ├─ DEK (record 1)
│  │  ├─ DEK (record 2)
│  │  └─ DEK (record 3)
│  └─ Service KEK (user-service)
│     └─ DEKs...
└─ Region KEK (eu-west-1)
   └─ Service KEKs...
This allows granular rotation and access control. You can rotate a service’s KEK without affecting other services, or rotate a region’s keys independently.
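To make the hierarchy concrete, here is a small sketch in which each key is wrapped by its parent. The helpers `newKey`, `wrap`, `unwrap`, and `unwrapPath`, and the label strings, are assumptions made for illustration. In production the root key would stay inside the HSM and the top-level unwrap would happen there, not in process memory as shown here.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"io"
)

// newKey returns a random 256-bit key.
func newKey() ([]byte, error) {
	key := make([]byte, 32)
	_, err := io.ReadFull(rand.Reader, key)
	return key, err
}

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// wrap encrypts child under parent. The label (for example
// "region/us-east-1") is bound as additional authenticated data, so a
// wrapped key cannot be unwrapped under a different label.
func wrap(parent, child []byte, label string) ([]byte, error) {
	aead, err := newGCM(parent)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, err
	}
	return aead.Seal(nonce, nonce, child, []byte(label)), nil
}

// unwrap reverses wrap and fails if the parent key or label does not match.
func unwrap(parent, wrapped []byte, label string) ([]byte, error) {
	aead, err := newGCM(parent)
	if err != nil {
		return nil, err
	}
	if len(wrapped) < aead.NonceSize() {
		return nil, errors.New("wrapped key too short")
	}
	nonce, ct := wrapped[:aead.NonceSize()], wrapped[aead.NonceSize():]
	return aead.Open(nil, nonce, ct, []byte(label))
}

// unwrapPath walks from the root key down the hierarchy, unwrapping each
// level with the key recovered from the level above it.
func unwrapPath(root []byte, wrapped [][]byte, labels []string) ([]byte, error) {
	if len(wrapped) != len(labels) {
		return nil, errors.New("wrapped keys and labels must have equal length")
	}
	key := root
	for i := range wrapped {
		next, err := unwrap(key, wrapped[i], labels[i])
		if err != nil {
			return nil, err
		}
		key = next
	}
	return key, nil
}
```

Rotating a service KEK in this scheme means generating a new service key, wrapping it under the region KEK, and re-wrapping only that service's DEKs. Sibling services and other regions are untouched.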
Key Management Service Interface
Abstract your KMS behind an interface so you can swap implementations:
type KeyManagementService interface {
// Create a new encryption key
CreateKey(keyID string, purpose KeyPurpose) error
// Create a new version of an existing key
CreateKeyVersion(keyID string) (uint32, error)
// Encrypt data with specified key (returns version used and ciphertext)
Encrypt(keyID string, plaintext []byte) (version uint32, ciphertext []byte, err error)
// Decrypt data with specified key version
Decrypt(keyID string, version uint32, ciphertext []byte) ([]byte, error)
// Get key metadata
GetKeyMetadata(keyID string) (*KeyMetadata, error)
// Set which version is used for new encryptions
SetPrimaryVersion(keyID string, version uint32) error
// Disable a key version (for rotation)
DisableKeyVersion(keyID string, version uint32) error
// List all versions of a key
ListKeyVersions(keyID string) ([]uint32, error)
}
type KeyPurpose string
const (
PurposeEncryption KeyPurpose = "encryption"
PurposeSigning KeyPurpose = "signing"
)
type KeyMetadata struct {
KeyID string
Purpose KeyPurpose
CurrentVersion uint32
CreatedAt time.Time
RotationPeriod time.Duration
}
This interface can be implemented by cloud KMS services (AWS KMS, Google Cloud KMS, Azure Key Vault) or on-premises solutions like HashiCorp Vault.
Practical Encryption Patterns
Database Column Encryption
Encrypt sensitive columns while keeping the rest queryable:
type User struct {
ID string
Email string
EncryptedSSN []byte // encrypted
SSNKeyVersion uint32
CreatedAt time.Time
}
func (s *UserService) CreateUser(email, ssn string) error {
// Encrypt SSN
envelope, err := s.encryptor.Encrypt([]byte(ssn), "user-data-key")
if err != nil {
return err
}
// Serialize envelope
envelopeBytes, err := serializeEnvelope(envelope)
if err != nil {
return err
}
user := &User{
ID: generateID(),
Email: email,
EncryptedSSN: envelopeBytes,
SSNKeyVersion: envelope.KEKVersion,
CreatedAt: time.Now(),
}
return s.db.Create(user)
}
func (s *UserService) GetUserSSN(userID string) (string, error) {
var user User
if err := s.db.Find(&user, "id = ?", userID); err != nil {
return "", err
}
envelope, err := deserializeEnvelope(user.EncryptedSSN)
if err != nil {
return "", err
}
plaintext, err := s.encryptor.Decrypt(envelope, "user-data-key")
if err != nil {
return "", err
}
return string(plaintext), nil
}
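CreateUser and GetUserSSN rely on serializeEnvelope and deserializeEnvelope, which are not shown. The sketch below uses one possible byte layout, chosen for this example rather than taken from any standard: a 4-byte little-endian KEK version, a 4-byte little-endian encrypted-DEK length, the encrypted DEK, then the ciphertext. The `maxEncryptedDEKLen` limit is likewise an assumption.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// Envelope mirrors the struct used earlier in this article.
type Envelope struct {
	KEKVersion   uint32
	EncryptedDEK []byte
	Ciphertext   []byte
}

const (
	envelopeHeaderSize = 8       // KEK version (4 bytes) + DEK length (4 bytes)
	maxEncryptedDEKLen = 1 << 16 // assumed sanity limit for a wrapped DEK
)

// serializeEnvelope flattens an envelope into a single byte slice.
func serializeEnvelope(env *Envelope) ([]byte, error) {
	if env == nil {
		return nil, errors.New("nil envelope")
	}
	if len(env.EncryptedDEK) > maxEncryptedDEKLen {
		return nil, errors.New("encrypted DEK too large")
	}
	out := make([]byte, envelopeHeaderSize,
		envelopeHeaderSize+len(env.EncryptedDEK)+len(env.Ciphertext))
	binary.LittleEndian.PutUint32(out[0:4], env.KEKVersion)
	binary.LittleEndian.PutUint32(out[4:8], uint32(len(env.EncryptedDEK)))
	out = append(out, env.EncryptedDEK...)
	out = append(out, env.Ciphertext...)
	return out, nil
}

// deserializeEnvelope parses the layout written by serializeEnvelope and
// validates the length field before slicing.
func deserializeEnvelope(data []byte) (*Envelope, error) {
	if len(data) < envelopeHeaderSize {
		return nil, errors.New("envelope too short")
	}
	version := binary.LittleEndian.Uint32(data[0:4])
	dekLen := binary.LittleEndian.Uint32(data[4:8])
	if dekLen > maxEncryptedDEKLen || int(dekLen) > len(data)-envelopeHeaderSize {
		return nil, errors.New("invalid encrypted DEK length")
	}
	body := data[envelopeHeaderSize:]
	return &Envelope{
		KEKVersion: version,
		// Copy so the envelope does not alias the caller's buffer.
		EncryptedDEK: append([]byte(nil), body[:dekLen]...),
		Ciphertext:   append([]byte(nil), body[dekLen:]...),
	}, nil
}
```

Since the KEK version travels inside the serialized envelope, the separate SSNKeyVersion column is mainly a convenience for finding rows that still need re-wrapping after a rotation.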
File Encryption
For large files, use streaming encryption:
func EncryptFile(src io.Reader, dst io.Writer, kms KeyManagementService, kekID string) error {
// Generate DEK
dek := make([]byte, 32)
if _, err := io.ReadFull(rand.Reader, dek); err != nil {
return err
}
defer zeroBytes(dek)
// Create cipher
block, err := aes.NewCipher(dek)
if err != nil {
return err
}
gcm, err := cipher.NewGCM(block)
if err != nil {
return err
}
// Encrypt DEK
version, encryptedDEK, err := kms.Encrypt(kekID, dek)
if err != nil {
return err
}
// Write header: KEK version, encrypted DEK length, encrypted DEK
if err := binary.Write(dst, binary.LittleEndian, version); err != nil {
return err
}
if err := binary.Write(dst, binary.LittleEndian, uint32(len(encryptedDEK))); err != nil {
return err
}
if _, err := dst.Write(encryptedDEK); err != nil {
return err
}
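// Encrypt file content in fixed-size chunks. io.ReadFull guarantees that
// every chunk except the last is exactly len(buffer) bytes, so a decryptor
// can find chunk boundaries without a per-chunk length prefix.
// NOTE: each chunk is authenticated on its own; chunk order and stream
// truncation are not protected by this format.
buffer := make([]byte, 64*1024) // 64KB chunks
nonce := make([]byte, gcm.NonceSize())
for {
n, err := io.ReadFull(src, buffer)
if n > 0 {
// Generate unique nonce for this chunk
if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
return err
}
ciphertext := gcm.Seal(nonce, nonce, buffer[:n], nil)
if _, err := dst.Write(ciphertext); err != nil {
return err
}
}
if err == io.EOF || err == io.ErrUnexpectedEOF {
break // End of input; the final chunk may be short
}
if err != nil {
return err
}
}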
return nil
}
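Sealing each chunk independently authenticates its contents but not its position, so chunks could be reordered or dropped from the end without detection. One common hardening, sketched here as a variation and not as the format EncryptFile writes, binds the chunk index and a final-chunk flag into the additional authenticated data. The helpers `chunkAAD`, `sealChunks`, and `openChunks` are hypothetical, and the sketch works on in-memory slices for brevity.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/binary"
	"errors"
	"io"
)

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// chunkAAD encodes the chunk's position and whether it is the last one.
func chunkAAD(index uint64, final bool) []byte {
	aad := make([]byte, 9)
	binary.LittleEndian.PutUint64(aad, index)
	if final {
		aad[8] = 1
	}
	return aad
}

// sealChunks splits plaintext into chunkSize pieces and seals each one as
// nonce || ciphertext || tag, authenticating its index and final flag.
func sealChunks(aead cipher.AEAD, plaintext []byte, chunkSize int) ([]byte, error) {
	if chunkSize <= 0 {
		return nil, errors.New("chunk size must be positive")
	}
	var out []byte
	for index := uint64(0); ; index++ {
		n := len(plaintext)
		if n > chunkSize {
			n = chunkSize
		}
		final := n == len(plaintext)
		nonce := make([]byte, aead.NonceSize())
		if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
			return nil, err
		}
		out = append(out, aead.Seal(nonce, nonce, plaintext[:n], chunkAAD(index, final))...)
		plaintext = plaintext[n:]
		if final {
			return out, nil
		}
	}
}

// openChunks reverses sealChunks. A reordered, dropped, or truncated chunk
// fails authentication because its expected index or final flag differs.
func openChunks(aead cipher.AEAD, stream []byte, chunkSize int) ([]byte, error) {
	if chunkSize <= 0 {
		return nil, errors.New("chunk size must be positive")
	}
	sealedSize := aead.NonceSize() + chunkSize + aead.Overhead()
	var out []byte
	for index := uint64(0); ; index++ {
		n := len(stream)
		if n > sealedSize {
			n = sealedSize
		}
		if n < aead.NonceSize()+aead.Overhead() {
			return nil, errors.New("truncated stream")
		}
		final := n == len(stream)
		nonce, ct := stream[:aead.NonceSize()], stream[aead.NonceSize():n]
		pt, err := aead.Open(nil, nonce, ct, chunkAAD(index, final))
		if err != nil {
			return nil, err
		}
		out = append(out, pt...)
		stream = stream[n:]
		if final {
			return out, nil
		}
	}
}
```

The same idea carries over to a streaming reader and writer: the encryptor needs to read one chunk ahead so it knows which chunk is the last before sealing it.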
Key Access Control
Control which services can use which keys:
type KeyAccessPolicy struct {
KeyID string
AllowedServices []string
AllowedActions []KeyAction
}
type KeyAction string
const (
ActionEncrypt KeyAction = "encrypt"
ActionDecrypt KeyAction = "decrypt"
)
func (p *KeyAccessPolicy) CanAccess(serviceID string, action KeyAction) bool {
// Check if service is allowed
serviceAllowed := false
for _, s := range p.AllowedServices {
if s == serviceID {
serviceAllowed = true
break
}
}
if !serviceAllowed {
return false
}
// Check if action is allowed
for _, a := range p.AllowedActions {
if a == action {
return true
}
}
return false
}
// Wrap KMS with access control
type AccessControlledKMS struct {
kms KeyManagementService
policies map[string]*KeyAccessPolicy
identity string // This service's identity
}
func (k *AccessControlledKMS) Encrypt(keyID string, plaintext []byte) (uint32, []byte, error) {
policy, ok := k.policies[keyID]
if !ok {
return 0, nil, errors.New("no policy for key")
}
if !policy.CanAccess(k.identity, ActionEncrypt) {
return 0, nil, errors.New("access denied")
}
return k.kms.Encrypt(keyID, plaintext)
}
Monitoring and Auditing
Log all key operations for security auditing:
type KeyOperation struct {
Timestamp time.Time
Operation string
KeyID string
KeyVersion uint32
ServiceID string
Success bool
ErrorMsg string
}
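// AuditedKMS wraps a KMS and records every operation.
type AuditedKMS struct {
kms KeyManagementService
identity string // This service's identity
}
func (k *AuditedKMS) Decrypt(keyID string, version uint32, ciphertext []byte) ([]byte, error) {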
op := KeyOperation{
Timestamp: time.Now(),
Operation: "decrypt",
KeyID: keyID,
KeyVersion: version,
ServiceID: k.identity,
}
plaintext, err := k.kms.Decrypt(keyID, version, ciphertext)
if err != nil {
op.Success = false
op.ErrorMsg = err.Error()
k.logOperation(op)
return nil, err
}
op.Success = true
k.logOperation(op)
return plaintext, nil
}
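The logOperation method is not shown. One simple option is to emit each operation as a single JSON line that a log pipeline can ingest. In the sketch below, `formatOperation` and the JSON field names are assumptions made for illustration, not part of any particular logging standard.

```go
package main

import (
	"encoding/json"
	"time"
)

// KeyOperation mirrors the audit record above, with JSON tags added.
type KeyOperation struct {
	Timestamp  time.Time `json:"timestamp"`
	Operation  string    `json:"operation"`
	KeyID      string    `json:"key_id"`
	KeyVersion uint32    `json:"key_version"`
	ServiceID  string    `json:"service_id"`
	Success    bool      `json:"success"`
	ErrorMsg   string    `json:"error,omitempty"`
}

// formatOperation renders op as one newline-terminated JSON object.
// Only metadata is logged: never plaintext, ciphertext, or key material.
func formatOperation(op KeyOperation) ([]byte, error) {
	line, err := json.Marshal(op)
	if err != nil {
		return nil, err
	}
	return append(line, '\n'), nil
}
```

A logOperation implementation could write these lines to an append-only sink that the services being audited cannot modify.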
Alert on suspicious patterns:
- High volume of decryption failures
- Access to keys outside normal hours
- Decryption of old key versions
- Access from unexpected services
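As an illustration of the first pattern, here is a minimal sliding-window counter for decryption failures per key. The `FailureMonitor` type, its window, and its threshold are assumptions made for this sketch; a production system would more likely feed audit logs into an existing metrics or SIEM pipeline.

```go
package main

import "time"

// FailureMonitor counts decryption failures per key in a sliding window.
// It is not safe for concurrent use; add a mutex if shared across goroutines.
type FailureMonitor struct {
	window    time.Duration
	threshold int
	now       func() time.Time       // injectable clock, defaults to time.Now
	failures  map[string][]time.Time // keyID -> recent failure timestamps
}

func NewFailureMonitor(window time.Duration, threshold int) *FailureMonitor {
	return &FailureMonitor{
		window:    window,
		threshold: threshold,
		now:       time.Now,
		failures:  make(map[string][]time.Time),
	}
}

// RecordFailure notes a failed decryption for keyID and reports whether the
// number of failures within the window has reached the threshold.
func (m *FailureMonitor) RecordFailure(keyID string) bool {
	now := m.now()
	cutoff := now.Add(-m.window)

	// Drop timestamps that have aged out of the window (filter in place).
	recent := m.failures[keyID][:0]
	for _, t := range m.failures[keyID] {
		if t.After(cutoff) {
			recent = append(recent, t)
		}
	}
	recent = append(recent, now)
	m.failures[keyID] = recent

	return len(recent) >= m.threshold
}
```

An audited KMS wrapper could call RecordFailure(keyID) in its error path and raise a security alert when it returns true, for example with NewFailureMonitor(5*time.Minute, 10).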
Best Practices
Use strong algorithms: AES-256-GCM is a solid default for symmetric encryption. Avoid broken or deprecated primitives such as DES and RC4 for encryption, or MD5 for hashing.
Generate keys cryptographically: Use crypto/rand, never math/rand or predictable sources.
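Clear keys from memory: Overwrite key material with zeros when done. In a garbage-collected language this is best effort, since the runtime may have made copies you cannot reach.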
Separate key storage: Never store plaintext keys alongside encrypted data. Only wrapped DEKs belong next to the ciphertext; the KEKs that protect them stay in a dedicated KMS.
Automate rotation: Manual rotation doesn’t scale. Build it into your system.
Plan for compromise: Have an emergency rotation procedure ready.
Encrypt backups: Backed-up data should be encrypted with the same rigor as production data.
Looking Forward
Encryption and key management continue to evolve. We’re seeing:
- Hardware security modules becoming more accessible
- Homomorphic encryption allowing computation on encrypted data
- Quantum-resistant algorithms preparing for post-quantum cryptography
- Better tooling for automated key lifecycle management
For any system handling sensitive data, robust encryption and key management are essential. The patterns I’ve shared—envelope encryption, automatic rotation, hierarchical keys—are battle-tested and scalable.
Start with a good KMS, implement envelope encryption, and automate rotation from the beginning. It’s much harder to retrofit security than to build it in from the start.