Multi-tenancy is fundamental to SaaS architectures, but implementing it correctly requires careful consideration of isolation, performance, cost, and operational complexity. After building and operating multi-tenant systems serving thousands of organizations, I’ve learned which patterns work in different scenarios and how to navigate the inherent trade-offs.
Understanding Multi-Tenancy
Multi-tenancy means serving multiple customers (tenants) from a shared infrastructure while ensuring data isolation, security, and customization. The key challenges:
- Data Isolation: Preventing tenant A from accessing tenant B’s data
- Performance Isolation: Preventing one tenant from affecting others
- Resource Allocation: Fairly distributing compute and storage
- Customization: Supporting tenant-specific configurations
- Cost Efficiency: Sharing infrastructure to reduce costs
- Operational Simplicity: Managing many tenants efficiently
Isolation Models
Database Per Tenant
Each tenant gets a dedicated database:
# Tenant routing configuration
tenants:
  tenant-a:
    database:
      host: db-tenant-a.example.com
      name: tenant_a_prod
  tenant-b:
    database:
      host: db-tenant-b.example.com
      name: tenant_b_prod
Application code:
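type TenantResolver struct {
	configs map[string]*DatabaseConfig

	mu    sync.Mutex
	pools map[string]*sql.DB // one long-lived pool per tenant
}

// GetConnection returns the tenant's pool, opening it on first use.
// *sql.DB is itself a connection pool, so it is cached and reused
// instead of being opened on every request.
func (tr *TenantResolver) GetConnection(tenantID string) (*sql.DB, error) {
	config, ok := tr.configs[tenantID]
	if !ok {
		return nil, fmt.Errorf("unknown tenant: %s", tenantID)
	}
	tr.mu.Lock()
	defer tr.mu.Unlock()
	if db, ok := tr.pools[tenantID]; ok {
		return db, nil
	}
	connStr := fmt.Sprintf(
		"host=%s dbname=%s user=%s password=%s",
		config.Host,
		config.Name,
		config.User,
		config.Password,
	)
	db, err := sql.Open("postgres", connStr)
	if err != nil {
		return nil, err
	}
	if tr.pools == nil {
		tr.pools = make(map[string]*sql.DB)
	}
	tr.pools[tenantID] = db
	return db, nil
}

func (s *Service) CreateUser(ctx context.Context, tenantID string, user *User) error {
	db, err := s.tenantResolver.GetConnection(tenantID)
	if err != nil {
		return err
	}
	// Do not close db here: the pool is shared across requests.
	_, err = db.ExecContext(ctx,
		"INSERT INTO users (id, name, email) VALUES ($1, $2, $3)",
		user.ID, user.Name, user.Email,
	)
	return err
}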
Advantages:
- Maximum data isolation
- Easy backup/restore per tenant
- Simple tenant migration
- Can use different database versions per tenant
Disadvantages:
- High operational overhead
- Expensive at scale
- Complex connection pool management
- Schema migrations across many databases
Schema Per Tenant
Multiple tenants in one database, separate schemas:
-- Create schemas for each tenant
CREATE SCHEMA tenant_a;
CREATE SCHEMA tenant_b;
-- Grant permissions
GRANT USAGE ON SCHEMA tenant_a TO app_user;
GRANT USAGE ON SCHEMA tenant_b TO app_user;
-- Tables in each schema
CREATE TABLE tenant_a.users (
id UUID PRIMARY KEY,
name VARCHAR(255),
email VARCHAR(255)
);
CREATE TABLE tenant_b.users (
id UUID PRIMARY KEY,
name VARCHAR(255),
email VARCHAR(255)
);
Application implementation:
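func (s *Service) CreateUser(ctx context.Context, tenantID string, user *User) error {
	// The schema name is interpolated into SQL, so quote it as an identifier
	// (pq.QuoteIdentifier, assuming the lib/pq driver) and validate tenantID upstream.
	schema := pq.QuoteIdentifier("tenant_" + tenantID)

	// s.db is a pool: a plain SET could run on a different connection than the
	// INSERT. A transaction pins both statements to one connection, and
	// SET LOCAL resets the search path when the transaction ends.
	tx, err := s.db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op after a successful Commit

	if _, err := tx.ExecContext(ctx, "SET LOCAL search_path TO "+schema); err != nil {
		return err
	}
	if _, err := tx.ExecContext(ctx,
		"INSERT INTO users (id, name, email) VALUES ($1, $2, $3)",
		user.ID, user.Name, user.Email,
	); err != nil {
		return err
	}
	return tx.Commit()
}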
Advantages:
- Good isolation with shared infrastructure
- Easier operations than database-per-tenant
- Single connection pool
Disadvantages:
- Practical schema-count limits (PostgreSQL has no hard cap, but catalog size, dumps, and migrations slow down noticeably with thousands of schemas)
- More complex than shared schema
- Migrations still complex
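Because the schema name is built from the tenant ID and spliced into SQL, it is worth validating the ID before it gets anywhere near a query. A minimal sketch, assuming tenant IDs are lowercase alphanumeric with underscores (the pattern and the `schemaFor` helper are illustrative, not part of any library):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// tenantIDPattern is an assumed format for tenant IDs: a lowercase letter
// followed by up to 47 lowercase letters, digits, or underscores. With the
// "tenant_" prefix this stays under PostgreSQL's 63-byte identifier limit.
var tenantIDPattern = regexp.MustCompile(`^[a-z][a-z0-9_]{0,47}$`)

// schemaFor maps a tenant ID to a quoted schema identifier, rejecting
// anything that does not match the expected format.
func schemaFor(tenantID string) (string, error) {
	if !tenantIDPattern.MatchString(tenantID) {
		return "", fmt.Errorf("invalid tenant ID %q", tenantID)
	}
	// Double any embedded quotes, as SQL identifier quoting requires.
	// The pattern already excludes them; this is defence in depth.
	name := strings.ReplaceAll("tenant_"+tenantID, `"`, `""`)
	return `"` + name + `"`, nil
}

func main() {
	for _, id := range []string{"a", "acme_corp", "x; DROP SCHEMA tenant_a CASCADE"} {
		schema, err := schemaFor(id)
		fmt.Printf("%q -> %s (err: %v)\n", id, schema, err)
	}
}
```

An allowlist lookup against the tenant registry would be stricter still; the pattern check is the cheap first line of defence.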
Shared Schema with Tenant Column
All tenants share tables with a tenant_id column:
CREATE TABLE users (
id UUID PRIMARY KEY,
tenant_id UUID NOT NULL,
name VARCHAR(255),
email VARCHAR(255),
created_at TIMESTAMP DEFAULT NOW()
);
-- Critical: Index on tenant_id
CREATE INDEX idx_users_tenant_id ON users(tenant_id);
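-- Row-level security
-- Note: the table owner bypasses policies unless FORCE ROW LEVEL SECURITY is
-- also set, so either force it or connect as a non-owner role.
ALTER TABLE users ENABLE ROW LEVEL SECURITY;
ALTER TABLE users FORCE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON users
USING (tenant_id = current_setting('app.current_tenant')::UUID);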
Application implementation:
type TenantMiddleware struct {
next http.Handler
}
func (tm *TenantMiddleware) ServeHTTP(w http.ResponseWriter, r *http.Request) {
tenantID := r.Header.Get("X-Tenant-ID")
if tenantID == "" {
http.Error(w, "Missing tenant ID", http.StatusBadRequest)
return
}
// Add tenant to context
ctx := context.WithValue(r.Context(), "tenant_id", tenantID)
tm.next.ServeHTTP(w, r.WithContext(ctx))
}
func (s *Service) CreateUser(ctx context.Context, user *User) error {
tenantID := ctx.Value("tenant_id").(string)
// CRITICAL: Always include tenant_id in queries
_, err := s.db.ExecContext(ctx,
"INSERT INTO users (id, tenant_id, name, email) VALUES ($1, $2, $3, $4)",
user.ID, tenantID, user.Name, user.Email,
)
return err
}
func (s *Service) GetUser(ctx context.Context, userID string) (*User, error) {
	tenantID := ctx.Value("tenant_id").(string)
	var user User
	// CRITICAL: Always filter by tenant_id
	err := s.db.QueryRowContext(ctx,
		"SELECT id, name, email FROM users WHERE id = $1 AND tenant_id = $2",
		userID, tenantID,
	).Scan(&user.ID, &user.Name, &user.Email)
	if err != nil {
		// Includes sql.ErrNoRows when the user belongs to another tenant
		return nil, err
	}
	return &user, nil
}
Advantages:
- Maximum efficiency
- Simple operations
- No per-tenant database objects, so the tenant count is bounded only by table size
- Easy cross-tenant analytics
Disadvantages:
- Risk of tenant data leakage
- Must enforce tenant_id everywhere
- Noisy neighbor problems
- Complex backup/restore per tenant
Kubernetes Multi-Tenancy
Namespace Per Tenant
# tenant-a-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a
  labels:
    tenant: tenant-a
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-a-quota
  namespace: tenant-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-other-tenants
  namespace: tenant-a
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              tenant: tenant-a
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              tenant: tenant-a
    # Allow DNS lookups; without this rule name resolution fails for every pod
    # in the namespace. Assumes cluster DNS runs in kube-system.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
Deployment per tenant:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: tenant-a
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      tenant: tenant-a
  template:
    metadata:
      labels:
        app: myapp
        tenant: tenant-a
    spec:
      containers:
        - name: app
          image: myapp:latest
          env:
            - name: TENANT_ID
              value: "tenant-a"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: tenant-a-db
                  key: url
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
Shared Cluster with Pod Security
Single namespace, isolation via Pod Security:
apiVersion: v1
kind: Pod
metadata:
  name: tenant-a-app
  namespace: production
  labels:
    tenant: tenant-a
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: myapp:latest
      env:
        - name: TENANT_ID
          value: "tenant-a"
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
        readOnlyRootFilesystem: true
      resources:
        requests:
          cpu: 100m
          memory: 128Mi
        limits:
          cpu: 500m
          memory: 512Mi
Tenant Context Propagation
Ensure tenant context flows through the entire stack:
// Middleware to extract tenant
func TenantMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
var tenantID string
// Try JWT claim first
if token := extractToken(r); token != nil {
tenantID = token.TenantID
}
// Fallback to header (only safe when a trusted gateway sets or strips it,
// since clients can forge X-Tenant-ID)
if tenantID == "" {
tenantID = r.Header.Get("X-Tenant-ID")
}
// Validate tenant exists
if !isValidTenant(tenantID) {
http.Error(w, "Invalid tenant", http.StatusUnauthorized)
return
}
ctx := context.WithValue(r.Context(), "tenant_id", tenantID)
next.ServeHTTP(w, r.WithContext(ctx))
})
}
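// Propagate to downstream services
func (c *HTTPClient) Do(ctx context.Context, req *http.Request) (*http.Response, error) {
	// Checked assertion: a missing tenant becomes an error instead of a panic
	tenantID, ok := ctx.Value("tenant_id").(string)
	if !ok || tenantID == "" {
		return nil, errors.New("no tenant in context")
	}
	// Carry cancellation and deadlines to the downstream call
	req = req.WithContext(ctx)
	// Add tenant header
	req.Header.Set("X-Tenant-ID", tenantID)
	// Add to distributed tracing
	span := trace.SpanFromContext(ctx)
	span.SetAttributes(attribute.String("tenant.id", tenantID))
	return c.client.Do(req)
}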
Rate Limiting Per Tenant
Prevent noisy neighbors:
import "golang.org/x/time/rate"
type TenantRateLimiter struct {
limiters sync.Map // map[string]*rate.Limiter
rate rate.Limit
burst int
}
func NewTenantRateLimiter(r rate.Limit, b int) *TenantRateLimiter {
return &TenantRateLimiter{
rate: r,
burst: b,
}
}
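func (trl *TenantRateLimiter) getLimiter(tenantID string) *rate.Limiter {
	if limiter, ok := trl.limiters.Load(tenantID); ok {
		return limiter.(*rate.Limiter)
	}
	// LoadOrStore is atomic: if two requests race to create the limiter,
	// both end up sharing the one that was stored first.
	limiter, _ := trl.limiters.LoadOrStore(tenantID, rate.NewLimiter(trl.rate, trl.burst))
	return limiter.(*rate.Limiter)
}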
func (trl *TenantRateLimiter) Allow(tenantID string) bool {
return trl.getLimiter(tenantID).Allow()
}
// Middleware
func RateLimitMiddleware(trl *TenantRateLimiter) func(http.Handler) http.Handler {
return func(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
tenantID := r.Context().Value("tenant_id").(string)
if !trl.Allow(tenantID) {
http.Error(w, "Rate limit exceeded", http.StatusTooManyRequests)
return
}
next.ServeHTTP(w, r)
})
}
}
Tenant-Specific Configuration
Support customization:
type TenantConfig struct {
Features map[string]bool
Limits Limits
Customization Customization
}
type Limits struct {
MaxUsers int
MaxStorage int64
MaxRequests int
}
type Customization struct {
LogoURL string
Theme string
Domain string
}
type ConfigStore struct {
cache map[string]*TenantConfig
db *sql.DB
mu sync.RWMutex
}
func (cs *ConfigStore) Get(ctx context.Context, tenantID string) (*TenantConfig, error) {
cs.mu.RLock()
if config, ok := cs.cache[tenantID]; ok {
cs.mu.RUnlock()
return config, nil
}
cs.mu.RUnlock()
// Load from database
config, err := cs.loadFromDB(ctx, tenantID)
if err != nil {
return nil, err
}
// Cache it
cs.mu.Lock()
cs.cache[tenantID] = config
cs.mu.Unlock()
return config, nil
}
// Usage
func (s *Service) CreateUser(ctx context.Context, user *User) error {
tenantID := ctx.Value("tenant_id").(string)
config, err := s.configStore.Get(ctx, tenantID)
if err != nil {
return err
}
// Check tenant limits
currentUsers, err := s.countUsers(ctx, tenantID)
if err != nil {
return err
}
if currentUsers >= config.Limits.MaxUsers {
return fmt.Errorf("user limit exceeded")
}
// Check feature flags
if user.Role == "admin" && !config.Features["admin_users"] {
return fmt.Errorf("admin users not enabled for this tenant")
}
// Create user
return s.createUser(ctx, tenantID, user)
}
Monitoring Multi-Tenant Systems
Track metrics per tenant:
import "github.com/prometheus/client_golang/prometheus"
var (
requestsPerTenant = prometheus.NewCounterVec(
prometheus.CounterOpts{
Name: "http_requests_total",
Help: "Total HTTP requests",
},
[]string{"tenant", "method", "endpoint", "status"},
)
latencyPerTenant = prometheus.NewHistogramVec(
prometheus.HistogramOpts{
Name: "http_request_duration_seconds",
Help: "Request duration",
Buckets: prometheus.DefBuckets,
},
[]string{"tenant", "method", "endpoint"},
)
activeUsersPerTenant = prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: "active_users",
Help: "Currently active users",
},
[]string{"tenant"},
)
)
func init() {
prometheus.MustRegister(requestsPerTenant)
prometheus.MustRegister(latencyPerTenant)
prometheus.MustRegister(activeUsersPerTenant)
}
func MetricsMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
start := time.Now()
tenantID := r.Context().Value("tenant_id").(string)
wrapped := &statusRecorder{ResponseWriter: w, statusCode: 200}
next.ServeHTTP(wrapped, r)
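duration := time.Since(start).Seconds()
// Caution: r.URL.Path as a label creates one time series per distinct path
// (for example /users/123), multiplied by the number of tenants. Prefer the
// router's route template where one is available.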
requestsPerTenant.WithLabelValues(
tenantID,
r.Method,
r.URL.Path,
fmt.Sprintf("%d", wrapped.statusCode),
).Inc()
latencyPerTenant.WithLabelValues(
tenantID,
r.Method,
r.URL.Path,
).Observe(duration)
})
}
Cost Allocation
Track resource usage per tenant:
type UsageTracker struct {
db *sql.DB
}
type Usage struct {
TenantID string
Period time.Time
StorageBytes int64
RequestCount int64
CPUSeconds float64
NetworkBytes int64
}
func (ut *UsageTracker) Record(ctx context.Context, usage *Usage) error {
_, err := ut.db.ExecContext(ctx, `
INSERT INTO tenant_usage (
tenant_id, period, storage_bytes, request_count, cpu_seconds, network_bytes
) VALUES ($1, $2, $3, $4, $5, $6)
ON CONFLICT (tenant_id, period)
DO UPDATE SET
storage_bytes = tenant_usage.storage_bytes + EXCLUDED.storage_bytes,
request_count = tenant_usage.request_count + EXCLUDED.request_count,
cpu_seconds = tenant_usage.cpu_seconds + EXCLUDED.cpu_seconds,
network_bytes = tenant_usage.network_bytes + EXCLUDED.network_bytes
`, usage.TenantID, usage.Period, usage.StorageBytes, usage.RequestCount,
usage.CPUSeconds, usage.NetworkBytes)
return err
}
func (ut *UsageTracker) Calculate(ctx context.Context, tenantID string, period time.Time) (float64, error) {
var usage Usage
err := ut.db.QueryRowContext(ctx, `
SELECT storage_bytes, request_count, cpu_seconds, network_bytes
FROM tenant_usage
WHERE tenant_id = $1 AND period = $2
`, tenantID, period).Scan(
&usage.StorageBytes,
&usage.RequestCount,
&usage.CPUSeconds,
&usage.NetworkBytes,
)
if err != nil {
return 0, err
}
// Cost calculation (byte counts are divided by 1024^3, so the rates are per GiB)
storageCost := float64(usage.StorageBytes) / (1024*1024*1024) * 0.10 // $0.10/GiB
requestCost := float64(usage.RequestCount) / 1000000 * 0.40 // $0.40/M requests
computeCost := usage.CPUSeconds / 3600 * 0.05 // $0.05/CPU hour
networkCost := float64(usage.NetworkBytes) / (1024*1024*1024) * 0.09 // $0.09/GiB
return storageCost + requestCost + computeCost + networkCost, nil
}
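To make the formula concrete, here is the same arithmetic pulled into a pure function and applied to a made-up month of usage. The rates are the ones hardcoded above; the usage figures and the `usage` and `cost` names are invented for illustration.

```go
package main

import "fmt"

const gib = 1024 * 1024 * 1024

// usage mirrors the fields of Usage that feed the cost formula.
type usage struct {
	StorageBytes int64
	RequestCount int64
	CPUSeconds   float64
	NetworkBytes int64
}

// cost applies the per-unit rates from the Calculate method above.
func cost(u usage) float64 {
	storage := float64(u.StorageBytes) / gib * 0.10        // $0.10 per GiB
	requests := float64(u.RequestCount) / 1_000_000 * 0.40 // $0.40 per million
	compute := u.CPUSeconds / 3600 * 0.05                  // $0.05 per CPU hour
	network := float64(u.NetworkBytes) / gib * 0.09        // $0.09 per GiB
	return storage + requests + compute + network
}

func main() {
	u := usage{
		StorageBytes: 50 * gib,  // 50 GiB  -> $5.00
		RequestCount: 2_000_000, // 2M      -> $0.80
		CPUSeconds:   7200,      // 2 hours -> $0.10
		NetworkBytes: 10 * gib,  // 10 GiB  -> $0.90
	}
	fmt.Printf("$%.2f\n", cost(u)) // $6.80
}
```

Keeping the pricing in a pure function like this makes it straightforward to unit-test rate changes separately from the database query.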
Choosing the Right Model
Decision matrix:
Database-per-tenant when:
- Strict regulatory requirements
- Small number of large tenants
- Need different database versions
- Tenant migration is common
Schema-per-tenant when:
- Moderate isolation needs
- Medium number of tenants (hundreds)
- Shared infrastructure preferred
- Some customization needed
Shared-schema when:
- Large number of small tenants (thousands+)
- Cost efficiency critical
- Strong application-level controls
- Standardized experience
Kubernetes namespace-per-tenant when:
- Compute isolation critical
- Different resource requirements
- Network isolation needed
- Willing to pay overhead
Conclusion
Multi-tenancy is about balancing competing concerns: isolation vs efficiency, customization vs standardization, complexity vs cost. There’s no one-size-fits-all solution.
Start with the simplest model that meets your requirements. Many successful SaaS companies use shared-schema models with strong application-level controls. Reserve more complex isolation for tenants that truly need it.
Key principles:
- Enforce tenant context everywhere
- Monitor per-tenant metrics
- Implement resource limits
- Test isolation thoroughly
- Plan for growth
- Automate tenant provisioning
The right architecture depends on your specific requirements, but understanding the trade-offs allows you to make informed decisions as your multi-tenant system evolves.