We’ve been gradually rewriting performance-critical components of our key management platform from Java to Go. This wasn’t about following trends - we had specific performance and operational requirements that Go addresses better than Java. After six months of production Go services, I want to share why we made this choice and what we’ve learned.

Why Go for Key Management Services?

Our initial platform was built entirely in Java. Java served us well, but we hit limitations:

Garbage collection pauses: JVM GC pauses could reach 100-200ms. For cryptographic operations where customers expect single-digit millisecond latency, this was problematic.

Memory overhead: JVM heap requires significant memory. Our smallest Java services needed 2GB heap. Go services use 50-200MB.

Startup time: JVM startup takes several seconds. Go services start in milliseconds. This matters for rapid scaling and fast deployment.

Container efficiency: In Kubernetes, small memory footprint means more pods per node. Go services pack more densely than Java services.

Concurrency model: Go’s goroutines and channels provide cleaner concurrency patterns than Java threads, important for highly concurrent key management services.

We didn’t rewrite everything - services that aren’t performance-critical remain in Java. But for hot paths (HSM proxy, policy evaluation, cryptographic operations), Go was clearly better.

The HSM Proxy Service

Our first production Go service was the HSM proxy, which fronts all HSM communication. Requirements:

  • Handle 10,000+ operations per second
  • Sub-millisecond latency for most operations
  • Maintain connection pools to multiple HSMs
  • Route operations based on load and HSM capabilities

In Java, this service was constantly fighting GC. Every operation allocated objects that needed collection. GC pauses caused latency spikes visible in our p99 metrics.

The Go rewrite reduced memory usage from 4GB to 400MB and eliminated GC-related latency spikes. Our p99 latency improved from 15ms to 3ms. p999 went from 50ms to 8ms.
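The connection-pooling side of this service can be sketched with a buffered channel acting as the pool; checking out a connection blocks when all are in use, which gives natural backpressure. This is a minimal illustration, not our actual API — names like hsmConn and newPool are placeholders:

```go
package main

import "fmt"

// hsmConn stands in for a real HSM client connection (illustrative).
type hsmConn struct{ id int }

// pool hands out connections via a buffered channel. get blocks when
// every connection is checked out; put returns one to the pool.
type pool struct{ conns chan *hsmConn }

func newPool(size int) *pool {
	p := &pool{conns: make(chan *hsmConn, size)}
	for i := 0; i < size; i++ {
		p.conns <- &hsmConn{id: i}
	}
	return p
}

func (p *pool) get() *hsmConn  { return <-p.conns }
func (p *pool) put(c *hsmConn) { p.conns <- c }

func main() {
	p := newPool(4)
	c := p.get() // would block here if the pool were exhausted
	fmt.Println("checked out conn", c.id)
	p.put(c)
}
```

Routing by load and HSM capability layers on top of this: in essence, one such pool per HSM, with a selector choosing which pool to draw from.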

Learning Go from Java

Coming from Java, Go felt simultaneously liberating and constraining:

No exceptions: Go uses explicit error returns. Every function that can fail returns (result, error). It initially felt verbose, but it forces explicit error handling that caught bugs we would otherwise have missed.

No inheritance: Go has interfaces and composition but not class inheritance. This led to simpler, flatter designs without deep inheritance hierarchies.

Implicit interfaces: Types satisfy interfaces automatically, without declaring that they do. This enables powerful abstractions without tight coupling.

Pointers: Go has pointers like C but garbage collection like Java. Best of both worlds for performance-sensitive code.

Standard library: Go’s standard library is excellent. HTTP server, JSON encoding, cryptographic operations - all built-in and performant.

The learning curve was shorter than expected: we were writing productive Go code within weeks and idiomatic Go within a few months.
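The implicit-interface point above fits in a few lines. Nothing here says softSigner implements Signer; having the right method set is enough (the types and the XOR "signature" are purely illustrative):

```go
package main

import "fmt"

// Signer is satisfied by any type with a matching Sign method -
// no "implements" declaration, unlike Java.
type Signer interface {
	Sign(data []byte) []byte
}

// softSigner never mentions Signer; it simply has the right method.
type softSigner struct{ key byte }

func (s softSigner) Sign(data []byte) []byte {
	out := make([]byte, len(data))
	for i, b := range data {
		out[i] = b ^ s.key // toy XOR "signature", illustration only
	}
	return out
}

// signAll depends only on the small interface, not the concrete type.
func signAll(s Signer, msgs [][]byte) [][]byte {
	sigs := make([][]byte, len(msgs))
	for i, m := range msgs {
		sigs[i] = s.Sign(m)
	}
	return sigs
}

func main() {
	sigs := signAll(softSigner{key: 0x5a}, [][]byte{[]byte("hi")})
	fmt.Printf("%x\n", sigs[0])
}
```

Because the interface lives with the consumer rather than the implementer, packages stay loosely coupled — the inverse of Java's declared-implements relationship.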

Goroutines and Concurrency

Go’s goroutines are a game-changer for concurrent services. Our services routinely run thousands of goroutines:

  • One goroutine per inbound request
  • Background goroutines for health checks
  • Goroutines managing HSM connection pools
  • Goroutines for metrics collection
  • Goroutines handling cleanup tasks

Creating thousands of goroutines is cheap (2KB stack vs. 1MB per thread in Java). We don’t need thread pools or complex async patterns - just spawn a goroutine.

Channels provide communication between goroutines without explicit locking:

requests := make(chan Request, 100)
results := make(chan Result, 100)

// Worker goroutines: each consumes requests and emits results
for i := 0; i < 10; i++ {
    go func() {
        for req := range requests {
            result := processRequest(req)
            results <- result
        }
    }()
}

This pattern appears throughout our services. Clear, maintainable, and performant.
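A fuller version of the snippet above adds orderly shutdown: close the requests channel to end the workers' range loops, and use sync.WaitGroup to know when results can safely be closed. Request, Result, and processRequest remain placeholders:

```go
package main

import (
	"fmt"
	"sync"
)

type Request struct{ ID int }
type Result struct{ ID int }

// processRequest is a stand-in for real per-request work.
func processRequest(r Request) Result { return Result{ID: r.ID} }

func main() {
	requests := make(chan Request, 100)
	results := make(chan Result, 100)

	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for req := range requests {
				results <- processRequest(req)
			}
		}()
	}

	// Feed work, then close requests so the workers' range loops end.
	for i := 0; i < 5; i++ {
		requests <- Request{ID: i}
	}
	close(requests)

	// Close results only after every worker has exited.
	go func() {
		wg.Wait()
		close(results)
	}()

	count := 0
	for range results {
		count++
	}
	fmt.Println("processed", count) // prints "processed 5"
}
```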

Error Handling Patterns

Go’s error handling takes discipline but yields reliable code. Our patterns:

Check every error: We review code specifically looking for unchecked errors. Lint tools (errcheck) catch most instances.

Wrap errors with context: Use fmt.Errorf to add context while propagating errors:

if err := hsm.Decrypt(keyID); err != nil {
    return fmt.Errorf("decrypt with key %s: %w", keyID, err)
}

Custom error types: For errors that need special handling, we define custom types implementing the error interface.

Error logging: Errors are logged at the point they’re handled, not at every propagation level.

This approach catches errors early and provides context for debugging without noisy logs.
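Putting the custom-type and wrapping patterns together looks like this in practice. KeyNotFoundError and the helper functions are illustrative names, not our real types; errors.As unwraps the %w chain to recover the concrete type:

```go
package main

import (
	"errors"
	"fmt"
)

// KeyNotFoundError carries enough detail for callers to react
// specifically (e.g. return 404 instead of 500).
type KeyNotFoundError struct{ KeyID string }

func (e *KeyNotFoundError) Error() string {
	return fmt.Sprintf("key %s not found", e.KeyID)
}

// lookupKey stands in for a store lookup that can fail.
func lookupKey(id string) error {
	return &KeyNotFoundError{KeyID: id}
}

func handle(id string) string {
	// Wrap with context while propagating, as described above.
	err := fmt.Errorf("decrypt request: %w", lookupKey(id))

	// errors.As walks the wrapped chain looking for our custom type.
	var nf *KeyNotFoundError
	if errors.As(err, &nf) {
		return "not found: " + nf.KeyID
	}
	return "internal error"
}

func main() {
	fmt.Println(handle("k-123")) // prints "not found: k-123"
}
```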

Interfacing with C Libraries

HSM client libraries are typically C/C++. Go’s cgo allows calling C code, which we use for HSM integration.

Challenges we encountered:

Performance overhead: Crossing the Go/C boundary has overhead. We minimize cgo calls by batching operations where possible.

Memory management: C code doesn’t know about Go’s GC, so we must manually free C-allocated memory, typically with defer C.free(unsafe.Pointer(p)).

Thread pinning: Cgo calls pin goroutines to OS threads. For long-running C calls, this limits concurrency. We use goroutine pools to manage this.

Debugging: Debugging cgo code is harder than pure Go. We isolate cgo to small, well-tested packages.

Despite challenges, cgo successfully bridges our Go services to C HSM libraries.

Dependency Management

Go’s dependency management has improved dramatically. We started with vendoring (copying dependencies into our repo), migrated to dep (experimental dependency tool), and now use Go modules (official solution as of Go 1.11).

Our practices:

Pinned versions: Production code uses specific versions of dependencies, not “latest.”

Dependency review: New dependencies are reviewed for security and code quality.

Minimal dependencies: We prefer standard library over third-party where possible. Fewer dependencies = fewer vulnerabilities and less maintenance.

Private modules: Our internal shared code lives in private Go modules, served through an internal Go module proxy.

This keeps dependencies manageable and builds reproducible.
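A minimal go.mod reflecting these practices might look like the following — the module path and dependency are illustrative, not our actual manifest:

```
module internal.example.com/hsm-proxy

go 1.17

require (
    github.com/prometheus/client_golang v1.11.0 // pinned, reviewed
)
```

In practice, pointing GOPROXY at the internal module proxy and listing internal path prefixes in GOPRIVATE keeps private code off the public proxy and checksum database.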

Testing in Go

Go’s testing story is simpler than Java’s:

Built-in testing: No need for JUnit or similar frameworks. Standard testing package covers most needs.

Table-driven tests: Go’s struct syntax makes table-driven tests clean:

func TestDecrypt(t *testing.T) {
    tests := []struct {
        name      string
        input     []byte
        want      []byte
        wantErr   bool
    }{
        {"valid", []byte("encrypted"), []byte("decrypted"), false},
        {"invalid", []byte("bad"), nil, true},
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            got, err := Decrypt(tt.input)
            if (err != nil) != tt.wantErr {
                t.Errorf("got error %v, wantErr %v", err, tt.wantErr)
                return
            }
            if !bytes.Equal(got, tt.want) {
                t.Errorf("got %v, want %v", got, tt.want)
            }
        })
    }
}

Interface mocking: Go interfaces make mocking straightforward without frameworks.

Benchmarking: Built-in benchmarking tools helped us optimize hot paths.

Our test coverage is as good or better than Java code, with simpler test code.

Deployment and Operations

Go services have operational advantages:

Single binary: Go compiles to a single static binary. Deployment is copying one file. No runtime dependencies, classpath issues, or version conflicts.

Cross-compilation: Build binaries for different platforms from one machine. Build Linux binaries on Mac for Docker images.

Small containers: Our Go service containers are 10-20MB (using Alpine base). Java containers are 200-300MB. Faster deployment, less bandwidth.

Fast startup: Go services start in milliseconds. Kubernetes health checks pass immediately. Enables rapid scaling.

Low resource usage: Go services run comfortably with 512MB memory limits in Kubernetes. Java services needed 2-4GB.

These advantages compound. We can run more services per node, scale faster, and deploy more frequently.
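The single-binary and small-container points combine in a multi-stage Dockerfile like this sketch (paths and versions are illustrative; CGO_ENABLED=0 only works for our services that don’t link C HSM libraries):

```dockerfile
# Build stage: compile a static binary.
FROM golang:1.17 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /hsm-proxy ./cmd/hsm-proxy

# Runtime stage: just the binary on a tiny base image.
FROM alpine:3.15
COPY --from=build /hsm-proxy /hsm-proxy
ENTRYPOINT ["/hsm-proxy"]
```

The resulting image carries no JVM, no classpath, and no runtime dependencies — which is where the 10-20MB figure comes from.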

Performance Characteristics

Comparing our Go rewrite of the HSM proxy to the original Java version:

Throughput: 3x higher (10k ops/sec vs 3k ops/sec on same hardware)

Latency p50: 1ms vs 3ms

Latency p99: 3ms vs 15ms

Latency p999: 8ms vs 50ms

Memory usage: 400MB vs 4GB

CPU utilization: 40% lower at same load

The performance improvement is dramatic and directly impacts customer experience.

Where Java Still Makes Sense

We haven’t abandoned Java. It remains appropriate for:

CRUD services: Services that mainly shuffle data don’t benefit much from Go’s performance.

Existing code: Rewriting working code just for language change isn’t always justified.

Team expertise: Teams fluent in Java, not Go, are more productive in Java.

Library ecosystem: Some domains have richer Java libraries.

We’re pragmatic, not dogmatic. Right tool for the right job.

Challenges with Go

Go isn’t perfect. Challenges we’ve encountered:

Error handling verbosity: Explicit error checks everywhere makes code longer. You get used to it, but it’s real.

Generics absence: Go has no generics (they are coming in Go 1.18), so we resort to code duplication or interface{} type assertions. We’ve worked around this, but it’s annoying.

Dependency on cgo: Our reliance on C libraries means cgo, which complicates builds and limits portability.

Smaller talent pool: Finding experienced Go developers is harder than Java developers.

Less mature libraries: Some domains have fewer Go libraries than Java. We’ve had to build more from scratch.

These are manageable but worth considering.

Team Adoption

Transitioning the team to Go required investment:

Training: Dedicated Go training sessions and documentation.

Mentorship: Experienced Go developers review code and provide guidance.

Shared libraries: Common patterns captured in shared libraries so teams don’t reinvent.

Best practices: Documented Go-specific best practices for our domain.

Gradual rollout: Not a flag day. Services migrate to Go as they’re rewritten, not all at once.

Team is now productive in both Java and Go, choosing appropriately for each service.

Looking Forward

We’ll continue expanding Go usage:

More services: Additional performance-critical services being rewritten.

Shared libraries: Building more shared Go libraries for common patterns.

Tooling: Internal tools increasingly written in Go for fast, distributable binaries.

Generics: Go 1.18 will add generics, addressing one of our pain points.

Go has become our default for new performance-critical services. It’s proven itself in production for the workloads we care about.

Key Takeaways

For teams considering Go for microservices:

  1. Go excels for I/O-intensive, concurrent services
  2. Expect significant performance and resource usage improvements
  3. Learning curve from Java is manageable - weeks to productivity
  4. Error handling requires discipline but yields reliable code
  5. Single-binary deployment simplifies operations considerably
  6. Goroutines make concurrent programming more accessible
  7. Go isn’t appropriate for everything - be pragmatic
  8. Invest in team training and shared libraries for consistency

Go has been transformative for our performance-critical services. The improved latency, reduced resource usage, and operational simplicity justify the investment in a new language and ecosystem. For teams building high-performance microservices, especially in cloud-native environments, Go deserves serious consideration.