From Fork to Framework, Part 4: Why Go Won the Backend
Goroutines, sqlc, and 10-megabyte containers. Here is why Go is purpose-built for an LLM gateway and multi-tenant AI platform.

The decision to adopt Go was driven by architectural foresight, not a reaction to production fires. We had built the same backend three times in three different languages and understood exactly where each one broke down. Python added latency to streaming proxy workloads. Ruby’s memory model could not sustain thousands of concurrent connections. Both produced container images measured in hundreds of megabytes. Go eliminates all three problems with a single design primitive: the goroutine.
This is the story of a deliberate migration, executed in a specific order, with a specific thesis about where Go’s strengths matter most.
The Goroutine Advantage
A goroutine starts at approximately 2 kilobytes of stack memory. An OS thread, the unit that backs each request-handling thread in a Puma worker, reserves roughly 1 megabyte of stack. At scale, this difference is not incremental. It is categorical.
An LLM gateway holds open connections while tokens stream from upstream providers to downstream clients. Each connection lives for seconds to minutes. Under load, a gateway serving a thousand concurrent streaming connections in Rails requires a thousand Puma workers at roughly 150 megabytes each. The same workload in Go requires a thousand goroutines at roughly 2 megabytes total. That is not a minor optimization. It is a reduction in memory consumption of roughly 75,000x for the same concurrency level.
Go’s runtime multiplexes goroutines across a small number of OS threads using a work-stealing scheduler. When a goroutine blocks on network IO, which is the dominant operation in an LLM streaming proxy, the runtime parks it and schedules another goroutine on the same thread. There is no Global Interpreter Lock. There is no thread-per-request ceiling. The concurrency model was designed for exactly this workload pattern.
Benchmarks That Mattered
We did not adopt Go based on microbenchmarks. We adopted it based on benchmarks that matched our production workload profile.
Bifrost, a Go-based LLM gateway, published benchmarks showing 54x lower P99 latency and 9.5x higher throughput compared to Python equivalents under sustained streaming load. These numbers aligned with what we observed in our own load testing during the Rails phase: as concurrent streaming connections increased, Python and Ruby degraded predictably while Go implementations maintained flat latency curves.
TechEmpower’s Round 23 framework benchmarks provided a broader perspective. Go frameworks achieved approximately 20x baseline throughput. Rails achieved approximately 2.5x. That is an 8x gap in raw request handling capacity before accounting for the streaming connection pattern, which widens the gap further because Go’s advantage compounds with connection duration.
| Metric | Go (chi + pgx) | Rails 8 (Puma) | Difference |
|---|---|---|---|
| Throughput (requests/sec, TechEmpower R23 baseline multiple) | ~20x | ~2.5x | 8x advantage |
| P99 latency under streaming load | Baseline | 54x higher | 54x advantage |
| Memory per concurrent connection | ~2 KB (goroutine) | ~150 MB (Puma worker) | ~75,000x advantage |
| Container image size | 10-20 MB (distroless) | 600+ MB | 30-60x smaller |
| Cold start time | Milliseconds | Seconds | Orders of magnitude |
| Type safety | Compile-time (sqlc, Go compiler) | Runtime (ActiveRecord) | Compile-time |
These numbers told a clear story. For a product whose core workload is proxying long-lived streaming connections, Go is not marginally better. It is a different category of tool.
sqlc: Compile-Time SQL
ActiveRecord discovers the database schema at runtime. It reads column names and types when the application boots, then generates methods dynamically. This is convenient and fast for development. It is also a source of production errors that no test suite can fully prevent, because the contract between application code and database schema is implicit.
sqlc takes the opposite approach. You write raw SQL queries in .sql files. sqlc reads those files alongside your PostgreSQL schema and generates type-safe Go code at compile time. If a query references a column that does not exist, the build fails. If a query returns columns that do not match the expected Go struct, the build fails. The database contract is explicit and verified before the code runs.
This mattered for our migration because the PostgreSQL schema we built during the Rails phase transferred directly. The same tables, the same columns, the same indexes, the same constraints. We wrote .sql files containing the queries our Rails models had executed, ran sqlc generate, and received type-safe Go functions that operated on the same data. goose handled SQL migrations using the same migration pattern Rails had established, just with raw SQL files instead of Ruby DSL.
The pgx driver gave us native PostgreSQL protocol support with connection pooling, prepared statement caching, and native pgvector support for our embedding workloads. No ORM abstraction layer. No query builder. Direct SQL with compile-time guarantees.
The Stack
Both services follow the same architectural pattern. The chi router handles HTTP routing with middleware for authentication, rate limiting, and request logging. Handlers call sqlc-generated functions for database operations. River provides PostgreSQL-backed background job processing, following the same philosophy as Rails’ Solid Queue: no Redis, no external broker, just PostgreSQL tables. nhooyr.io/websocket handles WebSocket connections for real-time streaming.
The entire application compiles to a single static binary. That binary is copied into a distroless container image that contains nothing except the binary and CA certificates. No operating system shell. No package manager. No runtime. The attack surface is minimal and the image size reflects it.
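A typical two-stage build for that kind of image. The paths, module layout, and versions here are illustrative, not our actual build file.

```dockerfile
# Stage 1: compile a fully static binary. CGO_ENABLED=0 avoids libc
# linkage; -s -w strips symbol tables to shrink the binary further.
FROM golang:1.22 AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -ldflags="-s -w" -o /gateway ./cmd/gateway

# Stage 2: distroless static base ships only the binary plus CA
# certificates. No shell, no package manager, no runtime.
FROM gcr.io/distroless/static-debian12
COPY --from=build /gateway /gateway
ENTRYPOINT ["/gateway"]
```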
PostgreSQL-Only Infrastructure
One of the most important continuities between the Rails and Go phases is the infrastructure topology. Both use PostgreSQL as the sole external dependency. Rails achieved this through Solid Queue, Solid Cache, and Solid Cable. Go achieves it through River for job queues, pgx for connection management, and direct PostgreSQL queries for everything else.
This was deliberate. We evaluated Redis-backed alternatives at every phase and consistently concluded that the operational complexity of a second stateful dependency was not justified by its performance advantages for our workload. PostgreSQL’s LISTEN/NOTIFY provides real-time event delivery. PostgreSQL’s row-level locking provides job queue semantics. The additional latency compared to Redis is measured in single-digit milliseconds, which is invisible against LLM response times measured in seconds.
The Migration Strategy
We did not migrate both products simultaneously. The order was deliberate and the reasoning was specific.
| Phase | Product | Rationale | Timeline |
|---|---|---|---|
| Phase 1 | AOSentry (gateway) | Highest throughput requirements. Smallest application surface area. Most to gain from goroutines. Streaming proxy workload is Go’s strongest use case. | Active: Mar 2026 |
| Phase 2 | AODex (platform) | Larger feature surface. More complex business logic. Benefits from patterns established in AOSentry migration. | Planning: TRDs written Mar 21, initial commits Mar 24 |
AOSentry was the right first target for three reasons. First, it has the highest throughput requirements. Every LLM request in the system flows through AOSentry, making it the hottest path in the architecture. Second, it has the smallest application surface area. AOSentry is primarily a routing and policy engine, not a feature-rich application. The migration surface was bounded. Third, it has the most to gain from goroutines. Every request to AOSentry is a streaming proxy connection, which is the exact workload pattern where Go’s concurrency model provides its largest advantage.
The AODex migration followed a more methodical planning process. Starting March 21, we wrote Technical Requirements Documents covering 10 migration objectives. Each TRD specified the Rails feature being replaced, the Go equivalent, the data migration strategy, and the acceptance criteria. On March 24, the first three commits landed in aodex-go, beginning the full Rails replacement.
This phased approach meant we never had a period where both products were mid-migration simultaneously. AOSentry was stable in Go before AODex migration began. Lessons learned from the first migration, particularly around sqlc patterns, middleware design, and River job configuration, directly informed the second.
Container Size and Deployment Speed
The difference in container images was not cosmetic. Rails images exceeded 600 megabytes because they included Ruby, Bundler, compiled native extensions, precompiled assets, and the Rails framework. Go images weighed 10 to 20 megabytes because they contained a statically linked binary and nothing else.
On DigitalOcean App Platform, this translated to measurable deployment improvements. Image pulls that took 30 to 60 seconds with Rails completed in under 5 seconds with Go. New instances reached healthy status in milliseconds instead of seconds because Go binaries start instantly while Rails applications boot the framework, connect to the database, load the schema, and warm caches.
For a product that handles bursty LLM traffic, fast scaling is not a luxury. When an enterprise customer triggers a batch of AI requests, the system needs to scale horizontally within seconds. A 600-megabyte image with a multi-second startup time creates a window where traffic exceeds capacity. A 15-megabyte image with a millisecond startup time closes that window almost entirely.
What Transferred and What Changed
The PostgreSQL schema transferred completely. Every table, column, index, and constraint we defined in Rails migrations was replicated in goose migration files as raw SQL. The data model was the same. The database was the same. Only the application layer changed.
Business logic patterns transferred conceptually but not syntactically. Rails’ ActiveRecord callbacks became explicit function calls in Go handlers. Pundit policies became middleware functions on chi routes. Devise authentication became custom JWT validation. Solid Queue jobs became River workers. In every case, the Go version was more verbose and more explicit. There was no convention to hide behind, and we considered that a feature. Every database query was visible in a .sql file. Every authorization check was visible in middleware. Every background job was a concrete type with concrete methods.
What changed was the error model. Rails rescues exceptions and renders error pages. Go returns errors as values that must be handled at every call site. This added verbosity but eliminated an entire category of bugs where unhandled exceptions produced generic 500 responses in production. When every function returns an error, and the compiler ensures you acknowledge it, error handling becomes a design decision rather than an afterthought.
The Lesson
Go’s concurrency model is purpose-built for the LLM gateway pattern. When every request is a long-lived streaming connection, goroutines are not a nice-to-have optimization. They are the right primitive for the problem.
The decision was not about Go being universally better than Rails. Rails built our security platform faster than Go would have. Rails’ conventions accelerated feature development in ways that Go’s explicit style cannot match. But conventions optimize for developer speed. Goroutines optimize for runtime efficiency. When your product is a high-throughput streaming proxy that needs to scale elastically on commodity infrastructure, runtime efficiency is the constraint that matters.
We chose Go because we had already built the product in Rails and understood exactly which architectural constraints were binding. The migration was not a leap of faith. It was an engineering decision backed by four weeks of production data, three previous iterations, and a clear thesis about where our system spent its resources. The goroutine is a small thing. At scale, small things compound.