{"componentChunkName":"component---src-templates-post-js","path":"/blog/scaling-a-typescript-gateway","result":{"data":{"markdownRemark":{"html":"<p>When we launched <a href=\"https://llmgateway.io\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">LLMGateway</a>, a competitor wished us luck. Friendly, with just a hint of a smirk: <em>\"Good luck with high RPS on a TypeScript proxy.\"</em></p>\n<p>It's the obvious bet. TypeScript, in front of every LLM call, on the hot path of a high-throughput proxy? Bold. Go rewrite incoming.</p>\n<p>But the choice was deliberate. Our whole product is one TypeScript monorepo — dashboard, API, billing — and we weren't about to bolt on a second language just for the gateway. One stack, one toolchain, shared types end to end. We bet the runtime would be fast enough, and that staying unified would pay off everywhere it counted.</p>\n<p>We're now moving around 100 billion tokens a month at over $100k in spend. Here's the part nobody wants to hear: the runtime was the least of it.</p>\n<h2>The thesis</h2>\n<p>The hot path — take a request, forward it to a provider, stream the response back — is cheap. It's a proxy. It barely touches CPU, holds almost nothing in memory, and Node streams bytes just fine. On our bill, the compute for actually handling requests is a rounding error.</p>\n<p>The hard, slow, expensive engineering was everywhere <em>else</em>. Every interesting problem we hit turned out to be infrastructure, not language.</p>\n<p>Let me walk you through where the time actually went.</p>\n<h2>Metering 100 billion tokens</h2>\n<p>You can't bill what you can't count. Every request produces usage we have to record — tokens in, tokens out, cost, latency, model, key. At 100B tokens a month that's a firehose of writes into Postgres, and the naive approach (one insert per request, inside the request) was the first thing to fall over.</p>\n<p>So the write path became the real project:</p>\n<ul>\n<li><strong>We pulled writes off the request path.</strong> The proxy's job is to answer the user. Recording usage happens after, out of band, so a busy Postgres never slows down a completion.</li>\n<li><strong>We batch.</strong> Instead of one insert per request, we buffer and flush in bulk — far fewer round trips, far less lock contention.</li>\n<li><strong>We added the indexes we actually query on</strong>, and dropped the ones we didn't. The wrong index on a hot write table is a tax Postgres charges on every single insert.</li>\n<li><strong>We pre-aggregate.</strong> Per-minute and per-day rollup tables mean the dashboard reads a handful of rows instead of scanning billions.</li>\n</ul>\n<p>None of that is exotic. All of it took more thought than the proxy ever did.</p>\n<h2>Surviving a database that isn't there</h2>\n<p>Here's the line that actually matters: the gateway has to keep proxying even when Postgres is having a bad day.</p>\n<p>If the database hiccups — failover, a slow query pile-up, a connection storm — nobody should get a failed completion because of it. So the gateway scales independently of Postgres and treats it as best-effort: usage writes queue, degrade, and catch up later. The only thing that belongs in the request path is the request. Everything else can wait.</p>\n<p>Making the gateway resilient to its own database failing bought us more reliability than any runtime swap ever could.</p>\n<h2>Spending other people's money safely</h2>\n<p>A gateway holds the keys to real money, and a customer's runaway loop can burn thousands of dollars in minutes if you let it. So budgets and rate limits aren't a feature — they're a safety system. Enforced on the hot path, fast enough not to slow anyone down, strict enough that \"oops, infinite loop\" costs cents instead of a mortgage payment.</p>\n<h2>When a provider falls over</h2>\n<p>Upstreams degrade. A model gets slow, a region starts erroring, a provider has an incident. The gateway has to notice and fail over so the customer's call still lands somewhere that works — detect, reroute, and don't retry yourself into a thundering herd. That was its own multi-week saga.</p>\n<h2>Why TypeScript was the right call</h2>\n<p>Here's the thing the \"good luck\" crowd skips: TypeScript was never the cost. It was the leverage.</p>\n<p>Everything we build is one TypeScript monorepo — the gateway, the dashboard, the public API, the billing logic. One language, one toolchain, pnpm and turbo, one mental model. An engineer can follow a request from the Next.js dashboard through the Hono API into the gateway without ever switching context or re-learning a stack. The types are shared end to end, so a breaking change to the gateway's contract shows up as a red squiggle in the dashboard before it ever ships.</p>\n<p>Now picture the \"fast\" alternative: a separate, lower-level language for the gateway alone. Congratulations, you've bought a rounding error of per-core throughput — and signed up for a second toolchain, a second set of idioms, a second deploy and on-call story, a serialization boundary between the gateway and everything it shares, and a context-switch tax every engineer pays forever. That complexity is real, permanent, and compounding. The performance it buys back is theoretical and, for a proxy, mostly irrelevant.</p>\n<p>TypeScript handled the throughput without breaking a sweat. It never became the constraint — and staying in one language is exactly what let us pour our time into the parts that did matter: the metering, the resilience, the budgets, the failover.</p>\n<p>The runtime everyone fixates on is a rounding error — on the bill and on the calendar. The win wasn't picking a faster language. It was <em>not</em> fragmenting the stack to chase a number that was never the bottleneck.</p>\n<p>\"Good luck with high RPS,\" it turns out, had almost nothing to do with RPS.</p>\n<p><em>I'll dig into the metering pipeline, budget enforcement, and provider failover in follow-up posts.</em></p>","frontmatter":{"title":"Everyone warned us about scaling a TypeScript gateway","description":"A competitor wished us luck with high RPS. 100 billion tokens later, the runtime was the least of our problems.","date":"June 19, 2026"}}},"pageContext":{"slug":"scaling-a-typescript-gateway"}},"staticQueryHashes":["3115057458"]}