How I built the fastest browser infrastructure
Two months ago, I came across the BrowserArena leaderboard.
I had no real background in browser infrastructure. I just saw a public benchmark where companies were competing on browser session latency. I didn’t really understand what the numbers meant, so I decided to try to build my own browser infrastructure.
That became BaseLayer: a browser hosting control plane that runs Chromium inside Firecracker microVMs.
The goal was to build something usable. The benchmark was simple enough to obsess over.
So I worked backwards from the benchmark's session lifecycle and asked one question:
What actually makes a browser session slow?
Using a t3.micro benchmark runner against a single AWS m5zn.metal host running BaseLayer, the latest sequential run hit 224ms p50 full lifecycle latency, measured with BrowserArena's lifecycle methodology.
BrowserArena sequential latency leaderboard snapshot, public rows checked 2026-06-01:
| Provider | Lifecycle p50 | Session create | CDP connect | page.goto | Release |
|---|---|---|---|---|---|
| BaseLayer (self-hosted) | 224ms | 75ms | 54ms | 83ms | 12ms |
| Notte | 358ms | 107ms | 117ms | 107ms | 27ms |
| Kernel | 365ms | 42ms | 79ms | 176ms | 68ms |
| Browserbase | 578ms | 112ms | 255ms | 127ms | 84ms |
| Steel | 972ms | 170ms | 591ms | 97ms | 114ms |
| Browser Use | 1206ms | 892ms | 152ms | 60ms | 104ms |
| Hyperbrowser | 1745ms | 1028ms | 231ms | 121ms | 365ms |
| Anchor Browser | 3791ms | 1467ms | 168ms | 1322ms | 835ms |
These are self-hosted, sequential benchmark results on 2026-06-01. These runs are reproducible, and the repo includes the benchmark harness used, run artifacts, and replication notes.
Repo: https://github.com/Lasdw6/BaseLayer
This is the story of what worked, what failed, what I learned, and where the system still breaks.
Why I started this
At the start, I did not know much about browser infrastructure.
That was part of the appeal. BrowserArena gave me a concrete number, and I wanted to understand what was inside that number. Was browser latency mostly API overhead? Network overhead? CDP connection time? Browser startup? Scheduling? Something else?
The fastest way to understand it was to build my own version and measure each piece.
That became the guiding loop for the whole project:
- Build the obvious version.
- Measure where the time goes.
- Fix the biggest bottleneck.
- Keep it only if the lifecycle number improves.
What BrowserArena measures
BrowserArena’s lifecycle benchmark is useful because it measures the path a user actually cares about: from asking for a browser session to connecting to it, navigating, and releasing it.
Figure: BrowserArena session lifecycle, with four provider-agnostic stages: session create, CDP connect, page.goto, and session release.
Simply put:
- session create: ask the provider for a browser session
- CDP connect: connect to the browser using Chrome DevTools Protocol
- page.goto: navigate the browser to the target page
- session release: clean up the session
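A minimal sketch of how those four stages can be timed separately. The `provider` object and its method names are hypothetical stand-ins, not BrowserArena's or BaseLayer's actual API; the only point is that each stage gets its own clock and the lifecycle is their sum.

```python
import time
from contextlib import contextmanager

STAGES = ("session_create", "cdp_connect", "page_goto", "session_release")


@contextmanager
def timed(timings, stage):
    """Record wall-clock milliseconds spent inside the block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = (time.perf_counter() - start) * 1000.0


def run_lifecycle(provider, url):
    """Run one create -> connect -> goto -> release pass; return per-stage ms."""
    timings = {}
    with timed(timings, "session_create"):
        session = provider.create_session()
    with timed(timings, "cdp_connect"):
        page = provider.connect(session)
    with timed(timings, "page_goto"):
        provider.goto(page, url)
    with timed(timings, "session_release"):
        provider.release(session)
    timings["lifecycle"] = sum(timings[s] for s in STAGES)
    return timings


class FakeProvider:
    """Hypothetical in-memory provider so the sketch runs without a browser."""

    def create_session(self):
        return {"id": "fake"}

    def connect(self, session):
        return {"session": session}

    def goto(self, page, url):
        page["url"] = url

    def release(self, session):
        session["released"] = True


timings = run_lifecycle(FakeProvider(), "https://example.com")
```

Swapping `FakeProvider` for a real client would leave the timing code unchanged, which is what makes the stages comparable across providers.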
The important detail is that the first step, session create, can hide a lot of work. For BaseLayer’s basic version, this is where the microVM boots, Linux starts, Chromium launches, and the CDP endpoint becomes available.
That meant the cold start was not the whole benchmark, but it dominated the first part of the benchmark so badly that the rest of the numbers barely mattered.
That full lifecycle mattered. It stopped me from optimizing a number that looked good internally but did not improve the user-visible path.
The first number
The first version was basic, with no real engineering effort behind it.
Create a Firecracker microVM. Boot Linux. Start chromium-headless-shell. Wait for Chromium to expose CDP. Connect with Playwright. Navigate. Release the session.
It worked, but the number was terrible:
cold launch: 6905.9 ms
That was the first useful measurement.
I still did not fully understand browser infrastructure, but I understood this: if the browser takes almost seven seconds to become usable, that’s not great.
Learning: measure before designing
Before building this, I assumed browser infrastructure latency would come from the control plane: API overhead, routing, scheduling, network hops, or some kind of orchestration tax.
The first measurement corrected that. The slowest part was below all of that. The browser simply was not ready yet.
Figure: cold launch timing. The 6905.9ms cold launch baseline breaks down into boot microVM, boot Linux, start Chromium, wait for CDP, and connect plus navigate, with Chromium startup dominating the cold path.
The question became:
Can I move Chromium startup out of the request path entirely?
Why Firecracker
The honest answer is that I did not start with a deep Firecracker thesis.
After some light research into the browser infrastructure space, I saw that Firecracker was already relevant to how people were thinking about isolated browser sessions. That made it a reasonable starting point.
The part that made me stick with Firecracker was snapshot/restore.
If a cold session was slow because the VM had to boot and Chromium had to start, then the obvious question was:
Can I restore a browser that is already ready, instead of launching a new one every time?
or more technically,
Can I snapshot a VM after Chromium is already running, then restore from that snapshot for every new session?
The first real win: restore a browser that already exists
Instead of launching a new browser for every session, BaseLayer does the expensive work once:
- Start a Firecracker microVM.
- Boot the Linux guest.
- Launch chromium-headless-shell.
- Wait until Chromium is idle and ready for CDP.
- Snapshot the entire VM.
After that, new sessions do not boot from scratch. They restore from a paused VM that already has Chromium running.
That changed the number immediately:
| Launch path | Latency |
|---|---|
| Cold launch | 6905.9 ms |
| Snapshot restore p50 | 130.10 ms |
Across the restore benchmark, all three runs succeeded (3/3). After restore, Playwright could connectOverCDP, and a deterministic page correctness check passed every time.
That took the launch path from roughly seven seconds to roughly 130 milliseconds — about a 53x reduction.
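The ratio is easy to check from the two measured numbers. This is plain arithmetic on the figures above, nothing more:

```python
cold_launch_ms = 6905.9   # measured cold launch
restore_p50_ms = 130.10   # measured snapshot restore p50

speedup = cold_launch_ms / restore_p50_ms
saved_ms = cold_launch_ms - restore_p50_ms

print(f"{speedup:.1f}x faster, {saved_ms:.1f} ms saved per session")
# prints "53.1x faster, 6775.8 ms saved per session"
```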
Figure: cold launch vs snapshot restore, drawn to one time scale. Cold launch was 6905.9ms; Firecracker snapshot restore p50 was 130.10ms, about 53× faster.
This was the point where BaseLayer became interesting. A Firecracker snapshot was not just booting fast. It was restoring a usable browser.
Learning: make the expensive work happen before the request
Snapshotting did not make Chromium intrinsically faster.
The guest still had to boot. Chromium still had to start. CDP still had to become ready. The difference was that none of that had to happen while the user was waiting.
That became the main design principle for BaseLayer:
Move expensive, repeatable work out of the request path.
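The snapshot-once, restore-per-session flow maps onto Firecracker's snapshot API roughly as sketched here. The request shapes follow Firecracker's documented snapshot endpoints and should be checked against the Firecracker version in use; the file and socket paths are hypothetical, and this is not BaseLayer's actual code. The functions only build the requests, so the sketch runs without a VM.

```python
import json
import shlex


def snapshot_requests(snapshot_path, mem_path):
    """Requests that pause a running microVM and write a full snapshot."""
    return [
        ("PATCH", "/vm", {"state": "Paused"}),
        ("PUT", "/snapshot/create", {
            "snapshot_type": "Full",
            "snapshot_path": snapshot_path,
            "mem_file_path": mem_path,
        }),
    ]


def restore_requests(snapshot_path, mem_path):
    """Request that loads the snapshot into a fresh Firecracker process and resumes it."""
    return [
        ("PUT", "/snapshot/load", {
            "snapshot_path": snapshot_path,
            "mem_backend": {"backend_type": "File", "backend_path": mem_path},
            "resume_vm": True,
        }),
    ]


def as_curl(socket_path, method, path, body):
    """Render one request as a curl command against the Firecracker API socket."""
    return " ".join([
        "curl", "--unix-socket", shlex.quote(socket_path),
        "-X", method, shlex.quote("http://localhost" + path),
        "-H", shlex.quote("Content-Type: application/json"),
        "-d", shlex.quote(json.dumps(body)),
    ])


# Hypothetical paths, for illustration only.
for request in restore_requests("/tmp/vm.snap", "/tmp/vm.mem"):
    print(as_curl("/tmp/fc.sock", *request))
```

The snapshot requests run once, after Chromium is ready. Only the single load request sits in the request path of a new session.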
Restore was only one part of the lifecycle
Getting snapshot restore down to ~130ms was the first big win, but it was not the final benchmark.
BrowserArena still measured the full end-to-end path.
Restore was only one piece inside session create.
That was useful because once restore stopped dominating everything, the rest of the lifecycle became non-trivial. I could now see where the next blockers were.
So the project shifted from “can I make browser startup fast?” to “can I make the whole session lifecycle fast?”
So I added detailed timing breakdowns to every session.
Once I could see what the lifecycle was made of, the work became much more straightforward: attack the biggest bottleneck.
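A minimal sketch of turning per-session breakdowns into a "biggest bottleneck" answer. The run data below is made up for illustration, not taken from BaseLayer's artifacts. One thing it shows: the per-stage p50s need not sum to the lifecycle p50, because each median can come from a different run.

```python
import statistics


def p50_by_stage(runs):
    """Median of each timing bucket across runs (each run maps stage -> ms)."""
    stages = runs[0].keys()
    return {stage: statistics.median(run[stage] for run in runs) for stage in stages}


def biggest_bottleneck(p50s, exclude=("lifecycle",)):
    """Name of the stage with the largest p50, ignoring the total."""
    candidates = {k: v for k, v in p50s.items() if k not in exclude}
    return max(candidates, key=candidates.get)


# Illustrative numbers only.
runs = [
    {"create": 90, "connect": 50, "goto": 600, "release": 5, "lifecycle": 745},
    {"create": 100, "connect": 60, "goto": 620, "release": 4, "lifecycle": 784},
    {"create": 93, "connect": 56, "goto": 640, "release": 6, "lifecycle": 795},
]

p50s = p50_by_stage(runs)
stage_sum = sum(v for k, v in p50s.items() if k != "lifecycle")

print(p50s)
print(biggest_bottleneck(p50s), stage_sum, p50s["lifecycle"])
```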
The optimization log
Once I could see what the metrics were made of, I started treating BaseLayer as an experiment loop.
Every meaningful experiment had to answer four questions:
- What bottleneck am I addressing?
- What changed?
- What did the numbers say?
- Do I keep it, reject it, revisit it, or leave it experimental?
That became the optimization log. It was not just for tracking wins. It was mostly there to stop me from rerunning the same dead ends.
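One way to keep such a log machine-checkable is a small record type with the four questions as fields. This is a sketch of the idea, not the format used in the repo; the field names are assumptions, and the two entries paraphrase rows from the table in this post.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    KEEP = "keep"
    REJECT = "reject"
    REVISIT = "revisit"
    EXPERIMENTAL = "experimental"


@dataclass(frozen=True)
class Experiment:
    date: str
    bottleneck: str   # what bottleneck am I addressing?
    change: str       # what changed?
    result: str       # what did the numbers say?
    decision: Decision


def dead_ends(log):
    """Changes that were already tried and rejected, so they are not rerun."""
    return [e.change for e in log if e.decision is Decision.REJECT]


log = [
    Experiment("2026-04-11", "density", "1-vCPU guest",
               "c24 4463ms vs 4756ms on the 2-vCPU baseline", Decision.KEEP),
    Experiment("2026-04-11", "density", "512MB / 768MB guest memory cuts",
               "512MB failed 0/24; 768MB regressed vs 1024MB", Decision.REJECT),
]

print(dead_ends(log))
```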
I could spend hours on every experiment here, but here's the gist.
Optimization experiments log
Optimization log note: the BrowserArena target changed during the project from google.com to example.com, so older Google-target rows and later example.com rows should not be compared as a single methodology.
| Date | Experiment | Result | Decision |
|---|---|---|---|
| 2026-04-06 | Firecracker snapshot proof + density scaling | Cold boot 6905.9ms vs restore 130.1ms p50 (3/3) on GCP nested virt; 100% create/nav through c24. | keep |
| 2026-04-09 | First realistic 100-run sequential | GCP KVM, 99/100, lifecycle 8281ms p50; release alone was 5894ms. Exposed teardown as the real bottleneck. | keep |
| 2026-04-11 | 1-vCPU guest for high-density BrowserArena | AWS m5zn.metal: c24 improved from 4756ms on the 2-vCPU baseline to 4463ms at 1-vCPU. | keep |
| 2026-04-11 | 512MB / 768MB guest memory cuts | 512MB failed 0/24 (V8 OOM at startup); 768MB regressed versus the 1024MB baseline at c24. | reject |
| 2026-04-11 | Lightpanda standalone fast lane | ~396ms total on Google (5/5), but not Chromium-compatible. Kept as a separate fast lane. | experimental |
| 2026-04-12 | Sync delete as latency behavior | Valid, but c24 release rose to 1351ms and lifecycle to 5725ms; async c24 was 4349ms. | reject |
| 2026-04-12 | BrowserArena waitUntil=commit | Failed 0/12; page.goto timed out waiting for commit. | reject |
| 2026-04-12 | Playwright image/media request blocking | Failed c12 and c16; Google navigation timed out. Changing browser semantics is not the same as going faster. | reject |
| 2026-04-12 | Short Firecracker AF_UNIX API/state paths | Avoided socket path failures and remained valid in parity runs. | keep |
| 2026-04-13 | Active-navigation cap (nav-cap16) | Improved isolated goto (77ms avg / 91ms p95) but did not beat Mew total lifecycle. | revisit |
| 2026-04-13 | Fluid CPU profiles (hybrid / always / density) | First pass: hybrid neutral, always regressed, density did not beat profile Y/C enough to justify complexity. | reject |
| 2026-04-15 | Kernel-inspired launch-flag sweep | Gengar (startup-prune) was best, but all variants clustered within ~50ms, so flag churn has a low ceiling. | keep |
| 2026-04-16 | Startup-prune + thp-always host tuning | Gengar shaved ~342ms total / ~389ms goto at c24; thp-always added a small 18–55ms goto win. | keep |
| 2026-04-19 | ARM bare-metal study (a1.metal, c6gd.metal) | Mew valid (c6gd c24 8781ms) but Gengar/Dragonite failed base snapshot on ARM guest startup. | revisit |
| 2026-04-19 | PGO-trained custom headless_shell, first pass | Did not beat the marker-only source shell; the 4-iteration training corpus was too thin. | reject |
| 2026-04-19 | Goto attribution deep-dive | Found goto cost is renderer/page work, not DNS/connect (dnsLookup 0ms). Redirected effort away from network flags. | keep |
| 2026-04-19 | Browser-Use vs BaseLayer same-region sequential | BaseLayer 770ms vs Browser-Use 4173ms lifecycle (both 25/25). The create+release advantage is decisive. | keep |
| 2026-04-20 | Page-ready warm-level matrix | CDP-only warm was best; heavier context/target/blank warm added no extra win. | keep |
| 2026-04-21 | Mixed active/idle fluid rerun (after cgroup fix) | Raticate still best; Spearow regressed, Nidoran-M far worse. The fix removed the blocker but not the loss. | reject |
| 2026-04-22 | Create reservations + warm expected/actual instrumentation | In-flight creates consumed host capacity immediately; warmExpected/warmActual/warmMismatch became visible. | keep |
| 2026-04-23 | BrowserArena-comparable sequential rerun | 99/100, lifecycle 769ms p50 (93/56/618/5). Behind Kernel (743ms), ahead of Notte (953ms). | keep |
| 2026-04-23 | Firecracker restore retry for transient CDP misses | Bounded retry only for local /json/version, /json/list, websocket, and localhost readiness misses. | keep |
| 2026-04-24 | Exact-shape m5zn.metal + t3.micro rerun | 99/100, lifecycle 749ms p50, but one 30.8s create outlier and one 45s timeout. Faster median, unresolved tail. | revisit |
| 2026-05-06 | Full-Chromium guest 100x rerun after parity updates | 100/100, create 122ms, connect 102ms, goto 452ms, release 6ms, lifecycle 692ms p50. | keep |
| 2026-05-07 | Async delete modeled as a release reservation | Fixed the late 503 admission collapse; final 100-run row reached 100/100 with release 10ms p50. | keep |
| 2026-06-01 | Live demo hardening: stop JSON state accumulation | The live-demo control plane was keeping old terminal sessions/events in its JSON-backed state file, so repeated c1 runs got slower as the file grew. Capping retained terminal sessions stopped the drift; the latest five-run c1 x100 batch ranged from 190.8ms to 223.6ms, with the public headline using the slowest run. c10 still needs more reliability work. | keep / revisit |
The full optimization ledger: every meaningful experiment, kept or killed.
Learning: the benchmark became the teacher
At the start, I was trying to understand a leaderboard number.
By the middle of the project, the benchmark had become a feedback loop. Every run either moved the number, exposed a new bottleneck, or killed an idea that sounded good in my head.
That was probably the most useful part of the project. I learned the space by forcing every design choice to survive measurement.
The choices that actually mattered
After snapshot/restore, the work was mostly about making the fast path survive real benchmark runs.
The best changes were boring systems fixes, not clever browser tricks.
Async delete + scheduler reservations
Release latency is part of the lifecycle, so synchronous teardown was too expensive. BaseLayer used async delete: return quickly, then clean up the microVM, memory, state directory, and network resources in the background.
That made release fast, but it hid load from the scheduler. Some 100-run attempts collapsed late with:
503 No host is currently eligible for session admission
The fix was to account for in-flight work directly: create reservations, release reservations, and reserved guest memory. In practice, the key change was modeling async teardown as a short-lived release reservation. After that, the same lane reached 100/100 while keeping release latency low.
The lesson: cleanup is capacity.
Figure: capacity accounting for host session slots. A slot is one concurrent session the host can hold (one microVM, ~1 GB reserved), and the host admits up to 60. When async release returns, the API responds instantly and the scheduler thinks the session is gone, but the host is still tearing it down in the background. Without reservations, the scheduler may count 42 of 60 slots while the host is actually using 58 of 60. It can then keep admitting work until real usage reaches 64 of 60 and requests fail with 503. Counting active sessions plus in-flight creates, in-flight releases, and reserved guest memory makes admission settle at 60 of 60.
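A minimal sketch of that accounting, assuming a single host and counting slots only. The class and method names are hypothetical, and the real scheduler also tracks reserved guest memory, which would be one more counter checked in the same place.

```python
class HostCapacity:
    """Admission counter that also counts in-flight creates and releases."""

    def __init__(self, max_slots):
        self.max_slots = max_slots
        self.active = 0      # sessions handed to users
        self.creating = 0    # create reservations still in flight
        self.releasing = 0   # async teardowns still in flight

    def used(self):
        return self.active + self.creating + self.releasing

    def try_reserve_create(self):
        """Reserve a slot before any work starts; False means answer 503."""
        if self.used() >= self.max_slots:
            return False
        self.creating += 1
        return True

    def create_finished(self, ok=True):
        self.creating -= 1
        if ok:
            self.active += 1

    def begin_release(self):
        """The API can return now, but the slot stays counted during teardown."""
        self.active -= 1
        self.releasing += 1

    def teardown_finished(self):
        self.releasing -= 1


host = HostCapacity(max_slots=2)
for _ in range(2):
    assert host.try_reserve_create()
    host.create_finished()

host.begin_release()                         # user sees a fast release
blocked_during_teardown = not host.try_reserve_create()
host.teardown_finished()                     # background cleanup completes
admitted_after_teardown = host.try_reserve_create()
```

Without the `releasing` counter, the create attempted during teardown would have been admitted, which is the over-admission the 503s came from.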
Retry the smallest safe unit
Some long runs were healthy except for one create timeout: 99/100 instead of 100/100.
The fix was not to blindly retry the whole API request. Firecracker restore got bounded retries only for local transient CDP readiness failures, after the failed microVM had already been torn down. Provider create also got explicit remote timeouts so bad hosts failed fast instead of hanging.
The lesson: retry the smallest safe unit, not the whole request.
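A sketch of that retry shape. The three callables and the exception type are hypothetical placeholders for "restore a microVM", "wait for local CDP readiness", and "tear the microVM down"; the attempt limit is an assumption, not BaseLayer's setting.

```python
class TransientReadinessError(Exception):
    """Local CDP endpoint was not ready in time after restore."""


def restore_with_retry(restore_vm, wait_for_cdp, teardown_vm, max_attempts=3):
    """Retry only restore + readiness, tearing down each failed VM first."""
    for attempt in range(1, max_attempts + 1):
        vm = restore_vm()
        try:
            wait_for_cdp(vm)
            return vm
        except TransientReadinessError:
            teardown_vm(vm)          # never retry on top of a half-dead VM
            if attempt == max_attempts:
                raise                # bounded: give up after the last attempt
        except Exception:
            teardown_vm(vm)          # anything else is not retried
            raise


# Simulated run: the first two restores miss readiness, the third succeeds.
calls = {"restored": 0, "torn_down": []}


def fake_restore():
    calls["restored"] += 1
    return calls["restored"]


def fake_wait(vm):
    if vm < 3:
        raise TransientReadinessError("cdp not ready")


def fake_teardown(vm):
    calls["torn_down"].append(vm)


vm = restore_with_retry(fake_restore, fake_wait, fake_teardown)
```

The caller's API request is issued once; only the inner restore step repeats, and the retry budget is explicit.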
VM sizing mattered
The best density baseline was not “give every browser more resources.” For these workloads, 1 vCPU / 1024 MB became the practical guest shape.
512 MB failed. 768 MB regressed. More memory did not clearly beat 1024 MB. In one c24 comparison, 1 vCPU beat 2 vCPU (4463ms vs 4756ms).
Dense browser infra is not just “more CPU, more RAM.”
Small path details mattered
Some fixes were tiny but real. Short Firecracker API/state paths avoided AF_UNIX socket path failures and made repeated runs more reliable.
Not exciting, but necessary.
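The post does not spell out which failure the short paths avoided. A common one on Linux is that `sockaddr_un.sun_path` is 108 bytes including the trailing NUL, so long state directories produce socket paths that cannot be bound. A minimal guard, assuming that is the failure in question and using a hypothetical directory layout:

```python
import os

SUN_PATH_MAX = 108  # size of sockaddr_un.sun_path on Linux, including the NUL


def socket_path_fits(path):
    """True if the encoded path plus a trailing NUL fits in sun_path."""
    return len(os.fsencode(path)) < SUN_PATH_MAX


def short_socket_path(run_dir, session_id):
    """Hypothetical layout: short base directory plus a truncated session id."""
    path = f"{run_dir}/{session_id[:12]}.sock"
    if not socket_path_fits(path):
        raise ValueError(f"socket path too long for AF_UNIX: {path}")
    return path


print(short_socket_path("/run/bl", "0123456789abcdef"))
```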
The optimizations that looked smart and still lost
A lot of ideas sounded reasonable and still failed. That was useful: each failure narrowed the search space.
Sync delete
Sync delete was valid, but too expensive. In one c24 comparison, sync delete pushed release to ~1351ms and lifecycle to ~5725ms. The async path kept release around 9ms and lifecycle around 4349ms.
Async delete was necessary, but only after the scheduler accounted for teardown pressure.
Image/media blocking
I tried making navigation cheaper by blocking images and media. It broke benchmark parity and failed outright: c12 and c16 both hit 0/N success with Google navigation timing out.
Changing browser semantics is not the same as making the system faster.
waitUntil=commit
waitUntil=commit sounded like a cheaper navigation wait mode, but it failed the benchmark shape I needed. Runs timed out waiting for commit.
Rejected.
Fluid compute
Fluid compute sounded right: browser sessions are bursty, so active sessions should borrow CPU from idle ones.
The benchmark did not support promoting it. Hybrid, always-on, and density-focused profiles mostly tied or regressed. BrowserArena-style all-active waves also do not give idle sessions much CPU to donate.
Still plausible for mixed active/idle workloads, but not the default path.
Memory cuts
Smaller VMs sounded good for density. In practice, 512 MB failed, 768 MB regressed, and larger memory did not clearly beat the 1024 MB baseline.
Custom Chromium / PGO
I built source-level headless_shell variants and tested PGO. They did not beat the stock chromium-headless-shell path enough to justify the effort.
That killed an obvious theory: the magic was not “compile a custom browser.” The bigger win was the runtime path.
Chromium launch flags
Some launch flag profiles produced small wins. Many did nothing. Some made things worse.
Browser flags are easy to cargo-cult. Unless the run shows which lifecycle bucket improved, it is mostly superstition.
The end-to-end result
Once snapshot restore was wired into the full session lifecycle, the latest self-hosted sequential rerun hit 224ms p50.
This answered the question that started the project. A self-hosted Firecracker-based browser runtime could get into the same sequential-latency range as the fastest public browser-infra results, without a custom kernel or unikernel.
The main trick was not exotic Chromium work. It was paying the browser startup cost before the request arrived.
What I would do next
Most of the obvious sequential wins have been squeezed out.
The next real direction is concurrency.
In real workloads, concurrency is always part of the problem. A browser provider is not just asked to create one fast session. It has to keep many sessions alive, decide which ones get CPU, handle idle vs active browser sessions, and avoid letting cleanup or renderer contention collapse the host.
That is where I would go deeper:
- Better fluid-compute experiments for mixed active/idle workloads.
- Cleaner network identity per restored VM.
- Stronger host admission control under create/release pressure.
- Direct tracking of renderer contention instead of only counting microVMs.
- Longer soak tests that look more like real browser-provider workloads.
Some BrowserArena-style concurrent numbers from other systems were around the ~600ms range, but that comparison is less direct, since other providers can route requests across a fleet and send each request to the least-loaded node. My setup was a single AWS m5zn.metal box with 48 vCPUs, and it reached a 555ms p50 lifecycle.
The next question is not whether Firecracker snapshots can make browser startup fast, but how far that model can scale when many restored browsers are active at the same time.
What I learned
I started this because I saw a leaderboard and did not understand the numbers.
Two months later, the biggest lesson is that these systems are less mysterious once you actually try to rebuild them. From the outside, browser infrastructure looked like a black box. Once I broke the lifecycle into pieces, it became much more understandable: create the session, connect to CDP, navigate, release, then keep measuring which part is slow.
A good feedback loop goes a long way. The benchmark gave me a number. The timing breakdowns told me what the number was made of. The optimization log kept me from lying to myself when an idea sounded good but did not move the metric.
I am glad I spent two months on this. BaseLayer did not answer every question, especially around concurrency density, but it did answer the question that made me start: what actually makes a browser session slow, and how far can I get by attacking the biggest bottleneck directly?
Code: https://github.com/Lasdw6/BaseLayer
If you made it this far, thanks for reading. Always open to chat, feel free to reach out. <3
Vividh Mahajan
Website: vividh.lol
X: vividh.lol/x