This reference covers version: v1.0.0

Gridlane is a stdlib-only Go router and load balancer for the Selenwright browser-automation grid. It sits in front of N Selenwright backends and multiplexes Selenium WebDriver (HTTP), Playwright (WebSocket), and the side surfaces (VNC, logs, video, downloads, devtools, clipboard, artifact history) behind a single authenticated endpoint. Please refer to the GitHub repository if you need source code.

1. Getting Started

Three entry points:

  • Quick Start Guide — you have a Selenwright backend (or two) and want traffic flowing through Gridlane in a few minutes. Covers router.json, BasicAuth, the smoke checks, and docker run.

  • Running With Selenwright — the compose recipe for two Selenwright backends behind Gridlane with trusted-proxy identity propagation end-to-end.

  • FAQ — the set of things people hit in the first hour: 401 on /wd/hub/session vs /vnc/, stale session IDs after a restart, what changes Gridlane can reload and what needs a restart.

1.1. Quick Start Guide

1.1.1. Prerequisites

  • Docker (recent version).

  • One or more running Selenwright backends reachable from the host where you will run Gridlane.

1.1.2. Minimum Viable router.json

Gridlane reads a single strict-schema v1 JSON file. The smallest usable shape is one user, one backend, and one browser:

{
  "version": 1,
  "users": [
    {
      "name": "alice",
      "password_ref": "env:GRIDLANE_ALICE_PASSWORD",
      "quota": { "max_sessions": 20 }
    }
  ],
  "catalog": {
    "browsers": [
      {
        "name": "chrome",
        "versions": ["stable"],
        "protocols": ["webdriver", "playwright"]
      }
    ]
  },
  "backend_pools": [
    {
      "id": "selenwright-a",
      "endpoint": "http://selenwright-a:4444",
      "region": "local-a",
      "weight": 1,
      "protocols": ["webdriver", "playwright"],
      "health": { "enabled": true, "failure_threshold": 2, "cooldown": "10s" }
    }
  ]
}

See Router Configuration for every field and validation rule. The parser uses DisallowUnknownFields — misspelled keys are rejected loudly, not silently ignored.

Passwords and tokens are never inlined in router.json. They live behind env:NAME or file:/absolute/path references and are resolved on startup and on reload.
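The reference format above can be illustrated with a small resolver. This is a minimal sketch, not Gridlane's actual code: the function name resolveSecretRef, the rejection of empty values, and the trimming of a trailing newline from file contents are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// resolveSecretRef is a hypothetical helper. It accepts "env:NAME" or
// "file:/absolute/path" and returns the referenced secret value.
func resolveSecretRef(ref string) (string, error) {
	switch {
	case strings.HasPrefix(ref, "env:"):
		name := strings.TrimPrefix(ref, "env:")
		val, ok := os.LookupEnv(name)
		if !ok || val == "" {
			// Assumption: an unset or empty variable is treated as unresolvable.
			return "", fmt.Errorf("secret ref %q: environment variable is unset or empty", ref)
		}
		return val, nil
	case strings.HasPrefix(ref, "file:"):
		path := strings.TrimPrefix(ref, "file:")
		if !filepath.IsAbs(path) {
			return "", fmt.Errorf("secret ref %q: path must be absolute", ref)
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return "", fmt.Errorf("secret ref %q: %w", ref, err)
		}
		// Assumption: a trailing newline in the file is not part of the secret.
		return strings.TrimRight(string(data), "\r\n"), nil
	default:
		return "", errors.New("secret ref must start with env: or file:")
	}
}

func main() {
	os.Setenv("GRIDLANE_ALICE_PASSWORD", "wonderland")
	secret, err := resolveSecretRef("env:GRIDLANE_ALICE_PASSWORD")
	fmt.Println(len(secret), err)
}
```

Running the same resolver on startup and on every reload is what makes an unresolvable reference a validation failure rather than a silent empty password.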

1.1.3. Start Gridlane

$ docker run -d --name gridlane                            \
    -p 4444:4444 -p 9090:9090                              \
    -v $(pwd)/router.json:/etc/gridlane/router.json:ro     \
    -e GRIDLANE_ALICE_PASSWORD=wonderland                  \
    selenwright/gridlane:latest-release                    \
    -config /etc/gridlane/router.json                      \
    -metrics-listen :9090

Gridlane listens on :4444 by default. Change it with -listen (see CLI Flags). The image runs as unprivileged 65532:65532 and exposes 4444 (main listener) and 9090 (metrics listener, when -metrics-listen is set).

To enable the admin scope (needed for /config and for /metrics on the main listener), add an admin.token_ref to the config and pass the corresponding environment variable:

$ docker run -d --name gridlane                            \
    -p 4444:4444 -p 9090:9090                              \
    -v $(pwd)/router.json:/etc/gridlane/router.json:ro     \
    -e GRIDLANE_ALICE_PASSWORD=wonderland                  \
    -e GRIDLANE_ADMIN_TOKEN=root-token                     \
    selenwright/gridlane:latest-release                    \
    -config /etc/gridlane/router.json                      \
    -metrics-listen :9090

1.1.4. Point Your Tests At Gridlane

Tests connect to Gridlane the same way they would connect to a single Selenwright instance:

Selenium WebDriver
http://localhost:4444/wd/hub
Playwright
ws://localhost:4444/playwright/<browser>/<version>

Auth is HTTP BasicAuth (for users[]) or bearer token (for admin). See Authentication for the full picture.
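For clients written in Go without a Selenium binding, a session-create request against Gridlane might be built as follows. This is a sketch under stated assumptions: the helper name newSessionRequest is hypothetical, and the payload carries only browserName inside alwaysMatch.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// newSessionRequest is a hypothetical helper that builds a W3C session-create
// request carrying BasicAuth credentials for a users[] entry.
func newSessionRequest(baseURL, user, password, browser string) (*http.Request, error) {
	payload := map[string]any{
		"capabilities": map[string]any{
			"alwaysMatch": map[string]any{"browserName": browser},
		},
	}
	body, err := json.Marshal(payload)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/wd/hub/session", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json; charset=utf-8")
	req.SetBasicAuth(user, password)
	return req, nil
}

func main() {
	req, err := newSessionRequest("http://localhost:4444", "alice", "wonderland", "chrome")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// To send it for real: resp, err := http.DefaultClient.Do(req)
}
```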

1.1.5. Smoke Checks

# Liveness — no auth
$ curl -fsS http://127.0.0.1:4444/ping
{"service":"gridlane","status":"ok"}

# Backend rollup — no auth
$ curl -fsS http://127.0.0.1:4444/status

# Caller's own quota — BasicAuth
$ curl -fsS -u alice:wonderland http://127.0.0.1:4444/quota

# Sanitized config — admin token
$ curl -fsS -H 'X-Gridlane-Admin-Token: root-token' http://127.0.0.1:4444/config

1.1.6. Running From Source

Go 1.26, stdlib-only — no go get, no vendoring. Download a prebuilt binary from releases or build locally:

$ go build ./cmd/gridlane
$ GRIDLANE_ALICE_PASSWORD=wonderland GRIDLANE_ADMIN_TOKEN=root-token \
    ./gridlane -config router.json

Flags and config file are identical to the container path — the Docker image’s ENTRYPOINT is the binary with no wrapper. Use this form when you are hacking on Gridlane itself, or in environments without a container runtime.

1.1.7. What’s Next

  • Running With Selenwright — two Selenwright backends behind Gridlane with trusted-proxy identity propagation, wired end-to-end.

  • Authentication — BasicAuth, guest access, admin token, and the scope ladder.

  • Router Configuration — every field in router.json with validation rules and secret-ref format.

1.2. Running With Selenwright

Minimal end-to-end recipe: one Gridlane in front of two Selenwright backends, with identity propagated from Gridlane to Selenwright so per-user quotas and session ACL keep working through the router.

1.2.1. Topology

         ┌──────────┐
 tests ──▶ gridlane │──▶ selenwright-a ──▶ Docker / browsers
         │  :4444   │──▶ selenwright-b ──▶ Docker / browsers
         └──────────┘
  • Gridlane terminates BasicAuth from the client.

  • Gridlane stamps X-Forwarded-User, optionally X-Admin, and a shared X-Router-Secret on every upstream request.

  • Each Selenwright runs in -auth-mode=trusted-proxy and verifies the router secret before trusting the identity headers.

  • The shared secret means a direct client cannot bypass Gridlane and hit Selenwright with a spoofed X-Forwarded-User.

1.2.2. Environment

Pick any opaque values for local development and rotate them in real deployments:

$ export GRIDLANE_ALICE_PASSWORD=wonderland
$ export GRIDLANE_ADMIN_TOKEN=root-token
$ export GRIDLANE_ROUTER_SECRET=dev-router-secret

1.2.3. router.json

A two-backend version of the shape from Quick Start Guide, with the upstream_identity block added:

{
  "version": 1,
  "users": [
    { "name": "alice", "password_ref": "env:GRIDLANE_ALICE_PASSWORD", "quota": { "max_sessions": 20 } }
  ],
  "guest": { "quota": { "max_sessions": 2 } },
  "catalog": {
    "browsers": [
      { "name": "chrome", "versions": ["stable"], "protocols": ["webdriver", "playwright"] }
    ]
  },
  "backend_pools": [
    {
      "id": "selenwright-a",
      "endpoint": "http://selenwright-a:4444",
      "region": "local-a",
      "weight": 1,
      "protocols": ["webdriver", "playwright"],
      "health": { "enabled": true, "failure_threshold": 2, "cooldown": "10s" }
    },
    {
      "id": "selenwright-b",
      "endpoint": "http://selenwright-b:4444",
      "region": "local-b",
      "weight": 1,
      "protocols": ["webdriver", "playwright"],
      "health": { "enabled": true, "failure_threshold": 2, "cooldown": "10s" }
    }
  ],
  "admin": { "token_ref": "env:GRIDLANE_ADMIN_TOKEN" },
  "upstream_identity": {
    "user_header":  "X-Forwarded-User",
    "admin_header": "X-Admin",
    "secret_ref":   "env:GRIDLANE_ROUTER_SECRET"
  }
}

1.2.4. Selenwright Invocation

Each Selenwright backend in this topology runs in trusted-proxy mode with the same router secret:

$ selenwright                                                       \
    -auth-mode=trusted-proxy                                        \
    -user-header=X-Forwarded-User                                   \
    -admin-header=X-Admin                                           \
    -trusted-proxy-secret="$GRIDLANE_ROUTER_SECRET"

Selenwright rejects every request whose X-Router-Secret header does not match -trusted-proxy-secret. Without this guard any client could set X-Forwarded-User: alice directly against Selenwright and impersonate any user.

Selenwright’s flag takes the secret value directly (not an env: reference). Pass the value via your process manager, systemd unit, or Docker Compose env interpolation; keep it out of shell history.
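The shared-secret guard can be pictured as a constant-time comparison of the presented X-Router-Secret value against the configured one. The sketch below is illustrative only and does not claim to be Selenwright's implementation; the function name and the rule that an empty secret never matches are assumptions.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// routerSecretMatches is a hypothetical check: presented is the value of the
// X-Router-Secret header, expected is the configured trusted-proxy secret.
func routerSecretMatches(presented, expected string) bool {
	// Assumption: an empty secret on either side never authenticates.
	if presented == "" || expected == "" {
		return false
	}
	// ConstantTimeCompare avoids leaking how many leading bytes matched.
	return subtle.ConstantTimeCompare([]byte(presented), []byte(expected)) == 1
}

func main() {
	fmt.Println(routerSecretMatches("dev-router-secret", "dev-router-secret")) // true
	fmt.Println(routerSecretMatches("guess", "dev-router-secret"))             // false
}
```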

1.2.5. Smoke Checks

With Gridlane on :4444:

# Liveness + backend rollup
$ curl -fsS http://127.0.0.1:4444/ping
$ curl -fsS http://127.0.0.1:4444/status

# Admin view of the loaded config (secrets redacted)
$ curl -fsS -H 'X-Gridlane-Admin-Token: root-token' http://127.0.0.1:4444/config

# Alice's quota — routed through trusted-proxy
$ curl -fsS -u alice:wonderland http://127.0.0.1:4444/quota

# Metrics on the side listener (no auth)
$ curl -fsS http://127.0.0.1:9090/metrics

For a Playwright WebSocket handshake smoke using a configured catalog version and backend image:

$ websocat -E -n \
    --basic-auth "$(printf 'alice:wonderland' | base64)" \
    --protocol playwright-json \
    ws://127.0.0.1:4444/playwright/chrome/stable

1.2.6. Security Notes

  • /metrics on the main Gridlane listener requires X-Gridlane-Admin-Token. The separate -metrics-listen listener serves metrics with no auth and is intended for an internal-only bind or a private network path.

  • Side endpoints (/vnc/, /devtools/, /logs/, /video/, /download/, /downloads/, /clipboard/, /history/settings) require user BasicAuth and do not fall back to guest access — see Authentication.

  • Incoming X-Forwarded-User, X-Admin, and X-Router-Secret headers from the client are stripped before auth runs. A malicious client cannot impersonate another user by setting them.

See Identity Propagation for the deeper view of what happens on each upstream request.

1.3. FAQ

1.3.1. Why does /wd/hub/session return 401 even though my credentials work on /quota?

Almost always a typo in the Authorization header. WebDriver session create and Playwright upgrade are user scope — BasicAuth matching a users[] entry, or anonymous guest access if guest is configured. If the header is malformed, Gridlane falls through to the guest check; if guest is not configured the final response is 401.

/quota rejects the same way. If you are getting a 200 on /quota with the same credentials, recheck the header you are actually sending on the WebDriver request — tooling sometimes rewrites headers in unexpected ways.

1.3.2. Why does VNC / video / logs 401 when WebDriver works?

Side endpoints (/vnc/, /video/, /logs/, /devtools/, /download/, /downloads/, /clipboard/, /history/settings) are side scope — they require BasicAuth matching a users[] entry and do not fall back to guest. Sessions a guest created cannot be observed through VNC unless the observer authenticates as a named user.

See Authentication for the scope ladder.

1.3.3. Can I run more than one Gridlane replica in front of the same backend pool?

Yes. Gridlane keeps no per-session state. Public session IDs are r1_<route-token>_<upstream-id>, and the <route-token> is a deterministic HMAC of the backend pool ID. Any replica serving the same router.json will route a follow-up for r1_<token>_… to the same backend as the replica that created the session.

See Session ID Format for how this works and Identity Propagation for how the per-request identity survives across replicas.

1.3.4. Can I send SIGHUP to pick up a new router.json?

Yes — SIGHUP reloads the config (unless started with -reload-on-sighup=false). Reload is fail-closed: if the new config fails validation (schema violation, unresolvable secret, bad endpoint URL), the previous runtime keeps serving and the error is logged. You never serve a half-loaded config.

What reloads: users, guest, catalog, backend_pools, admin.token_ref, upstream_identity. What needs a restart: -listen, -metrics-listen, -log-format, -graceful-period, -session-attempt-timeout, -proxy-timeout, -reload-on-sighup.
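Fail-closed reload is commonly built as "parse and validate first, swap only on success". The sketch below shows that pattern with an atomic pointer; the routerState and config types are hypothetical and the config is reduced to a single field, so it is not a description of Gridlane's internals.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
	"sync/atomic"
)

// config is a deliberately tiny stand-in for the router.json schema.
type config struct {
	Version int `json:"version"`
}

// routerState holds the config that is currently serving traffic.
type routerState struct {
	current atomic.Pointer[config]
}

// parseConfig rejects unknown keys and unsupported versions.
func parseConfig(raw string) (*config, error) {
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.DisallowUnknownFields()
	var c config
	if err := dec.Decode(&c); err != nil {
		return nil, err
	}
	if c.Version != 1 {
		return nil, errors.New("unsupported config version")
	}
	return &c, nil
}

// reload swaps the live config only when the new one validates.
func (s *routerState) reload(raw string) error {
	next, err := parseConfig(raw)
	if err != nil {
		return err // the previous config keeps serving
	}
	s.current.Store(next)
	return nil
}

func main() {
	var s routerState
	fmt.Println(s.reload(`{"version": 1}`))
	fmt.Println(s.reload(`{"version": 1, "usres": []}`)) // misspelled key is rejected
	fmt.Println(s.current.Load().Version)
}
```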

1.3.5. Why is my backend "unhealthy" but curl against its endpoint works fine?

Gridlane’s health tracking is passive — it marks a pool unhealthy after failure_threshold consecutive failing proxy attempts through the pool (5xx, 408/425/429, 401/403) and keeps it out of rotation for cooldown. There is no active probe. A backend that is fine right now but failed the last failure_threshold requests will stay out of rotation until cooldown expires, even if the underlying issue has recovered.

Lower cooldown in the pool’s health block if you are seeing recovery lag in development. In production the default makes retries cheap and prevents a flaky pool from burning the whole request flow.

1.3.6. Why does /vnc/<id> give 404 on a Playwright session that otherwise works fine?

Gridlane picks the session ID for Playwright and tells Selenwright about it via X-Selenwright-External-Session-ID on the upgrade. If the Selenwright you’re running does not accept that header, it mints its own random ID, and the two sides then disagree about what the session is called. The session itself is fine (frames flow through the tunnel), but everything that addresses the session by ID afterwards (/vnc/<id>, /video/<id>, /logs/<id>, …) returns 404.

Fix: use a Selenwright build that accepts the header. WebDriver is not affected — there the server picks the ID, so any Selenwright works.

1.3.7. Do I need upstream_identity to use Gridlane?

No. Without it, Gridlane still authenticates clients, enforces per-user quotas on its own side, and proxies transparently. Upstream Selenwright will see every session as coming from the pool-level BasicAuth account.

Turn on upstream_identity when you want per-user quotas / session ACL / admin bypass on the Selenwright side as well, which is the common production case. See Identity Propagation.

1.3.8. Why is /metrics requiring an admin token? I want to scrape it from Prometheus without one.

On the main listener, /metrics is admin scope — Prometheus metrics carry per-route latency histograms and per-backend health, which you probably do not want to expose on an internet-reachable listener.

For Prometheus scraping run a separate listener with -metrics-listen :9090 (or another address). That listener serves /metrics with no auth and is intended to be bound to a private interface or an internal overlay network.

1.3.9. Why is the HMAC for the route token not a configurable secret?

Today the HMAC key used to derive the <route-token> prefix is a fixed domain separator (gridlane-route-token-v1), not a per-deployment secret. The token is opaque to a client that does not know the backend pool ID, but anyone who knows the ID can compute it, and two Gridlane deployments that happen to use the same backend_pools[*].id will produce identical tokens.

This is acknowledged tech debt. A -route-salt flag that accepts an env: / file: reference is the planned path forward. Until then, keep pool IDs unique across deployments that might ever share traffic.

2. Routing

One section per traffic shape Gridlane handles. The routing layer is stateless — public session IDs carry enough information for any Gridlane replica serving the same router.json to follow up a session created through a different replica.

2.1. WebDriver Routing

Gridlane routes Selenium WebDriver (W3C) traffic on the same listener as Playwright. A WebDriver client sees Gridlane as a standard hub endpoint:

http://<gridlane>:4444/wd/hub

Both the Selenoid-style prefix (/wd/hub/session, /wd/hub/session/<id>/…) and the bare W3C prefix (/session, /session/<id>/…) are accepted.

2.1.1. Session Create

The session-create request (POST /session or POST /wd/hub/session) carries the W3C capabilities payload. Gridlane parses browserName and, when present, browserVersion + platformName from alwaysMatch/firstMatch, and looks up a browser entry in the loaded catalog:

  • The browser name must be listed in catalog.browsers[].

  • At least one backend_pools[] entry whose protocols contains "webdriver" must be healthy.

If no matching pool is available the client gets 503 Service Unavailable; if the browser is not in the catalog the client gets 400 Bad Request with a W3C-shaped error body.

Pool selection is weighted. Each healthy pool contributes a number of slots equal to its weight to the selection set, and region filtering is applied when the client requests a specific region via a capabilities extension (selenwright:options.region). A region mismatch falls back to any region.
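One way to picture weighted selection with a region fallback is the sketch below. It is not Gridlane's selection code: the pool type, the pickPool and filterPools names, and the injectable random function are assumptions for the example, and protocol filtering is left out for brevity.

```go
package main

import (
	"fmt"
	"math/rand"
)

// pool is a simplified view of a backend_pools[] entry.
type pool struct {
	ID      string
	Region  string
	Weight  int
	Healthy bool
}

// filterPools keeps healthy pools with a positive weight, optionally
// restricted to one region (an empty region means "any").
func filterPools(pools []pool, region string) []pool {
	var out []pool
	for _, p := range pools {
		if !p.Healthy || p.Weight <= 0 {
			continue
		}
		if region == "" || p.Region == region {
			out = append(out, p)
		}
	}
	return out
}

// pickPool draws one slot out of the summed weights. intn is injectable so
// the choice can be made deterministic in tests; pass rand.Intn in practice.
func pickPool(pools []pool, region string, intn func(int) int) (pool, bool) {
	candidates := filterPools(pools, region)
	if len(candidates) == 0 && region != "" {
		candidates = filterPools(pools, "") // region mismatch falls back to any region
	}
	total := 0
	for _, p := range candidates {
		total += p.Weight
	}
	if total == 0 {
		return pool{}, false
	}
	slot := intn(total)
	for _, p := range candidates {
		if slot < p.Weight {
			return p, true
		}
		slot -= p.Weight
	}
	return pool{}, false
}

func main() {
	pools := []pool{
		{ID: "selenwright-a", Region: "local-a", Weight: 1, Healthy: true},
		{ID: "selenwright-b", Region: "local-b", Weight: 3, Healthy: true},
	}
	p, ok := pickPool(pools, "local-b", rand.Intn)
	fmt.Println(p.ID, ok)
}
```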

Once a backend accepts the session create, Gridlane:

  1. Reads the upstream session ID from the response body.

  2. Rewrites it into a public ID r1_<route-token>_<upstream-id> — see Session ID Format.

  3. Rewrites the response body so the client sees only the public ID.

The public ID is what the client uses for every follow-up.

2.1.2. Session Follow-Up

Follow-up requests (GET /session/<id>/url, DELETE /session/<id>, …) carry the public ID on the path. Gridlane splits r1_<route-token>_<upstream-id>:

  • <route-token> selects the backend pool (deterministic HMAC over pool.id).

  • <upstream-id> replaces the public ID on the rewritten request path before the request is proxied.

Because the mapping is deterministic, any Gridlane replica that has the same router.json loaded will route the same follow-up to the same pool without any shared state. Replicas can be round-robin’d behind an L4 load balancer.

2.1.3. Upstream Credentials

If the Selenwright backend enforces its own BasicAuth (or any Authorization header), configure it per-pool:

{
  "id": "selenwright-a",
  "endpoint": "http://selenwright-a:4444",
  "credentials": {
    "username_ref": "env:SELENWRIGHT_A_USER",
    "password_ref": "env:SELENWRIGHT_A_PASSWORD"
  },
  ...
}

Gridlane injects BasicAuth on the proxied request with these credentials. The client’s own Authorization header is consumed by Gridlane’s auth layer and is not forwarded.

If upstream_identity is configured, Gridlane additionally stamps X-Forwarded-User, optionally X-Admin, and X-Router-Secret on every upstream request — see Identity Propagation.
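The header handling described in this section can be sketched as "drop what the client sent, then stamp what Gridlane resolved". The code below is an illustration under assumptions, not Gridlane's proxy code: the type and function names are hypothetical, and the X-Admin value "true" is a placeholder because the expected value is not specified here.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net/http"
)

// upstreamIdentity mirrors the upstream_identity block after secret resolution.
type upstreamIdentity struct {
	UserHeader  string
	AdminHeader string
	Secret      string
}

// stampUpstream rewrites the headers of a request that is about to be proxied.
func stampUpstream(h http.Header, id upstreamIdentity, user string, isAdmin bool, poolUser, poolPass string) {
	// Nothing the client supplied for these headers is forwarded.
	for _, name := range []string{"Authorization", id.UserHeader, id.AdminHeader, "X-Router-Secret"} {
		h.Del(name)
	}
	if poolUser != "" {
		// Per-pool credentials are injected as BasicAuth.
		token := base64.StdEncoding.EncodeToString([]byte(poolUser + ":" + poolPass))
		h.Set("Authorization", "Basic "+token)
	}
	h.Set(id.UserHeader, user)
	if isAdmin {
		h.Set(id.AdminHeader, "true") // assumption: placeholder value
	}
	h.Set("X-Router-Secret", id.Secret)
}

func main() {
	h := http.Header{}
	h.Set("X-Forwarded-User", "mallory") // spoofed by the client
	id := upstreamIdentity{UserHeader: "X-Forwarded-User", AdminHeader: "X-Admin", Secret: "dev-router-secret"}
	stampUpstream(h, id, "alice", false, "", "")
	fmt.Println(h.Get("X-Forwarded-User"), h.Get("X-Admin") == "")
}
```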

2.1.4. Capabilities Pass-Through

Gridlane is protocol-transparent for WebDriver capabilities — whatever the client sends in alwaysMatch/firstMatch is forwarded to Selenwright verbatim. Gridlane does not enforce a capability policy of its own; that lives on the Selenwright side (see Selenwright’s -caps-policy flag).

2.1.5. Quotas

Every new session counts against the caller’s quota (users[*].quota.max_sessions or guest.quota.max_sessions). When the quota is reached, Gridlane returns 429 Too Many Requests without attempting the upstream call.

Quotas count concurrent sessions that Gridlane has routed for the caller. A session that ends (client DELETE, upstream timeout, upstream 5xx) releases the slot.

The caller can query their own live quota state at GET /quota — see HTTP API.
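Concurrent-session accounting of this kind reduces to a guarded counter per caller. The following is a minimal sketch with hypothetical names (quotaTracker, acquire, release); it is not taken from Gridlane's source.

```go
package main

import (
	"fmt"
	"sync"
)

// quotaTracker counts live sessions per caller against max_sessions.
type quotaTracker struct {
	mu     sync.Mutex
	max    map[string]int
	active map[string]int
}

func newQuotaTracker(max map[string]int) *quotaTracker {
	limits := make(map[string]int, len(max))
	for user, n := range max {
		limits[user] = n
	}
	return &quotaTracker{max: limits, active: make(map[string]int)}
}

// acquire reserves a slot before the upstream call; false maps to a 429.
func (q *quotaTracker) acquire(user string) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.active[user] >= q.max[user] {
		return false
	}
	q.active[user]++
	return true
}

// release frees the slot when the session ends.
func (q *quotaTracker) release(user string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.active[user] > 0 {
		q.active[user]--
	}
}

func main() {
	q := newQuotaTracker(map[string]int{"alice": 2})
	fmt.Println(q.acquire("alice"), q.acquire("alice"), q.acquire("alice"))
	q.release("alice")
	fmt.Println(q.acquire("alice"))
}
```

Reserving the slot before contacting the backend is what lets the router answer 429 without an upstream call; a failed session create then has to release the slot again.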

2.1.6. Errors

Gridlane returns W3C-shaped error bodies ({"value":{"error":"…","message":"…","stacktrace":""}}) on session create so W3C clients interpret the failure correctly:

Status  Error code           When
400     invalid argument     Browser not in catalog; malformed capabilities payload
401     n/a (no body)        BasicAuth missing or wrong, no guest fallback
403     n/a                  Admin-scope endpoint called without admin token
429     session not created  Caller’s quota reached
503     session not created  No healthy backend pool matches the request

Follow-up requests (GET /session/<id>/…) return the upstream response as-is, including upstream-generated W3C errors, with the session ID rewritten back to the public form.
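The W3C-shaped body used on session create can be produced with plain encoding/json. The sketch below assumes hypothetical type and function names; only the JSON shape comes from this section.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// w3cError mirrors {"value":{"error":...,"message":...,"stacktrace":""}}.
type w3cError struct {
	Value w3cErrorValue `json:"value"`
}

type w3cErrorValue struct {
	Error      string `json:"error"`
	Message    string `json:"message"`
	Stacktrace string `json:"stacktrace"`
}

// w3cErrorBody renders the error body for a given W3C error code.
func w3cErrorBody(code, message string) ([]byte, error) {
	return json.Marshal(w3cError{Value: w3cErrorValue{Error: code, Message: message}})
}

// writeW3CError sends the body with the matching HTTP status.
func writeW3CError(w http.ResponseWriter, status int, code, message string) {
	body, err := w3cErrorBody(code, message)
	if err != nil {
		http.Error(w, "internal error", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json; charset=utf-8")
	w.WriteHeader(status)
	w.Write(body)
}

func main() {
	body, err := w3cErrorBody("session not created", "quota reached")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
	// {"value":{"error":"session not created","message":"quota reached","stacktrace":""}}
}
```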

2.2. Playwright Routing

Gridlane proxies Playwright WebSocket connections on the same listener as WebDriver. A Playwright client connects to the browser/version path the same way it would against a direct Selenwright:

ws://<gridlane>:4444/playwright/<browser>/<version>

<browser> must be a catalog.browsers[].name; <version> must be present in that browser’s versions. Neither value is free-form — both are validated against the loaded catalog before any upstream is contacted.

2.2.1. Upgrade Flow

On the incoming GET /playwright/<browser>/<version> with an Upgrade: websocket header:

  1. Gridlane authenticates the caller (user scope — BasicAuth or guest if configured) and enforces the quota.

  2. Gridlane selects a healthy backend pool from those advertising "playwright" in protocols, weighted and region-aware.

  3. Gridlane mints a fresh public session ID — the upstream part is pw_<32-hex>, so the full public ID is r1_<route-token>_pw_<32-hex>.

  4. Gridlane sends the upgrade to the chosen backend with X-Selenwright-External-Session-ID: <public-id> set on the upstream request.

  5. On the 101 Switching Protocols response Gridlane echoes X-Selenwright-Session-ID: <public-id> back to the client, then relays the WebSocket frames transparently.

Selenwright uses the external-ID header as the session key in its own storage. Clients that do not read headers off the 101 response can still address side endpoints — the public ID is also the one they would see by inspecting the WebSocket URL if they needed to reconnect.

Gridlane picks the session ID (not Selenwright) and tells Selenwright what it is via X-Selenwright-External-Session-ID on the upgrade. This way the same ID works across Gridlane replicas and across WebDriver/Playwright uniformly, and side endpoints (/vnc/<id>, /video/<id>, /logs/<id>) resolve against whatever the client already has.
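Minting the upstream part of a Playwright session ID amounts to hex-encoding 16 random bytes behind the pw_ prefix. A minimal sketch, with a hypothetical function name:

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// mintPlaywrightUpstreamID returns "pw_" followed by 32 hex characters.
func mintPlaywrightUpstreamID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	return "pw_" + hex.EncodeToString(b[:]), nil
}

func main() {
	id, err := mintPlaywrightUpstreamID()
	if err != nil {
		panic(err)
	}
	// The full public ID would be r1_<route-token>_ followed by this value.
	fmt.Println(id, len(id))
}
```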

2.2.2. Side Endpoints

Once a Playwright session is up, the usual Selenwright side surfaces are accessible at their normal paths, using the public session ID. Some are HTTP, some are WebSocket — see HTTP API for the full matrix. Typical examples:

# WebSocket — upgrade required
/vnc/<public-id>                 # live VNC stream
/devtools/<public-id>/page       # Chrome DevTools Protocol
/logs/<public-id>                # live session-log stream

# HTTP
/video/<public-id>.mp4           # recorded session video
/download/<public-id>/<file>     # per-session downloaded file
/downloads/<public-id>           # downloads index
/clipboard/<public-id>           # clipboard read/write

Gridlane forwards these to the pool selected by the public ID’s <route-token>, keeping the public ID on the upstream path (Selenwright stored the session under the public ID, so /vnc/r1_…/ lands on the right container).

These routes are side scope — BasicAuth required, no guest fallback.

2.2.3. Query Parameters

A small set of capabilities is accepted as query parameters on the upgrade URL and passed through to Selenwright verbatim:

  • enableVNC=true — start VNC for the session.

  • name=<label> — human-readable label shown in UI.

  • screenResolution=1280x1024 — container screen resolution.

ws://gridlane:4444/playwright/chromium/1.56.1?enableVNC=true&name=myTest

2.2.4. Version Matching

Gridlane performs no semver matching — <version> must exactly match one of catalog.browsers[].versions[]. Keep the catalog version, the Selenwright image tag, and the client’s connection URL aligned. Playwright client/server major.minor mismatches are rejected by the Playwright protocol itself, not by Gridlane.
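Exact matching against the catalog can be sketched as a path split followed by a string comparison. The names below (browserEntry, parsePlaywrightPath, catalogHas) are hypothetical and the example ignores query parameters.

```go
package main

import (
	"fmt"
	"strings"
)

// browserEntry is a simplified catalog.browsers[] entry.
type browserEntry struct {
	Name     string
	Versions []string
}

// parsePlaywrightPath extracts <browser> and <version> from the upgrade path.
func parsePlaywrightPath(path string) (browser, version string, ok bool) {
	rest, found := strings.CutPrefix(path, "/playwright/")
	if !found {
		return "", "", false
	}
	parts := strings.Split(rest, "/")
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", false
	}
	return parts[0], parts[1], true
}

// catalogHas reports whether the exact browser/version pair is configured.
// There is no semver or prefix matching: "1.56" does not match "1.56.1".
func catalogHas(catalog []browserEntry, browser, version string) bool {
	for _, b := range catalog {
		if b.Name != browser {
			continue
		}
		for _, v := range b.Versions {
			if v == version {
				return true
			}
		}
	}
	return false
}

func main() {
	catalog := []browserEntry{{Name: "chrome", Versions: []string{"stable"}}}
	b, v, ok := parsePlaywrightPath("/playwright/chrome/stable")
	fmt.Println(ok, catalogHas(catalog, b, v))
}
```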

2.2.5. Errors

Status  When
400     <browser> / <version> not in the loaded catalog
401     BasicAuth missing or wrong, no guest fallback
404     Path not /playwright/<browser>/<version> (missing segments)
429     Caller’s quota reached
503     No healthy backend pool with "playwright" in protocols

WebSocket upgrade failures are returned as regular HTTP responses before the 101. Failures that happen after the upgrade (upstream disconnect, idle timeout) are surfaced as WebSocket close frames.

2.3. Session ID Format

Every session Gridlane routes has a public session ID of the form:

r1_<route-token>_<upstream-id>

  • r1_: a version prefix. Future breaking changes in the encoding scheme will bump it.

  • <route-token>: 16 hex characters. The first 8 bytes of HMAC-SHA256(key, backend_pool.id), hex-encoded. Picks the backend pool for follow-up routing.

  • <upstream-id>: the session ID as it lives on the backend. For WebDriver this is whatever Selenwright returned on session-create. For Playwright this is a Gridlane-minted pw_<32-hex> that Selenwright accepts via X-Selenwright-External-Session-ID on the upgrade.

2.3.1. Why This Shape

Public session IDs carry enough information for any Gridlane replica serving the same router.json to route a follow-up to the correct backend, without any shared session state. A client that created a session through Gridlane replica A can send the next request through replica B and it lands on the same Selenwright container.

The route token is also a cheap consistency check: a request reaches a given backend pool only if its session ID carries that pool’s route token. There is no per-session shared key. The HMAC is over the pool ID with a fixed key, not over the session, so anyone who knows the pool ID can derive the token. This makes it a routing constraint, not an authentication signal. Authentication happens at the scope layer; see Authentication.

2.3.2. Route Token Derivation

key         = "gridlane-route-token-v1"                  # fixed
mac         = HMAC-SHA256(key, backend_pool.id)[:8]      # 8 bytes
route_token = hex(mac)                                   # 16 chars

The key is a domain separator, not a per-deployment secret. This means:

  • An attacker who knows (or guesses) a backend_pool.id can compute its route token. This is fine — routing to a specific pool is not a privileged operation; scope and quota enforcement happen on the request itself, not on the token.

  • Two Gridlane deployments that reuse the same backend_pool.id produce identical tokens. If you federate traffic between deployments that share a client surface, keep pool IDs unique.

A -route-salt flag that accepts an env: / file: secret reference is planned to make this a real per-deployment secret; see FAQ. Until then, uniqueness of pool.id is the mitigation.
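The derivation above translates to Go's crypto/hmac almost line for line. The function names routeToken and publicID are hypothetical; the key and the 8-byte truncation come from this section.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// routeTokenKey is the fixed domain separator, not a secret.
const routeTokenKey = "gridlane-route-token-v1"

// routeToken returns the first 8 bytes of HMAC-SHA256(key, poolID) as 16 hex chars.
func routeToken(poolID string) string {
	mac := hmac.New(sha256.New, []byte(routeTokenKey))
	mac.Write([]byte(poolID))
	return hex.EncodeToString(mac.Sum(nil)[:8])
}

// publicID assembles r1_<route-token>_<upstream-id>.
func publicID(poolID, upstreamID string) string {
	return "r1_" + routeToken(poolID) + "_" + upstreamID
}

func main() {
	fmt.Println(routeToken("selenwright-a"))
	fmt.Println(publicID("selenwright-a", "pw_00112233445566778899aabbccddeeff"))
}
```

Because the function has no inputs other than the pool ID, every replica that loads the same router.json computes the same token, which is the property the stateless routing relies on.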

2.3.3. Upstream ID By Protocol

Protocol    Shape        Source
WebDriver   Opaque       Whatever Selenwright returned in the session-create response
Playwright  pw_<32-hex>  Gridlane mints 16 random bytes, hex-encodes, prefixes pw_, and sends it to Selenwright in X-Selenwright-External-Session-ID on the upgrade request

For WebDriver, Gridlane rewrites the upstream ID to the public form on the session-create response body. The client never sees the raw upstream ID.

For Playwright, Gridlane echoes the public ID back to the client on the 101 Switching Protocols response in X-Selenwright-Session-ID, and also uses it as the path segment for all side endpoints (/vnc/<public-id>, /logs/<public-id>, …).

2.3.4. Follow-Up Routing

Every request that carries a session ID on its path (WebDriver /session/<id>/…, or any Playwright side endpoint) is routed as:

  1. Parse the public ID; if it does not match r1_<16-hex>_<rest>, return 400.

  2. Look up the backend pool whose route-token matches the prefix. If no pool matches (pool was removed on reload, or ID was spoofed), return 404.

  3. For WebDriver, rewrite the session segment on the proxied path to <upstream-id>. For Playwright side endpoints, keep the public ID as the path segment (Selenwright stored the session under the public ID).

  4. Proxy the request.

Follow-up requests do not re-check health. A session that belongs to a now-unhealthy pool is still routed to that pool, because:

  • The session is already on that backend. Sending the follow-up elsewhere would guarantee a 404.

  • An unhealthy pool is kept out of new session placement, not out of existing-session fan-out.

If the pool is genuinely down, the proxied request fails and the client sees the upstream error shape. The pool’s failure counter ticks up — see Backend Health.
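Steps 1 and 2 of the follow-up routing can be sketched as a fixed-offset parse plus a map lookup. The names parsePublicID and resolveFollowUp are hypothetical, and the token-to-pool map stands in for whatever index a real router builds from backend_pools at load time.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"net/http"
	"strings"
)

// parsePublicID splits r1_<16-hex>_<rest>. The split is positional because
// the upstream part may itself contain underscores (for example pw_...).
func parsePublicID(id string) (token, upstream string, ok bool) {
	rest, found := strings.CutPrefix(id, "r1_")
	if !found || len(rest) < 18 || rest[16] != '_' {
		return "", "", false
	}
	token, upstream = rest[:16], rest[17:]
	if _, err := hex.DecodeString(token); err != nil {
		return "", "", false
	}
	return token, upstream, true
}

// resolveFollowUp maps a public ID to a pool, or to the HTTP status to return.
func resolveFollowUp(id string, poolByToken map[string]string) (poolID, upstream string, status int) {
	token, upstream, ok := parsePublicID(id)
	if !ok {
		return "", "", http.StatusBadRequest
	}
	poolID, known := poolByToken[token]
	if !known {
		return "", "", http.StatusNotFound
	}
	return poolID, upstream, http.StatusOK
}

func main() {
	pools := map[string]string{"0123456789abcdef": "selenwright-a"}
	fmt.Println(resolveFollowUp("r1_0123456789abcdef_pw_00ff", pools))
	fmt.Println(resolveFollowUp("not-a-public-id", pools))
}
```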

2.3.5. Regenerating A Session ID

Do not regenerate a Gridlane session ID on the client side. The ID is cryptographically bound to a specific pool via the route token; a hand-crafted ID with the wrong token will route to the wrong pool (or be rejected) and the session will not exist on that backend anyway. Use the ID Gridlane returned.

2.4. Backend Health

Gridlane tracks the health of each backend pool passively — there is no active health probe. A pool is marked unhealthy after a configured number of consecutive failing proxy attempts, kept out of rotation for a cooldown window, and then returned automatically.

2.4.1. Configuration

Per-pool, inside router.json:

{
  "id": "selenwright-a",
  "endpoint": "http://selenwright-a:4444",
  ...
  "health": {
    "enabled": true,
    "failure_threshold": 2,
    "cooldown": "10s"
  }
}

Field              Default  Notes
enabled            false    When false, the pool is always considered healthy and failure counting is disabled
failure_threshold  1        Consecutive failures that trip the pool into the unhealthy state. Must be zero or positive
cooldown           30s      Go duration string ("10s", "2m", "500ms"). Time the pool stays out of rotation after it trips

2.4.2. What Counts As A Failure

Gridlane classifies every proxied request outcome:

Class          Counts as     Notes
2xx, 3xx       success       Clears the pool’s failure counter
408, 425, 429  failure       Upstream is overloaded or rate-limiting
401, 403       failure       Upstream auth drift: usually a misconfigured shared secret or rotated backend credentials
5xx            failure       Upstream error or transport error (connection refused, TLS handshake failure, timeout)
4xx (other)    client error  Neither success nor failure; does not touch the counter

Transport-level errors (connection refused, i/o timeout, etc.) are surfaced as 502 Bad Gateway to the client and count as a failure.
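The classification table can be written as a single switch over the status code. This is a sketch with hypothetical names; status codes the table does not mention (1xx, for instance) are treated as neutral here, which is an assumption.

```go
package main

import "fmt"

// outcome is how a proxied response affects the pool's failure counter.
type outcome int

const (
	success outcome = iota // clears the counter
	failure                // increments the counter
	neutral                // leaves the counter untouched
)

// classify maps an upstream HTTP status to an outcome. Transport errors are
// expected to be reported by the caller as a failure (surfaced as 502).
func classify(status int) outcome {
	switch {
	case status >= 200 && status < 400:
		return success
	case status == 408 || status == 425 || status == 429:
		return failure // overloaded or rate-limiting upstream
	case status == 401 || status == 403:
		return failure // upstream auth drift
	case status >= 500:
		return failure
	default:
		return neutral // other 4xx, plus anything unclassified (assumption)
	}
}

func main() {
	for _, s := range []int{200, 404, 429, 503} {
		fmt.Println(s, classify(s))
	}
}
```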

2.4.3. State Machine

                    +-------------+
   success          |             |   failure (threshold-1 times)
 +-----------------▶|   healthy   |◀-+
 |                  |             |  |
 |                  +------+------+  |
 |                         |         |
 |                  (threshold reached)
 |                         |
 |                         ▼
 |                  +-------------+
 |                  |             |
 +------------------+  unhealthy  |  cooldown expires
                    |             +----- returns to healthy
                    +-------------+
  • ReportFailure() increments the failure counter. When the counter reaches failure_threshold, unhealthyUntil = now() + cooldown and the pool is removed from new-session placement.

  • ReportSuccess() clears the failure counter to zero. If called during the cooldown window it is a no-op for the unhealthy state — the pool still needs the cooldown to expire. This avoids a single successful follow-up request rapidly bouncing a flaky pool back into rotation.

  • Time alone recovers the pool — when now() >= unhealthyUntil, the pool returns to the healthy state with a fresh counter.
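The three rules above fit in a small tracker. The sketch below borrows the ReportFailure and ReportSuccess names from this section, but the struct, the injectable clock, and the choice not to extend an active cooldown on further failures are assumptions rather than a copy of Gridlane's code.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fakeClock lets the example move time forward without sleeping.
type fakeClock struct{ t time.Time }

func (c *fakeClock) Now() time.Time          { return c.t }
func (c *fakeClock) Advance(d time.Duration) { c.t = c.t.Add(d) }

type poolHealth struct {
	mu             sync.Mutex
	threshold      int
	cooldown       time.Duration
	now            func() time.Time
	failures       int
	tripped        bool
	unhealthyUntil time.Time
}

// newPoolHealth parses cooldown as a Go duration string such as "10s".
func newPoolHealth(threshold int, cooldown string, now func() time.Time) (*poolHealth, error) {
	d, err := time.ParseDuration(cooldown)
	if err != nil {
		return nil, err
	}
	return &poolHealth{threshold: threshold, cooldown: d, now: now}, nil
}

// expireLocked returns the pool to healthy once the cooldown has passed.
func (h *poolHealth) expireLocked() {
	if h.tripped && !h.now().Before(h.unhealthyUntil) {
		h.tripped = false
		h.failures = 0
	}
}

func (h *poolHealth) ReportFailure() {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.expireLocked()
	h.failures++
	// Assumption: failures during an active cooldown do not extend it.
	if !h.tripped && h.failures >= h.threshold {
		h.tripped = true
		h.unhealthyUntil = h.now().Add(h.cooldown)
	}
}

// ReportSuccess clears the counter but never ends a cooldown early.
func (h *poolHealth) ReportSuccess() {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.expireLocked()
	h.failures = 0
}

func (h *poolHealth) Healthy() bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.expireLocked()
	return !h.tripped
}

func main() {
	clock := &fakeClock{t: time.Now()}
	h, err := newPoolHealth(2, "10s", clock.Now)
	if err != nil {
		panic(err)
	}
	h.ReportFailure()
	h.ReportFailure()
	fmt.Println(h.Healthy()) // false: threshold reached
	h.ReportSuccess()
	fmt.Println(h.Healthy()) // false: success does not end the cooldown
	clock.Advance(10 * time.Second)
	fmt.Println(h.Healthy()) // true: cooldown expired
}
```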

2.4.4. Placement vs Follow-Up

Health affects new session placement only. A session that already exists on a now-unhealthy pool is still routed to that pool on follow-up, because:

  • The session exists on that specific backend. Sending the follow-up elsewhere guarantees 404.

  • The follow-up may itself be successful and clear the pool’s failure counter.

See Session ID Format for how follow-up routing works.

2.4.5. /status Rollup

GET /status returns a rolled-up view of all pools:

{
  "service":         "gridlane",
  "status":          "ok",
  "backend_count":   2,
  "available_count": 2
}

available_count is the number of pools currently in the healthy state. If any pool is in cooldown, available_count < backend_count. If no pool is healthy, status becomes degraded and new session placement returns 503.

/status is public — no auth required. It is intended for external health-check probes.
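The rollup is a straightforward count over per-pool health. A minimal sketch, assuming a hypothetical rollup function that takes one boolean per pool; the JSON field names are taken from the example response above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statusRollup matches the fields of the /status response.
type statusRollup struct {
	Service        string `json:"service"`
	Status         string `json:"status"`
	BackendCount   int    `json:"backend_count"`
	AvailableCount int    `json:"available_count"`
}

// rollup counts healthy pools; status is "degraded" only when none is healthy.
func rollup(healthy []bool) statusRollup {
	r := statusRollup{Service: "gridlane", Status: "degraded", BackendCount: len(healthy)}
	for _, ok := range healthy {
		if ok {
			r.AvailableCount++
		}
	}
	if r.AvailableCount > 0 {
		r.Status = "ok"
	}
	return r
}

func main() {
	body, err := json.Marshal(rollup([]bool{true, false}))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```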

2.4.6. gridlane_backend_available Metric

Per-pool health is also exposed as a gauge:

gridlane_backend_available{backend="selenwright-a",region="local-a",protocols="webdriver,playwright"} 1
gridlane_backend_failures_total{backend="selenwright-a",region="local-a",protocols="webdriver,playwright"} 17

available is 1 when the pool is in the healthy state, 0 when in cooldown. failures_total, despite its name, is the count of current consecutive failures — it ticks up while the pool misbehaves and resets to 0 on the first subsequent success or when the cooldown expires. See Observability.

2.4.7. When To Tune

Defaults (failure_threshold=1, cooldown=30s) are tuned for production where any backend failure is a real signal and a brief removal from rotation is cheap. In local development with a single backend, you probably want failure_threshold=3 and cooldown=5s so a transient issue during setup does not shut down the whole grid. For very flaky environments (shared CI hosts, spot instances), bump failure_threshold higher to avoid oscillation.

3. Security

Gridlane has a four-rung scope ladder (admin > user > side > public) and, when running in front of Selenwright, propagates the resolved identity downstream using the trusted-proxy pattern so Selenwright-side quotas and ACL keep working per-user instead of per-pool.

3.1. Authentication

Gridlane authenticates every inbound request against a four-rung scope ladder. Each endpoint is registered at a scope, and a request is admitted only if its credentials satisfy that scope or a stricter one.

3.1.1. The Scope Ladder

admin  >  user  >  side  >  public

| Scope | Credential | Applies To |
|---|---|---|
| public | none | /ping, /status (liveness, public backend rollup) |
| user | BasicAuth matching a users[] entry, or anonymous guest if guest is configured | /quota, session create and follow-up (WebDriver + Playwright) |
| side | BasicAuth matching a users[] entry (no guest fallback) | /vnc/, /devtools/, /video/, /logs/, /download/, /downloads/, /clipboard/, /host/, /history/settings |
| admin | X-Gridlane-Admin-Token: <token> header, or Authorization: Bearer <token> | /config, /metrics on the main listener |

A request is evaluated bottom-up — a valid admin token satisfies all four scopes; valid BasicAuth satisfies user, side, and public; guest satisfies only user and public.
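The admission rule in the last paragraph can be written down as a small lookup. This is only a sketch of the rule as stated, with hypothetical type and constant names; it says nothing about how Gridlane's middleware is actually structured.

```go
package main

import "fmt"

type scope int

const (
	scopePublic scope = iota
	scopeSide
	scopeUser
	scopeAdmin
)

type credential int

const (
	credNone      credential = iota // no credentials, guest not configured
	credGuest                       // anonymous request while guest is configured
	credBasicAuth                   // BasicAuth matching a users[] entry
	credAdminToken                  // valid admin token
)

// admits reports whether a resolved credential satisfies the scope an
// endpoint is registered at.
func admits(c credential, s scope) bool {
	switch c {
	case credAdminToken:
		return true // satisfies all four scopes
	case credBasicAuth:
		return s == scopeUser || s == scopeSide || s == scopePublic
	case credGuest:
		return s == scopeUser || s == scopePublic // never side
	default:
		return s == scopePublic
	}
}

func main() {
	fmt.Println(admits(credGuest, scopeUser))      // true
	fmt.Println(admits(credGuest, scopeSide))      // false
	fmt.Println(admits(credBasicAuth, scopeAdmin)) // false
}
```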

3.1.2. BasicAuth Users

Defined in router.json:

{
  "users": [
    {
      "name": "alice",
      "password_ref": "env:GRIDLANE_ALICE_PASSWORD",
      "quota": { "max_sessions": 20 }
    },
    {
      "name": "bob",
      "password_ref": "file:/run/secrets/bob.password",
      "quota": { "max_sessions": 5 }
    }
  ]
}
  • name must be unique across users[].

  • password_ref resolves to a plaintext password at startup and on reload. See Secret References.

  • quota.max_sessions is the maximum concurrent sessions Gridlane will route for this user.

Passwords are compared in constant time. BasicAuth credentials are consumed by Gridlane and are not forwarded upstream; if the backend pool enforces its own BasicAuth, use the pool-level credentials block — see Router Configuration.

3.1.3. Guest Access

When guest is configured, requests that arrive at a user-scope endpoint without BasicAuth (or with BasicAuth that does not match any users[] entry) are admitted as an anonymous guest:

{
  "guest": { "quota": { "max_sessions": 2 } }
}

Guest requests share a single quota pool keyed on the literal subject guest. When the pool’s max_sessions is reached, further guest session creates get 429.

Guest access applies to user scope only. Side endpoints always require named BasicAuth — there is no way to VNC into a guest-created session without authenticating as a user first.

Omit the guest block entirely to disable anonymous access. At least one of users[] or guest must be present.

3.1.4. Admin Token

The admin scope unlocks endpoints that expose internal state (/config, /metrics on the main listener). Configure a reference to the token:

{
  "admin": { "token_ref": "env:GRIDLANE_ADMIN_TOKEN" }
}

Gridlane reads the token at startup and on reload. Clients present it in one of two forms:

$ curl -fsS -H 'X-Gridlane-Admin-Token: root-token' http://127.0.0.1:4444/config
$ curl -fsS -H 'Authorization: Bearer root-token'  http://127.0.0.1:4444/config

Both forms are compared in constant time. The Bearer form wins if both headers are set.

If admin.token_ref is omitted, /config and main-listener /metrics are permanently unavailable — there is no in-band way to mint an admin token, deliberately.

3.1.5. Secret References

Passwords and tokens never appear in plaintext in router.json. Every *_ref field accepts one of:

| Form | Notes |
|---|---|
| env:NAME | Read from the NAME environment variable. Value must be non-empty at startup / reload |
| file:/absolute/path | Read from an absolute file path. Path must be absolute, cleaned (no .. or trailing /), and readable. Trailing whitespace on the file content is trimmed |

Anything else — plaintext, relative path, bare : — is rejected with a validation error. This is a hard rule; there is no plaintext fallback.

Secrets are re-resolved on every reload. Rotate a password by updating the referenced env var / file and sending SIGHUP.

3.1.6. Constant-Time Comparison

All password and token comparisons use crypto/subtle.ConstantTimeCompare after a length check. This is defense in depth; the primary protection against credential guessing is rate-limiting at the network edge, which Gridlane does not implement itself.
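A comparison of this kind might look like the sketch below. The helper name secretsEqual is hypothetical. Note that subtle.ConstantTimeCompare already returns 0 immediately when the lengths differ, so the explicit length check mostly documents intent.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// secretsEqual compares a presented credential with the expected one without
// leaking, through timing, how many leading bytes matched.
func secretsEqual(presented, expected string) bool {
	if len(presented) != len(expected) {
		return false
	}
	return subtle.ConstantTimeCompare([]byte(presented), []byte(expected)) == 1
}

func main() {
	fmt.Println(secretsEqual("root-token", "root-token")) // true
	fmt.Println(secretsEqual("root-tokem", "root-token")) // false
	fmt.Println(secretsEqual("root", "root-token"))       // false
}
```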

3.1.7. What Gets Logged

Gridlane logs the authenticated subject and scope on every request (subject=alice scope=user, subject=guest scope=user, subject=admin scope=admin). Passwords, tokens, and the Authorization header contents are never logged.

3.1.8. Putting It Together

A concrete routing example. Assume users[] has alice, guest is configured with max_sessions=2, and admin.token_ref points to root-token:

| Request | Scope checked | Outcome |
|---|---|---|
| GET /ping | public | 200, no auth required |
| GET /status | public | 200, backend rollup |
| GET /quota with BasicAuth alice:… | user | 200, returns Alice’s quota |
| GET /quota with no auth | user | 200, returns guest quota (guest is configured) |
| POST /wd/hub/session with BasicAuth alice:… | user | Routed, counts against Alice’s quota |
| POST /wd/hub/session with no auth | user | Routed as guest, counts against guest quota |
| GET /vnc/r1_… with BasicAuth alice:… | side | Proxied |
| GET /vnc/r1_… with no auth | side | 401 (guest cannot observe) |
| GET /config with X-Gridlane-Admin-Token: root-token | admin | 200, sanitized config |
| GET /config with BasicAuth alice:… | admin | 403 (scope mismatch: user cannot see config) |

3.2. Identity Propagation

When Gridlane sits in front of Selenwright, the natural layout is: client authenticates to Gridlane, Gridlane authenticates to Selenwright with a pool-level service account. The downside is that Selenwright then sees every session as coming from the same account, so its own per-user quotas, session ACL, and admin-bypass collapse to a single identity.

Gridlane’s upstream_identity block solves this by propagating the resolved client identity downstream using the trusted-proxy pattern Selenwright already supports.

3.2.1. Configuration

Inside router.json:

{
  "upstream_identity": {
    "user_header":  "X-Forwarded-User",
    "admin_header": "X-Admin",
    "secret_ref":   "env:GRIDLANE_ROUTER_SECRET"
  }
}

| Field | Required | Notes |
|---|---|---|
| user_header | yes (when the block is present) | Header name for the identity subject. Selenwright’s trusted-proxy default is X-Forwarded-User |
| admin_header | no | Header name that signals admin. When set, Gridlane stamps true if and only if the request authorized as Gridlane admin. Omit to not emit an admin flag |
| secret_ref | recommended | env: / file: reference to a shared secret. When set, Gridlane stamps X-Router-Secret: <secret> on every upstream request |

Omit the entire upstream_identity block to keep the legacy behavior — Gridlane authenticates the client on its own and forwards nothing about the identity upstream.

3.2.2. What Gets Stamped

On every upstream request Gridlane stamps:

<user_header>:    alice
<admin_header>:   true        # only if Gridlane admin
X-Router-Secret:  <secret>    # only if secret_ref is set

For a guest request, <user_header> is set to the literal string guest. It is still a named identity from Selenwright’s perspective, just a shared one.

The pool-level credentials block (BasicAuth injection) is independent of this. Gridlane can both inject Authorization: Basic …​ for Selenwright’s legacy -htpasswd mode and stamp identity headers. Selenwright will honour whichever matches its running auth mode.

3.2.3. What Gets Stripped

Before Gridlane’s own auth layer runs, a middleware strips these headers from the incoming request:

  • <user_header> (e.g. X-Forwarded-User)

  • <admin_header> (e.g. X-Admin)

  • X-Router-Secret

This means a client cannot set X-Forwarded-User: alice themselves and inherit Alice’s identity downstream — the header is erased before any auth decision happens, and Gridlane re-derives the identity from the client’s real credentials.

The strip is driven by whatever headers the upstream_identity block names, plus the router-secret header. Rename user_header to something odd like X-Gridlane-Identity and that becomes the stripped header.

3.2.4. The Matching Selenwright Config

Selenwright in trusted-proxy mode honours identity from headers only when the source is trusted. Run Selenwright with:

$ selenwright                                             \
    -auth-mode=trusted-proxy                              \
    -user-header=X-Forwarded-User                         \
    -admin-header=X-Admin                                 \
    -trusted-proxy-secret="$GRIDLANE_ROUTER_SECRET"

-trusted-proxy-secret must resolve to the same value as Gridlane’s upstream_identity.secret_ref. Selenwright rejects any request whose X-Router-Secret does not match — so a direct client that bypasses Gridlane and tries to spoof X-Forwarded-User against Selenwright gets 401.

Gridlane and Selenwright wire the same secret differently. Gridlane’s secret_ref takes an env: / file: reference and resolves it at startup / on reload. Selenwright’s -trusted-proxy-secret takes the plaintext value directly on the command line. Pass the secret into Selenwright through your process manager / compose file, not via hand-rolled shell.

Selenwright also supports -trusted-proxy-cidr and -trusted-proxy-mtls-ca for additional source-trust validation. Combine them — AND across all configured checks — when the overlay network itself is not trustworthy.

3.2.5. What The Upstream Sees

With the setup above, Selenwright logs a DELETE /session/<id> like:

subject=alice admin=false source=trusted-proxy

And Selenwright enforces its own per-user ACL against alice, not against the pool service account. Alice can only kill her own sessions; a team member in Alice’s Selenwright-side group can observe them; Selenwright admin bypass works for Gridlane admins.

3.2.6. What Happens Without A Router Secret

secret_ref is technically optional. Without it, Gridlane still stamps X-Forwarded-User / X-Admin on upstream requests, but there is nothing to distinguish a request from Gridlane from a direct client on the same network that forged X-Forwarded-User. Selenwright-side identity would be trivially spoofable.

Leave secret_ref out only when the overlay network between Gridlane and Selenwright is itself authenticated (mTLS, a private overlay with no other clients) and Selenwright’s -trusted-proxy-cidr / -trusted-proxy-mtls-ca already gate by source trust.

For the default local-compose topology, always set secret_ref.

3.2.7. Rotating The Router Secret

Update the referenced env var or file and send SIGHUP to both Gridlane and Selenwright. Both sides re-resolve the secret on reload. There is a brief window during rotation where Gridlane and Selenwright disagree and upstream requests are rejected. Plan the rotation order (update Selenwright’s new value first, then flip Gridlane over) or accept that short window of failures.

3.2.8. Observability

The log line on every request includes subject=<name> scope=<scope>. When an upstream request is made with identity headers, the line also includes upstream_identity=true. /metrics counts admin-scope requests separately (via the route label on /config / /metrics).

See Observability for the full metric list.

4. Operations

4.1. Reload (SIGHUP)

Gridlane reloads router.json on SIGHUP without restarting the process. Reload is fail-closed: if the new config is invalid, the previous runtime keeps serving and the error is logged. You never serve a half-loaded config and you never go "dark" because of a typo.

4.1.1. Triggering A Reload

$ kill -HUP $(pidof gridlane)

Inside Docker:

$ docker kill -s HUP gridlane

Disable with -reload-on-sighup=false if you prefer to roll out config changes via restart only.

4.1.2. What Gets Reloaded

Everything that lives in router.json:

  • users[] — add / remove / rename users, change passwords (re-resolves password_ref), change quotas.

  • guest — add / remove guest access, change guest quota.

  • catalog — add / remove browsers, change advertised versions / protocols / platforms.

  • backend_pools[] — add / remove pools, change endpoints, weights, regions, credentials, health policies.

  • admin.token_ref — rotate the admin token (re-resolves).

  • upstream_identity — turn trusted-proxy propagation on or off, rotate the router secret.

Health state is reset for pools that are new or whose endpoint / id changed. Pools that survive the reload unchanged keep their failure counter and cooldown state.

4.1.3. What Needs A Restart

Anything that is not in router.json — that is, every CLI flag:

  • -listen, -metrics-listen — changing the listen address requires a new socket.

  • -log-format — the logger is built once at startup.

  • -graceful-period, -session-attempt-timeout, -proxy-timeout.

  • -reload-on-sighup — the signal handler is wired once at startup.

Flags are frozen at startup. Move anything you need to tune live into router.json fields.

4.1.4. Fail-Closed Semantics

Reload proceeds in phases:

  1. Gridlane reads the config file from disk. If the read fails (file missing, permission denied), the error is logged and the current runtime is kept. No change.

  2. The config is parsed with DisallowUnknownFields. If parsing fails, error logged, current runtime kept.

  3. The parsed config is validated (schema rules, duplicate IDs, bad endpoints, invalid durations). If validation fails, error logged, current runtime kept.

  4. Every *_ref is resolved against environment and filesystem. If any reference fails (env var unset, file unreadable, trimmed value empty), error logged, current runtime kept.

  5. A new handler + health manager are built. If construction fails (should not happen with a validated config, but defensive), current runtime kept.

  6. Only when all of the above succeed is the live handler atomically swapped to the new one.

The swap is a single atomic.Value.Store call. Concurrent in-flight requests finish against whichever handler they started with — there is no torn state where a request sees a mix of old and new config.

4.1.5. Observability

A reload logs one of:

level=info  msg="config reloaded"              path=/etc/gridlane/router.json
level=error msg="config reload failed"         path=/etc/gridlane/router.json error="..."

The error message is the specific validation or resolution failure so you can fix the file and SIGHUP again without reading through the whole process log.

4.1.6. Inspecting The Loaded Config

Confirm the new config took effect with the admin /config endpoint:

$ curl -fsS -H 'X-Gridlane-Admin-Token: root-token' http://127.0.0.1:4444/config

The response is the sanitized view of the live config: secrets are redacted and resolved values are not included. If /config still shows the old pool list after a SIGHUP, the reload either did not happen or failed; the log tells you which.

4.1.7. Degraded State

If Gridlane starts up successfully but the current runtime is later invalidated (for example, the file that admin.token_ref=file:…​ points at is deleted, then SIGHUP is sent), the reload fails and the old runtime keeps serving. The admin token the old runtime loaded on startup is still valid.

The old runtime keeps serving until a later reload succeeds or the process exits.

4.1.8. Typical Workflows

Rotate a user password
  1. Update the env var or file referenced by password_ref.

  2. kill -HUP Gridlane.

  3. The next request from that user with the new password is admitted; requests with the old password get 401.

Add a new backend pool
  1. Add the pool entry to router.json.

  2. kill -HUP Gridlane.

  3. The new pool starts receiving a proportional share of new session placements immediately. Existing sessions continue against their original pools.

Take a pool out of rotation
  1. Remove the pool from backend_pools[].

  2. kill -HUP Gridlane.

  3. New sessions no longer go to that pool. Existing sessions on that pool return 404 on follow-up because their route token no longer matches any live pool. If you need a graceful drain, drop the pool’s weight to the lowest value the schema allows (weight must be > 0) first, let sessions churn, then remove it in a second reload.

4.2. Observability

Three surfaces:

  • Prometheus metrics: structured counters, histograms, and gauges for every routing decision.

  • Structured logs: log/slog with a text or JSON backend.

  • Status rollup: public /status for external liveness / readiness probes.

4.2.1. Metrics

Prometheus text format on /metrics. On the main listener it is admin scope (X-Gridlane-Admin-Token or Authorization: Bearer). When -metrics-listen is set to a separate address (typically bound to a private interface), /metrics there has no auth.

Counters
| Metric | Labels | Meaning |
|---|---|---|
| gridlane_http_requests_total | method, route, status | HTTP requests received, bucketed by route template and response status |
| gridlane_proxy_requests_total | protocol, backend, outcome | Proxy attempts to upstream backends; outcome is success (2xx/3xx), failure (5xx / 408 / 425 / 429 / upstream 401 / 403), client_error (other 4xx: the caller’s fault, does not degrade health), or error (transport failure before the upstream ever answered) |
| gridlane_websocket_sessions_total | backend, event | Playwright WebSocket session events; event is started, upgraded |

Histograms
| Metric | Labels | Buckets (seconds) |
|---|---|---|
| gridlane_http_request_duration_seconds | method, route | 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30 |
| gridlane_proxy_request_duration_seconds | protocol, backend | same |

Gauges
| Metric | Labels | Meaning |
|---|---|---|
| gridlane_backend_available | backend, region, protocols | 1 when the pool is healthy, 0 when in cooldown |
| gridlane_backend_failures_total | backend, region, protocols | Current run of consecutive failures against the pool’s failure_threshold. Reset to 0 on the first successful upstream request after the failures, and on cooldown expiration. Despite the _total suffix, this is emitted as a gauge: it is a live snapshot of the in-memory Manager, not a monotonic counter of all-time failures |

Route Labels Are Templates

The route label on gridlane_http_requests_total and gridlane_http_request_duration_seconds is a fixed template, not the literal request path. Session IDs are rewritten to :session, browser/version pairs to :browser/:version, so cardinality stays bounded:

/ping
/status
/config
/quota
/metrics
/history/settings
/session/:session
/wd/hub/session/:session
/playwright/:browser/:version
/host/:session
/vnc/:session
/devtools/:session
/video/:session
/logs/:session
/download/:session
/downloads/:session
/clipboard/:session
other

Anything that does not match a registered route maps to other. This is not a leak, since Gridlane returns 404 for unknown paths, but it is worth watching for a sudden increase, which usually means a client is using the wrong URL shape.
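A path-to-template mapping along these lines keeps label cardinality bounded. The sketch below is an illustration built from the template list above, not Gridlane's router: routeLabel and its tables are hypothetical names, and two details are assumptions because the text does not state them, namely that bare paths such as /session and sub-paths of /history/settings fold into their parent template.

```go
package main

import (
	"fmt"
	"strings"
)

// exactRoutes are paths whose label is the path itself.
var exactRoutes = map[string]bool{
	"/ping": true, "/status": true, "/config": true,
	"/quota": true, "/metrics": true, "/history/settings": true,
}

// sessionRoutes are prefixes whose next segment is a session ID.
var sessionRoutes = []string{
	"/wd/hub/session", "/session", "/host", "/vnc", "/devtools",
	"/video", "/logs", "/download", "/downloads", "/clipboard",
}

// routeLabel maps a literal request path to a fixed template.
func routeLabel(path string) string {
	if exactRoutes[path] {
		return path
	}
	if strings.HasPrefix(path, "/history/settings/") {
		return "/history/settings" // assumption: sub-paths share the parent label
	}
	if strings.HasPrefix(path, "/playwright/") {
		parts := strings.Split(strings.TrimPrefix(path, "/playwright/"), "/")
		if len(parts) == 2 && parts[0] != "" && parts[1] != "" {
			return "/playwright/:browser/:version"
		}
		return "other"
	}
	for _, base := range sessionRoutes {
		// Matching on base+"/" keeps /download and /downloads apart.
		if path == base || strings.HasPrefix(path, base+"/") {
			return base + "/:session"
		}
	}
	return "other"
}

func main() {
	// "abc123" is a made-up session ID used only for the demo.
	for _, p := range []string{"/wd/hub/session/abc123/url", "/playwright/chrome/stable", "/favicon.ico"} {
		fmt.Println(p, "->", routeLabel(p))
	}
}
```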

Example Queries

Request rate per route:

sum by (route) (rate(gridlane_http_requests_total[1m]))

p95 proxy latency by backend:

histogram_quantile(0.95, sum by (le, backend) (rate(gridlane_proxy_request_duration_seconds_bucket[5m])))

Backends currently in cooldown:

gridlane_backend_available == 0

Error rate by backend:

sum by (backend) (rate(gridlane_proxy_requests_total{outcome="failure"}[1m]))
  /
sum by (backend) (rate(gridlane_proxy_requests_total[1m]))

4.2.2. Logs

Gridlane emits one log/slog record per request and one per lifecycle event. Pick the format with -log-format:

$ ./gridlane -log-format text   # default, human-readable
$ ./gridlane -log-format json   # one-line JSON, for log aggregators

Per-request fields:

| Field | Value |
|---|---|
| msg | http request |
| method | GET, POST, DELETE, … |
| path | Literal request path (not the template) |
| status | HTTP response status |
| duration_ms | Millisecond request duration |
| subject | Resolved identity: user name, guest, admin, or - for unauthenticated |
| scope | Scope the handler required (public / user / side / admin) |
| backend | Pool ID when the request was proxied |
| route_token | Public route token when the request carried a session ID |

Passwords, tokens, the Authorization header, and X-Router-Secret are never logged.

Lifecycle log lines include config reload (msg="config reloaded" / msg="config reload failed"), startup (msg="listening"), and shutdown (msg="gracefully stopping").

4.2.3. /status

Public JSON rollup for external health probes:

$ curl -fsS http://127.0.0.1:4444/status
{
  "service":         "gridlane",
  "status":          "ok",
  "backend_count":   2,
  "available_count": 2
}

status is "ok" when at least one pool is healthy, "degraded" otherwise. A degraded /status with available_count=0 means every new session placement is failing with 503.

/status reflects Gridlane’s view of its pools only. An external probe that hits Selenwright through Gridlane should hit /ping (cheaper) or an actual session-create round-trip (deeper, slower).

4.2.4. Tracing

Gridlane does not emit OpenTelemetry spans today. The gridlane_http_request_duration_seconds histogram combined with backend / route labels is the current latency breakdown surface. If you need end-to-end tracing across Gridlane and Selenwright, run a sidecar proxy that injects W3C traceparent on the ingress edge — Gridlane forwards unknown request headers transparently.

5. Configuration Reference

Complete list of knobs Gridlane honours — router.json schema, CLI flags, and how to run it in Docker.

5.1. Router Configuration

Gridlane reads a single JSON file — router.json by default, override with -config. The schema is strict: unknown fields, missing required fields, invalid types, and unresolvable secret references are rejected at startup and on reload.
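Strict parsing of this kind is available from encoding/json directly. The sketch below uses a deliberately tiny, hypothetical routerConfig struct (only version and guest) to show how DisallowUnknownFields turns a misspelled key into an error; the real schema has more fields.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// routerConfig is a cut-down stand-in for the real schema.
type routerConfig struct {
	Version int `json:"version"`
	Guest   *struct {
		Quota struct {
			MaxSessions int `json:"max_sessions"`
		} `json:"quota"`
	} `json:"guest"`
}

// parseConfig decodes raw JSON and rejects unknown fields and bad versions.
func parseConfig(raw string) (*routerConfig, error) {
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.DisallowUnknownFields() // misspelled keys become errors
	var cfg routerConfig
	if err := dec.Decode(&cfg); err != nil {
		return nil, err
	}
	if cfg.Version != 1 {
		return nil, fmt.Errorf("version must be 1, got %d", cfg.Version)
	}
	return &cfg, nil
}

func main() {
	_, err := parseConfig(`{"version": 1, "gest": {"quota": {"max_sessions": 2}}}`)
	fmt.Println(err) // the typo "gest" is reported instead of being ignored
}
```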

5.1.1. Top-Level Shape

{
  "version":           1,
  "users":             [ ... ],
  "guest":             { ... },
  "catalog":           { ... },
  "backend_pools":     [ ... ],
  "admin":             { ... },
  "upstream_identity": { ... }
}

| Field | Type | Required | Notes |
|---|---|---|---|
| version | int | yes | Must be 1. The version prefix is also embedded in public session IDs |
| users | array | * | List of BasicAuth users. At least one of users or guest is required |
| guest | object | * | Anonymous guest scope. At least one of users or guest is required |
| catalog | object | yes | Advertised browsers and protocols |
| backend_pools | array | yes | Selenwright backends to route to (one or more) |
| admin | object | no | Admin token configuration |
| upstream_identity | object | no | Trusted-proxy identity propagation (see Identity Propagation) |

5.1.2. users[]

{
  "users": [
    {
      "name": "alice",
      "password_ref": "env:GRIDLANE_ALICE_PASSWORD",
      "quota": { "max_sessions": 20 }
    }
  ]
}

| Field | Type | Required | Notes |
|---|---|---|---|
| name | string | yes | Non-empty, unique across users[] |
| password_ref | string | yes | env:NAME or file:/absolute/path (see Secret References) |
| quota.max_sessions | int | yes | Must be > 0 |

5.1.3. guest

{ "guest": { "quota": { "max_sessions": 2 } } }

| Field | Type | Required | Notes |
|---|---|---|---|
| quota.max_sessions | int | yes | Must be > 0. Shared across all guest requests |

Omit the block to disable anonymous access. When enabled, guest applies to user scope only — side endpoints still require named BasicAuth. See Authentication.

5.1.4. catalog

{
  "catalog": {
    "browsers": [
      {
        "name": "chrome",
        "versions": ["stable", "beta"],
        "platforms": ["linux"],
        "protocols": ["webdriver", "playwright"]
      }
    ]
  }
}

catalog.browsers[] must be non-empty. Per entry:

| Field | Type | Required | Notes |
|---|---|---|---|
| name | string | yes | Non-empty, unique across browsers[] |
| versions | []string | yes | Non-empty. Each entry must be non-empty |
| platforms | []string | no | Optional; each entry must be non-empty |
| protocols | []string | yes | Non-empty. Allowed values: webdriver, playwright |

The catalog is advertised via /quota and is the authoritative source of truth for which Playwright /playwright/<browser>/<version> paths are accepted.

5.1.5. backend_pools[]

{
  "backend_pools": [
    {
      "id":        "selenwright-a",
      "endpoint":  "http://selenwright-a:4444",
      "region":    "local-a",
      "weight":    1,
      "protocols": ["webdriver", "playwright"],
      "credentials": {
        "username_ref": "env:SELENWRIGHT_A_USER",
        "password_ref": "env:SELENWRIGHT_A_PASSWORD"
      },
      "health": {
        "enabled":            true,
        "failure_threshold":  2,
        "cooldown":           "10s"
      }
    }
  ]
}

| Field | Type | Required | Notes |
|---|---|---|---|
| id | string | yes | Non-empty, unique across backend_pools[]. Embedded in the route-token derivation; must stay stable |
| endpoint | string | yes | http:// or https:// URL. Host required. Must not embed user:password@ |
| region | string | yes | Non-empty. Used for region-aware placement |
| weight | int | yes | Must be > 0. Weighted round-robin within a region |
| protocols | []string | yes | Non-empty. webdriver, playwright, or both |
| credentials | object | no | BasicAuth to inject on upstream requests (see below) |
| health | object | no | Failure-threshold + cooldown health policy (see Backend Health) |

credentials (optional):

| Field | Type | Required | Notes |
|---|---|---|---|
| username_ref | string | yes (if credentials is set) | env: / file: reference |
| password_ref | string | yes (if credentials is set) | env: / file: reference |

health (optional):

| Field | Type | Required | Notes |
|---|---|---|---|
| enabled | bool | no | Default false. When false, the pool is always considered healthy |
| failure_threshold | int | no | Must be zero or positive. Defaults to 1 when health is enabled |
| cooldown | string | no | Go duration string ("10s", "2m"). Defaults to 30s |

5.1.6. admin

{ "admin": { "token_ref": "env:GRIDLANE_ADMIN_TOKEN" } }

| Field | Type | Required | Notes |
|---|---|---|---|
| token_ref | string | no | env: / file: reference. When omitted, /config and main-listener /metrics are permanently unavailable |

5.1.7. upstream_identity

{
  "upstream_identity": {
    "user_header":  "X-Forwarded-User",
    "admin_header": "X-Admin",
    "secret_ref":   "env:GRIDLANE_ROUTER_SECRET"
  }
}

| Field | Type | Required | Notes |
|---|---|---|---|
| user_header | string | yes (when the block is present) | Non-empty header name |
| admin_header | string | no | Header name for the admin flag |
| secret_ref | string | no | env: / file: reference to the shared router secret |

See Identity Propagation for the end-to-end behavior.

5.1.8. Secret References

Every *_ref field accepts exactly two forms:

| Form | Notes |
|---|---|
| env:NAME | Read from the NAME environment variable. The value must be present and non-empty at startup / reload |
| file:/absolute/path | Read from an absolute file path. Path must be absolute and cleaned (no .., no trailing /). Surrounding whitespace is trimmed from the file content |

Anything else — a raw password string, a relative path, a URL — is rejected with a validation error. There is no plaintext fallback.

Secrets are re-resolved on every reload. Rotate a secret by updating the env var or the file content, then SIGHUP.

5.1.9. Validation Summary

Gridlane validates the whole config before it replaces the live runtime. The most common reasons a reload (or startup) fails:

  • Unknown field (DisallowUnknownFields). Typos in field names are not silently ignored.

  • Duplicate users[].name, catalog.browsers[].name, or backend_pools[].id.

  • Empty required string or empty list where non-empty is required.

  • backend_pools[*].endpoint without scheme, with a wrong scheme (not http:// or https://), without a host, or with embedded credentials.

  • *_ref that does not start with env: or file:, a file: path that is not absolute or not cleaned, or an env var that resolves to an empty string.

  • health.cooldown that is not a valid Go duration.

Every failure is logged with the JSON path of the bad field (e.g. backend_pools[1].endpoint must use http or https) so a reload failure is self-diagnosing.
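As one concrete instance of these checks, the endpoint rules can be expressed with net/url. The function validateEndpoint and its messages are hypothetical; the rules (http or https only, host required, no embedded credentials) are the ones listed above, and the demo prefixes the error with the JSON path in the style of the log example.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// validateEndpoint applies the backend_pools[*].endpoint rules to one value.
func validateEndpoint(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return errors.New("must use http or https")
	}
	if u.Host == "" {
		return errors.New("must include a host")
	}
	if u.User != nil {
		return errors.New("must not embed credentials")
	}
	return nil
}

func main() {
	endpoints := []string{"http://selenwright-a:4444", "ftp://selenwright-b:4444"}
	for i, e := range endpoints {
		if err := validateEndpoint(e); err != nil {
			// Prefix with the JSON path so the failure is self-diagnosing.
			fmt.Printf("backend_pools[%d].endpoint %v\n", i, err)
		}
	}
	// prints: backend_pools[1].endpoint must use http or https
}
```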

5.1.10. Full Example

See examples/router.compose.json for the canonical example that ships with the repo.

5.2. HTTP API

Gridlane serves every route on the main listener (-listen, default :4444). Optionally /metrics can be exposed on a separate listener (-metrics-listen) with no auth — intended for private-network scraping.

Every route is registered at a scope. See Authentication for the credential model.

5.2.1. Summary

| Path | Scope | Notes |
|---|---|---|
| GET /ping | public | Liveness |
| GET /status | public | Backend rollup |
| GET /quota | user | Caller’s own quota (plus guest quota if configured) |
| GET /config | admin | Sanitized live router.json |
| GET /metrics | admin on main / public on -metrics-listen | Prometheus text format |
| POST /session, POST /wd/hub/session | user | WebDriver session create |
| * /session/:id/*, * /wd/hub/session/:id/* | user | WebDriver session follow-up |
| GET /playwright/:browser/:version | user | Playwright WebSocket upgrade |
| GET /host/:id | side | JSON describing the backend the session is pinned to |
| GET /vnc/:id | side | VNC live stream (WebSocket upgrade) |
| GET /devtools/:id/* | side | Chrome DevTools Protocol (WebSocket upgrade) |
| GET /logs/:id | side | Live session logs (WebSocket upgrade) |
| GET /video/:id, GET /video/:id.mp4 | side | Recorded session video (HTTP file download) |
| GET /download/:id/:name | side | HTTP file download from inside the session |
| GET /downloads/:id | side | HTTP directory listing for the session’s downloads |
| GET /clipboard/:id, POST /clipboard/:id | side | HTTP clipboard read/write |
| GET /history/settings, GET /history/settings/* | side | Selenwright artifact-history settings (HTTP) |

5.2.2. GET /ping

Liveness. Returns 200 with a fixed body. No auth.

$ curl -fsS http://127.0.0.1:4444/ping
{"service":"gridlane","status":"ok"}

5.2.3. GET /status

Rolled-up backend view. No auth.

$ curl -fsS http://127.0.0.1:4444/status
{
  "service":         "gridlane",
  "status":          "ok",
  "backend_count":   2,
  "available_count": 2
}

status is "ok" when at least one pool is healthy, "degraded" otherwise. See Backend Health.

5.2.4. GET /quota

Caller’s own quota, user scope. With BasicAuth, the response describes the authenticated user; when guest is configured, an unauthenticated request returns the guest quota.

$ curl -fsS -u alice:wonderland http://127.0.0.1:4444/quota
{
  "users": [
    { "name": "alice", "quota": { "max_sessions": 20 } }
  ],
  "guest": { "quota": { "max_sessions": 2 } }
}

The users array contains only the requesting user; other users' quotas are never exposed. guest is omitted when guest access is disabled.

5.2.5. GET /config

Sanitized live config — no secrets, no resolved passwords, no admin token. admin scope.

$ curl -fsS -H 'X-Gridlane-Admin-Token: root-token' http://127.0.0.1:4444/config

Returns a JSON body shaped like router.json with *_ref fields rendered as-is ("env:GRIDLANE_ALICE_PASSWORD") and no resolved plaintext. Safe to dump into a log or paste into a ticket.

5.2.6. GET /metrics

Prometheus text format. On the main listener it is admin scope; on the -metrics-listen listener it has no auth. See Observability for the metric list.

5.2.7. WebDriver

POST /session or POST /wd/hub/session, user scope. Gridlane routes the session create by catalog + weighted pool selection, then rewrites the upstream session ID in the response body to the public form.

Follow-up requests (GET /session/<id>/url, DELETE /session/<id>, …) route by the <route-token> embedded in the public ID. Gridlane rewrites the session segment on the upstream path before proxying.

5.2.8. Playwright

GET /playwright/<browser>/<version> with Upgrade: websocket, user scope. Gridlane mints a public session ID, sends the upgrade upstream with X-Selenwright-External-Session-ID, and echoes the public ID back on the 101 response in X-Selenwright-Session-ID.

5.2.9. Side Endpoints

All side endpoints are side scope — BasicAuth matching a users[] entry, no guest fallback. The session ID on the path is the Gridlane public ID; Gridlane decodes it, picks the backend by the embedded route-token, and rewrites the upstream path per protocol: for WebDriver it substitutes the decoded upstream ID (Selenwright stored the session under that); for Playwright it keeps the public ID (Selenwright stored the session under the public ID via X-Selenwright-External-Session-ID). For /video/ Gridlane appends .mp4 if the client omitted it.

HTTP vs WebSocket by endpoint

Side endpoints are a mix of HTTP requests and WebSocket streams; Gridlane’s reverse proxy handles both transparently, but the client has to speak the right protocol.

Endpoint                                Transport
/vnc/<id>                               WebSocket: live VNC stream; client must send Upgrade: websocket
/devtools/<id>/…                        WebSocket: Chrome DevTools Protocol
/logs/<id>                              WebSocket: live session-log stream. There is no HTTP body here:
                                        curl http://…/logs/<id> returns 400; use a WS client. The log
                                        file on disk (when -log-output-dir and -save-all-logs are set on
                                        Selenwright) is not exposed over HTTP by Selenwright and
                                        therefore not by Gridlane
/video/<id>, /video/<id>.mp4            HTTP file download: the recorded mp4
/download/<id>/<name>                   HTTP file download: one file from the session’s download area
/downloads/<id>                         HTTP directory listing (usually JSON)
/clipboard/<id>                         HTTP: GET reads, POST writes
/host/<id>                              HTTP (Gridlane-only): JSON describing the backend the session is pinned to
/history/settings, /history/settings/*  HTTP

Gridlane is a thin reverse proxy for all of these — response body, content type and upstream headers pass through untouched apart from the session-ID rewriting described above.
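A client that handles several side endpoints may want the transport table as a lookup. The sketch below encodes the table above; the function name transportFor and the string constants are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// transports maps the first path segment of a side endpoint to the
// protocol the client has to speak.
var transports = map[string]string{
	"vnc":       "websocket",
	"devtools":  "websocket",
	"logs":      "websocket",
	"video":     "http",
	"download":  "http",
	"downloads": "http",
	"clipboard": "http",
	"host":      "http",
}

// transportFor returns "websocket", "http", or "" when the path is not a
// side endpoint.
func transportFor(path string) string {
	if path == "/history/settings" || strings.HasPrefix(path, "/history/settings/") {
		return "http"
	}
	first := strings.SplitN(strings.TrimPrefix(path, "/"), "/", 2)[0]
	return transports[first] // the zero value "" means: not a side endpoint
}

func main() {
	for _, p := range []string{"/vnc/abc", "/logs/abc", "/video/abc.mp4", "/history/settings", "/session/abc"} {
		fmt.Printf("%-20s %q\n", p, transportFor(p))
	}
}
```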

5.2.10. Error Shapes

Most endpoints return a minimal body on failure — the status code is authoritative:

Status  Meaning
400     Malformed path, missing required segment, unknown browser/version, bad session ID format
401     Credentials missing or rejected (and no guest fallback for this scope)
403     Scope mismatch (e.g. admin endpoint with user credentials)
404     Unknown path, or session ID whose route-token does not match any live pool
429     Caller’s quota reached
502     Upstream transport failure
503     No healthy backend pool matches, or reload has not completed on a cold start

WebDriver session create returns a W3C-shaped error body on 4xx/5xx:

{
  "value": {
    "error":      "session not created",
    "message":    "no healthy backend pool matches webdriver",
    "stacktrace": ""
  }
}
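A client can decode this body with a small struct. The sketch below parses the example above and adds a retry hint; treating 429, 502 and 503 as retryable is a policy assumption of the sketch, not something Gridlane prescribes.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// wdError mirrors the W3C WebDriver error body.
type wdError struct {
	Value struct {
		Error      string `json:"error"`
		Message    string `json:"message"`
		Stacktrace string `json:"stacktrace"`
	} `json:"value"`
}

// parseWDError decodes a W3C-shaped error body.
func parseWDError(body []byte) (wdError, error) {
	var e wdError
	if err := json.Unmarshal(body, &e); err != nil {
		return wdError{}, err
	}
	return e, nil
}

// retryable is an assumed client policy: quota exhaustion and upstream
// trouble may clear on their own, while other statuses need a changed request.
func retryable(status int) bool {
	switch status {
	case 429, 502, 503:
		return true
	}
	return false
}

func main() {
	body := []byte(`{"value":{"error":"session not created","message":"no healthy backend pool matches webdriver","stacktrace":""}}`)
	e, err := parseWDError(body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s: %s (retry: %v)\n", e.Value.Error, e.Value.Message, retryable(503))
}
```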

5.3. CLI Flags

All CLI flags are frozen at startup — none of them reload on SIGHUP. Everything tunable at runtime lives in Router Configuration.

5.3.1. Listeners

-listen string
    Main HTTP listen address (default ":4444"). Serves WebDriver,
    Playwright, side endpoints, /ping, /status, /quota, /config, and
    /metrics (admin-scoped on this listener).

-metrics-listen string
    Optional separate Prometheus listen address (e.g. ":9090"). When
    set, /metrics on this listener is exposed with no authentication
    — intended for internal-only binds or a private overlay network.
    When empty (default), /metrics is only available on the main
    listener under admin scope.

5.3.2. Configuration

-config string
    Path to router.json (default "router.json"). See
    Router Configuration for the schema.

-reload-on-sighup
    Reload config on SIGHUP (default true). Reload is fail-closed —
    an invalid new config keeps the prior runtime serving. Set to
    false to disable SIGHUP handling entirely. See Reload (SIGHUP).
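Fail-closed reload is commonly implemented as "build the new state completely, then swap it in atomically, and only if the build succeeded". The sketch below shows that pattern with sync/atomic. It illustrates the technique and is not Gridlane's source; the config and router types are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// config stands in for a parsed and validated router.json.
type config struct{ Version int }

var errInvalid = errors.New("config failed validation")

// router holds the live config behind an atomic pointer, so request
// handlers always see either the old or the new config, never a mix.
type router struct {
	current atomic.Pointer[config]
}

// reload loads a candidate config and swaps it in only on success.
// On failure the prior config keeps serving (fail-closed).
func (r *router) reload(load func() (*config, error)) error {
	next, err := load()
	if err != nil {
		return fmt.Errorf("reload rejected, keeping prior config: %w", err)
	}
	r.current.Store(next)
	return nil
}

func main() {
	r := &router{}
	r.current.Store(&config{Version: 1})

	err := r.reload(func() (*config, error) { return nil, errInvalid })
	fmt.Println("after bad reload:", r.current.Load().Version, "err:", err)

	err = r.reload(func() (*config, error) { return &config{Version: 2}, nil })
	fmt.Println("after good reload:", r.current.Load().Version, "err:", err)
}
```

In a long-running process, reload would typically be called from a goroutine that receives on a channel registered with signal.Notify for syscall.SIGHUP.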

5.3.3. Timeouts

-graceful-period duration
    Graceful shutdown period (default 15s). On SIGINT/SIGTERM,
    Gridlane stops accepting new connections and waits up to this
    long for in-flight requests to finish before force-closing.

-session-attempt-timeout duration
    Upstream session creation timeout (default 30s). Upper bound on
    how long a WebDriver POST /session or a Playwright upgrade may
    take against the backend before Gridlane gives up and returns
    a 503.

-proxy-timeout duration
    Per-request upstream proxy timeout (default 5m). Upper bound on
    any other proxied request (WebDriver follow-ups, side endpoints).
    WebSocket connections are not subject to this timeout after the
    101 Switching Protocols — idle WebSockets are closed by the
    usual Selenwright-side session idle timeout.

Durations use Go time.Duration format: 500ms, 30s, 2m, 1h.
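To check a value before putting it on the command line, time.ParseDuration from the Go standard library accepts exactly this format. The helper name millis is hypothetical; note that a bare number and a day unit are both rejected.

```go
package main

import (
	"fmt"
	"time"
)

// millis parses a Go duration string and returns it in milliseconds.
func millis(s string) (int64, error) {
	d, err := time.ParseDuration(s)
	if err != nil {
		return 0, err
	}
	return d.Milliseconds(), nil
}

func main() {
	// "10" has no unit and "1d" uses a unit Go does not define; both fail.
	for _, s := range []string{"500ms", "30s", "2m", "1h", "1h30m", "10", "1d"} {
		ms, err := millis(s)
		if err != nil {
			fmt.Printf("%-6s invalid: %v\n", s, err)
			continue
		}
		fmt.Printf("%-6s %d ms\n", s, ms)
	}
}
```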

5.3.4. Logging

-log-format string
    Log format (default "text"). "text" is human-readable; "json"
    emits one-line structured JSON via log/slog — feed this directly
    into a log aggregator.

See Observability for the field catalog.

5.3.5. Miscellaneous

-version
    Print the version and exit. Set at build time via the
    -X main.version=<tag> ldflag (goreleaser populates this).

5.3.6. Defaults Summary

Flag                      Default      Notes
-listen                   :4444        Main HTTP listener
-metrics-listen           (empty)      Separate unauthenticated metrics listener
-config                   router.json  Path to the config file
-graceful-period          15s          Shutdown drain
-session-attempt-timeout  30s          Upstream session-create timeout
-proxy-timeout            5m           Upstream request timeout
-reload-on-sighup         true         Reload config on SIGHUP
-log-format               text         text or json
-version                               Print version and exit

5.3.7. Example

$ ./gridlane                                              \
    -listen         :4444                                 \
    -metrics-listen :9090                                 \
    -config         /etc/gridlane/router.json             \
    -log-format     json                                  \
    -proxy-timeout  10m
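The flag set and its defaults can be mirrored with the standard flag package, which is handy for wrappers or tests that need to reason about a Gridlane command line. The sketch assumes Gridlane parses its flags with Go's flag package semantics; the options struct and parseOptions are hypothetical and not taken from Gridlane's source.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"log"
	"time"
)

// options mirrors the documented flags and their defaults.
type options struct {
	Listen, MetricsListen, Config, LogFormat            string
	GracefulPeriod, SessionAttemptTimeout, ProxyTimeout time.Duration
	ReloadOnSIGHUP, Version                             bool
}

// parseOptions parses args (without the program name) into options.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("gridlane", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the caller's output
	fs.StringVar(&o.Listen, "listen", ":4444", "main HTTP listen address")
	fs.StringVar(&o.MetricsListen, "metrics-listen", "", "separate metrics listen address")
	fs.StringVar(&o.Config, "config", "router.json", "path to router.json")
	fs.StringVar(&o.LogFormat, "log-format", "text", "text or json")
	fs.DurationVar(&o.GracefulPeriod, "graceful-period", 15*time.Second, "shutdown drain")
	fs.DurationVar(&o.SessionAttemptTimeout, "session-attempt-timeout", 30*time.Second, "session-create timeout")
	fs.DurationVar(&o.ProxyTimeout, "proxy-timeout", 5*time.Minute, "upstream request timeout")
	fs.BoolVar(&o.ReloadOnSIGHUP, "reload-on-sighup", true, "reload config on SIGHUP")
	fs.BoolVar(&o.Version, "version", false, "print version and exit")
	err := fs.Parse(args)
	return o, err
}

func main() {
	o, err := parseOptions([]string{
		"-listen", ":4444", "-metrics-listen", ":9090",
		"-config", "/etc/gridlane/router.json",
		"-log-format", "json", "-proxy-timeout", "10m",
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%+v\n", o)
}
```

One consequence of Go's flag package, if Gridlane uses it: a boolean flag must be switched off as -reload-on-sighup=false. Written as -reload-on-sighup false, the flag stays true and false becomes a positional argument.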

5.4. Docker

The published image is a minimal two-stage Alpine build. The binary is the image’s ENTRYPOINT, so every argument on docker run becomes a Gridlane CLI flag.

  • Build stage: golang:1.26.2-alpine, CGO_ENABLED=0, -trimpath -ldflags="-s -w".

  • Final stage: alpine:3.23 with ca-certificates and tzdata.

  • Runs as unprivileged 65532:65532.

  • Exposes 4444 (main listener) and 9090 (optional metrics listener).

5.4.1. Standalone Container

$ docker run -d --name gridlane                               \
    -p 4444:4444 -p 9090:9090                                 \
    -v "$(pwd)/router.json:/etc/gridlane/router.json:ro"      \
    -e GRIDLANE_ALICE_PASSWORD=wonderland                     \
    -e GRIDLANE_ADMIN_TOKEN=root-token                        \
    selenwright/gridlane:latest-release                       \
    -config /etc/gridlane/router.json                         \
    -metrics-listen :9090                                     \
    -log-format json

Mount the config read-only. Pass secrets through environment variables: Gridlane resolves the env:NAME references in router.json at startup and again on every reload.

A container's environment is fixed when the process starts, so an env: reference resolves to the same value on every reload, and changing the variable on the host does not reach the running process. To rotate an env: secret (a password or the admin token), recreate the container with the new value. To rotate without a restart, use a file: reference backed by a tmpfs secret mount, update the mounted file, and send SIGHUP:

# Update the file behind the file: reference, then:
$ docker kill -s HUP gridlane
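The practical difference between the two reference forms shows up in a small resolver. This sketch illustrates the idea under stated assumptions: the function names are hypothetical, and trimming a trailing newline from file contents is a choice of the sketch, not documented Gridlane behavior.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"strings"
)

// resolveWith resolves "env:NAME" or "file:/path" using the supplied
// lookups, which makes the logic testable without touching the process
// environment or the filesystem.
func resolveWith(ref string, getenv func(string) (string, bool), readFile func(string) ([]byte, error)) (string, error) {
	scheme, rest, ok := strings.Cut(ref, ":")
	if !ok || rest == "" {
		return "", fmt.Errorf("malformed secret reference %q", ref)
	}
	switch scheme {
	case "env":
		// The environment is fixed at process start, so this yields the
		// same value on every reload.
		v, found := getenv(rest)
		if !found {
			return "", fmt.Errorf("environment variable %s is not set", rest)
		}
		return v, nil
	case "file":
		// The file is re-read on each call, so a reload picks up rotations.
		b, err := readFile(rest)
		if err != nil {
			return "", err
		}
		return strings.TrimRight(string(b), "\r\n"), nil // assumption: drop trailing newline
	default:
		return "", fmt.Errorf("unsupported reference scheme %q", scheme)
	}
}

// resolve wires resolveWith to the real environment and filesystem.
func resolve(ref string) (string, error) {
	return resolveWith(ref, os.LookupEnv, os.ReadFile)
}

func main() {
	if err := os.Setenv("GRIDLANE_ALICE_PASSWORD", "wonderland"); err != nil {
		log.Fatal(err)
	}
	v, err := resolve("env:GRIDLANE_ALICE_PASSWORD")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("resolved %d bytes\n", len(v))
}
```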

5.4.2. Docker Compose

End-to-end recipe: one Gridlane in front of two Selenwright backends, with trusted-proxy identity propagation. See Running With Selenwright for the router.json shape.

version: "3.9"

networks:
  gridlane-integration:

services:
  gridlane:
    image: selenwright/gridlane:latest-release
    networks: [gridlane-integration]
    ports:
      - "4444:4444"
      - "9090:9090"
    volumes:
      - ./router.json:/etc/gridlane/router.json:ro
    environment:
      GRIDLANE_ALICE_PASSWORD: "${GRIDLANE_ALICE_PASSWORD:?required}"
      GRIDLANE_ADMIN_TOKEN:    "${GRIDLANE_ADMIN_TOKEN:?required}"
      GRIDLANE_ROUTER_SECRET:  "${GRIDLANE_ROUTER_SECRET:?required}"
    command:
      - "-listen"
      - ":4444"
      - "-metrics-listen"
      - ":9090"
      - "-config"
      - "/etc/gridlane/router.json"
      - "-log-format"
      - "json"
      - "-proxy-timeout"
      - "10m"

  selenwright-a:
    image: selenwright/hub:latest-release
    networks: [gridlane-integration]
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      GRIDLANE_ROUTER_SECRET: "${GRIDLANE_ROUTER_SECRET:?required}"
    command:
      - "-auth-mode=trusted-proxy"
      - "-user-header=X-Forwarded-User"
      - "-admin-header=X-Admin"
      - "-trusted-proxy-secret=${GRIDLANE_ROUTER_SECRET}"
      - "-limit=20"
      - "-timeout=10m"

  selenwright-b:
    image: selenwright/hub:latest-release
    networks: [gridlane-integration]
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      GRIDLANE_ROUTER_SECRET: "${GRIDLANE_ROUTER_SECRET:?required}"
    command:
      - "-auth-mode=trusted-proxy"
      - "-user-header=X-Forwarded-User"
      - "-admin-header=X-Admin"
      - "-trusted-proxy-secret=${GRIDLANE_ROUTER_SECRET}"
      - "-limit=20"
      - "-timeout=10m"

Bring it up with the three secrets in your environment:

$ GRIDLANE_ALICE_PASSWORD=wonderland   \
  GRIDLANE_ADMIN_TOKEN=root-token      \
  GRIDLANE_ROUTER_SECRET=dev-secret    \
  docker compose up --build

Exposed ports:

Port  Role
4444  Main Gridlane listener: WebDriver, Playwright, side endpoints, /ping, /status, /quota, /config, admin-scoped /metrics
9090  Unauthenticated Prometheus /metrics; bind to a private interface in production

Smoke checks follow the pattern from Running With Selenwright.

5.4.3. Building The Image Locally

$ docker build -t gridlane:local .

The Dockerfile is a multi-stage build with no build-time args you need to set. The resulting image tag is gridlane:local; run it the same way as the published image.

5.4.4. Image Tags

Tag                     Published On
latest                  Every push to main (CI build)
latest-release          Every GitHub release
<version> (e.g. 0.3.1)  Per-release semver tag

Pin production deployments to a specific semver tag, not latest-release.