Browser performance artifact · Report artifact · 2026-05-31

Browser WebGPU Training Speedup

This report is public evidence that the browser playground is more than a demo: the same GPT-2-shaped training path runs through a benchmark harness, which reports measured end-to-end training-step speedups.
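The harness code is not part of this report. As a hedged illustration only, an end-to-end step speedup such as 12.1x is typically computed by timing complete training steps on each backend after a warmup and taking the ratio of median step times. The names `TrainStep`, `timeSteps`, `median`, and `speedup` below are hypothetical, and the timings in the usage line are made up.

```typescript
// Hypothetical harness helpers for illustration; not the repo's benchmark code.
// One call to a TrainStep is assumed to run a full forward/backward/update step
// and to resolve only after that work has finished.
type TrainStep = () => Promise<void>;

// Run `warmup` untimed steps (shader compilation, pipeline creation, caches),
// then return the wall-clock duration in ms of each of `iters` timed steps.
async function timeSteps(step: TrainStep, warmup: number, iters: number): Promise<number[]> {
  for (let i = 0; i < warmup; i++) await step();
  const samples: number[] = [];
  for (let i = 0; i < iters; i++) {
    const t0 = performance.now();
    await step();
    samples.push(performance.now() - t0);
  }
  return samples;
}

// Median step time: less sensitive than the mean to GC pauses or scheduling spikes.
function median(xs: number[]): number {
  if (xs.length === 0) throw new Error("median of an empty sample");
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Speedup of a variant over the baseline (> 1 means the variant is faster).
function speedup(baselineMs: number[], variantMs: number[]): number {
  return median(baselineMs) / median(variantMs);
}

// Usage with made-up timings (not measurements from this report):
console.log(speedup([50, 52, 48], [20, 21, 20])); // 2.5
```

For a WebGPU step, `queue.submit()` returns before the GPU finishes, so an end-to-end timing only makes sense if the step awaits completion, for example via `device.queue.onSubmittedWorkDone()` or a `mapAsync` readback of the loss; otherwise the timer measures command submission rather than training.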

Headline Numbers

WebGPU speedup: 12.1x vs WASM SIMD at d_model=256

Small-width speedup: 2.6x vs WASM SIMD at d_model=96

Browser track: shipped WASM (with SIMD), OPFS, and a WebGPU fast path
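The browser track pairs a WebGPU fast path with WASM fallbacks. A minimal sketch of how a page might choose among them, assuming a simple preference order (WebGPU, then WASM SIMD, then scalar WASM). The `Backend` names, `detectCapabilities`, and `chooseBackend` are hypothetical and not taken from the repo; the SIMD probe is a tiny hand-assembled module that only validates on engines with WebAssembly SIMD.

```typescript
// Hypothetical backend-selection sketch; the repo's actual detection logic may differ.
type Backend = "webgpu" | "wasm-simd" | "wasm";

interface Capabilities {
  webgpu: boolean;   // a WebGPU adapter was obtained
  wasmSimd: boolean; // the engine validates a module that uses 128-bit SIMD
}

// Minimal module with one function () -> v128 whose body is
// i32.const 0; i8x16.splat; i8x16.popcnt; end. It only validates with SIMD support.
const SIMD_PROBE = new Uint8Array([
  0, 97, 115, 109, 1, 0, 0, 0, 1, 5, 1, 96, 0, 1, 123, 3, 2, 1, 0,
  10, 10, 1, 8, 0, 65, 0, 253, 15, 253, 98, 11,
]);

async function detectCapabilities(): Promise<Capabilities> {
  // navigator.gpu is absent outside WebGPU-capable browsers (e.g. in Node).
  const gpu = (globalThis as any).navigator?.gpu;
  let webgpu = false;
  if (gpu) {
    try {
      webgpu = (await gpu.requestAdapter()) !== null;
    } catch {
      webgpu = false;
    }
  }
  const wasmSimd = typeof WebAssembly !== "undefined" && WebAssembly.validate(SIMD_PROBE);
  return { webgpu, wasmSimd };
}

// Prefer the WebGPU fast path, then WASM SIMD, then portable scalar WASM.
function chooseBackend(caps: Capabilities): Backend {
  if (caps.webgpu) return "webgpu";
  if (caps.wasmSimd) return "wasm-simd";
  return "wasm";
}
```

A page would call `chooseBackend(await detectCapabilities())` once at startup. Obtaining an adapter does not guarantee that `requestDevice()` succeeds, so a more robust version would also fall back to WASM if device creation or the first GPU step fails.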

Competitive Context

| System | Metric | Score | Size / Class | Comparable? | Readout |
| --- | --- | --- | --- | --- | --- |
| TinyGPT WebGPU | Training-step speedup | 12.1x | d_model=256 browser run | Direct | Measured directly against the repo's WASM SIMD path. |
| TinyGPT WASM SIMD | Training-step speedup | 1.0x (baseline) | Same browser model/config | Direct | Portable CPU baseline and fallback path. |
| Native Mac runtimes | Browser training benchmark | Not measured | MLX/Metal class | Not comparable | Native runtimes are the right competition for production throughput, but not for this browser-learning artifact. |

Direct rows share this artifact's eval setup. Other rows (Directional or Not comparable) are market context only and should not be read as leaderboard claims.

Performance readout

| Variant | Speedup vs WASM SIMD | Interpretation |
| --- | --- | --- |
| WASM SIMD | 1.0x (baseline) | Portable CPU path |
| WebGPU, d_model=96 | 2.6x | GPU overhead still visible |
| WebGPU, d_model=256 | 12.1x | GPU advantage dominates as width grows |
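One hedged way to read the two WebGPU rows is a fixed-overhead-plus-work model. This is an illustrative assumption, not a model fitted to these measurements: t_0 is a per-step overhead (for WebGPU, things like dispatch and buffer management), W(d) is the matmul work, R is sustained throughput, and c absorbs batch size, sequence length, layer count, and the forward/backward factor. The d² scaling assumes a GPT-2 block with d_ff = 4d (about 12d² weights per layer) and ignores the attention-score term, which scales with sequence length.

```latex
% Illustrative per-step cost model (assumption, not fitted to the report's data)
\[
t_{\text{step}}(d) = t_0 + \frac{W(d)}{R},
\qquad
W(d) \approx c\,d^{2}
\]
% Speedup of WebGPU over WASM SIMD under this model, and its large-width limit
\[
S(d) = \frac{t_0^{\text{wasm}} + c\,d^{2}/R_{\text{wasm}}}{t_0^{\text{gpu}} + c\,d^{2}/R_{\text{gpu}}}
\;\xrightarrow{\;d \to \infty\;}\;
\frac{R_{\text{gpu}}}{R_{\text{wasm}}}
\]
% Growth of the d^2 work term from d_model = 96 to d_model = 256
\[
\left(\frac{256}{96}\right)^{2} = \left(\frac{8}{3}\right)^{2} = \frac{64}{9} \approx 7.1
\]
```

Under this model, at small d the fixed term t_0^gpu is a large share of each WebGPU step and holds S down; widening from 96 to 256 multiplies the d² work by about 7.1x, so the throughput ratio increasingly sets the speedup.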

Release Blockers

Not the center of active factory work

The browser track is valuable, but active work is currently focused on the Mac-local specialist factory.

Unblock: use browser pages to present factory reports instead of expanding the playground's scope.

Next Release Action

Keep this page as a public performance artifact, and cross-link it from factory reports wherever browser-local training is relevant.