Question 1

How does an in-browser GPU benchmark work?

Accepted Answer

Modern browsers expose your GPU through WebGPU, the successor to WebGL. The quick test runs compute shaders that read a large buffer (measuring memory bandwidth) and execute millions of fused multiply-add operations (measuring compute throughput). The real model test goes further: it loads an actual LLM into your GPU via the WebLLM library and generates text. Everything runs locally — no data leaves your machine.
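As a rough illustration of how the two quick-test numbers can be derived from raw timings, here is a minimal sketch. The function names and sample timings are hypothetical and are not taken from the benchmark's code. It assumes bandwidth is bytes read divided by elapsed time, and that each fused multiply-add counts as two floating-point operations (the usual convention).

```python
def bandwidth_gb_per_s(bytes_read: int, elapsed_s: float) -> float:
    """Effective memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return bytes_read / elapsed_s / 1e9


def compute_gflops(fma_count: int, elapsed_s: float) -> float:
    """Compute throughput in GFLOP/s; one fused multiply-add = 2 FLOPs."""
    return 2 * fma_count / elapsed_s / 1e9


# Hypothetical timings, not real measurements.
bw = bandwidth_gb_per_s(bytes_read=1_000_000_000, elapsed_s=0.004)
gf = compute_gflops(fma_count=1_000_000, elapsed_s=0.001)
print(f"bandwidth: {bw:.1f} GB/s, compute: {gf:.1f} GFLOP/s")
```

In a real run the elapsed time would come from timing the shader dispatch on the GPU, which this sketch does not attempt.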

Question 2

Why does memory bandwidth matter more than compute for LLMs?

Accepted Answer

Generating each token requires reading every active model parameter from GPU memory, but only about 2 math operations (one multiply and one add) are done per parameter read. GPUs can do hundreds of operations in the time it takes to read one value, so the math units spend most of their time waiting for memory. Dividing your bandwidth (in bytes per second) by the model size (in bytes) gives approximately your maximum tokens per second.
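The bandwidth-divided-by-size rule can be worked through in a few lines. This is a sketch under simplifying assumptions: the helper names are hypothetical, model size is taken as parameter count times bits per parameter, and the KV cache, activations, and quantization metadata are ignored. Treat the result as an upper bound, not a prediction.

```python
def weights_size_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound: every token requires one full pass over the weights."""
    return bandwidth_gb_s / model_size_gb


# Hypothetical example: a 7B-parameter model at 4 bits per parameter
# on a GPU with 500 GB/s of measured bandwidth.
size = weights_size_gb(params_billion=7, bits_per_param=4)
ceiling = max_tokens_per_second(bandwidth_gb_s=500, model_size_gb=size)
print(f"{size:.1f} GB of weights, at most {ceiling:.0f} tokens/s")
```

For models that activate only part of their parameters per token, the size that matters is the active portion, as the answer above notes.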

Question 3

Why are browser results slower than Ollama or llama.cpp?

Accepted Answer

WebGPU adds overhead: stricter security checks, less optimized kernels than hand-tuned CUDA/Metal code, and FP32-only compute paths in many cases. Browser inference typically achieves 60-75% of native performance. The benchmark accounts for this — the "estimated native bandwidth" and predicted model speeds are adjusted to reflect what you would get from Ollama, llama.cpp, or LM Studio on the same hardware.
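To make the browser-to-native adjustment concrete, here is a minimal sketch that scales a browser measurement by the 60-75% range quoted above. The exact correction the benchmark applies is not described here, so the constant and the function name are assumptions for illustration only.

```python
# Assumption: browser results land at 60-75% of native, per the text above.
BROWSER_EFFICIENCY_RANGE = (0.60, 0.75)


def estimated_native_range(browser_value: float) -> tuple[float, float]:
    """Return (low, high) native estimates for a value measured in the browser."""
    low_eff, high_eff = BROWSER_EFFICIENCY_RANGE
    # Higher browser efficiency means less was lost, so the native estimate is lower.
    return browser_value / high_eff, browser_value / low_eff


# Hypothetical browser measurement of 300 GB/s.
low, high = estimated_native_range(300.0)
print(f"estimated native bandwidth: {low:.0f}-{high:.0f} GB/s")
```

The same scaling applies to tokens per second, since both numbers are limited by the same overhead.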

Question 4

Is the model download safe, and where does it go?

Accepted Answer

The real-model test downloads open-weight models (Qwen, Llama, SmolLM) directly from Hugging Face — the same files everyone uses. Your browser caches them in its storage system (IndexedDB/Cache API), so repeat tests skip the download. You can clear them anytime via your browser settings (clear site data). Nothing is installed on your system.

Question 5

Why does the benchmark need Chrome, Edge, or Safari?

Accepted Answer

The benchmark requires WebGPU, which is fully supported in Chrome 113+, Edge 113+, and Safari 18+. Firefox has WebGPU support in progress (available in nightly builds). If your browser does not support WebGPU, you can still use our LLM Inference Speed Calculator, which estimates speed from your GPU's published specs instead of measuring it.

Question 6

My measured bandwidth is much lower than my GPU's spec. Why?

Accepted Answer

There are several common causes: **1)** Browser overhead (expect 25-35% below spec; this is normal and accounted for). **2)** Other tabs or apps using the GPU: close them and re-run. **3)** Laptop power management: plug in and set performance mode. **4)** On laptops with two GPUs, the browser may be using the integrated GPU instead of the discrete one; in Chrome, open chrome://gpu to see which GPU is in use. **5)** Thermal throttling on thin laptops.

Question 7

What is a good score?

Accepted Answer

For memory bandwidth (the number that matters most): **under 100 GB/s** (integrated graphics) — only small models run well; **200-500 GB/s** (mainstream GPUs, Apple M-series) — 7-14B models run great; **500-1000 GB/s** (high-end consumer: RTX 4080/4090, M-series Max) — 30B models are comfortable; **1000+ GB/s** (RTX 5090, datacenter) — 70B models become practical. The predicted speeds table translates your exact number into real model performance.
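The tiers above can be expressed as a simple lookup. This is only a sketch with a hypothetical function name. Two choices here are assumptions: the 100-200 GB/s range is not assigned to any tier in the list above, so it is reported as such, and the shared boundaries (500 and 1000 GB/s) are placed in the higher tier.

```python
def bandwidth_tier(gb_s: float) -> str:
    """Map a measured bandwidth in GB/s to the tier descriptions above."""
    if gb_s < 100:
        return "integrated graphics: only small models run well"
    if gb_s < 200:
        # Assumption: the tiers above leave this range unassigned.
        return "between tiers: not covered by the ranges above"
    if gb_s < 500:
        return "mainstream GPU: 7-14B models run great"
    if gb_s < 1000:
        return "high-end consumer: 30B models are comfortable"
    return "datacenter class: 70B models become practical"


# Hypothetical measurements.
for value in (80, 350, 900, 1800):
    print(f"{value} GB/s -> {bandwidth_tier(value)}")
```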

LLM GPU Benchmark

Benchmark Your GPU for AI Workloads
