BrowserLLM - Run AI Models Directly In Your Browser
Run 100+ open-source AI models directly in your browser using WebGPU. No servers, no API keys, completely private and works offline.
Features
- 100+ open-source LLMs including Llama, Qwen, Phi, Gemma, Mistral, DeepSeek
- 100% private - your data never leaves your device
- Works completely offline after first model download
- WebGPU-powered near-native GPU performance
- Free forever - no API keys, no subscriptions, no per-token charges
- PWA support - install it like a native app on any device whose browser supports PWAs
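Every feature above depends on the browser exposing WebGPU, so a page like this would typically check for it before downloading a model. The sketch below shows one way to do that. The names `NavigatorLike`, `GpuLike`, `exposesWebGPU`, and `hasUsableAdapter` are illustrative assumptions, not BrowserLLM's actual code; only `navigator.gpu` and `requestAdapter()` come from the WebGPU API itself.

```typescript
// Minimal structural types so the sketch compiles without extra type packages.
// These interfaces are assumptions made for illustration.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

interface NavigatorLike {
  gpu?: GpuLike;
}

// Cheap synchronous check: does the browser expose the WebGPU entry point?
function exposesWebGPU(nav: NavigatorLike): boolean {
  return nav.gpu !== undefined && nav.gpu !== null;
}

// Stronger check: the entry point can exist while no usable adapter is
// available (requestAdapter() resolves to null in that case).
async function hasUsableAdapter(nav: NavigatorLike): Promise<boolean> {
  if (!exposesWebGPU(nav)) return false;
  try {
    const adapter = await nav.gpu!.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    return false;
  }
}
```

In a browser, the global `navigator` object would be passed in. Taking it as a parameter keeps the check testable outside a browser.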
How It Works
BrowserLLM uses WebGPU, the next-generation GPU API for the web, combined with 4-bit quantization and WebAssembly to run large language models at near-native speed directly in your browser tab. Models are cached locally after the first download.
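To make the 4-bit quantization step concrete, here is a minimal sketch of one common approach: symmetric quantization with a single scale per block, packing two 4-bit values into each byte. This is an assumption chosen for illustration. The source does not say which quantization scheme BrowserLLM uses, and the names `QuantizedBlock`, `quantize4bit`, and `dequantize4bit` are hypothetical.

```typescript
// Illustrative only: a simple symmetric 4-bit scheme, not necessarily
// the one BrowserLLM uses.
interface QuantizedBlock {
  scale: number;      // one float scale shared by the whole block
  packed: Uint8Array; // two 4-bit values per byte
  length: number;     // number of original weights
}

function quantize4bit(weights: Float32Array): QuantizedBlock {
  let maxAbs = 0;
  for (let i = 0; i < weights.length; i++) {
    maxAbs = Math.max(maxAbs, Math.abs(weights[i]));
  }
  // Map [-maxAbs, maxAbs] onto the integers -7..7; avoid dividing by zero.
  const scale = maxAbs === 0 ? 1 : maxAbs / 7;
  const packed = new Uint8Array(Math.ceil(weights.length / 2));
  for (let i = 0; i < weights.length; i++) {
    const q = Math.max(-7, Math.min(7, Math.round(weights[i] / scale)));
    const nibble = q + 8; // shift to an unsigned value in 1..15
    // Even indices go in the low nibble, odd indices in the high nibble.
    packed[i >> 1] |= i % 2 === 0 ? nibble : nibble << 4;
  }
  return { scale, packed, length: weights.length };
}

function dequantize4bit(block: QuantizedBlock): Float32Array {
  const out = new Float32Array(block.length);
  for (let i = 0; i < block.length; i++) {
    const byte = block.packed[i >> 1];
    const nibble = i % 2 === 0 ? byte & 0x0f : byte >> 4;
    out[i] = (nibble - 8) * block.scale;
  }
  return out;
}
```

Each restored weight differs from the original by at most about half a quantization step (`scale / 2`). That small loss of precision is the price paid for the smaller download and memory footprint.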
FAQ
How can an AI model run in a browser? Modern browsers support WebGPU, giving web apps direct GPU access. Combined with 4-bit quantization (up to 8x smaller than 32-bit weights, or 4x smaller than 16-bit weights) and WebAssembly, models with billions of parameters run in a browser tab.
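The compression factor quoted for 4-bit quantization depends on the precision of the original weights, and the arithmetic is easy to check. The helpers below are a hypothetical sketch (`modelSizeBytes` and `compressionRatio` are names invented here). They ignore per-block metadata such as quantization scales, which adds a little to the real size.

```typescript
// Approximate weight storage for a model, ignoring quantization metadata.
function modelSizeBytes(paramCount: number, bitsPerWeight: number): number {
  return (paramCount * bitsPerWeight) / 8;
}

// How many times smaller the weights become when precision is reduced.
function compressionRatio(fromBits: number, toBits: number): number {
  return fromBits / toBits;
}

// Example: a 7-billion-parameter model at three precisions.
const params = 7e9;
const sizeFp32 = modelSizeBytes(params, 32); // 28e9 bytes, about 28 GB
const sizeFp16 = modelSizeBytes(params, 16); // 14e9 bytes, about 14 GB
const sizeInt4 = modelSizeBytes(params, 4);  // 3.5e9 bytes, about 3.5 GB
```

The 8x figure therefore applies when the baseline is 32-bit floating point. Against 16-bit weights the reduction is 4x.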
Is my data private? Yes. All processing happens on your device. No server processes your prompts - your conversations never leave your browser.
Is it free? Yes, forever. No API keys, no subscriptions. The computation uses your own GPU.