WebGPU Video Generation: AI Video Scripts in Your Browser

WebGPU lets a browser tab access your device's GPU directly. Phantomline uses this to run Llama 3.2 1B for script generation, Web Speech API for narration, and ffmpeg.wasm for video rendering, all inside a browser with no install, no server, and no API key.

What is WebGPU?

WebGPU is a browser API (successor to WebGL) that gives web applications low-level access to your device's GPU. Where WebGL was designed for 3D graphics, WebGPU adds compute shader support, which makes it possible to run machine learning inference directly in a browser tab.

For video creators, this means one thing: you can generate AI video scripts without installing any software and without sending your prompts to a cloud server. Open a browser tab, wait for the model to load, type a topic, and get a script. The entire computation happens on your device.

How Phantomline uses WebGPU

Phantomline's browser mode (the PWA) uses the WebLLM library to run Llama 3.2 1B over WebGPU. The architecture works like this:

  1. When you open the studio at phantomline.xyz/app, the app checks for WebGPU support in your browser.
  2. On first use, the Llama 3.2 1B model weights download (~1 GB) and are cached in the browser's Cache API storage.
  3. Subsequent visits load the model from cache in 5-15 seconds. No re-download needed.
  4. When you generate a script, WebLLM runs inference on the model using your GPU via WebGPU compute shaders.
  5. The script output feeds into Web Speech API (for narration), Web Audio API (for music), and ffmpeg.wasm (for rendering).

The result is a complete video generation pipeline that runs in a browser tab with zero server-side inference. Phantomline's web server delivers the application code and static assets, but never participates in AI inference.
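The startup flow above can be sketched against WebLLM's published API. This is a minimal sketch, not Phantomline's actual code: the model ID is WebLLM's prebuilt Llama 3.2 1B build and may differ from the variant the app pins, and the prompt wording is illustrative.

```javascript
// Pure helper: build the generation prompt (wording is illustrative).
function buildPrompt(topic, words = 200) {
  return `Write a ${words}-word video script about ${topic}.`;
}

// Browser-only: load the engine over WebGPU and generate a script.
// The first call downloads the ~1 GB weights; later calls hit the cache.
async function generateScript(topic) {
  const { CreateMLCEngine } = await import("@mlc-ai/web-llm");
  const engine = await CreateMLCEngine("Llama-3.2-1B-Instruct-q4f16_1-MLC", {
    initProgressCallback: (p) => console.log(p.text), // download/compile progress
  });
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: buildPrompt(topic) }],
  });
  return reply.choices[0].message.content;
}
```

The dynamic import keeps the ~1 GB load path out of the critical rendering path until the user actually asks for a script.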

The ~1 GB first download

The Llama 3.2 1B model is about 1 GB in quantized form. This downloads once over your internet connection and gets stored in your browser's cache. After that initial download:

  • The model loads from local cache in 5-15 seconds on subsequent visits.
  • No internet is needed for script generation. You can work offline after the first load.
  • The cache persists across browser sessions. Clearing browser data will require a re-download.
  • Multiple devices each need their own download since browser caches are per-device.

One gigabyte is a significant download on slow connections. But it is a one-time cost that enables unlimited free script generation from that point forward. Compare that to paying per-token for every cloud API call.
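If you want to see how much of your quota the cached weights occupy, the standard StorageManager API reports usage and quota in bytes. A small sketch (the log format is just for illustration):

```javascript
// Pure helper: bytes -> decimal-gigabyte string.
function formatGB(bytes) {
  return (bytes / 1e9).toFixed(2) + " GB";
}

// Browser-only: navigator.storage.estimate() is the standard way to see
// how much quota cached assets (including the model weights) consume.
async function reportModelStorage() {
  const { usage, quota } = await navigator.storage.estimate();
  console.log(`Using ${formatGB(usage)} of ${formatGB(quota)} available`);
}
```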

Browser support

| Browser | WebGPU status | Notes |
| --- | --- | --- |
| Chrome 113+ (desktop) | Full support | Most reliable. Recommended for Phantomline. |
| Edge 113+ (desktop) | Full support | Same engine as Chrome. Works identically. |
| Chrome (Android) | Supported | Works on flagship phones from the last 2-3 years. Slower than desktop. |
| Safari (macOS Sonoma+) | Partial support | Some compute shader features are still landing. May have issues with larger models. |
| Safari (iOS 17+) | Partial support | Works for the 1B model. Performance varies by device. |
| Firefox | In development | Available behind a flag (dom.webgpu.enabled). Not recommended for production use yet. |

If your browser does not support WebGPU, Phantomline falls back to the free public LLM endpoint (text.pollinations.ai) for script generation. The rest of the pipeline (TTS, music, rendering) uses Web Speech API and ffmpeg.wasm, which work in all modern browsers.
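The fallback decision comes down to one feature check: WebGPU is exposed as `navigator.gpu`. A sketch of that logic, with the endpoint URL taken from the article and the return shape invented for illustration:

```javascript
// Decide where script generation runs. `nav` is the browser's navigator
// object (passed in so the logic stays testable outside a browser).
function pickScriptBackend(nav) {
  if (nav && "gpu" in nav) {
    return { kind: "local" }; // WebLLM inference over WebGPU
  }
  // No WebGPU: fall back to the free public LLM endpoint.
  return { kind: "cloud", url: "https://text.pollinations.ai" };
}
```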

Mobile support

WebGPU on mobile is genuinely usable for short-form content. Chrome on Android supports WebGPU on devices with Adreno or Mali GPUs from the last 2-3 years (Apple GPUs are covered by Safari on iOS, not Chrome). Performance expectations:

  • Script generation: 30-90 seconds for a 200-word Shorts script on a flagship phone. Slower than desktop but functional.
  • Narration: Web Speech API uses your device's built-in TTS voices. Quality varies by phone manufacturer and OS version.
  • Rendering: ffmpeg.wasm works on mobile but is slower. A 60-second Short renders in 1-3 minutes on a modern phone.

The mobile experience is best suited for YouTube Shorts and quick drafts rather than long-form video production. For 10-minute videos with complex scripts, the desktop install with Ollama is a better choice.

Quality tradeoffs: WebGPU vs. Ollama vs. Claude

The 1B model is the smallest language model in Phantomline's lineup. It trades quality for accessibility. Here is how the three tiers compare for the same prompt:

| Dimension | WebGPU (1B) | Ollama (8B) | Claude (frontier) |
| --- | --- | --- | --- |
| Install required | None | Ollama + model pull | None (API key only) |
| Script detail | Basic, shorter paragraphs | Detailed, well-structured | Most nuanced, best narrative |
| Max practical script length | 500-1,000 words | 2,000-10,000 words | 2,000-10,000+ words |
| Generation speed (2K words) | 2-5 minutes | 30-60 seconds | 5-15 seconds |
| Works on mobile | Yes | No (needs desktop) | Yes (needs internet) |
| Works offline | Yes (after first load) | Yes (after model pull) | No |
| Cost | $0 | $0 | $0.001-$0.02 per script |

WebGPU is the right choice when you need zero-install, zero-cost generation and are willing to accept shorter, simpler scripts. It excels for quick drafts and Shorts scripts where brevity is the goal anyway. For premium long-form content, Ollama or Claude will produce better results.
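The tradeoff in the table can be distilled into a toy decision helper. The field names and the 1,000-word threshold (from the "max practical script length" row) are this sketch's assumptions, not Phantomline's actual logic:

```javascript
// Toy tier chooser based on the comparison table above.
function pickTier({ canInstallOllama = false, hasClaudeKey = false, targetWords = 200 } = {}) {
  if (targetWords > 1000) {
    // Beyond the 1B model's practical ceiling: prefer a stronger backend.
    if (hasClaudeKey) return "claude";
    if (canInstallOllama) return "ollama";
  }
  return "webgpu"; // zero-install, zero-cost default
}
```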

The full in-browser pipeline

WebGPU handles script generation, but Phantomline's browser mode runs the entire video pipeline client-side:

  • Script: WebLLM + Llama 3.2 1B via WebGPU compute shaders.
  • Narration: Web Speech API using your OS-provided TTS voices. Quality depends on your operating system (macOS and iOS have the best built-in voices).
  • Music: 8 royalty-free ambient tracks bundled with the app, plus on-the-fly synthesis via Web Audio API.
  • Captions: Generated from the script text. Styled and positioned for the selected format (vertical, horizontal, square).
  • Rendering: ffmpeg.wasm assembles the final MP4 in the browser. Multi-threaded rendering is available when the browser supports SharedArrayBuffer.

No server touches your content at any point. The MP4 file exists only on your device until you choose to upload it to YouTube or export it.
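The narration step above uses only standard Web Speech API calls. A browser-only sketch, assuming you want the first voice matching a language prefix (real voice names and availability differ per OS):

```javascript
// Pure helper: pick the first available voice for a language prefix.
function pickVoice(voices, lang = "en") {
  return voices.find((v) => v.lang && v.lang.startsWith(lang)) || null;
}

// Browser-only: speak the script text with the OS-provided TTS engine.
function narrate(text) {
  const utterance = new SpeechSynthesisUtterance(text);
  const voice = pickVoice(speechSynthesis.getVoices());
  if (voice) utterance.voice = voice;
  speechSynthesis.speak(utterance);
}
```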

When to use WebGPU mode

  • You are on a shared or locked-down machine where you cannot install Ollama or desktop software.
  • You want to try Phantomline without installing anything. Open the studio in Chrome, generate a test video, decide if the tool is right for you.
  • You are on mobile and want to draft scripts or produce Shorts on the go.
  • You do not have an API key and do not want to set up Ollama. WebGPU is the zero-configuration, zero-cost default.
  • You are producing short-form content where the 1B model's quality ceiling is acceptable.

Limitations

  • Model quality. The 1B model is significantly less capable than the 8B model Ollama runs or the frontier models Claude provides. For long-form scripts with complex narratives, the output will be noticeably simpler.
  • Browser compatibility. WebGPU is not universally supported yet. Firefox users and some Safari configurations will fall back to the cloud endpoint instead of local inference.
  • TTS voice quality. Web Speech API voices vary wildly by operating system. macOS and iOS have good voices. Windows and Android voices tend to sound more robotic. The desktop install's Kokoro TTS produces consistently better narration.
  • No MusicGen. The browser mode uses bundled music tracks instead of generating custom music via MusicGen. The desktop install provides AI-generated music; the browser mode provides a curated selection of royalty-free ambient tracks.
  • Memory pressure. Running a 1 GB model in a browser tab alongside other tabs can cause memory pressure on devices with 8 GB RAM or less. Close other tabs during generation for best results.
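For the memory-pressure caveat, Chrome exposes a coarse RAM hint via `navigator.deviceMemory`. A sketch of a pre-load warning check; note the API caps its report at 8 GB, so the threshold here is a judgment call, not a Phantomline behavior:

```javascript
// Chrome-only hint: navigator.deviceMemory reports a coarse RAM bucket in GB.
// It caps at 8, so a report of 8 is ambiguous (could be 8 GB or 32 GB);
// only warn below the cap, and treat a missing value as unknown, not low.
function shouldWarnLowMemory(deviceMemoryGB) {
  return typeof deviceMemoryGB === "number" && deviceMemoryGB < 8;
}
```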

FAQ

What is WebGPU video generation?

WebGPU is a browser API that gives web apps access to your device's GPU. Phantomline uses it to run Llama 3.2 1B directly in a browser tab for script generation, with no server involved, no install needed, and no API key required.

Does WebGPU work on mobile?

Yes. Chrome on Android supports WebGPU on flagship phones from the last 2-3 years. Safari on iOS 17+ has partial support. Mobile inference is slower than desktop, but it works for shorter scripts and Shorts content.

How big is the WebGPU model download?

About 1 GB for Llama 3.2 1B. The model downloads once and is cached in your browser's storage. After the first download, it loads from cache and works offline.

Is WebGPU as good as Ollama or Claude for scripts?

No. Llama 3.2 1B is a smaller model. Scripts are shorter, less detailed, and less nuanced. WebGPU is best for quick drafts, Shorts, and situations where you cannot install Ollama or do not have a cloud API key.

Which browsers support WebGPU?

Chrome 113+, Edge 113+, and Chrome on Android support WebGPU. Safari has partial support starting with iOS 17 and macOS Sonoma. Firefox is working on support but has not shipped it in stable releases yet.

Try it

The free tier requires no card. Open the studio at phantomline.xyz/app, or see pricing.

