Video editing in the browser has historically been an extreme case: too much data, too many operations per second, too slow on the CPU. With WebGPU, the picture changes. Processing 30 frames per second at 1080p means moving and transforming roughly 62 million pixels per second (about 250 MB/s of RGBA data). The GPU is designed exactly for that.
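That figure is simple arithmetic; a quick sanity check in JavaScript:

```javascript
// Back-of-the-envelope throughput for 1080p at 30 fps.
const width = 1920, height = 1080, fps = 30;
const pixelsPerSecond = width * height * fps;  // 62,208,000 pixels/s
const bytesPerSecond = pixelsPerSecond * 4;    // RGBA, 4 bytes per pixel
console.log(pixelsPerSecond, (bytesPerSecond / 1e6).toFixed(1) + " MB/s");
```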
In this last article of the series we implement a real-time video effects pipeline using WebGPU. We’ll cover the required pieces, the benchmarks, and, most importantly, the real limitations to keep in mind before taking this to production.
Stack: WebGPU + WebCodecs
To process video with WebGPU we need two APIs:
- WebGPU: frame processing on the GPU
- WebCodecs: efficient access to decoded video frames, without going through <canvas>
WebCodecs is the piece that was missing before. Previously, accessing pixel data from a video frame meant drawing it onto a canvas, reading the pixels with getImageData(), processing them on the CPU, and drawing the result back: slow, and every step duplicates the pixel data.
With WebCodecs we have direct access to decoded frames as VideoFrame, which we can send directly to the GPU.
// Required support check
if (!navigator.gpu || !window.VideoDecoder) {
console.warn("WebGPU or WebCodecs not available");
}
Pipeline architecture
Video file
↓
VideoDecoder (WebCodecs)
↓
VideoFrame (frame data)
↓
GPU Texture (copy frame to GPU)
↓
Compute/Render Shader (apply effects)
↓
Canvas (display result)
The key is that the frame travels from video memory to the GPU without going through the CPU for processing. The CPU only orchestrates.
Implementation: real-time grayscale effect
We start with a simple effect to understand the full pipeline.
WGSL shader
@group(0) @binding(0) var videoTexture: texture_external;
@group(0) @binding(1) var outputTexture: texture_storage_2d<rgba8unorm, write>;
@group(0) @binding(2) var texSampler: sampler;
@compute @workgroup_size(16, 16)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
let size = textureDimensions(outputTexture);
if (id.x >= size.x || id.y >= size.y) { return; }
// Offset by 0.5 to sample at texel centers
let uv = (vec2<f32>(id.xy) + 0.5) / vec2<f32>(size);
// Sample the external texture (VideoFrame)
let color = textureSampleBaseClampToEdge(videoTexture, texSampler, uv);
// Perceptual luminance (ITU-R BT.709 weights)
let gray = dot(color.rgb, vec3<f32>(0.2126, 0.7152, 0.0722));
textureStore(outputTexture, vec2<i32>(id.xy), vec4<f32>(gray, gray, gray, 1.0));
}
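The @workgroup_size(16, 16) above means each workgroup covers a 16 by 16 pixel tile, so the dispatch count must round up and the shader's bounds check discards the overshoot. The arithmetic, as a small helper (the function name is mine):

```javascript
// Workgroups needed to cover a dimension with 16-wide tiles.
// Partial tiles round up; the shader's bounds check skips the extra invocations.
function workgroupCount(size, tile = 16) {
  return Math.ceil(size / tile);
}
// 1080p needs 120 x 68 workgroups (1080 / 16 = 67.5, rounded up to 68).
console.log(workgroupCount(1920), workgroupCount(1080));
```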
Processing pipeline
class VideoEffectPipeline {
constructor(device, canvas) {
this.device = device;
this.canvas = canvas;
this.context = canvas.getContext("webgpu");
const format = navigator.gpu.getPreferredCanvasFormat();
this.context.configure({ device, format });
}
async init(shaderCode) {
const { device } = this;
this.outputTexture = device.createTexture({
size: [this.canvas.width, this.canvas.height],
format: "rgba8unorm",
usage: GPUTextureUsage.STORAGE_BINDING | GPUTextureUsage.TEXTURE_BINDING,
});
this.sampler = device.createSampler({
minFilter: "linear",
magFilter: "linear",
});
const shaderModule = device.createShaderModule({ code: shaderCode });
this.pipeline = device.createComputePipeline({
layout: "auto",
compute: { module: shaderModule, entryPoint: "main" },
});
}
processFrame(videoFrame) {
const { device } = this;
// Import VideoFrame as GPU texture (zero-copy when possible)
const videoTexture = device.importExternalTexture({ source: videoFrame });
const bindGroup = device.createBindGroup({
layout: this.pipeline.getBindGroupLayout(0),
entries: [
{ binding: 0, resource: videoTexture },
{ binding: 1, resource: this.outputTexture.createView() },
{ binding: 2, resource: this.sampler },
],
});
const encoder = device.createCommandEncoder();
const pass = encoder.beginComputePass();
pass.setPipeline(this.pipeline);
pass.setBindGroup(0, bindGroup);
pass.dispatchWorkgroups(
Math.ceil(this.canvas.width / 16),
Math.ceil(this.canvas.height / 16)
);
pass.end();
// Render pass to display on canvas
const renderPass = encoder.beginRenderPass({
colorAttachments: [
{
view: this.context.getCurrentTexture().createView(),
loadOp: "clear",
storeOp: "store",
},
],
});
// ... blit outputTexture to canvas
renderPass.end();
device.queue.submit([encoder.finish()]);
videoFrame.close(); // Free frame memory
}
}
Playback loop
async function startVideoProcessing(videoFile, canvasElement) {
const adapter = await navigator.gpu.requestAdapter();
if (!adapter) throw new Error("WebGPU adapter not available");
const device = await adapter.requestDevice();
const pipeline = new VideoEffectPipeline(device, canvasElement);
await pipeline.init(GRAYSCALE_SHADER);
const decoder = new VideoDecoder({
output: frame => {
pipeline.processFrame(frame);
},
error: e => console.error("Decode error:", e),
});
decoder.configure({
codec: "avc1.42001f", // H.264 baseline
hardwareAcceleration: "prefer-hardware",
});
// VideoDecoder consumes EncodedVideoChunk objects, not raw file bytes:
// a demuxer (e.g. mp4box.js for MP4) must extract the encoded samples first
const reader = videoFile.stream().getReader();
// ... demux and send chunks to decoder.decode()
}
Benchmarks: CPU vs WebGPU for video effects
Applying grayscale effect to 1080p/30fps video:
| Method | Time per frame | Sustainable FPS | CPU usage |
|---|---|---|---|
| Canvas 2D (getImageData) | ~45ms | ~8 fps | 85% |
| WebGL (fragment shader) | ~4ms | ~60 fps | 15% |
| WebGPU (compute shader) | ~2ms | ~60 fps | 8% |
The benefit of WebGPU over WebGL is not just speed: the CPU is freed up for other tasks (audio, UI, application logic).
For more complex effects (color grading with LUT, particle effects, multi-layer compositing), the gap between WebGL and WebGPU widens further.
Effects we can apply with this pattern
The pattern is always the same: a shader that receives the frame and writes the result. Some effects that can be implemented:
Color grading
// Color curves with a 3D LUT (Look-Up Table) texture.
// textureSampleLevel (explicit LOD) is required here: textureSample
// needs implicit derivatives and is only allowed in fragment shaders.
let lut = textureSampleLevel(lutTexture, lutSampler, color.rgb, 0.0);
textureStore(output, coords, vec4<f32>(lut.rgb, color.a));
Chroma key (green screen)
let green = vec3<f32>(0.0, 1.0, 0.0);
let diff = distance(color.rgb, green);
let alpha = select(0.0, 1.0, diff > 0.3);
textureStore(output, coords, vec4<f32>(color.rgb, alpha));
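The hard select() cut produces jagged edges around the subject. Replacing it with a smoothstep over a narrow band gives a soft matte; the 0.25 to 0.4 band below is an illustrative choice, not a fixed constant:

```wgsl
let green = vec3<f32>(0.0, 1.0, 0.0);
let diff = distance(color.rgb, green);
// Soft matte: fully transparent below 0.25, fully opaque above 0.4
let alpha = smoothstep(0.25, 0.4, diff);
textureStore(output, coords, vec4<f32>(color.rgb, alpha));
```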
Temporal motion blur
Average of N previous frames, which requires maintaining a frame buffer on the GPU.
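A full N-frame average is expensive; a common approximation is an exponential moving average over a single persistent history texture. A sketch, assuming a historyTexture binding that gets a copy of outputTexture after each frame (the binding name and blend factor are mine):

```wgsl
// Exponential moving average: blend the current frame into the history.
// Smaller k means a longer motion trail.
let current = textureSampleBaseClampToEdge(videoTexture, texSampler, uv);
let history = textureLoad(historyTexture, vec2<i32>(id.xy), 0);
let k = 0.25;
textureStore(outputTexture, vec2<i32>(id.xy), mix(history, current, k));
```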
Real limitations
WebCodecs and cross-origin
VideoFrame objects imported from cross-origin videos have restrictions. In production, videos must be on the same origin or with CORS configured.
Supported codecs
H.264 and VP8/VP9 have broad support. AV1 varies by device and operating system. We can check support with VideoDecoder.isConfigSupported() before configuring.
const support = await VideoDecoder.isConfigSupported({
codec: "av01.0.04M.08",
hardwareAcceleration: "prefer-hardware",
});
if (!support.supported) {
// Fallback to H.264
}
Audio synchronization
WebGPU processes video with no knowledge of the audio. Audio-video synchronization is our responsibility. We need to measure the processing time of each frame and adjust the presentation timing accordingly.
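A minimal sketch of that decision logic. VideoFrame.timestamp is in microseconds, while the audio clock (for example AudioContext.currentTime) is in seconds; the function name and the 10 ms tolerance are illustrative choices:

```javascript
// Decide what to do with a decoded frame given the current audio clock.
// frameTimestampUs: VideoFrame.timestamp (microseconds)
// audioClockS: audio playback position (seconds)
function syncAction(frameTimestampUs, audioClockS, toleranceMs = 10) {
  const driftMs = frameTimestampUs / 1000 - audioClockS * 1000;
  if (driftMs < -toleranceMs) return "drop"; // frame is late: skip it
  if (driftMs > toleranceMs) return "wait";  // frame is early: hold it
  return "present";                          // within tolerance: show it
}
console.log(syncAction(1_000_000, 1.0)); // frame at 1s, audio at 1s
```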
Browser support
At the time of writing, WebCodecs and WebGPU are available in Chrome and Edge. Safari has partial support. Firefox has neither in stable releases.
GPU memory
Long videos or multiple simultaneous streams can exhaust GPU memory. Release VideoFrame objects with frame.close() as soon as possible.
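The numbers add up quickly. A rough estimate of the cost of buffering uncompressed RGBA frames (the helper name is mine):

```javascript
// Approximate GPU memory consumed by buffered RGBA frames.
function bufferBytes(width, height, frames, bytesPerPixel = 4) {
  return width * height * bytesPerPixel * frames;
}
// Two seconds of 1080p at 30 fps is roughly half a gigabyte.
const mb = bufferBytes(1920, 1080, 60) / 1e6;
console.log(mb.toFixed(0) + " MB");
```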
When does it make sense in production?
The WebGPU + WebCodecs combination is mature for:
- Web video editors (Clipchamp, CapCut Web) in Chromium browsers
- Video conferencing tools with background effects (blur, virtual background)
- Video processing before upload (compression, watermark, cropping)
- Video viewers with real-time color effects
It’s not ready for:
- Applications that need Firefox or Safari support
- Editing videos above 4K without explicit memory management
Conclusion
Video processing in the browser with WebGPU + WebCodecs is real and it works. With 2ms per frame at 1080p, there’s room for complex effects within the 16ms budget of a 60fps frame.
The ecosystem is still maturing: limited support in Firefox and Safari, APIs still evolving, sparse documentation. But the foundations are solid and the use cases are clear.
This closes the series. We’ve covered WebGPU from the fundamentals to three real use cases: image processing, ML in the browser, and video editing. In all of them, the pattern is the same: move the heavy work to the GPU and let the CPU orchestrate.