Video editing in the browser has historically been an extreme case: too much data, too many operations per second, too slow on the CPU. With WebGPU, the picture changes. Processing 30 frames per second at 1080p means moving and transforming roughly 62 million pixels per second (about 250 MB/s of RGBA data). The GPU is designed exactly for that.
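That figure is simple arithmetic; a quick sanity check in JavaScript:

```javascript
// Back-of-the-envelope throughput for 1080p at 30 fps.
const width = 1920, height = 1080, fps = 30;
const pixelsPerSecond = width * height * fps;  // 62,208,000 pixels/s
const bytesPerSecond = pixelsPerSecond * 4;    // RGBA, 4 bytes per pixel
console.log(pixelsPerSecond, (bytesPerSecond / 1e6).toFixed(1) + " MB/s");
```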
In this last article of the series we implement a real-time video effects pipeline using WebGPU. We’ll cover the required pieces, the benchmarks, and, most importantly, the real limitations to keep in mind before taking this to production.
Stack: WebGPU + WebCodecs
To process video with WebGPU we need two APIs:
- WebGPU: frame processing on the GPU
- WebCodecs: efficient access to decoded video frames, without going through <canvas>
WebCodecs is the piece that was missing before. Previously, accessing pixel data from a video frame meant drawing it onto a canvas, reading the pixels with getImageData(), processing them on the CPU, and drawing the result back: slow, and every step duplicates the pixel data.
With WebCodecs we have direct access to decoded frames as VideoFrame, which we can send directly to the GPU.
// Required support check
if (!navigator.gpu || !window.VideoDecoder) {
console.warn("WebGPU or WebCodecs not available");
}
Pipeline architecture
Video file
↓
VideoDecoder (WebCodecs)
↓
VideoFrame (frame data)
↓
GPU Texture (copy frame to GPU)
↓
Compute/Render Shader (apply effects)
↓
Canvas (display result)
The key is that the frame travels from video memory to the GPU without going through the CPU for processing. The CPU only orchestrates.
Implementation: real-time grayscale effect
We start with a simple effect to understand the full pipeline.
WGSL shader
@group(0) @binding(0) var videoTexture: texture_external;
@group(0) @binding(1) var outputTexture: texture_storage_2d<rgba8unorm, write>;
@group(0) @binding(2) var texSampler: sampler;
@compute @workgroup_size(16, 16)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
let size = textureDimensions(outputTexture);
if (id.x >= size.x || id.y >= size.y) { return; }
// Offset by 0.5 to sample at texel centers
let uv = (vec2<f32>(id.xy) + 0.5) / vec2<f32>(size);
// Sample the external texture (VideoFrame)
let color = textureSampleBaseClampToEdge(videoTexture, texSampler, uv);
// Perceptual luminance (ITU-R BT.709 weights)
let gray = dot(color.rgb, vec3<f32>(0.2126, 0.7152, 0.0722));
textureStore(outputTexture, vec2<i32>(id.xy), vec4<f32>(gray, gray, gray, 1.0));
}
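The @workgroup_size(16, 16) above means each workgroup covers a 16 by 16 pixel tile, so the dispatch count must round up and the shader's bounds check discards the overshoot. The arithmetic, as a small helper (the function name is mine):

```javascript
// Workgroups needed to cover a dimension with 16-wide tiles.
// Partial tiles round up; the shader's bounds check skips the extra invocations.
function workgroupCount(size, tile = 16) {
  return Math.ceil(size / tile);
}
// 1080p needs 120 x 68 workgroups (1080 / 16 = 67.5, rounded up to 68).
console.log(workgroupCount(1920), workgroupCount(1080));
```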
Processing pipeline
class VideoEffectPipeline {
constructor(device, canvas) {
this.device = device;
this.canvas = canvas;
this.context = canvas.getContext("webgpu");
const format = navigator.gpu.getPreferredCanvasFormat();
this.context.configure({ device, format });
}
async init(shaderCode) {
const { device } = this;
this.outputTexture = device.createTexture({
size: [this.canvas.width, this.canvas.height],
format: "rgba8unorm",
usage: GPUTextureUsage.STORAGE_BINDING | GPUTextureUsage.TEXTURE_BINDING,
});
this.sampler = device.createSampler({
minFilter: "linear",
magFilter: "linear",
});
const shaderModule = device.createShaderModule({ code: shaderCode });
this.pipeline = device.createComputePipeline({
layout: "auto",
compute: { module: shaderModule, entryPoint: "main" },
});
}
processFrame(videoFrame) {
const { device } = this;
// Import VideoFrame as GPU texture (zero-copy when possible)
const videoTexture = device.importExternalTexture({ source: videoFrame });
const bindGroup = device.createBindGroup({
layout: this.pipeline.getBindGroupLayout(0),
entries: [
{ binding: 0, resource: videoTexture },
{ binding: 1, resource: this.outputTexture.createView() },
{ binding: 2, resource: this.sampler },
],
});
const encoder = device.createCommandEncoder();
const pass = encoder.beginComputePass();
pass.setPipeline(this.pipeline);
pass.setBindGroup(0, bindGroup);
pass.dispatchWorkgroups(
Math.ceil(this.canvas.width / 16),
Math.ceil(this.canvas.height / 16)
);
pass.end();
// Render pass to display on canvas
const renderPass = encoder.beginRenderPass({
colorAttachments: [
{
view: this.context.getCurrentTexture().createView(),
loadOp: "clear",
storeOp: "store",
},
],
});
// ... blit outputTexture to canvas
renderPass.end();
device.queue.submit([encoder.finish()]);
videoFrame.close(); // Free frame memory
}
}
Playback loop
async function startVideoProcessing(videoFile, canvasElement) {
const adapter = await navigator.gpu.requestAdapter();
if (!adapter) throw new Error("WebGPU adapter not available");
const device = await adapter.requestDevice();
const pipeline = new VideoEffectPipeline(device, canvasElement);
await pipeline.init(GRAYSCALE_SHADER);
const decoder = new VideoDecoder({
output: frame => {
pipeline.processFrame(frame);
},
error: e => console.error("Decode error:", e),
});
decoder.configure({
codec: "avc1.42001f", // H.264 baseline
hardwareAcceleration: "prefer-hardware",
});
// VideoDecoder consumes EncodedVideoChunk objects, not raw file bytes:
// a demuxer (e.g. mp4box.js for MP4) must extract the encoded samples first
const reader = videoFile.stream().getReader();
// ... demux and send chunks to decoder.decode()
}
Benchmarks: CPU vs WebGPU for video effects
Applying grayscale effect to 1080p/30fps video:
| Method | Time per frame | Sustainable FPS | CPU usage |
|---|---|---|---|
| Canvas 2D (getImageData) | ~45ms | ~8 fps | 85% |
| WebGL (fragment shader) | ~4ms | ~60 fps | 15% |
| WebGPU (compute shader) | ~2ms | ~60 fps | 8% |
The benefit of WebGPU over WebGL is not just speed: the CPU is freed up for other tasks (audio, UI, application logic).
For more complex effects (color grading with LUT, particle effects, multi-layer compositing), the gap between WebGL and WebGPU widens further.
Effects we can apply with this pattern
The pattern is always the same: a shader that receives the frame and writes the result. Some effects that can be implemented:
Color grading
// Color curves with a 3D LUT (Look-Up Table) texture.
// textureSampleLevel (explicit LOD) is required here: textureSample
// needs implicit derivatives and is only allowed in fragment shaders.
let lut = textureSampleLevel(lutTexture, lutSampler, color.rgb, 0.0);
textureStore(output, coords, vec4<f32>(lut.rgb, color.a));
Chroma key (green screen)
let green = vec3<f32>(0.0, 1.0, 0.0);
let diff = distance(color.rgb, green);
let alpha = select(0.0, 1.0, diff > 0.3);
textureStore(output, coords, vec4<f32>(color.rgb, alpha));
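The hard select() cut produces jagged edges around the subject. Replacing it with a smoothstep over a narrow band gives a soft matte; the 0.25 to 0.4 band below is an illustrative choice, not a fixed constant:

```wgsl
let green = vec3<f32>(0.0, 1.0, 0.0);
let diff = distance(color.rgb, green);
// Soft matte: fully transparent below 0.25, fully opaque above 0.4
let alpha = smoothstep(0.25, 0.4, diff);
textureStore(output, coords, vec4<f32>(color.rgb, alpha));
```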
Temporal motion blur
Average of N previous frames, which requires maintaining a frame buffer on the GPU.
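A full N-frame average is expensive; a common approximation is an exponential moving average over a single persistent history texture. A sketch, assuming a historyTexture binding that gets a copy of outputTexture after each frame (the binding name and blend factor are mine):

```wgsl
// Exponential moving average: blend the current frame into the history.
// Smaller k means a longer motion trail.
let current = textureSampleBaseClampToEdge(videoTexture, texSampler, uv);
let history = textureLoad(historyTexture, vec2<i32>(id.xy), 0);
let k = 0.25;
textureStore(outputTexture, vec2<i32>(id.xy), mix(history, current, k));
```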
Real limitations
WebCodecs and cross-origin
VideoFrame objects imported from cross-origin videos have restrictions. In production, videos must be on the same origin or with CORS configured.
Supported codecs
H.264 and VP8/VP9 have broad support. AV1 varies by device and operating system. We can check support with VideoDecoder.isConfigSupported() before configuring.
const support = await VideoDecoder.isConfigSupported({
codec: "av01.0.04M.08",
hardwareAcceleration: "prefer-hardware",
});
if (!support.supported) {
// Fallback to H.264
}
Audio synchronization
WebGPU processes video with no knowledge of the audio. Audio-video synchronization is our responsibility. We need to measure the processing time of each frame and adjust the presentation timing accordingly.
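A minimal sketch of that decision logic. VideoFrame.timestamp is in microseconds, while the audio clock (for example AudioContext.currentTime) is in seconds; the function name and the 10 ms tolerance are illustrative choices:

```javascript
// Decide what to do with a decoded frame given the current audio clock.
// frameTimestampUs: VideoFrame.timestamp (microseconds)
// audioClockS: audio playback position (seconds)
function syncAction(frameTimestampUs, audioClockS, toleranceMs = 10) {
  const driftMs = frameTimestampUs / 1000 - audioClockS * 1000;
  if (driftMs < -toleranceMs) return "drop"; // frame is late: skip it
  if (driftMs > toleranceMs) return "wait";  // frame is early: hold it
  return "present";                          // within tolerance: show it
}
console.log(syncAction(1_000_000, 1.0)); // frame at 1s, audio at 1s
```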
Browser support
At the time of writing, WebCodecs and WebGPU are available in Chrome and Edge. Safari has partial support. Firefox has neither in stable releases.
GPU memory
Long videos or multiple simultaneous streams can exhaust GPU memory. Release VideoFrame objects with frame.close() as soon as possible.
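The numbers add up quickly. A rough estimate of the cost of buffering uncompressed RGBA frames (the helper name is mine):

```javascript
// Approximate GPU memory consumed by buffered RGBA frames.
function bufferBytes(width, height, frames, bytesPerPixel = 4) {
  return width * height * bytesPerPixel * frames;
}
// Two seconds of 1080p at 30 fps is roughly half a gigabyte.
const mb = bufferBytes(1920, 1080, 60) / 1e6;
console.log(mb.toFixed(0) + " MB");
```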
When does it make sense in production?
The WebGPU + WebCodecs combination is mature for:
- Web video editors (Clipchamp, CapCut Web) in Chromium browsers
- Video conferencing tools with background effects (blur, virtual background)
- Video processing before upload (compression, watermark, cropping)
- Video viewers with real-time color effects
It’s not ready for:
- Applications that need Firefox or Safari support
- Editing videos above 4K without explicit memory management
Conclusion
Video processing in the browser with WebGPU + WebCodecs is real and it works. With 2ms per frame at 1080p, there’s room for complex effects within the 16ms budget of a 60fps frame.
The ecosystem is still maturing: limited support in Firefox and Safari, APIs still evolving, sparse documentation. But the foundations are solid and the use cases are clear.
This closes the series. We’ve covered WebGPU from the fundamentals to three real use cases: image processing, ML in the browser, and video editing. In all of them, the pattern is the same: move the heavy work to the GPU and let the CPU orchestrate.