In the previous article about WebPerf Snippets + WebMCP I mentioned an alternative that works today, without waiting for browser support: Agent SKILLs.
The basic idea is straightforward: turn performance analysis snippets into capabilities an AI agent can use autonomously, using Chrome DevTools MCP as the execution layer in the browser. But there’s a design decision in the implementation worth mentioning, because it changes what the agent can actually do.
## The limit of text-only SKILLs
When building a SKILL for an AI agent, the most direct approach is to put everything in a markdown file: instructions, thresholds, examples… and the JavaScript code inline. The agent reads that file when activating the SKILL and has all the context.
The problem appears at scale. With 47 snippets, inlining all the JavaScript in the markdown means thousands of tokens consumed on every invocation, even if the agent only needs to measure LCP. All the code travels into context even when unused.
But there’s a subtler problem: if the JavaScript is just text in a markdown file, nothing guarantees the agent will execute it exactly as written. LLMs can “optimize”, reinterpret, or adapt code rather than copy it literally. For performance measurement, that’s exactly what we don’t want.
## Scripts in the SKILL scope
The approach in WebPerf Snippets is different: the JavaScript is not in the `SKILL.md`. It lives in `.js` files in the same directory as the SKILL. When the agent needs to measure LCP, it doesn't generate code: it reads `scripts/LCP.js` and executes it via Chrome DevTools MCP.
```
skills/webperf-core-web-vitals/
├── SKILL.md          # Instructions, thresholds, and script index
└── scripts/
    ├── LCP.js
    ├── CLS.js
    ├── INP.js
    └── ...
```
The `SKILL.md` acts as an index and set of instructions. The scripts are the actual tools. The agent reads a script when it needs it, not before.
## Two concrete objectives
### Deterministic tools
When the agent receives the instruction to measure LCP, it doesn't improvise JavaScript. It reads `scripts/LCP.js` (the same script we've created, tested, and validated) and executes it. The result is consistent across sessions, across agents, and across models.
This matters especially in performance diagnostics. A measurement that varies depending on how the LLM interprets the code that day isn’t a reliable measurement. A fixed script, executed directly, is.
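For illustration, here is a minimal sketch of what such a fixed LCP script can look like. This is my sketch, not the actual `scripts/LCP.js` from the repository; the thresholds are the standard Core Web Vitals ones (2.5 s / 4 s).

```javascript
// Hypothetical sketch of an LCP measurement script in this style.
// NOT the actual scripts/LCP.js from the repository.

// Standard Core Web Vitals thresholds for LCP, in milliseconds.
const LCP_THRESHOLDS = { good: 2500, poor: 4000 };

// Pure classification logic, testable outside the browser.
function classifyLCP(ms) {
  if (ms <= LCP_THRESHOLDS.good) return "good";
  if (ms <= LCP_THRESHOLDS.poor) return "needs improvement";
  return "poor";
}

// Browser-only part: observe LCP candidates and log the latest one.
if (typeof window !== "undefined" && "PerformanceObserver" in window) {
  const observer = new PerformanceObserver((list) => {
    const entries = list.getEntries();
    const last = entries[entries.length - 1];
    console.log("LCP candidate:", {
      time: `${last.startTime.toFixed(0)} ms`,
      rating: classifyLCP(last.startTime),
      element: last.element ? last.element.tagName : "(unknown)",
    });
  });
  observer.observe({ type: "largest-contentful-paint", buffered: true });
}
```

The point is not this particular code, but that it never changes between runs: the agent pastes it into `evaluate_script` verbatim.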
### Token savings
The `SKILL.md` contains instructions and metadata. The scripts are separate files the agent only loads when needed. If the agent is analyzing CLS and not LCP, `scripts/LCP.js` never enters the context.
With 47 snippets distributed across 6 SKILLs, the difference between “everything inline” and “scripts as files” is significant. The agent transfers tokens only for what it uses.
## The SKILL structure
The organization follows the same categories as the original snippets:
| SKILL | Snippets | What it measures |
|---|---|---|
| `webperf-core-web-vitals` | 7 | LCP, CLS, INP |
| `webperf-loading` | 28 | TTFB, FCP, scripts, fonts |
| `webperf-interaction` | 8 | LoAF, LongTask, scroll |
| `webperf-media` | 3 | Images, video |
| `webperf-resources` | 1 | Bandwidth |
| `webperf` | — | Meta-SKILL: central router |
Each SKILL has its own SKILL.md with:
- Index of available scripts
- Reference thresholds (what values are good, bad, or need investigation)
- Result interpretation instructions
This makes maintaining the SKILLs straightforward, both for updating thresholds and for updating scripts or adding new ones.
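As a hypothetical sketch (not the actual file from the repository), a `SKILL.md` in this layout might look like the following; the thresholds shown are the standard Core Web Vitals ones:

```markdown
---
name: webperf-core-web-vitals
description: Measure LCP, CLS, and INP with the predefined scripts in scripts/
---

## Scripts
- scripts/LCP.js: Largest Contentful Paint
- scripts/CLS.js: Cumulative Layout Shift
- scripts/INP.js: Interaction to Next Paint

## Thresholds
| Metric | Good | Needs improvement | Poor |
|---|---|---|---|
| LCP | ≤ 2.5 s | 2.5–4 s | > 4 s |
| CLS | ≤ 0.1 | 0.1–0.25 | > 0.25 |
| INP | ≤ 200 ms | 200–500 ms | > 500 ms |

## Interpretation
If a metric lands in "Poor", activate the corresponding workflow before reporting.
```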
We could improve the scripts by polishing the reports they emit via `console.log`, `console.info`, `console.error`, and `console.table`, to make the console output easier to work with. But that work is better delegated to the LLM, which can adapt the output to whatever is needed: a markdown report, a PDF, a Slack alert, etc.
## Decision trees and workflows
The most interesting part isn’t the individual scripts, but the workflows that orchestrate multiple scripts based on the results obtained.
A real example: if TTFB exceeds 600ms, the agent doesn't stop there. The workflow tells it to run `TTFB-Sub-Parts.js` to break the time down into DNS, TCP connection, TLS negotiation, and server time. With that breakdown, the recommendation is specific rather than generic.
```
TTFB > 600ms
  → run TTFB-Sub-Parts.js
    → if DNS > 100ms        → DNS/CDN issue
    → if connection > 200ms → network/server distance issue
    → if server > 300ms     → backend issue
```
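The arithmetic behind such a breakdown comes straight from the Navigation Timing API. A minimal sketch (mine, not the repository's `TTFB-Sub-Parts.js`):

```javascript
// Hypothetical sketch of a TTFB breakdown in the spirit of TTFB-Sub-Parts.js.
// NOT the actual script from the repository.

// Pure function over a PerformanceNavigationTiming entry (or any plain
// object with the same fields), so the arithmetic is testable anywhere.
function ttfbSubParts(nav) {
  return {
    dns: nav.domainLookupEnd - nav.domainLookupStart,
    connection: nav.connectEnd - nav.connectStart,
    // secureConnectionStart is 0 for plain-HTTP navigations.
    tls: nav.secureConnectionStart > 0
      ? nav.connectEnd - nav.secureConnectionStart
      : 0,
    server: nav.responseStart - nav.requestStart,
    ttfb: nav.responseStart - nav.startTime,
  };
}

// Browser-only part: read the real navigation entry and print a table.
if (typeof window !== "undefined" && window.performance) {
  const [nav] = performance.getEntriesByType("navigation");
  if (nav) console.table(ttfbSubParts(nav));
}
```

Note that in Navigation Timing the TLS negotiation is a sub-span of the connection phase (from `secureConnectionStart` to `connectEnd`), which is why the parts above don't simply sum to the TTFB.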
The SKILLs include 8 main workflows with 16 decision trees of this type. The agent knows what to run next based on what it finds, without whoever is doing the analysis having to direct it step by step.
## Chrome DevTools MCP as the execution layer
For all of this to work, the agent needs to be able to navigate to a page and execute JavaScript in it. That’s exactly what Chrome DevTools MCP provides.
The complete flow with an agent like Claude Code:
1. The agent navigates to the URL with `navigate_page`
2. Reads the relevant script from the SKILL (`scripts/LCP.js`)
3. Executes it in the browser with `evaluate_script`
4. Captures the console output
5. Compares the result against the thresholds in `SKILL.md`
6. If a threshold is exceeded, activates the corresponding workflow
The agent doesn’t improvise JavaScript. It doesn’t interpret a visual interface. It doesn’t simulate clicks. It reads a predefined script, executes it, reads the result. Deterministic.
## Installation
The fastest way is with the Skills CLI, which installs the SKILLs directly from the repository into the global `~/.claude/skills/` directory:

```shell
npx skills add nucliweb/webperf-snippets
```
If you prefer local (current project) or global installation from a cloned repository:
```shell
# Local (current project → .claude/skills/)
npm run install-skills

# Global (all projects → ~/.claude/skills/)
npm run install-global
```
Once installed, the agent discovers them automatically when needed. No additional configuration required.
## In practice
In my performance audits, the most repeated flow is: navigate to the page, measure Core Web Vitals, identify which one is out of threshold, go deeper on that specific metric. With WebPerf SKILLs, that flow goes from being a series of manual instructions to a capability the agent executes autonomously. I’m looking forward to trying this flow with OpenClaw.
It’s not about the AI “understanding” web performance better than whoever is doing the audit. It’s about the AI having the right tools to collect data reliably, so whoever is analyzing can focus on interpreting, deciding, and prioritizing by impact — not on running repetitive scripts.
## Real case: detecting the Preload + Async anti-pattern
With Claude Code, Chrome DevTools MCP, and the WebPerf SKILLs installed globally, I ran this prompt:
```
Check if the page https://cocunat.com/es-es/products/clinical-beauty-filler-duo
has resources with Preload + Async/Defer
```

The agent navigated to the page, selected the webperf-loading SKILL, read the corresponding script, and executed it in the browser. The result: 7 anti-patterns found, all from the same origin.

All 7 scripts from the Usercentrics consent banner (web.cmp.usercentrics.eu) are loaded with `async` but also have `rel="preload"`. A combination that seems reasonable (discover the resource early and load it without blocking) but produces the opposite effect.
`async` lowers the script priority to Lowest/Low. `rel="preload"` raises it to Medium/High. The result is a priority escalation that puts these third-party scripts in competition with critical resources: the LCP image, fonts, CSS. The browser discovers them early and gives them priority they don't deserve.
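The detection logic can be sketched in a few lines of DOM querying. This is a hypothetical version, not the actual script from the `webperf-loading` SKILL:

```javascript
// Hypothetical sketch of Preload + Async/Defer detection.
// NOT the actual script from the webperf-loading SKILL.

// Pure matching logic: which preloaded script URLs also appear as
// async/defer scripts. Testable outside the browser.
function findPreloadAsyncConflicts(preloadHrefs, asyncScriptSrcs) {
  const asyncSet = new Set(asyncScriptSrcs);
  return preloadHrefs.filter((href) => asyncSet.has(href));
}

// Browser-only part: collect the candidates from the live DOM.
// (link.href and script.src are both absolute URLs, so they compare cleanly.)
if (typeof document !== "undefined") {
  const preloads = [
    ...document.querySelectorAll('link[rel="preload"][as="script"]'),
  ].map((link) => link.href);
  const asyncScripts = [
    ...document.querySelectorAll("script[async], script[defer]"),
  ].map((script) => script.src);
  console.table(findPreloadAsyncConflicts(preloads, asyncScripts));
}
```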
The agent identified the problem, explained the mechanism, and proposed two fix options:
Option 1 (recommended): Remove the preloads and let the async scripts load at their natural priority.
```html
<!-- ❌ Remove -->
<link rel="preload" href="WebSdk.lib.91f0d3cd.js" as="script" />

<!-- ✅ Keep only this -->
<script src="WebSdk.lib.91f0d3cd.js" async></script>
```
Option 2: Keep the preload, adding `fetchpriority="low"` to the `<link>`, to preserve early discovery without the priority escalation.

```html
<link rel="preload" href="WebSdk.lib.91f0d3cd.js" as="script" fetchpriority="low" />
<script src="WebSdk.lib.91f0d3cd.js" async></script>
```
It also added a relevant note: since Usercentrics is a third-party tool, the fix likely needs to be reported to their support or configured via their dashboard or tag manager integration.
All of this from a single-line prompt. The agent didn’t generate ad hoc JavaScript, didn’t try to interpret the HTML visually, didn’t improvise detection logic. It read the SKILL script, executed it, and delivered an actionable diagnosis.
Keep in mind this isn’t a silver bullet. For first visits, it may make sense to prioritize the consent banner scripts to anticipate rendering — in many cases the CMP modal is the LCP element on first visits. In that case, a Server-Side Conditional Loading strategy could be the right approach. I cover this in Optimizing consent scripts for Core Web Vitals. This shows that with AI, we can speed up our analysis and improve our reports — but we need to know what to measure, how to interpret it, what to prioritize, and what the best strategy is for each product.
## Conclusion
The distinction between a “text SKILL” and a “SKILL with scripts in scope” seems small, but it changes what the system can do: from an agent that improvises code from instructions, to an agent that executes validated tools consistently. Determinism in measurement; savings in tokens; real autonomy in analysis.
WebMCP and Agent SKILLs are two approaches to the same goal. One waits for the standard to mature in the browser; the other works today with the tools we already have.