Claude Code’s Skills system is turning into a kind of internal “standard library” for AI-assisted development—and Anthropic has been stress-testing it hard. In a detailed thread on X, Thariq (@trq212) shared what Anthropic has learned after building and using hundreds of skills inside Claude Code, along with the patterns that seem to hold up once a team moves past a handful of one-off helpers.
A key framing: despite the persistent “markdown file” mental model, skills are folders, not single documents. That matters because a skill can package scripts, assets, reference materials, and data—and Claude can discover and use those files as part of a workflow. Skills also support a broad set of configuration options, including dynamic hooks, which fuels some of the more interesting uses.
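To make the folder framing concrete, here is a hypothetical layout (illustrative names, not an Anthropic example) showing how instructions, assets, references, and scripts can live side by side in one skill:

```
deploy-helper/
├── SKILL.md              # top-level instructions + description used for discovery
├── assets/
│   └── pr-template.md    # template the agent fills in rather than recreates
├── references/
│   └── rollback-playbook.md
└── scripts/
    └── check-release.sh  # deterministic step the agent can run as-is
```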
Skills tend to fall into recognizable categories
After cataloging internal usage, Anthropic found most skills cluster into a small set of recurring types. The best ones fit cleanly into one bucket; confusing skills tend to sprawl across several.
Library & API reference
These focus on “how to use X correctly,” especially for internal SDKs/CLIs or external libraries Claude commonly trips over. They often include reference snippets and a gotchas list.
Product verification
Skills that encode how to test and verify output, often via tools like Playwright or tmux. Anthropic’s advice here is telling: invest deeply in verification skills, down to adding scripts that enforce assertions step by step or even record what was tested.
Data fetching & analysis
Skills that connect an agent to internal analytics and monitoring: dashboard IDs, datasource UIDs, canonical joins, and repeatable query workflows.
Business process & team automation
One-command operational workflows (standups, ticket creation, weekly recaps), where logging prior runs can help the model stay consistent and reason about deltas.
Code scaffolding & templates
Boilerplate generation with room for natural-language requirements. The practical trick is to include templates and composable scripts so the agent spends effort on orchestration rather than recreating structure.
Code quality & review
Skills that enforce internal quality bars (style, testing practices) and can be paired with deterministic tooling. Anthropic notes these can be run via hooks or even GitHub Actions.
CI/CD & deployment
“Operational glue” for PR babysitting, deployments, rollouts, rollbacks, and cherry-picks, often chaining other skills.
Runbooks
Symptom → investigation → structured report, spanning multiple tools and query patterns. This is where a skill’s file/folder structure can act as progressive disclosure: detailed queries and playbooks live alongside the top-level instructions.
Infrastructure operations
Routine but risky procedures (cleanup, dependency workflows, cost investigations) where guardrails matter.
What makes a skill “good” in practice
Anthropic’s guidance reads less like prompt-crafting advice and more like maintainable engineering practice.
- Don’t state the obvious. Claude already has defaults; skills are most valuable when they capture what’s non-obvious in an org: edge cases, house style, and “we do it this way because…”.
- Build a “Gotchas” section. This is described as the highest-signal part of a skill, evolving as failures are discovered.
- Treat the skill as a mini filesystem-based product: split references into separate files, ship templates under assets/, and include scripts for recurring operations.
- Avoid railroading. Reusable skills get brittle when they overfit to a single workflow; flexibility matters.
- Use setup patterns like config.json and, when needed, structured prompts via the AskUserQuestion tool.
- Remember the description field is for the model, used during skill discovery to decide when to trigger, not as a human-facing summary.
- For state, Anthropic points to ${CLAUDE_PLUGIN_DATA} as a stable location, because data inside the skill directory may be wiped during upgrades.
- On-demand hooks can make “only when needed” safety and workflow constraints possible (e.g., blocking destructive commands or restricting edits to a directory).
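A minimal sketch of such a guardrail in Python, assuming Claude Code’s PreToolUse hook protocol (the event arrives as JSON on stdin; exit code 2 blocks the call and surfaces stderr to the model). The pattern list is illustrative, not a recommendation:

```python
import json  # used in the hook wiring sketched below
import sys

# Illustrative patterns only; a real skill would tune these to its org.
DESTRUCTIVE_PATTERNS = ("rm -rf", "git push --force", "drop table")

def should_block(tool_name: str, tool_input: dict) -> bool:
    """Return True when a PreToolUse event looks destructive."""
    if tool_name != "Bash":
        return False
    command = tool_input.get("command", "").lower()
    return any(p in command for p in DESTRUCTIVE_PATTERNS)

# When wired as a hook, the event comes in on stdin and exit code 2 blocks:
#
#   event = json.load(sys.stdin)
#   if should_block(event.get("tool_name", ""), event.get("tool_input", {})):
#       print("Blocked: matches a destructive pattern.", file=sys.stderr)
#       sys.exit(2)

# Local demo with a sample event:
sample = {"tool_name": "Bash", "tool_input": {"command": "rm -rf build/"}}
print(should_block(sample["tool_name"], sample["tool_input"]))  # → True
```

Because the check is a plain function, the destructive-pattern logic can be unit-tested separately from the hook plumbing.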
Distribution, marketplaces, and the creeping “skills sprawl” problem
Sharing is one of the main payoffs: skills can be checked into a repo under ./.claude/skills, or shipped as plugins via a marketplace (Anthropic links to the plugin marketplaces documentation). The thread notes a real tradeoff here: skills checked into a repo also add to the model’s context, so at scale an internal marketplace becomes a way to let teams opt into what they actually need.
Measuring what’s working is part of the loop. Anthropic logs skill usage via a PreToolUse hook, with an example gist shared in the thread.
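As a rough sketch of what such a logging hook could do (this is not Anthropic’s gist; the JSONL record shape is an assumption, as is the idea that skill invocations surface with tool_name/tool_input fields in the PreToolUse payload):

```python
import json
import os
import tempfile
import time

def log_tool_use(event: dict, path: str) -> dict:
    """Append one JSON line per tool invocation and return the record."""
    record = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "tool": event.get("tool_name"),
        "input": event.get("tool_input", {}),
    }
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record

# Demo against a temp file; a real hook would read the event from stdin
# and write somewhere stable, such as under ${CLAUDE_PLUGIN_DATA}.
demo_path = os.path.join(tempfile.mkdtemp(), "skill-usage.jsonl")
rec = log_tool_use(
    {"tool_name": "Skill", "tool_input": {"skill": "weekly-recap"}}, demo_path
)
print(rec["tool"])  # → Skill
```

Append-only JSONL keeps the hook cheap and makes later analysis (which skills fire, how often, with what inputs) a one-liner with standard tooling.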
The replies underline the rough edges that show up quickly in larger orgs: versioning, duplicate skills with overlapping scope, enterprise-scale registries, evaluation between competing skills, and sensitive key handling (a topic Thariq says should be added). There’s also interest in “lazy loading” or search-style discovery for skills; Thariq suggests that could itself be implemented as a skill.
A quick note on how the thread was written
In a follow-up, Thariq describes a process that will sound familiar to anyone doing AI-assisted writing seriously: Claude helped with research, categorization, and examples, but most of the generated copy was discarded. Claude did generate all visuals (with multiple iterations), while the actual writing remained an exploratory process meant to distill real insights.