Agent Skills 2026: Teaching AI Agents New Tricks with SKILL.md

Posted on: 5/27/2026 1:10:00 AM

You have a brilliant AI agent. It reasons well, writes clean code, calls tools fluently. But ask it to "fill this PDF form using our company template", "export an Excel report with the pivot table our finance team expects", or "write a commit message following our internal convention", and it stumbles — not because it lacks intelligence, but because it doesn't know your specific procedures. That knowledge isn't in its training data, and you don't want to cram 5,000 lines of instructions into every prompt.

This is exactly the gap Agent Skills were built to fill. Introduced by Anthropic in October 2025 and quickly adopted as an open standard by dozens of tools, Agent Skills let you package "procedural knowledge" into a simple folder centered on a SKILL.md file, which the agent loads into context only when it is needed. This article dissects the mechanism underneath, layer by layer.

TL;DR

Agent Skills = a folder containing a SKILL.md file (metadata + instructions) plus bundled scripts and resources. The core mechanism is progressive disclosure: the agent loads only the name + description at startup (tens of tokens per skill), loads the full instructions when a task matches, and reads detailed resources only when truly required. As a result, you can attach an effectively unbounded amount of knowledge to an agent without bloating its context window.

1. The problem: the context window is a scarce resource

A modern LLM may have a context window of hundreds of thousands of tokens, but that's no excuse to stuff it. Three real constraints apply:

  • Cost & latency: Every token in context costs money and slows inference. Loading 50 detailed procedures into every request is wasteful when 49 are irrelevant.
  • Context rot: When context grows too long, the model's attention dilutes — important information gets buried among thousands of lines of unused instructions, degrading reasoning quality.
  • Maintainability: Organizational procedures change constantly. You want to edit in one place, not patch prompts scattered across the codebase.

Previous solutions each had weaknesses: fine-tuning is expensive and slow to update; stuffing the system prompt bloats context; RAG suits "fact lookup" more than "executing multi-step procedures". Agent Skills takes a different path — and the key is progressive disclosure.

2. What are Agent Skills?

Anthropic's concise definition: "A skill is a folder containing a SKILL.md file that packages instructions, scripts, and resources that give agents additional capabilities." It's a lightweight, open format not tied to any specific framework.

A minimal skill is just one file. More complex skills add supporting files and subfolders:

```text
pdf-skill/
├── SKILL.md          # Required: metadata + core instructions
├── reference.md      # Optional: detailed docs, loaded on demand
├── forms.md          # Optional: dedicated guidance for form filling
└── scripts/
    └── fill_form.py  # Optional: deterministic executable code
```

The heart of every skill is SKILL.md, with two parts: YAML frontmatter (metadata) and the body (Markdown instructions).

```markdown
---
name: pdf-processing
description: Extract, fill, and edit PDF files. Use when the user needs
  to read a form, fill data into a PDF, or split/merge pages.
---

# PDF Processing

## When to use this skill
When the task involves reading PDF content, filling forms, or page operations.

## Form-filling workflow
1. Run `scripts/fill_form.py --inspect <file>` to list the fields.
2. Map the user's data to the corresponding field names.
3. Call `scripts/fill_form.py --fill ...` to write values.

See `forms.md` for handling checkboxes and digital signatures.
```
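To make the two-part structure concrete, here is a minimal Python sketch that splits a SKILL.md string into its frontmatter fields and its Markdown body. It assumes the frontmatter holds only simple `key: value` pairs whose values may wrap onto indented lines, as in the example; `parse_skill_md` is a hypothetical helper, and a real loader should use a full YAML parser.

```python
def parse_skill_md(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (frontmatter fields, Markdown body).

    Simplified: handles only 'key: value' lines plus indented continuation
    lines. This is NOT a general YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter line")
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("frontmatter is not closed with '---'") from None

    fields: dict[str, str] = {}
    key = None
    for line in lines[1:end]:
        if line[:1].isspace() and key is not None:
            # Indented line: continuation of the previous field's value.
            fields[key] += " " + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            key = key.strip()
            fields[key] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return fields, body
```

Applied to the example above, `fields["name"]` would be `pdf-processing` and `body` would start at `# PDF Processing`.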

Tip: write the description carefully

The description field is what decides whether a skill gets activated. Write it as an answer to "when should the agent use me?": state the task clearly and the keywords a user might mention. A vague description ("processes documents") will cause the skill to be skipped or triggered wrongly.
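A quick lint before publishing can catch the worst cases. The sketch below is hypothetical: the length limits and naming pattern reflect my reading of the published SKILL.md specification (verify them against the current spec), and the "does it say when?" check is an invented heuristic, not part of any standard.

```python
import re

# Assumed limits from the SKILL.md spec; double-check before relying on them.
MAX_NAME_LEN = 64
MAX_DESCRIPTION_LEN = 1024
# Lowercase letters/digits in hyphen-separated groups (no leading,
# trailing, or doubled hyphens). Also an assumption about the spec.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def lint_frontmatter(name: str, description: str) -> list[str]:
    """Return human-readable problems; an empty list means it looks OK."""
    problems = []
    if len(name) > MAX_NAME_LEN or not NAME_PATTERN.fullmatch(name):
        problems.append(
            f"name {name!r}: use 1-{MAX_NAME_LEN} lowercase letters, digits, single hyphens"
        )
    desc = description.strip()
    if not desc:
        problems.append("description is empty")
    elif len(desc) > MAX_DESCRIPTION_LEN:
        problems.append(f"description exceeds {MAX_DESCRIPTION_LEN} characters")
    # Hypothetical heuristic: a description that never says *when* to use
    # the skill tends to be too vague to trigger reliably.
    if desc and "when" not in desc.lower():
        problems.append("description does not say when to use the skill")
    return problems
```

For instance, the vague "processes documents" would be flagged by the heuristic, while "Fill PDF forms. Use when the user mentions a PDF form." passes.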

3. Progressive Disclosure — the brain of the mechanism

Anthropic likens Agent Skills to a well-organized manual: it starts with a table of contents, then specific chapters, and finally a detailed appendix. The agent only flips to the part it needs. This happens across three levels:

```mermaid
flowchart TD
    A[Agent startup] --> B["LEVEL 1: Discovery<br/>Load name + description<br/>of EVERY skill into system prompt"]
    B --> C{"Does the user task<br/>match a skill?"}
    C -->|No match| D[Skip, no extra tokens spent]
    C -->|Match| E["LEVEL 2: Activation<br/>Read the full SKILL.md<br/>into context"]
    E --> F{"Need extra<br/>details?"}
    F -->|No| G[Execute per instructions]
    F -->|Yes| H["LEVEL 3+: Execution<br/>Read reference.md, forms.md...<br/>or run scripts via Bash"]
    H --> G
    style B fill:#e94560,stroke:#fff,color:#fff
    style E fill:#2c3e50,stroke:#fff,color:#fff
    style H fill:#16213e,stroke:#fff,color:#fff
    style G fill:#f8f9fa,stroke:#e94560,color:#2c3e50
```
The three levels of progressive disclosure: pay tokens only for what you actually use

Level 1 — Discovery

When the agent starts, it loads only each skill's name and description into the system prompt: just enough to know "this skill exists and is for this purpose". Per Anthropic's measurements, the median cost is about 80 tokens per skill (ranging from ~55 tokens for the webapp-testing skill to ~235 tokens for the xlsx skill). This lets you install hundreds of skills while keeping the system prompt lean.
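A rough sketch of what Level 1 might look like on the host side: scan a skills directory, read only each frontmatter, and emit one line per skill. The one-subfolder-per-skill layout, the prompt wording, and the ~4 characters-per-token estimate are all assumptions for illustration; real tools format this differently and count tokens with their own tokenizer.

```python
from pathlib import Path


def read_frontmatter(skill_md: Path) -> dict[str, str]:
    """Tiny 'key: value' frontmatter reader; not a full YAML parser."""
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    key = None
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter: the body is not used at Level 1
        if line[:1].isspace() and key:
            fields[key] += " " + line.strip()  # indented continuation line
        elif ":" in line:
            key, _, value = line.partition(":")
            key = key.strip()
            fields[key] = value.strip()
    return fields


def build_discovery_prompt(skills_root: Path) -> str:
    """Level 1: one line per installed skill, name + description only."""
    entries = []
    for skill_md in sorted(skills_root.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_md)
        name = meta.get("name", skill_md.parent.name)
        entries.append(f"- {name}: {meta.get('description', '')}")
    return "Available skills:\n" + "\n".join(entries)


def rough_token_count(text: str) -> int:
    # Crude heuristic (~4 characters per token); real counts vary by tokenizer.
    return max(1, len(text) // 4)
```

The resulting string would be appended to the system prompt; everything below each frontmatter stays on disk until a skill is activated.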

Level 2 — Activation

When the user task matches a skill's description, the agent reads the full contents of SKILL.md into context. This is when it gets the step-by-step procedure, conventions, and caveats.
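In practice the "does it match?" decision is made by the model itself after reading the Level 1 descriptions. The hypothetical helper below only sketches what happens afterwards: the host reads the full SKILL.md and hands the Markdown body (the part skipped at Level 1) to the model.

```python
from pathlib import Path


def load_skill_body(skill_dir: Path) -> str:
    """Level 2: return SKILL.md's Markdown body, i.e. everything after the frontmatter."""
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        # Find the closing '---' of the frontmatter and return what follows.
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return "\n".join(lines[i + 1:]).strip()
    return text.strip()  # no (closed) frontmatter: treat the whole file as the body
```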

Level 3+ — Execution

During execution, if SKILL.md references other files (reference.md, forms.md...), the agent reads them only when it actually reaches that part. More importantly: an agent with a filesystem and code-execution tools doesn't need to read the entire skill into context — it can run a script to process data without ever loading the raw data into the context window. This is why Anthropic calls the knowledge attached to a skill "effectively unbounded".
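The "effectively unbounded" claim rests on this pattern: the script touches the data, and only its short printed result returns to the model. Below is a minimal sketch, assuming the bundled script is Python and is run with the host's own interpreter (real agents typically go through a Bash tool instead); `run_skill_script` and the output cap are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

MAX_OUTPUT_CHARS = 2_000  # hypothetical cap on what gets copied back into context


def run_skill_script(script: Path, *args: str, timeout: float = 60.0) -> str:
    """Level 3: execute a bundled script and return only its (truncated) stdout.

    The files the script reads never pass through the model's context;
    only what it prints does.
    """
    result = subprocess.run(
        [sys.executable, str(script), *args],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
    if result.returncode != 0:
        err = result.stderr.strip()[:MAX_OUTPUT_CHARS]
        return f"script failed ({result.returncode}): {err}"
    return result.stdout[:MAX_OUTPUT_CHARS]
```

A script that counts the rows of a 100,000-line CSV, for example, would put a single number into context rather than the file itself.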

  • ~80: median tokens per skill at the Discovery level
  • 3: progressive disclosure levels (and beyond)
  • 2: required frontmatter fields (name + description)
  • Effectively unbounded: capacity for attached knowledge

4. The real power: bundling code for deterministic reliability

An often-missed point: a skill doesn't just contain instruction text — it can package executable scripts (typically Python) that the agent invokes via the Bash tool. This is the key difference from "pure prompt engineering".

Take Anthropic's PDF skill: instead of asking the model to "comprehend" the binary structure of a PDF (token-heavy and error-prone), the skill bundles a Python script that extracts the form's field list without loading the PDF into context. Anthropic describes this as "the deterministic reliability that only code can provide".

Golden rule: if code can do it, don't make the LLM "guess"

Tasks such as parsing XLSX, validating JSON against a schema, generating checksums, or calling an API with complex rate limits should all be deterministic code in scripts/. Let the LLM handle reasoning and orchestration, and hand the "precise repetitive mechanics" to scripts. This is also the spirit of the CodeAct pattern: agents act by writing and running code rather than calling discrete JSON tools.
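As an illustration of the golden rule, here is the kind of check that belongs in scripts/ rather than in the model's head. It is a hand-rolled sketch with a hypothetical report schema; in a real skill you might call a dedicated validator such as the `jsonschema` package instead.

```python
import json

# Hypothetical schema for a finance report payload: field name -> expected type.
REPORT_SCHEMA = {"region": str, "quarter": int, "revenue": float}


def _matches(value, expected: type) -> bool:
    if isinstance(value, bool):  # bool is a subclass of int in Python
        return expected is bool
    if expected is float:  # JSON has a single number type; accept ints too
        return isinstance(value, (int, float))
    return isinstance(value, expected)


def validate_report(raw_json: str, schema: dict[str, type] = REPORT_SCHEMA) -> list[str]:
    """Return a list of errors; an empty list means the payload is valid.

    The same input always yields the same verdict, which is the point of
    moving this check out of the LLM and into code.
    """
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} (line {exc.lineno})"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    errors = []
    for field, expected in schema.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        elif not _matches(data[field], expected):
            errors.append(
                f"{field}: expected {expected.__name__}, got {type(data[field]).__name__}"
            )
    return errors
```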

5. Composability — multiple skills working together

Because the metadata of all installed skills is loaded simultaneously at startup, the agent can trigger multiple skills at once based on task context. For example, a request to "read data from an Excel file then export a signed PDF report" might simultaneously activate the xlsx and pdf skills, each contributing its part of the workflow.

```mermaid
flowchart LR
    U["Request:<br/>Read Excel → export PDF report"] --> AG((Agent))
    AG -.activates.-> S1["Skill: xlsx<br/>read & pivot data"]
    AG -.activates.-> S2["Skill: pdf<br/>render & fill form"]
    AG -.activates.-> S3["Skill: brand-style<br/>brand conventions"]
    S1 --> R[Complete result]
    S2 --> R
    S3 --> R
    style AG fill:#e94560,stroke:#fff,color:#fff
    style S1 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style S2 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style S3 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style R fill:#2c3e50,stroke:#fff,color:#fff
```
Several independent skills cooperate on one task — each skill is a reusable "skill module"
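The key property is that selection returns a set of skills, not a single winner. The toy sketch below makes that visible with keyword overlap; this is purely a stand-in, since a real agent lets the model judge relevance from the descriptions, and the stopword list and threshold are arbitrary assumptions.

```python
import re

# Arbitrary illustrative stopwords; not part of any real agent.
STOPWORDS = {"a", "an", "and", "data", "file", "from", "or", "the", "then", "to", "use", "user", "when"}


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def select_skills(request: str, skills: dict[str, str], min_overlap: int = 1) -> list[str]:
    """Toy matcher: return EVERY skill whose description shares keywords with the request."""
    req = _words(request)
    return [
        name
        for name, description in skills.items()
        if len(req & _words(description)) >= min_overlap
    ]


skills = {
    "xlsx": "Read, pivot and export Excel spreadsheets.",
    "pdf": "Render PDF reports and fill PDF forms.",
    "git-commit": "Write commit messages following our convention.",
}
print(select_skills("Read data from an Excel file then export a signed PDF report", skills))
# ['xlsx', 'pdf']
```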

6. Skills, MCP, RAG, or Fine-tuning — which to pick?

This is the most practical question. These four mechanisms are not mutually exclusive — they solve different problems and are often used together.

| Criterion | Agent Skills | MCP | RAG | Fine-tuning |
|---|---|---|---|---|
| What it solves | Teaches procedures & how-to | Connects external tools/data | Retrieves factual knowledge | Changes the model's core behavior |
| Input form | SKILL.md file + scripts | Server exposing tools/resources | Vector DB + documents | Training dataset |
| Updating | Edit a file, instant | Redeploy the server | Re-index documents | Retrain (expensive, slow) |
| Context cost | Very low (progressive) | Medium (tool schemas) | Per chunk loaded | Zero (baked into weights) |
| When to use | Repeated procedures with custom conventions | Need to read/write external systems | Large, changing knowledge base | Need an ingrained style/format |

The relationship between Skills and MCP

Anthropic states clearly that Skills complement MCP: "Skills teach agents more complex workflows that involve external tools and software." In other words — MCP provides the arms (connections to databases, APIs, systems), while Skills provide the playbook (knowing in what order to use those arms to follow your procedure). A skill can absolutely instruct the agent which MCP server to call at which step.

7. From a feature to an open standard

What makes Agent Skills noteworthy isn't just the design, but the speed at which it became an industry-wide standard — echoing the very trajectory MCP took before it.

  • Oct 16, 2025: Anthropic introduces Agent Skills, supported on Claude.ai, Claude Code, the Agent SDK, and the Developer Platform.
  • Dec 18, 2025: Anthropic publishes Agent Skills as an open standard, with a SKILL.md specification any tool can read.
  • Within 48 hours: Microsoft integrates it into VS Code; OpenAI adds it to both ChatGPT and the Codex CLI.
  • March 2026: Over 32 tools from different vendors, including Gemini CLI (Google), Junie (JetBrains), Kiro (AWS), and Goose (Block), all read the same SKILL.md format from the same directory structure.
  • 48h: time for Microsoft and OpenAI to integrate after the standard opened
  • 32+: tools reading the same SKILL.md format (Mar 2026)
  • 1: single format, no vendor lock-in

8. Security: a skill is code — treat it like code

A skill can contain executable scripts and instruct the agent to reach out externally — meaning it carries exactly the risk of installing third-party software. Anthropic's recommendation is explicit: "We recommend installing skills only from trusted sources."

Checklist before installing a skill

  • Read all files in the skill before installing — especially the scripts.
  • Inspect dependencies & bundled resources: which libraries does the script pull, and from where?
  • Beware instructions that connect to the network: if SKILL.md tells the agent to access unfamiliar external URLs/sources, that's a red flag.
  • Watch for the "lethal trifecta": a skill that combines access to sensitive data + reading untrusted content + the ability to exfiltrate is the exact recipe for prompt injection and data leakage.
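Part of that checklist can be mechanized. The sketch below is a hypothetical pre-install inventory: it lists every file, flags likely scripts by extension (the extension list is an assumption), and surfaces any URLs in text files. It only tells a human where to look; it cannot prove a skill is safe.

```python
import re
from pathlib import Path

URL_PATTERN = re.compile(r"https?://[^\s)\"'>]+")
SCRIPT_SUFFIXES = {".py", ".sh", ".js", ".ts", ".rb", ".ps1"}  # assumed list; extend as needed


def audit_skill(skill_dir: Path) -> dict[str, list[str]]:
    """Inventory a skill folder before installing it (surfaces, does not certify)."""
    report: dict[str, list[str]] = {"files": [], "scripts": [], "urls": []}
    files = sorted(
        (p for p in skill_dir.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(skill_dir).as_posix(),
    )
    for path in files:
        rel = path.relative_to(skill_dir).as_posix()
        report["files"].append(rel)
        if path.suffix in SCRIPT_SUFFIXES:
            report["scripts"].append(rel)
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # binary file: list it, but do not scan it for URLs
        for url in URL_PATTERN.findall(text):
            report["urls"].append(f"{rel}: {url}")
    return report
```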

In other words, put skills through the same security review as any dependency in your codebase: pin versions, audit contents, and constrain the privileges of the execution environment.
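One hypothetical way to "pin" a skill that has no package manager behind it is to record a content hash at review time and refuse to load the folder if the hash later changes. A minimal sketch over file paths and bytes (it ignores permissions and other metadata):

```python
import hashlib
from pathlib import Path


def skill_digest(skill_dir: Path) -> str:
    """Deterministic SHA-256 over every file's relative path and contents."""
    h = hashlib.sha256()
    files = sorted(
        (p for p in skill_dir.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(skill_dir).as_posix(),
    )
    for path in files:
        rel = path.relative_to(skill_dir).as_posix().encode("utf-8")
        data = path.read_bytes()
        # Length-prefix each part so ("ab", "c") and ("a", "bc") hash differently.
        h.update(len(rel).to_bytes(8, "big") + rel)
        h.update(len(data).to_bytes(8, "big") + data)
    return h.hexdigest()
```

Store the digest next to your list of approved skills; any edit to SKILL.md or a bundled script changes it and should trigger a fresh review.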

9. When you should (and shouldn't) write a Skill

Write a Skill when...

  • There's a repeated multi-step procedure you keep re-explaining to the agent.
  • Your organization has custom conventions (report formats, code conventions, brand voice) the model can't know on its own.
  • There's work better handed to deterministic code than left to the LLM to "guess".

You don't need a Skill when...

  • The task is a one-off fact query — RAG or search fits better.
  • You only need to connect to an external system with no special procedure — an MCP server is enough.
  • The instruction is so short a single line in the prompt suffices — don't create a skill for the trivial.
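If a task passes the "write a Skill" test, a scaffold saves the boilerplate. The helper below is a hypothetical sketch: the naming rule is my assumption about the spec, and the description is written as a double-quoted (JSON-style) string so that a colon inside it cannot break the YAML frontmatter.

```python
import json
import re
from pathlib import Path

NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")  # assumed naming rule

TEMPLATE = """---
name: {name}
description: {description}
---

# {title}

## When to use this skill
TODO: describe the trigger situations.

## Workflow
1. TODO
"""


def scaffold_skill(root: Path, name: str, description: str) -> Path:
    """Create <root>/<name>/SKILL.md with the two required frontmatter fields."""
    if len(name) > 64 or not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"invalid skill name: {name!r}")
    if not description.strip():
        raise ValueError("description must not be empty")
    skill_dir = root / name
    skill_dir.mkdir(parents=True, exist_ok=False)  # refuse to overwrite an existing skill
    skill_md = skill_dir / "SKILL.md"
    skill_md.write_text(
        TEMPLATE.format(
            name=name,
            # A JSON string literal is also a valid YAML double-quoted scalar.
            description=json.dumps(description.strip(), ensure_ascii=False),
            title=name.replace("-", " ").title(),
        ),
        encoding="utf-8",
    )
    return skill_md
```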

Conclusion

Agent Skills don't try to make the model "smarter" — they make it more useful in the real world by teaching it how to follow your procedures. The beauty of the design lies in its simplicity: a folder, a Markdown file, and a progressive-disclosure principle that keeps attached knowledge nearly unbounded while context stays lean.

Together with MCP (the arms reaching outward) and RAG (the lookup memory), Skills complete the trio for building production agents: knowing the procedure, having the tools, and remembering the knowledge. If you run agents and find yourself repeating the same instructions over and over, that's the signal: it's time to write your first SKILL.md.