By 2026, agentic coding has moved from novelty to default. AI coding agents (Claude Code, Cursor, Codex CLI, Gemini CLI, Antigravity) ship production code under human direction. The unit of leverage is no longer the prompt — it’s the Skill.

This post is the working guide to Claude Code Skills: what they are, the SKILL.md format, the multi-session patterns that produce reliable code, and the agentic patterns I keep reaching for.

What a Skill is

A Skill is a folder with a SKILL.md file and optional supporting assets (scripts, templates, reference docs). When Claude Code encounters a task that matches the Skill’s description, it loads the Skill and follows the playbook.

.claude/skills/release-notes/
├── SKILL.md
├── template.md
└── scripts/
    └── git-log-since-tag.sh

A SKILL.md:

---
name: release-notes
description: Generate release notes from git history since the last tag.
---

# Release Notes

When asked to generate release notes for a release:

1. Run `scripts/git-log-since-tag.sh` to get the commit log.
2. Group commits by type using Conventional Commits (`feat`, `fix`, `chore`, ...).
3. Use `template.md` as the markdown structure.
4. Highlight breaking changes in a **Breaking Changes** section.
5. Include a **Migration** section if any breaking changes exist.
6. Save to `CHANGELOG.md` and open a PR titled `chore: release notes for vX.Y.Z`.

## Examples

For a typical patch release, the output should look like:

```
## v1.4.2 - 2026-04-29

### Fixes
- Auth: handle empty `Authorization` header safely (#412)
```

## Notes

- Use the project's existing changelog conventions if `CHANGELOG.md` already exists.
- Don't include merge commits.

That’s a complete Skill. Frontmatter (name, description), playbook in markdown, optional scripts and templates. The description is what Claude pattern-matches against the user’s request.
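To make the file shape concrete, here is a minimal sketch of splitting a SKILL.md into frontmatter and playbook. It assumes the frontmatter is flat `key: value` pairs between two `---` lines, as in the example above. The function name `parse_skill` and the validation rules are my own illustration, not how Claude Code loads Skills internally.

```python
from typing import Dict, Tuple

# The two frontmatter fields shown in the example above.
REQUIRED_KEYS = ("name", "description")


def parse_skill(text: str) -> Tuple[Dict[str, str], str]:
    """Split SKILL.md text into (frontmatter dict, markdown body).

    Assumption: frontmatter is flat `key: value` lines. Nested YAML,
    quoting, and multi-line values are deliberately not handled.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter line")

    # Find the closing '---' of the frontmatter block.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("frontmatter is missing its closing '---'")

    meta: Dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        meta[key.strip()] = value.strip()

    missing = [k for k in REQUIRED_KEYS if not meta.get(k)]
    if missing:
        raise ValueError(f"missing frontmatter keys: {missing}")

    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


EXAMPLE = """---
name: release-notes
description: Generate release notes from git history since the last tag.
---

# Release Notes
"""

meta, body = parse_skill(EXAMPLE)
```

A check like this is handy in CI: a Skill with a missing or empty description can never match a request, so it is worth catching before merge.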

Why Skills, not just better prompts

Three reasons Skills outperform “write a better system prompt”:

  1. Composability. A coding agent can have dozens of skills loaded. Each is small and focused. A monolithic system prompt becomes hard to maintain.
  2. Versioning. Skills are files in a repo. They diff. They review. They roll forward and back like code.
  3. Scope. Only each Skill’s name and description sit in context up front; the full playbook loads when the task matches. Token budget stays small, and the model follows one playbook at a time.

If you’ve been stuffing task-specific instructions into your CLAUDE.md, a Skill is probably the cleaner answer.

Anatomy of a good Skill

A Skill that earns its place has:

  • A description that matches user phrasings. “Generate release notes” matches more queries than “produce a CHANGELOG.md from git log.”
  • An explicit step list. Numbered. Verifiable.
  • Examples of the output. Models follow examples better than they follow rules.
  • Notes on edge cases. “If CHANGELOG.md doesn’t exist, create it. If it has a different format, follow that format.”
  • References to scripts/templates where useful. Don’t make Claude reinvent the wheel.

Three rules I follow:

  • Skills should be specific, not generic. “Refactor code” is too broad. “Migrate a Django ModelForm to a Pydantic Schema” is right.
  • Skills should be self-contained. Don’t reference files outside the skill directory unless you have to.
  • Skills should fail loudly. If a precondition isn’t met (no CHANGELOG.md template, dirty git tree), the skill says so instead of guessing.
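"Fail loudly" is easiest to enforce in a script the Skill runs first. The sketch below is a hypothetical precondition check for the release-notes Skill: it assumes the Skill ships a `template.md` and that a dirty tree should block the run. The function names are mine; `git status --porcelain` is the only real command involved.

```python
import subprocess
from pathlib import Path
from typing import List


def git_porcelain(repo: Path) -> str:
    """Return `git status --porcelain` output; an empty string means a clean tree."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return result.stdout


def precondition_failures(porcelain: str, template: Path) -> List[str]:
    """Collect every reason the skill should refuse to run (empty list = go)."""
    failures = []
    if porcelain.strip():
        failures.append("git tree is dirty: commit or stash before running")
    if not template.is_file():
        failures.append(f"missing template: {template}")
    return failures


def require_preconditions(porcelain: str, template: Path) -> None:
    """Exit with a clear message instead of letting the agent guess."""
    failures = precondition_failures(porcelain, template)
    if failures:
        raise SystemExit("refusing to run:\n- " + "\n- ".join(failures))
```

Reporting all failures at once, rather than stopping at the first, saves the agent a round-trip per problem.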

Skills vs MCP servers — the divide

I see these conflated. They’re complementary:

| | Skill | MCP server |
|---|---|---|
| What it is | Instructions + assets | Code that exposes tools/resources |
| When loaded | When the description matches | Always available |
| Where it runs | In Claude’s context | As a separate process |
| Best for | Workflows, playbooks, repeatable tasks | Integrating external systems (DB, APIs, files) |

A typical setup uses both. Skills describe the playbook (“how do I generate release notes?”). MCP servers integrate the systems (“read my GitHub PRs”, “query my Postgres”). See Model Context Protocol Explained.

Skills you should have on day one

These are the skills I install in every new repo:

1. code-review

---
name: code-review
description: Review pull requests for correctness, style, security, and tests.
---
Open the PR. For each changed file, evaluate: (a) does it solve the problem
described in the PR; (b) any obvious bugs; (c) any security issues (unvalidated
input, leaked secrets); (d) tests cover the change; (e) naming and structure
consistent with the rest of the repo. Output a markdown review with
**Must change** and **Nice to have** sections.

2. git-commit

---
name: git-commit
description: Stage and commit changes with a Conventional Commits message.
---
Run `git status` and `git diff` to see changes. Group by logical scope. For
each group, stage the files and commit with `<type>(<scope>): <message>` where
`<type>` is one of feat/fix/chore/docs/refactor/test/perf.
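If you want the commit format checked rather than trusted, a small validator does it. This is a sketch, assuming the type list from the skill above; the optional scope and the `!` breaking-change marker follow the Conventional Commits spec, and `parse_header` is a name I made up for illustration.

```python
import re
from typing import Dict, Optional

# Types taken from the git-commit skill above.
ALLOWED_TYPES = ("feat", "fix", "chore", "docs", "refactor", "test", "perf")

# <type>[(<scope>)][!]: <message>
HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<message>\S.*)$"
)


def parse_header(header: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the parsed parts of a commit header, or None if it is invalid."""
    match = HEADER_RE.match(header)
    if match is None or match.group("type") not in ALLOWED_TYPES:
        return None
    return match.groupdict()


ok = parse_header("fix(auth): handle empty Authorization header")
bad = parse_header("update stuff")
```

The same parser doubles as the grouping key for the release-notes Skill: bucket commits by `type`, and route anything with `breaking` set into the Breaking Changes section.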

3. repo-bootstrap

A skill that knows your house’s standard project layout, picks the right template, scaffolds it, opens an editor. See Modern Python Tooling 2026 for a Python equivalent.

4. db-migration

---
name: db-migration
description: Generate and apply a database migration safely.
---
1. Read schema changes from the diff or user description.
2. Generate a migration file with the project's tool (alembic, drizzle-kit, etc.).
3. Review for backwards-compatibility (no DROP COLUMN without a deprecation phase).
4. Apply to a local DB and run tests.
5. Open a PR with the migration and a deployment note.
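Step 3 is the one worth backing with a script. Below is a rough heuristic, not a SQL parser: it strips `--` comments, splits on `;`, and flags statements that drop things. `DROP COLUMN` comes from the skill above; flagging `DROP TABLE` as well is my own addition, and `destructive_statements` is a hypothetical helper name.

```python
import re
from typing import List

# DROP COLUMN is named in the skill; DROP TABLE is an extra assumption of mine.
DESTRUCTIVE = re.compile(r"\bDROP\s+(COLUMN|TABLE)\b", re.IGNORECASE)


def destructive_statements(sql: str) -> List[str]:
    """Return the statements in a migration that look destructive.

    Heuristic only: splitting on ';' will misfire on semicolons inside
    string literals, so treat a hit as 'needs a human look', not proof.
    """
    without_comments = re.sub(r"--[^\n]*", "", sql)
    hits = []
    for statement in without_comments.split(";"):
        normalized = " ".join(statement.split())  # collapse whitespace
        if normalized and DESTRUCTIVE.search(normalized):
            hits.append(normalized)
    return hits


MIGRATION = """
ALTER TABLE users ADD COLUMN nickname text;
-- drop column after the deprecation phase
ALTER TABLE users DROP COLUMN legacy_name;
"""

flagged = destructive_statements(MIGRATION)
```

A non-empty result is a good trigger for the skill to stop and ask whether the deprecation phase has actually happened.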

In my experience, these four skills cover roughly 80% of the recurring tasks in a typical backend repo.

Multi-session patterns

The 2026 unlock isn’t bigger context windows — it’s multiple coordinated sessions.

Writer / Reviewer

Session A (writer):  implements the change
Session B (reviewer): fresh context, reviews A's diff

The reviewer doesn’t see how the code was written; only what the diff says. This kills the “I just wrote it, it must be right” bias and catches more issues than self-review.

To run this pattern in Claude Code:

  1. Session A: “implement the change.”
  2. Commit the work to a branch (for example, feature/x).
  3. Session B (fresh): “review branch feature/x for correctness, style, and tests.”
  4. Apply review feedback in session A or a new session C.
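The steps above can be sketched as a tiny orchestrator. Everything here is an assumption for illustration: a "session" is modeled as any callable that takes a prompt and returns text, and `get_diff` stands in for however you fetch the branch diff. The point it demonstrates is the isolation: the reviewer's prompt is built from the diff alone.

```python
from typing import Callable

# Hypothetical abstraction: a session takes a prompt and returns its final text.
Session = Callable[[str], str]


def review_prompt(branch: str, diff: str) -> str:
    """Build the reviewer's prompt from the diff only, never the writer's transcript."""
    return (
        f"Review branch {branch} for correctness, style, and tests.\n"
        "You are seeing only the diff, not how it was written.\n\n"
        f"{diff}"
    )


def writer_reviewer(
    task: str,
    branch: str,
    writer: Session,
    reviewer: Session,
    get_diff: Callable[[str], str],
) -> str:
    """Run the writer, then hand a fresh reviewer nothing but the resulting diff."""
    writer(f"Implement the change on branch {branch}: {task}")
    diff = get_diff(branch)
    return reviewer(review_prompt(branch, diff))
```

Because the writer's output is discarded and only the diff crosses over, the reviewer cannot inherit the writer's assumptions.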

Spec / Implement

Session A (spec):       writes a Markdown spec from the user's requirements
Session B (implement):  reads the spec, implements
Session C (verify):     reads the spec and the diff, confirms requirements met

Useful for non-trivial features. The spec becomes a permanent artifact in the PR.

Triage / Fix

Session A (triage):  reads a bug report, finds the root cause, writes a plan
Session B (fix):     applies the plan

Used heavily by Anthropic’s internal SRE team. Triage requires breadth (find the cause); fix requires focus (apply the change). Different sessions, different system prompts.

How to keep agents productive

A few patterns I keep reaching for:

1. Tight loops

Let the agent iterate fast. Run tests on every change. Compile on every change. The fewer round-trips of “agent writes code, you run tests, you copy back errors,” the better.

In Claude Code, allow the test command in your permission settings so the agent can invoke it directly instead of asking you to run it.

2. Small commits

Commit frequently. Each commit is a “save point” you can roll back to. When the agent goes off the rails (it will), git reset --hard is your friend.

3. Explicit approval gates

For risky operations (destructive deletes, force-push, schema migrations on production), require explicit confirmation. Configure a Skill that pauses before such actions and waits for approval.
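As a sketch of what such a gate can look like in a wrapper script: classify the command, and refuse to execute risky ones without an explicit yes. The patterns and the names `needs_approval` and `run_with_gate` are illustrative assumptions; a real gate would need a list tuned to your own tooling.

```python
import re
from typing import Callable

# Illustrative patterns only; extend for your own stack.
RISKY_PATTERNS = [
    re.compile(r"\bgit\s+push\b.*(--force\b|\s-f\b)", re.IGNORECASE),
    re.compile(r"\bgit\s+reset\s+--hard\b", re.IGNORECASE),
    re.compile(r"\brm\s+(-[a-z]*r[a-z]*f|-[a-z]*f[a-z]*r)\b", re.IGNORECASE),
    re.compile(r"\bdrop\s+(table|database|column)\b", re.IGNORECASE),
]


def needs_approval(command: str) -> bool:
    """True if the command matches any pattern we consider destructive."""
    return any(pattern.search(command) for pattern in RISKY_PATTERNS)


def run_with_gate(
    command: str,
    confirm: Callable[[str], bool],
    execute: Callable[[str], str],
) -> str:
    """Execute the command, asking `confirm` first when it looks risky."""
    if needs_approval(command) and not confirm(command):
        raise PermissionError(f"not approved: {command}")
    return execute(command)
```

Passing `confirm` and `execute` in as callables keeps the policy testable without touching a shell.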

4. Persistent rules in CLAUDE.md

Project-wide constants — naming conventions, where things live, what tools to use — go in CLAUDE.md at the repo root. Per-task playbooks go in Skills. Per-session context goes in the chat. Three layers, three lifetimes.

5. Don’t argue, redirect

If the agent goes the wrong direction, don’t reason it back. Stop, give a one-line course correction, restart from a clean state. Saves tokens and time.

Skills marketplace

By 2026 there are thousands of community skills, available from:

  • Anthropic’s official skills repo (the canonical examples).
  • Skill registries like skill-libraries and various community indexes.
  • Vendor-shipped skills (Drizzle, Prisma, AWS CDK, Terraform, etc.).

Pick conservatively. A skill in your loaded set costs context. Curate, don’t hoard.

Production-grade agentic coding

The pattern I see at companies actually shipping AI-written production code in 2026:

  1. Strict scope. Agents handle “small but real” changes: bug fixes, refactor passes, dependency updates, test additions. Not green-field architecture.
  2. Always reviewed. Either by another agent (fresh context) or a human. No direct-to-main from an agent.
  3. CI as the safety net. Tests, type checks, linters, security scanners all run on agent PRs.
  4. Skills for repeatable patterns. Each common task is a skill, not a re-derived plan.
  5. Observability. Track which PRs an agent opened, time-to-merge, defect rate. Treat agents like a new junior dev — measure, coach, iterate.

This works because small changes, thorough review, and automation together catch errors before they ship. The agent is fast and tireless; the system catches its mistakes.
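The observability point reduces to a few numbers per agent. Here is a minimal sketch, assuming you can export agent-opened PRs with open and merge timestamps plus some defect flag. The `AgentPR` record and its `caused_defect` field are hypothetical; how you attribute a defect to a PR is up to your own tracker.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Dict, List, Optional


@dataclass
class AgentPR:
    opened_at: datetime
    merged_at: Optional[datetime]  # None while the PR is open or if it was closed
    caused_defect: bool            # hypothetical flag, e.g. a later revert links back


def summarize(prs: List[AgentPR]) -> Dict[str, Optional[float]]:
    """Compute opened/merged counts, median hours to merge, and defect rate."""
    merged = [pr for pr in prs if pr.merged_at is not None]
    hours = [(pr.merged_at - pr.opened_at).total_seconds() / 3600 for pr in merged]
    return {
        "opened": len(prs),
        "merged": len(merged),
        "median_hours_to_merge": median(hours) if hours else None,
        "defect_rate": sum(pr.caused_defect for pr in merged) / len(merged) if merged else None,
    }


start = datetime(2026, 1, 5, 9, 0)
report = summarize([
    AgentPR(start, start + timedelta(hours=2), caused_defect=False),
    AgentPR(start, start + timedelta(hours=4), caused_defect=True),
    AgentPR(start, None, caused_defect=False),
])
```

Median rather than mean keeps one PR that sat over a holiday from distorting the picture.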

Common failure modes

1. Skills too vague

A skill named “fix bugs” with description “fix the bug” is just a system prompt. Skills earn their cost when they encode specific knowledge.

2. Too many skills

Loading 50 skills bloats context, slows responses, and confuses the model. Curate to ~10–20 active skills per repo.

3. Skills that depend on undeclared context

A skill that says “use the standard release format” without saying what the standard is leaves the model to guess. Embed the format or reference a template.

4. No verification step

A skill that ends “and that’s done” without a test/verify step often produces something that compiles but doesn’t work. Always include “run the tests” or “verify by …”.
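This failure mode is cheap to lint for. The sketch below scans a playbook's numbered steps and complains if none of them mentions testing or verifying. The keyword list is a crude assumption of mine and `lint_playbook` is a hypothetical name; it catches the obvious omissions, not every weak skill.

```python
import re
from typing import List

STEP_RE = re.compile(r"^\s*\d+\.\s+(.*)$")
# Crude keyword heuristic; adjust to your team's vocabulary.
VERIFY_RE = re.compile(r"\b(tests?|verify|validate|check)\b", re.IGNORECASE)


def lint_playbook(body: str) -> List[str]:
    """Return a list of problems found in a skill's markdown body."""
    steps = []
    for line in body.splitlines():
        match = STEP_RE.match(line)
        if match:
            steps.append(match.group(1))

    problems = []
    if not steps:
        problems.append("no numbered steps")
    elif not any(VERIFY_RE.search(step) for step in steps):
        problems.append("no step runs tests or verifies the result")
    return problems


GOOD = "1. Generate a migration file.\n2. Apply to a local DB and run tests.\n"
WEAK = "1. Generate the file.\n2. Open a PR.\n"
```

Run it over every SKILL.md in the repo and fail the build on a non-empty result.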

5. Single-session for everything

Long-running sessions accumulate context drift. The agent forgets what was decided 80 messages ago. Use multiple sessions for distinct phases.

What’s coming

  • Multi-agent orchestration built into IDEs (writer/reviewer/critic patterns as a first-class feature).
  • Skill discovery — Claude Code suggesting skills you might want based on your repo.
  • Sandboxed skills — running skill-provided scripts in a hermetic environment.
  • Org-level skills — companies publishing internal skill catalogs.

The trajectory is clear: skills become the unit of organizational knowledge for AI workflows.

Read this next

If you want a starter Claude Code Skills repo with the four core skills above wired up for a Python+TypeScript monorepo, it’s at rajpoot.dev.