KEEP IT · SANDBOXED · functional: partial · tested 2026-05-17

// sandboxed in Ubuntu 24.04 · aarch64 · why not fully functional: repo structure, frontmatter, cross-references, naming conventions, and file references were all verified in the sandbox. Full functional testing requires Claude Code or another supported harness with a running LLM session, which is beyond sandbox scope. Skill content quality was assessed via a desk review of all 14 SKILL.md files.

Superpowers

Item: Superpowers
Rating: 4
Author: GearScope

by Jesse Vincent (obra) · https://github.com/obra/superpowers · MIT · v5.1.0 · updated 2026-05-04

The 193K-star skills framework that makes AI coding agents follow disciplined engineering workflows.

⚙ ⚙ ⚙ ⚙ ⚙ 4 / 5

quality 4/5

documentation 4/5

setup 4/5

value 5/5

ecosystem fit 4/5

// bottom line

Superpowers is the most mature and thorough agent skills framework available today. Its 14 skills form a coherent software development methodology that works across 8 different agent platforms. A few rough edges (frontmatter spec violations, heavy token load, dogmatic process for simple tasks) keep it from perfection, but for any team relying on AI coding agents, this is the standard to beat.

Claude Code marketplace

$ /plugin install superpowers@claude-plugins-official

Codex CLI

$ /plugins → search "superpowers" → Install Plugin

Cursor

$ /add-plugin superpowers (in Agent chat)

install if

Teams using AI coding agents daily. If Claude Code, Gemini CLI, Cursor, or Codex are part of your workflow, Superpowers gives those agents a coherent development methodology. The 14-skill pipeline covers brainstorming through branch finishing.
Developers who want TDD discipline in agent workflows. The test-driven-development skill is the centerpiece. It enforces the red-green-refactor cycle, requiring a failing test before any production code is written, which catches regressions early.
Multi-platform agent teams. Works across 8 harnesses (Claude Code, Gemini CLI, Codex, OpenCode, and more). One methodology, every platform.

skip if

Solo developers with simple projects. The 14-skill pipeline is designed for structured software development. If you are asking an agent to write one-off scripts, the process overhead is not worth it.
Teams that dislike opinionated workflows. Superpowers is dogmatic about process. It explicitly says "Violating the letter of the rules is violating the spirit of the rules." If you prefer flexible, pick-what-you-need tooling, the rigidity will frustrate you.
Developers with limited context budgets. Each skill loads 1,000-2,000 tokens. Running the full pipeline consumes 15,000+ tokens per task. On tight context windows, this is a real cost.

What It Does

Superpowers is a plugin for AI coding agents (Claude Code, Codex CLI, Cursor, Gemini CLI, OpenCode, GitHub Copilot CLI, and others) that installs 14 interlocking skills. These skills enforce a disciplined software development methodology: brainstorm before building, write specs, create implementation plans with bite-sized tasks, execute via subagents with two-stage code review, and practice strict test-driven development. The skills trigger automatically when you start a session. You do not invoke them manually. You just start working, and the agent follows the methodology.

The core loop goes like this: you describe what you want; the agent asks clarifying questions one at a time, writes a spec, creates a detailed plan with exact file paths and code, then dispatches a fresh subagent to implement each task. Each task gets a spec compliance review followed by a code quality review. Because every subagent starts with a clean context, the agent can keep working for hours without losing focus.
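To make that control flow concrete, here is a minimal Python sketch of the per-task loop. It is not Superpowers' implementation (the real skills are markdown instructions, not code); `Subagent`, `execute_plan`, and the two reviewer callables are hypothetical stand-ins. What it illustrates is structural: a new subagent object per task means no context carries over between tasks, and the code quality review only runs once spec compliance has passed.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Subagent:
    """Hypothetical stand-in for a dispatched subagent."""
    context: list[str] = field(default_factory=list)  # starts empty on every dispatch

    def implement(self, task: str) -> str:
        self.context.append(task)  # the task is the only thing this subagent sees
        return f"implementation of {task!r}"


def execute_plan(
    tasks: list[str],
    spec_review: Callable[[str, str], bool],
    quality_review: Callable[[str], bool],
) -> list[tuple[str, str, int]]:
    """Run each plan task in a fresh subagent, then apply the two review stages in order."""
    log = []
    for task in tasks:
        agent = Subagent()  # fresh instance per task: clean context
        result = agent.implement(task)
        if not spec_review(task, result):  # stage 1: does it match the spec?
            verdict = "failed spec review"
        elif not quality_review(result):  # stage 2: only reached if stage 1 passed
            verdict = "failed quality review"
        else:
            verdict = "accepted"
        # Record the verdict and how much context the subagent accumulated.
        log.append((task, verdict, len(agent.context)))
    return log
```

For example, `execute_plan(["add model", "add route"], lambda t, r: t in r, lambda r: "route" not in r)` accepts the first task and fails the second at the quality stage, and each subagent ends with exactly one item of context. A real run would send a failing task back for fixes rather than just recording the verdict.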

The Good

Deeply coherent methodology. The 14 skills form a genuine pipeline, not a grab bag. Brainstorming flows to writing-plans, which flows to subagent-driven-development, which flows to finishing-a-development-branch. Each skill explicitly names its successor. The cross-references are all valid (verified in sandbox: 17 cross-skill references, all resolve to real skills). This is the only agent skill framework I have seen where the skills compose into a complete workflow rather than being standalone tips.
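The check behind "17 cross-skill references, all resolve" is easy to reproduce. A Python sketch, assuming references are written as `superpowers:<skill-name>` (the form the smoke test checks) with lowercase, hyphenated names, and that skill bodies have already been read into a dict keyed by skill name. `check_cross_refs` is a hypothetical helper, not part of the repo.

```python
import re

# Assumed reference syntax: "superpowers:<skill-name>", lowercase words joined by hyphens.
SKILL_REF = re.compile(r"superpowers:([a-z0-9]+(?:-[a-z0-9]+)*)")


def check_cross_refs(skills: dict[str, str]) -> tuple[int, list[tuple[str, str]]]:
    """Return (total references found, list of (source_skill, missing_target))."""
    known = set(skills)
    total = 0
    missing = []
    for source, body in skills.items():
        for target in SKILL_REF.findall(body):
            total += 1
            if target not in known:
                missing.append((source, target))
    return total, missing
```

With `{"brainstorming": "Then invoke superpowers:writing-plans.", "writing-plans": "Hand off to superpowers:subagent-driven-development."}`, the function finds two references and reports the second as unresolved, because no `subagent-driven-development` entry exists in that toy dict.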

Anti-rationalization engineering. The standout design choice. Every discipline-enforcing skill (TDD, systematic-debugging, brainstorming) includes a "Common Rationalizations" table that catalogs the exact excuses agents make when they want to skip the process, paired with rebuttals. The TDD skill lists 11 rationalizations ("Too simple to test", "I'll test after", "Deleting X hours is wasteful") and explicitly counters each one. This is not documentation. It is adversarial prompt engineering tuned to resist an agent's natural inclination to cut corners. The writing-skills SKILL.md explains the technique in detail, including a testing methodology where you run pressure scenarios without the skill, document what the agent does wrong, then write the skill to address those specific failures. This is sophisticated behavioral design.

Eight-platform support with zero dependencies. Superpowers runs on Claude Code, Codex CLI, Codex App, Cursor, Gemini CLI, OpenCode, Factory Droid, and GitHub Copilot CLI. Each platform gets its own tool mapping reference (separate files for Codex, Copilot, Gemini). The session-start hook detects which platform is running and emits the right JSON format for each. The project has a stated zero-dependency policy, and it means it: no npm packages, no Python libraries, no external tools. Just markdown skills and a bash hook.
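The hook itself is bash, but the detect-then-format idea can be shown in a short Python sketch. Only `CURSOR_PLUGIN_ROOT` comes from this review (see the Cursor setup steps); the Claude Code payload follows the SessionStart output shape described in Claude Code's hooks documentation (`hookSpecificOutput.additionalContext`), and the fallback shape is a hypothetical placeholder. The real hook distinguishes more platforms and may rely on different signals.

```python
import json
import os
from collections.abc import Mapping
from typing import Optional


def detect_platform(env: Optional[Mapping[str, str]] = None) -> str:
    """Guess the running harness from environment variables (simplified)."""
    env = os.environ if env is None else env
    if env.get("CURSOR_PLUGIN_ROOT"):
        return "cursor"
    return "claude-code"  # assumption: treat Claude Code as the default target


def session_start_json(platform: str, context: str) -> str:
    """Wrap the injected skill text in a platform-specific JSON envelope."""
    if platform == "claude-code":
        payload = {
            "hookSpecificOutput": {
                "hookEventName": "SessionStart",
                "additionalContext": context,
            }
        }
    else:
        payload = {"additional_context": context}  # hypothetical shape for other harnesses
    return json.dumps(payload)
```

The design point is that skill content stays identical across platforms; only the envelope changes, which is why one bash script can serve every harness.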

Exceptional contributor guardrails. The AGENTS.md file (which doubles as CLAUDE.md) includes a brutally honest warning to AI agents: "This repo has a 94% PR rejection rate." It catalogs exactly what will get a PR closed (fabricated content, compliance rewrites, bulk PRs, fork-specific changes) and requires a session transcript proving the integration works for any new harness support. The acceptance test is specific: open a clean session, send "Let's make a react todo list," and verify brainstorming auto-triggers. This is a project that has learned hard lessons from AI-generated contributions.

Real testing infrastructure. 26 test scripts across multiple categories: skill triggering, explicit skill requests, subagent-driven development, code review, worktree handling, OpenCode plugin loading, brainstorm server tests. The tests are not stubs. The code review test plants real bugs (SQL injection, plaintext password handling) and asserts the reviewer catches them. The subagent-driven development integration test runs an actual end-to-end session.
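The planted-bug pattern is worth copying. A minimal Python sketch of how such an assertion could work, assuming the reviewer's output is available as plain text; the fixture, `EXPECTED_FINDINGS`, and `missed_bugs` are hypothetical and not taken from the Superpowers test suite.

```python
# Hypothetical fixture: the kind of code a reviewer test might plant bugs in.
VULNERABLE_FIXTURE = '''
def find_user(db, name):
    # SQL injection: user input is concatenated straight into the query
    return db.execute("SELECT * FROM users WHERE name = '" + name + "'")

def create_user(db, name, password):
    # Plaintext password: stored without hashing
    db.execute("INSERT INTO users (name, password) VALUES (?, ?)", (name, password))
'''

# Each planted bug maps to phrases that count as the reviewer "catching" it.
EXPECTED_FINDINGS = {
    "sql-injection": ("sql injection", "parameterized"),
    "plaintext-password": ("plaintext", "plain text", "hash"),
}


def missed_bugs(review_text: str) -> list[str]:
    """Return the planted bugs that the review text never mentions."""
    text = review_text.lower()
    return [
        bug
        for bug, phrases in EXPECTED_FINDINGS.items()
        if not any(phrase in text for phrase in phrases)
    ]
```

A review that only says "SQL injection in find_user" would fail the test with `["plaintext-password"]` missed. Keyword matching is crude (a review praising "hashed" passwords would count as a catch), so a stricter harness might parse structured findings instead.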

The Bad

Self-consistency violations in its own spec. The writing-skills SKILL.md defines strict rules: descriptions must start with "Use when," and frontmatter must be under 1,024 bytes. Yet the brainstorming skill's description starts with "You MUST use this before any creative work," and the writing-plans frontmatter (2,996 bytes) and writing-skills frontmatter (4,168 bytes) both exceed the 1,024-byte limit. For a project that enforces its rules with "Violating the letter of the rules is violating the spirit of the rules," these violations sting. The project acknowledges that its philosophy differs from Anthropic's guidance, but these are violations of its own guidance.
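Both rules are mechanical, so they can be linted. A Python sketch that applies them to a SKILL.md string, assuming the frontmatter sits between the first pair of `---` lines, that the size limit applies to the bytes between those delimiters, and that `description:` fits on one line (YAML block scalars are not handled). `check_skill` and its helpers are hypothetical, and the byte counts reported above may have been measured slightly differently.

```python
def split_frontmatter(text: str) -> str:
    """Return the raw text between the opening and closing '---' lines."""
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        raise ValueError("no frontmatter")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "".join(lines[1:i])
    raise ValueError("unterminated frontmatter")


def description_of(frontmatter: str) -> str:
    """Extract a single-line description value, with surrounding quotes removed."""
    for line in frontmatter.splitlines():
        if line.startswith("description:"):
            return line.split(":", 1)[1].strip().strip("\"'")
    return ""


def check_skill(text: str, max_bytes: int = 1024) -> list[str]:
    """List violations of the size and 'Use when' rules for one SKILL.md."""
    frontmatter = split_frontmatter(text)
    problems = []
    size = len(frontmatter.encode("utf-8"))  # bytes, not characters
    if size > max_bytes:
        problems.append(f"frontmatter is {size} bytes (limit {max_bytes})")
    if not description_of(frontmatter).startswith("Use when"):
        problems.append('description does not start with "Use when"')
    return problems
```

Run over a description like "You MUST use this before any creative work," the linter flags the "Use when" rule; a frontmatter block padded past 1,024 bytes trips the size rule.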

Massive token overhead. The 14 SKILL.md files total 3,207 lines and approximately 17,000 words. The session-start hook injects the using-superpowers skill (787 words) into every conversation. Then each relevant skill loads on demand. The writing-skills skill alone is 655 lines (3,212 words). The brainstorming skill is 164 lines. For agents with limited context windows, this is a real constraint. The project acknowledges the problem and has guidelines for token efficiency, but several skills are far above the targets they set (getting-started skills should be under 150 words, others under 500).
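Word counts translate into context cost roughly as follows. This Python sketch uses the common rule of thumb of about 0.75 English words per token, which is an assumption rather than a measurement; the word counts are the ones quoted above.

```python
# Rough rule of thumb for English prose: about 0.75 words per token.
# This ratio is an assumption; real counts depend on the model's tokenizer.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(words: int) -> int:
    """Convert a word count into an approximate token count."""
    return round(words / WORDS_PER_TOKEN)


word_counts = {
    "using-superpowers (injected every session)": 787,
    "writing-skills": 3212,
    "all 14 SKILL.md files": 17000,
}
for label, words in word_counts.items():
    print(f"{label}: ~{estimate_tokens(words):,} tokens")
```

Under that heuristic, the always-injected skill costs about 1,049 tokens, writing-skills about 4,283, and the full skill set about 22,667. Actual figures will vary by tokenizer.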

Heavy process for light tasks. The brainstorming skill has a 9-step mandatory checklist that applies to "EVERY project regardless of perceived simplicity." The anti-pattern section explicitly states: "A todo list, a single-function utility, a config change, all of them." This means fixing a typo in a config file triggers: explore project context, ask clarifying questions one at a time, propose 2-3 approaches, present design in sections, write a spec document, self-review the spec, get user approval, then move to planning. Experienced developers will find this process suffocating for routine work. The project's response is that "simple" projects are where assumptions cause waste, but there is a genuine usability cost here.

The "human partner" convention. Every skill refers to the user as "your human partner." The contributor guidelines explain this is deliberate and not interchangeable with "the user." In practice, it reads as patronizing and creates a mildly odd dynamic where the agent addresses you as a partner it must protect from itself. This is a style choice, not a bug, but it makes the skills feel more like a personality overlay than a neutral tool.

Smoke Test Results

This is a framework that targets 8 declared platforms, so the smoke test is structural: do the skill files exist, does the metadata pass the framework's own spec, do the cross-skill references resolve, and do the per-platform tool mappings exist? It ran in a clean, isolated Linux sandbox; no harness was executed (hence the "functional partial" tag above).

Structural validation

$ verify all 14 SKILL.md files have valid YAML frontmatter
✅ 14/14 valid (name + description fields present)
$ verify skill names match their directory names
✅ 14/14 match
$ verify cross-skill references resolve (superpowers:xxx)
✅ 17/17 resolve to real skills
$ verify local file references resolve
✅ 3/3 resolve
$ verify session-start hook bash shebang
✅ valid
$ verify README documents install for declared harnesses
✅ 5/5 harnesses (Claude Code, Codex, Cursor, Gemini, OpenCode)
$ verify test scripts present
✅ 26 test scripts found
$ check frontmatter sizes against the skill's own 1024-byte spec
❌ writing-plans = 2996 bytes; writing-skills = 4168 bytes (spec violations)
$ check description follows the "Use when" convention
❌ brainstorming description starts with "You MUST use this" instead of "Use when"
$ confirm license declared
✅ MIT

Pass rate: 8 of 10. The two failures are not bugs in the skills' behavior; they are the framework violating its own published spec. That is worth flagging because the framework explicitly states "Violating the letter of the rules is violating the spirit of the rules." See The Bad above for details.
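For readers who want to rerun one of the passing checks, here is a Python sketch of "skill names match their directory names." It assumes a `skills/<skill-name>/SKILL.md` layout with a single-line `name:` field inside the frontmatter; the helper names are hypothetical, and this is an illustration rather than the exact script behind the log above.

```python
from pathlib import Path


def frontmatter_name(skill_md: Path) -> str:
    """Return the `name:` value from the leading --- frontmatter block, or ""."""
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    for line in lines[1:]:
        if line.strip() == "---":  # end of frontmatter: stop searching
            break
        if line.startswith("name:"):
            return line.split(":", 1)[1].strip().strip("\"'")
    return ""


def name_mismatches(skills_root: Path) -> list[tuple[str, str]]:
    """List (directory, frontmatter name) pairs that disagree."""
    mismatches = []
    for skill_md in sorted(skills_root.glob("*/SKILL.md")):
        directory = skill_md.parent.name
        name = frontmatter_name(skill_md)
        if name != directory:
            mismatches.append((directory, name))
    return mismatches
```

Pointed at a checkout's skills directory, an empty list corresponds to the "14/14 match" line in the log.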


Setup Walkthrough

For Claude Code (the primary target):

Install from the official marketplace: /plugin install superpowers@claude-plugins-official
Start a new Claude Code session.
The session-start hook auto-injects the using-superpowers skill into your conversation.
Say "Let's build a React todo list" or any build request.
The agent will automatically trigger the brainstorming skill and start asking questions.

For Cursor:

In Cursor Agent chat: /add-plugin superpowers
Start coding. The hook detects Cursor via CURSOR_PLUGIN_ROOT and injects context.

For Gemini CLI:

gemini extensions install https://github.com/obra/superpowers
Start a session. The GEMINI.md bootstrap loads automatically.

The install is one command per platform. No configuration files, no environment variables, no API keys needed. This is as frictionless as agent skill installation gets.

Alternatives

anthropics/skills -- Official Anthropic skills repo, lighter weight, fewer opinions about process. Better if you want skill snippets without a prescribed methodology.
DenisSergeevitch/agents-best-practices -- Provider-neutral best practices across Codex, Claude Code, and other harnesses. Less opinionated, more of a reference than a framework.
addyosmani/agent-skills -- Curated production-grade engineering skills by Addy Osmani. Individual skills you pick and choose, not a complete pipeline.

// review provenance

reviewed by: GearScope
tested: 2026-05-17 · macOS (Apple Silicon)
last verified: 2026-05-17
depth: SANDBOXED
sponsorship: none, ever
