an independent review publication · est. 2026

We tested 47 skills
so you don't have to.

12 had broken setups. 6 crashed on launch. Only 3 actually saved us time. We read the docs, smoke-test what we can, and tell you which agent skills are worth your time. No sponsors. No affiliate links.

Read the reviews → How we test

~/gearscope/reviews.log

$ tail -f ~/gearscope/reviews.log

[2026-05-15] comfyui ................. 4/5 KEEP IT

[2026-05-17] claude-code-design-ai ... — testing

[2026-05-19] native-feel-skill ....... — testing

[2026-05-21] neuralinverse ........... — queued

# we read every doc. we run what we can. we're honest when we can't.

# no sponsors. no affiliate links.

// latest reviews

Recently tested

Real agent skills. Real smoke tests. Real verdicts.

KEEP IT · HANDS-ON

Matt Pocock Skills

Matt Pocock's opinionated skill set brings grilling, deep-module design, and a tracker-backed ticket flow to coding a...

⚙ ⚙ ⚙ ⚙ ⚙ 4 / 5

KEEP IT · HANDS-ON

googleapis/mcp-toolbox

The biggest official-vendor agent-infrastructure gap in GearScope's coverage. Google ships a single Go bina...

⚙ ⚙ ⚙ ⚙ ⚙ 4 / 5

KEEP IT · HANDS-ON

Graphify

The 89,000-star graph engine that maps your codebase as a real graph (no embeddings) and lets agents query, path, and...

⚙ ⚙ ⚙ ⚙ ⚙ 4 / 5

IN QUEUE + 10 more

See all reviews →

🏆 THE STORY: HEADROOM ENDS GRAPHIFY'S EIGHT-PEAT

headroom: +521 → 60,483★. Leads the gain board for the first time, snapping Graphify's 8-scan streak.

ponytail: 66-window gap-widening streak broken (headroom outgained ponytail, gap narrowed 116).

Graphify still pulls away from caveman (gap 663 → 864).

NVIDIA/SkillSpector: 93rd consecutive (+29 → 13,447★; record extended a 43rd time past 50). microsoft/SkillOpt narrowed the gap for a 4th consecutive scan (274 → 257).

Smithery directory scrape.

New established MCP tracks (10): borski/travel-hacking-toolkit + classfang/ssh-mcp-server + wshobson/maverick-mcp + RyanAlberts/best-of-Agent-Harnesses + Lyellr88/marm-memory + DAWNCR0W/affine-mcp-server + dmmulroy/overseer + finite-sample/rmcp + Qoyyuum/mcp-metatrader5-server + zilliztech/mcp-server-milvus [OFFICIAL]

New established skill tracks (10): JackyST0/awesome-agent-skills + JimLiu/baocut + Peiiii/nextclaw + contentful/skill-kit [OFFICIAL] + fallow-rs/fallow-skills + ollygarden/opentelemetry-agent-skills + simota/agent-skills + mblode/agent-skills + HLND2T/CS2_VibeSignatures + PixVerseAI/skills [OFFICIAL]...

// the workflow

How we test

Every skill goes through our testing pipeline. We're transparent about what we could and couldn't test. Each review is labeled with its test depth.

Read the docs, check the code

We load the full SKILL.md, every script, every reference file. We check dependencies, look for footguns, and map out what the skill actually does vs what it claims.

Test in an isolated sandbox

Every install runs inside a clean, throwaway Linux sandbox. Nothing touches our machine, and the full session (test script, commands, raw log) is published with the review so any reader can re-run the exact test that produced the verdict. When a skill can't be sandboxed (desktop apps, GPU-heavy workloads), we test live and explain why on the page. Every review labels its tier: sandboxed, hands-on, smoke test, or desk review.
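
A throwaway sandbox run like the one described above could be sketched roughly as follows. This is an illustration only: the page does not say which sandbox technology GearScope uses, so Docker, the `ubuntu:24.04` image, the `/work` mount point, and the `buildSandboxCommand` helper are all assumptions made for the example.

```javascript
// Hypothetical helper: builds the argument list for one throwaway container run.
// Docker, the image name, and the paths are assumptions, not GearScope's actual setup.
function buildSandboxCommand({ image = "ubuntu:24.04", scriptDir, script }) {
  if (!scriptDir || !script) {
    throw new Error("scriptDir and script are required");
  }
  return [
    "docker", "run",
    "--rm",                         // delete the container when the test ends
    "-v", `${scriptDir}:/work:ro`,  // mount the test script directory read-only
    "-w", "/work",                  // run from the mounted directory
    image,
    "bash", script,
  ];
}

const cmd = buildSandboxCommand({ scriptDir: "/tmp/review-001", script: "test.sh" });
console.log(cmd.join(" "));
```

Redirecting the command's output to a file would give a raw log that can be published next to the test script, so a reader can re-run the same session.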

Rate across four dimensions

1 to 5 gears each for docs, setup, value, reliability. No inflated scores. A 3 is average. Most skills land there.

Ship the verdict

KEEP IT, TRY IT, SKIP IT, or BROKEN. Popular skills get SKIP IT when they deserve it. Honest beats nice.
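
The tiers, dimensions, and verdicts listed in the steps above suggest a simple shape for a review record. The sketch below is one possible way to validate such a record; the field names, the `validateReview` helper, and the integer-only rule for gears are assumptions, and the sample ratings are invented for illustration.

```javascript
// Allowed values as listed on this page.
const TIERS = ["sandboxed", "hands-on", "smoke test", "desk review"];
const VERDICTS = ["KEEP IT", "TRY IT", "SKIP IT", "BROKEN"];
const DIMENSIONS = ["docs", "setup", "value", "reliability"];

// Hypothetical validator: returns a list of problems, empty if the record is well-formed.
function validateReview(review) {
  const problems = [];
  if (!TIERS.includes(review.tier)) problems.push(`unknown tier: ${review.tier}`);
  if (!VERDICTS.includes(review.verdict)) problems.push(`unknown verdict: ${review.verdict}`);
  for (const dim of DIMENSIONS) {
    const gears = review.ratings?.[dim];
    // Assumption: gears are whole numbers from 1 to 5.
    if (!Number.isInteger(gears) || gears < 1 || gears > 5) {
      problems.push(`${dim} must be an integer from 1 to 5`);
    }
  }
  return problems;
}

// Illustrative record; the per-dimension numbers are made up for the example.
const example = {
  skill: "example-skill",
  tier: "hands-on",
  verdict: "KEEP IT",
  ratings: { docs: 4, setup: 3, value: 4, reliability: 4 },
};
console.log(validateReview(example)); // []
```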

// sound familiar?

The skill install graveyard

Every hour spent untangling a broken skill is an hour not building. Here's what we catch before you do.

The problem

How we catch it

✗ Setup guide skipped three steps and your terminal is still throwing errors.

→ We install on a clean machine and list every missing step in the review.

✗ Two skills conflict and nobody warned you until your agent crashed mid-task.

→ We test in isolation and alongside common skills, then flag the conflicts.

✗ Every "top skills" list is just the README rephrased with extra adjectives.

→ We paste the real errors, real tracebacks, and the config that broke.

✗ Two hours configuring something that turns out not to work on macOS.

→ We test on macOS, Linux, and WSL. If it's broken on yours, we say so.

// origin story

Why this exists

One weekend, one too many broken installs.

origin.js — gearscope


// weekend project. 47 skills installed.
const broken = 12;  // setup steps were wrong
const crashes = 6;  // agent died on launch
const useful = 3;   // actually saved time

if (broken + crashes > useful * 5) {
  buildReviewSite("GearScope");
  promise("no sponsors");
  promise("real errors");
  promise("honest ratings");
}

// frequently asked

Questions you'll probably ask

How do you decide which skills to review?

We track new releases, GitHub stars, and what's being talked about in the agent communities. If a skill is getting attention or a reader requests one, it goes on the bench. We prioritize skills people are trying to install today, not last year's leaderboard.

Why no sponsors or affiliate links?

The moment a skill author pays us, every SKIP IT becomes a conversation we don't want to have. Reader tips and a future paid tier are our only revenue sources.

How rigorous is your testing?

Depends on the review. Our gold standard is sandboxed: we run the install inside a clean, isolated Linux sandbox and publish the test script and raw log so any reader can re-run it. When a skill can't be sandboxed (desktop apps, GPU work), the review is labeled hands-on and explains why. Smoke tests quickly verify the basics. Desk reviews mean we read the docs and code but didn't install the skill. Every review's depth badge tells you which tier applies. Full methodology →

How are gear ratings calibrated?

A 3 is average. Most skills land there. A 4 means you should install it. A 5 is reserved for the rare skill that's well-built and immediately useful. A 1 or 2 means the skill is broken or wastes your time. We publish the rubric next to each review.
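
The calibration above can be written down as a small lookup. This is a sketch, not GearScope's published rubric: the `describeScore` and `renderGears` names are hypothetical, the wording of each label is a paraphrase, and using a dot for an unearned gear is an assumption about how the rating might be shown in plain text.

```javascript
// Meaning of each score, paraphrased from the calibration described above.
function describeScore(score) {
  if (!Number.isInteger(score) || score < 1 || score > 5) {
    throw new RangeError("score must be an integer from 1 to 5");
  }
  if (score <= 2) return "broken or a waste of time";
  if (score === 3) return "average";
  if (score === 4) return "worth installing";
  return "rare: well-built and immediately useful";
}

// Hypothetical renderer: one gear per point earned, a dot for each point missed.
function renderGears(score) {
  describeScore(score); // reuse the range check
  return "⚙".repeat(score) + "·".repeat(5 - score) + ` ${score} / 5`;
}

console.log(renderGears(4), "=", describeScore(4));
```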

Can I request a skill review?

Yes. Email reviews@gearscope.xyz or DM us on Twitter/X with the skill name and link. We can't promise we'll cover every request, but reader-requested skills jump the queue.

// support the mission

Keep the reviews honest

No sponsors means we answer only to you. Chip in if a review saved you an hour.

⚡ Tip in crypto

no recurring · pay what you want · BTC / ETH / SOL
