an independent review publication · est. 2026

Agent skills reviewed
brutally honestly.

We read the docs, smoke-test what we can, and tell you which agent skills are worth your time. MCP servers, Hermes skills, Cursor rules, skill packs. We cover them all. No sponsors. No affiliate links.

~/gearscope/reviews.log
$ tail -f ~/gearscope/reviews.log

[2026-05-15]  comfyui  .................  4/5  KEEP IT
[2026-05-17]  claude-code-design-ai  ...   —   testing
[2026-05-19]  native-feel-skill  .......   —   testing
[2026-05-21]  neuralinverse  ...........   —   queued

# we read every doc. we run what we can. we're honest when we can't.
# no sponsors. no affiliate links.

$
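If you want to read that log programmatically, here is a minimal sketch. The field names and the regular expression are our assumptions, inferred only from the four sample lines above, not a published format.

```javascript
// Hypothetical parser for the review log lines shown above.
// Assumed line shape: [date]  skill-name  ....  score-or-dash  status
const LINE_PATTERN = /^\[(\d{4}-\d{2}-\d{2})\]\s+(\S+)\s+\.+\s+(\d\/5|—)\s+(.+)$/;

function parseReviewLine(line) {
  const match = LINE_PATTERN.exec(line.trim());
  if (match === null) return null; // comments, prompts, blank lines

  const [, date, skill, rating, status] = match;
  return {
    date,
    skill,
    // A dash in the score column marks a review with no score yet.
    score: rating === "—" ? null : Number(rating[0]),
    status,
  };
}

console.log(parseReviewLine("[2026-05-15]  comfyui  .................  4/5  KEEP IT"));
```

Lines that don't match (the `#` comments, the bare prompt) come back as `null`, so a caller can filter them out.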
// latest reviews

Recently tested

Real agent skills. Real smoke tests. Real verdicts.

// the workflow

How we test

Every skill goes through our testing pipeline. We're transparent about what we could and couldn't test. Each review is labeled with its test depth.

01

Read the docs, check the code

We load the full SKILL.md, every script, every reference file. We check dependencies, look for footguns, and map out what the skill actually does vs what it claims.

02

Test in an isolated sandbox

Every install runs inside a clean, throwaway Linux sandbox. Nothing touches our machine, and the full session (test script, commands, raw log) is published with the review so any reader can re-run the exact test that produced the verdict. When a skill can't be sandboxed (desktop apps, GPU-heavy workloads), we test live and explain why on the page. Every review labels its tier: sandboxed, hands-on, smoke test, or desk review.

03

Rate across four dimensions

1 to 5 gears each for docs, setup, value, reliability. No inflated scores. A 3 is average. Most skills land there.
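
A rough sketch of what a rating record could look like. The four dimension names come from the step above; the function names, the sample scores, and the rounded-mean summary are assumptions for illustration, since the page does not say how an overall score is derived.

```javascript
// The four dimensions named in the text.
const DIMENSIONS = ["docs", "setup", "value", "reliability"];

// Hypothetical check: every dimension must be an integer from 1 to 5.
function validateRatings(ratings) {
  for (const dim of DIMENSIONS) {
    const score = ratings[dim];
    if (!Number.isInteger(score) || score < 1 || score > 5) {
      throw new RangeError(`${dim} must be an integer from 1 to 5, got ${score}`);
    }
  }
  return true;
}

// Assumption: the overall score is the rounded mean of the four dimensions.
function overallScore(ratings) {
  validateRatings(ratings);
  const total = DIMENSIONS.reduce((sum, dim) => sum + ratings[dim], 0);
  return Math.round(total / DIMENSIONS.length);
}

// Made-up scores, not taken from any published review.
const example = { docs: 4, setup: 3, value: 5, reliability: 4 };
console.log(`${overallScore(example)}/5`); // 16 / 4 = 4, prints "4/5"
```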

04

Ship the verdict

KEEP IT, TRY IT, SKIP IT, or BROKEN. Popular skills get SKIP IT when they deserve it. Honest beats nice.
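
Put together, each review carries a test tier and a verdict. A minimal sketch of that record follows; the labels are the ones listed above, while `makeReview`, the field names, and the sample skill are hypothetical.

```javascript
// Labels as they appear in the text above.
const TIERS = ["sandboxed", "hands-on", "smoke test", "desk review"];
const VERDICTS = ["KEEP IT", "TRY IT", "SKIP IT", "BROKEN"];

// Hypothetical constructor: rejects any tier or verdict outside the lists.
function makeReview({ skill, tier, verdict }) {
  if (!TIERS.includes(tier)) {
    throw new Error(`unknown test tier: ${tier}`);
  }
  if (!VERDICTS.includes(verdict)) {
    throw new Error(`unknown verdict: ${verdict}`);
  }
  return Object.freeze({ skill, tier, verdict });
}

// "example-skill" is a placeholder, not a real review.
const review = makeReview({ skill: "example-skill", tier: "sandboxed", verdict: "TRY IT" });
console.log(`${review.skill}: ${review.verdict} (${review.tier})`);
```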

// sound familiar?

The skill install graveyard

Every hour spent untangling a broken skill is an hour not building. Here's what we catch before you do.

The problem: Setup guide skipped three steps and your terminal is still throwing errors.
How we catch it: We install on a clean machine and list every missing step in the review.

The problem: Two skills conflict and nobody warned you until your agent crashed mid-task.
How we catch it: We test in isolation and alongside common skills, then flag the conflicts.

The problem: Every "top skills" list is just the README rephrased with extra adjectives.
How we catch it: We paste the real errors, real tracebacks, and the config that broke.

The problem: Two hours configuring something that turns out not to work on macOS.
How we catch it: We test on macOS, Linux, and WSL. If it's broken on yours, we say so.

// origin story

Why this exists

One weekend, one too many broken installs.

origin.js — gearscope
// weekend project. 47 skills installed.
const broken = 12; // setup steps were wrong
const crashes = 6; // agent died on launch
const useful = 3; // actually saved time

if (broken + crashes > useful * 5) {
  buildReviewSite("GearScope");
  promise("no sponsors");
  promise("real errors");
  promise("honest ratings");
}
// frequently asked

Questions you'll probably ask

How do you choose which skills to review?
We track new releases, GitHub stars, and what's being talked about in the agent communities. If a skill is getting attention or a reader requests one, it goes on the bench. We prioritize skills people are trying to install today, not last year's leaderboard.

Why no sponsors or affiliate links?
The moment a skill author pays us, every SKIP IT becomes a conversation we don't want to have. Reader tips and a future paid tier are our only revenue sources.

How deeply do you test each skill?
Depends on the review. Our gold standard is sandboxed: we run the install inside a clean, isolated Linux sandbox and publish the test script and raw log so any reader can re-run it. When a skill can't be sandboxed (desktop apps, GPU work), the review is labeled hands-on and explains why. Smoke tests are a quick verification of the basics. Desk reviews mean we read the skill but didn't install it. Every review's depth badge tells you which tier applies. Full methodology →

What do the ratings mean?
A 3 is average. Most skills land there. 4 means you should install it. 5 is reserved for the rare skill that's well-built and immediately useful. 1 and 2 mean the skill is broken or a waste of your time. We publish the rubric next to each review.

Can I request a review?
Yes. Email hello@gearscope.xyz or DM us on Twitter/X with the skill name and link. We can't promise we'll cover every request, but reader-requested skills jump the queue.
// weekly digest

Get the skill digest

Five skills that mattered. Honest ratings. Every Friday.

one-click unsubscribe.
// support the mission

Keep the reviews honest

No sponsors means we answer only to you. Chip in if a review saved you an hour.