Ninjō · Training · Video 3

Continuous improvement: the 3 buckets

Every time you review an agent you'll find issues. The trick is not to fix them at random but to sort each finding into one of three buckets and handle it with the approach that bucket calls for. That turns QA from firefighting into a system.

Intake: You review the agent. Real conversations · metrics · QA · what the autotester reports
You classify: Each finding → 1 bucket. Is it broken, can it be optimized, or is it ready to grow?
Output: You attack by priority. One hypothesis at a time · measure · roll back if it got worse
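
The classification step can be sketched in a few lines of Python. This is only an illustration: the Finding fields, the Bucket enum, and the helper names are hypothetical and not part of any Ninjō tooling. It assumes the three questions are asked in order (broken, optimizable, ready to grow) and that bucket 01 is worked first because it is the one marked high priority.

```python
from dataclasses import dataclass
from enum import Enum


class Bucket(Enum):
    FIXES = 1        # broken -> fix it
    PERFORMANCE = 2  # works -> optimize
    SCALING = 3      # ready -> grow


@dataclass
class Finding:
    # Hypothetical fields: the reviewer answers these two questions by hand.
    description: str
    is_broken: bool         # the agent is not doing what it should
    can_be_optimized: bool  # it works, but conversion could be higher


def classify(finding: Finding) -> Bucket:
    """Put a finding in exactly one bucket by asking the questions in order."""
    if finding.is_broken:
        return Bucket.FIXES
    if finding.can_be_optimized:
        return Bucket.PERFORMANCE
    return Bucket.SCALING


def prioritize(findings: list[Finding]) -> list[Finding]:
    """Order findings so bucket 01 comes first (assumed priority order)."""
    return sorted(findings, key=lambda f: classify(f).value)
```

A finding that is both broken and optimizable lands in Fixes, which matches the idea that broken behavior is handled before anything else.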
BUCKET 01

Fixes

Broken → fix it
What it is

Behaviors that are wrong. The agent isn't doing what it should. High priority: this loses money today.

Typical signs
Repeats itself · Sends the link too early · Interrogates · Makes things up · Doesn't sound like you · Doesn't identify the lead
How you work it
  • One hypothesis at a time (don't touch 5 things at once).
  • Surgical change to the prompt, then re-test.
  • Always with rollback on hand in case it gets worse.
Ties into the vision (Matt)
The autotester catches these on its own: it runs after every deploy and verifies the agent still behaves as expected. And the wiki's troubleshooting tree takes you from symptom (e.g. "reasoning leakage") to cause to fix.
What you ask Claude

"My agent sends the payment link too early. Fix it without touching the rest and test it first."

BUCKET 02

Performance

Works → optimize
What it is

The agent already works well. Now the game is raising conversion: getting more bookings and more sales from the same leads.

Where the needle moves
Better objection reply · Link timing · Highest-converting keyword · Finer qualification · Tone / rapport
How you work it
  • You look at the funnel and the best/worst conversations.
  • You optimize the stage that leaks most.
  • Mental A/B: make one change, then measure against the prior period.
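
As a rough sketch of "optimize the stage that leaks most", the snippet below finds the funnel transition with the highest drop-off rate. The stage names and counts are made up for illustration, and the function name is hypothetical. Real numbers would come from your own metrics.

```python
def leakiest_stage(funnel: list[tuple[str, int]]) -> tuple[str, float]:
    """Return the transition with the highest drop-off rate, and that rate.

    `funnel` is an ordered list of (stage name, number of leads reaching it).
    """
    worst_name, worst_drop = "", -1.0
    for (name_a, count_a), (name_b, count_b) in zip(funnel, funnel[1:]):
        if count_a == 0:
            continue  # nobody reached this stage, so there is nothing to lose
        drop = 1 - count_b / count_a
        if drop > worst_drop:
            worst_name, worst_drop = f"{name_a} -> {name_b}", drop
    return worst_name, worst_drop


# Hypothetical counts for one period.
example_funnel = [
    ("lead", 200),
    ("qualified", 120),
    ("link sent", 90),
    ("booked", 30),
]
stage, drop = leakiest_stage(example_funnel)
print(f"Biggest leak: {stage} ({drop:.0%} lost)")
```

With these example counts the largest loss is between "link sent" and "booked", so that is the stage where a single change would be tried first.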
Ties into the vision (Matt)
Making the agent "more complete and intelligent": this is where the playbooks and best practices we embed come in, so every agent is born near-optimal instead of learning it the hard way.
What you ask Claude

"At which funnel stage do I lose the most people, and what change would you try to improve conversion this week?"

BUCKET 03

Scaling

Ready → grow
What it is

The agent performs and is measured. Nothing is broken and nothing urgently needs optimizing: it's ready for more. Here you don't tinker, you multiply.

Ways to scale
More lead volume · New channel (WhatsApp, web) · Second agent / another offer · Another funnel stage · Another agency client
How you work it
  • You reuse the voice, templates, and archetype of the one that works.
  • Every new agent is born already measured and connected.
  • You go from running 1 to running a fleet.
Ties into the vision (Matt)
It's the underlying thesis: MCP + playbooks to build without friction, multi-platform (Claude, Codex, ChatGPT), and the wiki as a living manual, so you can scale to many clients and many agents without breaking.
What you ask Claude

"This agent performs. Build me a second one for another channel reusing its voice, and leave it measured from day 1."

The rule that runs through all three buckets
1. One hypothesis at a time
2. Measure against the prior period
3. Roll back if it got worse
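
The three-step rule above can be written as a small decision helper. This is a minimal sketch under stated assumptions: the Period type, its fields, and evaluate_change are hypothetical names, and conversion is taken to be bookings divided by leads. It also assumes only one change was made between the two periods, since otherwise the comparison cannot say which change caused the difference.

```python
from dataclasses import dataclass


@dataclass
class Period:
    # Hypothetical summary of one measurement window.
    leads: int
    bookings: int

    @property
    def conversion(self) -> float:
        """Bookings per lead; 0.0 when there were no leads."""
        return self.bookings / self.leads if self.leads else 0.0


def evaluate_change(prior: Period, current: Period) -> str:
    """Compare the period after one change against the prior period."""
    if current.conversion < prior.conversion:
        return "roll back"  # it got worse: restore the previous prompt
    return "keep"


# Hypothetical numbers: 12% before the change, 9% after it.
print(evaluate_change(Period(leads=100, bookings=12), Period(leads=100, bookings=9)))
```

With only a handful of leads per period the difference can be noise, so a longer window may be needed before deciding.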
Ninjō · Continuous improvement · the 3-buckets framework — companion resource for Training Video 3