The Best Agent Is a Boring Agent

The clever-agent trap

  • First-time teams want a clever agent → multi-step planning, reflection loops, dynamic tool selection, ever-growing memory.
  • Exciting to build → almost always falls over in production.

The pattern that actually works

  • Fixed, short list of tools.
  • Loop runs at most ~10 steps before handing back to a human.
  • No reflection phase → no inner monologue, no self-grading.
  • Deterministic failure mode → when unsure, stop and ask.
  • Boring → also reliable, cheap, auditable.
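
A minimal sketch of that pattern, assuming a 10-step cap and a policy function that returns an (action, argument) pair. The names TOOLS, Result, run_agent, scripted_policy, and the two toy tools are hypothetical and exist only for illustration; in a real system the policy would wrap a model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

MAX_STEPS = 10  # assumed hard cap, taken from the "~10 steps" guideline

# Fixed, short tool list: defined once, never extended at runtime.
# Both tools are hypothetical stand-ins.
TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup_order": lambda arg: f"order {arg}: shipped",
    "echo": lambda arg: arg,
}

Step = Tuple[str, str, str]  # (action, argument, observation)
Policy = Callable[[str, List[Step]], Tuple[str, str]]


@dataclass
class Result:
    status: str                 # "done" or "needs_human"
    answer: Optional[str]
    reason: str
    trace: List[Step] = field(default_factory=list)  # audit log of every step


def run_agent(task: str, policy: Policy, max_steps: int = MAX_STEPS) -> Result:
    """Run a bounded loop with no reflection phase; hand off when unsure."""
    trace: List[Step] = []
    for _ in range(max_steps):
        action, arg = policy(task, trace)
        if action == "finish":
            return Result("done", arg, "policy finished", trace)
        tool = TOOLS.get(action)
        if tool is None:
            # Deterministic failure mode: unknown action means stop and ask.
            return Result("needs_human", None, f"unknown action: {action!r}", trace)
        try:
            observation = tool(arg)
        except Exception as exc:
            return Result("needs_human", None, f"tool {action!r} failed: {exc}", trace)
        trace.append((action, arg, observation))
    # Step budget spent without finishing: hand back to a human.
    return Result("needs_human", None, f"step budget of {max_steps} exhausted", trace)


def scripted_policy(task: str, trace: List[Step]) -> Tuple[str, str]:
    """Stand-in for a model call: look up one order, then finish."""
    if not trace:
        return ("lookup_order", "A17")
    return ("finish", trace[-1][2])


result = run_agent("Where is order A17?", scripted_policy)
print(result.status, result.answer)
```

Every exit path returns either "done" or "needs_human", and the trace records each tool call, which is what keeps the loop auditable.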

Why “fancy” hurts

  • Every fancy feature → also a failure mode.
  • Reflection loops → add cost, can drift.
  • Dynamic tool selection → adds latency + new error surfaces.
  • Long-running memory → adds state you debug at 2am.

The rule

  • Start with the dumbest thing that could possibly work.
  • Add complexity only with a specific, measured reason.
  • Most of the time, you won’t need any.
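
One way to make "specific, measured reason" concrete is a simple gate: run the baseline agent and the candidate feature on the same fixed set of tasks and keep the feature only if the measured gain clears a threshold. This is a sketch under assumptions; the function names and the 5-point minimum gain are hypothetical, not a recommended standard.

```python
from typing import Sequence


def success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of tasks that succeeded; one bool per task."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    return sum(outcomes) / len(outcomes)


def earns_its_keep(
    baseline: Sequence[bool],
    candidate: Sequence[bool],
    min_gain: float = 0.05,  # assumed threshold; pick one that fits your costs
) -> bool:
    """Keep the new feature only if it beats the baseline by min_gain."""
    return success_rate(candidate) - success_rate(baseline) >= min_gain


# Hypothetical outcomes on the same 10 tasks.
baseline_runs = [True] * 7 + [False] * 3
with_feature = [True] * 8 + [False] * 2
no_change = [True] * 7 + [False] * 3

print(earns_its_keep(baseline_runs, with_feature))
print(earns_its_keep(baseline_runs, no_change))
```

A gate like this only looks at success rate; cost and latency would need their own measurements before a feature is accepted.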

Lesson: production reliability is bought by saying no to features, not by stacking them.