The right context window for the right job (caveman version)

Pick the smallest window that fits the work. Not the biggest the AI brand claims.

The Honest Context Rule

  1. The advertised number lies. Most AIs only think clearly up to ~65 % of their advertised window.
  2. Every extra token costs memory, money, and milliseconds.
  3. Facts in the middle of a long window get ignored. “Lost in the middle.”
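
The rule above can be sketched as one line of arithmetic (function name is illustrative):

```python
def effective_window(advertised_tokens: int, usable_fraction: float = 0.65) -> int:
    """Tokens you can actually rely on, per the ~65 % rule above."""
    return int(advertised_tokens * usable_fraction)

# A model advertising 128k is safer treated as ~83k of reliable context.
print(effective_window(128_000))  # 83200
```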

Fastest, cheapest, most accurate setting is rarely the biggest one.

The research, in two lines

  • Lost in the Middle (Stanford, 2023): accuracy on facts placed mid-context drops by 30 %+. Beginning and end win.
  • RULER (NVIDIA 2024): models advertise ~1.5× more context than they actually handle.

Tier table — which window for which job

  Job                          Window
  Quick chat, single answer    4k - 8k
  One file, no deps            16k
  One file + imports + test    32k
  Coding agent in a loop       64k
  Whole package or paper       96k - 128k
  “Drop the whole codebase”    Use RAG instead

Why a coding agent wants 64k

Three things eat the window at once:

  • Files loaded. A package’s src/ is 30k - 80k tokens.
  • Reasoning. 1k - 2k tokens of thinking per turn.
  • Tool output. Tests, diffs, command logs.

At 32k you hit “window full” mid-session. At 64k you barely notice.
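
Adding up the three line items shows why. A rough sketch, using illustrative midpoints of the ranges above (not measurements):

```python
def agent_session_tokens(src_tokens: int, turns: int,
                         reasoning_per_turn: int = 1_500,
                         tool_output_per_turn: int = 1_000) -> int:
    """Files loaded once, plus reasoning and tool output on every turn."""
    return src_tokens + turns * (reasoning_per_turn + tool_output_per_turn)

# A 30k src/ plus ten agent turns overflows 32k but fits comfortably in 64k.
print(agent_session_tokens(30_000, 10))  # 55000
```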

The cost side

  Cost      What grows
  Memory    KV cache doubles when the window doubles
  Speed     First-token wait grows roughly with window²
  Money     Paid APIs charge per input token; 100k of context costs ~100× a 1k prompt
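
The memory row is simple multiplication. A sketch with shape numbers for a hypothetical Llama-style ~8B model in fp16 (32 layers, 8 KV heads, head dim 128 — assumptions, not a spec):

```python
def kv_cache_bytes(tokens: int, layers: int = 32, kv_heads: int = 8,
                   head_dim: int = 128, dtype_bytes: int = 2) -> int:
    """Two tensors (K and V) per layer, per token."""
    return 2 * layers * kv_heads * head_dim * dtype_bytes * tokens

# Linear in tokens: doubling the window doubles the cache.
print(kv_cache_bytes(32_000) / 2**30)  # ~3.9 GiB
print(kv_cache_bytes(64_000) / 2**30)  # ~7.8 GiB
```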

Hybrid Mamba-Attention models are an exception — they pay almost nothing for long context.

RAG vs stuffing the window

  Use RAG when                         Stuff into window when
  Source > 200k tokens                 Whole source fits in 64k
  One paragraph buried in 100 pages    Need cross-document reasoning
  Need cited sources                   Creative writing or refactoring

Rule of thumb: under 64k → paste. Over 200k → RAG. In between → judgment call.
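
The rule of thumb as a three-way branch (the middle band is deliberately left fuzzy — summarize, split, or hand-pick the relevant files):

```python
def context_strategy(source_tokens: int) -> str:
    """Paste below 64k, RAG above 200k; the gap in between is a judgment call."""
    if source_tokens < 64_000:
        return "paste"
    if source_tokens > 200_000:
        return "rag"
    return "judgment call"

print(context_strategy(40_000))   # paste
print(context_strategy(500_000))  # rag
```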

What to do

  1. Daily chat? 8k.
  2. One file? 32k.
  3. Coding agent? 64k.
  4. Small book? 96k.
  5. Bigger? Switch to RAG.
  6. Always put important context at the START or END of the window. Never in the middle.
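
Step 6 in code — a minimal prompt builder that keeps the edges for what matters (names are illustrative):

```python
def build_prompt(instructions: str, documents: list[str], question: str) -> str:
    """Edges win: instructions at the very start, question at the very end,
    bulk documents in the easily-ignored middle."""
    return "\n\n".join([instructions, *documents, question])

prompt = build_prompt("You are a code reviewer.",
                      ["<file one>", "<file two>"],
                      "Which file has the bug?")
```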

Recap

  • Plan for ~65 % of the advertised window.
  • Mid-window facts get ignored.
  • 64k is the right default for a coding agent in 2026.
  • Past 200k → RAG, not stuffing.
  • Hybrid Mamba models break the rule (long context becomes cheap).
  • The best context window is the smallest one that holds the job.