# The right context window for the right job (caveman version)
Pick the smallest window that fits the work. Not the biggest the AI brand claims.
## The Honest Context Rule
- The advertised number lies. Most AIs only think clearly up to ~65% of their advertised window.
- Every extra token costs memory, money, and milliseconds.
- Facts in the middle of a long window get ignored. “Lost in the middle.”
Fastest, cheapest, most accurate setting is rarely the biggest one.
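The ~65% rule turns into a one-line planning helper. A minimal sketch; the function name and example sizes are my own, only the 0.65 factor comes from the rule above:

```python
def effective_window(advertised_tokens: int, usable_fraction: float = 0.65) -> int:
    """Plan around the fraction of the window a model handles reliably (~65%)."""
    return int(advertised_tokens * usable_fraction)

# An advertised 128k window is really ~83k of dependable context.
print(effective_window(128_000))  # 83200
```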
## The research, in two lines
- Lost in the Middle (Stanford 2023): facts in the middle of a long context drop accuracy by 30%+. Beginning and end win.
- RULER (NVIDIA 2024): models advertise ~1.5× more context than they actually handle.
## Tier table — which window for which job
| Job | Window |
|---|---|
| Quick chat, single answer | 4k-8k |
| One file, no deps | 16k |
| One file + imports + test | 32k |
| Coding agent in a loop | 64k |
| Whole package or paper | 96k-128k |
| “Drop the whole codebase” | Use RAG instead |
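The table reads as a lookup from estimated job size to window tier. A sketch; the helper name and tier labels are assumptions, the thresholds come straight from the table:

```python
# Tier thresholds from the table above; labels shortened.
TIERS = [
    ("quick chat", 8_000),
    ("one file", 16_000),
    ("file + imports + test", 32_000),
    ("coding agent", 64_000),
    ("whole package or paper", 128_000),
]

def pick_window(estimated_tokens: int) -> str:
    """Return the smallest tier that fits the work, per the rule at the top."""
    for name, window in TIERS:
        if estimated_tokens <= window:
            return f"{name}: {window // 1000}k"
    return "use RAG instead"

print(pick_window(50_000))   # coding agent: 64k
print(pick_window(300_000))  # use RAG instead
```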
## Why a coding agent wants 64k
Three things eat the window at once:
- Files loaded. A package’s src/ is 30k-80k tokens.
- Reasoning. 1k-2k tokens of thinking per turn.
- Tool output. Tests, diffs, command logs.
At 32k you hit “window full” mid-session. At 64k you barely notice.
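The arithmetic behind that claim, with illustrative numbers (the per-turn figures are assumptions picked inside the ranges above, not measurements):

```python
files = 35_000              # a modest src/, low end of the 30k-80k range
turns = 10                  # an assumed session length
reasoning = 1_500 * turns   # ~1k-2k thinking tokens per turn
tool_output = 800 * turns   # tests, diffs, command logs (assumed figure)

total = files + reasoning + tool_output
print(total)  # 58000 tokens: already past 32k, still inside 64k
```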
## The cost side
| Cost | What grows |
|---|---|
| Memory | KV cache doubles when window doubles |
| Speed | First-token wait grows roughly with window² |
| Money | Paid APIs charge per input token. A 100k prompt costs ~100× a 1k one |
Hybrid Mamba-Attention models are an exception — they pay almost nothing for long context.
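To see why memory doubles when the window doubles, here is a back-of-envelope KV-cache estimate. The model shape (32 layers, 8 KV heads, head dim 128, fp16) is an assumed 7B-class config, not any specific model:

```python
def kv_cache_bytes(tokens: int, layers: int = 32, kv_heads: int = 8,
                   head_dim: int = 128, dtype_bytes: int = 2) -> int:
    """Keys + values stored per token, across all layers, in fp16 (2 bytes)."""
    return 2 * layers * kv_heads * head_dim * dtype_bytes * tokens

gib = 1024 ** 3
print(kv_cache_bytes(32_768) / gib)  # 4.0 GiB at a 32k window
print(kv_cache_bytes(65_536) / gib)  # 8.0 GiB at 64k: window doubles, cache doubles
```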
## RAG vs stuffing the window
| Use RAG when | Stuff into window when |
|---|---|
| Source > 200k tokens | Whole source fits in 64k |
| One paragraph buried in 100 pages | Need cross-document reasoning |
| Need cited sources | Creative writing or refactoring |
Rule of thumb: under 64k → paste. Over 200k → RAG.
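The rule of thumb as a decision helper. A sketch; the thresholds come from the rule, the gray-zone wording is my own:

```python
def stuff_or_rag(source_tokens: int) -> str:
    """Under 64k paste, over 200k RAG; in between, judge by the task."""
    if source_tokens < 64_000:
        return "paste into the window"
    if source_tokens > 200_000:
        return "use RAG"
    return "gray zone: RAG if you only need fragments, paste if you need it all"

print(stuff_or_rag(40_000))   # paste into the window
print(stuff_or_rag(500_000))  # use RAG
```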
## What to do
- Daily chat? 8k.
- One file? 32k.
- Coding agent? 64k.
- Small book? 96k.
- Bigger? Switch to RAG.
- Always put important context at the START or END of the window. Never in the middle.
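The start-or-end rule can be baked into how a prompt is assembled. A minimal sketch; the helper and its layout are my own, only the placement idea comes from the rule above:

```python
def build_prompt(key_facts: str, bulk_context: str, ask: str) -> str:
    """Critical facts at the start, the ask restated at the end;
    only skimmable bulk goes in the lossy middle."""
    return "\n\n".join([
        f"Key facts:\n{key_facts}",     # start: high-attention zone
        bulk_context,                   # middle: "lost in the middle" zone
        f"Task (repeated last): {ask}", # end: high-attention zone
    ])

p = build_prompt("API returns UTC timestamps", "<long file dump>", "fix the timezone bug")
print(p.startswith("Key facts:"), p.endswith("fix the timezone bug"))  # True True
```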
## Recap
- Plan for ~65% of the advertised window.
- Mid-window facts get ignored.
- 64k is the right default for a coding agent in 2026.
- Past 200k → RAG, not stuffing.
- Hybrid Mamba models break the rule (long context becomes cheap).
- The best context window is the smallest one that holds the job.