Claude's 1M Context Window: When It Helps and When It Hurts

Anthropic rolled the 1-million-token context window to general availability on March 13, 2026. This guide covers where the longer window genuinely changes what you can do, and where it quietly makes your results worse.

What Actually Changed on March 13, 2026

Anthropic announced that the 1-million-token context window is now generally available for Claude Opus 4.6 and Claude Sonnet 4.6 at standard pricing, with no long-context surcharge. That sentence deserves to be unpacked, because it bundles three separate things that each matter for how you use the product.

First, the window is real and it is five times the 200,000-token window Claude shipped with a year ago. One million tokens is approximately 750,000 words, or roughly 2,500 to 3,000 pages of typical business prose. A full novel, a quarterly filing, a small codebase, or a few dozen meeting transcripts all fit comfortably inside a single prompt.

Second, retrieval accuracy has improved. Anthropic's published evaluations report around 90% retrieval accuracy across the full window and an MRCR v2 score of 78.3%. Those numbers matter because the "lost in the middle" problem, where facts buried halfway through a long context quietly stop being retrievable, has plagued long-context models since the feature first appeared.

Third, pricing is flat. Earlier versions of long-context features carried premium multipliers. A 900,000-token request is now billed at the same per-token rate as a 9,000-token request, which removes the financial reason to chunk unnecessarily.

The upshot is that a capability that used to require engineering — vector databases, retrieval pipelines, clever chunking — is now available as a single prompt. That unlocks real work. It also creates new ways to waste your context and your money.
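As a rough check on whether a given body of text fits, the size estimate above (one million tokens to roughly 750,000 words) can be turned into a small helper. This is a minimal sketch: the 0.75 words-per-token ratio comes from the figures in this guide and only holds for typical English prose, and `reserve_tokens` is a hypothetical allowance for instructions and the reply, not a documented value.

```python
WORDS_PER_TOKEN = 0.75  # from this guide: 1,000,000 tokens ~ 750,000 words


def estimate_tokens(word_count: int) -> int:
    """Rough token estimate from a word count (English prose only)."""
    return round(word_count / WORDS_PER_TOKEN)


def fits_in_window(
    word_count: int,
    window_tokens: int = 1_000_000,
    reserve_tokens: int = 20_000,  # hypothetical headroom for prompt + reply
) -> bool:
    """True if the text probably fits with some headroom left over."""
    return estimate_tokens(word_count) + reserve_tokens <= window_tokens
```

Code, tables, and non-English text tokenize differently, so treat the result as a first pass rather than a guarantee.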

Where the 1M Window Genuinely Changes What You Can Do

End-to-end document review

A 400-page contract, a year of board minutes, or a full regulatory filing can now be sent as a single prompt with a question attached. You do not need to build a retrieval pipeline for a one-off review. You paste the document and ask.

The pattern that works best: front-load the instruction ("I want you to identify every clause that references X") before the document, then repeat the instruction at the end after the document. Claude's attention over very long contexts is strongest at the start and end, so bracketing the ask is a simple technique that lifts accuracy noticeably.
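The bracketing pattern described above can be sketched as a small prompt builder. The function name, the `<document>` tag, and the "Reminder of the task" wording are illustrative assumptions, not a prescribed format.

```python
def bracket_prompt(instruction: str, document: str) -> str:
    """Place the instruction before the document and repeat it afterwards."""
    instruction = instruction.strip()
    return (
        f"{instruction}\n\n"
        # The tag name is an arbitrary choice; any clear delimiter works.
        f"<document>\n{document.strip()}\n</document>\n\n"
        f"Reminder of the task: {instruction}"
    )
```

For example, `bracket_prompt("Identify every clause that references X.", contract_text)` yields a prompt where the ask appears at both ends of the contract text.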

Whole-codebase analysis

A small-to-medium codebase — perhaps 50,000 to 200,000 lines — now fits alongside a task description. That enables questions like "where is authentication handled across this app?" or "which modules call the billing API?" to be answered from the code itself rather than from a hand-picked subset.

The practical limit is usually not the window size but the signal-to-noise ratio. A codebase packed with autogenerated files, vendored dependencies, and build artefacts will produce worse answers than a curated slice of the same codebase. More context available does not mean more context useful.
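One way to improve the signal-to-noise ratio is to filter the file list before building the prompt. The sketch below is a minimal example; the directory names and suffixes in `NOISE_DIRS` and `NOISE_SUFFIXES` are assumptions about a typical project and should be adjusted to your own repository.

```python
from pathlib import PurePosixPath

# Hypothetical noise patterns; tune these for your codebase.
NOISE_DIRS = {"node_modules", "vendor", "dist", "build", ".git", "__pycache__"}
NOISE_SUFFIXES = (".min.js", ".lock", ".map", ".pyc")


def is_signal(path: str) -> bool:
    """False for files under a noise directory or with a noise suffix."""
    p = PurePosixPath(path)
    if any(part in NOISE_DIRS for part in p.parts[:-1]):
        return False
    return not p.name.endswith(NOISE_SUFFIXES)


def curate(paths: list[str]) -> list[str]:
    """Keep only the paths worth sending as context, in their original order."""
    return [p for p in paths if is_signal(p)]
```

Filtering on paths alone is crude: it will not catch generated code that lives next to hand-written code, so a quick manual look at the resulting list is still worthwhile.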

Multi-document synthesis

Synthesis across related-but-different sources is the use case the 1M window was built for. A dozen research papers, a set of customer interview transcripts, a competitor's product documentation plus your own: these were previously impractical to hold side by side in a single prompt. Now they fit in one.

The ask that works well here is explicitly comparative. "Identify where these six papers agree, where they disagree, and where the evidence in paper three contradicts the claim in paper one." The model is given a frame for what to do with the volume of material, rather than being asked to summarise it.

Long-running agent tasks

For autonomous workflows, the practical effect of the bigger window is that an agent can hold far more tool output, intermediate reasoning, and task history in a single session without being re-primed. Anthropic's own Claude Code agent shipped support for the 1M window on a similar timeline, and engineering teams using it report more coherent multi-hour sessions on large refactors.

Where the 1M Window Will Quietly Make Things Worse

The awkward truth is that a longer window does not always help. There are four failure modes worth watching for.

Diluted attention on a narrow question

If your real question fits in 5,000 tokens, do not pad it to 500,000. The model's attention degrades across a very long context in ways that are hard to predict, and things near the middle of a huge prompt get underweighted compared to things near the start and end. A question about clause 7 of a contract answered from the full contract is more prone to error than the same question answered from pages 6 through 10.

The rule is: use only the context the task requires. If you can answer the question by showing three sections instead of the whole book, do that.
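A simple way to apply this rule is to pre-select the sections that mention the topic of the question and send only those. The sketch below assumes the document has already been split into titled sections; the function name and the keyword-matching approach are illustrative, not a recommended retrieval method.

```python
def select_sections(
    sections: dict[str, str], keywords: list[str]
) -> dict[str, str]:
    """Return only the sections whose title or body mentions a keyword."""
    wanted = [k.lower() for k in keywords]
    return {
        title: body
        for title, body in sections.items()
        if any(k in title.lower() or k in body.lower() for k in wanted)
    }
```

Keyword matching can miss sections that are relevant only through a cross-reference or a defined term, so it works best as a first cut that you then review by eye.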

Wasted tokens on irrelevant material

Pricing is flat per token, which means a 900,000-token request is billed at the same rate as a 9,000-token request. But the total bill is still 100 times larger. An agent that reflexively dumps an entire knowledge base into every prompt can cost two orders of magnitude more than one that retrieves the right slice.

If you are paying for API usage by the token, long-context inputs are where costs balloon silently. Monitor token usage as deliberately as you monitor rate limits.
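Monitoring can be as simple as keeping a running ledger of input tokens per request and flagging the heavy ones. This is a minimal sketch: `TokenLedger` and its `warn_above` threshold are hypothetical, and the token counts are assumed to come from whatever usage figures your API client reports for each request.

```python
from dataclasses import dataclass, field


@dataclass
class TokenLedger:
    """Running record of input tokens per request."""

    warn_above: int = 100_000  # hypothetical per-request threshold
    requests: list[tuple[str, int]] = field(default_factory=list)

    def record(self, label: str, input_tokens: int) -> None:
        """Store one request's label and input token count."""
        self.requests.append((label, input_tokens))

    def total(self) -> int:
        """Total input tokens across all recorded requests."""
        return sum(tokens for _, tokens in self.requests)

    def heavy(self) -> list[str]:
        """Labels of requests that exceeded the warning threshold."""
        return [label for label, tokens in self.requests if tokens > self.warn_above]
```

Reviewing `heavy()` at the end of a day is usually enough to spot an agent that is sending the whole knowledge base when a slice would do.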

Slower responses

A prompt with 500,000 input tokens takes noticeably longer to respond than one with 5,000 tokens. For interactive work — coding, writing, conversation — that latency shows up as a degraded experience. Save the long window for tasks where the latency is worth it, not for tasks where you simply have a lot of material to paste.

Stale information within the window

The longer a conversation runs, the more likely it is that early information becomes outdated. If you pasted a document at the start of a two-hour session and then revised that document in a different tool, Claude is still reasoning from the old version. Clearing regularly and re-uploading is the cleanest approach for anything that changes during the session.
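If the source document lives in a file you control, a content hash taken at paste time gives a cheap way to notice that the copy in the conversation has gone stale. The helper names below are illustrative; the sketch only detects that something changed, not what changed.

```python
import hashlib


def fingerprint(text: str) -> str:
    """SHA-256 hex digest of the text as it was pasted into the session."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_stale(pasted_fingerprint: str, current_text: str) -> bool:
    """True if the document has changed since its fingerprint was recorded."""
    return fingerprint(current_text) != pasted_fingerprint
```

Record the fingerprint when you paste, and check `is_stale` before asking a follow-up question late in a long session.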

The Practical Workflow

A workflow that takes advantage of the 1M window without falling into its traps looks like this.

Start by asking whether the task needs the extra context at all. If the question fits in a chunk of your codebase, your document, or your data, select that chunk. Do not paste everything on autopilot.

When you do need the long window, structure the prompt for attention. Put the instruction at the top, the source material in the middle, and a restatement of the instruction at the end. Anthropic's own guidance treats this as the default pattern for long inputs, and it consistently improves the model's ability to focus on what you actually asked.

Label your sources clearly inside the context. Wrap each document in a tag or a header. "Document 1: Q3 earnings call transcript. Document 2: Q3 press release." The model will reference these labels back to you, which makes the output verifiable.

Ask for citations within the context. "For each claim in your answer, cite the document and page number from the material above." This is the most effective single move to reduce hallucination in long-context responses. The model is less likely to invent a fact when it has to attach a specific page to it.

Review the output with the context still in front of you. The single biggest long-context failure mode is accepting an answer that sounds right without checking it against the source. The whole point of including the source was to make verification possible.
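The structuring, labelling, and citation steps above can be combined in one prompt builder. This is a sketch under stated assumptions: the `<document>` tag with `index` and `title` attributes is one possible labelling scheme rather than a required format, and titles are assumed not to contain double quotes.

```python
CITATION_RULE = (
    "For each claim in your answer, cite the document and page number "
    "from the material above."
)


def build_cited_prompt(question: str, documents: dict[str, str]) -> str:
    """Instruction first, labelled sources in the middle, instruction again last."""
    question = question.strip()
    blocks = [
        # Tag and attribute names are illustrative assumptions.
        f'<document index="{i}" title="{title}">\n{body.strip()}\n</document>'
        for i, (title, body) in enumerate(documents.items(), start=1)
    ]
    return "\n\n".join([question, *blocks, CITATION_RULE, question])
```

Because every source carries an index and a title, the citations in the answer can be checked against a specific document during review.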

Where the Capability Falls Short

Two caveats are worth naming even though they are not failure modes exactly.

First, 90% retrieval accuracy is impressive by the standard of prior models, but it is not 100%. On a critical task — a compliance review, a legal question, a medical summary — the 10% of cases where Claude misses a fact buried on page 400 is the 10% that will hurt you. Long-context search should supplement human review on high-stakes work, not replace it.

Second, retrieval accuracy degrades on genuinely adversarial or highly similar content. If a document contains near-duplicate clauses with subtly different meanings, Claude can mix them up. Human eyes remain the right tool for fine-grained disambiguation of look-alike text.

A Thumbnail Cost Calculation

Because pricing is now flat, the arithmetic is easy. At Opus 4.6 list rates, a 900,000-input-token prompt with a 5,000-output-token response costs roughly $15 on the input side and a few cents on the output side. That is cheap for a one-off deep analysis; it is expensive for a routine query.
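The same arithmetic can be written as a small function so you can rerun it with current prices. The rates in the example call are placeholders chosen for easy arithmetic, not Anthropic's list prices; substitute the per-million-token rates from the current pricing page.

```python
from decimal import Decimal


def prompt_cost(
    input_tokens: int,
    output_tokens: int,
    input_rate_per_mtok: str,
    output_rate_per_mtok: str,
) -> tuple[Decimal, Decimal]:
    """Return (input_cost, output_cost) in dollars under flat per-token pricing."""
    million = Decimal(1_000_000)
    input_cost = Decimal(input_tokens) / million * Decimal(input_rate_per_mtok)
    output_cost = Decimal(output_tokens) / million * Decimal(output_rate_per_mtok)
    return input_cost, output_cost


# Placeholder rates in dollars per million tokens, NOT real list prices.
input_cost, output_cost = prompt_cost(900_000, 5_000, "10", "50")
```

Rates are passed as strings so that `Decimal` keeps the arithmetic exact instead of inheriting floating-point rounding.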

A reasonable heuristic: the 1M window is a good default for tasks that would otherwise take a person an hour. It is a bad default for tasks that would take a person five minutes, because you will pay more than their time is worth in tokens.

When to Reach for a Retrieval Pipeline Instead

The 1M window reduces the need to build retrieval systems, but it does not eliminate it. A few scenarios still call for retrieval even with the bigger window available.

Repeated access to the same corpus. If you are answering a hundred questions against the same 500,000-token document set, it is cheaper and faster to embed the corpus once and retrieve chunks per question than to send the full corpus with every query.

Dynamic or extremely large data. Anything larger than the window itself — a multi-gigabyte codebase, a streaming log, a document store that changes hourly — still needs retrieval. The 1M window is a better ceiling, not an infinite one.

Latency-sensitive workflows. A chat assistant that needs to respond in under three seconds cannot afford the extra latency of a million-token prompt, regardless of cost.
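The three scenarios above can be read as a simple decision rule. The sketch below encodes them with explicit thresholds: the three-second figure comes from the example above, while `max_full_context_questions` is a hypothetical cut-off you would set from your own cost and latency measurements. It assumes the corpus is large enough for latency to matter.

```python
def choose_strategy(
    corpus_tokens: int,
    questions: int,
    latency_budget_s: float,
    window_tokens: int = 1_000_000,
    max_full_context_questions: int = 10,  # hypothetical cut-off
    min_latency_for_long_context_s: float = 3.0,
) -> str:
    """Return "retrieval" or "long_context" for a given workload."""
    if corpus_tokens > window_tokens:
        return "retrieval"  # the corpus does not fit in the window at all
    if latency_budget_s < min_latency_for_long_context_s:
        return "retrieval"  # a very large prompt is too slow for this budget
    if questions > max_full_context_questions:
        return "retrieval"  # resending the corpus per question adds up
    return "long_context"
```

For example, a hundred questions against a 500,000-token corpus would send 50 million input tokens in total if the full corpus went out each time, which is why the function routes that workload to retrieval.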

The Short Version

Claude's 1M context window is a genuine unlock for document review, codebase analysis, and multi-document synthesis. It removes a class of engineering work that used to be required to handle large inputs.

It is not a reason to stop curating your context. Use the long window when the task actually requires it, structure your prompts so the instruction bookends the source material, label and cite sources inside the context, and review the output against the material you provided. Those four disciplines are the difference between a 90%-accurate tool that saves you hours and a 90%-accurate tool that makes plausible-sounding mistakes you then have to fix.

For tasks that pair long-context reasoning with spreadsheet data analysis, [Office Productivity Hacks](https://officeproductivityhacks.com) covers how to combine Copilot's Excel tooling with AI workflows. And if you're building prompts that take advantage of the longer window, our [prompt frameworks guide](/resources/prompt-frameworks-better-ai-outputs) has patterns that scale well as inputs grow.
