Adaptive Thinking in Claude 4.6: When the Model Should Slow Down (and When It Shouldn't)
Claude 4.6 introduced adaptive thinking, where the model decides for itself how deeply to reason on a given prompt. A practical guide to what it actually does, when it earns its latency cost, and the prompt patterns that make it work for you.
What Adaptive Thinking Actually Is
Anthropic released Claude Opus 4.6 on February 5, 2026, and Sonnet 4.6 on February 17, 2026. The headline feature in both was adaptive thinking. The model now decides for itself how much extended reasoning to use on each prompt, rather than thinking at a fixed budget set by the developer or by a slider.
The mechanism, according to Anthropic's official documentation on the Claude Platform, works like this. When you submit a prompt, the model evaluates the complexity of the task. For routine questions, it answers directly with little or no extended thinking. For multi-step or ambiguous requests, it spends more tokens reasoning through the problem before producing the final answer. The effort parameter, available in the API and in Claude Code, guides this decision but is not a hard budget. The model can think more or less than the parameter suggests if the task warrants it.
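For API users, a request that leans on adaptive thinking looks roughly like the sketch below. Treat it as a sketch: the model id and the effort field are written the way the documentation describes them above, not copied from a verified API reference, so check the current Messages API docs before relying on either.

```python
# Minimal sketch of a Messages API call that lets adaptive thinking decide depth.
# ASSUMPTIONS: the model id and the `effort` field follow this article's description
# of Claude 4.6; verify both against the current Anthropic API reference.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",   # assumed model id for Sonnet 4.6
    max_tokens=32_000,           # shared budget for thinking + output tokens
    effort="medium",             # guidance, not a hard budget (assumed field)
    messages=[
        {
            "role": "user",
            "content": (
                "Plan the migration from Postgres 14 to Postgres 16 for a "
                "production system handling 50,000 requests per minute."
            ),
        }
    ],
)

print(response.content[-1].text)  # final answer, after any thinking blocks
```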
The practical effect, in everyday use of Claude Sonnet 4.6 and Opus 4.6 inside Claude.ai, Claude Cowork, Claude in Excel, or Claude in Chrome, is that you get faster responses on simple questions and more reliable answers on hard ones. You no longer have to manually decide whether to enable extended thinking. The model does it for you.
Whether that is a feature or a problem depends on whether you understand the patterns under the hood.
The Three Things That Trigger Deeper Thinking
Across hundreds of prompts in production usage, three signals consistently push Claude into deeper adaptive thinking. Knowing them lets you write prompts that get the right depth of response.
Signal one: explicit ambiguity. When the prompt contains conflicting constraints, multiple valid interpretations, or asks the model to compare options, adaptive thinking activates more aggressively. "Should I use SQL or BigQuery for this?" reliably produces deeper reasoning than "What is BigQuery?" The first requires weighing trade-offs. The second is a definition lookup.
Signal two: multi-step structure. Prompts that require chaining several decisions trigger deeper thinking. "Plan the migration from Postgres 14 to Postgres 16 for a production system handling 50,000 requests per minute" engages adaptive thinking far more than "Tell me about the differences between Postgres 14 and 16." The dependency between steps is what cues the model that simple recall will not be enough.
Signal three: novelty. When the prompt combines elements in ways that are not common in the training distribution (cross-domain analogies, edge-case scenarios, original problems), adaptive thinking spends more tokens. The model is essentially searching its knowledge graph harder, rather than retrieving a pre-formed answer.
The inverse also holds. Prompts that ask for a single fact, a definition, a quick translation, or a one-shot transformation will get shallow thinking, regardless of how you phrase the urgency.
When Adaptive Thinking Earns Its Latency
The honest framing on adaptive thinking is that deeper reasoning costs latency, and that cost is only worth paying on the questions where it actually helps. According to Anthropic's published guidance on extended thinking, deeper reasoning adds wall-clock time and should only be used when it will meaningfully improve answer quality. Adaptive thinking is the model trying to make that trade-off automatically. It is not perfect.
High-value scenarios. Coding problems with non-obvious failure modes. Architecture decisions where the wrong call costs weeks. Research questions that require synthesizing across multiple sources. Anything where the cost of a wrong answer is meaningfully larger than the cost of waiting an extra 15 to 60 seconds.
Low-value scenarios. Quick lookups. Format conversions. Drafting tasks where you will iterate anyway. Conversational exchanges where momentum matters more than precision. Anything where you can tell from the response in two seconds whether it is right or wrong.
The Resolve.ai engineering team, in their published evaluation of Claude Sonnet 4.6 on production AI agents, recommends starting with effort set to medium and adjusting from there. They also flag a practical constraint that catches many developers off guard: thinking and output tokens share the same budget. Setting a low max_tokens limit can cause the model to hit the ceiling mid-reasoning and cut off abruptly, with no graceful degradation. Their default of 32k max_tokens, tuned down only for simpler subagent tasks, is a sensible starting point.
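A cheap way to catch that failure is to check the stop reason on every response. The sketch below is a hedged example, not a prescription: it assumes the standard Messages API response shape and an assumed model id, and the 32k figure is simply Resolve.ai's recommended default.

```python
# Sketch: guard against the shared thinking/output budget running out mid-response.
# Assumes the response exposes `stop_reason` as in Anthropic's current Messages API;
# the model id is an assumption based on this article.
import anthropic

client = anthropic.Anthropic()
MAX_TOKENS = 32_000  # Resolve.ai's recommended default for agent workloads

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=MAX_TOKENS,
    messages=[{"role": "user", "content": "Diagnose the failing deploy described below."}],
)

if response.stop_reason == "max_tokens":
    # The model hit the ceiling mid-reasoning: the answer may be truncated or
    # missing entirely. Raise the budget and retry rather than trusting it.
    print("Truncated response; raising max_tokens and retrying.")
```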
Prompt Patterns That Get the Right Depth
Three prompt patterns reliably produce the depth of thinking that matches the task. They work because they signal complexity to the model in ways that align with how adaptive thinking is calibrated.
Pattern one: state the constraints explicitly. Instead of "help me write this email," try "help me write this email to a customer who churned three months ago, where the goal is to surface a new feature that addresses their original complaint, but I want to avoid sounding like I am asking them to come back." The constraints make the thinking depth match the actual difficulty.
Pattern two: ask for the reasoning, not just the answer. Adding "walk me through how you would think about this before giving me the answer" pulls the model into deeper adaptive thinking, because it is now generating the reasoning chain as part of the response. This is especially useful for technical decisions where you want to verify the logic, not just the conclusion.
Pattern three: name the failure modes you want to avoid. "Suggest a refactor of this function. Avoid changes that would break backward compatibility, increase memory usage, or require new dependencies." The negative constraints force adaptive thinking to evaluate options against multiple criteria rather than producing the first plausible answer.
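Put together, the three patterns are just structure inside the prompt itself. The sketch below is purely illustrative: the wording and the placeholder function are made up, but it shows where constraints, the reasoning request, and the named failure modes sit.

```python
# Sketch: the three patterns applied to one request. The wording is illustrative;
# the point is that constraints and failure modes live in the prompt itself,
# not in a "think carefully" prefix.
prompt = "\n".join([
    # Pattern one: state the constraints explicitly
    "Refactor the function below to reduce hot-path allocations.",
    "Constraints: the public signature must not change, and peak memory must not grow.",
    # Pattern two: ask for the reasoning, not just the answer
    "Before showing code, walk through the trade-offs you considered.",
    # Pattern three: name the failure modes to avoid
    "Avoid changes that break backward compatibility or add new dependencies.",
    "",
    "def hot_path(items): ...",  # placeholder for the real function
])
```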
What does not work, contrary to what some prompt guides claim, is prefixing prompts with "think carefully" or "use deep reasoning." The model has been trained to evaluate task complexity from the structure of the prompt itself, not from instruction phrases. Saying "think hard" without giving the model anything hard to think about is the conversational equivalent of telling a person to "concentrate" on a multiplication problem they have already memorized. Nothing changes.
The Trust Calibration Problem
The biggest failure mode with adaptive thinking is not the model being wrong. It is users calibrating their trust to the wrong signal: how long the model appeared to think.
When the model thinks for a long time and then produces an answer, most users assume the answer is more reliable. This is sometimes true and sometimes not. Long thinking can also indicate the model is unsure and exploring many paths, none of which it can verify. Anthropic's adaptive thinking documentation notes this directly: extended thinking improves reasoning on hard problems, but does not eliminate hallucination, especially on factual claims about specific dates, statistics, or citations.
The practical heuristic is the same one that applies to any AI output. Specific factual claims need a second source, regardless of how long the model thought. The model's reasoning chain is reliable as a record of what it considered. It is not reliable as evidence that the considerations were complete.
How Adaptive Thinking Pairs With Tools and Agents
Where adaptive thinking gets genuinely interesting is in agentic workflows. When Claude is operating as an agent, calling tools, executing code, or running multi-step research tasks, adaptive thinking lets it allocate deeper reasoning to the parts of the workflow that need it (deciding which tool to call next, evaluating the output of a previous step) and stay light on the parts that do not (formatting a response, executing a known-good code snippet).
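In code, a single agent turn looks roughly like the sketch below. The `run_sql` tool and the model id are illustrative assumptions, and the tool-use shape follows Anthropic's Messages API; the interesting part is that the model allocates its own reasoning depth to the "which tool, with what input" decision.

```python
# Sketch of one agent turn: the model decides which tool to call next, and
# adaptive thinking allocates depth to that decision on its own. The `run_sql`
# tool and the model id are illustrative assumptions.
import anthropic

client = anthropic.Anthropic()

tools = [{
    "name": "run_sql",  # hypothetical tool, defined only for this example
    "description": "Run a read-only SQL query against the analytics warehouse.",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}]

response = client.messages.create(
    model="claude-sonnet-4-6",  # assumed model id
    max_tokens=32_000,
    tools=tools,
    messages=[{
        "role": "user",
        "content": "Which region's churn rose most last quarter, and why?",
    }],
)

# The tool-choice step is where deeper reasoning tends to land; executing the
# resulting query is the cheap part of the loop.
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```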
The Resolve.ai team's evaluation found this was where Sonnet 4.6 outperformed Sonnet 4.5 most clearly: not on raw single-turn benchmarks, but on the back-end decision making inside long-running agent workflows. The model's tendency to think harder when the task got harder, and faster when it got easier, produced better end-to-end performance even when individual turns were not always faster.
For users running Claude in Cowork, Claude Code, or any of Anthropic's MCP-enabled integrations, this is the practical takeaway. The model is making good calls about reasoning depth in the background. Your job is to write prompts whose structure correctly communicates the complexity, so the model's calibration matches what you actually need.
The Same Pattern, Inside Excel
If the "describe before doing" idea sounds familiar, that is because Microsoft has now built the user-facing version of it directly into Excel. Plan Mode in Copilot for Excel, shipped in April 2026, makes the model write out its intended edits before touching the workbook. You read the plan, approve or amend it, then watch it execute.
The two patterns rhyme deliberately. Both are responses to the same underlying observation: the cheapest reliable correction in AI-assisted work is making the model's intent visible at the boundary where it meets the user's intent. Adaptive thinking surfaces depth automatically. Plan Mode surfaces intent explicitly. They are different points on the same design spectrum.
For a deeper walkthrough of how Plan Mode works in practice, see [our companion piece on Plan Mode in Excel Copilot](https://officeproductivityhacks.com/resources/excel-copilot-plan-mode-guide) over at Office Productivity Hacks.
Putting It Into Practice This Week
Three concrete moves to run this week:
- Pick the five hardest prompts you ran in the last week. Re-run them on Claude Sonnet 4.6 or Opus 4.6 with the constraints stated explicitly and the failure modes named. Compare the answer quality to your original responses.
- Stop typing "think carefully" into prompts. Spend the same effort writing a clearer task description instead. The model will think as hard as the task warrants if the task is described well.
- For long agentic workflows, set max_tokens to at least 32k. If you are running Claude through the API or in a custom agent setup, this single change prevents the most common adaptive-thinking failure mode: getting cut off mid-reasoning with no clean answer.
Adaptive thinking is doing the hard work of allocating reasoning depth in the background. The leverage is in writing prompts whose structure lets it allocate correctly.
---
*Sources: Anthropic, "Introducing Claude Opus 4.6" (February 5, 2026); Anthropic, "Claude Sonnet 4.6 release notes" (February 17, 2026); Anthropic Claude Platform documentation, "Adaptive thinking" and "Building with extended thinking" (2026); Resolve.ai engineering blog, "Testing Claude Sonnet 4.6 Adaptive Thinking on Production AI Agents" (2026); AWS Bedrock documentation, "Adaptive thinking" (2026).*
---
*Join 132,000+ professionals at How Do I Use AI for evidence-based AI tutorials and frameworks.*