Fix Claude 3.7 Sonnet "Expected thinking" API error

This commit is contained in:
Jose Leon 2025-03-11 00:32:27 +00:00
parent 376fe18b83
commit 108244d091
8 changed files with 465 additions and 2 deletions

.gitignore vendored
View File

@@ -16,3 +16,4 @@ appmap.log
*.swp
/vsc/node_modules
/vsc/dist
cline_docs/

View File

@@ -13,6 +13,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Added model parameters for think tag support
- Added comprehensive testing for think tag functionality
- Added `--show-thoughts` flag to show thoughts of thinking models
- Added `--disable-thinking` flag to disable thinking mode for Claude 3.7 Sonnet
### Fixed
- Fixed unretryable API error (400) when using Claude 3.7 Sonnet with thinking mode enabled for extended periods
- Improved message formatting for Claude 3.7 Sonnet to ensure thinking blocks are properly included
### Changed
- Updated langchain/langgraph deps

View File

@@ -105,6 +105,33 @@ RA.Aid configures the model to use its native thinking mode, and then processes
If you run RA.Aid without the `--show-thoughts` flag, the thinking content is still extracted from the model responses, but it won't be displayed in the console. This gives you a cleaner output focused only on the model's final responses.
## Disabling Thinking Mode
In some cases, you might want to disable the thinking mode feature for Claude 3.7 models, particularly if you're experiencing API errors or if you prefer the model to operate without the thinking capability.
### Using the disable_thinking Configuration Option
RA.Aid provides a `disable_thinking` configuration option that allows you to turn off the thinking mode for Claude 3.7 models:
```bash
ra-aid -m "Debug the database connection issue" --provider anthropic --model claude-3-7-sonnet-20250219 --disable-thinking
```
When this option is enabled:
- The thinking mode will not be activated for Claude 3.7 models
- The model will operate in standard mode without the structured thinking blocks
- This can help avoid certain API errors that might occur with thinking mode enabled
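Under the hood, the flag simply gates the extra `thinking` parameter passed to the model client. A minimal sketch of that gating logic (the helper name here is illustrative, not RA.Aid's actual internals; the `budget_tokens` value matches RA.Aid's default):

```python
def thinking_kwargs(supports_thinking: bool, config: dict) -> dict:
    """Return extra model kwargs; empty when thinking is off or unsupported."""
    # Thinking is requested only when the model supports it AND the user
    # has not set disable_thinking (the --disable-thinking flag).
    if supports_thinking and not config.get("disable_thinking", False):
        return {"thinking": {"type": "enabled", "budget_tokens": 12000}}
    return {}

print(thinking_kwargs(True, {}))
# {'thinking': {'type': 'enabled', 'budget_tokens': 12000}}
print(thinking_kwargs(True, {"disable_thinking": True}))
# {}
```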
### When to Disable Thinking Mode
Consider disabling thinking mode in the following scenarios:
1. **API Errors**: If you encounter unretryable API errors (400) related to thinking blocks
2. **Long-Running Sessions**: For extended sessions that run for more than 10 minutes
3. **Performance Concerns**: If you need faster responses and don't require the thinking content
4. **Compatibility Issues**: If you're using tools or workflows that aren't fully compatible with thinking mode
## Troubleshooting and Best Practices
### Common Issues
@@ -117,6 +144,20 @@ If you're not seeing thinking content despite using the `--show-thoughts` flag:
- Verify that the model is properly configured in your environment
- Check that the model is actually including thinking content in its responses (not all prompts will generate thinking)
#### API errors with Claude 3.7 Sonnet
If you encounter errors like:
```
Unretryable API error: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'messages.1.content.0.type: Expected thinking or redacted_thinking, but found text...'}}
```
This is related to the thinking mode format requirements. You can:
- Use the `--disable-thinking` flag to turn off thinking mode
- Upgrade to the latest version of RA.Aid which includes fixes for these errors
- For long-running sessions, consider restarting the assistant periodically
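This error occurs because, once extended thinking is enabled, every assistant message replayed to the API must begin with a `thinking` or `redacted_thinking` content block. A sketch of the shape being checked (illustrative payloads and helper, not RA.Aid internals):

```python
def starts_with_thinking(content: list) -> bool:
    """Check whether an assistant message's structured content is well-formed
    for thinking mode, i.e. its first block is a thinking block."""
    return bool(content) and content[0].get("type") in ("thinking", "redacted_thinking")

# Content that triggers the 400 error when replayed with thinking enabled:
bad = [{"type": "text", "text": "Here is the fix..."}]

# Content the API accepts: the thinking block comes first.
good = [
    {"type": "thinking", "thinking": "Check the connection pool settings..."},
    {"type": "text", "text": "Here is the fix..."},
]

print(starts_with_thinking(bad))   # False
print(starts_with_thinking(good))  # True
```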
#### Excessive or irrelevant thinking
If the thinking content is too verbose or irrelevant:
@@ -137,4 +178,3 @@ For the most effective use of thinking models:
4. **Compare thinking with output**: Use the thinking content to evaluate the quality of the model's reasoning and identify potential flaws in its approach.
5. **Provide clear instructions**: When the model's thinking seems off-track, provide clearer instructions in your next prompt to guide its reasoning process.

View File

@@ -526,6 +526,64 @@ def _handle_fallback_response(
msg_list.extend(msg_list_response)
def _ensure_thinking_block(messages: list[BaseMessage], config: Dict[str, Any]) -> list[BaseMessage]:
"""
Ensure that messages sent to Claude 3.7 with thinking enabled have a thinking block at the start.
When thinking is enabled for Claude 3.7, the API requires that any assistant message
starts with a thinking block. This function checks if the model is Claude 3.7 with
thinking enabled, and if so, ensures that assistant messages have a thinking block.
Args:
messages: List of messages to check and potentially modify
config: Configuration dictionary
Returns:
Modified list of messages with thinking blocks added if needed
"""
# Check if we're using Claude 3.7 with thinking enabled
provider = config.get("provider", "")
model_name = config.get("model", "")
# Skip if thinking is disabled or not using Claude 3.7
if config.get("disable_thinking", False):
return messages
# Only apply to Claude 3.7 models
if not (provider.lower() == "anthropic" and "claude-3-7" in model_name.lower()):
return messages
# Get model configuration to check if thinking is supported
model_config = models_params.get(provider, {}).get(model_name, {})
if not model_config.get("supports_thinking", False):
return messages
# Make a copy of the messages to avoid modifying the original
modified_messages = messages.copy()
# Check each message
for i, message in enumerate(modified_messages):
# Only check assistant messages
if hasattr(message, "type") and message.type == "ai":
# If content is a list (structured format)
if isinstance(message.content, list):
# Check if the first item is a thinking block
if not (len(message.content) > 0 and
isinstance(message.content[0], dict) and
message.content[0].get("type") == "thinking"):
# Add a redacted_thinking block at the start
message.content.insert(0, {"type": "redacted_thinking"})
logger.debug("Added redacted_thinking block to assistant message")
# If content is a string, we can't modify it properly
# This shouldn't happen with Claude 3.7, but log it if it does
elif isinstance(message.content, str):
logger.warning(
"Found string content in assistant message with Claude 3.7 thinking enabled. "
"This may cause API errors if the message doesn't start with a thinking block."
)
return modified_messages
def _run_agent_stream(agent: RAgents, msg_list: list[BaseMessage]):
"""
Streams agent output while handling completion and interruption.
@@ -552,6 +610,9 @@ def _run_agent_stream(agent: RAgents, msg_list: list[BaseMessage]):
stream_config["callbacks"] = []
stream_config["callbacks"].append(cb)
# Ensure messages have thinking blocks if needed
msg_list = _ensure_thinking_block(msg_list, config)
while True:
for chunk in agent.stream({"messages": msg_list}, stream_config):
logger.debug("Agent output: %s", chunk)
@@ -636,6 +697,23 @@ def run_agent_with_retry(
return f"Agent has crashed: {crash_message}"
try:
# Ensure messages have thinking blocks if needed before each run
config = get_config_repository().get_all()
if is_anthropic_claude(config):
provider = config.get("provider", "")
model_name = config.get("model", "")
# Only apply to Claude 3.7 models with thinking enabled
if (provider.lower() == "anthropic" and
"claude-3-7" in model_name.lower() and
not config.get("disable_thinking", False)):
# Get model configuration to check if thinking is supported
model_config = models_params.get(provider, {}).get(model_name, {})
if model_config.get("supports_thinking", False):
logger.debug("Ensuring thinking blocks for Claude 3.7 before agent run")
msg_list = _ensure_thinking_block(msg_list, config)
_run_agent_stream(agent, msg_list)
if fallback_handler:
fallback_handler.reset_fallback_handler()

View File

@@ -259,7 +259,7 @@ def create_llm_client(
else:
temp_kwargs = {}
-        if supports_thinking:
+        if supports_thinking and not config.get("disable_thinking", False):
temp_kwargs = {"thinking": {"type": "enabled", "budget_tokens": 12000}}
if provider == "deepseek":

View File

@@ -0,0 +1,103 @@
import pytest
from unittest.mock import MagicMock, patch
from ra_aid.llm import create_llm_client
class TestDisableThinking:
"""Test suite for the disable_thinking configuration option."""
@pytest.mark.parametrize(
"test_id, config, model_config, expected_thinking_param, description",
[
# Test case 1: Claude 3.7 model with thinking enabled (default)
(
"claude_thinking_enabled",
{
"provider": "anthropic",
"model": "claude-3-7-sonnet-20250219",
},
{"supports_thinking": True},
{"thinking": {"type": "enabled", "budget_tokens": 12000}},
"Claude 3.7 should have thinking enabled by default",
),
# Test case 2: Claude 3.7 model with thinking explicitly disabled
(
"claude_thinking_disabled",
{
"provider": "anthropic",
"model": "claude-3-7-sonnet-20250219",
"disable_thinking": True,
},
{"supports_thinking": True},
{},
"Claude 3.7 with disable_thinking=True should not have thinking param",
),
# Test case 3: Non-thinking model should not have thinking param
(
"non_thinking_model",
{
"provider": "anthropic",
"model": "claude-3-5-sonnet-20240620",
},
{"supports_thinking": False},
{},
"Non-thinking model should not have thinking param",
),
# Test case 4: Non-Claude model should not have thinking param
(
"non_claude_model",
{
"provider": "openai",
"model": "gpt-4",
},
{},
{},
"Non-Claude model should not have thinking param",
),
],
)
def test_disable_thinking_option(
self, test_id, config, model_config, expected_thinking_param, description
):
"""Test that the disable_thinking option correctly controls thinking mode."""
# Mock the necessary dependencies
with patch("ra_aid.llm.ChatAnthropic") as mock_anthropic:
with patch("ra_aid.llm.ChatOpenAI") as mock_openai:
with patch("ra_aid.llm.models_params") as mock_models_params:
# Set up the mock to return the specified model_config
mock_models_params.get.return_value = {config["model"]: model_config}
# Set up the mock for get_provider_config
with patch("ra_aid.llm.get_provider_config") as mock_get_provider_config:
# Include disable_thinking in the provider config if it's in the test config
provider_config = {
"api_key": "test-key",
"base_url": None,
}
if "disable_thinking" in config:
provider_config["disable_thinking"] = config["disable_thinking"]
mock_get_provider_config.return_value = provider_config
# Call the function without passing disable_thinking directly
create_llm_client(
config["provider"],
config["model"],
temperature=None,
is_expert=False
)
# Check if the correct parameters were passed
if config["provider"] == "anthropic":
# Get the kwargs passed to ChatAnthropic
_, kwargs = mock_anthropic.call_args
# Check if thinking param was included or not
if expected_thinking_param:
assert "thinking" in kwargs, f"Test {test_id} failed: {description}"
assert kwargs["thinking"] == expected_thinking_param["thinking"], \
f"Test {test_id} failed: Thinking param doesn't match expected value"
else:
assert "thinking" not in kwargs, \
f"Test {test_id} failed: Thinking param should not be present"

View File

@@ -0,0 +1,145 @@
import pytest
from unittest.mock import MagicMock, patch
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from ra_aid.agent_utils import _ensure_thinking_block
class TestEnsureThinkingBlock:
"""Test suite for the _ensure_thinking_block function."""
@pytest.mark.parametrize(
"test_id, messages, config, expected_changes, description",
[
# Test case 1: Non-Claude 3.7 model should not modify messages
(
"non_claude_model",
[
HumanMessage(content="Hello"),
AIMessage(content=[{"type": "text", "text": "Hi there"}]),
],
{"provider": "openai", "model": "gpt-4"},
False,
"Non-Claude 3.7 model should not modify messages",
),
# Test case 2: Claude 3.7 model with thinking disabled should not modify messages
(
"claude_thinking_disabled",
[
HumanMessage(content="Hello"),
AIMessage(content=[{"type": "text", "text": "Hi there"}]),
],
{
"provider": "anthropic",
"model": "claude-3-7-sonnet-20250219",
"disable_thinking": True,
},
False,
"Claude 3.7 with thinking disabled should not modify messages",
),
# Test case 3: Claude 3.7 model with thinking enabled but no AI messages should not modify messages
(
"claude_no_ai_messages",
[
HumanMessage(content="Hello"),
SystemMessage(content="System message"),
],
{"provider": "anthropic", "model": "claude-3-7-sonnet-20250219"},
False,
"Claude 3.7 with no AI messages should not modify messages",
),
# Test case 4: Claude 3.7 model with thinking enabled and AI message with text content should log warning
(
"claude_ai_text_content",
[
HumanMessage(content="Hello"),
AIMessage(content="Text response"),
],
{"provider": "anthropic", "model": "claude-3-7-sonnet-20250219"},
False,
"Claude 3.7 with AI message with text content should log warning",
),
# Test case 5: Claude 3.7 model with thinking enabled and AI message with list content but no thinking block
(
"claude_ai_no_thinking_block",
[
HumanMessage(content="Hello"),
AIMessage(content=[{"type": "text", "text": "Hi there"}]),
],
{"provider": "anthropic", "model": "claude-3-7-sonnet-20250219"},
True,
"Claude 3.7 with AI message without thinking block should add one",
),
# Test case 6: Claude 3.7 model with thinking enabled and AI message with list content with thinking block
(
"claude_ai_with_thinking_block",
[
HumanMessage(content="Hello"),
AIMessage(
content=[
{"type": "thinking", "thinking": "Let me think..."},
{"type": "text", "text": "Hi there"},
]
),
],
{"provider": "anthropic", "model": "claude-3-7-sonnet-20250219"},
False,
"Claude 3.7 with AI message with thinking block should not modify it",
),
# Test case 7: Claude 3.7 model with thinking enabled and multiple AI messages
(
"claude_multiple_ai_messages",
[
HumanMessage(content="Hello"),
AIMessage(content=[{"type": "text", "text": "First response"}]),
HumanMessage(content="Follow-up"),
AIMessage(content=[{"type": "text", "text": "Second response"}]),
],
{"provider": "anthropic", "model": "claude-3-7-sonnet-20250219"},
True,
"Claude 3.7 with multiple AI messages should add thinking blocks to all",
),
],
)
def test_ensure_thinking_block(
self, test_id, messages, config, expected_changes, description
):
"""Test the _ensure_thinking_block function with various inputs."""
# Mock the logger
with patch("ra_aid.agent_utils.logger") as mock_logger:
# Mock the models_params dictionary
with patch("ra_aid.agent_utils.models_params") as mock_models_params:
# Set up the mock to return supports_thinking=True for Claude 3.7 models
if "claude-3-7" in config.get("model", ""):
mock_models_params.get.return_value = {
config["model"]: {"supports_thinking": True}
}
else:
mock_models_params.get.return_value = {}
# Make a copy of the original messages for comparison
original_messages = [
AIMessage(content=msg.content) if isinstance(msg, AIMessage) else msg
for msg in messages
]
# Call the function
result = _ensure_thinking_block(messages, config)
# Check if the result is different from the input when expected
if expected_changes:
assert result != original_messages, f"Test {test_id} failed: {description}"
# Check that AI messages have thinking blocks
for msg in result:
if hasattr(msg, "type") and msg.type == "ai":
if isinstance(msg.content, list) and len(msg.content) > 0:
assert msg.content[0].get("type") == "thinking" or msg.content[0].get("type") == "redacted_thinking", \
f"Test {test_id} failed: AI message should have thinking block"
else:
# Check that the messages were not modified
assert result == messages, f"Test {test_id} failed: {description}"
# Check for warning logs for text content
if "claude_ai_text_content" == test_id:
mock_logger.warning.assert_called_once()

View File

@@ -0,0 +1,91 @@
import pytest
from unittest.mock import MagicMock, patch
from langchain_core.messages import AIMessage, HumanMessage
from ra_aid.agent_utils import run_agent_with_retry
class TestThinkingIntegration:
"""Test suite for the integration of thinking block functionality in run_agent_with_retry."""
@pytest.mark.parametrize(
"test_id, config, model_name, should_ensure_thinking, description",
[
# Test case 1: Claude 3.7 model with thinking enabled should ensure thinking blocks
(
"claude_thinking_enabled",
{
"provider": "anthropic",
"model": "claude-3-7-sonnet-20250219",
},
"claude-3-7-sonnet-20250219",
True,
"Claude 3.7 should ensure thinking blocks",
),
# Test case 2: Claude 3.7 model with thinking disabled should not ensure thinking blocks
(
"claude_thinking_disabled",
{
"provider": "anthropic",
"model": "claude-3-7-sonnet-20250219",
"disable_thinking": True,
},
"claude-3-7-sonnet-20250219",
False,
"Claude 3.7 with thinking disabled should not ensure thinking blocks",
),
# Test case 3: Non-Claude 3.7 model should not ensure thinking blocks
(
"non_claude_model",
{
"provider": "openai",
"model": "gpt-4",
},
"gpt-4",
False,
"Non-Claude model should not ensure thinking blocks",
),
],
)
def test_run_agent_with_retry_thinking_integration(
self, test_id, config, model_name, should_ensure_thinking, description
):
"""Test that run_agent_with_retry correctly integrates thinking block functionality."""
# Mock the necessary dependencies
with patch("ra_aid.agent_utils.get_config_repository") as mock_get_config:
with patch("ra_aid.agent_utils._ensure_thinking_block") as mock_ensure_thinking:
with patch("ra_aid.agent_utils._run_agent_stream") as mock_run_agent_stream:
with patch("ra_aid.agent_utils._setup_interrupt_handling") as mock_setup:
with patch("ra_aid.agent_utils._restore_interrupt_handling"):
with patch("ra_aid.agent_utils.agent_context"):
with patch("ra_aid.agent_utils.InterruptibleSection"):
with patch("ra_aid.agent_utils.is_anthropic_claude") as mock_is_anthropic:
with patch("ra_aid.agent_utils.models_params") as mock_models_params:
# Set up the mocks
mock_get_config.return_value.get_all.return_value = config
mock_get_config.return_value.get.return_value = False
mock_setup.return_value = None
mock_run_agent_stream.return_value = True
# Set up is_anthropic_claude to return True for Claude models
mock_is_anthropic.return_value = "claude" in model_name.lower()
# Set up models_params to return supports_thinking=True for Claude 3.7 models
if "claude-3-7" in model_name:
mock_models_params.get.return_value = {
model_name: {"supports_thinking": True}
}
else:
mock_models_params.get.return_value = {}
# Create a mock agent
mock_agent = MagicMock()
# Call the function
run_agent_with_retry(mock_agent, "Test prompt")
# Check if _ensure_thinking_block was called correctly
if should_ensure_thinking:
mock_ensure_thinking.assert_called()
else:
mock_ensure_thinking.assert_not_called()