FEAT automatically detect the sonnet3.7 error and auto apply a workaround

Jose Leon 2025-03-11 02:18:07 +00:00
parent 108244d091
commit 36678969a9
6 changed files with 230 additions and 10 deletions

@@ -14,6 +14,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Added comprehensive testing for think tag functionality
 - Added `--show-thoughts` flag to show thoughts of thinking models
 - Added `--disable-thinking` flag to disable thinking mode for Claude 3.7 Sonnet
+- Added automatic workaround for Claude 3.7 Sonnet thinking block errors
+- Added `--skip-sonnet37-workaround` flag to opt out of automatic error handling
 ### Fixed
 - Fixed unretryable API error (400) when using Claude 3.7 Sonnet with thinking mode enabled for extended periods

@@ -105,13 +105,37 @@ RA.Aid configures the model to use its native thinking mode, and then processes
 If you run RA.Aid without the `--show-thoughts` flag, the thinking content is still extracted from the model responses, but it won't be displayed in the console. This gives you a cleaner output focused only on the model's final responses.
-## Disabling Thinking Mode
+## Automatic Error Handling for Claude 3.7 Thinking Mode
-In some cases, you might want to disable the thinking mode feature for Claude 3.7 models, particularly if you're experiencing API errors or if you prefer the model to operate without the thinking capability.
+RA.Aid includes an automatic workaround for a known issue with Claude 3.7 Sonnet's thinking mode. When using Claude 3.7 Sonnet for extended periods (typically more than 10 minutes), you might encounter an unretryable API error (400) related to thinking blocks:
-### Using the disable_thinking Configuration Option
+```
+Unretryable API error: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'messages.1.content.0.type: Expected thinking or redacted_thinking, but found text...'}}
+```
-RA.Aid provides a `disable_thinking` configuration option that allows you to turn off the thinking mode for Claude 3.7 models:
+### Automatic Workaround
+When RA.Aid detects this specific error:
+1. It automatically disables thinking mode for Claude 3.7 Sonnet
+2. It continues the session without interruption
+3. It logs a warning message about the workaround being applied
+This automatic behavior ensures that your session continues smoothly even if the thinking block error occurs, without requiring manual intervention.
+### Opting Out of the Automatic Workaround
+If you prefer to handle these errors differently or want to maintain thinking mode at all costs, you can opt out of the automatic workaround using the `--skip-sonnet37-workaround` flag:
+```bash
+ra-aid -m "Debug the database connection issue" --provider anthropic --model claude-3-7-sonnet-20250219 --skip-sonnet37-workaround
+```
+When this flag is used, RA.Aid will not automatically disable thinking mode when the error occurs, and will instead crash with the unretryable API error.
+### Manually Disabling Thinking Mode
+You can also choose to disable thinking mode from the start using the `--disable-thinking` flag:
 ```bash
 ra-aid -m "Debug the database connection issue" --provider anthropic --model claude-3-7-sonnet-20250219 --disable-thinking
@@ -125,12 +149,11 @@ When this option is enabled:
 ### When to Disable Thinking Mode
-Consider disabling thinking mode in the following scenarios:
+Consider manually disabling thinking mode in the following scenarios:
-1. **API Errors**: If you encounter unretryable API errors (400) related to thinking blocks
-2. **Long-Running Sessions**: For extended sessions that run for more than 10 minutes
-3. **Performance Concerns**: If you need faster responses and don't require the thinking content
-4. **Compatibility Issues**: If you're using tools or workflows that aren't fully compatible with thinking mode
+1. **Long-Running Sessions**: For extended sessions that you know will run for more than 10 minutes
+2. **Performance Concerns**: If you need faster responses and don't require the thinking content
+3. **Compatibility Issues**: If you're using tools or workflows that aren't fully compatible with thinking mode
 ## Troubleshooting and Best Practices

@@ -290,6 +290,11 @@ Examples:
         action="store_true",
         help="Display model thinking content extracted from think tags when supported by the model",
     )
+    parser.add_argument(
+        "--skip-sonnet37-workaround",
+        action="store_true",
+        help="Skip automatic workaround for Claude 3.7 Sonnet thinking block errors",
+    )
     parser.add_argument(
         "--reasoning-assistance",
         action="store_true",
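The new option uses argparse's `store_true` action, so it defaults to `False`, and argparse derives the attribute name `skip_sonnet37_workaround` by replacing the dashes with underscores. A minimal sketch, assuming the parsed value reaches the config repository under that same key (the key `run_agent_with_retry` reads with `config.get("skip_sonnet37_workaround", False)`); the hand-off step is not part of this diff, so `to_config` and its plain dict are hypothetical stand-ins.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Only the option added in this commit; the real parser defines many more
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--skip-sonnet37-workaround",
        action="store_true",
        help="Skip automatic workaround for Claude 3.7 Sonnet thinking block errors",
    )
    return parser


def to_config(argv: list) -> dict:
    # Hypothetical hand-off: copy the parsed flag into a config dict under the
    # key that the retry logic looks up
    args = build_parser().parse_args(argv)
    return {"skip_sonnet37_workaround": args.skip_sonnet37_workaround}


print(to_config([]))                              # {'skip_sonnet37_workaround': False}
print(to_config(["--skip-sonnet37-workaround"]))  # {'skip_sonnet37_workaround': True}
```

Because the flag defaults to `False`, the workaround is on unless the user explicitly opts out.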

@@ -760,11 +760,45 @@ def run_agent_with_retry(
                 ) as e:
                     # Check if this is a BadRequestError (HTTP 400) which is unretryable
                     error_str = str(e).lower()
+                    # Special handling for Claude 3.7 Sonnet thinking block error
                     if (
                         "400" in error_str or "bad request" in error_str
-                    ) and isinstance(e, APIError):
-                        from ra_aid.agent_context import mark_agent_crashed
+                    ) and isinstance(e, APIError) and "expected thinking or redacted_thinking" in error_str:
+                        # This is the specific Claude 3.7 Sonnet thinking block error
+                        config = get_config_repository().get_all()
+                        provider = config.get("provider", "")
+                        model_name = config.get("model", "")
+                        # Check if this is Claude 3.7 Sonnet and the user hasn't opted out of the workaround
+                        if (
+                            provider.lower() == "anthropic" and
+                            "claude-3-7" in model_name.lower() and
+                            not config.get("skip_sonnet37_workaround", False)
+                        ):
+                            # Apply the workaround by enabling disable_thinking
+                            logger.warning(
+                                "Detected Claude 3.7 Sonnet thinking block error. "
+                                "Automatically applying workaround by disabling thinking mode. "
+                                "Use --skip-sonnet37-workaround to disable this behavior."
+                            )
+                            config_repo = get_config_repository()
+                            config_repo.set("disable_thinking", True)
+                            # Continue with the next attempt
+                            continue
+                        else:
+                            # User has opted out of the workaround or this isn't Claude 3.7 Sonnet
+                            from ra_aid.agent_context import mark_agent_crashed
+                            crash_message = f"Unretryable API error: {str(e)}"
+                            mark_agent_crashed(crash_message)
+                            logger.error("Agent has crashed: %s", crash_message)
+                            return f"Agent has crashed: {crash_message}"
+                    elif (
+                        "400" in error_str or "bad request" in error_str
+                    ) and isinstance(e, APIError):
+                        # Other 400 errors are still unretryable
+                        from ra_aid.agent_context import mark_agent_crashed
                         crash_message = f"Unretryable API error: {str(e)}"
                         mark_agent_crashed(crash_message)
                         logger.error("Agent has crashed: %s", crash_message)
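The branch added to `run_agent_with_retry` boils down to a decision over three inputs: the lowercased error text, the configured provider and model, and the opt-out flag. The sketch below restates that decision as a pure function so it can be checked in isolation. The name `classify_api_error` and its string labels are hypothetical, and the real `isinstance(e, APIError)` check is reduced to a boolean parameter.

```python
def classify_api_error(error_text: str, is_api_error: bool, config: dict) -> str:
    """Hypothetical restatement of the 400-handling decision in this hunk.

    Returns "apply_workaround", "crash", or "not_handled".
    `is_api_error` stands in for `isinstance(e, APIError)`.
    """
    error_str = error_text.lower()
    is_bad_request = is_api_error and ("400" in error_str or "bad request" in error_str)
    if not is_bad_request:
        # Falls outside the branches shown in this hunk
        return "not_handled"
    if "expected thinking or redacted_thinking" in error_str:
        provider = config.get("provider", "")
        model_name = config.get("model", "")
        if (
            provider.lower() == "anthropic"
            and "claude-3-7" in model_name.lower()
            and not config.get("skip_sonnet37_workaround", False)
        ):
            # Corresponds to config_repo.set("disable_thinking", True) followed by `continue`
            return "apply_workaround"
    # Opted-out thinking errors and every other 400 APIError are unretryable
    return "crash"


cfg = {"provider": "anthropic", "model": "claude-3-7-sonnet-20250219"}
msg = "400 Bad Request: messages.1.content.0.type: Expected thinking or redacted_thinking, but found text"
print(classify_api_error(msg, True, cfg))  # apply_workaround
print(classify_api_error(msg, True, {**cfg, "skip_sonnet37_workaround": True}))  # crash
print(classify_api_error("400 Bad Request: Some other error message", True, cfg))  # crash
```

Keeping the decision free of side effects makes the three outcomes exercised by the new test file easy to enumerate.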

@@ -55,6 +55,18 @@ class TestDisableThinking:
                 {},
                 "Non-Claude model should not have thinking param",
             ),
+            # Test case 5: Claude 3.7 model with skip_sonnet37_workaround enabled
+            (
+                "skip_sonnet37_workaround",
+                {
+                    "provider": "anthropic",
+                    "model": "claude-3-7-sonnet-20250219",
+                    "skip_sonnet37_workaround": True,
+                },
+                {"supports_thinking": True},
+                {"thinking": {"type": "enabled", "budget_tokens": 12000}},
+                "Claude 3.7 with skip_sonnet37_workaround=True should still have thinking param",
+            ),
         ],
     )
     def test_disable_thinking_option(
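The new parametrized case pins down that the opt-out flag only changes error handling: a Claude 3.7 model that supports thinking still receives the `thinking` parameter. A minimal sketch of that expected mapping, assuming a hypothetical `build_thinking_kwargs` helper; the 12000-token budget is copied from the test's expected value, and treating `disable_thinking` as the only switch that removes the parameter is an assumption drawn from the documentation in this commit.

```python
def build_thinking_kwargs(config: dict, model_config: dict) -> dict:
    """Hypothetical helper mirroring the expectations in the parametrized test."""
    if not model_config.get("supports_thinking", False):
        # Models without thinking support never get the parameter
        return {}
    if config.get("disable_thinking", False):
        # Set by --disable-thinking or by the automatic workaround
        return {}
    # skip_sonnet37_workaround is intentionally not consulted: it only affects error handling
    return {"thinking": {"type": "enabled", "budget_tokens": 12000}}


cfg = {
    "provider": "anthropic",
    "model": "claude-3-7-sonnet-20250219",
    "skip_sonnet37_workaround": True,
}
print(build_thinking_kwargs(cfg, {"supports_thinking": True}))
# {'thinking': {'type': 'enabled', 'budget_tokens': 12000}}
print(build_thinking_kwargs({**cfg, "disable_thinking": True}, {"supports_thinking": True}))
# {}
```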

@@ -0,0 +1,144 @@
+import pytest
+from unittest.mock import MagicMock, patch
+from ra_aid.agent_utils import run_agent_with_retry
+from ra_aid.agent_context import reset_completion_flags
+# Create a mock APIError class for testing
+class MockAPIError(Exception):
+    """Mock version of Anthropic's APIError for testing."""
+    pass
+class TestSonnet37Workaround:
+    """Test suite for the automatic Claude 3.7 Sonnet thinking block error workaround."""
+    def test_automatic_workaround_applied(self):
+        """Test that the workaround is automatically applied when the specific error occurs."""
+        # Mock dependencies
+        mock_agent = MagicMock()
+        # Create a mock error that simulates the thinking block error
+        thinking_error = MockAPIError("400 Bad Request: messages.1.content.0.type: Expected thinking or redacted_thinking, but found text")
+        # Set up the run_agent_stream to first raise the error, then succeed
+        mock_run_stream = MagicMock()
+        mock_run_stream.side_effect = [
+            thinking_error, # First call raises error
+            None, # Second call succeeds
+        ]
+        # Mock config repository
+        mock_config = {
+            "provider": "anthropic",
+            "model": "claude-3-7-sonnet-20250219",
+        }
+        with patch("ra_aid.agent_utils.APIError", MockAPIError):
+            with patch("ra_aid.agent_utils.get_config_repository") as mock_get_config:
+                # Create a mock repository that returns our test config
+                mock_repo = MagicMock()
+                mock_repo.get_all.return_value = mock_config
+                mock_repo.get.side_effect = lambda key, default=None: mock_config.get(key, default)
+                mock_repo.set = MagicMock()
+                mock_get_config.return_value = mock_repo
+                # Mock other dependencies to prevent actual execution
+                with patch("ra_aid.agent_utils._run_agent_stream", side_effect=mock_run_stream.side_effect):
+                    with patch("ra_aid.agent_utils._execute_test_command_wrapper") as mock_test_cmd:
+                        # Mock the test command wrapper to return a tuple indicating success
+                        mock_test_cmd.return_value = (True, "", False, 0) # (should_break, prompt, auto_test, test_attempts)
+                        # Run the function
+                        result = run_agent_with_retry(mock_agent, "Test prompt")
+                        # Verify the workaround was applied
+                        mock_repo.set.assert_any_call("disable_thinking", True)
+                        # The result might be None since we're mocking _run_agent_stream
+                        # Just verify that the workaround was applied
+                        assert mock_repo.set.call_count > 0
+    def test_skip_sonnet37_workaround(self):
+        """Test that the workaround is not applied when skip_sonnet37_workaround is True."""
+        # Mock dependencies
+        mock_agent = MagicMock()
+        # Create a mock error that simulates the thinking block error
+        thinking_error = MockAPIError("400 Bad Request: messages.1.content.0.type: Expected thinking or redacted_thinking, but found text")
+        # Set up the run_agent_stream to raise the error
+        mock_run_stream = MagicMock()
+        mock_run_stream.side_effect = thinking_error
+        # Mock config repository with skip_sonnet37_workaround=True
+        mock_config = {
+            "provider": "anthropic",
+            "model": "claude-3-7-sonnet-20250219",
+            "skip_sonnet37_workaround": True,
+        }
+        with patch("ra_aid.agent_utils.APIError", MockAPIError):
+            with patch("ra_aid.agent_utils.get_config_repository") as mock_get_config:
+                # Create a mock repository that returns our test config
+                mock_repo = MagicMock()
+                mock_repo.get_all.return_value = mock_config
+                mock_repo.get.side_effect = lambda key, default=None: mock_config.get(key, default)
+                mock_get_config.return_value = mock_repo
+                # Mock agent_context.mark_agent_crashed to verify it's called
+                with patch("ra_aid.agent_context.mark_agent_crashed") as mock_mark_crashed:
+                    # Mock other dependencies to prevent actual execution
+                    with patch("ra_aid.agent_utils._run_agent_stream", side_effect=mock_run_stream.side_effect):
+                        # Run the function - should crash with unretryable error
+                        result = run_agent_with_retry(mock_agent, "Test prompt")
+                        # Verify the agent was marked as crashed
+                        mock_mark_crashed.assert_called_once()
+                        # Verify the function returned a crash message
+                        assert "Agent has crashed" in result
+                        assert "Unretryable API error" in result
+    def test_non_thinking_error_not_handled(self):
+        """Test that other 400 errors are not handled by the workaround."""
+        # Mock dependencies
+        mock_agent = MagicMock()
+        # Create a mock error that simulates a different 400 error
+        other_error = MockAPIError("400 Bad Request: Some other error message")
+        # Set up the run_agent_stream to raise the error
+        mock_run_stream = MagicMock()
+        mock_run_stream.side_effect = other_error
+        # Mock config repository
+        mock_config = {
+            "provider": "anthropic",
+            "model": "claude-3-7-sonnet-20250219",
+        }
+        with patch("ra_aid.agent_utils.APIError", MockAPIError):
+            with patch("ra_aid.agent_utils.get_config_repository") as mock_get_config:
+                # Create a mock repository that returns our test config
+                mock_repo = MagicMock()
+                mock_repo.get_all.return_value = mock_config
+                mock_repo.get.side_effect = lambda key, default=None: mock_config.get(key, default)
+                mock_get_config.return_value = mock_repo
+                # Mock agent_context.mark_agent_crashed to verify it's called
+                with patch("ra_aid.agent_context.mark_agent_crashed") as mock_mark_crashed:
+                    # Mock other dependencies to prevent actual execution
+                    with patch("ra_aid.agent_utils._run_agent_stream", side_effect=mock_run_stream.side_effect):
+                        # Run the function - should crash with unretryable error
+                        result = run_agent_with_retry(mock_agent, "Test prompt")
+                        # Verify the agent was marked as crashed
+                        mock_mark_crashed.assert_called_once()
+                        # Verify the function returned a crash message
+                        assert "Agent has crashed" in result
+                        assert "Unretryable API error" in result
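The first test relies on `MagicMock`'s `side_effect` accepting an iterable: each call consumes the next item, and an item that is an exception instance is raised instead of returned. The self-contained sketch below reproduces that raise-then-succeed pattern without importing `ra_aid`. The loop `retry_with_workaround`, the `FakeAPIError` class, and the dict-based config are hypothetical stand-ins, and the loop deliberately omits the provider and model checks of the real code.

```python
from unittest.mock import MagicMock


class FakeAPIError(Exception):
    """Stand-in for the provider's APIError (hypothetical)."""


def retry_with_workaround(stream, config: dict, max_retries: int = 3) -> str:
    """Hypothetical, simplified retry loop; not RA.Aid's run_agent_with_retry."""
    for _ in range(max_retries):
        try:
            stream()  # a MagicMock with side_effect may raise here
            return "completed"
        except FakeAPIError as e:
            is_thinking_error = "expected thinking or redacted_thinking" in str(e).lower()
            if is_thinking_error and not config.get("skip_sonnet37_workaround", False):
                # Same idea as the commit: flip disable_thinking, then retry
                config["disable_thinking"] = True
                continue
            return f"Agent has crashed: Unretryable API error: {e}"
    return "retries exhausted"


error = FakeAPIError(
    "400 Bad Request: messages.1.content.0.type: "
    "Expected thinking or redacted_thinking, but found text"
)

# Iterable side_effect: the first call raises `error`, the second returns None
stream = MagicMock(side_effect=[error, None])
config = {}
print(retry_with_workaround(stream, config), config, stream.call_count)
# completed {'disable_thinking': True} 2
```

Passing a single exception instance as `side_effect` instead makes every call raise, which is how the opt-out tests force the crash path.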