FEAT automatically detect the sonnet3.7 error and auto apply a workaround

2025-03-11 02:18:07 +00:00 · 2025-03-11 02:18:07 +00:00 · 36678969a9
parent 108244d091
commit 36678969a9
6 changed files with 230 additions and 10 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -14,6 +14,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Added comprehensive testing for think tag functionality
 - Added `--show-thoughts` flag to show thoughts of thinking models
 - Added `--disable-thinking` flag to disable thinking mode for Claude 3.7 Sonnet
+- Added automatic workaround for Claude 3.7 Sonnet thinking block errors
+- Added `--skip-sonnet37-workaround` flag to opt out of automatic error handling

 ### Fixed
 - Fixed unretryable API error (400) when using Claude 3.7 Sonnet with thinking mode enabled for extended periods
--- a/docs/docs/configuration/thinking-models.md
+++ b/docs/docs/configuration/thinking-models.md
@@ -105,13 +105,37 @@ RA.Aid configures the model to use its native thinking mode, and then processes

 If you run RA.Aid without the `--show-thoughts` flag, the thinking content is still extracted from the model responses, but it won't be displayed in the console. This gives you a cleaner output focused only on the model's final responses.

-## Disabling Thinking Mode
+## Automatic Error Handling for Claude 3.7 Thinking Mode

-In some cases, you might want to disable the thinking mode feature for Claude 3.7 models, particularly if you're experiencing API errors or if you prefer the model to operate without the thinking capability.
+RA.Aid includes an automatic workaround for a known issue with Claude 3.7 Sonnet's thinking mode. When using Claude 3.7 Sonnet for extended periods (typically more than 10 minutes), you might encounter an unretryable API error (400) related to thinking blocks:

-### Using the disable_thinking Configuration Option
+```
+Unretryable API error: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'messages.1.content.0.type: Expected thinking or redacted_thinking, but found text...'}}
+```

-RA.Aid provides a `disable_thinking` configuration option that allows you to turn off the thinking mode for Claude 3.7 models:
+### Automatic Workaround
+
+When RA.Aid detects this specific error:
+
+1. It automatically disables thinking mode for Claude 3.7 Sonnet
+2. It continues the session without interruption
+3. It logs a warning message about the workaround being applied
+
+This lets the session continue past the thinking block error without manual intervention.
+
+### Opting Out of the Automatic Workaround
+
+If you prefer to handle these errors differently or want to maintain thinking mode at all costs, you can opt out of the automatic workaround using the `--skip-sonnet37-workaround` flag:
+
+```bash
+ra-aid -m "Debug the database connection issue" --provider anthropic --model claude-3-7-sonnet-20250219 --skip-sonnet37-workaround
+```
+
+When this flag is used, RA.Aid will not automatically disable thinking mode when the error occurs, and will instead crash with the unretryable API error.
+
+### Manually Disabling Thinking Mode
+
+You can also choose to disable thinking mode from the start using the `--disable-thinking` flag:

 ```bash
 ra-aid -m "Debug the database connection issue" --provider anthropic --model claude-3-7-sonnet-20250219 --disable-thinking
@@ -125,12 +149,11 @@ When this option is enabled:

 ### When to Disable Thinking Mode

-Consider disabling thinking mode in the following scenarios:
+Consider manually disabling thinking mode in the following scenarios:

-1. **API Errors**: If you encounter unretryable API errors (400) related to thinking blocks
-2. **Long-Running Sessions**: For extended sessions that run for more than 10 minutes
-3. **Performance Concerns**: If you need faster responses and don't require the thinking content
-4. **Compatibility Issues**: If you're using tools or workflows that aren't fully compatible with thinking mode
+1. **Long-Running Sessions**: For extended sessions that you know will run for more than 10 minutes
+2. **Performance Concerns**: If you need faster responses and don't require the thinking content
+3. **Compatibility Issues**: If you're using tools or workflows that aren't fully compatible with thinking mode

 ## Troubleshooting and Best Practices

--- a/ra_aid/main.py
+++ b/ra_aid/main.py
@@ -290,6 +290,11 @@ Examples:
        action="store_true",
        help="Display model thinking content extracted from think tags when supported by the model",
    )
+    parser.add_argument(
+        "--skip-sonnet37-workaround",
+        action="store_true",
+        help="Skip automatic workaround for Claude 3.7 Sonnet thinking block errors",
+    )
    parser.add_argument(
        "--reasoning-assistance",
        action="store_true",
--- a/ra_aid/agent_utils.py
+++ b/ra_aid/agent_utils.py
@@ -760,11 +760,45 @@ def run_agent_with_retry(
                ) as e:
                    # Check if this is a BadRequestError (HTTP 400) which is unretryable
                    error_str = str(e).lower()
+                    
+                    # Special handling for Claude 3.7 Sonnet thinking block error
                    if (
                        "400" in error_str or "bad request" in error_str
-                    ) and isinstance(e, APIError):
-                        from ra_aid.agent_context import mark_agent_crashed
+                    ) and isinstance(e, APIError) and "expected thinking or redacted_thinking" in error_str:
+                        # This is the specific Claude 3.7 Sonnet thinking block error
+                        config = get_config_repository().get_all()
+                        provider = config.get("provider", "")
+                        model_name = config.get("model", "")
                        
+                        # Check if this is Claude 3.7 Sonnet and the user hasn't opted out of the workaround
+                        if (
+                            provider.lower() == "anthropic" and 
+                            "claude-3-7" in model_name.lower() and
+                            not config.get("skip_sonnet37_workaround", False)
+                        ):
+                            # Apply the workaround by enabling disable_thinking
+                            logger.warning(
+                                "Detected Claude 3.7 Sonnet thinking block error. "
+                                "Automatically applying workaround by disabling thinking mode. "
+                                "Use --skip-sonnet37-workaround to disable this behavior."
+                            )
+                            config_repo = get_config_repository()
+                            config_repo.set("disable_thinking", True)
+                            
+                            # Continue with the next attempt
+                            continue
+                        else:
+                            # User has opted out of the workaround or this isn't Claude 3.7 Sonnet
+                            from ra_aid.agent_context import mark_agent_crashed
+                            crash_message = f"Unretryable API error: {str(e)}"
+                            mark_agent_crashed(crash_message)
+                            logger.error("Agent has crashed: %s", crash_message)
+                            return f"Agent has crashed: {crash_message}"
+                    elif (
+                        "400" in error_str or "bad request" in error_str
+                    ) and isinstance(e, APIError):
+                        # Other 400 errors are still unretryable
+                        from ra_aid.agent_context import mark_agent_crashed
                        crash_message = f"Unretryable API error: {str(e)}"
                        mark_agent_crashed(crash_message)
                        logger.error("Agent has crashed: %s", crash_message)
--- a/tests/ra_aid/test_disable_thinking.py
+++ b/tests/ra_aid/test_disable_thinking.py
@@ -55,6 +55,18 @@ class TestDisableThinking:
                {},
                "Non-Claude model should not have thinking param",
            ),
+            # Test case 5: Claude 3.7 model with skip_sonnet37_workaround enabled
+            (
+                "skip_sonnet37_workaround",
+                {
+                    "provider": "anthropic",
+                    "model": "claude-3-7-sonnet-20250219",
+                    "skip_sonnet37_workaround": True,
+                },
+                {"supports_thinking": True},
+                {"thinking": {"type": "enabled", "budget_tokens": 12000}},
+                "Claude 3.7 with skip_sonnet37_workaround=True should still have thinking param",
+            ),
        ],
    )
    def test_disable_thinking_option(
--- a/tests/ra_aid/test_sonnet37_workaround.py
+++ b/tests/ra_aid/test_sonnet37_workaround.py
@@ -0,0 +1,144 @@
+import pytest
+from unittest.mock import MagicMock, patch
+
+from ra_aid.agent_utils import run_agent_with_retry
+from ra_aid.agent_context import reset_completion_flags
+
+
+# Create a mock APIError class for testing
+class MockAPIError(Exception):
+    """Mock version of Anthropic's APIError for testing."""
+    pass
+
+
+class TestSonnet37Workaround:
+    """Test suite for the automatic Claude 3.7 Sonnet thinking block error workaround."""
+
+    def test_automatic_workaround_applied(self):
+        """Test that the workaround is automatically applied when the specific error occurs."""
+        # Mock dependencies
+        mock_agent = MagicMock()
+        
+        # Create a mock error that simulates the thinking block error
+        thinking_error = MockAPIError("400 Bad Request: messages.1.content.0.type: Expected thinking or redacted_thinking, but found text")
+        
+        # Set up the run_agent_stream to first raise the error, then succeed
+        mock_run_stream = MagicMock()
+        mock_run_stream.side_effect = [
+            thinking_error,  # First call raises error
+            None,  # Second call succeeds
+        ]
+        
+        # Mock config repository
+        mock_config = {
+            "provider": "anthropic",
+            "model": "claude-3-7-sonnet-20250219",
+        }
+        
+        with patch("ra_aid.agent_utils.APIError", MockAPIError):
+            with patch("ra_aid.agent_utils.get_config_repository") as mock_get_config:
+                # Create a mock repository that returns our test config
+                mock_repo = MagicMock()
+                mock_repo.get_all.return_value = mock_config
+                mock_repo.get.side_effect = lambda key, default=None: mock_config.get(key, default)
+                mock_repo.set = MagicMock()
+                mock_get_config.return_value = mock_repo
+                
+                # Mock other dependencies to prevent actual execution
+                with patch("ra_aid.agent_utils._run_agent_stream", side_effect=mock_run_stream.side_effect):
+                    with patch("ra_aid.agent_utils._execute_test_command_wrapper") as mock_test_cmd:
+                        # Mock the test command wrapper to return a tuple indicating success
+                        mock_test_cmd.return_value = (True, "", False, 0)  # (should_break, prompt, auto_test, test_attempts)
+                        
+                        # Run the function
+                        result = run_agent_with_retry(mock_agent, "Test prompt")
+                        
+                        # Verify the workaround was applied
+                        mock_repo.set.assert_any_call("disable_thinking", True)
+                        
+                        # The result might be None since we're mocking _run_agent_stream
+                        # Just verify that the workaround was applied
+                        assert mock_repo.set.call_count > 0
+
+    def test_skip_sonnet37_workaround(self):
+        """Test that the workaround is not applied when skip_sonnet37_workaround is True."""
+        # Mock dependencies
+        mock_agent = MagicMock()
+        
+        # Create a mock error that simulates the thinking block error
+        thinking_error = MockAPIError("400 Bad Request: messages.1.content.0.type: Expected thinking or redacted_thinking, but found text")
+        
+        # Set up the run_agent_stream to raise the error
+        mock_run_stream = MagicMock()
+        mock_run_stream.side_effect = thinking_error
+        
+        # Mock config repository with skip_sonnet37_workaround=True
+        mock_config = {
+            "provider": "anthropic",
+            "model": "claude-3-7-sonnet-20250219",
+            "skip_sonnet37_workaround": True,
+        }
+        
+        with patch("ra_aid.agent_utils.APIError", MockAPIError):
+            with patch("ra_aid.agent_utils.get_config_repository") as mock_get_config:
+                # Create a mock repository that returns our test config
+                mock_repo = MagicMock()
+                mock_repo.get_all.return_value = mock_config
+                mock_repo.get.side_effect = lambda key, default=None: mock_config.get(key, default)
+                mock_get_config.return_value = mock_repo
+                
+                # Mock agent_context.mark_agent_crashed to verify it's called
+                with patch("ra_aid.agent_context.mark_agent_crashed") as mock_mark_crashed:
+                    # Mock other dependencies to prevent actual execution
+                    with patch("ra_aid.agent_utils._run_agent_stream", side_effect=mock_run_stream.side_effect):
+                        
+                        # Run the function - should crash with unretryable error
+                        result = run_agent_with_retry(mock_agent, "Test prompt")
+                        
+                        # Verify the agent was marked as crashed
+                        mock_mark_crashed.assert_called_once()
+                        
+                        # Verify the function returned a crash message
+                        assert "Agent has crashed" in result
+                        assert "Unretryable API error" in result
+
+    def test_non_thinking_error_not_handled(self):
+        """Test that other 400 errors are not handled by the workaround."""
+        # Mock dependencies
+        mock_agent = MagicMock()
+        
+        # Create a mock error that simulates a different 400 error
+        other_error = MockAPIError("400 Bad Request: Some other error message")
+        
+        # Set up the run_agent_stream to raise the error
+        mock_run_stream = MagicMock()
+        mock_run_stream.side_effect = other_error
+        
+        # Mock config repository
+        mock_config = {
+            "provider": "anthropic",
+            "model": "claude-3-7-sonnet-20250219",
+        }
+        
+        with patch("ra_aid.agent_utils.APIError", MockAPIError):
+            with patch("ra_aid.agent_utils.get_config_repository") as mock_get_config:
+                # Create a mock repository that returns our test config
+                mock_repo = MagicMock()
+                mock_repo.get_all.return_value = mock_config
+                mock_repo.get.side_effect = lambda key, default=None: mock_config.get(key, default)
+                mock_get_config.return_value = mock_repo
+                
+                # Mock agent_context.mark_agent_crashed to verify it's called
+                with patch("ra_aid.agent_context.mark_agent_crashed") as mock_mark_crashed:
+                    # Mock other dependencies to prevent actual execution
+                    with patch("ra_aid.agent_utils._run_agent_stream", side_effect=mock_run_stream.side_effect):
+                        
+                        # Run the function - should crash with unretryable error
+                        result = run_agent_with_retry(mock_agent, "Test prompt")
+                        
+                        # Verify the agent was marked as crashed
+                        mock_mark_crashed.assert_called_once()
+                        
+                        # Verify the function returned a crash message
+                        assert "Agent has crashed" in result
+                        assert "Unretryable API error" in result
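
Note on the decision logic: the branch added to `run_agent_with_retry` can be read as a small classification of the error text against the current config. The sketch below restates that logic as a standalone function so the three tested outcomes are easy to see side by side. It is an illustration only, not code from this commit: the function name `classify_api_error`, the constant `THINKING_ERROR_MARKER`, and the returned labels are hypothetical, and the `isinstance(e, APIError)` check from the real code is left out so the example runs without the Anthropic SDK.

```python
from typing import Any, Mapping

# Substring the commit looks for in the lowercased error text.
THINKING_ERROR_MARKER = "expected thinking or redacted_thinking"


def classify_api_error(error_text: str, config: Mapping[str, Any]) -> str:
    """Hypothetical helper mirroring the branch added in this commit.

    Returns one of:
      "apply_workaround": set disable_thinking and continue with the next attempt
      "crash":            treat as an unretryable 400 error
      "other":            not a 400 error; handled elsewhere in the retry loop
    """
    text = error_text.lower()

    # Same 400 / "bad request" check as the original condition.
    if not ("400" in text or "bad request" in text):
        return "other"

    # Any 400 that is not the thinking block error stays unretryable.
    if THINKING_ERROR_MARKER not in text:
        return "crash"

    is_sonnet37 = (
        str(config.get("provider", "")).lower() == "anthropic"
        and "claude-3-7" in str(config.get("model", "")).lower()
    )
    opted_out = bool(config.get("skip_sonnet37_workaround", False))

    if is_sonnet37 and not opted_out:
        return "apply_workaround"
    return "crash"


if __name__ == "__main__":
    error = (
        "400 Bad Request: messages.1.content.0.type: "
        "Expected thinking or redacted_thinking, but found text"
    )
    config = {"provider": "anthropic", "model": "claude-3-7-sonnet-20250219"}
    print(classify_api_error(error, config))
    print(classify_api_error(error, {**config, "skip_sonnet37_workaround": True}))
    print(classify_api_error("400 Bad Request: Some other error message", config))
```

The three calls in the `__main__` block correspond to `test_automatic_workaround_applied`, `test_skip_sonnet37_workaround`, and `test_non_thinking_error_not_handled` respectively.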