Fix Claude 3.7 Sonnet "Expected thinking" API error

This commit is contained in:
Jose Leon 2025-03-11 00:32:27 +00:00
parent 376fe18b83
commit 108244d091
8 changed files with 465 additions and 2 deletions

.gitignore vendored
View File

@@ -16,3 +16,4 @@ appmap.log
*.swp
/vsc/node_modules
/vsc/dist
cline_docs/

View File

@@ -13,6 +13,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Added model parameters for think tag support
- Added comprehensive testing for think tag functionality
- Added `--show-thoughts` flag to show thoughts of thinking models
- Added `--disable-thinking` flag to disable thinking mode for Claude 3.7 Sonnet
### Fixed
- Fixed unretryable API error (400) when using Claude 3.7 Sonnet with thinking mode enabled for extended periods
- Improved message formatting for Claude 3.7 Sonnet to ensure thinking blocks are properly included
### Changed
- Updated langchain/langgraph deps

View File

@@ -105,6 +105,33 @@ RA.Aid configures the model to use its native thinking mode, and then processes
If you run RA.Aid without the `--show-thoughts` flag, the thinking content is still extracted from the model responses, but it won't be displayed in the console. This gives you a cleaner output focused only on the model's final responses.
## Disabling Thinking Mode
In some cases, you might want to disable the thinking mode feature for Claude 3.7 models, particularly if you're experiencing API errors or if you prefer the model to operate without the thinking capability.
### Using the disable_thinking Configuration Option
RA.Aid provides a `disable_thinking` configuration option that allows you to turn off the thinking mode for Claude 3.7 models:
```bash
ra-aid -m "Debug the database connection issue" --provider anthropic --model claude-3-7-sonnet-20250219 --disable-thinking
```
When this option is enabled:
- The thinking mode will not be activated for Claude 3.7 models
- The model will operate in standard mode without the structured thinking blocks
- This can help avoid certain API errors that might occur with thinking mode enabled
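Under the hood, the flag simply gates the extra `thinking` parameter passed to the model client. A minimal sketch of that gating logic (the helper name here is illustrative, not RA.Aid's actual internals; the `budget_tokens` value matches RA.Aid's default):

```python
def thinking_kwargs(supports_thinking: bool, config: dict) -> dict:
    """Return extra model kwargs; empty when thinking is off or unsupported."""
    # Thinking is requested only when the model supports it AND the user
    # has not set disable_thinking (the --disable-thinking flag).
    if supports_thinking and not config.get("disable_thinking", False):
        return {"thinking": {"type": "enabled", "budget_tokens": 12000}}
    return {}

print(thinking_kwargs(True, {}))
# {'thinking': {'type': 'enabled', 'budget_tokens': 12000}}
print(thinking_kwargs(True, {"disable_thinking": True}))
# {}
```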
### When to Disable Thinking Mode
Consider disabling thinking mode in the following scenarios:
1. **API Errors**: If you encounter unretryable API errors (400) related to thinking blocks
2. **Long-Running Sessions**: For extended sessions that run for more than 10 minutes
3. **Performance Concerns**: If you need faster responses and don't require the thinking content
4. **Compatibility Issues**: If you're using tools or workflows that aren't fully compatible with thinking mode
## Troubleshooting and Best Practices
### Common Issues
@@ -117,6 +144,20 @@ If you're not seeing thinking content despite using the `--show-thoughts` flag:
- Verify that the model is properly configured in your environment
- Check that the model is actually including thinking content in its responses (not all prompts will generate thinking)
#### API errors with Claude 3.7 Sonnet
If you encounter errors like:
```
Unretryable API error: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'messages.1.content.0.type: Expected thinking or redacted_thinking, but found text...'}}
```
This is related to the thinking mode format requirements. You can:
- Use the `--disable-thinking` flag to turn off thinking mode
- Upgrade to the latest version of RA.Aid which includes fixes for these errors
- For long-running sessions, consider restarting the assistant periodically
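This error occurs because, once extended thinking is enabled, every assistant message replayed to the API must begin with a `thinking` or `redacted_thinking` content block. A sketch of the shape being checked (illustrative payloads and helper, not RA.Aid internals):

```python
def starts_with_thinking(content: list) -> bool:
    """Check whether an assistant message's structured content is well-formed
    for thinking mode, i.e. its first block is a thinking block."""
    return bool(content) and content[0].get("type") in ("thinking", "redacted_thinking")

# Content that triggers the 400 error when replayed with thinking enabled:
bad = [{"type": "text", "text": "Here is the fix..."}]

# Content the API accepts: the thinking block comes first.
good = [
    {"type": "thinking", "thinking": "Check the connection pool settings..."},
    {"type": "text", "text": "Here is the fix..."},
]

print(starts_with_thinking(bad))   # False
print(starts_with_thinking(good))  # True
```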
#### Excessive or irrelevant thinking
If the thinking content is too verbose or irrelevant:
@@ -137,4 +178,3 @@ For the most effective use of thinking models:
4. **Compare thinking with output**: Use the thinking content to evaluate the quality of the model's reasoning and identify potential flaws in its approach.
5. **Provide clear instructions**: When the model's thinking seems off-track, provide clearer instructions in your next prompt to guide its reasoning process.

View File

@@ -526,6 +526,64 @@ def _handle_fallback_response(
msg_list.extend(msg_list_response)
def _ensure_thinking_block(messages: list[BaseMessage], config: Dict[str, Any]) -> list[BaseMessage]:
"""
Ensure that messages sent to Claude 3.7 with thinking enabled have a thinking block at the start.
When thinking is enabled for Claude 3.7, the API requires that any assistant message
starts with a thinking block. This function checks if the model is Claude 3.7 with
thinking enabled, and if so, ensures that assistant messages have a thinking block.
Args:
messages: List of messages to check and potentially modify
config: Configuration dictionary
Returns:
Modified list of messages with thinking blocks added if needed
"""
# Check if we're using Claude 3.7 with thinking enabled
provider = config.get("provider", "")
model_name = config.get("model", "")
# Skip if thinking is disabled or not using Claude 3.7
if config.get("disable_thinking", False):
return messages
# Only apply to Claude 3.7 models
if not (provider.lower() == "anthropic" and "claude-3-7" in model_name.lower()):
return messages
# Get model configuration to check if thinking is supported
model_config = models_params.get(provider, {}).get(model_name, {})
if not model_config.get("supports_thinking", False):
return messages
# Make a copy of the messages to avoid modifying the original
modified_messages = messages.copy()
# Check each message
for i, message in enumerate(modified_messages):
# Only check assistant messages
if hasattr(message, "type") and message.type == "ai":
# If content is a list (structured format)
if isinstance(message.content, list):
# Check if the first item is a thinking block
if not (len(message.content) > 0 and
isinstance(message.content[0], dict) and
message.content[0].get("type") == "thinking"):
# Add a redacted_thinking block at the start
message.content.insert(0, {"type": "redacted_thinking"})
logger.debug("Added redacted_thinking block to assistant message")
# If content is a string, we can't modify it properly
# This shouldn't happen with Claude 3.7, but log it if it does
elif isinstance(message.content, str):
logger.warning(
"Found string content in assistant message with Claude 3.7 thinking enabled. "
"This may cause API errors if the message doesn't start with a thinking block."
)
return modified_messages
def _run_agent_stream(agent: RAgents, msg_list: list[BaseMessage]):
"""
Streams agent output while handling completion and interruption.
@@ -552,6 +610,9 @@ def _run_agent_stream(agent: RAgents, msg_list: list[BaseMessage]):
stream_config["callbacks"] = []
stream_config["callbacks"].append(cb)
# Ensure messages have thinking blocks if needed
msg_list = _ensure_thinking_block(msg_list, config)
while True:
for chunk in agent.stream({"messages": msg_list}, stream_config):
logger.debug("Agent output: %s", chunk)
@@ -636,6 +697,23 @@ def run_agent_with_retry(
return f"Agent has crashed: {crash_message}"
try:
# Ensure messages have thinking blocks if needed before each run
config = get_config_repository().get_all()
if is_anthropic_claude(config):
provider = config.get("provider", "")
model_name = config.get("model", "")
# Only apply to Claude 3.7 models with thinking enabled
if (provider.lower() == "anthropic" and
"claude-3-7" in model_name.lower() and
not config.get("disable_thinking", False)):
# Get model configuration to check if thinking is supported
model_config = models_params.get(provider, {}).get(model_name, {})
if model_config.get("supports_thinking", False):
logger.debug("Ensuring thinking blocks for Claude 3.7 before agent run")
msg_list = _ensure_thinking_block(msg_list, config)
_run_agent_stream(agent, msg_list)
if fallback_handler:
fallback_handler.reset_fallback_handler()

View File

@@ -259,7 +259,7 @@ def create_llm_client(
else:
temp_kwargs = {}
-        if supports_thinking:
+        if supports_thinking and not config.get("disable_thinking", False):
temp_kwargs = {"thinking": {"type": "enabled", "budget_tokens": 12000}}
if provider == "deepseek":

View File

@@ -0,0 +1,103 @@
import pytest
from unittest.mock import MagicMock, patch
from ra_aid.llm import create_llm_client
class TestDisableThinking:
"""Test suite for the disable_thinking configuration option."""
@pytest.mark.parametrize(
"test_id, config, model_config, expected_thinking_param, description",
[
# Test case 1: Claude 3.7 model with thinking enabled (default)
(
"claude_thinking_enabled",
{
"provider": "anthropic",
"model": "claude-3-7-sonnet-20250219",
},
{"supports_thinking": True},
{"thinking": {"type": "enabled", "budget_tokens": 12000}},
"Claude 3.7 should have thinking enabled by default",
),
# Test case 2: Claude 3.7 model with thinking explicitly disabled
(
"claude_thinking_disabled",
{
"provider": "anthropic",
"model": "claude-3-7-sonnet-20250219",
"disable_thinking": True,
},
{"supports_thinking": True},
{},
"Claude 3.7 with disable_thinking=True should not have thinking param",
),
# Test case 3: Non-thinking model should not have thinking param
(
"non_thinking_model",
{
"provider": "anthropic",
"model": "claude-3-5-sonnet-20240620",
},
{"supports_thinking": False},
{},
"Non-thinking model should not have thinking param",
),
# Test case 4: Non-Claude model should not have thinking param
(
"non_claude_model",
{
"provider": "openai",
"model": "gpt-4",
},
{},
{},
"Non-Claude model should not have thinking param",
),
],
)
def test_disable_thinking_option(
self, test_id, config, model_config, expected_thinking_param, description
):
"""Test that the disable_thinking option correctly controls thinking mode."""
# Mock the necessary dependencies
with patch("ra_aid.llm.ChatAnthropic") as mock_anthropic:
with patch("ra_aid.llm.ChatOpenAI") as mock_openai:
with patch("ra_aid.llm.models_params") as mock_models_params:
# Set up the mock to return the specified model_config
mock_models_params.get.return_value = {config["model"]: model_config}
# Set up the mock for get_provider_config
with patch("ra_aid.llm.get_provider_config") as mock_get_provider_config:
# Include disable_thinking in the provider config if it's in the test config
provider_config = {
"api_key": "test-key",
"base_url": None,
}
if "disable_thinking" in config:
provider_config["disable_thinking"] = config["disable_thinking"]
mock_get_provider_config.return_value = provider_config
# Call the function without passing disable_thinking directly
create_llm_client(
config["provider"],
config["model"],
temperature=None,
is_expert=False
)
# Check if the correct parameters were passed
if config["provider"] == "anthropic":
# Get the kwargs passed to ChatAnthropic
_, kwargs = mock_anthropic.call_args
# Check if thinking param was included or not
if expected_thinking_param:
assert "thinking" in kwargs, f"Test {test_id} failed: {description}"
assert kwargs["thinking"] == expected_thinking_param["thinking"], \
f"Test {test_id} failed: Thinking param doesn't match expected value"
else:
assert "thinking" not in kwargs, \
f"Test {test_id} failed: Thinking param should not be present"

View File

@@ -0,0 +1,145 @@
import pytest
from unittest.mock import MagicMock, patch
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from ra_aid.agent_utils import _ensure_thinking_block
class TestEnsureThinkingBlock:
"""Test suite for the _ensure_thinking_block function."""
@pytest.mark.parametrize(
"test_id, messages, config, expected_changes, description",
[
# Test case 1: Non-Claude 3.7 model should not modify messages
(
"non_claude_model",
[
HumanMessage(content="Hello"),
AIMessage(content=[{"type": "text", "text": "Hi there"}]),
],
{"provider": "openai", "model": "gpt-4"},
False,
"Non-Claude 3.7 model should not modify messages",
),
# Test case 2: Claude 3.7 model with thinking disabled should not modify messages
(
"claude_thinking_disabled",
[
HumanMessage(content="Hello"),
AIMessage(content=[{"type": "text", "text": "Hi there"}]),
],
{
"provider": "anthropic",
"model": "claude-3-7-sonnet-20250219",
"disable_thinking": True,
},
False,
"Claude 3.7 with thinking disabled should not modify messages",
),
# Test case 3: Claude 3.7 model with thinking enabled but no AI messages should not modify messages
(
"claude_no_ai_messages",
[
HumanMessage(content="Hello"),
SystemMessage(content="System message"),
],
{"provider": "anthropic", "model": "claude-3-7-sonnet-20250219"},
False,
"Claude 3.7 with no AI messages should not modify messages",
),
# Test case 4: Claude 3.7 model with thinking enabled and AI message with text content should log warning
(
"claude_ai_text_content",
[
HumanMessage(content="Hello"),
AIMessage(content="Text response"),
],
{"provider": "anthropic", "model": "claude-3-7-sonnet-20250219"},
False,
"Claude 3.7 with AI message with text content should log warning",
),
# Test case 5: Claude 3.7 model with thinking enabled and AI message with list content but no thinking block
(
"claude_ai_no_thinking_block",
[
HumanMessage(content="Hello"),
AIMessage(content=[{"type": "text", "text": "Hi there"}]),
],
{"provider": "anthropic", "model": "claude-3-7-sonnet-20250219"},
True,
"Claude 3.7 with AI message without thinking block should add one",
),
# Test case 6: Claude 3.7 model with thinking enabled and AI message with list content with thinking block
(
"claude_ai_with_thinking_block",
[
HumanMessage(content="Hello"),
AIMessage(
content=[
{"type": "thinking", "thinking": "Let me think..."},
{"type": "text", "text": "Hi there"},
]
),
],
{"provider": "anthropic", "model": "claude-3-7-sonnet-20250219"},
False,
"Claude 3.7 with AI message with thinking block should not modify it",
),
# Test case 7: Claude 3.7 model with thinking enabled and multiple AI messages
(
"claude_multiple_ai_messages",
[
HumanMessage(content="Hello"),
AIMessage(content=[{"type": "text", "text": "First response"}]),
HumanMessage(content="Follow-up"),
AIMessage(content=[{"type": "text", "text": "Second response"}]),
],
{"provider": "anthropic", "model": "claude-3-7-sonnet-20250219"},
True,
"Claude 3.7 with multiple AI messages should add thinking blocks to all",
),
],
)
def test_ensure_thinking_block(
self, test_id, messages, config, expected_changes, description
):
"""Test the _ensure_thinking_block function with various inputs."""
# Mock the logger
with patch("ra_aid.agent_utils.logger") as mock_logger:
# Mock the models_params dictionary
with patch("ra_aid.agent_utils.models_params") as mock_models_params:
# Set up the mock to return supports_thinking=True for Claude 3.7 models
if "claude-3-7" in config.get("model", ""):
mock_models_params.get.return_value = {
config["model"]: {"supports_thinking": True}
}
else:
mock_models_params.get.return_value = {}
# Make a copy of the original messages for comparison
original_messages = [
AIMessage(content=msg.content) if isinstance(msg, AIMessage) else msg
for msg in messages
]
# Call the function
result = _ensure_thinking_block(messages, config)
# Check if the result is different from the input when expected
if expected_changes:
assert result != original_messages, f"Test {test_id} failed: {description}"
# Check that AI messages have thinking blocks
for msg in result:
if hasattr(msg, "type") and msg.type == "ai":
if isinstance(msg.content, list) and len(msg.content) > 0:
assert msg.content[0].get("type") == "thinking" or msg.content[0].get("type") == "redacted_thinking", \
f"Test {test_id} failed: AI message should have thinking block"
else:
# Check that the messages were not modified
assert result == messages, f"Test {test_id} failed: {description}"
# Check for warning logs for text content
if "claude_ai_text_content" == test_id:
mock_logger.warning.assert_called_once()

View File

@@ -0,0 +1,91 @@
import pytest
from unittest.mock import MagicMock, patch
from langchain_core.messages import AIMessage, HumanMessage
from ra_aid.agent_utils import run_agent_with_retry
class TestThinkingIntegration:
"""Test suite for the integration of thinking block functionality in run_agent_with_retry."""
@pytest.mark.parametrize(
"test_id, config, model_name, should_ensure_thinking, description",
[
# Test case 1: Claude 3.7 model with thinking enabled should ensure thinking blocks
(
"claude_thinking_enabled",
{
"provider": "anthropic",
"model": "claude-3-7-sonnet-20250219",
},
"claude-3-7-sonnet-20250219",
True,
"Claude 3.7 should ensure thinking blocks",
),
# Test case 2: Claude 3.7 model with thinking disabled should not ensure thinking blocks
(
"claude_thinking_disabled",
{
"provider": "anthropic",
"model": "claude-3-7-sonnet-20250219",
"disable_thinking": True,
},
"claude-3-7-sonnet-20250219",
False,
"Claude 3.7 with thinking disabled should not ensure thinking blocks",
),
# Test case 3: Non-Claude 3.7 model should not ensure thinking blocks
(
"non_claude_model",
{
"provider": "openai",
"model": "gpt-4",
},
"gpt-4",
False,
"Non-Claude model should not ensure thinking blocks",
),
],
)
def test_run_agent_with_retry_thinking_integration(
self, test_id, config, model_name, should_ensure_thinking, description
):
"""Test that run_agent_with_retry correctly integrates thinking block functionality."""
# Mock the necessary dependencies
with patch("ra_aid.agent_utils.get_config_repository") as mock_get_config:
with patch("ra_aid.agent_utils._ensure_thinking_block") as mock_ensure_thinking:
with patch("ra_aid.agent_utils._run_agent_stream") as mock_run_agent_stream:
with patch("ra_aid.agent_utils._setup_interrupt_handling") as mock_setup:
with patch("ra_aid.agent_utils._restore_interrupt_handling"):
with patch("ra_aid.agent_utils.agent_context"):
with patch("ra_aid.agent_utils.InterruptibleSection"):
with patch("ra_aid.agent_utils.is_anthropic_claude") as mock_is_anthropic:
with patch("ra_aid.agent_utils.models_params") as mock_models_params:
# Set up the mocks
mock_get_config.return_value.get_all.return_value = config
mock_get_config.return_value.get.return_value = False
mock_setup.return_value = None
mock_run_agent_stream.return_value = True
# Set up is_anthropic_claude to return True for Claude models
mock_is_anthropic.return_value = "claude" in model_name.lower()
# Set up models_params to return supports_thinking=True for Claude 3.7 models
if "claude-3-7" in model_name:
mock_models_params.get.return_value = {
model_name: {"supports_thinking": True}
}
else:
mock_models_params.get.return_value = {}
# Create a mock agent
mock_agent = MagicMock()
# Call the function
run_agent_with_retry(mock_agent, "Test prompt")
# Check if _ensure_thinking_block was called correctly
if should_ensure_thinking:
mock_ensure_thinking.assert_called()
else:
mock_ensure_thinking.assert_not_called()