eval optimization

2025-02-21 17:26:31 -05:00 · 2025-02-21 17:26:31 -05:00 · 5102e1fabb
parent a022bb3586
commit 5102e1fabb
2 changed files with 12 additions and 9 deletions
--- a/ra_aid/prompts.py
+++ b/ra_aid/prompts.py
@@ -127,7 +127,9 @@ Because this is a new project:
 # Research stage prompt - guides initial codebase analysis
 RESEARCH_PROMPT = """Current Date: {current_date}

-User query: {base_task} consult with the expert frequently --keep it simple
+User query: {base_task} 
+
+Consult with the expert frequently.

 Context from Previous Research (if available):
 Key Facts:
@@ -220,10 +222,6 @@ No Planning or Problem-Solving
 You must remain strictly within the bounds of describing what currently exists.

 If the task requires *ANY* compilation, unit tests, or any other non-trivial changes, call request_implementation.
-If this is a trivial task that can be completed in one shot, do the change using tools available, call one_shot_completed, and immediately exit without saying anything.
-  Remember, many tasks are more complex and nuanced than they seem and still require requesting implementation.
-  For one shot tasks, still take some time to consider whether compilation, testing, or additional validation should be done to check your work.
-  If you implement the task yourself, do not request implementation.

 Thoroughness and Completeness:
    If this is determined to be a new/empty project (shown in Project Info), focus directly on the task.
@@ -288,7 +286,11 @@ You have often been criticized for:
 {human_section}
 {web_research_section}

-NEVER ANNOUNCE WHAT YOU ARE DOING, JUST DO IT!
+
+DO NOT CHANGE ANY EXISTING TESTS
+YOU MUST RUN RELEVANT TESTS USING run_shell_command AS SOON AS POSSIBLE AS PART OF THE RESEARCH PROCESS.
+INSTALL TEST DEPS IF YOU NEED TO
+BEFORE DOING ANYTHING, CALL request_research TO FIND OUT HOW TO RUN TESTS ON THIS PROJECT IN GENERAL.
 """

 # Web research prompt - guides web search and information gathering
@@ -586,6 +588,7 @@ You have often been criticized for:
  - Not calling tools/functions properly, e.g. leaving off required arguments, calling a tool in a loop, calling tools inappropriately.

 NEVER ANNOUNCE WHAT YOU ARE DOING, JUST DO IT!
+DO NOT CHANGE ANY EXISTING TESTS, BUT YOU MAY ADD YOUR OWN.
 """

 # Implementation stage prompt - guides specific task implementation
@@ -980,4 +983,4 @@ You have often been criticized for:
 Remember, if you do not make any tool call (e.g. ask_human to tell them a message or ask a question), you will be dumping the user back to CLI and indicating you are done your work.

 NEVER ANNOUNCE WHAT YOU ARE DOING, JUST DO IT!
-"""
+"""
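The prompts.py change moves the expert instruction off the `User query:` line, so it no longer runs on after the user's task text. The sketch below renders an abbreviated, hypothetical excerpt of the new template. It assumes the placeholders are filled with Python's `str.format`, which this diff does not show. `RESEARCH_PROMPT_EXCERPT` and `render_excerpt` are illustrative names, not part of ra_aid.

```python
from datetime import date

# Hypothetical, heavily abbreviated excerpt of the post-commit RESEARCH_PROMPT;
# the real template in ra_aid/prompts.py contains many more sections.
RESEARCH_PROMPT_EXCERPT = """Current Date: {current_date}

User query: {base_task}

Consult with the expert frequently.
"""


def render_excerpt(base_task: str, current_date: str) -> str:
    # Fill the placeholders with str.format (an assumption about how the
    # full template is rendered; the diff only shows the template text).
    return RESEARCH_PROMPT_EXCERPT.format(
        current_date=current_date, base_task=base_task
    )


if __name__ == "__main__":
    print(render_excerpt("Add a --verbose flag", date.today().isoformat()))
```

The instruction now appears as its own sentence instead of being appended to whatever task text the user supplies.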
--- a/ra_aid/tool_configs.py
+++ b/ra_aid/tool_configs.py
@@ -78,7 +78,7 @@ COMMON_TOOLS = get_read_only_tools()
 EXPERT_TOOLS = [emit_expert_context, ask_expert]
 RESEARCH_TOOLS = [
    emit_research_notes,
-    one_shot_completed,
+    #one_shot_completed,
    # *TEMPORARILY* disabled to improve tool calling perf.
    # monorepo_detected,
    # ui_detected,
@@ -216,4 +216,4 @@ def get_chat_tools(
    if web_research_enabled:
        tools.append(request_web_research)

-    return tools
+    return tools
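In tool_configs.py, `one_shot_completed` is commented out of `RESEARCH_TOOLS`. This matches the prompt change that drops the one-shot instructions. The following minimal sketch shows the effect using plain-function stand-ins (hypothetical, not the real ra_aid tool objects) and only the list entries visible in this hunk.

```python
from typing import Callable, List


# Hypothetical stand-ins for the real ra_aid tools; only their names matter here.
def emit_research_notes(notes: str) -> str:
    return notes


def one_shot_completed(message: str) -> str:
    return message


def tool_names(tools: List[Callable]) -> List[str]:
    # Plain functions expose their name via __name__.
    return [t.__name__ for t in tools]


# Visible entries of RESEARCH_TOOLS before and after commit 5102e1fabb.
RESEARCH_TOOLS_BEFORE = [emit_research_notes, one_shot_completed]
RESEARCH_TOOLS_AFTER = [emit_research_notes]

removed = sorted(
    set(tool_names(RESEARCH_TOOLS_BEFORE)) - set(tool_names(RESEARCH_TOOLS_AFTER))
)
print(removed)  # prints ['one_shot_completed']
```

Without that tool, the research prompt's remaining path for any non-trivial change is `request_implementation`.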