RA.Aid/adr/webui.md

2.5 KiB

A couple thoughts:

  1. I like that you're using websockets. I think a bi-directional message passing is what we will ultimately need, and this sets up the groundwork for that.
  2. Since rn we're just sending raw ra-aid text output to the front-end, an easy win to make it render nicer would be to put it in a non-wrapping or similar.
  3. I like the minimal html/js as an initial setup.

Thoughts for down the line if we merge this (the following would be in future PRs):

  1. We'll ultimately want to stream structured objects from the agent
  • Rn, the code has a ton of cases like console.print(Panel(panel_content, title="Expert Context", border_style="blue")). We will want to abstract these calls and have something like a send_agent_output function. Depending on config (e.g. CLI use vs web ui use), that would send an object onto the web socket stream, or send output to the console.
  1. We need a way to send user inputs back to the agent, e.g. anything currently using stdin (we don't use stdin directly rn, but we do have things that use tty or prompt the user for input which ultimately do use stdin)
  2. Tricky run_shell_command is really powerful, and it essentially passes fully interactive TTY through to the user. If we want to keep the same UX, we could do this potentially by using tty.js on the front-end and streaming the tty over websocket. Alternatively, we could simply disable TTY tools like run_shell_command or restrict them to a non-tty mode (e.g. simple command running, stdout/stderr capture only)
  3. Ideally we should keep the websocket and any API endpoints friendly to machine endpoints. JSON objects over websocket + any JSON REST endpoints we might need would satisfy this.
  4. The UX itself should be carefully considered. IMO one of the big benefits of a web UI is to be able to have the agent doing work which I can direct from a mobile device, so I think an efficient mobile-first UI would be ideal. For this, my initial thought was something like react + shadcn.
  5. Initially I had considered https://socket.io/. In this PR, we're just going straight to websockets, which is cool and has one less dependency. Not sure if socket.io would be worth it, but wanted to mention it. The overall architecture wouldn't change.
  6. Semi-related: we should have a way to send logs to a directory or to a logging backend.

@leonj1 I think starting with a small PR like this and then incrementally adding to the web UI is a good approach.

@sosacrazy126 would be interested in your thoughts since I know you were working on webui as well, want to respect the work you've done.