# useChat Compatible Backend Design for Wave Terminal
## Overview
This document outlines how to create a `useChat()` compatible backend API using Go and Server-Sent Events (SSE) to replace the current complex RPC-based AI chat system. The goal is to leverage Vercel AI SDK's `useChat()` hook while maintaining all existing AI provider functionality.
## Current vs Target Architecture
### Current Architecture
```
Frontend (React) → Custom RPC → Go Backend → AI Providers
- 10+ Jotai atoms for state management
- Custom WaveAIStreamRequest/WaveAIPacketType
- Complex configuration merging in frontend
- Custom streaming protocol over WebSocket
```
### Target Architecture
```
Frontend (useChat) → HTTP/SSE → Go Backend → AI Providers
- Single useChat() hook manages all state
- Standard HTTP POST + SSE streaming
- Backend-driven configuration resolution
- Standard AI SDK streaming format
```
## API Design
### 1. Endpoint Structure
**Chat Streaming Endpoint:**
```
POST /api/ai/chat/{blockId}?preset={presetKey}
```
**Conversation Persistence Endpoints:**
```
POST /api/ai/conversations/{blockId} # Save conversation
GET /api/ai/conversations/{blockId} # Load conversation
```
**Why this approach:**
- `blockId`: Identifies the conversation context (existing Wave concept)
- `preset`: URL parameter for AI configuration preset
- **Separate persistence**: Clean separation of streaming vs storage
- **Fast localhost calls**: Frontend can call both endpoints quickly
- **Simple backend**: Each endpoint has single responsibility
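A minimal sketch of how these routes could be wired into the backend's HTTP router, assuming gorilla/mux (which the Phase 1 handler below also assumes via `mux.Vars`); the persistence handler names are placeholders for this design:
```go
import "github.com/gorilla/mux"

// Sketch: register the chat and persistence endpoints.
// HandleSaveConversation / HandleLoadConversation are placeholder names.
func registerAIRoutes(r *mux.Router, s *WshServer) {
	r.HandleFunc("/api/ai/chat/{blockId}", s.HandleAIChat).Methods("POST")
	r.HandleFunc("/api/ai/conversations/{blockId}", s.HandleSaveConversation).Methods("POST")
	r.HandleFunc("/api/ai/conversations/{blockId}", s.HandleLoadConversation).Methods("GET")
}
```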
### 2. Request Format & Message Flow
**Simplified Approach:**
- Frontend manages **entire conversation state** (like all modern chat apps)
- Frontend sends **complete message history** with each request
- Backend just processes the messages and streams response
- Frontend handles persistence via existing Wave file system
**Standard useChat() Request:**
```json
{
  "messages": [
    {
      "id": "msg-1",
      "role": "user",
      "content": "Hello world"
    },
    {
      "id": "msg-2",
      "role": "assistant",
      "content": "Hi there!"
    },
    {
      "id": "msg-3",
      "role": "user",
      "content": "How are you?" // <- NEW message user just typed
    }
  ]
}
```
**Backend Processing:**
1. **Receive complete conversation** from frontend
2. **Resolve AI configuration** (preset, model, etc.)
3. **Send messages directly** to AI provider
4. **Stream response** back to frontend
5. **Frontend calls separate persistence endpoint** when needed
**Optional Extensions:**
```json
{
  "messages": [...],
  "options": {
    "temperature": 0.7,
    "maxTokens": 1000,
    "model": "gpt-4" // Override preset model
  }
}
```
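For reference, the request body above maps onto Go types like these (illustrative names, not existing Wave types):
```go
// Illustrative decode targets for the POST body shown above.
type UseChatMessage struct {
	ID      string `json:"id,omitempty"`
	Role    string `json:"role"`    // "system" | "user" | "assistant"
	Content string `json:"content"` // plain text; multi-modal parts are covered in Phase 3
}

type ChatRequest struct {
	Messages []UseChatMessage `json:"messages"`
	Options  map[string]any   `json:"options,omitempty"` // temperature, maxTokens, model overrides
}
```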
### 3. Configuration Resolution
**Priority Order (backend resolves):**
1. **Request options** (highest priority)
2. **URL preset parameter**
3. **Block metadata** (`block.meta["ai:preset"]`)
4. **Global settings** (`settings["ai:preset"]`)
5. **Default preset** (lowest priority)
**Backend Logic:**
```go
func resolveAIConfig(blockId, presetKey string, requestOptions map[string]any) (*WaveAIOptsType, error) {
	// 1. Load block metadata
	block := getBlock(blockId)
	blockPreset, _ := block.Meta["ai:preset"].(string)

	// 2. Load global settings
	settings := getGlobalSettings()
	globalPreset, _ := settings["ai:preset"].(string)

	// 3. Resolve preset hierarchy
	finalPreset := presetKey
	if finalPreset == "" {
		finalPreset = blockPreset
	}
	if finalPreset == "" {
		finalPreset = globalPreset
	}
	if finalPreset == "" {
		finalPreset = "default"
	}

	// 4. Load and merge preset config
	presetConfig := loadPreset(finalPreset)

	// 5. Apply request overrides
	return mergeAIConfig(presetConfig, requestOptions), nil
}
```
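`loadPreset` and `mergeAIConfig` are left abstract above. A minimal sketch of the merge step, assuming `WaveAIOptsType` carries `Model`, `Temperature`, and `MaxTokens` fields (an assumption for illustration):
```go
// Sketch: request overrides win over preset values.
// The WaveAIOptsType fields used here are assumed, not confirmed.
func mergeAIConfig(preset *WaveAIOptsType, overrides map[string]any) *WaveAIOptsType {
	merged := *preset // copy so the loaded preset is not mutated
	if m, ok := overrides["model"].(string); ok && m != "" {
		merged.Model = m
	}
	if t, ok := overrides["temperature"].(float64); ok {
		merged.Temperature = t
	}
	if mt, ok := overrides["maxTokens"].(float64); ok { // JSON numbers decode as float64
		merged.MaxTokens = int(mt)
	}
	return &merged
}
```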
### 4. Response Format (SSE)
**Key Insight: Minimal Conversion**
Most AI providers (OpenAI, Anthropic) already return SSE streams. Instead of converting to our custom format and back, we can **proxy/transform** their streams directly to useChat format.
**Headers:**
```
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
Access-Control-Allow-Origin: *
```
**useChat Expected Format:**
```
data: {"type":"text","text":"Hello"}
data: {"type":"text","text":" world"}
data: {"type":"text","text":"!"}
data: {"type":"finish","finish_reason":"stop","usage":{"prompt_tokens":10,"completion_tokens":3,"total_tokens":13}}
data: [DONE]
```
**Provider Stream Transformation:**
- **OpenAI**: Already SSE → direct proxy (only a small delta-to-text field mapping)
- **Anthropic**: Already SSE → direct proxy (minimal field mapping)
- **Google**: Already streaming → direct proxy
- **Perplexity**: OpenAI-compatible → direct proxy
- **Wave Cloud**: WebSocket → **requires conversion** (only one needing transformation)
**Error Format:**
```
data: {"type":"error","error":"API key invalid"}
data: [DONE]
```
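Rather than hand-escaping JSON with `%q` as the Phase 1 sketch below does, a small helper can marshal each event and flush it immediately; this is a style suggestion, not an existing Wave function:
```go
import (
	"encoding/json"
	"fmt"
	"net/http"
)

// writeSSEEvent marshals one useChat event and writes it as an SSE "data:" line.
// Example: writeSSEEvent(w, map[string]any{"type": "text", "text": "Hello"})
func writeSSEEvent(w http.ResponseWriter, event map[string]any) error {
	data, err := json.Marshal(event)
	if err != nil {
		return err
	}
	if _, err := fmt.Fprintf(w, "data: %s\n\n", data); err != nil {
		return err
	}
	if f, ok := w.(http.Flusher); ok {
		f.Flush() // push the chunk to the client immediately
	}
	return nil
}
```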
## Implementation Plan
### Phase 1: HTTP Handler
```go
// Simplified approach: direct provider streaming with minimal transformation
func (s *WshServer) HandleAIChat(w http.ResponseWriter, r *http.Request) {
	// 1. Parse URL parameters
	blockId := mux.Vars(r)["blockId"]
	presetKey := r.URL.Query().Get("preset")

	// 2. Parse request body
	var req struct {
		Messages []struct {
			Role    string `json:"role"`
			Content string `json:"content"`
		} `json:"messages"`
		Options map[string]any `json:"options,omitempty"`
	}
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, err.Error(), 400)
		return
	}

	// 3. Resolve configuration
	aiOpts, err := resolveAIConfig(blockId, presetKey, req.Options)
	if err != nil {
		http.Error(w, err.Error(), 400)
		return
	}

	// 4. Set SSE headers
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	w.Header().Set("Connection", "keep-alive")

	// 5. Route to provider and stream directly
	switch aiOpts.APIType {
	case "openai", "perplexity":
		// Direct proxy - these are already SSE compatible
		streamOpenAIToUseChat(w, r.Context(), aiOpts, req.Messages)
	case "anthropic":
		// Direct proxy with minimal field mapping
		streamAnthropicSSE(w, r.Context(), aiOpts, req.Messages)
	case "google":
		// Direct proxy
		streamGoogleSSE(w, r.Context(), aiOpts, req.Messages)
	default:
		// Wave Cloud - only one requiring conversion (WebSocket → SSE)
		if isCloudAIRequest(aiOpts) {
			streamWaveCloudToUseChat(w, r.Context(), aiOpts, req.Messages)
		} else {
			http.Error(w, "Unsupported provider", 400)
		}
	}
}

// Example: direct OpenAI streaming (minimal conversion)
func streamOpenAIToUseChat(w http.ResponseWriter, ctx context.Context, opts *WaveAIOptsType, messages []Message) {
	client := openai.NewClient(opts.APIToken)
	stream, err := client.CreateChatCompletionStream(ctx, openai.ChatCompletionRequest{
		Model:    opts.Model,
		Messages: convertToOpenAIMessages(messages),
		Stream:   true,
	})
	if err != nil {
		fmt.Fprintf(w, "data: {\"type\":\"error\",\"error\":%q}\n\n", err.Error())
		fmt.Fprintf(w, "data: [DONE]\n\n")
		return
	}
	defer stream.Close()
	for {
		response, err := stream.Recv()
		if errors.Is(err, io.EOF) {
			fmt.Fprintf(w, "data: [DONE]\n\n")
			return
		}
		if err != nil {
			fmt.Fprintf(w, "data: {\"type\":\"error\",\"error\":%q}\n\n", err.Error())
			fmt.Fprintf(w, "data: [DONE]\n\n")
			return
		}
		// Direct transformation: OpenAI format → useChat format
		for _, choice := range response.Choices {
			if choice.Delta.Content != "" {
				fmt.Fprintf(w, "data: {\"type\":\"text\",\"text\":%q}\n\n", choice.Delta.Content)
			}
			if choice.FinishReason != "" {
				fmt.Fprintf(w, "data: {\"type\":\"finish\",\"finish_reason\":%q}\n\n", choice.FinishReason)
			}
		}
		w.(http.Flusher).Flush()
	}
}

// Wave Cloud conversion (only provider needing transformation)
func streamWaveCloudToUseChat(w http.ResponseWriter, ctx context.Context, opts *WaveAIOptsType, messages []Message) {
	// Use existing Wave Cloud WebSocket logic
	waveReq := wshrpc.WaveAIStreamRequest{
		Opts:   opts,
		Prompt: convertMessagesToPrompt(messages),
	}
	stream := waveai.RunAICommand(ctx, waveReq) // Returns WebSocket-backed packet channel
	// Convert Wave Cloud packets to useChat SSE format
	for packet := range stream {
		if packet.Error != nil {
			fmt.Fprintf(w, "data: {\"type\":\"error\",\"error\":%q}\n\n", packet.Error.Error())
			break
		}
		resp := packet.Response
		if resp.Text != "" {
			fmt.Fprintf(w, "data: {\"type\":\"text\",\"text\":%q}\n\n", resp.Text)
		}
		if resp.FinishReason != "" {
			usage := ""
			if resp.Usage != nil {
				usage = fmt.Sprintf(",\"usage\":{\"prompt_tokens\":%d,\"completion_tokens\":%d,\"total_tokens\":%d}",
					resp.Usage.PromptTokens, resp.Usage.CompletionTokens, resp.Usage.TotalTokens)
			}
			fmt.Fprintf(w, "data: {\"type\":\"finish\",\"finish_reason\":%q%s}\n\n", resp.FinishReason, usage)
		}
		w.(http.Flusher).Flush()
	}
	fmt.Fprintf(w, "data: [DONE]\n\n")
}
```
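The handler above calls `streamAnthropicSSE`, which is not shown. A minimal sketch follows, assuming the Anthropic Messages API streaming events (`content_block_delta` carrying text deltas, `message_delta` carrying the stop reason); `convertToAnthropicMessages` and the `MaxTokens` field are assumed helpers/fields for illustration:
```go
import (
	"bufio"
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// Sketch: proxy Anthropic's SSE stream into the useChat format.
func streamAnthropicSSE(w http.ResponseWriter, ctx context.Context, opts *WaveAIOptsType, messages []Message) {
	body, _ := json.Marshal(map[string]any{
		"model":      opts.Model,
		"max_tokens": opts.MaxTokens, // assumed field
		"stream":     true,
		"messages":   convertToAnthropicMessages(messages), // assumed helper
	})
	req, _ := http.NewRequestWithContext(ctx, "POST", "https://api.anthropic.com/v1/messages", bytes.NewReader(body))
	req.Header.Set("x-api-key", opts.APIToken)
	req.Header.Set("anthropic-version", "2023-06-01")
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Fprintf(w, "data: {\"type\":\"error\",\"error\":%q}\n\ndata: [DONE]\n\n", err.Error())
		return
	}
	defer resp.Body.Close()
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "data: ") {
			continue // skip "event:" lines and blank separators
		}
		var evt struct {
			Type  string `json:"type"`
			Delta struct {
				Text       string `json:"text"`
				StopReason string `json:"stop_reason"`
			} `json:"delta"`
		}
		if json.Unmarshal([]byte(strings.TrimPrefix(line, "data: ")), &evt) != nil {
			continue
		}
		switch evt.Type {
		case "content_block_delta":
			if evt.Delta.Text != "" {
				fmt.Fprintf(w, "data: {\"type\":\"text\",\"text\":%q}\n\n", evt.Delta.Text)
			}
		case "message_delta":
			if evt.Delta.StopReason != "" {
				fmt.Fprintf(w, "data: {\"type\":\"finish\",\"finish_reason\":%q}\n\n", evt.Delta.StopReason)
			}
		}
		w.(http.Flusher).Flush()
	}
	fmt.Fprintf(w, "data: [DONE]\n\n")
}
```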
### Phase 2: Frontend Integration
```typescript
import { useChat } from '@ai-sdk/react';

function WaveAI({ blockId }: { blockId: string }) {
  // Get current preset from block metadata or settings
  const preset = useAtomValue(currentPresetAtom);
  const { messages, input, handleInputChange, handleSubmit, isLoading, error } = useChat({
    api: `/api/ai/chat/${blockId}?preset=${preset}`,
    initialMessages: [], // Load from existing aidata file
    onFinish: (message) => {
      // Save conversation to aidata file
      saveConversation(blockId, messages);
    },
  });

  return (
    <div className="flex flex-col h-full">
      <div className="flex-1 overflow-y-auto">
        {messages.map(message => (
          <div key={message.id} className={`message ${message.role}`}>
            <Markdown text={message.content} />
          </div>
        ))}
        {isLoading && <TypingIndicator />}
        {error && <div className="error">{error.message}</div>}
      </div>
      <form onSubmit={handleSubmit} className="border-t p-4">
        <input
          value={input}
          onChange={handleInputChange}
          placeholder="Type a message..."
          className="w-full p-2 border rounded"
        />
      </form>
    </div>
  );
}
```
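The component above calls `saveConversation`, which is expected to hit the persistence endpoints from the API design. On the backend, those endpoints might look like the following sketch; `writeConversationFile` and `readConversationFile` stand in for Wave's existing aidata/block file storage and are not real Wave functions:
```go
// Sketch: persistence endpoints backing useChat's conversation history.
func (s *WshServer) HandleSaveConversation(w http.ResponseWriter, r *http.Request) {
	blockId := mux.Vars(r)["blockId"]
	var payload struct {
		Messages []UseChatMessage `json:"messages"`
	}
	if err := json.NewDecoder(r.Body).Decode(&payload); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	if err := writeConversationFile(r.Context(), blockId, payload.Messages); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusNoContent)
}

func (s *WshServer) HandleLoadConversation(w http.ResponseWriter, r *http.Request) {
	blockId := mux.Vars(r)["blockId"]
	messages, err := readConversationFile(r.Context(), blockId)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(map[string]any{"messages": messages})
}
```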
### Phase 3: Advanced Features
#### Multi-modal Support
```typescript
// useChat supports multi-modal out of the box
const { messages, append } = useChat({
  api: `/api/ai/chat/${blockId}`,
});

// Send image + text
await append({
  role: 'user',
  content: [
    { type: 'text', text: 'What do you see in this image?' },
    { type: 'image', image: imageFile },
  ],
});
```
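On the backend, multi-modal messages mean `content` is no longer always a string. One way to accept both shapes during decoding is to defer parsing with `json.RawMessage` (a sketch, not existing Wave code):
```go
import "encoding/json"

// Sketch: accept either string content or an array of typed parts.
type IncomingMessage struct {
	Role    string          `json:"role"`
	Content json.RawMessage `json:"content"` // string OR array of parts
}

type ContentPart struct {
	Type  string `json:"type"` // "text" | "image" | ...
	Text  string `json:"text,omitempty"`
	Image string `json:"image,omitempty"` // e.g. a data URL
}

// decodeContent returns the plain-text form or the structured parts.
func decodeContent(raw json.RawMessage) (string, []ContentPart, error) {
	var s string
	if err := json.Unmarshal(raw, &s); err == nil {
		return s, nil, nil
	}
	var parts []ContentPart
	if err := json.Unmarshal(raw, &parts); err != nil {
		return "", nil, err
	}
	return "", parts, nil
}
```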
#### Thinking Models
```go
// Backend detects thinking models and formats appropriately
if isThinkingModel(aiOpts.Model) {
	// Send thinking content separately
	fmt.Fprintf(w, "data: {\"type\":\"thinking\",\"text\":%q}\n\n", thinkingText)
	fmt.Fprintf(w, "data: {\"type\":\"text\",\"text\":%q}\n\n", responseText)
}
```
#### Context Injection
```typescript
// Add system messages or context via useChat options
const { messages, append } = useChat({
  api: `/api/ai/chat/${blockId}`,
  initialMessages: [
    {
      role: 'system',
      content: 'You are a helpful terminal assistant...',
    },
  ],
});
```
## Migration Strategy
### 1. Parallel Implementation
- Keep existing RPC system running
- Add new HTTP/SSE endpoint alongside
- Feature flag to switch between systems
### 2. Gradual Migration
- Start with new blocks using useChat
- Migrate existing conversations on first interaction
- Remove RPC system once stable
### 3. Backward Compatibility
- Existing aidata files work unchanged
- Same provider backends (OpenAI, Anthropic, etc.)
- Same configuration system
## Benefits
### Complexity Reduction
- **Frontend**: ~900 lines → ~100 lines (90% reduction)
- **State Management**: 10+ atoms → 1 useChat hook
- **Configuration**: Frontend merging → Backend resolution
- **Streaming**: Custom protocol → Standard SSE
### Modern Features
- **Multi-modal**: Images, files, audio support
- **Thinking Models**: Built-in reasoning trace support
- **Conversation Management**: Edit, retry, branch conversations
- **Error Handling**: Automatic retry and error boundaries
- **Performance**: Optimized streaming and batching
### Developer Experience
- **Type Safety**: Full TypeScript support
- **Testing**: Standard HTTP endpoints easier to test
- **Debugging**: Standard browser dev tools work
- **Documentation**: Leverage AI SDK docs and community
## Configuration Examples
### URL-based Configuration
```
POST /api/ai/chat/block-123?preset=claude-coding
POST /api/ai/chat/block-456?preset=gpt4-creative
```
### Header-based Overrides
```
POST /api/ai/chat/block-123
X-AI-Model: gpt-4-turbo
X-AI-Temperature: 0.8
```
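If header-based overrides are supported, the handler could fold them into the same options map used for body options; a sketch using the header names above (`applyHeaderOverrides` is an illustrative helper):
```go
import (
	"net/http"
	"strconv"
)

// Sketch: merge X-AI-* header overrides into the request options map.
func applyHeaderOverrides(r *http.Request, options map[string]any) map[string]any {
	if options == nil {
		options = map[string]any{}
	}
	if model := r.Header.Get("X-AI-Model"); model != "" {
		options["model"] = model
	}
	if tempStr := r.Header.Get("X-AI-Temperature"); tempStr != "" {
		if temp, err := strconv.ParseFloat(tempStr, 64); err == nil {
			options["temperature"] = temp
		}
	}
	return options
}
```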
### Request Body Options
```json
{
  "messages": [...],
  "options": {
    "model": "claude-3-sonnet",
    "temperature": 0.7,
    "maxTokens": 2000
  }
}
```
This design maintains all existing functionality while dramatically simplifying the implementation and adding modern AI chat capabilities.