useChat Compatible Backend Design for Wave Terminal

Overview

This document outlines how to create a useChat() compatible backend API using Go and Server-Sent Events (SSE) to replace the current complex RPC-based AI chat system. The goal is to leverage Vercel AI SDK's useChat() hook while maintaining all existing AI provider functionality.

Current vs Target Architecture

Current Architecture

Frontend (React) → Custom RPC → Go Backend → AI Providers
- 10+ Jotai atoms for state management
- Custom WaveAIStreamRequest/WaveAIPacketType
- Complex configuration merging in frontend
- Custom streaming protocol over WebSocket

Target Architecture

Frontend (useChat) → HTTP/SSE → Go Backend → AI Providers
- Single useChat() hook manages all state
- Standard HTTP POST + SSE streaming
- Backend-driven configuration resolution
- Standard AI SDK streaming format

API Design

1. Endpoint Structure

Chat Streaming Endpoint:

POST /api/ai/chat/{blockId}?preset={presetKey}

Conversation Persistence Endpoints:

POST /api/ai/conversations/{blockId}     # Save conversation
GET  /api/ai/conversations/{blockId}     # Load conversation

Why this approach:

  • blockId: Identifies the conversation context (existing Wave concept)
  • preset: URL parameter for AI configuration preset
  • Separate persistence: Clean separation of streaming vs storage
  • Fast localhost calls: Frontend can call both endpoints quickly
  • Simple backend: Each endpoint has single responsibility
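
To make the persistence split concrete, here is a minimal sketch of the two handlers. The saveConversationFile/loadConversationFile helpers are hypothetical stand-ins for the existing aidata file storage:

func HandleSaveConversation(w http.ResponseWriter, r *http.Request) {
    blockId := mux.Vars(r)["blockId"]
    
    // Store the conversation as an opaque JSON blob
    body, err := io.ReadAll(r.Body)
    if err != nil {
        http.Error(w, "invalid request body", http.StatusBadRequest)
        return
    }
    if err := saveConversationFile(r.Context(), blockId, body); err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    w.WriteHeader(http.StatusNoContent)
}

func HandleLoadConversation(w http.ResponseWriter, r *http.Request) {
    blockId := mux.Vars(r)["blockId"]
    
    body, err := loadConversationFile(r.Context(), blockId)
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    w.Header().Set("Content-Type", "application/json")
    w.Write(body)
}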

2. Request Format & Message Flow

Simplified Approach:

  • Frontend manages entire conversation state (like all modern chat apps)
  • Frontend sends complete message history with each request
  • Backend just processes the messages and streams response
  • Frontend handles persistence via existing Wave file system

Standard useChat() Request:

{
  "messages": [
    {
      "id": "msg-1",
      "role": "user",
      "content": "Hello world"
    },
    {
      "id": "msg-2",
      "role": "assistant",
      "content": "Hi there!"
    },
    {
      "id": "msg-3",
      "role": "user",
      "content": "How are you?"  // <- NEW message user just typed
    }
  ]
}

Backend Processing:

  1. Receive complete conversation from frontend
  2. Resolve AI configuration (preset, model, etc.)
  3. Send messages directly to AI provider
  4. Stream response back to frontend
  5. Frontend calls separate persistence endpoint when needed

Optional Extensions:

{
  "messages": [...],
  "options": {
    "temperature": 0.7,
    "maxTokens": 1000,
    "model": "gpt-4"  // Override preset model
  }
}
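
For reference, this request shape maps onto a small pair of Go types that the handler sketches below will use (the names are illustrative, not final):

type Message struct {
    ID      string `json:"id,omitempty"`
    Role    string `json:"role"` // "system" | "user" | "assistant"
    Content string `json:"content"`
}

type ChatRequest struct {
    Messages []Message      `json:"messages"`
    Options  map[string]any `json:"options,omitempty"` // model, temperature, maxTokens, ...
}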

3. Configuration Resolution

Priority Order (backend resolves):

  1. Request options (highest priority)
  2. URL preset parameter
  3. Block metadata (block.meta["ai:preset"])
  4. Global settings (settings["ai:preset"])
  5. Default preset (lowest priority)

Backend Logic:

func resolveAIConfig(blockId, presetKey string, requestOptions map[string]any) (*WaveAIOptsType, error) {
    // 1. Load block metadata
    block, err := getBlock(blockId)
    if err != nil {
        return nil, err
    }
    blockPreset := block.Meta["ai:preset"]
    
    // 2. Load global settings
    settings := getGlobalSettings()
    globalPreset := settings["ai:preset"]
    
    // 3. Resolve preset hierarchy (first non-empty value wins)
    finalPreset := presetKey
    if finalPreset == "" {
        finalPreset = blockPreset
    }
    if finalPreset == "" {
        finalPreset = globalPreset
    }
    if finalPreset == "" {
        finalPreset = "default"
    }
    
    // 4. Load the preset config
    presetConfig, err := loadPreset(finalPreset)
    if err != nil {
        return nil, err
    }
    
    // 5. Apply request overrides on top of the preset
    return mergeAIConfig(presetConfig, requestOptions), nil
}
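
mergeAIConfig is referenced above but not yet defined; a minimal sketch, assuming WaveAIOptsType carries Model, Temperature, and MaxTokens fields (the field names are assumptions):

func mergeAIConfig(preset *WaveAIOptsType, requestOptions map[string]any) *WaveAIOptsType {
    merged := *preset // copy so the shared preset config is not mutated
    if m, ok := requestOptions["model"].(string); ok && m != "" {
        merged.Model = m
    }
    if t, ok := requestOptions["temperature"].(float64); ok {
        merged.Temperature = t
    }
    // JSON numbers decode into map[string]any as float64
    if mt, ok := requestOptions["maxTokens"].(float64); ok {
        merged.MaxTokens = int(mt)
    }
    return &merged
}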

4. Response Format (SSE)

Key Insight: Minimal Conversion

Most AI providers (OpenAI, Anthropic) already return SSE streams. Instead of converting to our custom format and back, we can proxy/transform their streams directly into the useChat format.

Headers:

Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
Access-Control-Allow-Origin: *

useChat Expected Format:

data: {"type":"text","text":"Hello"}

data: {"type":"text","text":" world"}

data: {"type":"text","text":"!"}

data: {"type":"finish","finish_reason":"stop","usage":{"prompt_tokens":10,"completion_tokens":3,"total_tokens":13}}

data: [DONE]

Provider Stream Transformation:

  • OpenAI: Already SSE → direct proxy (no conversion needed)
  • Anthropic: Already SSE → direct proxy (minimal field mapping)
  • Google: Already streaming → direct proxy
  • Perplexity: OpenAI-compatible → direct proxy
  • Wave Cloud: WebSocket → requires conversion (the only provider needing transformation)

Error Format:

data: {"type":"error","error":"API key invalid"}

data: [DONE]
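
One caveat before the implementation sketches: Go's %q verb (used in the handlers below) applies Go string escaping, which is close to but not strictly JSON-safe for every input. A small marshal-based helper is safer; this function is an assumption, not existing Wave code:

func writeSSEEvent(w http.ResponseWriter, event map[string]any) error {
    // json.Marshal guarantees valid JSON escaping for the payload
    data, err := json.Marshal(event)
    if err != nil {
        return err
    }
    if _, err := fmt.Fprintf(w, "data: %s\n\n", data); err != nil {
        return err
    }
    if f, ok := w.(http.Flusher); ok {
        f.Flush() // push the event to the client immediately
    }
    return nil
}

For example, writeSSEEvent(w, map[string]any{"type": "text", "text": chunk}) emits one text event in the format shown above.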

Implementation Plan

Phase 1: HTTP Handler

// Simplified approach: Direct provider streaming with minimal transformation
func (s *WshServer) HandleAIChat(w http.ResponseWriter, r *http.Request) {
    // 1. Parse URL parameters
    blockId := mux.Vars(r)["blockId"]
    presetKey := r.URL.Query().Get("preset")
    
    // 2. Parse request body (ChatRequest is defined in the Request Format section)
    var req ChatRequest
    if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
        http.Error(w, "invalid request body", http.StatusBadRequest)
        return
    }
    
    // 3. Resolve configuration
    aiOpts, err := resolveAIConfig(blockId, presetKey, req.Options)
    if err != nil {
        http.Error(w, err.Error(), 400)
        return
    }
    
    // 4. Set SSE headers (and confirm the writer supports streaming)
    w.Header().Set("Content-Type", "text/event-stream")
    w.Header().Set("Cache-Control", "no-cache")
    w.Header().Set("Connection", "keep-alive")
    if _, ok := w.(http.Flusher); !ok {
        http.Error(w, "streaming unsupported", http.StatusInternalServerError)
        return
    }
    
    // 5. Route to provider and stream directly
    switch aiOpts.APIType {
    case "openai", "perplexity":
        // Direct proxy - these are already SSE compatible
        streamDirectSSE(w, r.Context(), aiOpts, req.Messages)
    case "anthropic":
        // Direct proxy with minimal field mapping
        streamAnthropicSSE(w, r.Context(), aiOpts, req.Messages)
    case "google":
        // Direct proxy
        streamGoogleSSE(w, r.Context(), aiOpts, req.Messages)
    default:
        // Wave Cloud - the only provider requiring conversion (WebSocket → SSE)
        if isCloudAIRequest(aiOpts) {
            streamWaveCloudToUseChat(w, r.Context(), aiOpts, req.Messages)
        } else {
            http.Error(w, "Unsupported provider", 400)
        }
    }
}

// Example: Direct OpenAI streaming (minimal conversion)
func streamOpenAIToUseChat(w http.ResponseWriter, ctx context.Context, opts *WaveAIOptsType, messages []Message) {
    client := openai.NewClient(opts.APIToken)
    
    stream, err := client.CreateChatCompletionStream(ctx, openai.ChatCompletionRequest{
        Model:    opts.Model,
        Messages: convertToOpenAIMessages(messages),
        Stream:   true,
    })
    if err != nil {
        fmt.Fprintf(w, "data: {\"type\":\"error\",\"error\":%q}\n\n", err.Error())
        fmt.Fprintf(w, "data: [DONE]\n\n")
        return
    }
    defer stream.Close()
    
    for {
        response, err := stream.Recv()
        if errors.Is(err, io.EOF) {
            fmt.Fprintf(w, "data: [DONE]\n\n")
            return
        }
        if err != nil {
            fmt.Fprintf(w, "data: {\"type\":\"error\",\"error\":%q}\n\n", err.Error())
            fmt.Fprintf(w, "data: [DONE]\n\n")
            return
        }
        
        // Direct transformation: OpenAI format → useChat format
        for _, choice := range response.Choices {
            if choice.Delta.Content != "" {
                fmt.Fprintf(w, "data: {\"type\":\"text\",\"text\":%q}\n\n", choice.Delta.Content)
            }
            if choice.FinishReason != "" {
                fmt.Fprintf(w, "data: {\"type\":\"finish\",\"finish_reason\":%q}\n\n", choice.FinishReason)
            }
        }
        
        w.(http.Flusher).Flush()
    }
}
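
The convertToOpenAIMessages helper referenced above is a direct mapping; a sketch, assuming the sashabaranov/go-openai client shown in the example:

func convertToOpenAIMessages(messages []Message) []openai.ChatCompletionMessage {
    out := make([]openai.ChatCompletionMessage, 0, len(messages))
    for _, m := range messages {
        // Roles ("system"/"user"/"assistant") map 1:1 onto the OpenAI API
        out = append(out, openai.ChatCompletionMessage{
            Role:    m.Role,
            Content: m.Content,
        })
    }
    return out
}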

// Wave Cloud conversion (only provider needing transformation)
func streamWaveCloudToUseChat(w http.ResponseWriter, ctx context.Context, opts *WaveAIOptsType, messages []Message) {
    // Use existing Wave Cloud WebSocket logic
    waveReq := wshrpc.WaveAIStreamRequest{
        Opts:   opts,
        Prompt: convertMessagesToPrompt(messages),
    }
    
    stream := waveai.RunAICommand(ctx, waveReq) // Returns a channel of AI response packets
    
    // Convert Wave Cloud packets to useChat SSE format
    for packet := range stream {
        if packet.Error != nil {
            fmt.Fprintf(w, "data: {\"type\":\"error\",\"error\":%q}\n\n", packet.Error.Error())
            break
        }
        
        resp := packet.Response
        if resp.Text != "" {
            fmt.Fprintf(w, "data: {\"type\":\"text\",\"text\":%q}\n\n", resp.Text)
        }
        if resp.FinishReason != "" {
            usage := ""
            if resp.Usage != nil {
                usage = fmt.Sprintf(",\"usage\":{\"prompt_tokens\":%d,\"completion_tokens\":%d,\"total_tokens\":%d}",
                    resp.Usage.PromptTokens, resp.Usage.CompletionTokens, resp.Usage.TotalTokens)
            }
            fmt.Fprintf(w, "data: {\"type\":\"finish\",\"finish_reason\":%q%s}\n\n", resp.FinishReason, usage)
        }
        
        w.(http.Flusher).Flush()
    }
    
    fmt.Fprintf(w, "data: [DONE]\n\n")
}
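
For Anthropic, the "minimal field mapping" amounts to translating its documented SSE event types into the useChat events above. A sketch, assuming the raw stream has already been split into decoded events; the anthropicEvent struct mirrors Anthropic's published event shapes, and the surrounding parsing loop is omitted:

type anthropicEvent struct {
    Type  string `json:"type"` // "content_block_delta", "message_delta", "message_stop", ...
    Delta struct {
        Type       string `json:"type"`        // "text_delta" for text chunks
        Text       string `json:"text"`        // chunk content
        StopReason string `json:"stop_reason"` // set on "message_delta"
    } `json:"delta"`
}

// Map one decoded Anthropic event to the useChat SSE format
// (writeSSEEvent is the helper from the Response Format section).
func mapAnthropicEvent(w http.ResponseWriter, ev anthropicEvent) {
    switch ev.Type {
    case "content_block_delta":
        if ev.Delta.Type == "text_delta" {
            writeSSEEvent(w, map[string]any{"type": "text", "text": ev.Delta.Text})
        }
    case "message_delta":
        if ev.Delta.StopReason != "" {
            writeSSEEvent(w, map[string]any{"type": "finish", "finish_reason": ev.Delta.StopReason})
        }
    case "message_stop":
        fmt.Fprintf(w, "data: [DONE]\n\n")
        if f, ok := w.(http.Flusher); ok {
            f.Flush()
        }
    }
}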

Phase 2: Frontend Integration

import { useChat } from '@ai-sdk/react';

function WaveAI({ blockId }: { blockId: string }) {
    // Get current preset from block metadata or settings
    const preset = useAtomValue(currentPresetAtom);
    
    const { messages, input, handleInputChange, handleSubmit, isLoading, error } = useChat({
        api: `/api/ai/chat/${blockId}?preset=${preset}`,
        initialMessages: [], // Load from existing aidata file
        onFinish: (message) => {
            // Save conversation via the persistence endpoint
            // (note: `messages` captured in this closure can be stale;
            // read the latest message list when persisting)
            saveConversation(blockId, messages);
        }
    });
    
    return (
        <div className="flex flex-col h-full">
            <div className="flex-1 overflow-y-auto">
                {messages.map(message => (
                    <div key={message.id} className={`message ${message.role}`}>
                        <Markdown text={message.content} />
                    </div>
                ))}
                {isLoading && <TypingIndicator />}
                {error && <div className="error">{error.message}</div>}
            </div>
            
            <form onSubmit={handleSubmit} className="border-t p-4">
                <input
                    value={input}
                    onChange={handleInputChange}
                    placeholder="Type a message..."
                    className="w-full p-2 border rounded"
                />
            </form>
        </div>
    );
}

Phase 3: Advanced Features

Multi-modal Support

// useChat supports multi-modal out of the box
const { messages, append } = useChat({
    api: `/api/ai/chat/${blockId}`,
});

// Send image + text
await append({
    role: 'user',
    content: [
        { type: 'text', text: 'What do you see in this image?' },
        { type: 'image', image: imageFile }
    ]
});

Thinking Models

// Backend detects thinking models and formats appropriately
if isThinkingModel(aiOpts.Model) {
    // Send thinking content separately
    fmt.Fprintf(w, "data: {\"type\":\"thinking\",\"text\":%q}\n\n", thinkingText)
    fmt.Fprintf(w, "data: {\"type\":\"text\",\"text\":%q}\n\n", responseText)
}
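
isThinkingModel above is a placeholder; a minimal sketch, assuming detection by model-name convention (the specific name patterns are assumptions, and in practice this flag would likely come from the preset or provider metadata):

func isThinkingModel(model string) bool {
    // Placeholder heuristic - real detection should come from provider/preset config
    return strings.HasPrefix(model, "o1") ||
        strings.Contains(model, "thinking")
}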

Context Injection

// Add system messages or context via useChat options
const { messages, append } = useChat({
    api: `/api/ai/chat/${blockId}`,
    initialMessages: [
        {
            role: 'system',
            content: 'You are a helpful terminal assistant...'
        }
    ]
});

Migration Strategy

1. Parallel Implementation

  • Keep existing RPC system running
  • Add new HTTP/SSE endpoint alongside
  • Feature flag to switch between systems

2. Gradual Migration

  • Start with new blocks using useChat
  • Migrate existing conversations on first interaction
  • Remove RPC system once stable

3. Backward Compatibility

  • Existing aidata files work unchanged
  • Same provider backends (OpenAI, Anthropic, etc.)
  • Same configuration system

Benefits

Complexity Reduction

  • Frontend: ~900 lines → ~100 lines (90% reduction)
  • State Management: 10+ atoms → 1 useChat hook
  • Configuration: Frontend merging → Backend resolution
  • Streaming: Custom protocol → Standard SSE

Modern Features

  • Multi-modal: Images, files, audio support
  • Thinking Models: Built-in reasoning trace support
  • Conversation Management: Edit, retry, branch conversations
  • Error Handling: Automatic retry and error boundaries
  • Performance: Optimized streaming and batching

Developer Experience

  • Type Safety: Full TypeScript support
  • Testing: Standard HTTP endpoints easier to test
  • Debugging: Standard browser dev tools work
  • Documentation: Leverage AI SDK docs and community

Configuration Examples

URL-based Configuration

POST /api/ai/chat/block-123?preset=claude-coding
POST /api/ai/chat/block-456?preset=gpt4-creative

Header-based Overrides

POST /api/ai/chat/block-123
X-AI-Model: gpt-4-turbo
X-AI-Temperature: 0.8
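
These headers would be folded into the same requestOptions map before configuration resolution runs; a sketch (header names from the example above; the parsing rules are assumptions):

func applyHeaderOverrides(r *http.Request, options map[string]any) map[string]any {
    if options == nil {
        options = map[string]any{}
    }
    if m := r.Header.Get("X-AI-Model"); m != "" {
        options["model"] = m
    }
    if t := r.Header.Get("X-AI-Temperature"); t != "" {
        if f, err := strconv.ParseFloat(t, 64); err == nil {
            options["temperature"] = f
        }
    }
    return options
}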

Request Body Options

{
  "messages": [...],
  "options": {
    "model": "claude-3-sonnet",
    "temperature": 0.7,
    "maxTokens": 2000
  }
}

This design maintains all existing functionality while dramatically simplifying the implementation and adding modern AI chat capabilities.