useChat Compatible Backend Design for Wave Terminal

Overview

This document outlines how to create a useChat() compatible backend API using Go and Server-Sent Events (SSE) to replace the current complex RPC-based AI chat system. The goal is to leverage Vercel AI SDK's useChat() hook while maintaining all existing AI provider functionality.

Current vs Target Architecture

Current Architecture

Frontend (React) → Custom RPC → Go Backend → AI Providers
- 10+ Jotai atoms for state management
- Custom WaveAIStreamRequest/WaveAIPacketType
- Complex configuration merging in frontend
- Custom streaming protocol over WebSocket

Target Architecture

Frontend (useChat) → HTTP/SSE → Go Backend → AI Providers
- Single useChat() hook manages all state
- Standard HTTP POST + SSE streaming
- Backend-driven configuration resolution
- Standard AI SDK streaming format

API Design

1. Endpoint Structure

Chat Streaming Endpoint:

POST /api/ai/chat/{blockId}?preset={presetKey}

Conversation Persistence Endpoints:

POST /api/ai/conversations/{blockId}     # Save conversation
GET  /api/ai/conversations/{blockId}     # Load conversation

Why this approach:

  • blockId: Identifies the conversation context (existing Wave concept)
  • preset: URL parameter for AI configuration preset
  • Separate persistence: Clean separation of streaming vs storage
  • Fast localhost calls: Frontend can call both endpoints quickly
  • Simple backend: Each endpoint has single responsibility
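
To make the persistence split concrete, here is a minimal sketch of the two handlers. The saveConversationFile/loadConversationFile helpers are hypothetical stand-ins for the existing aidata file storage:

func HandleSaveConversation(w http.ResponseWriter, r *http.Request) {
    blockId := mux.Vars(r)["blockId"]
    
    // Store the conversation as an opaque JSON blob
    body, err := io.ReadAll(r.Body)
    if err != nil {
        http.Error(w, "invalid request body", http.StatusBadRequest)
        return
    }
    if err := saveConversationFile(r.Context(), blockId, body); err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    w.WriteHeader(http.StatusNoContent)
}

func HandleLoadConversation(w http.ResponseWriter, r *http.Request) {
    blockId := mux.Vars(r)["blockId"]
    
    body, err := loadConversationFile(r.Context(), blockId)
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    w.Header().Set("Content-Type", "application/json")
    w.Write(body)
}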

2. Request Format & Message Flow

Simplified Approach:

  • Frontend manages entire conversation state (like all modern chat apps)
  • Frontend sends complete message history with each request
  • Backend just processes the messages and streams response
  • Frontend handles persistence via existing Wave file system

Standard useChat() Request:

{
  "messages": [
    {
      "id": "msg-1",
      "role": "user",
      "content": "Hello world"
    },
    {
      "id": "msg-2",
      "role": "assistant",
      "content": "Hi there!"
    },
    {
      "id": "msg-3",
      "role": "user",
      "content": "How are you?"  // <- NEW message user just typed
    }
  ]
}

Backend Processing:

  1. Receive complete conversation from frontend
  2. Resolve AI configuration (preset, model, etc.)
  3. Send messages directly to AI provider
  4. Stream response back to frontend
  5. Frontend calls separate persistence endpoint when needed

Optional Extensions:

{
  "messages": [...],
  "options": {
    "temperature": 0.7,
    "maxTokens": 1000,
    "model": "gpt-4"  // Override preset model
  }
}
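
For reference, this request shape maps onto a small pair of Go types that the handler sketches below will use (the names are illustrative, not final):

type Message struct {
    ID      string `json:"id,omitempty"`
    Role    string `json:"role"` // "system" | "user" | "assistant"
    Content string `json:"content"`
}

type ChatRequest struct {
    Messages []Message      `json:"messages"`
    Options  map[string]any `json:"options,omitempty"` // model, temperature, maxTokens, ...
}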

3. Configuration Resolution

Priority Order (backend resolves):

  1. Request options (highest priority)
  2. URL preset parameter
  3. Block metadata (block.meta["ai:preset"])
  4. Global settings (settings["ai:preset"])
  5. Default preset (lowest priority)

Backend Logic:

func resolveAIConfig(blockId, presetKey string, requestOptions map[string]any) (*WaveAIOptsType, error) {
    // 1. Load block metadata
    block, err := getBlock(blockId)
    if err != nil {
        return nil, err
    }
    blockPreset := block.Meta["ai:preset"]
    
    // 2. Load global settings
    settings := getGlobalSettings()
    globalPreset := settings["ai:preset"]
    
    // 3. Resolve preset hierarchy (first non-empty value wins)
    finalPreset := presetKey
    if finalPreset == "" {
        finalPreset = blockPreset
    }
    if finalPreset == "" {
        finalPreset = globalPreset
    }
    if finalPreset == "" {
        finalPreset = "default"
    }
    
    // 4. Load the preset config
    presetConfig, err := loadPreset(finalPreset)
    if err != nil {
        return nil, err
    }
    
    // 5. Apply request overrides on top of the preset
    return mergeAIConfig(presetConfig, requestOptions), nil
}
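
mergeAIConfig is referenced above but not yet defined; a minimal sketch, assuming WaveAIOptsType carries Model, Temperature, and MaxTokens fields (the field names are assumptions):

func mergeAIConfig(preset *WaveAIOptsType, requestOptions map[string]any) *WaveAIOptsType {
    merged := *preset // copy so the shared preset config is not mutated
    if m, ok := requestOptions["model"].(string); ok && m != "" {
        merged.Model = m
    }
    if t, ok := requestOptions["temperature"].(float64); ok {
        merged.Temperature = t
    }
    // JSON numbers decode into map[string]any as float64
    if mt, ok := requestOptions["maxTokens"].(float64); ok {
        merged.MaxTokens = int(mt)
    }
    return &merged
}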

4. Response Format (SSE)

Key Insight: Minimal Conversion

Most AI providers (OpenAI, Anthropic) already return SSE streams. Instead of converting to our custom format and back, we can proxy/transform their streams directly into the useChat format.

Headers:

Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
Access-Control-Allow-Origin: *

useChat Expected Format:

data: {"type":"text","text":"Hello"}

data: {"type":"text","text":" world"}

data: {"type":"text","text":"!"}

data: {"type":"finish","finish_reason":"stop","usage":{"prompt_tokens":10,"completion_tokens":3,"total_tokens":13}}

data: [DONE]

Provider Stream Transformation:

  • OpenAI: Already SSE → direct proxy (no conversion needed)
  • Anthropic: Already SSE → direct proxy (minimal field mapping)
  • Google: Already streaming → direct proxy
  • Perplexity: OpenAI-compatible → direct proxy
  • Wave Cloud: WebSocket → requires conversion (the only provider needing transformation)

Error Format:

data: {"type":"error","error":"API key invalid"}

data: [DONE]
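
One caveat before the implementation sketches: Go's %q verb (used in the handlers below) applies Go string escaping, which is close to but not strictly JSON-safe for every input. A small marshal-based helper is safer; this function is an assumption, not existing Wave code:

func writeSSEEvent(w http.ResponseWriter, event map[string]any) error {
    // json.Marshal guarantees valid JSON escaping for the payload
    data, err := json.Marshal(event)
    if err != nil {
        return err
    }
    if _, err := fmt.Fprintf(w, "data: %s\n\n", data); err != nil {
        return err
    }
    if f, ok := w.(http.Flusher); ok {
        f.Flush() // push the event to the client immediately
    }
    return nil
}

For example, writeSSEEvent(w, map[string]any{"type": "text", "text": chunk}) emits one text event in the format shown above.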

Implementation Plan

Phase 1: HTTP Handler

// Simplified approach: Direct provider streaming with minimal transformation
func (s *WshServer) HandleAIChat(w http.ResponseWriter, r *http.Request) {
    // 1. Parse URL parameters
    blockId := mux.Vars(r)["blockId"]
    presetKey := r.URL.Query().Get("preset")
    
    // 2. Parse request body (ChatRequest is defined in the Request Format section)
    var req ChatRequest
    if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
        http.Error(w, "invalid request body", http.StatusBadRequest)
        return
    }
    
    // 3. Resolve configuration
    aiOpts, err := resolveAIConfig(blockId, presetKey, req.Options)
    if err != nil {
        http.Error(w, err.Error(), 400)
        return
    }
    
    // 4. Set SSE headers (and confirm the writer supports streaming)
    w.Header().Set("Content-Type", "text/event-stream")
    w.Header().Set("Cache-Control", "no-cache")
    w.Header().Set("Connection", "keep-alive")
    if _, ok := w.(http.Flusher); !ok {
        http.Error(w, "streaming unsupported", http.StatusInternalServerError)
        return
    }
    
    // 5. Route to provider and stream directly
    switch aiOpts.APIType {
    case "openai", "perplexity":
        // Direct proxy - these are already SSE compatible
        streamDirectSSE(w, r.Context(), aiOpts, req.Messages)
    case "anthropic":
        // Direct proxy with minimal field mapping
        streamAnthropicSSE(w, r.Context(), aiOpts, req.Messages)
    case "google":
        // Direct proxy
        streamGoogleSSE(w, r.Context(), aiOpts, req.Messages)
    default:
        // Wave Cloud - the only provider requiring conversion (WebSocket → SSE)
        if isCloudAIRequest(aiOpts) {
            streamWaveCloudToUseChat(w, r.Context(), aiOpts, req.Messages)
        } else {
            http.Error(w, "Unsupported provider", 400)
        }
    }
}

// Example: Direct OpenAI streaming (minimal conversion)
func streamOpenAIToUseChat(w http.ResponseWriter, ctx context.Context, opts *WaveAIOptsType, messages []Message) {
    client := openai.NewClient(opts.APIToken)
    
    stream, err := client.CreateChatCompletionStream(ctx, openai.ChatCompletionRequest{
        Model:    opts.Model,
        Messages: convertToOpenAIMessages(messages),
        Stream:   true,
    })
    if err != nil {
        fmt.Fprintf(w, "data: {\"type\":\"error\",\"error\":%q}\n\n", err.Error())
        fmt.Fprintf(w, "data: [DONE]\n\n")
        return
    }
    defer stream.Close()
    
    for {
        response, err := stream.Recv()
        if errors.Is(err, io.EOF) {
            fmt.Fprintf(w, "data: [DONE]\n\n")
            return
        }
        if err != nil {
            fmt.Fprintf(w, "data: {\"type\":\"error\",\"error\":%q}\n\n", err.Error())
            fmt.Fprintf(w, "data: [DONE]\n\n")
            return
        }
        
        // Direct transformation: OpenAI format → useChat format
        for _, choice := range response.Choices {
            if choice.Delta.Content != "" {
                fmt.Fprintf(w, "data: {\"type\":\"text\",\"text\":%q}\n\n", choice.Delta.Content)
            }
            if choice.FinishReason != "" {
                fmt.Fprintf(w, "data: {\"type\":\"finish\",\"finish_reason\":%q}\n\n", choice.FinishReason)
            }
        }
        
        w.(http.Flusher).Flush()
    }
}
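
The convertToOpenAIMessages helper referenced above is a direct mapping; a sketch, assuming the sashabaranov/go-openai client shown in the example:

func convertToOpenAIMessages(messages []Message) []openai.ChatCompletionMessage {
    out := make([]openai.ChatCompletionMessage, 0, len(messages))
    for _, m := range messages {
        // Roles ("system"/"user"/"assistant") map 1:1 onto the OpenAI API
        out = append(out, openai.ChatCompletionMessage{
            Role:    m.Role,
            Content: m.Content,
        })
    }
    return out
}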

// Wave Cloud conversion (only provider needing transformation)
func streamWaveCloudToUseChat(w http.ResponseWriter, ctx context.Context, opts *WaveAIOptsType, messages []Message) {
    // Use existing Wave Cloud WebSocket logic
    waveReq := wshrpc.WaveAIStreamRequest{
        Opts:   opts,
        Prompt: convertMessagesToPrompt(messages),
    }
    
    stream := waveai.RunAICommand(ctx, waveReq) // Returns a channel of AI response packets
    
    // Convert Wave Cloud packets to useChat SSE format
    for packet := range stream {
        if packet.Error != nil {
            fmt.Fprintf(w, "data: {\"type\":\"error\",\"error\":%q}\n\n", packet.Error.Error())
            break
        }
        
        resp := packet.Response
        if resp.Text != "" {
            fmt.Fprintf(w, "data: {\"type\":\"text\",\"text\":%q}\n\n", resp.Text)
        }
        if resp.FinishReason != "" {
            usage := ""
            if resp.Usage != nil {
                usage = fmt.Sprintf(",\"usage\":{\"prompt_tokens\":%d,\"completion_tokens\":%d,\"total_tokens\":%d}",
                    resp.Usage.PromptTokens, resp.Usage.CompletionTokens, resp.Usage.TotalTokens)
            }
            fmt.Fprintf(w, "data: {\"type\":\"finish\",\"finish_reason\":%q%s}\n\n", resp.FinishReason, usage)
        }
        
        w.(http.Flusher).Flush()
    }
    
    fmt.Fprintf(w, "data: [DONE]\n\n")
}
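
For Anthropic, the "minimal field mapping" amounts to translating its documented SSE event types into the useChat events above. A sketch, assuming the raw stream has already been split into decoded events; the anthropicEvent struct mirrors Anthropic's published event shapes, and the surrounding parsing loop is omitted:

type anthropicEvent struct {
    Type  string `json:"type"` // "content_block_delta", "message_delta", "message_stop", ...
    Delta struct {
        Type       string `json:"type"`        // "text_delta" for text chunks
        Text       string `json:"text"`        // chunk content
        StopReason string `json:"stop_reason"` // set on "message_delta"
    } `json:"delta"`
}

// Map one decoded Anthropic event to the useChat SSE format
// (writeSSEEvent is the helper from the Response Format section).
func mapAnthropicEvent(w http.ResponseWriter, ev anthropicEvent) {
    switch ev.Type {
    case "content_block_delta":
        if ev.Delta.Type == "text_delta" {
            writeSSEEvent(w, map[string]any{"type": "text", "text": ev.Delta.Text})
        }
    case "message_delta":
        if ev.Delta.StopReason != "" {
            writeSSEEvent(w, map[string]any{"type": "finish", "finish_reason": ev.Delta.StopReason})
        }
    case "message_stop":
        fmt.Fprintf(w, "data: [DONE]\n\n")
        if f, ok := w.(http.Flusher); ok {
            f.Flush()
        }
    }
}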

Phase 2: Frontend Integration

import { useChat } from '@ai-sdk/react';

function WaveAI({ blockId }: { blockId: string }) {
    // Get current preset from block metadata or settings
    const preset = useAtomValue(currentPresetAtom);
    
    const { messages, input, handleInputChange, handleSubmit, isLoading, error } = useChat({
        api: `/api/ai/chat/${blockId}?preset=${preset}`,
        initialMessages: [], // Load from existing aidata file
        onFinish: (message) => {
            // Save conversation via the persistence endpoint
            // (note: `messages` captured in this closure can be stale;
            // read the latest message list when persisting)
            saveConversation(blockId, messages);
        }
    });
    
    return (
        <div className="flex flex-col h-full">
            <div className="flex-1 overflow-y-auto">
                {messages.map(message => (
                    <div key={message.id} className={`message ${message.role}`}>
                        <Markdown text={message.content} />
                    </div>
                ))}
                {isLoading && <TypingIndicator />}
                {error && <div className="error">{error.message}</div>}
            </div>
            
            <form onSubmit={handleSubmit} className="border-t p-4">
                <input
                    value={input}
                    onChange={handleInputChange}
                    placeholder="Type a message..."
                    className="w-full p-2 border rounded"
                />
            </form>
        </div>
    );
}

Phase 3: Advanced Features

Multi-modal Support

// useChat supports multi-modal out of the box
const { messages, append } = useChat({
    api: `/api/ai/chat/${blockId}`,
});

// Send image + text
await append({
    role: 'user',
    content: [
        { type: 'text', text: 'What do you see in this image?' },
        { type: 'image', image: imageFile }
    ]
});

Thinking Models

// Backend detects thinking models and formats appropriately
if isThinkingModel(aiOpts.Model) {
    // Send thinking content separately
    fmt.Fprintf(w, "data: {\"type\":\"thinking\",\"text\":%q}\n\n", thinkingText)
    fmt.Fprintf(w, "data: {\"type\":\"text\",\"text\":%q}\n\n", responseText)
}
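
isThinkingModel above is a placeholder; a minimal sketch, assuming detection by model-name convention (the specific name patterns are assumptions, and in practice this flag would likely come from the preset or provider metadata):

func isThinkingModel(model string) bool {
    // Placeholder heuristic - real detection should come from provider/preset config
    return strings.HasPrefix(model, "o1") ||
        strings.Contains(model, "thinking")
}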

Context Injection

// Add system messages or context via useChat options
const { messages, append } = useChat({
    api: `/api/ai/chat/${blockId}`,
    initialMessages: [
        {
            role: 'system',
            content: 'You are a helpful terminal assistant...'
        }
    ]
});

Migration Strategy

1. Parallel Implementation

  • Keep existing RPC system running
  • Add new HTTP/SSE endpoint alongside
  • Feature flag to switch between systems

2. Gradual Migration

  • Start with new blocks using useChat
  • Migrate existing conversations on first interaction
  • Remove RPC system once stable

3. Backward Compatibility

  • Existing aidata files work unchanged
  • Same provider backends (OpenAI, Anthropic, etc.)
  • Same configuration system

Benefits

Complexity Reduction

  • Frontend: ~900 lines → ~100 lines (90% reduction)
  • State Management: 10+ atoms → 1 useChat hook
  • Configuration: Frontend merging → Backend resolution
  • Streaming: Custom protocol → Standard SSE

Modern Features

  • Multi-modal: Images, files, audio support
  • Thinking Models: Built-in reasoning trace support
  • Conversation Management: Edit, retry, branch conversations
  • Error Handling: Automatic retry and error boundaries
  • Performance: Optimized streaming and batching

Developer Experience

  • Type Safety: Full TypeScript support
  • Testing: Standard HTTP endpoints easier to test
  • Debugging: Standard browser dev tools work
  • Documentation: Leverage AI SDK docs and community

Configuration Examples

URL-based Configuration

POST /api/ai/chat/block-123?preset=claude-coding
POST /api/ai/chat/block-456?preset=gpt4-creative

Header-based Overrides

POST /api/ai/chat/block-123
X-AI-Model: gpt-4-turbo
X-AI-Temperature: 0.8
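
These headers would be folded into the same requestOptions map before configuration resolution runs; a sketch (header names from the example above; the parsing rules are assumptions):

func applyHeaderOverrides(r *http.Request, options map[string]any) map[string]any {
    if options == nil {
        options = map[string]any{}
    }
    if m := r.Header.Get("X-AI-Model"); m != "" {
        options["model"] = m
    }
    if t := r.Header.Get("X-AI-Temperature"); t != "" {
        if f, err := strconv.ParseFloat(t, 64); err == nil {
            options["temperature"] = f
        }
    }
    return options
}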

Request Body Options

{
  "messages": [...],
  "options": {
    "model": "claude-3-sonnet",
    "temperature": 0.7,
    "maxTokens": 2000
  }
}

This design maintains all existing functionality while dramatically simplifying the implementation and adding modern AI chat capabilities.