Wave AI Architecture Documentation
Overview
Wave AI is a chat-based AI assistant feature integrated into Wave Terminal. It provides a conversational interface for interacting with various AI providers (OpenAI, Anthropic, Perplexity, Google, and Wave's cloud proxy) through a unified streaming architecture. The feature is implemented as a block view within Wave Terminal's modular system.
Architecture Components
Frontend Architecture (frontend/app/view/waveai/)
Core Components
1. WaveAiModel Class
- Purpose: Main view model implementing the ViewModel interface
- Responsibilities:
- State management using Jotai atoms
- Configuration management (presets, AI options)
- Message handling and persistence
- RPC communication with backend
- UI state coordination
2. AiWshClient Class
- Purpose: Specialized WSH RPC client for AI operations
- Extends: WshClient
- Responsibilities:
- Handle incoming aisendmessage RPC calls
- Route messages to the model's sendMessage method
3. React Components
- WaveAi: Main container component
- ChatWindow: Scrollable message display with auto-scroll behavior
- ChatItem: Individual message renderer with role-based styling
- ChatInput: Auto-resizing textarea with keyboard navigation
State Management (Jotai Atoms)
Message State:
messagesAtom: PrimitiveAtom<Array<ChatMessageType>>
messagesSplitAtom: SplitAtom<Array<ChatMessageType>>
latestMessageAtom: Atom<ChatMessageType>
addMessageAtom: WritableAtom<unknown, [message: ChatMessageType], void>
updateLastMessageAtom: WritableAtom<unknown, [text: string, isUpdating: boolean], void>
removeLastMessageAtom: WritableAtom<unknown, [], void>
Configuration State:
presetKey: Atom<string> // Current AI preset selection
presetMap: Atom<{[k: string]: MetaType}> // Available AI presets
mergedPresets: Atom<MetaType> // Merged configuration hierarchy
aiOpts: Atom<WaveAIOptsType> // Final AI options for requests
UI State:
locked: PrimitiveAtom<boolean> // Prevents input during AI response
viewIcon: Atom<string> // Header icon
viewName: Atom<string> // Header title
viewText: Atom<HeaderElem[]> // Dynamic header elements
endIconButtons: Atom<IconButtonDecl[]> // Header action buttons
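The write-only atoms encapsulate list updates so components never mutate message state directly. A minimal sketch of how addMessageAtom and updateLastMessageAtom could be built on messagesAtom (the exact update semantics, such as whether streamed text is appended or replaced, are assumptions here):

import { atom } from "jotai";

interface ChatMessageType {
    id: string;
    user: string;
    text: string;
    isUpdating?: boolean;
}

const messagesAtom = atom<Array<ChatMessageType>>([]);

// Write-only atom in the style of addMessageAtom: appends one message.
const addMessageAtom = atom(null, (get, set, message: ChatMessageType) => {
    set(messagesAtom, [...get(messagesAtom), message]);
});

// Write-only atom in the style of updateLastMessageAtom: appends streamed
// text to the last message; isUpdating drives the typing indicator.
const updateLastMessageAtom = atom(null, (get, set, text: string, isUpdating: boolean) => {
    const messages = get(messagesAtom);
    if (messages.length === 0) return;
    const last = messages[messages.length - 1];
    set(messagesAtom, [...messages.slice(0, -1), { ...last, text: last.text + text, isUpdating }]);
});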
Configuration Hierarchy
The AI configuration follows a three-tier hierarchy (lowest to highest priority):
- Global Settings: atoms.settingsAtom["ai:*"]
- Preset Configuration: presets[presetKey]["ai:*"]
- Block Metadata: block.meta["ai:*"]
Configuration is merged using the mergeMeta() utility, allowing fine-grained overrides at each level.
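For illustration, a simplified stand-in for the merge (the real mergeMeta handles more cases; here it is assumed to be a shallow, key-wise merge where higher-priority values win):

type MetaType = { [key: string]: any };

// Simplified stand-in for mergeMeta: shallow key-wise merge, later wins.
function mergeMeta(base: MetaType, overrides: MetaType): MetaType {
    return { ...base, ...overrides };
}

const globalSettings = { "ai:model": "gpt-4", "ai:maxtokens": 2000 };
const preset = { "ai:model": "claude-3-5-sonnet", "ai:apitype": "anthropic" };
const blockMeta = { "ai:maxtokens": 4000 };

// Lowest to highest priority: global settings, then preset, then block metadata.
const merged = mergeMeta(mergeMeta(globalSettings, preset), blockMeta);
// => { "ai:model": "claude-3-5-sonnet", "ai:apitype": "anthropic", "ai:maxtokens": 4000 }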
Data Flow - Frontend
User Input → sendMessage() →
├── Add user message to UI
├── Create WaveAIStreamRequest
├── Call RpcApi.StreamWaveAiCommand()
├── Add typing indicator
└── Stream response handling:
├── Update message incrementally
├── Handle errors
└── Save complete conversation
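A condensed sketch of this flow in TypeScript (the helper names on the model are hypothetical; treating the streamed response as an async iterable of WaveAIPacketType is an assumption based on the RPC layer described below):

async function sendMessage(model: WaveAiModel, text: string) {
    model.addMessage({ id: crypto.randomUUID(), user: "user", text });
    const request: WaveAIStreamRequest = {
        clientid: model.clientId,
        opts: model.aiOpts,              // merged configuration (see above)
        prompt: model.buildPrompt(text), // sliding-window history + new message
    };
    model.addMessage({ id: crypto.randomUUID(), user: "assistant", text: "", isUpdating: true }); // typing indicator
    try {
        for await (const packet of RpcApi.StreamWaveAiCommand(request)) {
            if (packet.error) throw new Error(packet.error);
            if (packet.text) model.updateLastMessage(packet.text, true);
        }
        model.updateLastMessage("", false); // finalize the assistant message
    } catch (e) {
        model.addErrorMessage(String(e));  // partial text is kept
    }
    await model.saveConversation();        // persist the complete exchange
}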
Backend Architecture (pkg/waveai/)
Core Interface
AIBackend Interface:
type AIBackend interface {
StreamCompletion(
ctx context.Context,
request wshrpc.WaveAIStreamRequest,
) chan wshrpc.RespOrErrorUnion[wshrpc.WaveAIPacketType]
}
Backend Implementations
1. OpenAIBackend (openaibackend.go)
- Providers: OpenAI, Azure OpenAI, Cloudflare Azure
- Features:
- Reasoning model support (o1, o3, o4, gpt-5)
- Proxy support
- Multiple API types (OpenAI, Azure, AzureAD, CloudflareAzure)
- Streaming: Uses the go-openai library for SSE streaming
2. AnthropicBackend (anthropicbackend.go)
- Provider: Anthropic Claude
- Features:
- Custom SSE parser for Anthropic's event format
- System message handling
- Usage token tracking
- Events:
message_start, content_block_delta, message_stop, etc.
3. WaveAICloudBackend (cloudbackend.go)
- Provider: Wave's cloud proxy service
- Transport: WebSocket connection to Wave cloud
- Features:
- Fallback when no API token/baseURL provided
- Built-in rate limiting and abuse protection
4. PerplexityBackend (perplexitybackend.go)
- Provider: Perplexity AI
- Implementation: Similar to OpenAI backend
5. GoogleBackend (googlebackend.go)
- Provider: Google AI (Gemini)
- Implementation: Custom integration for Google's API
Backend Routing Logic
func RunAICommand(ctx context.Context, request wshrpc.WaveAIStreamRequest) chan wshrpc.RespOrErrorUnion[wshrpc.WaveAIPacketType] {
// Route based on request.Opts.APIType:
switch request.Opts.APIType {
case "anthropic":
backend = AnthropicBackend{}
case "perplexity":
backend = PerplexityBackend{}
case "google":
backend = GoogleBackend{}
default:
if IsCloudAIRequest(request.Opts) {
backend = WaveAICloudBackend{}
} else {
backend = OpenAIBackend{}
}
}
return backend.StreamCompletion(ctx, request)
}
RPC Communication Layer
WSH RPC Integration
Command: streamwaveai
Type: Response Stream (one request, multiple responses)
Request Type (WaveAIStreamRequest):
type WaveAIStreamRequest struct {
ClientId string `json:"clientid,omitempty"`
Opts *WaveAIOptsType `json:"opts"`
Prompt []WaveAIPromptMessageType `json:"prompt"`
}
Response Type (WaveAIPacketType):
type WaveAIPacketType struct {
Type string `json:"type"`
Model string `json:"model,omitempty"`
Created int64 `json:"created,omitempty"`
FinishReason string `json:"finish_reason,omitempty"`
Usage *WaveAIUsageType `json:"usage,omitempty"`
Index int `json:"index,omitempty"`
Text string `json:"text,omitempty"`
Error string `json:"error,omitempty"`
}
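On the frontend, packets can be folded into a final result as the stream arrives. A sketch, assuming a TypeScript mirror of the packet whose field names follow the JSON tags above, and that packets arrive in order:

interface WaveAIPacketType {
    type: string;
    model?: string;
    created?: number;
    finish_reason?: string;
    usage?: unknown; // WaveAIUsageType; fields omitted here
    index?: number;
    text?: string;
    error?: string;
}

interface WaveAIResult {
    text: string;
    model?: string;
    finishReason?: string;
}

function foldPacket(acc: WaveAIResult, pkt: WaveAIPacketType): WaveAIResult {
    if (pkt.error) throw new Error(pkt.error); // provider errors surface here
    return {
        text: acc.text + (pkt.text ?? ""),
        model: pkt.model ?? acc.model,
        finishReason: pkt.finish_reason ?? acc.finishReason,
    };
}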
Configuration Types
AI Options (WaveAIOptsType):
type WaveAIOptsType struct {
Model string `json:"model"`
APIType string `json:"apitype,omitempty"`
APIToken string `json:"apitoken"`
OrgID string `json:"orgid,omitempty"`
APIVersion string `json:"apiversion,omitempty"`
BaseURL string `json:"baseurl,omitempty"`
ProxyURL string `json:"proxyurl,omitempty"`
MaxTokens int `json:"maxtokens,omitempty"`
MaxChoices int `json:"maxchoices,omitempty"`
TimeoutMs int `json:"timeoutms,omitempty"`
}
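An illustrative options value (field names follow the JSON tags above; the values are examples only):

const aiOpts: WaveAIOptsType = {
    model: "gpt-4",
    apitype: "openai",
    apitoken: "sk-...",                    // placeholder token
    baseurl: "https://api.openai.com/v1",
    maxtokens: 4000,
    timeoutms: 60000,
};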
Data Persistence
Chat History Storage
Frontend:
- Method: fetchWaveFile(blockId, "aidata")
- Format: JSON array of WaveAIPromptMessageType
- Sliding Window: Last 30 messages (slidingWindowSize = 30); see the sketch below
Backend:
- Service: BlockService.SaveWaveAiData(blockId, history)
- Storage: Block-associated file storage
- Persistence: Automatic save after each complete exchange
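A minimal sketch of the trim-and-save step (BlockService.SaveWaveAiData is named above; applying the trim at save time is an assumption):

const slidingWindowSize = 30;

// Keep only the most recent messages so the stored "aidata" file stays bounded.
function trimHistory(history: WaveAIPromptMessageType[]): WaveAIPromptMessageType[] {
    return history.slice(-slidingWindowSize);
}

async function saveConversation(blockId: string, history: WaveAIPromptMessageType[]) {
    await BlockService.SaveWaveAiData(blockId, trimHistory(history));
}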
Message Format
UI Messages (ChatMessageType):
interface ChatMessageType {
id: string;
user: string; // "user" | "assistant" | "error"
text: string;
isUpdating?: boolean;
}
Stored Messages (WaveAIPromptMessageType):
type WaveAIPromptMessageType struct {
Role string `json:"role"` // "user" | "assistant" | "system" | "error"
Content string `json:"content"`
Name string `json:"name,omitempty"`
}
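A sketch of converting UI messages into the stored prompt format (filtering out "error" messages before sending is an assumption):

interface WaveAIPromptMessageType {
    role: string; // "user" | "assistant" | "system" | "error"
    content: string;
    name?: string;
}

function toPromptHistory(messages: ChatMessageType[]): WaveAIPromptMessageType[] {
    return messages
        .filter((m) => m.user === "user" || m.user === "assistant")
        .map((m) => ({ role: m.user, content: m.text }));
}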
Error Handling
Frontend Error Handling
- Network Errors: Caught in streaming loop, displayed as error messages
- Empty Responses: Automatically remove typing indicator
- Cancellation: User can cancel via the stop button (model.cancel = true)
- Partial Responses: Saved even if incomplete due to errors
Backend Error Handling
- Panic Recovery: All backends use panichandler.PanicHandler()
- Context Cancellation: Proper cleanup on request cancellation
- Provider Errors: Wrapped and forwarded to frontend
- Connection Errors: Detailed error messages for debugging
UI Features
Message Rendering
- Markdown Support: Full markdown rendering with syntax highlighting
- Role-based Styling: Different colors/layouts for user/assistant/error messages
- Typing Indicator: Animated dots during AI response
- Font Configuration: Configurable font sizes via presets
Input Handling
- Auto-resize: Textarea grows/shrinks with content (max 5 lines)
- Keyboard Navigation:
- Enter to send
- Cmd+L to clear history
- Arrow keys for code block selection
- Code Block Selection: Navigate through code blocks in responses
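A simplified sketch of the keyboard handling (the model methods are hypothetical; the real handler also tracks code-block selection state):

function handleKeyDown(e: React.KeyboardEvent<HTMLTextAreaElement>, model: WaveAiModel) {
    if (e.key === "Enter" && !e.shiftKey) {
        e.preventDefault();
        model.sendMessage(e.currentTarget.value); // Enter to send
    } else if (e.key === "l" && e.metaKey) {
        e.preventDefault();
        model.clearHistory();                     // Cmd+L to clear history
    } else if (e.key === "ArrowUp" || e.key === "ArrowDown") {
        model.moveCodeBlockSelection(e.key === "ArrowUp" ? -1 : 1);
    }
}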
Scroll Management
- Auto-scroll: Automatically scrolls to new messages
- User Scroll Detection: Pauses auto-scroll when user manually scrolls
- Smart Resume: Resumes auto-scroll when near bottom
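A sketch of the scroll policy (the near-bottom threshold is an assumed value):

const NEAR_BOTTOM_PX = 30; // assumed threshold for "near bottom"

// Called (throttled) on user scroll: follow only while near the bottom.
function onScroll(el: HTMLElement, state: { following: boolean }) {
    const distFromBottom = el.scrollHeight - el.scrollTop - el.clientHeight;
    state.following = distFromBottom < NEAR_BOTTOM_PX;
}

// Called when a message is added or updated.
function onNewMessage(el: HTMLElement, state: { following: boolean }) {
    if (state.following) {
        el.scrollTop = el.scrollHeight; // auto-scroll to the latest message
    }
}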
Configuration Management
Preset System
Preset Structure:
{
"ai@preset-name": {
"display:name": "Preset Display Name",
"display:order": 1,
"ai:model": "gpt-4",
"ai:apitype": "openai",
"ai:apitoken": "sk-...",
"ai:baseurl": "https://api.openai.com/v1",
"ai:maxtokens": 4000,
"ai:fontsize": "14px",
"ai:fixedfontsize": "12px"
}
}
Configuration Keys:
- ai:model - AI model name
- ai:apitype - Provider type (openai, anthropic, perplexity, google)
- ai:apitoken - API authentication token
- ai:baseurl - Custom API endpoint
- ai:proxyurl - HTTP proxy URL
- ai:maxtokens - Maximum response tokens
- ai:timeoutms - Request timeout in milliseconds
- ai:fontsize - UI font size
- ai:fixedfontsize - Code block font size
Provider Detection
The UI automatically detects and displays the active provider:
- Cloud: Wave's proxy (no token/baseURL)
- Local: localhost/127.0.0.1 endpoints
- Remote: External API endpoints
- Provider-specific: Anthropic, Perplexity with custom icons
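A sketch of these rules (heuristics only; the real viewText atom also selects icons and display names):

function detectProvider(opts: WaveAIOptsType): string {
    if (opts.apitype === "anthropic" || opts.apitype === "perplexity") {
        return opts.apitype; // provider-specific icon
    }
    if (!opts.apitoken && !opts.baseurl) {
        return "cloud";      // falls back to Wave's proxy
    }
    const baseurl = opts.baseurl ?? "";
    if (baseurl.includes("localhost") || baseurl.includes("127.0.0.1")) {
        return "local";
    }
    return "remote";
}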
Performance Considerations
Frontend Optimizations
- Jotai Atoms: Granular reactivity, only re-render affected components
- Memo Components: ChatWindow and ChatItem are memoized
- Throttled Scrolling: Scroll events throttled to 100ms
- Debounced Scroll Detection: User scroll detection debounced to 300ms
Backend Optimizations
- Streaming: All responses are streamed for immediate feedback
- Context Cancellation: Proper cleanup prevents resource leaks
- Connection Pooling: HTTP clients reuse connections
- Error Recovery: Graceful degradation on provider failures
Security Considerations
API Token Handling
- Storage: Tokens stored in encrypted configuration
- Transmission: Tokens only sent to configured endpoints
- Validation: Backend validates token format and permissions
Request Validation
- Input Sanitization: User input validated before sending
- Rate Limiting: Cloud backend includes built-in rate limiting
- Error Filtering: Sensitive error details filtered from UI
Extension Points
Adding New Providers
- Implement AIBackend Interface: Create new backend struct
- Add Provider Detection: Update the RunAICommand() routing logic
- Add Configuration: Define provider-specific config keys
- Update UI: Add provider detection in the viewText atom
Custom Message Types
- Extend ChatMessageType: Add new user types
- Update ChatItem Rendering: Handle new message types
- Modify Storage: Update persistence format if needed
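For example, a hypothetical "system" UI role could be added like this (all names are illustrative):

type ChatUser = "user" | "assistant" | "error" | "system"; // extended union

interface ChatMessageType {
    id: string;
    user: ChatUser;
    text: string;
    isUpdating?: boolean;
}

// ChatItem then branches on the role for styling.
function classForUser(user: ChatUser): string {
    return `chat-msg-${user}`; // e.g. "chat-msg-system" (hypothetical CSS class)
}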
This architecture provides a flexible, extensible foundation for AI chat functionality while maintaining clean separation between UI, business logic, and provider integrations.