# Wave AI Architecture Documentation
## Overview
Wave AI is a chat-based AI assistant feature integrated into Wave Terminal. It provides a conversational interface for interacting with various AI providers (OpenAI, Anthropic, Perplexity, Google, and Wave's cloud proxy) through a unified streaming architecture. The feature is implemented as a block view within Wave Terminal's modular system.
## Architecture Components
### Frontend Architecture (`frontend/app/view/waveai/`)
#### Core Components
**1. WaveAiModel Class**
- **Purpose**: Main view model implementing the `ViewModel` interface
- **Responsibilities**:
  - State management using Jotai atoms
  - Configuration management (presets, AI options)
  - Message handling and persistence
  - RPC communication with backend
  - UI state coordination
**2. AiWshClient Class**
- **Purpose**: Specialized WSH RPC client for AI operations
- **Extends**: `WshClient`
- **Responsibilities**:
  - Handle incoming `aisendmessage` RPC calls
  - Route messages to the model's `sendMessage` method
**3. React Components**
- **WaveAi**: Main container component
- **ChatWindow**: Scrollable message display with auto-scroll behavior
- **ChatItem**: Individual message renderer with role-based styling
- **ChatInput**: Auto-resizing textarea with keyboard navigation
#### State Management (Jotai Atoms)
**Message State**:
```typescript
messagesAtom: PrimitiveAtom<Array<ChatMessageType>>
messagesSplitAtom: SplitAtom<Array<ChatMessageType>>
latestMessageAtom: Atom<ChatMessageType>
addMessageAtom: WritableAtom<unknown, [message: ChatMessageType], void>
updateLastMessageAtom: WritableAtom<unknown, [text: string, isUpdating: boolean], void>
removeLastMessageAtom: WritableAtom<unknown, [], void>
```
**Configuration State**:
```typescript
presetKey: Atom<string> // Current AI preset selection
presetMap: Atom<{[k: string]: MetaType}> // Available AI presets
mergedPresets: Atom<MetaType> // Merged configuration hierarchy
aiOpts: Atom<WaveAIOptsType> // Final AI options for requests
```
**UI State**:
```typescript
locked: PrimitiveAtom<boolean> // Prevents input during AI response
viewIcon: Atom<string> // Header icon
viewName: Atom<string> // Header title
viewText: Atom<HeaderElem[]> // Dynamic header elements
endIconButtons: Atom<IconButtonDecl[]> // Header action buttons
```
#### Configuration Hierarchy
The AI configuration follows a three-tier hierarchy (lowest to highest priority):
1. **Global Settings**: `atoms.settingsAtom["ai:*"]`
2. **Preset Configuration**: `presets[presetKey]["ai:*"]`
3. **Block Metadata**: `block.meta["ai:*"]`
Configuration is merged using the `mergeMeta()` utility, allowing fine-grained overrides at each level.
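For illustration, a minimal Go sketch of the same key-level override semantics (the actual merge is done by the frontend's `mergeMeta()`; the function and variable names here are hypothetical):
```go
// mergeAIMeta applies layers lowest-priority first; later layers win per key.
// A sketch of the override semantics only, not the actual mergeMeta() code.
func mergeAIMeta(layers ...map[string]any) map[string]any {
    merged := make(map[string]any)
    for _, layer := range layers {
        for k, v := range layer {
            merged[k] = v // higher-priority layer overwrites the key
        }
    }
    return merged
}

// Usage: opts := mergeAIMeta(globalSettings, presets[presetKey], block.Meta)
```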
#### Data Flow - Frontend
```
User Input → sendMessage() →
├── Add user message to UI
├── Create WaveAIStreamRequest
├── Call RpcApi.StreamWaveAiCommand()
├── Add typing indicator
└── Stream response handling:
    ├── Update message incrementally
    ├── Handle errors
    └── Save complete conversation
```
### Backend Architecture (`pkg/waveai/`)
#### Core Interface
**AIBackend Interface**:
```go
type AIBackend interface {
    StreamCompletion(
        ctx context.Context,
        request wshrpc.WaveAIStreamRequest,
    ) chan wshrpc.RespOrErrorUnion[wshrpc.WaveAIPacketType]
}
```
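Every backend follows the same contract: return a channel immediately, stream packets from a goroutine, and close the channel when finished. A minimal hypothetical implementation (the struct name, the `"text"` packet type string, and the `Response` field on `RespOrErrorUnion` are assumptions for illustration):
```go
type staticBackend struct{} // hypothetical backend emitting one fixed packet

func (staticBackend) StreamCompletion(
    ctx context.Context,
    request wshrpc.WaveAIStreamRequest,
) chan wshrpc.RespOrErrorUnion[wshrpc.WaveAIPacketType] {
    rtn := make(chan wshrpc.RespOrErrorUnion[wshrpc.WaveAIPacketType])
    go func() {
        defer close(rtn) // signals end-of-stream to the consumer
        pkt := wshrpc.WaveAIPacketType{Type: "text", Text: "hello"}
        select {
        case rtn <- wshrpc.RespOrErrorUnion[wshrpc.WaveAIPacketType]{Response: pkt}:
        case <-ctx.Done(): // caller cancelled before the send completed
        }
    }()
    return rtn
}
```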
#### Backend Implementations
**1. OpenAIBackend** (`openaibackend.go`)
- **Providers**: OpenAI, Azure OpenAI, Cloudflare Azure
- **Features**:
  - Reasoning model support (o1, o3, o4, gpt-5)
  - Proxy support
  - Multiple API types (OpenAI, Azure, AzureAD, CloudflareAzure)
- **Streaming**: Uses `go-openai` library for SSE streaming
**2. AnthropicBackend** (`anthropicbackend.go`)
- **Provider**: Anthropic Claude
- **Features**:
  - Custom SSE parser for Anthropic's event format (generic pattern sketched below)
  - System message handling
  - Usage token tracking
- **Events**: `message_start`, `content_block_delta`, `message_stop`, etc.
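Anthropic's stream arrives as server-sent events, i.e. `event:`/`data:` line pairs separated by blank lines. A generic sketch of such a parsing loop (not the actual Anthropic parser, which also decodes each event's JSON payload; simplified to single-line `data:` fields):
```go
// parseSSE reads "event:"/"data:" pairs and calls handle on each
// blank-line-terminated event. Stdlib imports assumed: bufio, io, strings.
func parseSSE(r io.Reader, handle func(event, data string)) error {
    scanner := bufio.NewScanner(r)
    var event, data string
    for scanner.Scan() {
        line := scanner.Text()
        switch {
        case strings.HasPrefix(line, "event:"):
            event = strings.TrimSpace(strings.TrimPrefix(line, "event:"))
        case strings.HasPrefix(line, "data:"):
            data = strings.TrimSpace(strings.TrimPrefix(line, "data:"))
        case line == "" && event != "":
            handle(event, data) // e.g. "content_block_delta" with a JSON body
            event, data = "", ""
        }
    }
    return scanner.Err()
}
```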
**3. WaveAICloudBackend** (`cloudbackend.go`)
- **Provider**: Wave's cloud proxy service
- **Transport**: WebSocket connection to Wave cloud
- **Features**:
  - Fallback when no API token or base URL is provided
  - Built-in rate limiting and abuse protection
**4. PerplexityBackend** (`perplexitybackend.go`)
- **Provider**: Perplexity AI
- **Implementation**: Similar to OpenAI backend
**5. GoogleBackend** (`googlebackend.go`)
- **Provider**: Google AI (Gemini)
- **Implementation**: Custom integration for Google's API
#### Backend Routing Logic
```go
func RunAICommand(ctx context.Context, request wshrpc.WaveAIStreamRequest) chan wshrpc.RespOrErrorUnion[wshrpc.WaveAIPacketType] {
    // Route based on request.Opts.APIType:
    var backend AIBackend
    switch request.Opts.APIType {
    case "anthropic":
        backend = AnthropicBackend{}
    case "perplexity":
        backend = PerplexityBackend{}
    case "google":
        backend = GoogleBackend{}
    default:
        if IsCloudAIRequest(request.Opts) {
            backend = WaveAICloudBackend{}
        } else {
            backend = OpenAIBackend{}
        }
    }
    return backend.StreamCompletion(ctx, request)
}
```
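Callers consume the stream by ranging over the returned channel until it closes. A sketch of accumulating a full response (the function is hypothetical; it assumes `RespOrErrorUnion` exposes `Response` and `Error` fields):
```go
// collectResponse drains the stream, returning the concatenated text.
// On a streamed error it returns the partial text plus the error.
func collectResponse(ctx context.Context, req wshrpc.WaveAIStreamRequest) (string, error) {
    var sb strings.Builder
    for union := range RunAICommand(ctx, req) {
        if union.Error != nil {
            return sb.String(), union.Error
        }
        sb.WriteString(union.Response.Text) // append each streamed delta
    }
    return sb.String(), nil
}
```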
### RPC Communication Layer
#### WSH RPC Integration
**Command**: `streamwaveai`
**Type**: Response Stream (one request, multiple responses)
**Request Type** (`WaveAIStreamRequest`):
```go
type WaveAIStreamRequest struct {
    ClientId string                    `json:"clientid,omitempty"`
    Opts     *WaveAIOptsType           `json:"opts"`
    Prompt   []WaveAIPromptMessageType `json:"prompt"`
}
```
**Response Type** (`WaveAIPacketType`):
```go
type WaveAIPacketType struct {
    Type         string           `json:"type"`
    Model        string           `json:"model,omitempty"`
    Created      int64            `json:"created,omitempty"`
    FinishReason string           `json:"finish_reason,omitempty"`
    Usage        *WaveAIUsageType `json:"usage,omitempty"`
    Index        int              `json:"index,omitempty"`
    Text         string           `json:"text,omitempty"`
    Error        string           `json:"error,omitempty"`
}
```
#### Configuration Types
**AI Options** (`WaveAIOptsType`):
```go
type WaveAIOptsType struct {
    Model      string `json:"model"`
    APIType    string `json:"apitype,omitempty"`
    APIToken   string `json:"apitoken"`
    OrgID      string `json:"orgid,omitempty"`
    APIVersion string `json:"apiversion,omitempty"`
    BaseURL    string `json:"baseurl,omitempty"`
    ProxyURL   string `json:"proxyurl,omitempty"`
    MaxTokens  int    `json:"maxtokens,omitempty"`
    MaxChoices int    `json:"maxchoices,omitempty"`
    TimeoutMs  int    `json:"timeoutms,omitempty"`
}
```
### Data Persistence
#### Chat History Storage
**Frontend**:
- **Method**: `fetchWaveFile(blockId, "aidata")`
- **Format**: JSON array of `WaveAIPromptMessageType`
- **Sliding Window**: Last 30 messages (`slidingWindowSize = 30`; trim sketched below)
**Backend**:
- **Service**: `BlockService.SaveWaveAiData(blockId, history)`
- **Storage**: Block-associated file storage
- **Persistence**: Automatic save after each complete exchange
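A sketch of the sliding-window trim applied before saving (the helper name is hypothetical; the `slidingWindowSize = 30` constant comes from the frontend):
```go
const slidingWindowSize = 30

// trimHistory keeps only the most recent messages before the conversation
// is persisted via BlockService.SaveWaveAiData.
func trimHistory(history []wshrpc.WaveAIPromptMessageType) []wshrpc.WaveAIPromptMessageType {
    if len(history) <= slidingWindowSize {
        return history
    }
    return history[len(history)-slidingWindowSize:]
}
```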
#### Message Format
**UI Messages** (`ChatMessageType`):
```typescript
interface ChatMessageType {
    id: string;
    user: string; // "user" | "assistant" | "error"
    text: string;
    isUpdating?: boolean;
}
```
**Stored Messages** (`WaveAIPromptMessageType`):
```go
type WaveAIPromptMessageType struct {
    Role    string `json:"role"` // "user" | "assistant" | "system" | "error"
    Content string `json:"content"`
    Name    string `json:"name,omitempty"`
}
```
### Error Handling
#### Frontend Error Handling
1. **Network Errors**: Caught in streaming loop, displayed as error messages
2. **Empty Responses**: Automatically remove typing indicator
3. **Cancellation**: User can cancel via stop button (`model.cancel = true`)
4. **Partial Responses**: Saved even if incomplete due to errors
#### Backend Error Handling
1. **Panic Recovery**: All backends use `panichandler.PanicHandler()` (the general pattern is sketched after this list)
2. **Context Cancellation**: Proper cleanup on request cancellation
3. **Provider Errors**: Wrapped and forwarded to frontend
4. **Connection Errors**: Detailed error messages for debugging
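The recovery-plus-cancellation pattern, sketched with a plain `recover()` (the real backends delegate to `panichandler.PanicHandler()`; the function shape and the `next` parameter here are illustrative):
```go
// streamWithRecovery converts panics into error packets and stops cleanly on
// context cancellation. Deferred funcs run LIFO, so the recover handler
// fires before the channel is closed. next reads one chunk from the
// provider (hypothetical), returning false when the provider stream ends.
func streamWithRecovery(
    ctx context.Context,
    rtn chan wshrpc.RespOrErrorUnion[wshrpc.WaveAIPacketType],
    next func() (wshrpc.WaveAIPacketType, bool),
) {
    defer close(rtn)
    defer func() {
        if r := recover(); r != nil {
            rtn <- wshrpc.RespOrErrorUnion[wshrpc.WaveAIPacketType]{
                Error: fmt.Errorf("panic in AI backend: %v", r),
            }
        }
    }()
    for {
        pkt, ok := next()
        if !ok {
            return // provider stream finished
        }
        select {
        case rtn <- wshrpc.RespOrErrorUnion[wshrpc.WaveAIPacketType]{Response: pkt}:
        case <-ctx.Done():
            return // request cancelled: stop streaming and clean up
        }
    }
}
```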
### UI Features
#### Message Rendering
- **Markdown Support**: Full markdown rendering with syntax highlighting
- **Role-based Styling**: Different colors/layouts for user/assistant/error messages
- **Typing Indicator**: Animated dots during AI response
- **Font Configuration**: Configurable font sizes via presets
#### Input Handling
- **Auto-resize**: Textarea grows/shrinks with content (max 5 lines)
- **Keyboard Navigation**:
  - Enter to send
  - Cmd+L to clear history
  - Arrow keys for code block selection
- **Code Block Selection**: Navigate through code blocks in responses
#### Scroll Management
- **Auto-scroll**: Automatically scrolls to new messages
- **User Scroll Detection**: Pauses auto-scroll when user manually scrolls
- **Smart Resume**: Resumes auto-scroll when near bottom
### Configuration Management
#### Preset System
**Preset Structure**:
```json
{
    "ai@preset-name": {
        "display:name": "Preset Display Name",
        "display:order": 1,
        "ai:model": "gpt-4",
        "ai:apitype": "openai",
        "ai:apitoken": "sk-...",
        "ai:baseurl": "https://api.openai.com/v1",
        "ai:maxtokens": 4000,
        "ai:fontsize": "14px",
        "ai:fixedfontsize": "12px"
    }
}
```
**Configuration Keys** (these map onto `WaveAIOptsType`, as sketched after this list):
- `ai:model` - AI model name
- `ai:apitype` - Provider type (openai, anthropic, perplexity, google)
- `ai:apitoken` - API authentication token
- `ai:baseurl` - Custom API endpoint
- `ai:proxyurl` - HTTP proxy URL
- `ai:maxtokens` - Maximum response tokens
- `ai:timeoutms` - Request timeout
- `ai:fontsize` - UI font size
- `ai:fixedfontsize` - Code block font size
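An illustrative sketch of how the merged `ai:*` keys could populate `WaveAIOptsType` (the function and the `map[string]any` input are assumptions; JSON numbers decode as `float64` in Go):
```go
func optsFromMeta(meta map[string]any) *WaveAIOptsType {
    str := func(k string) string { s, _ := meta[k].(string); return s }
    num := func(k string) int { f, _ := meta[k].(float64); return int(f) }
    return &WaveAIOptsType{
        Model:     str("ai:model"),
        APIType:   str("ai:apitype"),
        APIToken:  str("ai:apitoken"),
        BaseURL:   str("ai:baseurl"),
        ProxyURL:  str("ai:proxyurl"),
        MaxTokens: num("ai:maxtokens"),
        TimeoutMs: num("ai:timeoutms"),
    }
}
```
Note that `ai:fontsize` and `ai:fixedfontsize` are UI-only keys consumed by the frontend rather than sent to providers.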
#### Provider Detection
The UI automatically detects and displays the active provider:
- **Cloud**: Wave's proxy (no token/baseURL)
- **Local**: localhost/127.0.0.1 endpoints
- **Remote**: External API endpoints
- **Provider-specific**: Anthropic, Perplexity with custom icons
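The classification reduces to a few checks on the merged options. An illustrative sketch (the actual logic lives in the frontend's `viewText` atom; this Go version is hypothetical):
```go
func detectProvider(opts *WaveAIOptsType) string {
    switch {
    case opts.APIType == "anthropic" || opts.APIType == "perplexity":
        return opts.APIType // provider-specific icon
    case opts.APIToken == "" && opts.BaseURL == "":
        return "cloud" // falls back to Wave's proxy
    case strings.Contains(opts.BaseURL, "localhost") || strings.Contains(opts.BaseURL, "127.0.0.1"):
        return "local"
    default:
        return "remote"
    }
}
```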
### Performance Considerations
#### Frontend Optimizations
- **Jotai Atoms**: Granular reactivity, only re-render affected components
- **Memo Components**: `ChatWindow` and `ChatItem` are memoized
- **Throttled Scrolling**: Scroll events throttled to 100ms
- **Debounced Scroll Detection**: User scroll detection debounced to 300ms
#### Backend Optimizations
- **Streaming**: All responses are streamed for immediate feedback
- **Context Cancellation**: Proper cleanup prevents resource leaks
- **Connection Pooling**: HTTP clients reuse connections
- **Error Recovery**: Graceful degradation on provider failures
### Security Considerations
#### API Token Handling
- **Storage**: Tokens stored in encrypted configuration
- **Transmission**: Tokens only sent to configured endpoints
- **Validation**: Backend validates token format and permissions
#### Request Validation
- **Input Sanitization**: User input validated before sending
- **Rate Limiting**: Cloud backend includes built-in rate limiting
- **Error Filtering**: Sensitive error details filtered from UI
### Extension Points
#### Adding New Providers
1. **Implement AIBackend Interface**: Create new backend struct
2. **Add Provider Detection**: Update `RunAICommand()` routing logic (see the snippet after this list)
3. **Add Configuration**: Define provider-specific config keys
4. **Update UI**: Add provider detection in `viewText` atom
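Step 1 can follow the `staticBackend` skeleton sketched after the `AIBackend` interface above; step 2 is then a small addition to `RunAICommand()`'s switch (the `APIType` value and struct name below are hypothetical):
```go
// Inside RunAICommand's switch statement:
case "myprovider":
    backend = MyProviderBackend{} // must implement AIBackend
```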
#### Custom Message Types
1. **Extend ChatMessageType**: Add new user types
2. **Update ChatItem Rendering**: Handle new message types
3. **Modify Storage**: Update persistence format if needed
This architecture provides a flexible, extensible foundation for AI chat functionality while maintaining clean separation between UI, business logic, and provider integrations.