claude-code/plugins/plugin-dev/skills/command-development/references/testing-strategies.md
Daisy S. Hollman 387dc35db7
feat: Add plugin-dev toolkit for comprehensive plugin development
Adds the plugin-dev plugin to public marketplace. A comprehensive toolkit for
developing Claude Code plugins with 7 expert skills, 3 AI-assisted agents, and
extensive documentation covering the complete plugin development lifecycle.

Key features:
- 7 skills: hook-development, mcp-integration, plugin-structure, plugin-settings,
  command-development, agent-development, skill-development
- 3 agents: agent-creator (AI-assisted generation), plugin-validator (structure
  validation), skill-reviewer (quality review)
- 1 command: /plugin-dev:create-plugin (guided 8-phase workflow)
- 10 utility scripts for validation and testing
- 21 reference docs with deep-dive guidance (~11k words)
- 9 working examples demonstrating best practices

Changes for public release:
- Replaced all references to internal repositories with "Claude Code"
- Updated MCP examples: internal.company.com → api.example.com
- Updated token variables: ${INTERNAL_TOKEN} → ${API_TOKEN}
- Reframed agent-creation-system-prompt as "proven in production"
- Preserved all ${CLAUDE_PLUGIN_ROOT} references (186 total)
- Preserved valuable test blocks in core modules

Validation:
- All 3 agents validated successfully with validate-agent.sh
- All JSON files validated with jq
- Zero internal references remaining
- 59 files migrated, 21,971 lines added

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 04:09:00 -08:00

14 KiB

Command Testing Strategies

Comprehensive strategies for testing slash commands before deployment and distribution.

Overview

Testing commands ensures they work correctly, handle edge cases, and provide good user experience. A systematic testing approach catches issues early and builds confidence in command reliability.

Testing Levels

Level 1: Syntax and Structure Validation

What to test:

  • YAML frontmatter syntax
  • Markdown format
  • File location and naming

How to test:

# Validate YAML frontmatter
head -n 20 .claude/commands/my-command.md | grep -A 10 "^---"

# Check for closing frontmatter marker
head -n 20 .claude/commands/my-command.md | grep -c "^---" # Should be 2

# Verify file has .md extension
ls .claude/commands/*.md

# Check file is in correct location
test -f .claude/commands/my-command.md && echo "Found" || echo "Missing"

Automated validation script:

#!/bin/bash
# validate-command.sh

COMMAND_FILE="$1"

if [ ! -f "$COMMAND_FILE" ]; then
  echo "ERROR: File not found: $COMMAND_FILE"
  exit 1
fi

# Check .md extension
if [[ ! "$COMMAND_FILE" =~ \.md$ ]]; then
  echo "ERROR: File must have .md extension"
  exit 1
fi

# Validate YAML frontmatter if present
if head -n 1 "$COMMAND_FILE" | grep -q "^---"; then
  # Count frontmatter markers
  MARKERS=$(head -n 50 "$COMMAND_FILE" | grep -c "^---")
  if [ "$MARKERS" -ne 2 ]; then
    echo "ERROR: Invalid YAML frontmatter (need exactly 2 '---' markers)"
    exit 1
  fi
  echo "✓ YAML frontmatter syntax valid"
fi

# Check for empty file
if [ ! -s "$COMMAND_FILE" ]; then
  echo "ERROR: File is empty"
  exit 1
fi

echo "✓ Command file structure valid"

Level 2: Frontmatter Field Validation

What to test:

  • Field types correct
  • Values in valid ranges
  • Required fields present (if any)

Validation script:

#!/bin/bash
# validate-frontmatter.sh

COMMAND_FILE="$1"

# Extract YAML frontmatter
FRONTMATTER=$(sed -n '/^---$/,/^---$/p' "$COMMAND_FILE" | sed '1d;$d')

if [ -z "$FRONTMATTER" ]; then
  echo "No frontmatter to validate"
  exit 0
fi

# Check 'model' field if present
if echo "$FRONTMATTER" | grep -q "^model:"; then
  MODEL=$(echo "$FRONTMATTER" | grep "^model:" | cut -d: -f2 | tr -d ' ')
  if ! echo "sonnet opus haiku" | grep -qw "$MODEL"; then
    echo "ERROR: Invalid model '$MODEL' (must be sonnet, opus, or haiku)"
    exit 1
  fi
  echo "✓ Model field valid: $MODEL"
fi

# Check 'allowed-tools' field format
if echo "$FRONTMATTER" | grep -q "^allowed-tools:"; then
  echo "✓ allowed-tools field present"
  # Could add more sophisticated validation here
fi

# Check 'description' length
if echo "$FRONTMATTER" | grep -q "^description:"; then
  DESC=$(echo "$FRONTMATTER" | grep "^description:" | cut -d: -f2-)
  LENGTH=${#DESC}
  if [ "$LENGTH" -gt 80 ]; then
    echo "WARNING: Description length $LENGTH (recommend < 60 chars)"
  else
    echo "✓ Description length acceptable: $LENGTH chars"
  fi
fi

echo "✓ Frontmatter fields valid"

Level 3: Manual Command Invocation

What to test:

  • Command appears in /help
  • Command executes without errors
  • Output is as expected

Test procedure:

# 1. Start Claude Code
claude --debug

# 2. Check command appears in help
> /help
# Look for your command in the list

# 3. Invoke command without arguments
> /my-command
# Check for reasonable error or behavior

# 4. Invoke with valid arguments
> /my-command arg1 arg2
# Verify expected behavior

# 5. Check debug logs
tail -f ~/.claude/debug-logs/latest
# Look for errors or warnings

Level 4: Argument Testing

What to test:

  • Positional arguments work ($1, $2, etc.)
  • $ARGUMENTS captures all arguments
  • Missing arguments handled gracefully
  • Invalid arguments detected

Test matrix:

Test Case Command Expected Result
No args /cmd Graceful handling or useful message
One arg /cmd arg1 $1 substituted correctly
Two args /cmd arg1 arg2 $1 and $2 substituted
Extra args /cmd a b c d All captured or extras ignored appropriately
Special chars /cmd "arg with spaces" Quotes handled correctly
Empty arg /cmd "" Empty string handled

Test script:

#!/bin/bash
# test-command-arguments.sh

COMMAND="$1"

echo "Testing argument handling for /$COMMAND"
echo

echo "Test 1: No arguments"
echo "  Command: /$COMMAND"
echo "  Expected: [describe expected behavior]"
echo "  Manual test required"
echo

echo "Test 2: Single argument"
echo "  Command: /$COMMAND test-value"
echo "  Expected: 'test-value' appears in output"
echo "  Manual test required"
echo

echo "Test 3: Multiple arguments"
echo "  Command: /$COMMAND arg1 arg2 arg3"
echo "  Expected: All arguments used appropriately"
echo "  Manual test required"
echo

echo "Test 4: Special characters"
echo "  Command: /$COMMAND \"value with spaces\""
echo "  Expected: Entire phrase captured"
echo "  Manual test required"

Level 5: File Reference Testing

What to test:

  • @ syntax loads file contents
  • Non-existent files handled
  • Large files handled appropriately
  • Multiple file references work

Test procedure:

# Create test files
echo "Test content" > /tmp/test-file.txt
echo "Second file" > /tmp/test-file-2.txt

# Test single file reference
> /my-command /tmp/test-file.txt
# Verify file content is read

# Test non-existent file
> /my-command /tmp/nonexistent.txt
# Verify graceful error handling

# Test multiple files
> /my-command /tmp/test-file.txt /tmp/test-file-2.txt
# Verify both files processed

# Test large file
dd if=/dev/zero of=/tmp/large-file.bin bs=1M count=100
> /my-command /tmp/large-file.bin
# Verify reasonable behavior (may truncate or warn)

# Cleanup
rm /tmp/test-file*.txt /tmp/large-file.bin

Level 6: Bash Execution Testing

What to test:

  • !` commands execute correctly
  • Command output included in prompt
  • Command failures handled
  • Security: only allowed commands run

Test procedure:

# Create test command with bash execution
cat > .claude/commands/test-bash.md << 'EOF'
---
description: Test bash execution
allowed-tools: Bash(echo:*), Bash(date:*)
---

Current date: !`date`
Test output: !`echo "Hello from bash"`

Analysis of output above...
EOF

# Test in Claude Code
> /test-bash
# Verify:
# 1. Date appears correctly
# 2. Echo output appears
# 3. No errors in debug logs

# Test with disallowed command (should fail or be blocked)
cat > .claude/commands/test-forbidden.md << 'EOF'
---
description: Test forbidden command
allowed-tools: Bash(echo:*)
---

Trying forbidden: !`ls -la /`
EOF

> /test-forbidden
# Verify: Permission denied or appropriate error

Level 7: Integration Testing

What to test:

  • Commands work with other plugin components
  • Commands interact correctly with each other
  • State management works across invocations
  • Workflow commands execute in sequence

Test scenarios:

Scenario 1: Command + Hook Integration

# Setup: Command that triggers a hook
# Test: Invoke command, verify hook executes

# Command: .claude/commands/risky-operation.md
# Hook: PreToolUse that validates the operation

> /risky-operation
# Verify: Hook executes and validates before command completes

Scenario 2: Command Sequence

# Setup: Multi-command workflow
> /workflow-init
# Verify: State file created

> /workflow-step2
# Verify: State file read, step 2 executes

> /workflow-complete
# Verify: State file cleaned up

Scenario 3: Command + MCP Integration

# Setup: Command uses MCP tools
# Test: Verify MCP server accessible

> /mcp-command
# Verify:
# 1. MCP server starts (if stdio)
# 2. Tool calls succeed
# 3. Results included in output

Automated Testing Approaches

Command Test Suite

Create a test suite script:

#!/bin/bash
# test-commands.sh - Command test suite

TEST_DIR=".claude/commands"
FAILED_TESTS=0

echo "Command Test Suite"
echo "=================="
echo

for cmd_file in "$TEST_DIR"/*.md; do
  cmd_name=$(basename "$cmd_file" .md)
  echo "Testing: $cmd_name"

  # Validate structure
  if ./validate-command.sh "$cmd_file"; then
    echo "  ✓ Structure valid"
  else
    echo "  ✗ Structure invalid"
    ((FAILED_TESTS++))
  fi

  # Validate frontmatter
  if ./validate-frontmatter.sh "$cmd_file"; then
    echo "  ✓ Frontmatter valid"
  else
    echo "  ✗ Frontmatter invalid"
    ((FAILED_TESTS++))
  fi

  echo
done

echo "=================="
echo "Tests complete"
echo "Failed: $FAILED_TESTS"

exit $FAILED_TESTS

Pre-Commit Hook

Validate commands before committing:

#!/bin/bash
# .git/hooks/pre-commit

echo "Validating commands..."

COMMANDS_CHANGED=$(git diff --cached --name-only | grep "\.claude/commands/.*\.md")

if [ -z "$COMMANDS_CHANGED" ]; then
  echo "No commands changed"
  exit 0
fi

for cmd in $COMMANDS_CHANGED; do
  echo "Checking: $cmd"

  if ! ./scripts/validate-command.sh "$cmd"; then
    echo "ERROR: Command validation failed: $cmd"
    exit 1
  fi
done

echo "✓ All commands valid"

Continuous Testing

Test commands in CI/CD:

# .github/workflows/test-commands.yml
name: Test Commands

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      - name: Validate command structure
        run: |
          for cmd in .claude/commands/*.md; do
            echo "Testing: $cmd"
            ./scripts/validate-command.sh "$cmd"
          done

      - name: Validate frontmatter
        run: |
          for cmd in .claude/commands/*.md; do
            ./scripts/validate-frontmatter.sh "$cmd"
          done

      - name: Check for TODOs
        run: |
          if grep -r "TODO" .claude/commands/; then
            echo "ERROR: TODOs found in commands"
            exit 1
          fi

Edge Case Testing

Test Edge Cases

Empty arguments:

> /cmd ""
> /cmd '' ''

Special characters:

> /cmd "arg with spaces"
> /cmd arg-with-dashes
> /cmd arg_with_underscores
> /cmd arg/with/slashes
> /cmd 'arg with "quotes"'

Long arguments:

> /cmd $(python -c "print('a' * 10000)")

Unusual file paths:

> /cmd ./file
> /cmd ../file
> /cmd ~/file
> /cmd "/path with spaces/file"

Bash command edge cases:

# Commands that might fail
!`exit 1`
!`false`
!`command-that-does-not-exist`

# Commands with special output
!`echo ""`
!`cat /dev/null`
!`yes | head -n 1000000`

Performance Testing

Response Time Testing

#!/bin/bash
# test-command-performance.sh

COMMAND="$1"

echo "Testing performance of /$COMMAND"
echo

for i in {1..5}; do
  echo "Run $i:"
  START=$(date +%s%N)

  # Invoke command (manual step - record time)
  echo "  Invoke: /$COMMAND"
  echo "  Start time: $START"
  echo "  (Record end time manually)"
  echo
done

echo "Analyze results:"
echo "  - Average response time"
echo "  - Variance"
echo "  - Acceptable threshold: < 3 seconds for fast commands"

Resource Usage Testing

# Monitor Claude Code during command execution
# In terminal 1:
claude --debug

# In terminal 2:
watch -n 1 'ps aux | grep claude'

# Execute command and observe:
# - Memory usage
# - CPU usage
# - Process count

User Experience Testing

Usability Checklist

  • Command name is intuitive
  • Description is clear in /help
  • Arguments are well-documented
  • Error messages are helpful
  • Output is formatted readably
  • Long-running commands show progress
  • Results are actionable
  • Edge cases have good UX

User Acceptance Testing

Recruit testers:

# Testing Guide for Beta Testers

## Command: /my-new-command

### Test Scenarios

1. **Basic usage:**
   - Run: `/my-new-command`
   - Expected: [describe]
   - Rate clarity: 1-5

2. **With arguments:**
   - Run: `/my-new-command arg1 arg2`
   - Expected: [describe]
   - Rate usefulness: 1-5

3. **Error case:**
   - Run: `/my-new-command invalid-input`
   - Expected: Helpful error message
   - Rate error message: 1-5

### Feedback Questions

1. Was the command easy to understand?
2. Did the output meet your expectations?
3. What would you change?
4. Would you use this command regularly?

Testing Checklist

Before releasing a command:

Structure

  • File in correct location
  • Correct .md extension
  • Valid YAML frontmatter (if present)
  • Markdown syntax correct

Functionality

  • Command appears in /help
  • Description is clear
  • Command executes without errors
  • Arguments work as expected
  • File references work
  • Bash execution works (if used)

Edge Cases

  • Missing arguments handled
  • Invalid arguments detected
  • Non-existent files handled
  • Special characters work
  • Long inputs handled

Integration

  • Works with other commands
  • Works with hooks (if applicable)
  • Works with MCP (if applicable)
  • State management works

Quality

  • Performance acceptable
  • No security issues
  • Error messages helpful
  • Output formatted well
  • Documentation complete

Distribution

  • Tested by others
  • Feedback incorporated
  • README updated
  • Examples provided

Debugging Failed Tests

Common Issues and Solutions

Issue: Command not appearing in /help

# Check file location
ls -la .claude/commands/my-command.md

# Check permissions
chmod 644 .claude/commands/my-command.md

# Check syntax
head -n 20 .claude/commands/my-command.md

# Restart Claude Code
claude --debug

Issue: Arguments not substituting

# Verify syntax
grep '\$1' .claude/commands/my-command.md
grep '\$ARGUMENTS' .claude/commands/my-command.md

# Test with simple command first
echo "Test: \$1 and \$2" > .claude/commands/test-args.md

Issue: Bash commands not executing

# Check allowed-tools
grep "allowed-tools" .claude/commands/my-command.md

# Verify command syntax
grep '!\`' .claude/commands/my-command.md

# Test command manually
date
echo "test"

Issue: File references not working

# Check @ syntax
grep '@' .claude/commands/my-command.md

# Verify file exists
ls -la /path/to/referenced/file

# Check permissions
chmod 644 /path/to/referenced/file

Best Practices

  1. Test early, test often: Validate as you develop
  2. Automate validation: Use scripts for repeatable checks
  3. Test edge cases: Don't just test the happy path
  4. Get feedback: Have others test before wide release
  5. Document tests: Keep test scenarios for regression testing
  6. Monitor in production: Watch for issues after release
  7. Iterate: Improve based on real usage data