ubicloud/spike/postgres-fuzz-tests/README.md
2025-07-07 23:39:15 +05:30

432 lines
14 KiB
Markdown

# PostgreSQL Fuzz Testing Framework for Ubicloud
A comprehensive testing framework for PostgreSQL databases on Ubicloud, designed to validate database operations, high availability, replication, scaling, maintenance operations, and data integrity.
## Features
- **Comprehensive Testing**: Tests database creation, connectivity, data operations, and cleanup
- **High Availability Validation**: Tests synchronous and asynchronous replication
- **Scaling Operations**: Tests database scaling up/down for compute and storage
- **HA Type Changes**: Tests changing between none, async, and sync HA types
- **Maintenance Operations**: Tests maintenance windows, restarts, and configuration updates
- **Security Testing**: Tests password resets and firewall rule management
- **Backup & Restore**: Tests database backup and restore functionality
- **Read Replica Testing**: Creates and validates read replicas with failover scenarios
- **Data Integrity Checks**: Validates data consistency using checksums and row counts
- **Performance Metrics**: Collects and analyzes database performance metrics
- **Stress Testing**: Simulates high-load conditions and concurrent operations
- **Chaos Engineering**: Simulates failures and validates recovery
- **Parallel Execution**: Supports running multiple test scenarios concurrently
- **Detailed Reporting**: Generates comprehensive JSON reports with test results
## Prerequisites
- Python 3.8 or higher
- Access to Ubicloud API with valid credentials
- PostgreSQL client libraries
## Installation
1. Install Python dependencies:
```bash
pip install -r requirements.txt
```
2. Ensure you have valid Ubicloud API credentials in a configuration file (see Configuration section).
## Configuration
Create a configuration file (e.g., `.local-api-credentials`) with your Ubicloud API credentials:
```
BASE_URL=http://api.localhost:3000
BEARER_AUTH_TOKEN=your-bearer-token-here
TEST_PROJECT=your-project-id
TEST_LOCATION=eu-central-h1
```
## Usage
### Basic Usage
Run the fuzz tester with a simple test scenario:
```bash
python postgres_fuzz_tester.py --config .local-api-credentials --scenarios sample_test.yaml
```
### Advanced Usage
Run with custom output file and verbose logging:
```bash
python postgres_fuzz_tester.py \
--config .local-api-credentials \
--scenarios test_scenarios.yaml \
--output my_test_report.json \
--verbose
```
Run scenarios in parallel:
```bash
python postgres_fuzz_tester.py \
--config .local-api-credentials \
--scenarios test_scenarios.yaml \
--parallel 3
```
### Command Line Options
- `--config, -c`: Path to configuration file (required)
- `--scenarios, -s`: Path to test scenarios file (required)
- `--output, -o`: Output file for test report (optional)
- `--parallel, -p`: Number of parallel test scenarios (default: 1)
- `--verbose, -v`: Enable verbose logging (optional)
## Available Test Actions
The fuzz tester supports the following test actions:
### Database Lifecycle
- **create_database**: Create a new PostgreSQL database with specified configuration
- **validate_connection**: Test database connectivity and basic operations
- **cleanup**: Clean up created databases and resources
### Data Operations
- **write_data**: Insert test data into multiple tables with integrity validation
- **validate_replication**: Check replication status and data consistency
### Scaling Operations
- **scale_up**: Scale database compute size and/or storage capacity
- **scale_down**: Scale down database compute size (storage cannot be reduced)
### High Availability Management
- **change_ha_type**: Change between none, async, and sync HA types
- **create_read_replica**: Create and validate read replicas
- **test_failover**: Test failover scenarios with data integrity checks
### Maintenance & Configuration
- **set_maintenance_window**: Configure maintenance window timing
- **restart_database**: Restart database and validate recovery
- **update_config**: Update PostgreSQL and PgBouncer configuration parameters
### Security Operations
- **reset_password**: Reset superuser password and validate new credentials
- **test_firewall_rules**: Create and test firewall rules for database access
### Backup & Recovery
- **test_backup_restore**: Test database backup and restore with data integrity validation
### Performance Testing
- **collect_metrics**: Collect and analyze database performance metrics
- **stress_test**: Run stress tests with concurrent operations for specified duration
## Test Scenarios
Test scenarios are defined in YAML files. Each scenario contains a series of steps that test different aspects of PostgreSQL functionality.
### Sample Scenario - Basic Operations
```yaml
scenarios:
- name: "basic_database_test"
description: "Test basic database operations"
steps:
- action: "create_database"
params:
size: "standard-2"
storage_size: 64
ha_type: "none"
version: "17"
flavor: "standard"
- action: "validate_connection"
- action: "write_data"
params:
table_count: 3
rows_per_table: 100
- action: "collect_metrics"
- action: "cleanup"
```
### Sample Scenario - Scaling Operations
```yaml
scenarios:
- name: "database_scaling_test"
description: "Test database scaling operations"
steps:
- action: "create_database"
params:
size: "standard-2"
storage_size: 64
ha_type: "none"
- action: "validate_connection"
- action: "scale_up"
params:
new_size: "standard-4"
new_storage_size: 128
- action: "validate_connection"
- action: "scale_down"
params:
new_size: "standard-2"
- action: "cleanup"
```
### Sample Scenario - HA Type Changes
```yaml
scenarios:
- name: "ha_type_change_test"
description: "Test changing HA types"
steps:
- action: "create_database"
params:
size: "standard-2"
storage_size: 64
ha_type: "none"
- action: "validate_connection"
- action: "change_ha_type"
params:
new_ha_type: "async"
- action: "validate_replication"
- action: "change_ha_type"
params:
new_ha_type: "sync"
- action: "validate_replication"
- action: "cleanup"
```
## Test Scenarios Included
### 1. Basic Database Operations
- Tests database creation, connection, and basic data operations
- Validates CRUD operations and data integrity
### 2. Database Scaling Test
- Tests scaling up compute and storage resources
- Tests scaling down compute resources
- Validates connectivity and operations after scaling
### 3. HA Type Change Test
- Tests changing from none → async → sync HA types
- Validates replication functionality after each change
### 4. Maintenance & Configuration Test
- Tests setting maintenance windows
- Tests configuration updates for PostgreSQL and PgBouncer
- Tests database restarts and recovery
### 5. Security & Firewall Test
- Tests password reset functionality
- Tests firewall rule creation and management
- Validates security configurations
### 6. Backup & Restore Test
- Tests database backup and restore operations
- Validates data integrity after restore
- Tests point-in-time recovery
### 7. Comprehensive HA Test
- Tests complete high availability setup
- Tests read replica creation and failover
- Validates data consistency across replicas
### 8. Performance Stress Test
- Tests database under high load conditions
- Runs concurrent operations for specified duration
- Collects performance metrics during stress
### 9. Complete Lifecycle Test
- Tests entire database lifecycle with all operations
- Combines scaling, HA changes, configuration updates
- Comprehensive validation of all features
### 10. Flavor Testing
- Tests different PostgreSQL flavors (standard, paradedb, lantern)
- Validates flavor-specific functionality and scaling
### 11. Version Compatibility Test
- Tests different PostgreSQL versions (16, 17)
- Validates version-specific features and operations
### 12. Edge Case Testing
- Tests error conditions and recovery scenarios
- Validates system behavior under unusual conditions
## Action Parameters
### create_database
- `size`: VM size (standard-2, standard-4, standard-8)
- `storage_size`: Storage size in GiB
- `ha_type`: High availability type (none, async, sync)
- `version`: PostgreSQL version (16, 17)
- `flavor`: Database flavor (standard, paradedb, lantern)
### write_data
- `table_count`: Number of tables to create
- `rows_per_table`: Number of rows to insert per table
### scale_up
- `new_size`: Target VM size
- `new_storage_size`: Target storage size in GiB
### scale_down
- `new_size`: Target VM size (storage cannot be reduced)
### change_ha_type
- `new_ha_type`: Target HA type (none, async, sync)
### set_maintenance_window
- `maintenance_hour`: Hour of day for maintenance (0-23)
### stress_test
- `duration_minutes`: Duration of stress test in minutes
## Validation Mechanisms
### Data Integrity
- **Checksums**: MD5 checksums of table data to detect corruption
- **Row Counts**: Verification of expected vs actual row counts
- **Cross-Replica Consistency**: Validates data consistency across replicas
### Replication Validation
- **LSN Monitoring**: Tracks Log Sequence Numbers for replication lag
- **Standby Status**: Monitors replication connection status
- **Data Consistency**: Ensures data appears correctly on all replicas
- **HA Type Verification**: Confirms HA configuration changes
### Scaling Validation
- **Resource Verification**: Confirms compute and storage changes
- **Performance Impact**: Measures performance before/after scaling
- **Data Preservation**: Ensures no data loss during scaling
### Configuration Validation
- **Parameter Verification**: Confirms configuration changes applied
- **Restart Recovery**: Validates database recovery after restarts
- **Maintenance Window**: Confirms maintenance window settings
### Security Validation
- **Password Changes**: Validates new password functionality
- **Firewall Rules**: Tests network access with new rules
- **Connection Security**: Ensures secure connections maintained
## Output and Reporting
The framework generates detailed JSON reports containing:
- **Test Summary**: Overall success/failure rates and timing
- **Scenario Results**: Individual scenario outcomes and performance
- **Step Details**: Granular information about each test step
- **Error Information**: Detailed error messages and stack traces
- **Performance Metrics**: Database and replication performance data
- **Scaling Results**: Before/after resource configurations
- **Security Test Results**: Password and firewall validation outcomes
### Sample Report Structure
```json
{
"test_run_summary": {
"timestamp": "2025-01-07T22:30:00",
"total_scenarios": 15,
"successful_scenarios": 14,
"failed_scenarios": 1,
"total_duration": 2456.78
},
"scenarios": [
{
"name": "database_scaling_test",
"success": true,
"duration": 345.67,
"summary": {
"total_steps": 7,
"successful_steps": 7,
"failed_steps": 0,
"databases_created": 1
},
"steps": [
{
"name": "scale_up",
"success": true,
"duration": 120.45,
"message": "Database scaled from standard-2/64GB to standard-4/128GB"
}
]
}
]
}
```
## Database Cleanup
The framework automatically cleans up created databases after each scenario. However, if tests are interrupted, you may need to manually clean up resources through the Ubicloud console.
## Troubleshooting
### Common Issues
1. **Connection Timeouts**: Increase timeout values in the configuration
2. **API Rate Limits**: Add delays between API calls or reduce parallelism
3. **Database Creation Failures**: Check quotas and resource availability
4. **SSL Certificate Issues**: Verify SSL configuration for database connections
5. **Scaling Timeouts**: Allow more time for scaling operations (up to 15 minutes)
6. **HA Type Change Failures**: Ensure sufficient resources for HA configurations
### Debug Mode
Enable verbose logging to get detailed information about test execution:
```bash
python postgres_fuzz_tester.py --config .local-api-credentials --scenarios test_scenarios.yaml --verbose
```
## Performance Considerations
- **Scaling Operations**: Can take 5-15 minutes depending on size changes
- **HA Type Changes**: May take 10-20 minutes for sync/async configurations
- **Backup/Restore**: Duration depends on database size and data volume
- **Stress Tests**: Configurable duration, recommended 2-10 minutes for testing
## Extending the Framework
### Adding New Test Actions
1. Implement the test method in the `PostgresFuzzTester` class
2. Add the action handler in the `run_scenario` method
3. Update the documentation with the new action
### Custom Validation
Extend the `PostgreSQLClient` class to add custom validation methods for specific use cases.
### Additional Metrics
Add new metrics collection by extending the `test_performance_metrics` method.
## Security Considerations
- Store API credentials securely and never commit them to version control
- Use environment variables or secure credential management systems
- Limit API token permissions to only required operations
- Monitor test activities for unexpected resource usage
- Test firewall rules carefully to avoid blocking legitimate access
## Best Practices
1. **Start Small**: Begin with basic scenarios before running comprehensive tests
2. **Monitor Resources**: Watch for quota limits and resource consumption
3. **Cleanup Verification**: Ensure all test databases are properly cleaned up
4. **Parallel Limits**: Don't exceed API rate limits with too many parallel tests
5. **Test Isolation**: Each scenario should be independent and self-contained
## Contributing
1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Submit a pull request with detailed description
## License
This project is licensed under the same terms as the Ubicloud project.