Infrastructure Error Handling
This directory contains comprehensive error handling and recovery mechanisms for infrastructure operations.
Modules
error.rs
Defines comprehensive error types with detailed context and actionable suggestions.
Key Types:
- InfraError: Main error enum with variants for all failure scenarios
- InstallPhase: Installation phase tracking for detailed error reporting
- HelmOperation: Helm operation types
- ValidationError: Validation error details
Features:
- Detailed error messages with context
- Actionable recovery suggestions
- Transient vs fatal error classification
- Automatic retry delay calculation
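As a rough illustration of the transient/fatal split and the retry delay calculation, here is a minimal sketch. `SketchError`, its variants, and the 1s-doubling-to-30s schedule are all assumptions made for this example; the real `InfraError` in error.rs may be shaped quite differently.

```rust
use std::time::Duration;

/// Hypothetical, simplified stand-in for the crate's `InfraError`.
#[derive(Debug)]
enum SketchError {
    /// A network hiccup that may succeed on a later attempt.
    Network { attempt: u32 },
    /// A missing prerequisite; retrying cannot help.
    MissingCommand(String),
}

impl SketchError {
    /// Transient errors are worth retrying; fatal ones are not.
    fn is_transient(&self) -> bool {
        matches!(self, SketchError::Network { .. })
    }

    /// Suggested delay before the next attempt, or `None` for fatal errors.
    /// Assumed schedule: starts at 1s, doubles per attempt, capped at 30s.
    fn retry_delay(&self) -> Option<Duration> {
        match self {
            SketchError::Network { attempt } => {
                let secs = 1u64 << (*attempt).min(5); // 1, 2, 4, 8, 16, 32
                Some(Duration::from_secs(secs.min(30)))
            }
            SketchError::MissingCommand(_) => None,
        }
    }

    /// Actionable suggestion to show next to the error message.
    fn suggestion(&self) -> String {
        match self {
            SketchError::Network { .. } => {
                "Check network connectivity and try again".to_string()
            }
            SketchError::MissingCommand(cmd) => {
                format!("Install `{}` and make sure it is in PATH", cmd)
            }
        }
    }
}
```

Keeping the classification on the error type itself lets retry code stay generic: it only needs to ask the error whether another attempt makes sense and how long to wait.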
retry.rs
Implements retry logic with exponential backoff for transient failures.
Key Types and Functions:
- RetryConfig: Configurable retry behavior
- retry_with_backoff: Synchronous retry function
- retry_with_backoff_async: Asynchronous retry function
- poll_until: Poll for condition with timeout
- poll_until_async: Async polling
Features:
- Exponential backoff with jitter
- Configurable max attempts and delays
- Automatic transient error detection
- Quick, slow, and custom configurations
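To make the backoff behavior concrete, the following is a minimal, standard-library-only sketch of exponential backoff with jitter. `BackoffConfig`, `backoff_delay`, `with_jitter`, and `retry` are hypothetical names for illustration; the fields and signatures of the real `RetryConfig` and `retry_with_backoff` may differ. The jitter here is derived from the system clock purely to avoid an external random-number crate.

```rust
use std::thread::sleep;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical configuration; the real `RetryConfig` may have other fields.
struct BackoffConfig {
    max_attempts: u32,
    initial_delay: Duration,
    max_delay: Duration,
    multiplier: u32,
}

/// Delay before retry number `attempt` (0-based), without jitter:
/// initial_delay * multiplier^attempt, capped at max_delay.
fn backoff_delay(cfg: &BackoffConfig, attempt: u32) -> Duration {
    let factor = cfg.multiplier.saturating_pow(attempt);
    cfg.initial_delay.saturating_mul(factor).min(cfg.max_delay)
}

/// Adds up to 25% extra delay so concurrent clients do not retry in lockstep.
/// Uses clock nanoseconds as a cheap pseudo-random source (an assumption of
/// this sketch, not a recommendation for production code).
fn with_jitter(delay: Duration) -> Duration {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.subsec_nanos())
        .unwrap_or(0);
    let fraction = f64::from(nanos % 1000) / 1000.0 * 0.25; // in [0, 0.25)
    delay + delay.mul_f64(fraction)
}

/// Runs `op` until it succeeds, a non-transient error occurs,
/// or `max_attempts` calls have been made.
fn retry<T, E>(
    cfg: &BackoffConfig,
    is_transient: impl Fn(&E) -> bool,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(e) if attempt + 1 < cfg.max_attempts && is_transient(&e) => {
                sleep(with_jitter(backoff_delay(cfg, attempt)));
                attempt += 1;
            }
            Err(e) => return Err(e),
        }
    }
}
```

Capping the delay at `max_delay` keeps a long outage from producing unbounded waits, while the jitter spreads retries from several installers over time.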
preflight.rs
Provides pre-flight validation checks before installation.
Key Types:
- SystemRequirements: System prerequisites definition
- PreFlightValidator: Validation executor
- PreFlightResult: Validation results
Checks:
- Disk space availability
- Memory availability
- Required commands in PATH
- Optional commands
- Required directories
- Network connectivity
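One of the checks above, "required commands in PATH", can be sketched with the standard library alone. `find_in_path` and `missing_commands` are hypothetical helpers written for this example and are not taken from preflight.rs; the sketch only tests that a file with the given name exists in some PATH directory and does not verify that it is executable.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Hypothetical lookup: returns the first `<dir>/<cmd>` that exists as a file,
/// where `<dir>` iterates over the entries of `path_var`.
/// Note: this does not check the executable bit.
fn find_in_path(cmd: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(cmd))
        .find(|candidate| candidate.is_file())
}

/// Returns the names of all required commands that could not be found,
/// so the caller can report every missing tool in one pass.
fn missing_commands(required: &[&str], path_var: &OsStr) -> Vec<String> {
    required
        .iter()
        .copied()
        .filter(|cmd| find_in_path(cmd, path_var).is_none())
        .map(String::from)
        .collect()
}
```

A caller would typically pass the value of `std::env::var_os("PATH")` as `path_var`. Collecting all missing commands, rather than stopping at the first, gives the user a single complete list to act on.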
Predefined Requirements:
- k3s_requirements()
- gitea_requirements()
- redis_requirements()
- keda_requirements()
- flux_requirements()
rollback.rs
Implements transaction-like rollback for infrastructure changes.
Key Types:
- RollbackManager: Manages rollback actions
- RollbackContext: Tracks installed resources
- RollbackAction: Cleanup function type
Features:
- Automatic rollback on failure (via Drop)
- LIFO action execution
- Resource tracking (files, directories, K8s resources, Helm releases)
- Partial cleanup reporting
- Commit to disable rollback on success
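The combination of Drop-based rollback, LIFO execution, and commit can be sketched as follows. `Rollback` and `Action` are hypothetical stand-ins for `RollbackManager` and `RollbackAction`; in particular, whether the real `commit` consumes the manager and how failures are reported are assumptions of this example.

```rust
/// Hypothetical cleanup closure type; the real `RollbackAction` may differ.
type Action = Box<dyn FnOnce() -> Result<(), String>>;

/// Minimal stand-in for a rollback manager.
struct Rollback {
    actions: Vec<(String, Action)>,
    committed: bool,
}

impl Rollback {
    fn new() -> Self {
        Self { actions: Vec::new(), committed: false }
    }

    /// Registers a cleanup step; later steps run first during rollback.
    fn add_action(&mut self, name: &str, action: Action) {
        self.actions.push((name.to_string(), action));
    }

    /// Runs all registered steps in LIFO order and returns the names of the
    /// steps that failed, so partial cleanup can be reported.
    fn rollback(&mut self) -> Vec<String> {
        let mut failed = Vec::new();
        while let Some((name, action)) = self.actions.pop() {
            if action().is_err() {
                failed.push(name);
            }
        }
        failed
    }

    /// Marks the operation as successful; the pending steps are then
    /// discarded without running when `self` is dropped at the end of this call.
    fn commit(mut self) {
        self.committed = true;
    }
}

impl Drop for Rollback {
    fn drop(&mut self) {
        // Without a commit, leaving scope (including via `?` or a panic
        // unwind) triggers the cleanup.
        if !self.committed {
            for name in self.rollback() {
                eprintln!("rollback step failed: {}", name);
            }
        }
    }
}
```

Running steps in reverse registration order mirrors how resources were created: the last thing installed is the first thing removed, so a step can rely on everything registered before it still being in place.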
healthcheck.rs
Provides health checking with timeout support.
Key Types:
- HealthStatus: Health status enum
- HealthCheckResult: Health check results
- K3sHealthChecker: K3s cluster health checker
- HelmHealthChecker: Helm release health checker
Features:
- Multiple check types
- Configurable timeouts
- Wait-until-healthy functionality
- Detailed check results
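Wait-until-healthy with a configurable timeout is essentially a polling loop with a deadline. The sketch below shows one way to write it; the `Status` enum, its variants, and the free function `wait_until_healthy` are assumptions for illustration and do not reflect the actual `HealthStatus` type or checker methods in healthcheck.rs.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical status type; the real `HealthStatus` variants may differ.
#[derive(Debug, Clone, PartialEq)]
enum Status {
    Healthy,
    Unhealthy(String),
}

/// Polls `check` every `interval` until it reports `Healthy` or `timeout`
/// elapses. On timeout, the last observed status is returned as the error
/// so the caller can show why the component never became healthy.
fn wait_until_healthy(
    mut check: impl FnMut() -> Status,
    timeout: Duration,
    interval: Duration,
) -> Result<(), Status> {
    let deadline = Instant::now() + timeout;
    loop {
        let status = check();
        if status == Status::Healthy {
            return Ok(());
        }
        if Instant::now() >= deadline {
            return Err(status);
        }
        sleep(interval);
    }
}
```

The check runs at least once even with a zero timeout, and returning the last status on failure keeps the detailed reason available for error reporting.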
Usage Examples
Basic Error Handling
use raibid_cli::infrastructure::{InfraError, InfraResult, InstallPhase};
fn install_component() -> InfraResult<()> {
    // Return detailed errors
    Err(InfraError::installation(
        "k3s",
        InstallPhase::Download,
        "Failed to download binary",
    ))
}
Retry Logic
use raibid_cli::infrastructure::{RetryConfig, retry_with_backoff};
let config = RetryConfig::quick();
let result = retry_with_backoff(&config, "download", || {
    download_file()
})?;
Pre-flight Validation
use raibid_cli::infrastructure::{PreFlightValidator, k3s_requirements};
let validator = PreFlightValidator::new(k3s_requirements());
validator.validate("k3s")?;
Rollback
use raibid_cli::infrastructure::RollbackManager;
let mut rollback = RollbackManager::new("k3s");
// Add cleanup actions
rollback.add_action("remove binary", Box::new(|| {
    std::fs::remove_file("/usr/local/bin/k3s")?;
    Ok(())
}));
// On success
rollback.commit();
// On failure, rollback is automatic
Health Checks
use raibid_cli::infrastructure::K3sHealthChecker;
use std::time::Duration;
let checker = K3sHealthChecker::new("/path/to/kubeconfig")
    .with_timeout(Duration::from_secs(300));
checker.wait_until_healthy()?;
Testing
Run the error handling tests:
cargo test --test error_handling_test
Tests cover:
- Error type creation and formatting
- Retry logic with various scenarios
- Pre-flight validation
- Rollback execution
- Health check evaluation
- Integration workflows
Documentation
See error-recovery.md for:
- Detailed error type documentation
- Troubleshooting guide
- Best practices
- Recovery procedures
- Complete examples
Design Principles
- Fail Fast with Context: Provide immediate, actionable feedback
- Automatic Recovery: Retry transient failures automatically
- Clean Failures: Always clean up on failure
- Observable Operations: Comprehensive logging and status reporting
- Type Safety: Leverage Rust’s type system for correctness
Integration
All infrastructure installers support these error handling features:
- k3s: Pre-flight checks, retry downloads, automatic rollback
- Gitea: Helm operation retries, namespace cleanup
- Redis: Health checks, connection validation
- KEDA: CRD validation, operator health checks
- Flux: Binary verification, bootstrap retries