MOP Nushell Automation Scripts
Comprehensive automation scripts for managing the Metrics Observability Platform (MOP).
Prerequisites
- Nushell >= 0.80.0
- `kubectl` - Kubernetes CLI
- `tanka` - Jsonnet-based Kubernetes configuration tool
- `helm` - Kubernetes package manager
- `jq` - JSON processor
- `jsonnet` and `jsonnet-bundler` - Jsonnet tools
Scripts Overview
1. setup.nu - Environment Setup
Complete environment initialization and configuration.
Features:
- ✅ Prerequisites validation (kubectl, tanka, helm, jq, jsonnet, jb)
- ✅ Kubernetes cluster connectivity testing
- ✅ Tanka environment initialization
- ✅ Jsonnet dependency vendoring
- ✅ Namespace creation
- ✅ CRD installation
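As a rough illustration of the prerequisites check, here is a plain-shell equivalent (setup.nu performs this natively in Nushell; the `tk` and `jb` binary names for tanka and jsonnet-bundler are the conventional ones):

```shell
# Sketch of the prerequisites check: report any required tool
# that is not on PATH.
check_tools() {
  missing=""
  for tool in "$@"; do
    # Record any tool that cannot be resolved to an executable
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  echo "$missing"
}

missing=$(check_tools kubectl tk helm jq jsonnet jb)
[ -z "$missing" ] || echo "Missing tools:$missing" >&2
```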
Usage:

```sh
# Setup development environment
./setup.nu --env dev

# Setup staging without vendoring
./setup.nu --env staging --skip-vendor

# Force reinstall CRDs
./setup.nu --env prod --force
```

Options:
- `--env <dev|staging|prod>` - Environment to set up (default: dev)
- `--skip-vendor` - Skip vendoring Jsonnet dependencies
- `--force` - Force reinstall CRDs
2. deploy.nu - Safe Deployment
Production-ready deployment with validation and rollback support.
Features:
- 🔍 Pre-deployment validation checks
- 📊 Interactive diff review
- ⚠️ User confirmation prompts
- ⏳ Progressive rollout monitoring
- 🧪 Post-deployment smoke tests
- 🔄 Automatic rollback on failure
Usage:

```sh
# Deploy to development (with confirmation)
./deploy.nu --env dev

# Deploy specific component
./deploy.nu --env staging --component mimir-ingester

# Auto-approve deployment (CI/CD)
./deploy.nu --env dev --auto-approve

# Skip smoke tests
./deploy.nu --env prod --no-smoke-test

# Custom timeout
./deploy.nu --env staging --timeout 900
```

Options:
- `--env <environment>` - Target environment (required)
- `--component <name>` - Deploy specific component only
- `--auto-approve` - Skip confirmation prompts
- `--no-smoke-test` - Skip post-deployment tests
- `--timeout <seconds>` - Deployment timeout (default: 600)
Safety Features:
- Pre-deployment validation
- Cluster connectivity check
- Configuration validation
- Resource availability check
- Pod health verification
- Service endpoint validation
- Component health monitoring
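Progressive rollout monitoring follows a generic poll-until-ready-or-timeout pattern; a minimal shell sketch of that shape (deploy.nu drives the real check against the cluster, and `--timeout` bounds the wait):

```shell
# Generic poll-with-timeout loop. The real script polls rollout/pod
# status via kubectl; here the probe is an arbitrary command.
wait_for() {
  timeout=$1; shift
  deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    # Give up once the deadline passes, signalling rollback
    [ "$(date +%s)" -ge "$deadline" ] && return 1
    sleep 1
  done
}

wait_for 5 true && echo "rollout healthy"
```

On failure the caller can trigger the automatic rollback path instead of proceeding to smoke tests.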
3. health-check.nu - System Health Monitoring
Comprehensive health verification for all MOP components.
Features:
- 🏥 Pod status and readiness checks
- 📡 Service endpoint validation
- 📊 Metrics endpoint verification
- 🔗 Inter-component connectivity tests
- 💻 Resource utilization monitoring
- 📈 Health report generation
- 👁️ Continuous watch mode
Usage:

```sh
# Check all components
./health-check.nu --env dev

# Check specific component
./health-check.nu --env prod --component mimir-ingester

# Export report as JSON
./health-check.nu --env staging --format json --export health-report.json

# Continuous monitoring (watch mode)
./health-check.nu --env dev --watch

# Generate markdown report
./health-check.nu --env prod --format markdown --export report.md
```

Options:
- `--env <environment>` - Target environment (required)
- `--component <name>` - Check specific component only
- `--format <table|json|markdown>` - Output format (default: table)
- `--export <path>` - Export report to file
- `--watch` - Continuous monitoring mode
Health Checks:
- Pod phase and container status
- Container restart counts
- Service endpoint availability
- Metrics endpoint accessibility (`:8080/metrics`)
- Inter-component connectivity (distributor → ingester, query-frontend → querier)
- Resource usage (CPU, memory)
4. cost-analysis.nu - Cost Analysis & Optimization
Analyze costs and generate optimization recommendations.
Features:
- 💰 Storage cost estimation
- ⚡ Compute cost calculation
- 📈 Ingestion cost analysis
- 📊 Cost breakdown by service
- 🎯 Optimization recommendations
- 📉 Baseline comparison
- 💡 Potential savings estimates
Usage:

```sh
# Analyze current costs
./cost-analysis.nu --env prod

# Custom analysis period
./cost-analysis.nu --env prod --period 30d

# Compare to baseline
./cost-analysis.nu --env prod --baseline baseline-2024-01.json

# Export as CSV
./cost-analysis.nu --env staging --format csv --export costs.csv

# Custom Mimir endpoint
./cost-analysis.nu --env dev --mimir-url http://mimir.example.com:8080
```

Options:
- `--env <environment>` - Target environment (required)
- `--period <duration>` - Analysis period: 1h, 1d, 7d, 30d (default: 7d)
- `--format <table|json|csv>` - Output format (default: table)
- `--export <path>` - Export report to file
- `--baseline <path>` - Compare to baseline file
- `--mimir-url <url>` - Mimir query endpoint (default: http://localhost:8080)
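The `--period` values map to plain durations; a sketch of the conversion (the actual flag parsing lives in cost-analysis.nu):

```shell
# Convert a --period value (1h, 1d, 7d, 30d) into seconds for the
# underlying range queries; unknown units are rejected.
period_to_seconds() {
  case "$1" in
    *h) echo $(( ${1%h} * 3600 )) ;;
    *d) echo $(( ${1%d} * 86400 )) ;;
    *)  echo "unsupported period: $1" >&2; return 1 ;;
  esac
}

period_to_seconds 7d   # 7 days -> 604800
```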
Cost Metrics:
- Active time series count
- Sample ingestion rate
- Query request rate
- Storage utilization
- Ingester instance count
- Storage block count
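As a back-of-the-envelope example of how these metrics feed a cost estimate, the sketch below prices active series against storage; the bytes-per-series footprint and $/GiB-month rate are illustrative assumptions, not values used by the script:

```shell
# Illustrative storage estimate: active series count times an assumed
# per-series footprint, priced at an assumed object-storage rate.
active_series=1500000        # assumed sample value from Mimir
bytes_per_series=2048        # assumed average footprint
price_per_gib_month=0.023    # assumed $/GiB-month

gib=$(awk -v s="$active_series" -v b="$bytes_per_series" \
  'BEGIN { printf "%.2f", s * b / (1024 ^ 3) }')
cost=$(awk -v g="$gib" -v p="$price_per_gib_month" \
  'BEGIN { printf "%.2f", g * p }')
echo "~${gib} GiB, ~\$${cost}/month"
```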
Recommendations Include:
- Data retention policy optimization
- Ingester scaling recommendations
- Service-level trace sampling adjustments
- Adaptive sampling enablement
- Tiered storage strategy suggestions
5. backup.nu - Configuration Backup
Automated backup of configurations and dashboards.
Features:
- 📊 Grafana dashboard export
- 🔌 Grafana datasource backup
- ⚙️ Tanka configuration backup
- ☸️ Kubernetes resource export
- 📦 Compressed archive creation
- ☁️ Cloud storage upload (S3/GCS)
- 🧹 Automatic retention cleanup
- ✅ Backup integrity verification
Usage:

```sh
# Basic backup
./backup.nu --env prod

# Custom output directory
./backup.nu --env staging --output /backups

# Upload to S3
./backup.nu --env prod --upload s3://my-bucket/mop-backups

# Upload to GCS
./backup.nu --env prod --upload gs://my-bucket/mop-backups

# Custom retention period
./backup.nu --env dev --retention 60

# With Grafana credentials
./backup.nu --env prod --grafana-url http://grafana.local --grafana-token <token>
```

Options:
- `--env <environment>` - Target environment (required)
- `--output <path>` - Output directory (default: backups)
- `--upload <url>` - Cloud storage URL (s3:// or gs://)
- `--retention <days>` - Retention period in days (default: 30)
- `--grafana-url <url>` - Grafana URL (default: http://localhost:3000)
- `--grafana-token <token>` - Grafana API token (or use the GRAFANA_TOKEN env var)
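Retention cleanup amounts to deleting archives older than the retention window; a sketch of that step, demonstrated against a temporary directory with fake archives (backup.nu applies the same rule to its output directory):

```shell
# Delete backup archives older than the retention window.
backup_dir=$(mktemp -d)
retention_days=30

# Fake archives: one past the window, one recent (GNU touch -d assumed)
touch -d "40 days ago" "$backup_dir/mop-prod-old.tar.gz"
touch "$backup_dir/mop-prod-new.tar.gz"

find "$backup_dir" -name 'mop-*.tar.gz' -mtime +"$retention_days" -delete
ls "$backup_dir"
```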
Backup Contents:
- Grafana dashboards (JSON)
- Grafana datasources (JSON, credentials sanitized)
- Tanka environments and libraries
- Rendered Kubernetes manifests
- ConfigMaps, Secrets, Services
- Deployments, StatefulSets
- PVCs, Ingresses
Archive Format:

```
mop-prod-20240106-143022.tar.gz
├── grafana/
│   ├── dashboards/
│   │   ├── mimir-overview.json
│   │   └── trace-analysis.json
│   └── datasources/
│       ├── mimir.json
│       └── tempo.json
├── tanka/
│   ├── environments/
│   ├── lib/
│   ├── jsonnetfile.json
│   └── rendered/
│       └── prod.yaml
└── kubernetes/
    ├── configmaps.yaml
    ├── deployments.yaml
    └── services.yaml
```
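The archive name encodes the environment and a timestamp; generating such a name is straightforward:

```shell
# Build a mop-<env>-<YYYYMMDD-HHMMSS>.tar.gz archive name
env_name=prod
stamp=$(date +%Y%m%d-%H%M%S)
archive="mop-${env_name}-${stamp}.tar.gz"
echo "$archive"
```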
6. experiment-runner.nu - OBI Experiment Automation
Automated experiment execution and analysis using the Observability-by-Inference framework.
Features:
- 🧪 Automated experiment execution
- 📊 Baseline metric collection
- 🚀 Experimental change deployment
- 👁️ Continuous metric monitoring
- 🔍 Statistical analysis
- 📈 Improvement calculation
- 🎯 Automated recommendations
- 🔄 Automatic rollback on degradation
- 📄 Comprehensive report generation
Usage:

```sh
# Run experiment from config
./experiment-runner.nu --config experiments/adaptive-sampling.json --env dev

# Custom duration
./experiment-runner.nu --config exp.json --env staging --duration 7200

# Auto-rollback on degradation
./experiment-runner.nu --config exp.json --env prod --auto-rollback

# Export results
./experiment-runner.nu --config exp.json --env dev --export results.json

# Extended baseline collection
./experiment-runner.nu --config exp.json --env staging --baseline-duration 600
```

Options:
- `--config <path>` - Experiment configuration file (required)
- `--env <environment>` - Target environment (default: dev)
- `--duration <seconds>` - Experiment duration (default: 3600)
- `--baseline-duration <seconds>` - Baseline collection period (default: 300)
- `--auto-rollback` - Automatically roll back on metric degradation
- `--export <path>` - Export results to file
Experiment Configuration Format:
```json
{
  "name": "Adaptive Sampling Test",
  "description": "Test adaptive sampling impact on cost and quality",
  "changes": [
    {
      "type": "deployment",
      "component": "mimir-distributor",
      "container": "distributor",
      "parameter": "SAMPLE_RATE",
      "value": "0.5"
    }
  ],
  "success_metrics": [
    {
      "name": "ingestion_rate",
      "query": "sum(rate(mimir_distributor_samples_in_total[5m]))",
      "direction": "lower",
      "threshold": 10000
    },
    {
      "name": "query_latency_p95",
      "query": "histogram_quantile(0.95, rate(mimir_request_duration_seconds_bucket[5m]))",
      "direction": "lower",
      "threshold": 0.5
    }
  ]
}
```

Change Types:
- `deployment` - Modify deployment environment variables
- `configmap` - Update ConfigMap values

Metric Directions:
- `lower` - Lower is better (latency, cost, errors)
- `higher` - Higher is better (throughput, availability)

Analysis Recommendations:
- `adopt` - Score ≥ 0.8, clear improvement
- `investigate` - Score ≥ 0.5, inconclusive results
- `rollback` - Score < 0.5, degradation detected
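The score-to-recommendation thresholds above can be sketched as a small function (the aggregate score itself is computed by experiment-runner.nu from the success metrics):

```shell
# Map an aggregate experiment score to a recommendation using the
# documented thresholds (>= 0.8 adopt, >= 0.5 investigate, else rollback).
recommend() {
  awk -v s="$1" 'BEGIN {
    if (s >= 0.8)      print "adopt"
    else if (s >= 0.5) print "investigate"
    else               print "rollback"
  }'
}

recommend 0.85   # adopt
recommend 0.60   # investigate
recommend 0.30   # rollback
```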
Common Workflows
Initial Setup
```sh
# 1. Setup environment
./setup.nu --env dev

# 2. Deploy components
./deploy.nu --env dev

# 3. Verify health
./health-check.nu --env dev
```

Production Deployment

```sh
# 1. Deploy to staging first
./deploy.nu --env staging

# 2. Run health checks
./health-check.nu --env staging

# 3. Create backup before prod deployment
./backup.nu --env prod --upload s3://backups/mop

# 4. Deploy to production
./deploy.nu --env prod

# 5. Monitor health continuously
./health-check.nu --env prod --watch
```

Cost Optimization

```sh
# 1. Analyze current costs
./cost-analysis.nu --env prod --export baseline.json

# 2. Run experiment with optimizations
./experiment-runner.nu --config optimize-sampling.json --env dev

# 3. Compare results
./cost-analysis.nu --env prod --baseline baseline.json

# 4. Deploy if successful
./deploy.nu --env prod --component mimir-distributor
```

Disaster Recovery

```sh
# 1. Create comprehensive backup
./backup.nu --env prod --upload s3://dr-backups/mop

# 2. If recovery needed, restore from backup
#    (manual restoration from backup archive)

# 3. Verify health after restoration
./health-check.nu --env prod --format json --export health-report.json
```

Environment Variables
Grafana Authentication
```sh
export GRAFANA_TOKEN="your-api-token"
./backup.nu --env prod
```

Custom Kubernetes Context

```sh
export KUBECONFIG=/path/to/kubeconfig
./deploy.nu --env prod
```

AWS Credentials (for S3 upload)

```sh
export AWS_ACCESS_KEY_ID="your-key"
export AWS_SECRET_ACCESS_KEY="your-secret"
./backup.nu --env prod --upload s3://bucket/path
```

GCP Credentials (for GCS upload)

```sh
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/credentials.json"
./backup.nu --env prod --upload gs://bucket/path
```

Nushell Features Used
These scripts leverage Nushell’s powerful features:
- Structured Data: All data is typed and structured
- Pipelines: Clean data transformation with `|`
- Error Handling: Robust `try`/`catch` blocks
- Type Safety: Strong typing for function parameters
- Tables: Beautiful table formatting with `table -e`
- JSON Support: Native JSON parsing with `from json`/`to json`
- YAML Support: Native YAML parsing with `from yaml`/`to yaml`
- Date/Time: Built-in date manipulation
- Math Operations: Native math functions
- HTTP Requests: Built-in HTTP client
- ANSI Colors: Rich terminal output with color support
Troubleshooting
Script Permissions
```sh
chmod +x scripts/nu/*.nu
```

Missing Tools

```sh
# Install Nushell
brew install nushell

# Install Kubernetes tools
brew install kubectl tanka helm

# Install Jsonnet tools
brew install jsonnet jsonnet-bundler

# Install utilities
brew install jq
```

Port Forward Issues

```sh
# Check existing port forwards
ps aux | grep port-forward

# Kill existing port forwards
pkill -f "port-forward.*mimir"

# Manually set up port forward
kubectl port-forward -n mop-prod svc/mimir-query-frontend 8080:8080
```

Grafana Connection

```sh
# Test Grafana connectivity
curl -H "Authorization: Bearer $GRAFANA_TOKEN" http://localhost:3000/api/health

# Generate an API token in Grafana:
# Settings → API Keys → Add API Key
```

Best Practices
1. Always run health checks after deployment

   ```sh
   ./deploy.nu --env prod && ./health-check.nu --env prod
   ```

2. Create backups before major changes

   ```sh
   ./backup.nu --env prod --upload s3://backups/mop
   ```

3. Test in dev/staging first

   ```sh
   ./deploy.nu --env dev
   ./health-check.nu --env dev
   ./deploy.nu --env staging
   ./deploy.nu --env prod
   ```

4. Use experiments for risky changes

   ```sh
   ./experiment-runner.nu --config change.json --env dev --auto-rollback
   ```

5. Monitor costs regularly

   ```sh
   # Weekly cost analysis
   ./cost-analysis.nu --env prod --export "costs-$(date +%Y%m%d).json"
   ```
Contributing
When adding new scripts:
- Follow the existing structure and naming conventions
- Include comprehensive error handling
- Add detailed comments and documentation
- Use Nushell idioms (structured data, pipelines)
- Provide helpful output with ANSI colors
- Include usage examples in comments
License
Part of the MOP (Metrics Observability Platform) project.