MOP Nushell Automation Scripts

Comprehensive automation scripts for managing the Metrics Observability Platform (MOP).

Prerequisites

Nushell >= 0.80.0
kubectl - Kubernetes CLI
tanka (tk) - Jsonnet-based Kubernetes configuration tool
helm - Kubernetes package manager
jq - JSON processor
jsonnet and jsonnet-bundler (jb) - Jsonnet tools

Scripts Overview

1. setup.nu - Environment Setup

Complete environment initialization and configuration.

Features:

✅ Prerequisites validation (kubectl, tanka, helm, jq, jsonnet, jb)
✅ Kubernetes cluster connectivity testing
✅ Tanka environment initialization
✅ Jsonnet dependency vendoring
✅ Namespace creation
✅ CRD installation

Usage:

# Set up development environment
./setup.nu --env dev
 
# Set up staging without vendoring
./setup.nu --env staging --skip-vendor
 
# Force reinstall CRDs
./setup.nu --env prod --force

Options:

--env <dev|staging|prod> - Environment to set up (default: dev)
--skip-vendor - Skip vendoring Jsonnet dependencies
--force - Force reinstall CRDs
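
The prerequisite validation listed under Features amounts to a PATH lookup for each tool. The following is a minimal Python sketch of that idea, not the code in setup.nu. The function name missing_tools is hypothetical; tk and jb are the binary names that Tanka and jsonnet-bundler install under.

```python
import shutil

# Binaries the setup step expects on PATH.
# `tk` is the Tanka CLI and `jb` is the jsonnet-bundler CLI.
REQUIRED_TOOLS = ["kubectl", "tk", "helm", "jq", "jsonnet", "jb"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH, in input order.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [tool for tool in tools if which(tool) is None]
```

Calling missing_tools() with no arguments checks the real PATH; an empty list means every prerequisite was found.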

2. deploy.nu - Safe Deployment

Production-ready deployment with validation and rollback support.

Features:

🔍 Pre-deployment validation checks
📊 Interactive diff review
⚠️ User confirmation prompts
⏳ Progressive rollout monitoring
🧪 Post-deployment smoke tests
🔄 Automatic rollback on failure

Usage:

# Deploy to development (with confirmation)
./deploy.nu --env dev
 
# Deploy specific component
./deploy.nu --env staging --component mimir-ingester
 
# Auto-approve deployment (CI/CD)
./deploy.nu --env dev --auto-approve
 
# Skip smoke tests
./deploy.nu --env prod --no-smoke-test
 
# Custom timeout
./deploy.nu --env staging --timeout 900

Options:

--env <environment> - Target environment (required)
--component <name> - Deploy specific component only
--auto-approve - Skip confirmation prompts
--no-smoke-test - Skip post-deployment tests
--timeout <seconds> - Deployment timeout (default: 600)

Safety Features:

Pre-deployment validation
Cluster connectivity check
Configuration validation
Resource availability check
Pod health verification
Service endpoint validation
Component health monitoring
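
The order of the safety steps matters: validation and confirmation happen before anything changes, and rollback is only triggered once a change has been applied. The sketch below illustrates one way such a flow could be wired. It is an assumption about the control flow, not a transcription of deploy.nu; DeploySteps, run_deploy, and the returned status strings are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DeploySteps:
    """Hypothetical hooks; every hook except rollback returns True on success."""
    validate: Callable[[], bool]
    confirm: Callable[[], bool]
    apply: Callable[[], bool]
    smoke_test: Callable[[], bool]
    rollback: Callable[[], None]


def run_deploy(steps, auto_approve=False, run_smoke_test=True):
    """Return 'invalid', 'aborted', 'rolled_back', or 'deployed'."""
    if not steps.validate():
        return "invalid"      # nothing was changed, so nothing to undo
    if not auto_approve and not steps.confirm():
        return "aborted"      # user declined after reviewing the diff
    if not steps.apply():
        steps.rollback()      # the apply may have been partial
        return "rolled_back"
    if run_smoke_test and not steps.smoke_test():
        steps.rollback()
        return "rolled_back"
    return "deployed"
```

In this sketch auto_approve corresponds to --auto-approve and run_smoke_test=False corresponds to --no-smoke-test.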

3. health-check.nu - System Health Monitoring

Comprehensive health verification for all MOP components.

Features:

🏥 Pod status and readiness checks
📡 Service endpoint validation
📊 Metrics endpoint verification
🔗 Inter-component connectivity tests
💻 Resource utilization monitoring
📈 Health report generation
👁️ Continuous watch mode

Usage:

# Check all components
./health-check.nu --env dev
 
# Check specific component
./health-check.nu --env prod --component mimir-ingester
 
# Export report as JSON
./health-check.nu --env staging --format json --export health-report.json
 
# Continuous monitoring (watch mode)
./health-check.nu --env dev --watch
 
# Generate markdown report
./health-check.nu --env prod --format markdown --export report.md

Options:

--env <environment> - Target environment (required)
--component <name> - Check specific component only
--format <table|json|markdown> - Output format (default: table)
--export <path> - Export report to file
--watch - Continuous monitoring mode

Health Checks:

Pod phase and container status
Container restart counts
Service endpoint availability
Metrics endpoint accessibility (:8080/metrics)
Inter-component connectivity (distributor→ingester, query-frontend→querier)
Resource usage (CPU, memory)
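
The pod-level checks (phase, container status, restart counts) can be derived from the objects returned by kubectl get pods -o json. The sketch below shows one way to summarize a single pod. The pod_health function and the max_restarts threshold are assumptions for illustration; the fields it reads (status.phase, status.containerStatuses[].ready, status.containerStatuses[].restartCount) are standard Kubernetes pod status fields.

```python
def pod_health(pod, max_restarts=5):
    """Summarize one pod object as found in `kubectl get pods -o json` items.

    max_restarts is a hypothetical threshold, not a value taken from
    health-check.nu.
    """
    status = pod.get("status", {})
    containers = status.get("containerStatuses", [])
    restarts = sum(c.get("restartCount", 0) for c in containers)
    # A pod with no reported containers is treated as not ready.
    all_ready = bool(containers) and all(c.get("ready", False) for c in containers)
    phase = status.get("phase", "Unknown")
    return {
        "name": pod.get("metadata", {}).get("name", "<unknown>"),
        "phase": phase,
        "ready": all_ready,
        "restarts": restarts,
        "healthy": phase == "Running" and all_ready and restarts <= max_restarts,
    }
```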

4. cost-analysis.nu - Cost Analysis & Optimization

Analyze costs and generate optimization recommendations.

Features:

💰 Storage cost estimation
⚡ Compute cost calculation
📈 Ingestion cost analysis
📊 Cost breakdown by service
🎯 Optimization recommendations
📉 Baseline comparison
💡 Potential savings estimates

Usage:

# Analyze current costs
./cost-analysis.nu --env prod
 
# Custom analysis period
./cost-analysis.nu --env prod --period 30d
 
# Compare to baseline
./cost-analysis.nu --env prod --baseline baseline-2024-01.json
 
# Export as CSV
./cost-analysis.nu --env staging --format csv --export costs.csv
 
# Custom Mimir endpoint
./cost-analysis.nu --env dev --mimir-url http://mimir.example.com:8080

Options:

--env <environment> - Target environment (required)
--period <duration> - Analysis period: 1h, 1d, 7d, 30d (default: 7d)
--format <table|json|csv> - Output format (default: table)
--export <path> - Export report to file
--baseline <path> - Compare to baseline file
--mimir-url <url> - Mimir query endpoint (default: http://localhost:8080)
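
A duration such as 7d has to be turned into a number of seconds before it can be used as a query window. A minimal sketch follows, assuming only the h and d suffixes shown above are supported; period_to_seconds is a hypothetical helper, and cost-analysis.nu may restrict or extend the accepted values.

```python
import re

# Seconds per supported unit suffix.
_UNIT_SECONDS = {"h": 3600, "d": 86400}


def period_to_seconds(period):
    """Convert a period such as '1h' or '30d' to seconds."""
    match = re.fullmatch(r"(\d+)([hd])", period)
    if match is None:
        raise ValueError(f"unsupported period: {period!r}")
    count, unit = match.groups()
    return int(count) * _UNIT_SECONDS[unit]
```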

Cost Metrics:

Active time series count
Sample ingestion rate
Query request rate
Storage utilization
Ingester instance count
Storage block count

Recommendations Include:

Data retention policy optimization
Ingester scaling recommendations
Service-level trace sampling adjustments
Adaptive sampling enablement
Tiered storage strategy suggestions
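
To make the relationship between the cost metrics and the reported breakdown concrete, here is a small sketch of a linear cost model with a baseline comparison. Everything in it is an assumption: the three cost categories, the linear pricing, and the names UnitPrices, estimate_monthly_cost, and change_vs_baseline are illustrative, and the unit prices must be supplied by the caller because this document does not state any.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitPrices:
    """Monthly unit prices supplied by the caller (no defaults are assumed)."""
    per_million_active_series: float
    per_gb_stored: float
    per_ingester: float


def estimate_monthly_cost(active_series, storage_gb, ingesters, prices):
    """Return a per-category cost breakdown plus a 'total' entry."""
    breakdown = {
        "ingestion": active_series / 1_000_000 * prices.per_million_active_series,
        "storage": storage_gb * prices.per_gb_stored,
        "compute": ingesters * prices.per_ingester,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


def change_vs_baseline(current_total, baseline_total):
    """Percent change relative to the baseline; negative means savings."""
    if baseline_total == 0:
        raise ValueError("baseline total must be non-zero")
    return (current_total - baseline_total) / baseline_total * 100.0
```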

5. backup.nu - Configuration Backup

Automated backup of configurations and dashboards.

Features:

📊 Grafana dashboard export
🔌 Grafana datasource backup
⚙️ Tanka configuration backup
☸️ Kubernetes resource export
📦 Compressed archive creation
☁️ Cloud storage upload (S3/GCS)
🧹 Automatic retention cleanup
✅ Backup integrity verification

Usage:

# Basic backup
./backup.nu --env prod
 
# Custom output directory
./backup.nu --env staging --output /backups
 
# Upload to S3
./backup.nu --env prod --upload s3://my-bucket/mop-backups
 
# Upload to GCS
./backup.nu --env prod --upload gs://my-bucket/mop-backups
 
# Custom retention period
./backup.nu --env dev --retention 60
 
# With Grafana credentials
./backup.nu --env prod --grafana-url http://grafana.local --grafana-token <token>

Options:

--env <environment> - Target environment (required)
--output <path> - Output directory (default: backups)
--upload <url> - Cloud storage URL (s3:// or gs://)
--retention <days> - Retention period (default: 30)
--grafana-url <url> - Grafana URL (default: http://localhost:3000)
--grafana-token <token> - Grafana API token (or use GRAFANA_TOKEN env var)
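
The --upload value carries both the storage provider (the URL scheme) and the destination (bucket and prefix). The sketch below shows how such a URL could be split before handing it to a provider-specific uploader. parse_upload_url is a hypothetical helper, and the decision to reject any scheme other than s3 and gs is an assumption based on the two schemes listed above.

```python
from urllib.parse import urlparse

SUPPORTED_SCHEMES = ("s3", "gs")


def parse_upload_url(url):
    """Split 's3://bucket/prefix' or 'gs://bucket/prefix' into its parts."""
    parsed = urlparse(url)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported upload scheme: {parsed.scheme!r}")
    if not parsed.netloc:
        raise ValueError("upload URL is missing a bucket name")
    return {
        "provider": parsed.scheme,
        "bucket": parsed.netloc,
        "prefix": parsed.path.lstrip("/"),
    }
```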

Backup Contents:

Grafana dashboards (JSON)
Grafana datasources (JSON, credentials sanitized)
Tanka environments and libraries
Rendered Kubernetes manifests
ConfigMaps, Secrets, Services
Deployments, StatefulSets
PVCs, Ingresses
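
Sanitizing datasource credentials means removing secret-bearing fields before the JSON is written to disk. The sketch below illustrates the idea on a datasource dictionary. Which fields backup.nu actually strips is not stated here, so the SENSITIVE_KEYS list (password, basicAuthPassword, secureJsonData) is an assumption, and sanitize_datasource is a hypothetical name.

```python
import copy

# Assumed list of secret-bearing fields in a Grafana datasource object.
SENSITIVE_KEYS = ("password", "basicAuthPassword", "secureJsonData")


def sanitize_datasource(datasource):
    """Return a copy of the datasource with secret-bearing fields removed."""
    clean = copy.deepcopy(datasource)  # never mutate the caller's object
    for key in SENSITIVE_KEYS:
        clean.pop(key, None)
    return clean
```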

Archive Format:

mop-prod-20240106-143022.tar.gz
├── grafana/
│   ├── dashboards/
│   │   ├── mimir-overview.json
│   │   └── trace-analysis.json
│   └── datasources/
│       ├── mimir.json
│       └── tempo.json
├── tanka/
│   ├── environments/
│   ├── lib/
│   ├── jsonnetfile.json
│   └── rendered/
│       └── prod.yaml
└── kubernetes/
    ├── configmaps.yaml
    ├── deployments.yaml
    └── services.yaml
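
Because the archive name embeds a timestamp, retention cleanup can be decided from file names alone. The sketch below assumes the name pattern mop-<env>-YYYYMMDD-HHMMSS.tar.gz, which is inferred from the single example above; archive_timestamp and expired_archives are hypothetical helpers, and backup.nu may determine age differently (for example from file modification times).

```python
import re
from datetime import datetime, timedelta

# Pattern inferred from the example name mop-prod-20240106-143022.tar.gz.
ARCHIVE_RE = re.compile(r"mop-(?P<env>[a-z]+)-(?P<stamp>\d{8}-\d{6})\.tar\.gz")


def archive_timestamp(filename):
    """Return the timestamp embedded in an archive name, or None if no match."""
    match = ARCHIVE_RE.fullmatch(filename)
    if match is None:
        return None
    return datetime.strptime(match.group("stamp"), "%Y%m%d-%H%M%S")


def expired_archives(filenames, now, retention_days=30):
    """Return the archive names older than the retention window."""
    cutoff = now - timedelta(days=retention_days)
    expired = []
    for name in filenames:
        stamp = archive_timestamp(name)
        # Files that do not look like backup archives are left alone.
        if stamp is not None and stamp < cutoff:
            expired.append(name)
    return expired
```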

6. experiment-runner.nu - OBI Experiment Automation

Automated experiment execution and analysis using the Observability-by-Inference framework.

Features:

🧪 Automated experiment execution
📊 Baseline metric collection
🚀 Experimental change deployment
👁️ Continuous metric monitoring
🔍 Statistical analysis
📈 Improvement calculation
🎯 Automated recommendations
🔄 Automatic rollback on degradation
📄 Comprehensive report generation

Usage:

# Run experiment from config
./experiment-runner.nu --config experiments/adaptive-sampling.json --env dev
 
# Custom duration
./experiment-runner.nu --config exp.json --env staging --duration 7200
 
# Auto-rollback on degradation
./experiment-runner.nu --config exp.json --env prod --auto-rollback
 
# Export results
./experiment-runner.nu --config exp.json --env dev --export results.json
 
# Extended baseline collection
./experiment-runner.nu --config exp.json --env staging --baseline-duration 600

Options:

--config <path> - Experiment configuration file (required)
--env <environment> - Target environment (default: dev)
--duration <seconds> - Experiment duration (default: 3600)
--baseline-duration <seconds> - Baseline collection period (default: 300)
--auto-rollback - Automatically roll back on metric degradation
--export <path> - Export results to file

Experiment Configuration Format:

{
  "name": "Adaptive Sampling Test",
  "description": "Test adaptive sampling impact on cost and quality",
  "changes": [
    {
      "type": "deployment",
      "component": "mimir-distributor",
      "container": "distributor",
      "parameter": "SAMPLE_RATE",
      "value": "0.5"
    }
  ],
  "success_metrics": [
    {
      "name": "ingestion_rate",
      "query": "sum(rate(mimir_distributor_samples_in_total[5m]))",
      "direction": "lower",
      "threshold": 10000
    },
    {
      "name": "query_latency_p95",
      "query": "histogram_quantile(0.95, sum by (le) (rate(mimir_request_duration_seconds_bucket[5m])))",
      "direction": "lower",
      "threshold": 0.5
    }
  ]
}

Change Types:

deployment - Modify deployment environment variables
configmap - Update ConfigMap values

Metric Directions:

lower - Lower is better (latency, cost, errors)
higher - Higher is better (throughput, availability)
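
A configuration file can be checked against the documented change types and metric directions before an experiment starts. The following sketch shows one such check. Which keys experiment-runner.nu treats as required is not stated above, so the required-key list is an assumption, and validate_experiment is a hypothetical name.

```python
CHANGE_TYPES = {"deployment", "configmap"}
DIRECTIONS = {"lower", "higher"}
# Assumed required keys; the real script may require more or fewer.
REQUIRED_KEYS = ("name", "changes", "success_metrics")


def validate_experiment(config):
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append(f"missing top-level key: {key}")
    for i, change in enumerate(config.get("changes", [])):
        if change.get("type") not in CHANGE_TYPES:
            problems.append(f"changes[{i}]: unknown type {change.get('type')!r}")
    for i, metric in enumerate(config.get("success_metrics", [])):
        if metric.get("direction") not in DIRECTIONS:
            problems.append(
                f"success_metrics[{i}]: unknown direction {metric.get('direction')!r}"
            )
        if not metric.get("query"):
            problems.append(f"success_metrics[{i}]: missing query")
    return problems
```

The parsed content of a file such as experiments/adaptive-sampling.json can be passed in directly after loading it with json.load.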

Analysis Recommendations:

adopt - Score ≥ 0.8, clear improvement
investigate - 0.5 ≤ Score < 0.8, inconclusive results
rollback - Score < 0.5, degradation detected
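
The thresholds above map a score to a recommendation, and the metric direction decides whether a change counts as an improvement. The sketch below illustrates both pieces. How experiment-runner.nu combines per-metric results into a single score is not described here, so only the threshold mapping is taken from this document; the improvement formula and both function names are assumptions.

```python
def improvement(baseline, experiment, direction):
    """Relative improvement over baseline; positive means the experiment did better."""
    if direction not in ("lower", "higher"):
        raise ValueError(f"unknown direction: {direction!r}")
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    change = (experiment - baseline) / abs(baseline)
    # For 'lower is better' metrics a decrease is an improvement.
    return -change if direction == "lower" else change


def recommend(score):
    """Map a score to 'adopt', 'investigate', or 'rollback'."""
    if score >= 0.8:
        return "adopt"
    if score >= 0.5:
        return "investigate"
    return "rollback"
```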

Common Workflows

Initial Setup

# 1. Set up environment
./setup.nu --env dev
 
# 2. Deploy components
./deploy.nu --env dev
 
# 3. Verify health
./health-check.nu --env dev

Production Deployment

# 1. Deploy to staging first
./deploy.nu --env staging
 
# 2. Run health checks
./health-check.nu --env staging
 
# 3. Create backup before prod deployment
./backup.nu --env prod --upload s3://backups/mop
 
# 4. Deploy to production
./deploy.nu --env prod
 
# 5. Monitor health continuously
./health-check.nu --env prod --watch

Cost Optimization

# 1. Analyze current costs
./cost-analysis.nu --env prod --format json --export baseline.json
 
# 2. Run experiment with optimizations
./experiment-runner.nu --config optimize-sampling.json --env dev
 
# 3. Compare results
./cost-analysis.nu --env prod --baseline baseline.json
 
# 4. Deploy if successful
./deploy.nu --env prod --component mimir-distributor

Disaster Recovery

# 1. Create comprehensive backup
./backup.nu --env prod --upload s3://dr-backups/mop
 
# 2. If recovery needed, restore from backup
# (Manual restoration from backup archive)
 
# 3. Verify health after restoration
./health-check.nu --env prod --format json --export health-report.json

Environment Variables

Grafana Authentication

export GRAFANA_TOKEN="your-api-token"
./backup.nu --env prod
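
The token can come from either the --grafana-token flag or the GRAFANA_TOKEN environment variable. The sketch below shows one way to resolve it. The precedence (flag first, then environment) is an assumption, as is the helper name resolve_grafana_token; backup.nu may behave differently when both are set.

```python
import os


def resolve_grafana_token(flag_value=None, environ=os.environ):
    """Return the Grafana token from the flag, falling back to GRAFANA_TOKEN."""
    if flag_value:
        return flag_value
    token = environ.get("GRAFANA_TOKEN", "")
    if not token:
        raise RuntimeError(
            "no Grafana token: pass --grafana-token or set GRAFANA_TOKEN"
        )
    return token
```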

Custom Kubernetes Context

export KUBECONFIG=/path/to/kubeconfig
./deploy.nu --env prod

AWS Credentials (for S3 upload)

export AWS_ACCESS_KEY_ID="your-key"
export AWS_SECRET_ACCESS_KEY="your-secret"
./backup.nu --env prod --upload s3://bucket/path

GCP Credentials (for GCS upload)

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/credentials.json"
./backup.nu --env prod --upload gs://bucket/path

Nushell Features Used

These scripts rely on the following Nushell features:

Structured Data: All data is typed and structured
Pipelines: Clean data transformation with |
Error Handling: Robust try/catch blocks
Type Safety: Strong typing for function parameters
Tables: Beautiful table formatting with | table -e
JSON Support: Native JSON parsing with from json / to json
YAML Support: Native YAML parsing with from yaml / to yaml
Date/Time: Built-in date manipulation
Math Operations: Native math functions
HTTP Requests: Built-in HTTP client
ANSI Colors: Rich terminal output with color support

Troubleshooting

Script Permissions

chmod +x scripts/nu/*.nu

Missing Tools

# Install Nushell
brew install nushell
 
# Install Kubernetes tools
brew install kubectl tanka helm
 
# Install Jsonnet tools
brew install jsonnet jsonnet-bundler
 
# Install utilities
brew install jq

Port Forward Issues

# Check existing port forwards
ps aux | grep port-forward
 
# Kill existing port forwards
pkill -f "port-forward.*mimir"
 
# Manually set up port forward
kubectl port-forward -n mop-prod svc/mimir-query-frontend 8080:8080

Grafana Connection

# Test Grafana connectivity
curl -H "Authorization: Bearer $GRAFANA_TOKEN" http://localhost:3000/api/health
 
# Generate an API token in Grafana
# Settings → API Keys → Add API Key
# (recent Grafana releases replace API keys with service account tokens)

Best Practices

Always run health checks after deployment

./deploy.nu --env prod && ./health-check.nu --env prod

Create backups before major changes

./backup.nu --env prod --upload s3://backups/mop

Test in dev/staging first

./deploy.nu --env dev
./health-check.nu --env dev
./deploy.nu --env staging
./health-check.nu --env staging
./deploy.nu --env prod

Use experiments for risky changes

./experiment-runner.nu --config change.json --env dev --auto-rollback

Monitor costs regularly

# Weekly cost analysis
./cost-analysis.nu --env prod --format json --export "costs-$(date +%Y%m%d).json"

Contributing

When adding new scripts:

Follow the existing structure and naming conventions
Include comprehensive error handling
Add detailed comments and documentation
Use Nushell idioms (structured data, pipelines)
Provide helpful output with ANSI colors
Include usage examples in comments

License

Part of the MOP (Metrics Observability Platform) project.
