KEDA Autoscaling Behavior Documentation

Overview

This document describes how KEDA uses ScaledJob to autoscale raibid-ci agents based on Redis Streams queue depth.
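
For orientation, the settings discussed throughout this document combine into a single ScaledJob resource. A condensed sketch (not a complete production manifest):

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: raibid-ci-agent
  namespace: raibid-ci
spec:
  pollingInterval: 10            # How often KEDA checks Redis (seconds)
  maxReplicaCount: 10            # Hard cap on concurrent Jobs
  successfulJobsHistoryLimit: 3  # Keep last 3 successful Jobs
  failedJobsHistoryLimit: 5      # Keep last 5 failed Jobs
  jobTargetRef:
    template:
      spec:
        restartPolicy: Never
        containers:
        - name: rust-agent
          image: ghcr.io/raibid-labs/rust-agent:latest
  triggers:
  - type: redis-streams
    metadata:
      address: raibid-redis-master.raibid-redis.svc.cluster.local:6379
      stream: raibid:jobs
      consumerGroup: raibid-workers
      pendingEntriesCount: "1"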

Scaling Lifecycle

1. Queue Empty (Scale-to-Zero)

State: No jobs in Redis Streams
KEDA Action: No Kubernetes Jobs exist
Resource Usage: Zero (only the KEDA operator running)

Redis Queue: []
Kubernetes Jobs: 0
Agent Pods: 0
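
This idle state can be verified directly with commands used elsewhere in this document:

# No Jobs or pods exist in the namespace
kubectl get jobs,pods -n raibid-ci

# The stream has no entries
redis-cli XLEN raibid:jobs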

2. Job Added to Queue

Event: Developer pushes code, triggering a CI job
Action: Job dispatcher adds an entry to Redis Streams

XADD raibid:jobs * \
  job_id abc123 \
  repo raibid-labs/app \
  branch feature/new \
  commit def456

Redis State:

Stream: raibid:jobs
Length: 1
Pending Entries: 1 (in consumer group raibid-workers)

3. KEDA Detects Job (Polling)

Timing: Within 10 seconds (polling interval)
Action: KEDA queries the Redis Streams trigger

KEDA runs this logic:

pending_count = XPENDING raibid:jobs raibid-workers
if pending_count >= pendingEntriesCount:  # threshold = 1
    create_kubernetes_job()

KEDA Logs:

[INFO] scaledjob/raibid-ci-agent: Scaling from 0 to 1 jobs
[INFO] redis-streams: pending entries = 1, threshold = 1

4. Kubernetes Job Created

Timing: Within 15 seconds of queue detection
Action: KEDA creates a Job from the ScaledJob template

apiVersion: batch/v1
kind: Job
metadata:
  name: raibid-ci-agent-abc123
  namespace: raibid-ci
  ownerReferences:
  - apiVersion: keda.sh/v1alpha1
    kind: ScaledJob
    name: raibid-ci-agent
spec:
  template:
    spec:
      containers:
      - name: rust-agent
        image: ghcr.io/raibid-labs/rust-agent:latest
        # ... environment, resources, etc

5. Pod Scheduled and Running

Timing: Within 30 seconds (image pull + pod start)
Actions:

  1. Kubernetes scheduler assigns pod to node
  2. Kubelet pulls container image (if not cached)
  3. Container starts, agent begins execution

Agent Actions:

1. Connect to Redis
2. Read from the consumer group: XREADGROUP GROUP raibid-workers consumer1 STREAMS raibid:jobs >
3. Process job (build, test, etc.)
4. Acknowledge the entry: XACK raibid:jobs raibid-workers <entry-id>
5. Exit with code 0 (success) or 1 (failure)
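
The agent itself is the rust-agent binary; purely as an illustration of the same protocol, the read/ack cycle can be walked through by hand with redis-cli (the entry ID below is made up):

# Claim one never-delivered entry ('>') for this consumer
redis-cli XREADGROUP GROUP raibid-workers consumer1 COUNT 1 STREAMS raibid:jobs '>'

# ... build and test would run here ...

# Acknowledge the entry so it leaves the group's pending list
redis-cli XACK raibid:jobs raibid-workers 1730455200000-0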

6. Job Completion

Timing: Variable (depends on job duration)
Actions:

  • Pod exits
  • Job status updated to Complete or Failed
  • Pod enters Completed state

Kubernetes State:

Job: raibid-ci-agent-abc123
  Status: Complete
  Succeeded: 1
  Failed: 0
  Start Time: 2025-11-01T10:00:00Z
  Completion Time: 2025-11-01T10:05:30Z
  Duration: 5m30s
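
The same state is visible from the cluster (Job name from step 4; output abbreviated):

kubectl get job raibid-ci-agent-abc123 -n raibid-ci
# NAME                     COMPLETIONS   DURATION   AGE
# raibid-ci-agent-abc123   1/1           5m30s      6m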

7. Job History Management

KEDA Action: Keep completed jobs based on history limits

successfulJobsHistoryLimit: 3  # Keep last 3 successful
failedJobsHistoryLimit: 5      # Keep last 5 failed

Old jobs are automatically deleted to prevent resource accumulation.
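
The retained Jobs are visible with a plain listing; completed Jobs show COMPLETIONS 1/1 until the limits above rotate them out:

kubectl get jobs -n raibid-ci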

8. Return to Scale-to-Zero

Condition: No pending entries in Redis Streams
Timing: Immediate (no cooldown for ScaledJob)
Action: No new jobs created

Redis Queue: [] (all processed)
Kubernetes Jobs: 3 (completed, kept for history)
Active Pods: 0

Scaling Scenarios

Scenario A: Single Job

Time  Queue  Jobs  Pods  Action
0s    0      0     0     Idle
10s   1      0     0     KEDA detects job
15s   1      1     0     Job created
20s   1      1     1     Pod running
5m    0      1     0     Job complete, pod terminated

Scenario B: Burst of Jobs

Time  Queue  Jobs  Pods  Action
0s    0      0     0     Idle
5s    10     0     0     10 jobs added
15s   10     10    5     KEDA creates 10 jobs, 5 pods running
20s   10     10    10    All 10 pods running (max replicas)
25s   8      10    8     2 jobs complete
30s   5      10    5     5 jobs complete
35s   2      10    2     8 jobs complete
40s   0      10    0     All 10 jobs complete, pods terminated

Scenario C: Continuous Flow

Time  Queue  Jobs  Pods  Action
0s    5      5     5     5 jobs processing
10s   5      5     5     2 complete, 2 new jobs added
20s   5      5     5     Steady state (jobs in = jobs out)
30s   8      8     8     Burst: 3 new jobs added
40s   10     10    10    Max replicas reached
50s   7      10    10    3 jobs complete
60s   5      7     7     Back to steady state

Scenario D: Overload (Queue Backup)

Time  Queue  Jobs  Pods  Action
0s    50     0     0     Massive job backlog
10s   50     10    5     KEDA creates max jobs (10), 5 running
20s   50     10    10    All 10 pods running
5m    45     10    10    5 jobs complete, 5 new jobs started
10m   40     10    10    Still at max capacity
15m   30     10    10    Processing continues
...
Queue processes at: 10 jobs per average_job_duration

Key Point: The queue drains at maximum throughput (10 concurrent jobs). Excess jobs wait in Redis.

Scaling Triggers

Redis Streams Trigger

KEDA queries Redis for pending entries:

# What KEDA runs
XPENDING raibid:jobs raibid-workers
 
# Returns (summary form):
# [
#   pending_count,
#   lowest_pending_id,
#   highest_pending_id,
#   consumers
# ]

Scaling Logic:

def should_scale():
    # Pending entries in the consumer group (via XPENDING)
    pending = get_pending_count()
    # Kubernetes Jobs currently running for this ScaledJob
    running = get_running_jobs()

    # Never exceed maxReplicaCount
    desired = min(pending, max_replicas)

    if desired > running:
        create_jobs(desired - running)

    # ScaledJob does not scale down; Jobs complete and exit on their own

Trigger Metadata

metadata:
  address: raibid-redis-master.raibid-redis.svc.cluster.local:6379
  stream: raibid:jobs
  consumerGroup: raibid-workers
  pendingEntriesCount: "1"  # Minimum to trigger
  streamLength: "5"         # Total stream length threshold
  lagCount: "5"             # Consumer lag threshold
  activationLagCount: "0"   # Start scaling immediately
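
If Redis requires a password, the trigger should reference a TriggerAuthentication rather than embed credentials. A sketch using the raibid-redis-auth secret mentioned under Troubleshooting (the redis-password key name is an assumption):

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: raibid-redis-auth
  namespace: raibid-ci
spec:
  secretTargetRef:
  - parameter: password       # Parameter name the redis-streams scaler expects
    name: raibid-redis-auth   # Existing Kubernetes Secret
    key: redis-password       # Assumed key inside the Secret

# Referenced from the trigger:
# triggers:
# - type: redis-streams
#   authenticationRef:
#     name: raibid-redis-auth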

Multi-Metric Scaling

KEDA can combine multiple metrics:

triggers:
- type: redis-streams
  metadata:
    pendingEntriesCount: "1"
- type: cron
  metadata:
    timezone: UTC
    start: 0 8 * * 1-5    # 8 AM weekdays
    end: 0 18 * * 1-5     # 6 PM weekdays
    desiredReplicas: "5"  # Keep 5 warm during business hours

Scaling Strategies

Default Strategy

Algorithm: 1 Job per pending entry
Behavior: Conservative, predictable

scalingStrategy:
  strategy: "default"

Example:

  • 5 pending entries → 5 Jobs created
  • 20 pending entries, max 10 → 10 Jobs created

Accurate Strategy

Algorithm: Precise calculation, minimal overprovisioning
Behavior: Slower to scale, most efficient

scalingStrategy:
  strategy: "accurate"

Use Case: Cost-sensitive environments, predictable workloads

Eager Strategy

Algorithm: Aggressive scaling
Behavior: Fast response, may overprovision

scalingStrategy:
  strategy: "eager"

Use Case: Time-sensitive CI, fast feedback required

Custom Strategy

Algorithm: User-defined logic

scalingStrategy:
  strategy: "custom"
  customScalingQueueLengthDeduction: 1
  customScalingRunningJobPercentage: "0.5"
  pendingPodConditions:
  - "Ready"
  - "PodScheduled"

Parameters:

  • customScalingQueueLengthDeduction: Subtract from queue length (accounts for already-running jobs)
  • customScalingRunningJobPercentage: Consider percentage of running jobs
  • pendingPodConditions: Wait for pod conditions before counting as “running”

Performance Characteristics

Latency Metrics

Metric                    Target  Actual (Typical)
Queue detection           10s     5-15s (polling interval)
Job creation              5s      2-5s
Pod start (cached image)  10s     5-15s
Pod start (pull image)    60s     30-120s
Total (cached)            25s     15-35s
Total (uncached)          75s     45-150s

Throughput

Maximum Throughput: max_replicas / average_job_duration

Examples:

  • 10 max replicas, 5-minute jobs: 2 jobs/minute = 120 jobs/hour
  • 10 max replicas, 30-second jobs: 20 jobs/minute = 1,200 jobs/hour

Resource Efficiency

Idle Cost: $0 (scale-to-zero)
Active Cost: Only running jobs
Overhead: KEDA operator (~250m CPU, ~320Mi RAM)

Scaling Policies

Job History Retention

successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 5

Why:

  • Keep recent jobs for debugging
  • Prevent resource accumulation
  • Failed jobs retained longer for troubleshooting

Polling Interval

pollingInterval: 10  # seconds

Trade-offs:

  • Lower (5s): Faster response, higher Redis load
  • Higher (30s): Lower overhead, slower response

Recommendation: 10s for most workloads

Maximum Replicas

maxReplicaCount: 10

Calculation:

max_replicas = min(
    available_cluster_resources / job_resource_request,
    desired_parallelism,
    cost_budget_limit
)

DGX Spark Example (20 cores, 128GB RAM):

# Each job: 1 CPU, 2GB RAM
max_cpu_replicas = 20 / 1 = 20
max_mem_replicas = 128 / 2 = 64
max_replicas = min(20, 64) = 20
 
# With 50% reserved for system:
max_replicas = 10

Autoscaling Best Practices

1. Right-Size Resource Requests

resources:
  requests:
    cpu: 1000m      # Based on actual usage
    memory: 2Gi     # 80% of typical usage
  limits:
    cpu: 4000m      # 4x requests for burst headroom
    memory: 8Gi     # 4x requests for burst headroom

2. Use Consumer Groups Correctly

# Create consumer group before deploying ScaledJob
redis-cli XGROUP CREATE raibid:jobs raibid-workers 0 MKSTREAM
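
Verify the group before KEDA starts polling:

# List consumer groups on the stream
redis-cli XINFO GROUPS raibid:jobs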

3. Monitor Queue Depth

# Watch queue depth
watch -n 5 'redis-cli XLEN raibid:jobs'
 
# Check pending entries
redis-cli XPENDING raibid:jobs raibid-workers

4. Set Appropriate Job Timeouts

jobTargetRef:
  template:
    spec:
      activeDeadlineSeconds: 3600  # Kill after 1 hour
      backoffLimit: 2               # Retry failed jobs twice

5. Implement Health Checks

containers:
- name: rust-agent
  livenessProbe:
    exec:
      command: ["/bin/sh", "-c", "pgrep -f rust-agent"]
    initialDelaySeconds: 30
    periodSeconds: 10
  readinessProbe:
    exec:
      command: ["/bin/sh", "-c", "test -f /tmp/healthy"]
    initialDelaySeconds: 5
    periodSeconds: 5

6. Use Pod Anti-Affinity

Spread jobs across nodes:

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: raibid-ci-agent
        topologyKey: kubernetes.io/hostname

7. Enable Metrics Collection

# Agent should expose metrics
- name: METRICS_ENABLED
  value: "true"
- name: METRICS_PORT
  value: "9090"

Troubleshooting Scaling Issues

Jobs Not Scaling

Symptom: Queue has jobs, but no Kubernetes Jobs created

Debug Steps:

# 1. Check KEDA operator logs
kubectl logs -n keda -l app=keda-operator --tail=100
 
# 2. Check ScaledJob status
kubectl describe scaledjob raibid-ci-agent -n raibid-ci
 
# 3. Verify trigger authentication
kubectl get secret raibid-redis-auth -n raibid-ci
 
# 4. Test Redis connection
kubectl run redis-test --rm -it --image=redis -- \
  redis-cli -h raibid-redis-master.raibid-redis.svc.cluster.local PING
 
# 5. Check pending entries
kubectl exec -n raibid-redis raibid-redis-master-0 -- \
  redis-cli XPENDING raibid:jobs raibid-workers

Slow Scaling

Symptom: Jobs created but pods take too long to start

Debug Steps:

# Check pod events
kubectl describe pod -n raibid-ci <pod-name>
 
# Common issues:
# - Image pull (pre-pull the image to all nodes)
# - Resource constraints (check node resources)
# - Scheduling delays (check node availability)

Stuck Jobs

Symptom: Jobs running but never complete

Debug Steps:

# Check job logs
kubectl logs -n raibid-ci job/<job-name>
 
# Check Redis ACK
kubectl exec -n raibid-redis raibid-redis-master-0 -- \
  redis-cli XPENDING raibid:jobs raibid-workers
 
# Common issues:
# - Agent not calling XACK
# - Agent crashed before completion
# - Redis connection lost

Metrics and Monitoring

Key Metrics to Track

  1. Queue Depth: XLEN raibid:jobs
  2. Pending Entries: XPENDING raibid:jobs raibid-workers
  3. Active Jobs: kubectl get jobs -n raibid-ci
  4. Job Success Rate: successful_jobs / total_jobs
  5. Average Job Duration: Time from start to completion
  6. Time to Scale: Time from queue add to pod running
  7. Resource Utilization: CPU/memory usage per job
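
A snapshot of the first three can be taken with one hypothetical helper script built from the commands above:

#!/bin/sh
# Hypothetical helper: snapshot queue depth, pending entries, and Jobs
echo "Queue depth:"
redis-cli XLEN raibid:jobs

echo "Pending entries:"
redis-cli XPENDING raibid:jobs raibid-workers

echo "Jobs (includes completed ones kept for history):"
kubectl get jobs -n raibid-ci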

Prometheus Metrics

# KEDA exposes metrics
- keda_scaler_errors_total
- keda_scaled_job_paused
- keda_scaledjob_max_replicas
