CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
raibid-ci is a DGX Spark Personal CI Agent Pool - an ephemeral, auto-scaling build system for cross-platform native compilation on NVIDIA DGX Spark. This is a TUI-first, developer-experience-focused tool for provisioning and managing self-hosted CI agents.
Target Hardware
- NVIDIA DGX Spark running Ubuntu 22.04 LTS
- CPU: 20 cores (10x Cortex-X925, 10x Cortex-A725)
- Memory: 128GB LPDDR5x unified memory
- Memory Bandwidth: 273 GB/s
- Storage: Up to 4TB NVMe
- Network: 200 Gb/s ConnectX-7
Technology Stack
Core Infrastructure
- k3s: Lightweight Kubernetes distribution for DGX Spark
- Gitea: Self-hosted Git service with OCI registry
- Flux: GitOps continuous delivery
- KEDA: Kubernetes-based event-driven autoscaling
- Redis Streams: Job queue management
Application Layer
- Rust: Primary language for API server and CLI/TUI client
- Ratatui: Terminal UI framework for management interface
- Nushell: Scripting and automation
Architecture Characteristics
- DX-first: Developer experience is the top priority
- TUI-native: Terminal UI for all management and monitoring
- Ephemeral: Agents spin up on-demand and tear down when idle
- Auto-scaling: KEDA-driven scaling based on job queue depth
- Plugin-based: Extensible architecture for different build agent types
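A minimal sketch of the arithmetic behind queue-depth scaling, for illustration only. `ScalingPolicy`, its fields, and the example values are hypothetical and not part of the project; in the planned design KEDA makes this decision, not application code.

```rust
/// Hypothetical scaling parameters; names and values are assumptions.
#[derive(Debug, Clone, Copy)]
pub struct ScalingPolicy {
    /// Pending jobs one agent is expected to absorb before another is added.
    pub jobs_per_agent: u32,
    /// Lower bound; 0 lets the pool scale to zero when idle (ephemeral agents).
    pub min_agents: u32,
    /// Upper bound imposed by the host.
    pub max_agents: u32,
}

impl ScalingPolicy {
    /// Desired agent count: ceil(queue_depth / jobs_per_agent),
    /// raised to at least `min_agents` and capped at `max_agents`.
    pub fn desired_agents(&self, queue_depth: u32) -> u32 {
        // Treat a zero divisor as 1 so the division below cannot panic.
        let per_agent = self.jobs_per_agent.max(1);
        // Ceiling division without risking overflow on large queue depths.
        let wanted = queue_depth / per_agent + u32::from(queue_depth % per_agent != 0);
        wanted.max(self.min_agents).min(self.max_agents)
    }
}
```

With `jobs_per_agent: 2`, `min_agents: 0`, and `max_agents: 8`, an empty queue yields zero agents, five pending jobs yield three, and a backlog of 100 is capped at eight.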
MVP Scope
Infrastructure Setup
- k3s cluster bootstrapping on DGX Spark
- Gitea installation with OCI registry
- Redis Streams for job queueing
- Flux GitOps configuration for deployments from Gitea repo
- KEDA autoscaler integration
API & Client
- Server-side Rust API for job dispatching and TUI communication
- Client-side Rust CLI tool using Ratatui for management, monitoring, and control
- CLI handles infrastructure setup, configuration, and teardown
CI Agents
- MVP focuses on a single Rust agent for building and testing Rust projects
- Emphasis on scaling, scheduling, monitoring, and caching
Repository Mirroring
- Mirror single GitHub repository to Gitea
- Mirror multiple GitHub repositories via list
- Mirror GitHub organization repositories with regex filtering
- Auto-sync on GitHub push (GitHub is source of truth)
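The three mirroring modes above could be modeled as one type. This is a sketch under stated assumptions: `MirrorSource`, `resolve`, and the `keep` predicate are hypothetical names, and the predicate is a plain closure standing in for the regex filter so the example needs no external crate.

```rust
/// Hypothetical model of the three mirroring modes.
pub enum MirrorSource {
    /// One repository, given as "owner/name".
    Single(String),
    /// An explicit list of "owner/name" entries.
    List(Vec<String>),
    /// Every repository in `org` whose name passes `keep`.
    /// In the planned design `keep` would wrap a compiled regex.
    Organization {
        org: String,
        keep: Box<dyn Fn(&str) -> bool>,
    },
}

impl MirrorSource {
    /// Resolve to concrete "owner/name" strings.
    /// `org_repos` stands in for the repository names an organization
    /// listing would return; it is ignored for `Single` and `List`.
    pub fn resolve(&self, org_repos: &[&str]) -> Vec<String> {
        match self {
            MirrorSource::Single(repo) => vec![repo.clone()],
            MirrorSource::List(repos) => repos.clone(),
            MirrorSource::Organization { org, keep } => org_repos
                .iter()
                .copied()
                .filter(|&name| keep(name))
                .map(|name| format!("{org}/{name}"))
                .collect(),
        }
    }
}
```

Keeping the filter behind a closure means the regex engine can be chosen later without changing the callers of `resolve`.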
Documentation Standards
File Organization
- ./docs/: All research, notes, diagrams, and documentation
- ./docs/work/: Milestones, issues, and tasks (markdown files formatted for GitHub issues)
- ./docs/diagrams/: Mermaid diagrams for architecture visualization
Style Guidelines
- Use terse language and bullet points
- Create Mermaid diagrams for complex concepts
- Include internal and external links/references
- Keep content concise and scannable
- Markdown files should be GitHub-ready (especially issue descriptions)
Development Workflow
Current Phase
The project is in the Planning / MVP Development phase. The immediate focus is on:
- Research and knowledge gathering for required technologies
- Creating comprehensive project plans and documentation
- Architecture design and specification
- No implementation/coding yet - documentation and planning first
Working with This Codebase
- All architectural decisions should be documented in ./docs/
- Use Mermaid diagrams to visualize complex systems and workflows
- When creating issues/tasks, format them as markdown in ./docs/work/ for eventual GitHub submission
- Consider the DGX Spark hardware constraints (20 cores, 128GB RAM, resource reservation needs)
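A rough sketch of how the hardware constraints bound the number of concurrent agents. The host totals come from the Target Hardware section; `AgentBudget`, its fields, and every reservation or per-agent figure are hypothetical planning inputs, not decided values.

```rust
/// Host totals from the Target Hardware section.
pub const HOST_CORES: u32 = 20;
pub const HOST_MEMORY_GB: u32 = 128;

/// Hypothetical reservation and per-agent request sizes.
pub struct AgentBudget {
    /// Cores held back for the OS, k3s, Gitea, Redis, and so on.
    pub reserved_cores: u32,
    /// Memory (GB) held back for the same system services.
    pub reserved_memory_gb: u32,
    /// Cores requested by each build agent.
    pub cores_per_agent: u32,
    /// Memory (GB) requested by each build agent.
    pub memory_gb_per_agent: u32,
}

impl AgentBudget {
    /// Largest agent count that fits both the CPU and the memory budget.
    pub fn max_agents(&self) -> u32 {
        // A zero-sized request would divide by zero; treat it as invalid.
        if self.cores_per_agent == 0 || self.memory_gb_per_agent == 0 {
            return 0;
        }
        let cores = HOST_CORES.saturating_sub(self.reserved_cores);
        let memory = HOST_MEMORY_GB.saturating_sub(self.reserved_memory_gb);
        // The scarcer resource decides.
        (cores / self.cores_per_agent).min(memory / self.memory_gb_per_agent)
    }
}
```

For example, reserving 4 cores and 16GB and giving each agent 2 cores and 16GB leaves room for 7 agents, with memory as the limiting resource.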
Design Principles
- Ephemeral by Default: Agents should be stateless and disposable
- Auto-scaling First: KEDA drives all scaling decisions based on job queue
- GitOps Everything: Flux manages all deployments from Gitea
- TUI for Control: All management through terminal interface
- Cache Aggressively: Optimize for build cache hit rates
- Rust for Performance: Critical path uses Rust for speed and safety
Future Considerations
- Tauri GUI for visual management (beyond TUI)
- Multi-DGX clustering for massive workloads
- GPU time-slicing for ML model testing in CI
- Additional build agent types (beyond Rust)