o8 llama - AI Workload Management
The o8 llama command group manages AI workloads in Orchestr8: scaffolding, validation, deployment, status checks, logs, and provider inspection.
Overview
Orchestr8 AI workloads are built on Llama-Stack, providing enterprise-ready deployment and management of LLM, RAG, and agentic applications. The o8 llama commands handle the complete lifecycle from initialization to monitoring.
Commands
o8 llama init
Initialize a new AI workload module.
o8 llama init <name> [OPTIONS]
Arguments:
- name - AI workload name (required)
Options:
- --template, -t - Workload template: rag, agent, inference, custom (default: rag)
- --path, -p - Directory in which to create the workload (default: current directory)
- --provider - AI provider: openai, anthropic, groq, local (default: openai)
Examples:
# Create a RAG application with OpenAI
o8 llama init my-rag-app --template rag --provider openai
# Create an agentic workflow with Anthropic
o8 llama init my-agent --template agent --provider anthropic
# Create a custom workload in a specific directory
o8 llama init my-app --template custom --path ./projects/
Generated Structure:
my-rag-app/
├── .o8/
│   ├── module.yaml      # AI workload specification
│   └── security.yaml    # Security policies
├── base/                # Base Kubernetes manifests
│   ├── kustomization.yaml
│   ├── deployment.yaml
│   ├── service.yaml
│   └── ...
├── overlays/            # Environment-specific configs
│   ├── dev/
│   ├── staging/
│   └── production/
└── tests/               # Automated tests
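After scaffolding, the layout above can be checked with a short script. The following is a minimal sketch, not part of the o8 CLI: the function name check_workload_layout is hypothetical, and the list of paths is copied from the tree above (the entries elided by "..." are not checked).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the o8 CLI): report which of the paths
# shown in the generated structure are missing from a workload directory.
check_workload_layout() {
  local root="${1:-.}"
  local missing=0
  local path
  for path in \
    .o8/module.yaml \
    .o8/security.yaml \
    base/kustomization.yaml \
    base/deployment.yaml \
    base/service.yaml \
    overlays/dev \
    overlays/staging \
    overlays/production \
    tests
  do
    if [ ! -e "$root/$path" ]; then
      echo "missing: $path"
      missing=1
    fi
  done
  # Returns 0 when every listed path exists, 1 otherwise.
  return "$missing"
}
```

Calling check_workload_layout ./my-rag-app prints one line per missing path and returns non-zero if anything is absent.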
o8 llama validate
Validate an AI workload configuration.
o8 llama validate [PATH] [OPTIONS]
Arguments:
- path - Path to AI workload directory (default: current directory)
Options:
- --verbose, -v - Show detailed validation information
Validation Checks:
- ✅ YAML syntax and structure
- ✅ AI-specific configuration fields
- ✅ GPU requirements and node availability
- ✅ Storage requirements and classes
- ✅ Provider secrets and credentials
- ✅ Kubernetes manifest validity
- ✅ Kustomize build success
Examples:
# Validate current directory
o8 llama validate
# Validate specific workload
o8 llama validate ./my-rag-app --verbose
# Quick validation check
o8 llama validate ./my-app
Sample Output:
✅ AI workload validation passed!
AI Configuration:
├── Providers: openai, anthropic
├── Capabilities: inference, safety, agents
├── GPU Nodes: 2 available
└── Storage: 300Gi required, fast-ssd available
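When a repository holds several workloads, validation can be run over each of them in turn. This is a sketch under stated assumptions: validate_all is a hypothetical helper, a directory is treated as a workload if it contains .o8/module.yaml, and o8 llama validate is assumed to return a non-zero exit code when validation fails.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate every workload directory under a parent
# directory. A directory counts as a workload if it contains .o8/module.yaml.
validate_all() {
  local parent="${1:-.}"
  local failed=0
  local dir
  for dir in "$parent"/*/; do
    # Skip directories that do not look like o8 workloads.
    [ -f "${dir}.o8/module.yaml" ] || continue
    if o8 llama validate "$dir"; then
      echo "ok: $dir"
    else
      echo "failed: $dir"
      failed=1
    fi
  done
  # Returns 1 if any workload failed validation, 0 otherwise.
  return "$failed"
}
```

Running validate_all ./projects reports one line per workload, which makes it suitable as a pre-deploy check in a CI job.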
o8 llama deploy
Deploy an AI workload to the cluster.
o8 llama deploy [PATH] [OPTIONS]
Arguments:
- path - AI workload path (default: current directory)
Options:
- --env, -e - Target environment: dev, staging, production (default: dev)
- --dry-run - Show what would be deployed without applying changes
- --wait/--no-wait - Wait for deployment completion (default: --wait)
Examples:
# Deploy to development environment
o8 llama deploy
# Preview a production deployment without applying changes
o8 llama deploy --env production --dry-run
# Deploy without waiting
o8 llama deploy --env staging --no-wait
Deployment Process:
- 🔍 Validation - Runs configuration validation
- 🔧 Build - Generates Kubernetes manifests with Kustomize
- 🚀 Deploy - Applies manifests to cluster
- ⏳ Wait - Monitors deployment progress (if --wait)
- ✅ Verify - Confirms successful deployment
o8 llama status
Check the status of AI workloads.
o8 llama status [OPTIONS]
Options:
- --namespace, -n - Kubernetes namespace (default: llama-stack)
- --watch, -w - Watch for changes (continuous monitoring)
Status Information:
- 📊 Pod Status - Running, pending, failed pods
- 🔥 GPU Utilization - GPU allocation and usage
- 🏥 Health Checks - Llama-Stack health and readiness
- 🤖 Available Models - Configured and loaded models
- 🔌 Providers - Connected AI providers
- 💾 Storage - Persistent volume usage
Examples:
# Check status once
o8 llama status
# Monitor continuously
o8 llama status --watch
# Check specific namespace
o8 llama status --namespace my-ai-app
Sample Output:
✅ Llama-Stack Health: Healthy
AI Workload Pods:
NAME                          READY   STATUS    GPU   AGE
llama-stack-7d4b8f9c8-x2m9p   1/1     Running   1/1   5m
Available Models:
┌──────────────────────────────────┬────────────────┬──────┐
│ Model ID                         │ Provider       │ Type │
├──────────────────────────────────┼────────────────┼──────┤
│ meta-llama/Llama-3.2-3B-Instruct │ remote::ollama │ llm  │
│ gpt-4                            │ openai         │ llm  │
└──────────────────────────────────┴────────────────┴──────┘
Configured Providers:
• inference: 3 configured
• safety: 1 configured
• vector-io: 2 configured
o8 llama logs
View AI workload logs.
o8 llama logs [OPTIONS]
Options:
- --namespace, -n - Kubernetes namespace (default: llama-stack)
- --follow, -f - Follow log output (stream logs)
- --tail - Number of lines to show (default: 100)
Examples:
# View recent logs
o8 llama logs
# Stream logs continuously
o8 llama logs --follow
# Show last 50 lines
o8 llama logs --tail 50
# View logs from a specific namespace
o8 llama logs --namespace my-ai-app --follow
Log Categories:
- 🚀 Startup - Application initialization
- 🔧 Configuration - Provider and model loading
- 📨 Requests - API request processing
- ⚠️ Errors - Error conditions and warnings
- 📊 Metrics - Performance and usage metrics
o8 llama providers
List and manage AI providers.
o8 llama providers [OPTIONS]
Options:
- --namespace, -n - Kubernetes namespace (default: llama-stack)
- --type, -t - Filter by provider type: inference, safety, vector-io, memory
- --provider - Show specific provider details
Provider Types:
- inference - LLM providers (OpenAI, Anthropic, Groq, etc.)
- safety - Content safety and moderation
- vector-io - Vector databases (ChromaDB, Qdrant, etc.)
- memory - Persistent memory and context
- telemetry - Monitoring and metrics
- datasets - Data management
Examples:
# List all providers
o8 llama providers
# Show only inference providers
o8 llama providers --type inference
# Show specific provider details
o8 llama providers --type inference --provider openai
Sample Output:
Inference Providers:
┌────────────────┬────────────────┬────────────────────────────┐
│ Provider ID    │ Type           │ Configuration              │
├────────────────┼────────────────┼────────────────────────────┤
│ openai         │ openai         │ url: https://api.openai.c… │
│ anthropic      │ anthropic      │ url: https://api.anthropi… │
│ remote::ollama │ remote::ollama │ url: http://ollama.platfo… │
└────────────────┴────────────────┴────────────────────────────┘
Vector-Io Providers:
┌────────────────┬────────────────┬────────────────────────────┐
│ Provider ID    │ Type           │ Configuration              │
├────────────────┼────────────────┼────────────────────────────┤
│ chromadb       │ chromadb       │ host: chromadb.llama-stac… │
└────────────────┴────────────────┴────────────────────────────┘
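To get a per-type overview, the --type filter can be applied once per provider type. This is a small sketch: list_providers_by_type is a hypothetical helper, and it loops only over the four values that the --type option is documented to accept.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list providers for each type accepted by --type,
# printing a heading before each section.
list_providers_by_type() {
  local ptype
  for ptype in inference safety vector-io memory; do
    echo "== $ptype =="
    # Keep going if one type cannot be listed; report it on stderr.
    o8 llama providers --type "$ptype" || echo "(could not list $ptype providers)" >&2
  done
}
```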
Global Options
All o8 llama commands support these global options:
- --help, -h - Show command help
- --verbose, -v - Enable verbose output
- --config - Specify custom config file path
Configuration Files
AI Workload Specification (.o8/module.yaml)
apiVersion: orchestr8.platform/v1alpha1
kind: ModuleSpecification
metadata:
  name: my-rag-app
spec:
  module:
    name: my-rag-app
    version: 0.1.0
    tier: enterprise
    description: RAG application with OpenAI integration
  requirements:
    compute:
      gpu:
        enabled: true
        type: nvidia.com/gpu
        count: 1
      memory:
        requests: 4Gi
        limits: 16Gi
    storage:
      modelCache:
        size: 100Gi
        storageClass: fast-ssd
      vectorStore:
        size: 200Gi
        storageClass: fast-ssd
  aiSpecific:
    providers:
      inference: [openai, anthropic]
      vector-io: [chromadb]
    capabilities: [inference, safety, agents, vector-io]
    modelSupport:
      formats: [huggingface, onnx]
      sizes: [small, medium, large]
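The two storage sizes in this specification (100Gi and 200Gi) add up to the 300Gi figure shown in the sample validation output. The sketch below totals the size entries of a module.yaml before deployment. It is an illustration only: total_storage_gi is a hypothetical helper, it assumes every size is written as "size: <N>Gi", and it does not claim to reproduce how o8 computes its own figure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum every "size: <N>Gi" entry in a module.yaml.
# Assumes all sizes use the Gi suffix; entries in other units are ignored.
total_storage_gi() {
  awk '
    $1 == "size:" && $2 ~ /^[0-9]+Gi$/ {
      sub(/Gi$/, "", $2)   # strip the unit, keep the number
      total += $2
    }
    END { printf "%dGi\n", total }
  ' "$1"
}
```

For the specification above, total_storage_gi .o8/module.yaml prints 300Gi.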
Environment Variables
- KUBECONFIG - Path to kubeconfig file
- O8_NAMESPACE - Default namespace for operations
- O8_CONFIG_DIR - Configuration directory path
- O8_LOG_LEVEL - Logging level (DEBUG, INFO, WARN, ERROR)
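These variables are typically exported in a shell profile or CI job before any o8 command runs. The sketch below guards against a mistyped log level. It is illustrative only: set_o8_log_level is a hypothetical helper, and it assumes the level must match one of the four documented values exactly, including case.

```shell
#!/usr/bin/env bash
# Hypothetical helper: export O8_LOG_LEVEL only if the value is one of the
# documented levels (DEBUG, INFO, WARN, ERROR); otherwise leave it unchanged.
set_o8_log_level() {
  case "$1" in
    DEBUG|INFO|WARN|ERROR)
      export O8_LOG_LEVEL="$1"
      ;;
    *)
      echo "unsupported log level: $1" >&2
      return 2
      ;;
  esac
}
```

A session might then run set_o8_log_level DEBUG followed by export O8_NAMESPACE=my-ai-app, where my-ai-app is the example namespace used earlier in this document.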
Exit Codes
- 0 - Success
- 1 - General error
- 2 - Invalid arguments
- 3 - Kubernetes connection error
- 4 - Validation error
- 5 - Deployment error
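Scripts can branch on these codes to report failures in readable form. The following is a minimal sketch: describe_o8_exit and run_o8 are hypothetical helper names, and the messages are copied from the list above.

```shell
#!/usr/bin/env bash
# Map a documented o8 exit code to its description.
describe_o8_exit() {
  case "$1" in
    0) echo "Success" ;;
    1) echo "General error" ;;
    2) echo "Invalid arguments" ;;
    3) echo "Kubernetes connection error" ;;
    4) echo "Validation error" ;;
    5) echo "Deployment error" ;;
    *) echo "Unknown exit code $1" ;;
  esac
}

# Run an o8 command, print a readable result on stderr, and return the
# same exit code that o8 returned.
run_o8() {
  local code=0
  o8 "$@" || code=$?
  echo "o8 $*: $(describe_o8_exit "$code")" >&2
  return "$code"
}
```

For example, run_o8 llama validate ./my-app reports "Validation error" on stderr if the command exits with code 4, and the caller still sees that exit code.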
Examples
Complete AI Workload Workflow
# 1. Initialize RAG application
o8 llama init customer-support-rag --template rag --provider openai
# 2. Navigate to workload
cd customer-support-rag
# 3. Customize configuration (edit .o8/module.yaml as needed)
# 4. Validate configuration
o8 llama validate --verbose
# 5. Deploy to development
o8 llama deploy --env dev
# 6. Monitor status
o8 llama status --watch
# 7. Check logs if needed
o8 llama logs --follow
# 8. Verify providers
o8 llama providers --type inference
Multi-Environment Deployment
# Deploy to development first
o8 llama deploy --env dev
# Verify in development
o8 llama status
# Preview production deployment
o8 llama deploy --env production --dry-run
# Deploy to production
o8 llama deploy --env production
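The steps above can be wrapped so that a failure at any stage stops the promotion. This is a sketch under stated assumptions: promote_to_production is a hypothetical function name, and it assumes each o8 command returns a non-zero exit code on failure, including o8 llama status when the workload is unhealthy.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the multi-environment steps: stop at the
# first failing command so production is never touched after a failed
# dev deployment, status check, or dry run.
promote_to_production() {
  local path="${1:-.}"
  o8 llama deploy "$path" --env dev || return $?
  o8 llama status || return $?
  o8 llama deploy "$path" --env production --dry-run || return $?
  o8 llama deploy "$path" --env production
}
```

Calling promote_to_production ./customer-support-rag returns the exit code of the first command that failed, or the result of the final production deployment.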