AI Workloads

Orchestr8 manages AI workloads on Kubernetes, integrating Llama-Stack with Orchestr8's security, multi-tenancy, and operational tooling (GitOps deployment, GPU scheduling, and monitoring).

What orchestr8 Adds to AI Workloads

Enterprise Security: Zero-trust networking, pod security standards, secrets management
Multi-Tenancy: Namespace isolation with RBAC and resource quotas
GitOps Integration: AI workloads deployed via ArgoCD with full audit trails
GPU Orchestration: Intelligent GPU scheduling across multiple tenants
Compliance: Built-in SOC2, GDPR, and HIPAA controls for AI applications
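
As an illustration of the multi-tenancy point above, per-tenant limits in plain Kubernetes are usually expressed as a ResourceQuota on the tenant's namespace. The sketch below is not taken from Orchestr8's generated manifests; the namespace name `tenant-a`, the quota name, and all numbers are assumptions chosen for the example.

```yaml
# Hypothetical quota for one tenant namespace; names and values are placeholders.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ai-tenant-quota
  namespace: tenant-a
spec:
  hard:
    requests.cpu: "16"
    requests.memory: 64Gi
    limits.cpu: "32"
    limits.memory: 128Gi
    # Extended resources such as GPUs can only be capped via the "requests." prefix.
    requests.nvidia.com/gpu: "4"
```

With a quota like this in place, pods in the namespace that would push total GPU requests above four are rejected at admission time.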

Quick Start

1. Initialize Your First AI Workload

# Create a RAG application
o8 llama init my-rag-app --template rag --provider openai

# Create an agentic workflow
o8 llama init my-agent --template agent --provider anthropic

# Create a simple inference service
o8 llama init my-inference --template inference --provider groq

2. Configure Your Workload

Navigate to your workload directory and customize the configuration:

cd my-rag-app

The generated structure includes:

.o8/module.yaml - AI workload specification
base/ - Kubernetes manifests
overlays/ - Environment-specific configurations
tests/ - Automated testing setup

3. Deploy to Your Cluster

# Validate configuration
o8 llama validate

# Deploy to development environment
o8 llama deploy --environment dev

# Check deployment status
o8 llama status

AI Workload Templates

Orchestr8 provides pre-configured templates for common AI use cases:

🔍 RAG Applications

Retrieval-Augmented Generation applications with vector search capabilities.

Features:

Document ingestion and chunking
Vector embedding generation
Semantic search integration
Context-aware response generation

Components:

Vector database (ChromaDB, Qdrant, or PGVector)
Embedding service
Document processing pipeline
Query interface

🤖 Agentic Workflows

Multi-step reasoning applications with tool integration.

Features:

Multi-turn conversation handling
Tool calling and execution
Memory management
Safety guardrails

Components:

Agent orchestrator
Tool registry
Memory storage
Safety filters

⚡ Inference Services

High-performance model serving for real-time applications.

Features:

Model loading and caching
Auto-scaling based on demand
Load balancing
Performance monitoring

Components:

Model server
Load balancer
Monitoring stack
Caching layer

🛠️ Custom Workloads

Flexible template for building specialized AI applications.

Features:

Customizable AI pipeline
Multi-provider support
Resource optimization
Monitoring integration

AI Providers

Orchestr8 supports multiple AI providers through standardized configuration:

Cloud Providers

OpenAI: GPT models, embeddings, and fine-tuned models
Anthropic: Claude models with advanced reasoning
Groq: High-speed inference with specialized hardware
AWS Bedrock: Enterprise AI models with AWS integration

Local Deployment

Ollama: Local model serving with GPU acceleration
Hugging Face: Open-source models and transformers
Custom Models: Deploy your own fine-tuned models

Vector Databases

ChromaDB: Simple and efficient vector storage
Qdrant: High-performance vector search engine
PGVector: PostgreSQL extension for vector operations

Configuration

Provider Configuration

AI providers are configured through Kubernetes secrets managed by External Secrets Operator:

# Example: OpenAI provider configuration
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: llama-stack-api-keys
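spec:
  # Placeholder: reference the SecretStore or ClusterSecretStore configured in your cluster
  secretStoreRef:
    name: <your-secret-store>
    kind: ClusterSecretStore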
  target:
    name: llama-stack-api-keys
    template:
      data:
        OPENAI_API_KEY: "{{ .openai_api_key }}"
  data:
    - secretKey: openai_api_key
      remoteRef:
        key: /orchestr8/llama-stack/api-keys
        property: openai_api_key
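
Once External Secrets Operator has synced the `llama-stack-api-keys` secret, a workload can read the key as an environment variable. The fragment below is a minimal sketch using standard Kubernetes fields; the pod name and image are placeholders, and the `llama-stack` namespace is assumed from the troubleshooting commands in this guide. How Orchestr8's templates actually wire the secret may differ.

```yaml
# Sketch: consuming the synced secret from a container.
apiVersion: v1
kind: Pod
metadata:
  name: llama-stack-example        # placeholder name
  namespace: llama-stack           # assumed namespace
spec:
  containers:
    - name: llama-stack
      image: example.com/llama-stack:latest   # placeholder image
      env:
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: llama-stack-api-keys   # target.name from the ExternalSecret
              key: OPENAI_API_KEY          # key produced by the template
```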

Resource Management

AI workloads need explicit GPU, CPU, and memory requests and limits:

# Example: GPU resource configuration
spec:
  requirements:
    compute:
      gpu:
        enabled: true
        type: nvidia.com/gpu
        count: 1
        memory: 16Gi
      cpu:
        requests: 2000m
        limits: 8000m
      memory:
        requests: 4Gi
        limits: 16Gi
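
For orientation, the CPU, memory, and GPU count above correspond roughly to the following container-level settings in plain Kubernetes. This is a sketch of the equivalent, not the manifest Orchestr8 generates; the pod name and image are placeholders. The `gpu.memory` field has no direct counterpart in the standard `resources` block, so it is left out here.

```yaml
# Sketch: plain-Kubernetes equivalent of the requirements shown above.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-example                 # placeholder name
spec:
  containers:
    - name: inference
      image: example.com/inference:latest   # placeholder image
      resources:
        requests:
          cpu: 2000m
          memory: 4Gi
          nvidia.com/gpu: 1   # if set, GPU requests must equal GPU limits
        limits:
          cpu: 8000m
          memory: 16Gi
          nvidia.com/gpu: 1   # GPUs are requested in whole units
```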

Storage Configuration

AI applications need persistent, fast storage for the model cache and the vector store:

# Example: Storage configuration
spec:
  requirements:
    storage:
      modelCache:
        type: persistent
        size: 100Gi
        storageClass: fast-ssd
      vectorStore:
        type: persistent
        size: 200Gi
        storageClass: fast-ssd
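
In plain Kubernetes terms, a persistent `modelCache` entry like the one above would typically be backed by a PersistentVolumeClaim. The claim below is a sketch under that assumption; the claim name and access mode are illustrative, and the `fast-ssd` storage class must already exist in the cluster.

```yaml
# Sketch: a claim matching the modelCache settings above.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-cache            # placeholder name
spec:
  accessModes:
    - ReadWriteOnce            # assumption: a single node mounts the cache
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 100Gi
```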

Security

AI workloads in Orchestr8 follow enterprise security best practices:

Network Security

Default-deny network policies with explicit allow rules
Istio service mesh for mTLS communication
Pod Security Standards with restricted profiles
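
A default-deny posture with explicit allow rules can be sketched with two standard NetworkPolicy objects. The `llama-stack` namespace and the `app.kubernetes.io/name=llama-stack` label come from the troubleshooting commands in this guide; the client namespace `my-rag-app` and port 8321 are assumptions made for the example, and the policies Orchestr8 ships may differ.

```yaml
# Deny all ingress and egress for every pod in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: llama-stack
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
# Explicitly allow one client namespace to reach the Llama-Stack pods.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-rag-app-ingress
  namespace: llama-stack
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: llama-stack
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: my-rag-app   # hypothetical client namespace
      ports:
        - protocol: TCP
          port: 8321   # assumed service port
```

Because the first policy also blocks egress, DNS and provider API traffic need their own allow rules before the workload can reach external services.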

Data Protection

Encryption at rest for model weights and data
Encryption in transit for all communications
Secret management through External Secrets Operator

Compliance

GDPR compliance with data retention policies
SOC2 controls for access and audit logging
HIPAA support for healthcare applications

Monitoring

Orchestr8 provides comprehensive monitoring for AI workloads:

Metrics

Model Performance: Latency, throughput, accuracy
Resource Usage: GPU utilization, memory consumption
Cost Tracking: Provider API usage and costs

Dashboards

AI Workload Overview: High-level health and performance
Provider Metrics: API usage and response times
Resource Utilization: GPU and compute efficiency

Alerting

Performance Degradation: Response time increases
Resource Exhaustion: GPU or memory limits
Cost Overruns: Budget threshold violations
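
As one possible shape for a resource-exhaustion alert, the rule below fires when GPU utilization stays saturated. It assumes the Prometheus Operator (for the `PrometheusRule` resource) and NVIDIA's dcgm-exporter (for the `DCGM_FI_DEV_GPU_UTIL` metric) are installed; the rule name, namespace, and threshold are illustrative and are not Orchestr8 defaults.

```yaml
# Sketch: alert on sustained GPU saturation.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ai-workload-alerts     # placeholder name
  namespace: llama-stack       # assumed namespace
spec:
  groups:
    - name: ai-workload.rules
      rules:
        - alert: GPUUtilizationSaturated
          # DCGM_FI_DEV_GPU_UTIL reports utilization as a percentage (0-100).
          expr: avg_over_time(DCGM_FI_DEV_GPU_UTIL[10m]) > 95
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "GPU utilization has stayed above 95% for 10 minutes"
```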

Best Practices

Development

Start with templates - Use provided templates as starting points
Validate early - Run o8 llama validate before deployment
Test incrementally - Deploy to dev environment first
Monitor resources - Watch GPU and memory usage

Production

Resource planning - Size GPU nodes appropriately
Provider redundancy - Configure multiple AI providers
Scaling policies - Set up horizontal pod autoscaling
Backup strategies - Backup model weights and vector data
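
For the scaling-policy item, a standard HorizontalPodAutoscaler is a reasonable starting point. The sketch below assumes the workload runs as a Deployment named `my-inference` (the name used in the Quick Start); the replica bounds and the CPU target are placeholders to tune per workload.

```yaml
# Sketch: CPU-based autoscaling for an inference Deployment.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-inference
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-inference       # assumed Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

CPU utilization is often a poor proxy for GPU-bound inference, so scaling on a custom or external metric such as request latency or queue depth may fit better where a metrics adapter is available.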

Security

Least privilege - Use minimal RBAC permissions
Network isolation - Implement proper network policies
Secret rotation - Regularly rotate API keys
Audit logging - Enable comprehensive audit trails
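
A least-privilege setup for someone who only needs to inspect workloads can be sketched with a namespaced Role and RoleBinding. The group name `ai-workload-viewers` and the resource names are hypothetical; the `llama-stack` namespace is assumed from the troubleshooting commands in this guide.

```yaml
# Read-only access to pods and their logs in one namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: llama-stack-viewer
  namespace: llama-stack
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: llama-stack-viewer-binding
  namespace: llama-stack
subjects:
  - kind: Group
    name: ai-workload-viewers          # hypothetical group
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: llama-stack-viewer
  apiGroup: rbac.authorization.k8s.io
```

The Role deliberately omits `secrets`, so viewers can debug pods without being able to read provider API keys.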

Troubleshooting

Common Issues

GPU nodes not detected:

# Check GPU node labels
kubectl get nodes -l nvidia.com/gpu.present=true

# Verify GPU operator installation
kubectl get pods -n gpu-operator-resources

AI workload fails to start:

# Check pod events
kubectl describe pod -n llama-stack -l app.kubernetes.io/name=llama-stack

# View detailed logs
o8 llama logs --follow

Provider authentication errors:

# Verify secrets exist
kubectl get secrets -n llama-stack

# Check secret contents (values are base64 encoded)
kubectl get secret llama-stack-api-keys -n llama-stack -o yaml

Getting Help

Check logs: Use o8 llama logs for real-time debugging
Validate configuration: Run o8 llama validate to check setup
Monitor status: Use o8 llama status for health overview
Review documentation: Check provider-specific guides

Next Steps

Provider Configuration - Set up AI providers and secrets
Resource Planning - Plan GPU and storage requirements
Security Configuration - Harden AI workload security
Monitoring Setup - Configure AI-specific monitoring
