Operations¶

This section documents the operational procedures for maintaining the brennan.page homelab.

Overview¶

The homelab requires regular maintenance to ensure optimal performance, security, and reliability.

Maintenance Schedule¶

Daily Tasks 📅¶

Time: 5 minutes
Frequency: Every day

System Health Check: Quick status verification
Log Review: Check for critical errors
Service Monitoring: Verify service availability

Weekly Tasks 📅¶

Time: 30 minutes
Frequency: Every Sunday

System Updates: Apply security updates
Backup Verification: Check backup integrity
Performance Review: Monitor resource usage

Monthly Tasks 📅¶

Time: 1 hour
Frequency: First Sunday of month

Security Audit: Review security settings
Performance Optimization: Clean up resources
Documentation Update: Update documentation

Quarterly Tasks 📅¶

Time: 2 hours
Frequency: Every quarter

Major Updates: Apply major version updates
Capacity Planning: Review resource usage
Disaster Recovery: Test recovery procedures

Operational Procedures¶

Wiki Management¶

Wiki deployment, maintenance, and content management procedures.

Deployment¶

Service deployment and update procedures.

Backups¶

Backup and recovery procedures.

Monitoring¶

System and service monitoring.

Maintenance¶

Regular maintenance procedures.

Quick Commands¶

System Status¶

# Quick system health check
ssh -i ~/.omg-lol-keys/id_ed25519 -T -o BatchMode=yes root@159.203.44.169 "
  echo '=== System Status ==='
  docker ps
  echo -e '\n=== Resource Usage ==='
  free -h
  df -h
  echo -e '\n=== Service Health ==='
  curl -I https://brennan.page
"

Service Health¶

# Check critical services
curl -I https://docker.brennan.page
curl -I https://monitor.brennan.page
curl -I https://files.brennan.page
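
The three checks above can be folded into one loop that reports which endpoints answered. This is a minimal sketch rather than one of the homelab scripts: the function name `check_services` and the 10-second timeout are assumptions, and the hostnames are the ones listed above.

```bash
# Hypothetical helper: send a HEAD request to each URL and report the result.
# Returns 0 only if every URL answers with a successful HTTP status.
check_services() {
  local url failed=0
  for url in "$@"; do
    # -f: treat HTTP errors (>= 400) as failures, -sS: quiet but show errors,
    # -I: HEAD request as in the commands above, --max-time 10: assumed timeout
    if curl -fsSI -o /dev/null --max-time 10 "$url"; then
      echo "OK   $url"
    else
      echo "DOWN $url"
      failed=1
    fi
  done
  return "$failed"
}
```

A call such as `check_services https://docker.brennan.page https://monitor.brennan.page https://files.brennan.page` prints one line per endpoint, and the exit status can drive a cron alert.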

Log Review¶

# Check for critical errors
# docker logs replays the container's stderr stream on stderr, so merge it into the pipe
ssh -i ~/.omg-lol-keys/id_ed25519 -T -o BatchMode=yes root@159.203.44.169 "
  docker logs --tail 20 caddy 2>&1 | grep -i error
  docker logs --tail 20 postgres 2>&1 | grep -i error
  journalctl -n 50 --no-pager | grep -i error
"

Emergency Procedures¶

Service Outage¶

Assess Impact: Check system status
Restart Services: docker compose restart
Check Logs: Review error logs
Escalate: Contact support if needed

Data Recovery¶

Stop Services: docker compose down
Restore Backup: Use backup procedures
Verify Data: Check data integrity
Start Services: docker compose up -d

Getting Help¶

Before Contacting Support¶

Checked system status
Reviewed error logs
Attempted basic restart
Checked documentation

Information to Include¶

System status output
Error messages
Recent changes
Steps already taken

References¶

Services - Service documentation
Infrastructure - Infrastructure documentation
Configuration - Configuration management
Troubleshooting - Troubleshooting guides

System Updates¶

# Check for available updates
ssh -i ~/.omg-lol-keys/id_ed25519 -T -o BatchMode=yes root@159.203.44.169 "
  apt update
  apt list --upgradable
"

# Review and apply updates
ssh -i ~/.omg-lol-keys/id_ed25519 -T -o BatchMode=yes root@159.203.44.169 "
  apt upgrade -y
  docker system prune -f
"

Service Updates¶

# Update Docker images
ssh -i ~/.omg-lol-keys/id_ed25519 -T -o BatchMode=yes root@159.203.44.169 "
  cd /opt/homelab/services
  for service in */; do
    echo \"Updating \$service\"
    # Subshell: the working directory resets even if a step fails
    (cd \"\$service\" && docker compose pull && docker compose up -d)
  done
"

Backup Verification¶

# List all backups, then show any archives older than 7 days
ssh -i ~/.omg-lol-keys/id_ed25519 -T -o BatchMode=yes root@159.203.44.169 "
  ls -la /opt/homelab/backups/
  find /opt/homelab/backups/ -name '*.tar.gz' -mtime +7 -exec ls -la {} \;
"

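Listing files shows that backups exist but not that they are recent or readable. The sketch below, meant to run on the server, checks both. The function name `verify_recent_backups` and the 7-day default are assumptions; only the `.tar.gz` naming and the backup directory come from the commands above.

```bash
# Hypothetical helper: succeed only if DIR holds at least one .tar.gz archive
# modified within the last MAX_DAYS days and every such archive can be
# listed by tar (a basic readability check, not a full restore test).
verify_recent_backups() {
  local dir="$1" max_days="${2:-7}" recent archive status=0
  recent=$(find "$dir" -maxdepth 1 -name '*.tar.gz' -mtime "-$max_days")
  if [ -z "$recent" ]; then
    echo "no backup newer than $max_days days in $dir"
    return 1
  fi
  # The here-document keeps the loop in the current shell so status survives.
  # File names containing newlines are not supported.
  while IFS= read -r archive; do
    if tar -tzf "$archive" >/dev/null 2>&1; then
      echo "OK      $archive"
    else
      echo "CORRUPT $archive"
      status=1
    fi
  done <<EOF
$recent
EOF
  return "$status"
}
```

For example, `verify_recent_backups /opt/homelab/backups 7` exits non-zero when the newest archive is stale or any recent archive fails to list.
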
Monthly Tasks 📅¶

Time: 2 hours
Frequency: First Sunday of month

Security Audit¶

# Check security logs
ssh -i ~/.omg-lol-keys/id_ed25519 -T -o BatchMode=yes root@159.203.44.169 "
  echo '=== Failed Login Attempts ==='
  grep 'Failed password' /var/log/auth.log | tail -20
  echo -e '\n=== UFW Status ==='
  ufw status numbered
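  echo -e '\n=== SSL Certificate Status ==='
  # Show the validity dates of the certificate served for brennan.page
  echo | openssl s_client -connect brennan.page:443 -servername brennan.page 2>/dev/null | openssl x509 -noout -dates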
"

Performance Review¶

# Check resource trends
ssh -i ~/.omg-lol-keys/id_ed25519 -T -o BatchMode=yes root@159.203.44.169 "
  echo '=== Memory Usage Trend ==='
  free -h
  echo -e '\n=== Disk Usage Trend ==='
  df -h
  echo -e '\n=== Docker Resource Usage ==='
  docker stats --no-stream
"

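The snapshot above leaves interpretation to the operator. As a sketch, a small filter can flag filesystems that are close to full. The function name `disks_over_threshold` and the 80% default are assumptions, not values from this homelab; the filter expects POSIX-format `df -P` output on stdin.

```bash
# Hypothetical helper: read `df -P` output on stdin and print the mount point
# and use% of every filesystem at or above the threshold (default 80).
# Mount points containing spaces are not handled.
disks_over_threshold() {
  awk -v limit="${1:-80}" '
    NR > 1 {                        # skip the header row
      use = $5
      sub(/%/, "", use)             # "83%" becomes 83
      if (use + 0 >= limit + 0) print $6 " " $5
    }'
}
```

It can be fed from the server, for example by piping the output of a remote `df -P` into `disks_over_threshold 80`; empty output means nothing crossed the threshold.
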
Database Maintenance¶

# Database optimization
ssh -i ~/.omg-lol-keys/id_ed25519 -T -o BatchMode=yes root@159.203.44.169 "
  docker exec postgres psql -U homelab -d homelab -c 'VACUUM ANALYZE;'
  docker exec postgres psql -U homelab -d vikunja -c 'VACUUM ANALYZE;'
  docker exec postgres psql -U homelab -d hedgedoc -c 'VACUUM ANALYZE;'
  docker exec postgres psql -U homelab -d linkding -c 'VACUUM ANALYZE;'
  docker exec postgres psql -U homelab -d navidrome -c 'VACUUM ANALYZE;'
"

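The five near-identical commands above can also be written as a loop, which makes adding a database a one-word change. This is a sketch to run on the server: the function name `vacuum_databases` is an assumption, while the container name `postgres` and role `homelab` are taken from the commands above.

```bash
# Hypothetical helper: run VACUUM ANALYZE on each named database.
# Keeps going after a failure and returns non-zero if any database failed.
vacuum_databases() {
  local db status=0
  for db in "$@"; do
    echo "Vacuuming $db"
    docker exec postgres psql -U homelab -d "$db" -c 'VACUUM ANALYZE;' || status=1
  done
  return "$status"
}
```

Usage for the databases listed above would be `vacuum_databases homelab vikunja hedgedoc linkding navidrome`.
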
Operational Procedures¶

Service Management¶

Starting Services¶

# Start single service
ssh -i ~/.omg-lol-keys/id_ed25519 -T -o BatchMode=yes root@159.203.44.169 "
  cd /opt/homelab/services/service_name
  docker compose up -d
"

# Start all services
ssh -i ~/.omg-lol-keys/id_ed25519 -T -o BatchMode=yes root@159.203.44.169 "
  cd /opt/homelab
  docker compose up -d
"

Stopping Services¶

# Stop single service
ssh -i ~/.omg-lol-keys/id_ed25519 -T -o BatchMode=yes root@159.203.44.169 "
  cd /opt/homelab/services/service_name
  docker compose down
"

# Stop all services (emergency only)
ssh -i ~/.omg-lol-keys/id_ed25519 -T -o BatchMode=yes root@159.203.44.169 "
  cd /opt/homelab
  docker compose down
"

Restarting Services¶

# Restart single service
ssh -i ~/.omg-lol-keys/id_ed25519 -T -o BatchMode=yes root@159.203.44.169 "
  cd /opt/homelab/services/service_name
  docker compose restart
"

# Graceful restart of all services
ssh -i ~/.omg-lol-keys/id_ed25519 -T -o BatchMode=yes root@159.203.44.169 "
  cd /opt/homelab
  docker compose restart
"

Backup Operations¶

Manual Backup¶

# Create full backup
ssh -i ~/.omg-lol-keys/id_ed25519 -T -o BatchMode=yes root@159.203.44.169 "
  cd /opt/homelab
  ./scripts/backup.sh
"

# Backup specific service
ssh -i ~/.omg-lol-keys/id_ed25519 -T -o BatchMode=yes root@159.203.44.169 "
  cd /opt/homelab
  ./scripts/backup-service.sh service_name
"

Restore Operations¶

# Restore from backup
ssh -i ~/.omg-lol-keys/id_ed25519 -T -o BatchMode=yes root@159.203.44.169 "
  cd /opt/homelab
  ./scripts/restore.sh backup_file.tar.gz
"

# Restore specific service
ssh -i ~/.omg-lol-keys/id_ed25519 -T -o BatchMode=yes root@159.203.44.169 "
  cd /opt/homelab
  ./scripts/restore-service.sh service_name backup_file.tar.gz
"

Monitoring Operations¶

Health Checks¶

# Check all services
ssh -i ~/.omg-lol-keys/id_ed25519 -T -o BatchMode=yes root@159.203.44.169 "
  cd /opt/homelab
  ./scripts/health-check.sh
"

# Check specific service
ssh -i ~/.omg-lol-keys/id_ed25519 -T -o BatchMode=yes root@159.203.44.169 "
  curl -f https://service.brennan.page || echo 'Service DOWN'
"

Performance Monitoring¶

# Real-time monitoring (-t allocates a terminal so the live view renders; press Ctrl-C to exit)
ssh -i ~/.omg-lol-keys/id_ed25519 -t -o BatchMode=yes root@159.203.44.169 "
  docker stats
"

# Point-in-time snapshot of CPU and memory per container
ssh -i ~/.omg-lol-keys/id_ed25519 -T -o BatchMode=yes root@159.203.44.169 "
  docker stats --no-stream --format 'table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}'
"

Incident Response¶

Incident Classification¶

Critical (P1)¶

Service completely down
Data corruption or loss
Security breach
System unavailable

High (P2)¶

Service degradation
Performance issues
Partial functionality loss
Backup failures

Medium (P3)¶

Minor bugs
UI issues
Documentation errors
Non-critical features

Low (P4)¶

Cosmetic issues
Typos
Minor improvements
Feature requests

Response Procedures¶

P1 - Critical Response¶

Immediate Action (5 minutes)

# Assess impact
docker ps
docker logs --tail 50 service_name
curl -I https://service.brennan.page

Stabilization (15 minutes)

# Restart affected services
docker compose restart
# If needed, restore from backup
./scripts/restore.sh latest_backup.tar.gz

Communication (30 minutes)
  Document incident
  Update status page
  Notify stakeholders

P2 - High Response¶

Assessment (30 minutes)

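# Investigate issue
docker logs --tail 100 service_name
# Process list from the host; works even if the image ships no ps binary
docker top service_name
# Single sample instead of a live stream
docker stats --no-stream service_name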

Resolution (2 hours)

# Apply fix
docker compose pull
docker compose up -d
# Verify resolution
curl -I https://service.brennan.page

P3 - Medium Response¶

Investigation (4 hours)
  Review logs
  Test in staging
  Plan fix

Implementation (1 day)
  Deploy fix
  Test thoroughly
  Update documentation

P4 - Low Response¶

Planning (1 week)
  Add to backlog
  Prioritize
  Schedule

Implementation (2 weeks)
  Implement during regular maintenance
  Test and deploy
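
For scripting or on-call notes, the time boxes above can be kept in a single lookup. This is only a sketch: the function name `response_plan` is an assumption, and the phases and durations are copied from the procedures in this section.

```bash
# Hypothetical lookup: print the response phases and time boxes for a priority.
response_plan() {
  case "$1" in
    P1) echo "immediate action: 5 minutes; stabilization: 15 minutes; communication: 30 minutes" ;;
    P2) echo "assessment: 30 minutes; resolution: 2 hours" ;;
    P3) echo "investigation: 4 hours; implementation: 1 day" ;;
    P4) echo "planning: 1 week; implementation: 2 weeks" ;;
    *)  echo "unknown priority: $1" >&2; return 1 ;;
  esac
}
```

Calling `response_plan P2` prints the two P2 phases; an unrecognized priority returns a non-zero status.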

Operational Metrics¶

Key Performance Indicators¶

Uptime: Target > 99.5%
Response Time: Target < 2 seconds
Backup Success: Target 100%
Security Incidents: Target 0
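
An uptime percentage is easier to act on as a downtime budget. The sketch below does that arithmetic; the function name `downtime_budget_minutes` and the 30-day and 365-day periods are illustrative assumptions, while the 99.5% figure is the target stated above.

```bash
# Hypothetical helper: minutes of downtime allowed by an uptime target
# (in percent) over a period of the given number of days.
downtime_budget_minutes() {
  awk -v target="$1" -v days="$2" \
    'BEGIN { printf "%.0f\n", (100 - target) / 100 * days * 24 * 60 }'
}

downtime_budget_minutes 99.5 30    # prints 216  (3.6 hours per 30-day month)
downtime_budget_minutes 99.5 365   # prints 2628 (43.8 hours per year)
```

Under these assumptions, the 99.5% target allows roughly three and a half hours of outage in a 30-day month.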

Monitoring Dashboards¶

System Overview: https://monitor.brennan.page
Service Status: https://brennan.page
Documentation: https://wiki.brennan.page

Reporting¶

Daily: Health check summary
Weekly: Performance report
Monthly: Executive summary
Quarterly: Strategic review

Operational Tools¶

Automation Scripts¶

backup.sh: Automated backup procedures
health-check.sh: Service health monitoring
deploy-service.sh: Service deployment
restore.sh: Disaster recovery

Monitoring Tools¶

Enhanced Monitor: System monitoring
Docker: Container monitoring
Caddy: Web server logs
PostgreSQL: Database monitoring

Management Tools¶

Portainer: Docker management
SSH: Remote management
Git: Configuration management
Wiki: Documentation

Operational Security¶

Access Control¶

SSH Keys: Key-based authentication only
User Accounts: Minimal user accounts
Sudo: Limited sudo access
Audit Trail: All actions logged

Security Procedures¶

Password Management: Regular password rotation
Certificate Management: Automated SSL renewal
Firewall Rules: Regular review and updates
Security Updates: Prompt security patching

Backup Security¶

Encryption: Backup encryption
Offsite: Offsite backup storage
Testing: Regular backup testing
Retention: Backup retention policy
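
A retention policy can start as a read-only report of which archives have aged out. This is a sketch only: the function name `expired_backups` and the 30-day default are assumptions, since the actual retention period is not stated here.

```bash
# Hypothetical helper: list .tar.gz archives in DIR that are more than
# KEEP_DAYS days old. It only prints candidates; deletion is left as a
# deliberate, separate step.
expired_backups() {
  local dir="$1" keep_days="${2:-30}"
  find "$dir" -maxdepth 1 -name '*.tar.gz' -mtime "+$keep_days" | sort
}
```

Running `expired_backups /opt/homelab/backups 30` on the server and reviewing the output before removing anything keeps the policy auditable.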

Operational Documentation¶

Required Documentation¶

Runbooks: Step-by-step procedures
Service Docs: Service-specific documentation
Network Diagrams: Infrastructure documentation
Contact Lists: Emergency contact information

Documentation Standards¶

Version Control: All docs in Git
Review Process: Regular doc reviews
Accessibility: Easy to find and use
Accuracy: Regular updates

Training and Knowledge¶

Operator Training¶

System Overview: Understanding the architecture
Service Management: Service operations
Troubleshooting: Problem resolution
Emergency Procedures: Incident response

Knowledge Sharing¶

Wiki: Central knowledge base
Runbooks: Operational procedures
Best Practices: Lessons learned
Incident Reviews: Post-incident analysis

References¶

Services - Service documentation
Infrastructure - Infrastructure documentation
Troubleshooting - Troubleshooting guides
Configuration - Configuration management
