Upgrade Guide

This guide explains how to upgrade the E6 Operator, CRDs, and MetadataServices workloads safely.

Overview

The E6 Operator consists of three independently upgradeable components:

  1. CRDs (CustomResourceDefinitions) - Schema for MetadataServices resources
  2. Operator - Controller that manages MetadataServices resources
  3. MetadataServices Workloads - Storage and Schema deployments

Upgrade Order

⚠️ Important: Always upgrade in this order:

1. CRDs (if schema changes)
2. Operator
3. MetadataServices workloads (automatic via blue-green)

Upgrading out of order may cause compatibility issues.
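
The full ordered upgrade, sketched with the Helm commands used later in this guide (chart names match the Helm sections below; the target version is illustrative):

# 1. CRDs first (only when the release changes the schema)
helm upgrade e6-operator-crds e6data/e6-operator-crds \
  --namespace e6-operator-system \
  --version 0.2.0

# 2. Then the operator
helm upgrade e6-operator e6data/e6-operator \
  --namespace e6-operator-system \
  --version 0.2.0

# 3. Workloads roll automatically once image tags are bumped
#    (see "Upgrading MetadataServices Workloads" below)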

Upgrade Types

Minor Upgrade

Example: v0.1.0 → v0.2.0

  • New features, bug fixes
  • May include new CRD fields (backward compatible)
  • No breaking changes
  • Downtime: None (blue-green deployment)

Major Upgrade

Example: v0.x → v1.0.0

  • Breaking changes possible
  • CRD schema changes
  • May require manual intervention
  • Downtime: Minimal (plan carefully)

Patch Upgrade

Example: v0.1.0 → v0.1.1

  • Bug fixes only
  • No CRD changes
  • Fully backward compatible
  • Downtime: None

Pre-Upgrade Checklist

1. Review Release Notes

# Check release notes for breaking changes
curl -s https://api.github.com/repos/e6data/e6-operator/releases/latest | \
  jq -r '.body'

2. Backup Current State

#!/bin/bash
# backup-operator-state.sh

BACKUP_DIR="backup-$(date +%Y%m%d-%H%M%S)"
mkdir -p $BACKUP_DIR

echo "Backing up operator state..."

# Backup all MetadataServices resources
kubectl get metadataservices --all-namespaces -o yaml > $BACKUP_DIR/metadataservices.yaml

# Backup CRD
kubectl get crd metadataservices.e6data.io -o yaml > $BACKUP_DIR/crd.yaml

# Backup operator deployment
kubectl get deployment -n e6-operator-system e6-operator-controller-manager -o yaml \
  > $BACKUP_DIR/operator-deployment.yaml

# Backup operator RBAC
kubectl get clusterrole metadataservices-operator-manager-role -o yaml \
  > $BACKUP_DIR/clusterrole.yaml
kubectl get clusterrolebinding metadataservices-operator-manager-rolebinding -o yaml \
  > $BACKUP_DIR/clusterrolebinding.yaml

tar -czf $BACKUP_DIR.tar.gz $BACKUP_DIR/
echo "Backup complete: $BACKUP_DIR.tar.gz"

3. Check Current Versions

# Operator version
kubectl get deployment -n e6-operator-system e6-operator-controller-manager \
  -o jsonpath='{.spec.template.spec.containers[0].image}'

# CRD version
kubectl get crd metadataservices.e6data.io \
  -o jsonpath='{.spec.versions[*].name}'

# MetadataServices workload versions
kubectl get metadataservices -A \
  -o custom-columns=NAME:.metadata.name,NAMESPACE:.metadata.namespace,\
STORAGE:.spec.storage.imageTag,SCHEMA:.spec.schema.imageTag

4. Verify Cluster Health

# Check operator health
kubectl get pods -n e6-operator-system

# Check MetadataServices resources
kubectl get metadataservices -A

# Check for degraded workloads
kubectl get metadataservices -A -o json | \
  jq -r '.items[] | select(.status.phase != "Stable") | "\(.metadata.name) - \(.status.phase)"'

5. Review Capacity

# Check cluster resources
kubectl top nodes

# Ensure sufficient capacity for blue-green deployments (2x during upgrade)
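
Blue-green runs the old and new workload pods side by side, so the namespace briefly needs roughly double its normal resource requests. A quick way to check node headroom before starting:

# Requested vs. allocatable resources per node
kubectl describe nodes | grep -A 5 "Allocated resources"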

Upgrading the Operator

Method 1: Helm Upgrade (Recommended)

Step 1: Update Helm Repository

# Add/update Helm repo
helm repo add e6data https://e6data.github.io/helm-charts
helm repo update

Step 2: Review Changes

# Check what will change
helm diff upgrade e6-operator e6data/e6-operator \
  --namespace e6-operator-system \
  --version 0.2.0

# OR with custom values
helm diff upgrade e6-operator e6data/e6-operator \
  --namespace e6-operator-system \
  --version 0.2.0 \
  -f custom-values.yaml

Step 3: Upgrade Operator

# Upgrade with default values
helm upgrade e6-operator e6data/e6-operator \
  --namespace e6-operator-system \
  --version 0.2.0

# OR with custom values
helm upgrade e6-operator e6data/e6-operator \
  --namespace e6-operator-system \
  --version 0.2.0 \
  -f custom-values.yaml

Step 4: Verify Upgrade

# Check rollout status
kubectl rollout status deployment/e6-operator-controller-manager \
  -n e6-operator-system

# Verify new version
kubectl get deployment -n e6-operator-system e6-operator-controller-manager \
  -o jsonpath='{.spec.template.spec.containers[0].image}'

# Check logs for errors
kubectl logs -n e6-operator-system \
  deployment/e6-operator-controller-manager \
  --tail=50

Method 2: kubectl/Kustomize Upgrade

Step 1: Update Manifests

# Clone or pull latest version
git clone https://github.com/e6data/e6-operator.git
cd e6-operator
git checkout v0.2.0

Step 2: Review Changes

# Preview what will change
kubectl diff -k config/default

Step 3: Apply Upgrade

# Apply updated manifests
kubectl apply -k config/default

# OR with custom overlay
kubectl apply -k overlays/production

Step 4: Verify Upgrade

# Check deployment status
kubectl rollout status deployment/e6-operator-controller-manager \
  -n e6-operator-system

# Verify version
kubectl get deployment -n e6-operator-system e6-operator-controller-manager \
  -o jsonpath='{.spec.template.spec.containers[0].image}'

Method 3: Direct Image Update

⚠️ Not recommended - Use only for quick testing

# Update image directly
kubectl set image deployment/e6-operator-controller-manager \
  manager=your-registry/e6-operator:0.2.0 \
  -n e6-operator-system

# Watch rollout
kubectl rollout status deployment/e6-operator-controller-manager \
  -n e6-operator-system

Upgrading CRDs

When to Upgrade CRDs

Upgrade CRDs when:

  • Release notes indicate CRD schema changes
  • New fields are added to the MetadataServices spec/status
  • Validation rules are updated
  • New API versions are introduced
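
A quick way to confirm whether a release actually changes the CRD schema is to diff the published manifest against the live CRD before applying anything (the URL matches the kubectl apply example below; adjust the tag for your target release):

# Server-side diff of the new CRD manifest against the cluster
kubectl diff -f https://github.com/e6data/e6-operator/releases/download/v0.2.0/metadataservices.e6data.io.yaml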

CRD Upgrade Procedure

Method 1: Helm Upgrade

# Upgrade CRDs chart first
helm upgrade e6-operator-crds e6data/e6-operator-crds \
  --namespace e6-operator-system \
  --version 0.2.0

# Verify CRD updated
kubectl get crd metadataservices.e6data.io \
  -o jsonpath='{.spec.versions[*].name}'

Method 2: kubectl apply

# Apply updated CRD
kubectl apply -f https://github.com/e6data/e6-operator/releases/download/v0.2.0/metadataservices.e6data.io.yaml

# Verify
kubectl get crd metadataservices.e6data.io -o yaml | grep "version:"

CRD Upgrade Considerations

⚠️ Important CRD Limitations:

  1. Cannot remove fields - Only add new optional fields
  2. Cannot change field types - Field types are immutable
  3. Cannot rename fields - Use new fields, deprecate old ones
  4. Validation rules - Can be added but not easily removed
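
After a schema change, one way to sanity-check that new optional fields were added without disturbing existing ones is to inspect the served schema; the spec.storage path below is taken from the examples in this guide and may differ in your version:

# Inspect the served schema for MetadataServices
kubectl explain metadataservices.spec --recursive | head -40

# Or drill into a specific subtree, e.g. the storage block used in this guide
kubectl explain metadataservices.spec.storage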

Multi-Version CRD Support

The operator may support multiple CRD versions simultaneously:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
spec:
  versions:
  - name: v1alpha1  # Old version
    served: true
    storage: false  # Deprecated
  - name: v1beta1   # New version
    served: true
    storage: true   # Default

Migration Path:

# 1. New version added (both served)
# 2. Convert resources to new version
# 3. Deprecate old version
# 4. Remove old version (major release)
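
A hedged sketch of step 2: rewriting each object forces the API server to re-store it under the current storage version, after which the deprecated version can be dropped from the CRD's storedVersions list (the version name follows the multi-version example above; the --subresource flag requires kubectl 1.24+):

# Re-store every MetadataServices object under the new storage version
kubectl get metadataservices --all-namespaces -o yaml | kubectl replace -f -

# Then remove the old version from storedVersions so it can be retired later
kubectl patch crd metadataservices.e6data.io --subresource=status --type=merge \
  -p '{"status":{"storedVersions":["v1beta1"]}}'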

Upgrading MetadataServices Workloads

MetadataServices workloads (storage and schema) use automatic blue-green deployment when image tags change.

Zero-Downtime Upgrade Process

Step 1: Update Image Tags

# Edit MetadataServices resource
kubectl edit metadataservices sample1 -n autoscalingv2

# Update image tags:
spec:
  storage:
    imageTag: "1.0.500-new"  # Updated from 1.0.437-old
  schema:
    imageTag: "1.0.600-new"  # Updated from 1.0.547-old

OR via kubectl patch:

kubectl patch metadataservices sample1 -n autoscalingv2 --type=merge -p '
{
  "spec": {
    "storage": {"imageTag": "1.0.500-new"},
    "schema": {"imageTag": "1.0.600-new"}
  }
}'

Step 2: Monitor Upgrade Progress

# Watch deployment phase
watch -n 2 "kubectl get metadataservices sample1 -n autoscalingv2 \
  -o jsonpath='{.status.deploymentPhase}: {.status.activeStrategy} -> {.status.pendingStrategy}'"

# Phases: Stable -> Deploying -> Switching -> Cleanup -> Stable

Expected Timeline:

Phase                Duration   Description
Stable → Deploying   0s         New strategy deployment initiated
Deploying            2-5 min    New pods starting, passing health checks
Deploying (grace)    2 min      Grace period for stability
Switching            10s        Traffic switched to new version
Cleanup              30s        Old version resources deleted
Stable               -          Upgrade complete

Total: ~5-10 minutes
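
To block until the rollout finishes instead of watching, kubectl wait can poll the same status field (the jsonpath form of --for requires kubectl 1.23+; the timeout is illustrative):

# Wait until the resource reports Stable again, or fail after 15 minutes
kubectl wait metadataservices/sample1 -n autoscalingv2 \
  --for=jsonpath='{.status.deploymentPhase}'=Stable \
  --timeout=15m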

Step 3: Verify Upgrade

# Check active version
kubectl get metadataservices sample1 -n autoscalingv2 \
  -o jsonpath='{.status.activeReleaseVersion}'

# Check current image tags
kubectl get metadataservices sample1 -n autoscalingv2 \
  -o jsonpath='{.spec.storage.imageTag}: {.spec.schema.imageTag}'

# Check pods are running
kubectl get pods -n autoscalingv2 -l app=sample1

# Check release history
kubectl get metadataservices sample1 -n autoscalingv2 \
  -o jsonpath='{.status.releaseHistory}' | jq .

Step 4: Test Application

# Test storage service
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- \
  curl -v http://sample1-storage-green:9005

# Test schema service
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- \
  curl -v http://sample1-schema-green:9006

# Check logs for errors
kubectl logs -n autoscalingv2 -l app=sample1 --tail=100 | grep ERROR

Batch Upgrades

For upgrading multiple MetadataServices resources:

#!/bin/bash
# batch-upgrade.sh

NEW_STORAGE_TAG="1.0.500-new"
NEW_SCHEMA_TAG="1.0.600-new"
NAMESPACE="autoscalingv2"

# Get all MetadataServices resources
RESOURCES=$(kubectl get metadataservices -n $NAMESPACE -o name)

for resource in $RESOURCES; do
  name=$(echo $resource | cut -d'/' -f2)

  echo "Upgrading $name..."

  kubectl patch metadataservices $name -n $NAMESPACE --type=merge -p "
{
  \"spec\": {
    \"storage\": {\"imageTag\": \"$NEW_STORAGE_TAG\"},
    \"schema\": {\"imageTag\": \"$NEW_SCHEMA_TAG\"}
  }
}"

  # Wait for upgrade to complete
  echo "Waiting for $name to stabilize..."
  while true; do
    phase=$(kubectl get metadataservices $name -n $NAMESPACE -o jsonpath='{.status.deploymentPhase}')
    if [ "$phase" = "Stable" ]; then
      echo "$name upgrade complete"
      break
    fi
    echo "  Phase: $phase"
    sleep 10
  done

  echo ""
done

echo "All upgrades complete"

Canary Upgrades

For gradual rollout:

  1. Test on staging first

    # Upgrade staging environment
    kubectl patch metadataservices sample1-staging -n staging --type=merge -p '...'
    
    # Verify for 24 hours
    # Monitor metrics, logs, errors
    

  2. Upgrade production in phases

    # Phase 1: 10% of workloads
    kubectl patch metadataservices workspace-1 -n prod --type=merge -p '...'
    
    # Monitor for 1 hour
    
    # Phase 2: 50% of workloads
    # Phase 3: 100% of workloads
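
Putting the phases together, a minimal sketch of a canary step that upgrades one workspace and waits for it to stabilize before widening the rollout (the workspace name and image tags are illustrative; the patch body follows the format shown earlier):

#!/bin/bash
# canary-upgrade.sh - upgrade a small subset first, then pause and observe

CANARY="workspace-1"   # phase 1: ~10% of workloads
NAMESPACE="prod"

kubectl patch metadataservices "$CANARY" -n "$NAMESPACE" --type=merge -p '
{
  "spec": {
    "storage": {"imageTag": "1.0.500-new"},
    "schema": {"imageTag": "1.0.600-new"}
  }
}'

# Wait for the canary to return to Stable before phases 2 and 3
kubectl wait metadataservices/"$CANARY" -n "$NAMESPACE" \
  --for=jsonpath='{.status.deploymentPhase}'=Stable --timeout=15m

echo "Canary stable - monitor metrics and logs before continuing"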
    

Rollback Procedures

Rollback Operator

Helm Rollback

# List release history
helm history e6-operator -n e6-operator-system

# Rollback to previous version
helm rollback e6-operator -n e6-operator-system

# OR rollback to specific revision
helm rollback e6-operator 3 -n e6-operator-system

kubectl Rollback

# Rollback deployment
kubectl rollout undo deployment/e6-operator-controller-manager \
  -n e6-operator-system

# OR to specific revision
kubectl rollout undo deployment/e6-operator-controller-manager \
  -n e6-operator-system --to-revision=2

Rollback CRDs

⚠️ Warning: CRD rollback is risky and not recommended.

Why CRD Rollback is Dangerous:

  • May remove fields that resources are using
  • Can cause validation errors
  • May lose data in removed fields

If absolutely necessary:

# Re-apply old CRD version
kubectl apply -f crd-v0.1.0.yaml

# Verify all resources still valid
kubectl get metadataservices -A

# Check for validation errors in operator logs
kubectl logs -n e6-operator-system deployment/e6-operator-controller-manager \
  | grep ERROR

Rollback MetadataServices Workloads

See Rollback Guide for detailed workload rollback procedures.

Quick rollback:

# Manual rollback to previous version
kubectl annotate metadataservices sample1 -n autoscalingv2 \
  e6data.io/rollback-to=$(kubectl get metadataservices sample1 -n autoscalingv2 \
    -o jsonpath='{.status.releaseHistory[-2].version}')

# OR automatic rollback on failure (happens automatically after 2 min)

Version Compatibility

Compatibility Matrix

Operator Version   CRD Version   Min Kubernetes   Storage Image   Schema Image
v0.1.0             v1alpha1      1.20+            1.0.437+        1.0.547+
v0.2.0             v1alpha1      1.22+            1.0.450+        1.0.550+
v1.0.0             v1beta1       1.24+            1.1.0+          1.1.0+

Skipping Versions

Minor versions: Can be skipped safely

# v0.1.0 -> v0.3.0 ✅ OK

Major versions: Must upgrade sequentially

# v0.5.0 -> v1.0.0 -> v2.0.0 ✅ OK
# v0.5.0 -> v2.0.0 ❌ NOT SAFE
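
A minimal sketch of stepping through intermediate major versions one at a time, reusing the Helm commands from this guide (version numbers are illustrative):

# Upgrade sequentially: v0.5.0 -> v1.0.0 -> v2.0.0
for version in 1.0.0 2.0.0; do
  helm upgrade e6-operator e6data/e6-operator \
    --namespace e6-operator-system \
    --version "$version"
  kubectl rollout status deployment/e6-operator-controller-manager \
    -n e6-operator-system
done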

Kubernetes Version Requirements

E6 Operator   Min K8s   Recommended K8s
v0.1.x        1.20      1.24+
v0.2.x        1.22      1.26+
v1.0.x        1.24      1.28+

Testing Upgrades

Dry-Run Upgrade

# Helm dry-run
helm upgrade e6-operator e6data/e6-operator \
  --namespace e6-operator-system \
  --version 0.2.0 \
  --dry-run --debug

# kubectl dry-run
kubectl apply -k config/default --dry-run=client

Test in Staging

# 1. Deploy operator to staging cluster
helm install e6-operator-staging e6data/e6-operator \
  --namespace e6-operator-staging \
  --create-namespace

# 2. Create test MetadataServices
kubectl apply -f test-metadataservices.yaml -n staging

# 3. Upgrade operator
helm upgrade e6-operator-staging e6data/e6-operator \
  --namespace e6-operator-staging \
  --version 0.2.0

# 4. Verify everything works
# 5. Proceed with production upgrade

Monitoring Upgrades

Key Metrics to Monitor

# Reconciliation errors during upgrade
rate(controller_runtime_reconcile_errors_total[5m])

# Deployment unavailability
count(kube_deployment_status_replicas_unavailable > 0)

# Pod restarts
rate(kube_pod_container_status_restarts_total[5m])

# Workqueue depth
workqueue_depth{name="metadataservices"}

Alert Rules for Upgrades

- alert: UpgradeStalled
  expr: kube_deployment_status_observed_generation != kube_deployment_metadata_generation
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Deployment upgrade stalled"

- alert: HighErrorRateDuringUpgrade
  expr: rate(controller_runtime_reconcile_errors_total[5m]) > 0.1
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "High error rate during upgrade"
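
As written, these are bare rules; Prometheus expects them inside a groups block in a rule file. A minimal sketch that wraps the first rule and validates the file with promtool (file and group names are illustrative; if you use the Prometheus Operator, put the rules in a PrometheusRule object instead):

cat > upgrade-alerts.yml <<'EOF'
groups:
- name: e6-operator-upgrades
  rules:
  - alert: UpgradeStalled
    expr: kube_deployment_status_observed_generation != kube_deployment_metadata_generation
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Deployment upgrade stalled"
EOF

# Validate before loading into Prometheus
promtool check rules upgrade-alerts.yml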

Best Practices

1. Always Upgrade During Maintenance Window

  • Schedule upgrades during low-traffic periods
  • Inform stakeholders of planned upgrades
  • Have rollback plan ready

2. Test in Non-Production First

Development → Staging → Production

3. Upgrade One Component at a Time

CRDs → Operator → Workloads (one by one)

4. Monitor Closely

  • Watch operator logs
  • Monitor metrics
  • Check resource status
  • Verify application functionality

5. Document Upgrade Process

  • Record versions before/after
  • Document any issues encountered
  • Note rollback procedures used
  • Share lessons learned

6. Backup Before Upgrade

Always backup:

  • MetadataServices resources
  • CRD definitions
  • Operator configuration
  • RBAC manifests

Troubleshooting Upgrades

Operator Upgrade Fails

Check pod status:

kubectl get pods -n e6-operator-system
kubectl describe pod <pod-name> -n e6-operator-system
kubectl logs <pod-name> -n e6-operator-system --all-containers

Common issues:

  • Webhook certificate not ready
  • RBAC permission changes
  • Image pull errors
  • Resource constraints
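
A few quick checks for these (the names matched below are assumptions; your install may label its webhook configurations differently):

# Recent events often surface image pull and scheduling problems
kubectl get events -n e6-operator-system --sort-by=.lastTimestamp | tail -20

# Webhook configurations registered by the operator (names assumed)
kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations | grep -i e6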


CRD Upgrade Fails

Error: "field is immutable"

# CRD fields cannot be changed once set
# Solution: Create new CRD version, migrate resources

Error: "existing resources don't validate"

# Check which resources fail validation
kubectl get metadataservices -A -o yaml | kubectl apply --dry-run=server -f -

# Fix resources or adjust validation

Workload Upgrade Stuck

See Troubleshooting Guide - Blue-Green Issues

Additional Resources