Upgrade Guide¶
This guide explains how to upgrade the E6 Operator, CRDs, and MetadataServices workloads safely.
Table of Contents¶
- Overview
- Upgrade Types
- Pre-Upgrade Checklist
- Upgrading the Operator
- Upgrading CRDs
- Upgrading MetadataServices Workloads
- Rollback Procedures
- Version Compatibility
Overview¶
The E6 Operator consists of three independently upgradeable components:
- CRDs (CustomResourceDefinitions) - Schema for MetadataServices resources
- Operator - Controller that manages MetadataServices resources
- MetadataServices Workloads - Storage and Schema deployments
Upgrade Order¶
⚠️ Important: Always upgrade in this order:
1. CRDs - the schema must be in place before the operator that consumes it
2. Operator - the controller must understand the new schema before workloads change
3. MetadataServices Workloads - upgraded last, via blue-green deployment
Upgrading out of order may cause compatibility issues, such as the operator failing to reconcile resources against a stale schema.
Upgrade Types¶
Minor Upgrade¶
Example: v0.1.0 → v0.2.0
- New features, bug fixes
- May include new CRD fields (backward compatible)
- No breaking changes
- Downtime: None (blue-green deployment)
Major Upgrade¶
Example: v0.x → v1.0.0
- Breaking changes possible
- CRD schema changes
- May require manual intervention
- Downtime: Minimal (plan carefully)
Patch Upgrade¶
Example: v0.1.0 → v0.1.1
- Bug fixes only
- No CRD changes
- Fully backward compatible
- Downtime: None
Pre-Upgrade Checklist¶
1. Review Release Notes¶
# Check release notes for breaking changes
curl -s https://api.github.com/repos/e6data/e6-operator/releases/latest | \
jq -r '.body'
2. Backup Current State¶
#!/bin/bash
# backup-operator-state.sh
set -euo pipefail
BACKUP_DIR="backup-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BACKUP_DIR"
echo "Backing up operator state..."
# Backup all MetadataServices resources
kubectl get metadataservices --all-namespaces -o yaml > "$BACKUP_DIR/metadataservices.yaml"
# Backup CRD
kubectl get crd metadataservices.e6data.io -o yaml > "$BACKUP_DIR/crd.yaml"
# Backup operator deployment
kubectl get deployment -n e6-operator-system e6-operator-controller-manager -o yaml \
  > "$BACKUP_DIR/operator-deployment.yaml"
# Backup operator RBAC
kubectl get clusterrole metadataservices-operator-manager-role -o yaml \
  > "$BACKUP_DIR/clusterrole.yaml"
kubectl get clusterrolebinding metadataservices-operator-manager-rolebinding -o yaml \
  > "$BACKUP_DIR/clusterrolebinding.yaml"
tar -czf "$BACKUP_DIR.tar.gz" "$BACKUP_DIR/"
echo "Backup complete: $BACKUP_DIR.tar.gz"
3. Check Current Versions¶
# Operator version
kubectl get deployment -n e6-operator-system e6-operator-controller-manager \
-o jsonpath='{.spec.template.spec.containers[0].image}'
# CRD version
kubectl get crd metadataservices.e6data.io \
-o jsonpath='{.spec.versions[*].name}'
# MetadataServices workload versions
kubectl get metadataservices -A \
-o custom-columns=NAME:.metadata.name,NAMESPACE:.metadata.namespace,\
STORAGE:.spec.storage.imageTag,SCHEMA:.spec.schema.imageTag
4. Verify Cluster Health¶
# Check operator health
kubectl get pods -n e6-operator-system
# Check MetadataServices resources
kubectl get metadataservices -A
# Check for degraded workloads
kubectl get metadataservices -A -o json | \
jq -r '.items[] | select(.status.deploymentPhase != "Stable") | "\(.metadata.name) - \(.status.deploymentPhase)"'
5. Review Capacity¶
# Check cluster resources
kubectl top nodes
# Ensure sufficient capacity for blue-green deployments (2x during upgrade)
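A rough way to check per-node headroom against current requests (a sketch):
# Compare allocatable capacity with currently requested resources on each node
kubectl describe nodes | grep -A 8 "Allocated resources"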
Upgrading the Operator¶
Method 1: Helm Upgrade (Recommended)¶
Step 1: Update Helm Repository¶
# Refresh the local Helm repository cache
helm repo update
Step 2: Review Changes¶
# Check what will change (requires the helm-diff plugin:
# helm plugin install https://github.com/databus23/helm-diff)
helm diff upgrade e6-operator e6data/e6-operator \
--namespace e6-operator-system \
--version 0.2.0
# OR with custom values
helm diff upgrade e6-operator e6data/e6-operator \
--namespace e6-operator-system \
--version 0.2.0 \
-f custom-values.yaml
Step 3: Upgrade Operator¶
# Upgrade with default values
helm upgrade e6-operator e6data/e6-operator \
--namespace e6-operator-system \
--version 0.2.0
# OR with custom values
helm upgrade e6-operator e6data/e6-operator \
--namespace e6-operator-system \
--version 0.2.0 \
-f custom-values.yaml
Step 4: Verify Upgrade¶
# Check rollout status
kubectl rollout status deployment/e6-operator-controller-manager \
-n e6-operator-system
# Verify new version
kubectl get deployment -n e6-operator-system e6-operator-controller-manager \
-o jsonpath='{.spec.template.spec.containers[0].image}'
# Check logs for errors
kubectl logs -n e6-operator-system \
deployment/e6-operator-controller-manager \
--tail=50
Method 2: kubectl/Kustomize Upgrade¶
Step 1: Update Manifests¶
# Clone or pull latest version
git clone https://github.com/e6data/e6-operator.git
cd e6-operator
git checkout v0.2.0
Step 2: Review Changes¶
# Diff the rendered manifests against the live cluster
kubectl diff -k config/default
Step 3: Apply Upgrade¶
# Apply updated manifests
kubectl apply -k config/default
# OR with custom overlay
kubectl apply -k overlays/production
Step 4: Verify Upgrade¶
# Check deployment status
kubectl rollout status deployment/e6-operator-controller-manager \
-n e6-operator-system
# Verify version
kubectl get deployment -n e6-operator-system e6-operator-controller-manager \
-o jsonpath='{.spec.template.spec.containers[0].image}'
Method 3: Direct Image Update¶
⚠️ Not recommended - Use only for quick testing
# Update image directly
kubectl set image deployment/e6-operator-controller-manager \
manager=your-registry/e6-operator:0.2.0 \
-n e6-operator-system
# Watch rollout
kubectl rollout status deployment/e6-operator-controller-manager \
-n e6-operator-system
Upgrading CRDs¶
When to Upgrade CRDs¶
Upgrade CRDs when:
- Release notes indicate CRD schema changes
- New fields are added to the MetadataServices spec/status
- Validation rules are updated
- New API versions are introduced
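To confirm whether a release actually changes the schema, diff the released CRD manifest against the live cluster before applying it (a sketch, reusing the release asset URL from Method 2 below; kubectl diff exits non-zero when differences exist):
# Show the schema delta between the live CRD and the new release
kubectl diff -f https://github.com/e6data/e6-operator/releases/download/v0.2.0/metadataservices.e6data.io.yaml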
CRD Upgrade Procedure¶
Method 1: Separate CRD Chart (Recommended)¶
# Upgrade CRDs chart first
helm upgrade e6-operator-crds e6data/e6-operator-crds \
--namespace e6-operator-system \
--version 0.2.0
# Verify CRD updated
kubectl get crd metadataservices.e6data.io \
-o jsonpath='{.spec.versions[*].name}'
Method 2: kubectl apply¶
# Apply updated CRD
kubectl apply -f https://github.com/e6data/e6-operator/releases/download/v0.2.0/metadataservices.e6data.io.yaml
# Verify
kubectl get crd metadataservices.e6data.io \
  -o jsonpath='{.spec.versions[*].name}'
CRD Upgrade Considerations¶
⚠️ Important CRD Limitations:
- Cannot remove fields - Only add new optional fields
- Cannot change field types - Field types are immutable
- Cannot rename fields - Use new fields, deprecate old ones
- Validation rules - Can be added but not easily removed
Multi-Version CRD Support¶
The operator may support multiple CRD versions simultaneously:
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: metadataservices.e6data.io
spec:
  versions:
  - name: v1alpha1   # Old version
    served: true
    storage: false   # Deprecated
  - name: v1beta1    # New version
    served: true
    storage: true    # Default
Migration Path:
# 1. New version added (both served)
# 2. Convert resources to new version
# 3. Deprecate old version
# 4. Remove old version (major release)
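Step 2 (converting resources) can be done by rewriting every object, which re-persists it at the new storage version; afterwards the old version can be dropped from the CRD's storedVersions. A minimal sketch, assuming v1beta1 is already the storage version and kubectl 1.24+:
# Rewrite each object so it is re-stored at the new storage version
kubectl get metadataservices -A -o json | kubectl replace -f -
# Once nothing is stored at the old version, trim storedVersions
kubectl patch crd metadataservices.e6data.io --subresource=status --type=merge \
  -p '{"status":{"storedVersions":["v1beta1"]}}'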
Upgrading MetadataServices Workloads¶
MetadataServices workloads (storage and schema) use automatic blue-green deployment when image tags change.
Zero-Downtime Upgrade Process¶
Step 1: Update Image Tags¶
# Edit MetadataServices resource
kubectl edit metadataservices sample1 -n autoscalingv2
# Update image tags:
spec:
  storage:
    imageTag: "1.0.500-new"   # Updated from 1.0.437-old
  schema:
    imageTag: "1.0.600-new"   # Updated from 1.0.547-old
OR via kubectl patch:
kubectl patch metadataservices sample1 -n autoscalingv2 --type=merge -p '
{
"spec": {
"storage": {"imageTag": "1.0.500-new"},
"schema": {"imageTag": "1.0.600-new"}
}
}'
Step 2: Monitor Upgrade Progress¶
# Watch deployment phase
watch -n 2 "kubectl get metadataservices sample1 -n autoscalingv2 \
-o jsonpath='{.status.deploymentPhase}: {.status.activeStrategy} -> {.status.pendingStrategy}'"
# Phases: Stable -> Deploying -> Switching -> Cleanup -> Stable
Expected Timeline:
| Phase | Duration | Description |
|---|---|---|
| Stable → Deploying | 0s | New strategy deployment initiated |
| Deploying | 2-5 min | New pods starting, passing health checks |
| Deploying (grace) | 2 min | Grace period for stability |
| Switching | 10s | Traffic switched to new version |
| Cleanup | 30s | Old version resources deleted |
| Stable | - | Upgrade complete |
Total: ~5-10 minutes
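Instead of polling with watch, you can block until the resource reports Stable (a sketch; jsonpath waits require kubectl 1.23+):
# Block until the deployment phase returns to Stable, or time out
kubectl wait metadataservices/sample1 -n autoscalingv2 \
  --for=jsonpath='{.status.deploymentPhase}'=Stable --timeout=15m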
Step 3: Verify Upgrade¶
# Check active version
kubectl get metadataservices sample1 -n autoscalingv2 \
-o jsonpath='{.status.activeReleaseVersion}'
# Check current image tags
kubectl get metadataservices sample1 -n autoscalingv2 \
-o jsonpath='{.spec.storage.imageTag}: {.spec.schema.imageTag}'
# Check pods are running
kubectl get pods -n autoscalingv2 -l app=sample1
# Check release history
kubectl get metadataservices sample1 -n autoscalingv2 \
-o jsonpath='{.status.releaseHistory}' | jq .
Step 4: Test Application¶
# Test storage service (run the debug pod in the same namespace so the
# short service name resolves)
kubectl run -it --rm debug -n autoscalingv2 --image=curlimages/curl --restart=Never -- \
  curl -v http://sample1-storage-green:9005
# Test schema service
kubectl run -it --rm debug -n autoscalingv2 --image=curlimages/curl --restart=Never -- \
  curl -v http://sample1-schema-green:9006
# Check logs for errors
kubectl logs -n autoscalingv2 -l app=sample1 --tail=100 | grep ERROR
Batch Upgrades¶
For upgrading multiple MetadataServices resources:
#!/bin/bash
# batch-upgrade.sh
set -euo pipefail
NEW_STORAGE_TAG="1.0.500-new"
NEW_SCHEMA_TAG="1.0.600-new"
NAMESPACE="autoscalingv2"
# Get all MetadataServices resources
RESOURCES=$(kubectl get metadataservices -n "$NAMESPACE" -o name)
for resource in $RESOURCES; do
  name=$(echo "$resource" | cut -d'/' -f2)
  echo "Upgrading $name..."
  kubectl patch metadataservices "$name" -n "$NAMESPACE" --type=merge -p "
  {
    \"spec\": {
      \"storage\": {\"imageTag\": \"$NEW_STORAGE_TAG\"},
      \"schema\": {\"imageTag\": \"$NEW_SCHEMA_TAG\"}
    }
  }"
  # Give the controller a moment to observe the change before polling,
  # so the pre-upgrade Stable phase is not mistaken for completion
  sleep 5
  # Wait for this upgrade to complete before starting the next one
  echo "Waiting for $name to stabilize..."
  while true; do
    phase=$(kubectl get metadataservices "$name" -n "$NAMESPACE" -o jsonpath='{.status.deploymentPhase}')
    if [ "$phase" = "Stable" ]; then
      echo "$name upgrade complete"
      break
    fi
    echo "  Phase: $phase"
    sleep 10
  done
  echo ""
done
echo "All upgrades complete"
Canary Upgrades¶
For gradual rollout:
- Test on staging first
- Upgrade production in phases, starting with a single canary resource (see the sketch below)
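A minimal canary sketch using the example names from this guide: upgrade one resource, wait for it to stabilize, and check its logs before batch-upgrading the rest:
# Upgrade one canary resource first
kubectl patch metadataservices sample1 -n autoscalingv2 --type=merge -p '
{
  "spec": {
    "storage": {"imageTag": "1.0.500-new"},
    "schema": {"imageTag": "1.0.600-new"}
  }
}'
# Wait for it to stabilize, then inspect logs before proceeding
kubectl wait metadataservices/sample1 -n autoscalingv2 \
  --for=jsonpath='{.status.deploymentPhase}'=Stable --timeout=15m
kubectl logs -n autoscalingv2 -l app=sample1 --tail=100 | grep ERROR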
Rollback Procedures¶
Rollback Operator¶
Helm Rollback¶
# List release history
helm history e6-operator -n e6-operator-system
# Rollback to previous version
helm rollback e6-operator -n e6-operator-system
# OR rollback to specific revision
helm rollback e6-operator 3 -n e6-operator-system
kubectl Rollback¶
# Rollback deployment
kubectl rollout undo deployment/e6-operator-controller-manager \
-n e6-operator-system
# OR to specific revision
kubectl rollout undo deployment/e6-operator-controller-manager \
-n e6-operator-system --to-revision=2
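To choose a target revision, list the deployment's rollout history first:
# List recorded revisions for the operator deployment
kubectl rollout history deployment/e6-operator-controller-manager \
  -n e6-operator-system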
Rollback CRDs¶
⚠️ Warning: CRD rollback is risky and not recommended.
Why CRD Rollback is Dangerous:
- May remove fields that resources are using
- Can cause validation errors
- May lose data in removed fields
If absolutely necessary:
# Re-apply old CRD version
kubectl apply -f crd-v0.1.0.yaml
# Verify all resources still valid
kubectl get metadataservices -A
# Check for validation errors in operator logs
kubectl logs -n e6-operator-system deployment/e6-operator-controller-manager \
| grep ERROR
Rollback MetadataServices Workloads¶
See Rollback Guide for detailed workload rollback procedures.
Quick rollback:
# Manual rollback to previous version
kubectl annotate metadataservices sample1 -n autoscalingv2 \
e6data.io/rollback-to=$(kubectl get metadataservices sample1 -n autoscalingv2 \
-o jsonpath='{.status.releaseHistory[-2].version}')
# OR rely on automatic rollback on failure (triggers after 2 min)
Version Compatibility¶
Compatibility Matrix¶
| Operator Version | CRD Version | Min Kubernetes | Storage Image | Schema Image |
|---|---|---|---|---|
| v0.1.0 | v1alpha1 | 1.20+ | 1.0.437+ | 1.0.547+ |
| v0.2.0 | v1alpha1 | 1.22+ | 1.0.450+ | 1.0.550+ |
| v1.0.0 | v1beta1 | 1.24+ | 1.1.0+ | 1.1.0+ |
Skipping Versions¶
Minor versions: Can be skipped safely (for example, v0.1.0 → v0.3.0 in one step)
Major versions: Must be upgraded sequentially, one major release at a time (for example, v0.x → v1.0.0 before anything later)
Kubernetes Version Requirements¶
| E6 Operator | Min K8s | Recommended K8s |
|---|---|---|
| v0.1.x | 1.20 | 1.24+ |
| v0.2.x | 1.22 | 1.26+ |
| v1.0.x | 1.24 | 1.28+ |
Testing Upgrades¶
Dry-Run Upgrade¶
# Helm dry-run
helm upgrade e6-operator e6data/e6-operator \
--namespace e6-operator-system \
--version 0.2.0 \
--dry-run --debug
# kubectl dry-run
kubectl apply -k config/default --dry-run=client
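Client-side dry-run only validates manifests locally; a server-side dry-run also exercises the API server's validation and admission webhooks (a sketch):
# Validate against the API server, including admission webhooks
kubectl apply -k config/default --dry-run=server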
Test in Staging¶
# 1. Deploy operator to staging cluster
helm install e6-operator-staging e6data/e6-operator \
--namespace e6-operator-staging \
--create-namespace
# 2. Create test MetadataServices
kubectl apply -f test-metadataservices.yaml -n staging
# 3. Upgrade operator
helm upgrade e6-operator-staging e6data/e6-operator \
--namespace e6-operator-staging \
--version 0.2.0
# 4. Verify everything works
# 5. Proceed with production upgrade
Monitoring Upgrades¶
Key Metrics to Monitor¶
# Reconciliation errors during upgrade
rate(controller_runtime_reconcile_errors_total[5m])
# Deployment unavailability
count(kube_deployment_status_replicas_unavailable > 0)
# Pod restarts
rate(kube_pod_container_status_restarts_total[5m])
# Workqueue depth
workqueue_depth{name="metadataservices"}
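To inspect these metrics without Prometheus, port-forward to the operator's metrics endpoint (a sketch; assumes the controller-runtime default metrics port 8080 with no auth proxy in front):
# Forward the metrics port locally...
kubectl port-forward -n e6-operator-system \
  deployment/e6-operator-controller-manager 8080:8080
# ...then, in another terminal, scrape it once
curl -s localhost:8080/metrics | grep controller_runtime_reconcile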
Alert Rules for Upgrades¶
- alert: UpgradeStalled
  expr: kube_deployment_status_observed_generation != kube_deployment_metadata_generation
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Deployment upgrade stalled"
- alert: HighErrorRateDuringUpgrade
  expr: rate(controller_runtime_reconcile_errors_total[5m]) > 0.1
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "High error rate during upgrade"
Best Practices¶
1. Always Upgrade During Maintenance Window¶
- Schedule upgrades during low-traffic periods
- Inform stakeholders of planned upgrades
- Have rollback plan ready
2. Test in Non-Production First¶
Run the full upgrade end-to-end in a staging cluster before touching production (see Test in Staging above).
3. Upgrade One Component at a Time¶
Follow the Upgrade Order: CRDs first, then the operator, then MetadataServices workloads, verifying each before moving on.
4. Monitor Closely¶
- Watch operator logs
- Monitor metrics
- Check resource status
- Verify application functionality
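For example, a single watch can track operator pods and workload status together during an upgrade (a sketch):
# Live view of operator pods and all MetadataServices resources
watch -n 5 'kubectl get pods -n e6-operator-system; kubectl get metadataservices -A'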
5. Document Upgrade Process¶
- Record versions before/after
- Document any issues encountered
- Note rollback procedures used
- Share lessons learned
6. Backup Before Upgrade¶
Always backup:
- MetadataServices resources
- CRD definitions
- Operator configuration
- RBAC manifests
Troubleshooting Upgrades¶
Operator Upgrade Fails¶
Check pod status:
kubectl get pods -n e6-operator-system
kubectl describe pod <pod-name> -n e6-operator-system
kubectl logs <pod-name> -n e6-operator-system --all-containers
Common issues:
- Webhook certificate not ready
- RBAC permission changes
- Image pull errors
- Resource constraints
CRD Upgrade Fails¶
Error: "field is immutable"
# Typically caused by changing a structural CRD field (such as spec.scope)
# that Kubernetes does not allow to change in place; ship the change as a
# new served API version instead, or delete and recreate the CRD (which
# deletes all existing MetadataServices resources).
Error: "existing resources don't validate"
# Check which resources fail validation
kubectl get metadataservices -A -o yaml | kubectl apply --dry-run=server -f -
# Fix resources or adjust validation
Workload Upgrade Stuck¶
See Troubleshooting Guide - Blue-Green Issues