Autoscaling and Auto-Suspension Guide¶
This guide covers executor autoscaling, auto-suspension, and auto-resume features in QueryService.
Overview¶
| Feature | Purpose | Trigger |
|---|---|---|
| Autoscaling | Scale executors based on query load | API calls from Queue service |
| Auto-Suspension | Suspend idle clusters to save costs | Idle timeout reached |
| Auto-Resume | Wake suspended clusters on demand | Incoming query via Envoy |
| Pool Burst | Overflow to shared compute pool | Regular capacity exceeded |
Executor Autoscaling¶
How It Works¶
- Queue service monitors query load (pending queries, active workers)
- Queue calls the operator API at http://e6-operator:8082/endpoints/v1/cluster/{namespace}/{name}/autoscale
- Operator scales the executor deployment up or down
- Status updated with scaling history
Configuration¶
apiVersion: e6data.io/v1alpha1
kind: QueryService
metadata:
name: analytics-cluster
spec:
executor:
replicas: 2 # Initial/baseline replicas
autoscaling:
enabled: true # Enable autoscaling
minExecutors: 2 # Minimum executors (floor)
maxExecutors: 20 # Maximum executors (ceiling)
# Optional: Override operator endpoint
# clusterManagementBaseURL: "http://custom-operator:8082/endpoints/v1/cluster"
# Window-based autoscaling (optional)
windowBased:
enabled: true
slidingWindowDuration: 300 # 5-minute sliding window (seconds)
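To try this out, apply the manifest (the filename below is illustrative) and read the autoscaling spec back to confirm it was accepted:
# Apply the QueryService and verify the autoscaling settings
kubectl apply -f analytics-cluster.yaml
kubectl get qs analytics-cluster -o jsonpath='{.spec.executor.autoscaling}' | jq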
Key Fields¶
| Field | Required | Default | Description |
|---|---|---|---|
| enabled | No | false | Enable/disable autoscaling |
| minExecutors | No | replicas value | Minimum executor count |
| maxExecutors | Yes (if enabled) | - | Maximum executor count |
| clusterManagementBaseURL | No | Auto-discovered | Operator API endpoint |
| windowBased.enabled | No | false | Enable window-based scaling |
| windowBased.slidingWindowDuration | No | - | Window duration in seconds |
Autoscaling Behavior¶
Query Load Increases:
Queue → POST /autoscale {targetReplicas: 8}
Operator → Scale executor deployment to 8
Status → scalingHistory updated
Query Load Decreases:
Queue → POST /autoscale {targetReplicas: 3}
Operator → Scale executor deployment to 3
Status → scalingHistory updated
At Boundaries:
Request for 25 executors with maxExecutors=20
Operator → Scale to 20 (capped at max)
Request for 1 executor with minExecutors=2
Operator → Scale to 2 (floored at min)
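As a rough sketch of the clamping described above (this mirrors the behavior, not the operator's actual code), the requested count is bounded by minExecutors and maxExecutors:
# Illustrative clamp of a requested replica count to the configured bounds
clamp_replicas() {
  local target=$1 min=$2 max=$3
  (( target < min )) && target=$min
  (( target > max )) && target=$max
  echo "$target"
}
clamp_replicas 25 2 20   # prints 20 (capped at maxExecutors)
clamp_replicas 1 2 20    # prints 2  (floored at minExecutors)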
Viewing Scaling History¶
# View recent scaling operations
kubectl get qs analytics-cluster -o jsonpath='{.status.scalingHistory}' | jq
# Example output:
[
{
"timestamp": "2024-12-09T10:30:00Z",
"component": "executor",
"oldReplicas": 2,
"newReplicas": 8,
"trigger": "autoscaling-api",
"strategy": "blue"
},
{
"timestamp": "2024-12-09T10:45:00Z",
"component": "executor",
"oldReplicas": 8,
"newReplicas": 4,
"trigger": "autoscaling-api",
"strategy": "blue"
}
]
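If you want a quick summary instead of the raw list, a jq filter along these lines counts scale-up versus scale-down events (field names follow the example output above):
# Tally scale-up vs scale-down events from the scaling history
kubectl get qs analytics-cluster -o jsonpath='{.status.scalingHistory}' \
  | jq '[.[] | if .newReplicas > .oldReplicas then "up" else "down" end] | group_by(.) | map({(.[0]): length}) | add'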
Manual Scaling¶
You can also scale manually:
# Scale via kubectl patch
kubectl patch qs analytics-cluster --type=merge -p '{"spec":{"executor":{"replicas":10}}}'
# Or via API (same endpoint the Queue uses)
curl -X POST "http://e6-operator.e6-operator-system:8082/endpoints/v1/cluster/workspace-prod/analytics-cluster/autoscale" \
-H "Content-Type: application/json" \
-d '{"targetReplicas": 10}'
Auto-Suspension¶
How It Works¶
- Operator monitors query activity via Queue service
- After idle timeout, operator suspends components (scales to 0)
- Pre-suspension replicas saved for later resume
- Status updated with suspension history
Configuration¶
apiVersion: e6data.io/v1alpha1
kind: QueryService
metadata:
name: analytics-cluster
spec:
# Auto-suspension configuration
autoSuspension:
enabled: true
maxIdleDurationMinutes: 30 # Suspend after 30 minutes idle
Key Fields¶
| Field | Required | Default | Description |
|---|---|---|---|
| enabled | No | false | Enable auto-suspension |
| maxIdleDurationMinutes | Yes (if enabled) | - | Idle time before suspension |
Suspension Behavior¶
Cluster becomes idle (no active queries):
Timer starts → 30-minute countdown
Query arrives during countdown:
Timer resets → Back to 30 minutes
Timer expires (30 minutes idle):
1. Save current replicas: planner=1, queue=1, executor=4
2. Scale planner, queue, executor to 0
3. Envoy remains running (for auto-resume via xDS)
4. Update suspensionHistory
Suspended state:
- Envoy: Running (handles incoming connections via xDS)
- Planner: 0 replicas
- Queue: 0 replicas
- Executor: 0 replicas
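A quick way to confirm this state is to list the deployments in the cluster namespace; everything except Envoy should report zero ready replicas (deployment names vary by install):
# While suspended, planner/queue/executor deployments show 0/0; envoy stays up
kubectl get deploy -n workspace-prod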
Viewing Suspension History¶
# View suspension events
kubectl get qs analytics-cluster -o jsonpath='{.status.suspensionHistory}' | jq
# Example output:
[
{
"timestamp": "2024-12-09T08:00:00Z",
"action": "suspend",
"trigger": "auto-suspension-api",
"strategy": "blue",
"componentsSuspended": ["planner", "queue", "executor"],
"preSuspensionReplicas": {
"plannerReplicas": 1,
"queueReplicas": 1,
"executorReplicas": 4
}
},
{
"timestamp": "2024-12-09T09:15:00Z",
"action": "resume",
"trigger": "auto-resume-api",
"strategy": "blue"
}
]
Manual Suspension¶
# Suspend via API
curl -X POST "http://e6-operator.e6-operator-system:8082/endpoints/v1/cluster/workspace-prod/analytics-cluster/suspend"
# Resume via API
curl -X POST "http://e6-operator.e6-operator-system:8082/endpoints/v1/cluster/workspace-prod/analytics-cluster/resume"
Auto-Resume¶
How It Works¶
- Envoy proxy routes incoming connections via xDS
- On query arrival, xDS control plane or external trigger calls operator resume API
- Operator restores planner, queue, executor to pre-suspension replicas
- Query proceeds once components are ready
Configuration¶
apiVersion: e6data.io/v1alpha1
kind: QueryService
metadata:
name: analytics-cluster
spec:
# Auto-resume configuration (usually paired with auto-suspension)
autoResume:
enabled: true
autoSuspension:
enabled: true
maxIdleDurationMinutes: 30
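To confirm both features are enabled on a running cluster, read the spec fields back:
# Check that auto-resume and auto-suspension are both configured
kubectl get qs analytics-cluster -o jsonpath='{.spec.autoResume}{"\n"}{.spec.autoSuspension}{"\n"}'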
Resume Behavior¶
Query arrives at suspended cluster:
1. Envoy receives connection
2. xDS detects no healthy backends
3. Resume API is called (POST /resume)
4. Operator scales components back:
- Planner: 0 → 1 (from saved replicas)
- Queue: 0 → 1
- Executor: 0 → 4
5. Envoy waits for backends to become healthy (via xDS updates)
6. Query is forwarded to planner
7. Resume takes ~30-60 seconds typically
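If you trigger a resume by hand, one rough way to wait for the cluster to come back is to follow the component rollouts (the app labels here are assumptions borrowed from the troubleshooting section; adjust to your install):
# Trigger resume, then wait for planner and executors to become ready
curl -X POST "http://e6-operator.e6-operator-system:8082/endpoints/v1/cluster/workspace-prod/analytics-cluster/resume"
kubectl rollout status deploy -l app=planner -n workspace-prod --timeout=120s
kubectl rollout status deploy -l app=executor -n workspace-prod --timeout=180s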
Cold Start Considerations¶
| Component | Startup Time | Notes |
|---|---|---|
| Planner | 15-30s | JVM warmup, connects to storage |
| Queue | 10-20s | Connects to planner |
| Executor | 20-45s | JVM warmup, cache initialization |
Total resume time: ~30-90 seconds depending on resources
Reducing Resume Time¶
- Increase resources - Faster JVM startup with more CPU
- Use warmup pools - Pre-warmed executors via Pool CRD
- Adjust idle timeout - Longer timeout = fewer suspensions
Pool-Based Burst Scaling¶
How It Works¶
When autoscaling needs more executors than regular nodes can provide:
- Regular executors scale up to minExecutors (or capacity limit)
- Pool executors handle overflow up to maxExecutors
- Pool nodes are provisioned by Karpenter
- Warmup DaemonSets keep executor image cached
Configuration¶
apiVersion: e6data.io/v1alpha1
kind: QueryService
metadata:
name: analytics-cluster
labels:
e6data.io/pool: burst-pool # Label for pool selector
spec:
executor:
replicas: 2 # Baseline on regular nodes
autoscaling:
enabled: true
minExecutors: 2 # Regular node capacity
maxExecutors: 20 # Total (regular + pool)
# Reference to burst pool
poolRef:
name: burst-pool
namespace: e6-pools
Scaling Split Logic¶
Request: Scale to 12 executors
Configuration: minExecutors=2, maxExecutors=20, poolRef=burst-pool
Split calculation:
regularExecutors = min(targetReplicas, minExecutors) = min(12, 2) = 2
poolExecutors = targetReplicas - regularExecutors = 12 - 2 = 10
Result:
- Regular executor deployment: 2 replicas
- Pool executor deployment: 10 replicas
Status shows:
regularExecutorReplicas: 2
poolExecutorReplicas: 10
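The same split can be reproduced with a small sketch, handy for sanity-checking expected regular/pool counts (this mirrors the calculation shown above, not the operator's source):
# Reproduce the regular/pool split for a given target replica count
split_executors() {
  local target=$1 min=$2
  local regular=$(( target < min ? target : min ))
  echo "regular=$regular pool=$(( target - regular ))"
}
split_executors 12 2   # prints: regular=2 pool=10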
Pool Configuration¶
apiVersion: e6data.io/v1alpha1
kind: Pool
metadata:
name: burst-pool
namespace: e6-pools
spec:
minExecutors: 0 # Pool can scale to zero
maxExecutors: 50 # Maximum pool capacity
# Instance configuration
instanceConfig:
instanceFamily: c6g # Or explicit instanceType
spotEnabled: true # Use spot instances for cost savings
# Which QueryServices can use this pool
queryServiceSelector:
matchLabels:
e6data.io/pool: burst-pool
# Image warmup
imageConfig:
autoCollectImages: true # Collect from attached QueryServices
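After creating the Pool (the filename below is illustrative), confirm the QueryService carries a label matching the pool's selector so it is eligible for burst capacity:
# Create the pool and check that the QueryService has the matching label
kubectl apply -f burst-pool.yaml
kubectl get qs analytics-cluster --show-labels | grep e6data.io/pool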
Viewing Pool Status¶
# Check pool capacity
kubectl get pool burst-pool -n e6-pools -o yaml
# Check QueryService pool allocation
kubectl get qs analytics-cluster -o jsonpath='{.status}' | jq '{
poolName: .poolName,
regularExecutors: .regularExecutorReplicas,
poolExecutors: .poolExecutorReplicas
}'
Best Practices¶
Autoscaling¶
- Set appropriate min/max
  - minExecutors: Enough for baseline query load
  - maxExecutors: Cost ceiling you're comfortable with
- Use window-based scaling for smoother scaling
  - Prevents thrashing from short query bursts
  - 5-minute window is a good starting point
- Monitor scaling history
  - Too many scale events = adjust min/max
  - Frequent max hits = increase maxExecutors or add pool
Auto-Suspension¶
- Choose idle timeout carefully
  - Too short: Frequent suspend/resume overhead
  - Too long: Unnecessary cost during idle periods
  - 15-30 minutes is typical for interactive workloads
- Pair with auto-resume for seamless experience
  - Users experience ~30-60s delay on first query after idle
- Consider workload patterns
  - Business hours only: Short timeout (15 min)
  - 24/7 sporadic: Longer timeout or no auto-suspend
Pool Burst¶
- Use pools for cost optimization
  - Regular nodes: Reserved/on-demand for baseline
  - Pool nodes: Spot instances for burst
- Set pool max based on budget
  - Pool executors = maxExecutors - minExecutors
- Enable image warmup
  - Reduces executor startup time on pool nodes
  - Keep warmup DaemonSet running
Troubleshooting¶
Autoscaling Not Working¶
# Check operator API is accessible
kubectl port-forward -n e6-operator-system svc/e6-operator 8082:8082
curl http://localhost:8082/health
# Check autoscaling is enabled
kubectl get qs analytics-cluster -o jsonpath='{.spec.executor.autoscaling}'
# Check operator logs for scaling requests
kubectl logs -n e6-operator-system deployment/e6-operator | grep -i autoscale
Cluster Not Suspending¶
# Check auto-suspension config
kubectl get qs analytics-cluster -o jsonpath='{.spec.autoSuspension}'
# Check if there are active queries (prevents suspension)
kubectl logs -l app=queue -n workspace-prod | grep -i "active queries"
# Check operator logs
kubectl logs -n e6-operator-system deployment/e6-operator | grep -i suspend
Resume Taking Too Long¶
# Check pod startup events
kubectl get events -n workspace-prod --sort-by='.lastTimestamp' | grep -E 'planner|queue|executor'
# Check if image pull is slow
kubectl describe pod -l app=executor -n workspace-prod | grep -A5 "Events:"
# Consider using a Pool with warmup for faster resume
Related Documentation¶
- QueryService - Full QueryService spec reference
- Pool - Pool CRD for burst capacity
- Status Diagnostics - Understanding status fields