Autoscaling and Auto-Suspension Guide

This guide covers executor autoscaling, auto-suspension, and auto-resume features in QueryService.


Overview

| Feature | Purpose | Trigger |
|---------|---------|---------|
| Autoscaling | Scale executors based on query load | API calls from Queue service |
| Auto-Suspension | Suspend idle clusters to save costs | Idle timeout reached |
| Auto-Resume | Wake suspended clusters on demand | Incoming query via Envoy |
| Pool Burst | Overflow to shared compute pool | Regular capacity exceeded |

Executor Autoscaling

How It Works

  1. Queue service monitors query load (pending queries, active workers)
  2. Queue calls operator API at http://e6-operator:8082/endpoints/v1/cluster/{namespace}/{name}/autoscale
  3. Operator scales executor deployment up or down
  4. Status updated with scaling history
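
For reference, the request the Queue sends has the same shape as the manual example later in this guide; a minimal sketch (the namespace and cluster name are example values):

# Autoscale request the Queue sends (illustrative; same endpoint as under Manual Scaling)
curl -X POST "http://e6-operator:8082/endpoints/v1/cluster/workspace-prod/analytics-cluster/autoscale" \
  -H "Content-Type: application/json" \
  -d '{"targetReplicas": 8}'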

Configuration

apiVersion: e6data.io/v1alpha1
kind: QueryService
metadata:
  name: analytics-cluster
spec:
  executor:
    replicas: 2                    # Initial/baseline replicas

    autoscaling:
      enabled: true                # Enable autoscaling
      minExecutors: 2              # Minimum executors (floor)
      maxExecutors: 20             # Maximum executors (ceiling)

      # Optional: Override operator endpoint
      # clusterManagementBaseURL: "http://custom-operator:8082/endpoints/v1/cluster"

      # Window-based autoscaling (optional)
      windowBased:
        enabled: true
        slidingWindowDuration: 300  # 5-minute sliding window (seconds)

Key Fields

| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| enabled | No | false | Enable/disable autoscaling |
| minExecutors | No | replicas value | Minimum executor count |
| maxExecutors | Yes (if enabled) | - | Maximum executor count |
| clusterManagementBaseURL | No | Auto-discovered | Operator API endpoint |
| windowBased.enabled | No | false | Enable window-based scaling |
| windowBased.slidingWindowDuration | No | - | Window duration in seconds |

Autoscaling Behavior

Query Load Increases:
  Queue → POST /autoscale {targetReplicas: 8}
  Operator → Scale executor deployment to 8
  Status → scalingHistory updated

Query Load Decreases:
  Queue → POST /autoscale {targetReplicas: 3}
  Operator → Scale executor deployment to 3
  Status → scalingHistory updated

At Boundaries:
  Request for 25 executors with maxExecutors=20
  Operator → Scale to 20 (capped at max)

  Request for 1 executor with minExecutors=2
  Operator → Scale to 2 (floored at min)
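
The boundary handling above is plain min/max clamping. A minimal shell sketch of the same arithmetic (illustrative only, not the operator's actual code):

# Clamp a requested replica count to [minExecutors, maxExecutors]
clamp_replicas() {
  local requested=$1 min=$2 max=$3
  local target=$requested
  (( target > max )) && target=$max
  (( target < min )) && target=$min
  echo "$target"
}
clamp_replicas 25 2 20   # -> 20 (capped at max)
clamp_replicas 1 2 20    # -> 2 (floored at min)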

Viewing Scaling History

# View recent scaling operations
kubectl get qs analytics-cluster -o jsonpath='{.status.scalingHistory}' | jq

# Example output:
[
  {
    "timestamp": "2024-12-09T10:30:00Z",
    "component": "executor",
    "oldReplicas": 2,
    "newReplicas": 8,
    "trigger": "autoscaling-api",
    "strategy": "blue"
  },
  {
    "timestamp": "2024-12-09T10:45:00Z",
    "component": "executor",
    "oldReplicas": 8,
    "newReplicas": 4,
    "trigger": "autoscaling-api",
    "strategy": "blue"
  }
]
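
To inspect only the most recent event, add a jq filter:

# Show the latest scaling event only
kubectl get qs analytics-cluster -o jsonpath='{.status.scalingHistory}' | jq '.[-1]'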

Manual Scaling

You can also scale manually:

# Scale via kubectl patch
kubectl patch qs analytics-cluster --type=merge -p '{"spec":{"executor":{"replicas":10}}}'

# Or via API (same endpoint the Queue uses)
curl -X POST "http://e6-operator.e6-operator-system:8082/endpoints/v1/cluster/workspace-prod/analytics-cluster/autoscale" \
  -H "Content-Type: application/json" \
  -d '{"targetReplicas": 10}'

Auto-Suspension

How It Works

  1. Operator monitors query activity via Queue service
  2. After idle timeout, operator suspends components (scales to 0)
  3. Pre-suspension replicas saved for later resume
  4. Status updated with suspension history

Configuration

apiVersion: e6data.io/v1alpha1
kind: QueryService
metadata:
  name: analytics-cluster
spec:
  # Auto-suspension configuration
  autoSuspension:
    enabled: true
    maxIdleDurationMinutes: 30    # Suspend after 30 minutes idle

Key Fields

| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| enabled | No | false | Enable auto-suspension |
| maxIdleDurationMinutes | Yes (if enabled) | - | Idle time in minutes before suspension |

Suspension Behavior

Cluster becomes idle (no active queries):
  Timer starts → 30 minutes countdown

Query arrives during countdown:
  Timer resets → Back to 30 minutes

Timer expires (30 minutes idle):
  1. Save current replicas: planner=1, queue=1, executor=4
  2. Scale planner, queue, executor to 0
  3. Envoy remains running (for auto-resume via xDS)
  4. Update suspensionHistory

Suspended state:
  - Envoy: Running (handles incoming connections via xDS)
  - Planner: 0 replicas
  - Queue: 0 replicas
  - Executor: 0 replicas
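
You can confirm this state from the workload namespace; a quick check (the namespace is an example value):

# While suspended, expect planner/queue/executor deployments at 0 replicas and Envoy still running
kubectl get deploy -n workspace-prod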

Viewing Suspension History

# View suspension events
kubectl get qs analytics-cluster -o jsonpath='{.status.suspensionHistory}' | jq

# Example output:
[
  {
    "timestamp": "2024-12-09T08:00:00Z",
    "action": "suspend",
    "trigger": "auto-suspension-api",
    "strategy": "blue",
    "componentsSuspended": ["planner", "queue", "executor"],
    "preSuspensionReplicas": {
      "plannerReplicas": 1,
      "queueReplicas": 1,
      "executorReplicas": 4
    }
  },
  {
    "timestamp": "2024-12-09T09:15:00Z",
    "action": "resume",
    "trigger": "auto-resume-api",
    "strategy": "blue"
  }
]

Manual Suspension

# Suspend via API
curl -X POST "http://e6-operator.e6-operator-system:8082/endpoints/v1/cluster/workspace-prod/analytics-cluster/suspend"

# Resume via API
curl -X POST "http://e6-operator.e6-operator-system:8082/endpoints/v1/cluster/workspace-prod/analytics-cluster/resume"

Auto-Resume

How It Works

  1. Envoy proxy routes incoming connections via xDS
  2. On query arrival, xDS control plane or external trigger calls operator resume API
  3. Operator restores planner, queue, executor to pre-suspension replicas
  4. Query proceeds once components are ready

Configuration

apiVersion: e6data.io/v1alpha1
kind: QueryService
metadata:
  name: analytics-cluster
spec:
  # Auto-resume configuration (usually paired with auto-suspension)
  autoResume:
    enabled: true

  autoSuspension:
    enabled: true
    maxIdleDurationMinutes: 30

Resume Behavior

Query arrives at suspended cluster:
  1. Envoy receives connection
  2. xDS detects no healthy backends
  3. Resume API is called (POST /resume)
  4. Operator scales components back:
     - Planner: 0 → 1 (from saved replicas)
     - Queue: 0 → 1
     - Executor: 0 → 4
  5. Envoy waits for backends to become healthy (via xDS updates)
  6. Query is forwarded to planner

Resume typically takes ~30-60 seconds (see Cold Start Considerations below).
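
If you trigger a resume manually and want to block until the cluster is usable, you can wait on the planner deployment. A sketch, assuming the deployment is named analytics-cluster-planner (the name is an assumption for this example):

# Wait for the planner to come back after resume (deployment name is an assumption)
kubectl wait --for=condition=Available deployment/analytics-cluster-planner \
  -n workspace-prod --timeout=120s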

Cold Start Considerations

| Component | Startup Time | Notes |
|-----------|--------------|-------|
| Planner | 15-30s | JVM warmup, connects to storage |
| Queue | 10-20s | Connects to planner |
| Executor | 20-45s | JVM warmup, cache initialization |

Total resume time: ~30-90 seconds depending on resources

Reducing Resume Time

  1. Increase resources - Faster JVM startup with more CPU
  2. Use warmup pools - Pre-warmed executors via Pool CRD
  3. Adjust idle timeout - Longer timeout = fewer suspensions

Pool-Based Burst Scaling

How It Works

When autoscaling needs more executors than regular nodes can provide:

  1. Regular executors scale up to minExecutors (the regular-node capacity)
  2. Pool executors handle overflow up to maxExecutors
  3. Pool nodes are provisioned by Karpenter
  4. Warmup DaemonSets keep executor image cached
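
To check that warmup DaemonSets are in place, list them in the pool namespace (the namespace matches the Pool example below; DaemonSet names vary by install):

# Warmup DaemonSets keep the executor image cached on pool nodes
kubectl get daemonsets -n e6-pools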

Configuration

apiVersion: e6data.io/v1alpha1
kind: QueryService
metadata:
  name: analytics-cluster
  labels:
    e6data.io/pool: burst-pool    # Label for pool selector
spec:
  executor:
    replicas: 2                    # Baseline on regular nodes

    autoscaling:
      enabled: true
      minExecutors: 2              # Regular node capacity
      maxExecutors: 20             # Total (regular + pool)

    # Reference to burst pool
    poolRef:
      name: burst-pool
      namespace: e6-pools

Scaling Split Logic

Request: Scale to 12 executors
Configuration: minExecutors=2, maxExecutors=20, poolRef=burst-pool

Split calculation:
  regularExecutors = min(targetReplicas, minExecutors) = min(12, 2) = 2
  poolExecutors = targetReplicas - regularExecutors = 12 - 2 = 10

Result:
  - Regular executor deployment: 2 replicas
  - Pool executor deployment: 10 replicas

Status shows:
  regularExecutorReplicas: 2
  poolExecutorReplicas: 10
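
The same split expressed as shell arithmetic (a sketch mirroring the calculation above, not the operator's code):

# Split a target replica count between regular and pool executors
target=12
min_executors=2
regular=$(( target < min_executors ? target : min_executors ))
pool=$(( target - regular ))
echo "regular=$regular pool=$pool"   # regular=2 pool=10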

Pool Configuration

apiVersion: e6data.io/v1alpha1
kind: Pool
metadata:
  name: burst-pool
  namespace: e6-pools
spec:
  minExecutors: 0                  # Pool can scale to zero
  maxExecutors: 50                 # Maximum pool capacity

  # Instance configuration
  instanceConfig:
    instanceFamily: c6g            # Or explicit instanceType
    spotEnabled: true              # Use spot instances for cost savings

  # Which QueryServices can use this pool
  queryServiceSelector:
    matchLabels:
      e6data.io/pool: burst-pool

  # Image warmup
  imageConfig:
    autoCollectImages: true        # Collect from attached QueryServices

Viewing Pool Status

# Check pool capacity
kubectl get pool burst-pool -n e6-pools -o yaml

# Check QueryService pool allocation
kubectl get qs analytics-cluster -o jsonpath='{.status}' | jq '{
  poolName: .poolName,
  regularExecutors: .regularExecutorReplicas,
  poolExecutors: .poolExecutorReplicas
}'

Best Practices

Autoscaling

  1. Set appropriate min/max
     - minExecutors: Enough for baseline query load
     - maxExecutors: Cost ceiling you're comfortable with

  2. Use window-based scaling for smoother behavior
     - Prevents thrashing from short query bursts
     - A 5-minute window is a good starting point

  3. Monitor scaling history
     - Too many scale events: Adjust min/max
     - Frequent max hits: Increase maxExecutors or add a pool

Auto-Suspension

  1. Choose idle timeout carefully
     - Too short: Frequent suspend/resume overhead
     - Too long: Unnecessary cost during idle periods
     - 15-30 minutes is typical for interactive workloads

  2. Pair with auto-resume for seamless experience
     - Users experience a ~30-60s delay on the first query after idle

  3. Consider workload patterns
     - Business hours only: Short timeout (15 min)
     - 24/7 sporadic: Longer timeout or no auto-suspend

Pool Burst

  1. Use pools for cost optimization
     - Regular nodes: Reserved/on-demand for baseline
     - Pool nodes: Spot instances for burst

  2. Set the pool max based on budget
     - Maximum pool executors = maxExecutors - minExecutors

  3. Enable image warmup
     - Reduces executor startup time on pool nodes
     - Keep the warmup DaemonSet running

Troubleshooting

Autoscaling Not Working

# Check operator API is accessible
kubectl port-forward -n e6-operator-system svc/e6-operator 8082:8082
curl http://localhost:8082/health

# Check autoscaling is enabled
kubectl get qs analytics-cluster -o jsonpath='{.spec.executor.autoscaling}'

# Check operator logs for scaling requests
kubectl logs -n e6-operator-system deployment/e6-operator | grep -i autoscale

Cluster Not Suspending

# Check auto-suspension config
kubectl get qs analytics-cluster -o jsonpath='{.spec.autoSuspension}'

# Check if there are active queries (prevents suspension)
kubectl logs -l app=queue -n workspace-prod | grep -i "active queries"

# Check operator logs
kubectl logs -n e6-operator-system deployment/e6-operator | grep -i suspend

Resume Taking Too Long

# Check pod startup events
kubectl get events -n workspace-prod --sort-by='.lastTimestamp' | grep -E 'planner|queue|executor'

# Check if image pull is slow
kubectl describe pod -l app=executor -n workspace-prod | grep -A5 "Events:"

# Consider using a Pool with warmup for faster resume