Autoscaling and Auto-Suspension Guide

This guide covers executor autoscaling, auto-suspension, and auto-resume features in QueryService.


Overview

| Feature | Purpose | Trigger |
|---------|---------|---------|
| Autoscaling | Scale executors based on query load | API calls from Queue service |
| Auto-Suspension | Suspend idle clusters to save costs | Idle timeout reached |
| Auto-Resume | Wake suspended clusters on demand | Incoming query via Envoy |
| Pool Burst | Overflow to shared compute pool | Regular capacity exceeded |

Executor Autoscaling

How It Works

  1. Queue service monitors query load (pending queries, active workers)
  2. Queue calls operator API at http://e6-operator:8082/endpoints/v1/cluster/{namespace}/{name}/autoscale
  3. Operator scales executor deployment up or down
  4. Status updated with scaling history
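
For reference, the request the Queue sends has the same shape as the manual example later in this guide; a minimal sketch (the namespace and cluster name are example values):

# Autoscale request the Queue sends (illustrative; same endpoint as under Manual Scaling)
curl -X POST "http://e6-operator:8082/endpoints/v1/cluster/workspace-prod/analytics-cluster/autoscale" \
  -H "Content-Type: application/json" \
  -d '{"targetReplicas": 8}'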

Configuration

apiVersion: e6data.io/v1alpha1
kind: QueryService
metadata:
  name: analytics-cluster
spec:
  executor:
    replicas: 2                    # Initial/baseline replicas

    autoscaling:
      enabled: true                # Enable autoscaling
      minExecutors: 2              # Minimum executors (floor)
      maxExecutors: 20             # Maximum executors (ceiling)

      # Optional: Override operator endpoint
      # clusterManagementBaseURL: "http://custom-operator:8082/endpoints/v1/cluster"

      # Window-based autoscaling (optional)
      windowBased:
        enabled: true
        slidingWindowDuration: 300  # 5-minute sliding window (seconds)

Key Fields

| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| enabled | No | false | Enable/disable autoscaling |
| minExecutors | No | replicas value | Minimum executor count |
| maxExecutors | Yes (if enabled) | - | Maximum executor count |
| clusterManagementBaseURL | No | Auto-discovered | Operator API endpoint |
| windowBased.enabled | No | false | Enable window-based scaling |
| windowBased.slidingWindowDuration | No | - | Window duration in seconds |

Autoscaling Behavior

Query Load Increases:
  Queue → POST /autoscale {targetReplicas: 8}
  Operator → Scale executor deployment to 8
  Status → scalingHistory updated

Query Load Decreases:
  Queue → POST /autoscale {targetReplicas: 3}
  Operator → Scale executor deployment to 3
  Status → scalingHistory updated

At Boundaries:
  Request for 25 executors with maxExecutors=20
  Operator → Scale to 20 (capped at max)

  Request for 1 executor with minExecutors=2
  Operator → Scale to 2 (floored at min)
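
The boundary handling above is plain min/max clamping. A minimal shell sketch of the same arithmetic (illustrative only, not the operator's actual code):

# Clamp a requested replica count to [minExecutors, maxExecutors]
clamp_replicas() {
  local requested=$1 min=$2 max=$3
  local target=$requested
  (( target > max )) && target=$max
  (( target < min )) && target=$min
  echo "$target"
}
clamp_replicas 25 2 20   # -> 20 (capped at max)
clamp_replicas 1 2 20    # -> 2 (floored at min)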

Viewing Scaling History

# View recent scaling operations
kubectl get qs analytics-cluster -o jsonpath='{.status.scalingHistory}' | jq

# Example output:
[
  {
    "timestamp": "2024-12-09T10:30:00Z",
    "component": "executor",
    "oldReplicas": 2,
    "newReplicas": 8,
    "trigger": "autoscaling-api",
    "strategy": "blue"
  },
  {
    "timestamp": "2024-12-09T10:45:00Z",
    "component": "executor",
    "oldReplicas": 8,
    "newReplicas": 4,
    "trigger": "autoscaling-api",
    "strategy": "blue"
  }
]
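
To inspect only the most recent event, add a jq filter:

# Show the latest scaling event only
kubectl get qs analytics-cluster -o jsonpath='{.status.scalingHistory}' | jq '.[-1]'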

Manual Scaling

You can also scale manually:

# Scale via kubectl patch
kubectl patch qs analytics-cluster --type=merge -p '{"spec":{"executor":{"replicas":10}}}'

# Or via API (same endpoint the Queue uses)
curl -X POST "http://e6-operator.e6-operator-system:8082/endpoints/v1/cluster/workspace-prod/analytics-cluster/autoscale" \
  -H "Content-Type: application/json" \
  -d '{"targetReplicas": 10}'

Auto-Suspension

How It Works

  1. Operator monitors query activity via Queue service
  2. After idle timeout, operator suspends components (scales to 0)
  3. Pre-suspension replicas saved for later resume
  4. Status updated with suspension history

Configuration

apiVersion: e6data.io/v1alpha1
kind: QueryService
metadata:
  name: analytics-cluster
spec:
  # Auto-suspension configuration
  autoSuspension:
    enabled: true
    maxIdleDurationMinutes: 30    # Suspend after 30 minutes idle

Key Fields

| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| enabled | No | false | Enable auto-suspension |
| maxIdleDurationMinutes | Yes (if enabled) | - | Idle time in minutes before suspension |

Suspension Behavior

Cluster becomes idle (no active queries):
  Timer starts → 30 minutes countdown

Query arrives during countdown:
  Timer resets → Back to 30 minutes

Timer expires (30 minutes idle):
  1. Save current replicas: planner=1, queue=1, executor=4
  2. Scale planner, queue, executor to 0
  3. Envoy remains running (for auto-resume via xDS)
  4. Update suspensionHistory

Suspended state:
  - Envoy: Running (handles incoming connections via xDS)
  - Planner: 0 replicas
  - Queue: 0 replicas
  - Executor: 0 replicas
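
You can confirm this state from the workload namespace; a quick check (the namespace is an example value):

# While suspended, expect planner/queue/executor deployments at 0 replicas and Envoy still running
kubectl get deploy -n workspace-prod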

Viewing Suspension History

# View suspension events
kubectl get qs analytics-cluster -o jsonpath='{.status.suspensionHistory}' | jq

# Example output:
[
  {
    "timestamp": "2024-12-09T08:00:00Z",
    "action": "suspend",
    "trigger": "auto-suspension-api",
    "strategy": "blue",
    "componentsSuspended": ["planner", "queue", "executor"],
    "preSuspensionReplicas": {
      "plannerReplicas": 1,
      "queueReplicas": 1,
      "executorReplicas": 4
    }
  },
  {
    "timestamp": "2024-12-09T09:15:00Z",
    "action": "resume",
    "trigger": "auto-resume-api",
    "strategy": "blue"
  }
]

Manual Suspension

# Suspend via API
curl -X POST "http://e6-operator.e6-operator-system:8082/endpoints/v1/cluster/workspace-prod/analytics-cluster/suspend"

# Resume via API
curl -X POST "http://e6-operator.e6-operator-system:8082/endpoints/v1/cluster/workspace-prod/analytics-cluster/resume"

Auto-Resume

How It Works

  1. Envoy proxy routes incoming connections via xDS
  2. On query arrival, xDS control plane or external trigger calls operator resume API
  3. Operator restores planner, queue, executor to pre-suspension replicas
  4. Query proceeds once components are ready

Configuration

apiVersion: e6data.io/v1alpha1
kind: QueryService
metadata:
  name: analytics-cluster
spec:
  # Auto-resume configuration (usually paired with auto-suspension)
  autoResume:
    enabled: true

  autoSuspension:
    enabled: true
    maxIdleDurationMinutes: 30

Resume Behavior

Query arrives at suspended cluster:
  1. Envoy receives connection
  2. xDS detects no healthy backends
  3. Resume API is called (POST /resume)
  4. Operator scales components back:
     - Planner: 0 → 1 (from saved replicas)
     - Queue: 0 → 1
     - Executor: 0 → 4
  5. Envoy waits for backends to become healthy (via xDS updates)
  6. Query is forwarded to planner

Resume typically takes ~30-60 seconds (see Cold Start Considerations below).
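
If you trigger a resume manually and want to block until the cluster is usable, you can wait on the planner deployment. A sketch, assuming the deployment is named analytics-cluster-planner (the name is an assumption for this example):

# Wait for the planner to come back after resume (deployment name is an assumption)
kubectl wait --for=condition=Available deployment/analytics-cluster-planner \
  -n workspace-prod --timeout=120s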

Cold Start Considerations

| Component | Startup Time | Notes |
|-----------|--------------|-------|
| Planner | 15-30s | JVM warmup, connects to storage |
| Queue | 10-20s | Connects to planner |
| Executor | 20-45s | JVM warmup, cache initialization |

Total resume time: ~30-90 seconds depending on resources

Reducing Resume Time

  1. Increase resources - Faster JVM startup with more CPU
  2. Use warmup pools - Pre-warmed executors via Pool CRD
  3. Adjust idle timeout - Longer timeout = fewer suspensions

Pool-Based Burst Scaling

How It Works

When autoscaling needs more executors than regular nodes can provide:

  1. Regular executors scale up to minExecutors (the regular-node capacity)
  2. Pool executors handle overflow up to maxExecutors
  3. Pool nodes are provisioned by Karpenter
  4. Warmup DaemonSets keep executor image cached
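
To check that warmup DaemonSets are in place, list them in the pool namespace (the namespace matches the Pool example below; DaemonSet names vary by install):

# Warmup DaemonSets keep the executor image cached on pool nodes
kubectl get daemonsets -n e6-pools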

Configuration

apiVersion: e6data.io/v1alpha1
kind: QueryService
metadata:
  name: analytics-cluster
  labels:
    e6data.io/pool: burst-pool    # Label for pool selector
spec:
  executor:
    replicas: 2                    # Baseline on regular nodes

    autoscaling:
      enabled: true
      minExecutors: 2              # Regular node capacity
      maxExecutors: 20             # Total (regular + pool)

    # Reference to burst pool
    poolRef:
      name: burst-pool
      namespace: e6-pools

Scaling Split Logic

Request: Scale to 12 executors
Configuration: minExecutors=2, maxExecutors=20, poolRef=burst-pool

Split calculation:
  regularExecutors = min(targetReplicas, minExecutors) = min(12, 2) = 2
  poolExecutors = targetReplicas - regularExecutors = 12 - 2 = 10

Result:
  - Regular executor deployment: 2 replicas
  - Pool executor deployment: 10 replicas

Status shows:
  regularExecutorReplicas: 2
  poolExecutorReplicas: 10
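
The same split expressed as shell arithmetic (a sketch mirroring the calculation above, not the operator's code):

# Split a target replica count between regular and pool executors
target=12
min_executors=2
regular=$(( target < min_executors ? target : min_executors ))
pool=$(( target - regular ))
echo "regular=$regular pool=$pool"   # regular=2 pool=10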

Pool Configuration

apiVersion: e6data.io/v1alpha1
kind: Pool
metadata:
  name: burst-pool
  namespace: e6-pools
spec:
  minExecutors: 0                  # Pool can scale to zero
  maxExecutors: 50                 # Maximum pool capacity

  # Instance configuration
  instanceConfig:
    instanceFamily: c6g            # Or explicit instanceType
    spotEnabled: true              # Use spot instances for cost savings

  # Which QueryServices can use this pool
  queryServiceSelector:
    matchLabels:
      e6data.io/pool: burst-pool

  # Image warmup
  imageConfig:
    autoCollectImages: true        # Collect from attached QueryServices

Viewing Pool Status

# Check pool capacity
kubectl get pool burst-pool -n e6-pools -o yaml

# Check QueryService pool allocation
kubectl get qs analytics-cluster -o jsonpath='{.status}' | jq '{
  poolName: .poolName,
  regularExecutors: .regularExecutorReplicas,
  poolExecutors: .poolExecutorReplicas
}'

Best Practices

Autoscaling

  1. Set appropriate min/max
     - minExecutors: Enough for baseline query load
     - maxExecutors: Cost ceiling you're comfortable with

  2. Use window-based scaling for smoother behavior
     - Prevents thrashing from short query bursts
     - A 5-minute window is a good starting point

  3. Monitor scaling history
     - Too many scale events: Adjust min/max
     - Frequent max hits: Increase maxExecutors or add a pool

Auto-Suspension

  1. Choose idle timeout carefully
     - Too short: Frequent suspend/resume overhead
     - Too long: Unnecessary cost during idle periods
     - 15-30 minutes is typical for interactive workloads

  2. Pair with auto-resume for seamless experience
     - Users experience a ~30-60s delay on the first query after idle

  3. Consider workload patterns
     - Business hours only: Short timeout (15 min)
     - 24/7 sporadic: Longer timeout or no auto-suspend

Pool Burst

  1. Use pools for cost optimization
     - Regular nodes: Reserved/on-demand for baseline
     - Pool nodes: Spot instances for burst

  2. Set the pool max based on budget
     - Maximum pool executors = maxExecutors - minExecutors

  3. Enable image warmup
     - Reduces executor startup time on pool nodes
     - Keep the warmup DaemonSet running

Troubleshooting

Autoscaling Not Working

# Check operator API is accessible
kubectl port-forward -n e6-operator-system svc/e6-operator 8082:8082
curl http://localhost:8082/health

# Check autoscaling is enabled
kubectl get qs analytics-cluster -o jsonpath='{.spec.executor.autoscaling}'

# Check operator logs for scaling requests
kubectl logs -n e6-operator-system deployment/e6-operator | grep -i autoscale

Cluster Not Suspending

# Check auto-suspension config
kubectl get qs analytics-cluster -o jsonpath='{.spec.autoSuspension}'

# Check if there are active queries (prevents suspension)
kubectl logs -l app=queue -n workspace-prod | grep -i "active queries"

# Check operator logs
kubectl logs -n e6-operator-system deployment/e6-operator | grep -i suspend

Resume Taking Too Long

# Check pod startup events
kubectl get events -n workspace-prod --sort-by='.lastTimestamp' | grep -E 'planner|queue|executor'

# Check if image pull is slow
kubectl describe pod -l app=executor -n workspace-prod | grep -A5 "Events:"

# Consider using a Pool with warmup for faster resume