TrafficInfra

API Version: e6data.io/v1alpha2
Kind: TrafficInfra
Short Names: ti


1. Purpose

TrafficInfra manages the traffic infrastructure for routing gRPC queries to e6data query services. This includes:

  • xDS Control Plane: Dynamic service discovery using Kubernetes Endpoints
  • Envoy Proxy: High-performance gRPC load balancing with weighted routing

TrafficInfra replaces the HAProxy component that was previously part of QueryService. It provides:

  • Dynamic endpoint discovery (no static configuration)
  • Weighted blue-green traffic routing via the xDS admin API
  • Horizontal autoscaling for Envoy proxies
  • In-flight query affinity via plannerip header routing

Note: For authentication, use the separate AuthGateway CRD which manages Pomerium for identity-aware access control.

Note: Infrastructure settings (tolerations, nodeSelector, affinity, imagePullSecrets) are inherited from NamespaceConfig in the same namespace.


2. High-level Behavior

When you create a TrafficInfra CR, the operator:

  1. Inherits infrastructure settings from NamespaceConfig (tolerations, node selectors, affinity)
  2. Deploys the xDS Control Plane, which watches Kubernetes Endpoints for planner services
  3. Deploys the Envoy Proxy, configured to receive dynamic configuration from xDS
  4. Creates a LoadBalancer Service for external gRPC access (or a ClusterIP Service if using AuthGateway)

Prerequisites

  • NamespaceConfig must exist in the same namespace
  • QueryService planner services must exist (e.g., {cluster}-planner-blue, {cluster}-planner-green)

Architecture

Client -> Envoy Proxy -> Planner (blue/green)
              ^
              |
        xDS Control Plane (watches planner services)

The xDS control plane:

  1. Watches Kubernetes Endpoints for configured planner services
  2. Builds Envoy cluster configurations with endpoint addresses
  3. Applies traffic weights for blue-green routing
  4. Pushes configuration updates to Envoy via ADS (Aggregated Discovery Service)

Child Resources Created

| Resource Type | Name Pattern | Purpose |
| --- | --- | --- |
| ServiceAccount | {name}-xds | xDS control plane identity |
| Role | {name}-xds-role | RBAC for endpoint/service access |
| RoleBinding | {name}-xds-rolebinding | Binds role to service account |
| Deployment | {name}-xds | xDS control plane pods |
| Service | {name}-xds | xDS gRPC and admin endpoints |
| ConfigMap | {name}-envoy-bootstrap | Envoy bootstrap configuration |
| Deployment | {name}-envoy | Envoy proxy pods |
| Service | {name}-envoy | External gRPC endpoint |
| HPA | {name}-envoy-hpa | Envoy autoscaling (if enabled) |
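
As a quick way to check that the operator created everything in the table above, the name patterns can be expanded for a given TrafficInfra name. This is a minimal sketch: the function name `ti_child_resources` is hypothetical, and the `kind/name` output format is simply chosen so the lines can be passed to `kubectl get`.

```shell
# Print the child resources documented above for a TrafficInfra named $1,
# one "kind/name" per line. (Hypothetical helper, not part of the operator.)
ti_child_resources() (
  name="$1"
  [ -n "$name" ] || { echo "usage: ti_child_resources <name>" >&2; exit 2; }
  printf '%s\n' \
    "serviceaccount/${name}-xds" \
    "role/${name}-xds-role" \
    "rolebinding/${name}-xds-rolebinding" \
    "deployment/${name}-xds" \
    "service/${name}-xds" \
    "configmap/${name}-envoy-bootstrap" \
    "deployment/${name}-envoy" \
    "service/${name}-envoy" \
    "hpa/${name}-envoy-hpa"
)
```

For example, `ti_child_resources traffic | xargs kubectl get -n workspace-prod` would look up each resource in turn. The HPA entry is expected to be reported as not found when `hpa.enabled` is false.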

3. Spec Reference

3.1 Top-level Fields

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| xds | XDSSpec | Yes | - | xDS control plane configuration |
| envoy | EnvoySpec | Yes | - | Envoy proxy configuration |
| trafficDefaults | TrafficDefaultsSpec | No | blue=100, green=0 | Initial traffic weights |

Note: For authentication, use the separate AuthGateway CRD.


3.2 XDS (XDSSpec)

Configuration for the xDS control plane.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| replicas | int32 | No | 2 | Number of xDS replicas (2+ for HA) |
| image.repository | string | No | us-docker.pkg.dev/e6data-analytics/e6data | Image registry |
| image.name | string | No | xds-control-plane | Image name |
| image.tag | string | No | 1.0.14 | Image tag |
| image.pullPolicy | string | No | IfNotPresent | Image pull policy |
| resources.cpu | string | No | 200m | CPU request |
| resources.memory | string | No | 256Mi | Memory request |
| ports.grpc | int32 | No | 18000 | xDS ADS port |
| ports.admin | int32 | No | 18080 | Admin API port |
| discovery.services | []string | No | [] | Planner service names to watch (auto-registered by operator) |
| pollInterval | int32 | No | 5 | Endpoint polling interval (seconds) |
| nodeID | string | No | envoy-proxy | Expected Envoy node ID |

Discovery Services

The discovery.services field specifies which Kubernetes Services to watch for endpoints. In most cases, you don't need to configure this - the operator automatically registers planner services with xDS when QueryService creates them.

Automatic Registration (Recommended): When you create a QueryService, the operator:

  1. Creates headless planner services ({name}-planner-blue, {name}-planner-green)
  2. Calls the xDS /services API to register them for endpoint discovery
  3. Manages traffic weights automatically during blue-green deployments

Manual Configuration (Optional): If you need to manually specify services:

discovery:
  services:
    - freshworks-planner-blue   # Blue strategy planner
    - freshworks-planner-green  # Green strategy planner

The xDS control plane will:

  1. Watch these services using the Kubernetes Endpoints API
  2. Build weighted clusters for blue and green
  3. Apply traffic weights when routing to planners


3.3 Envoy (EnvoySpec)

Configuration for the Envoy proxy.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| replicas | int32 | No | 2 | Number of Envoy replicas |
| maxReplicas | int32 | No | 10 | Maximum replicas for HPA |
| image.repository | string | No | envoyproxy | Image registry |
| image.name | string | No | envoy | Image name |
| image.tag | string | No | v1.31-latest | Image tag |
| image.pullPolicy | string | No | IfNotPresent | Image pull policy |
| resources.cpu | string | No | 500m | CPU request |
| resources.memory | string | No | 512Mi | Memory request |
| ports.grpc | int32 | No | 8080 | gRPC proxy port |
| ports.admin | int32 | No | 9901 | Envoy admin port |
| hpa.enabled | bool | No | true | Enable HPA |
| hpa.targetCPUUtilization | int32 | No | 70 | CPU utilization target for scaling (percent) |
| hpa.targetMemoryUtilization | int32 | No | 80 | Memory utilization target for scaling (percent) |
| service.type | string | No | LoadBalancer | Service type |
| service.annotations | map | No | {} | Service annotations |

3.4 TrafficDefaults

Initial traffic weights for blue-green routing.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| blueWeight | uint32 | No | 100 | Weight for blue strategy |
| greenWeight | uint32 | No | 0 | Weight for green strategy |
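
Because both weights are unsigned integers and the Troubleshooting section lists "weights all zero" as a cause of traffic not routing, a small pre-flight check can catch bad values before they are applied. The sketch below is an assumption-laden illustration: the function name `valid_weights` is hypothetical, and it deliberately does not require the weights to sum to 100, since this document does not state such a rule.

```shell
# Return 0 if blue ($1) and green ($2) look usable as traffic weights.
# Assumed rules: each is a non-negative decimal integer without leading
# zeros, and at least one of them is greater than zero.
valid_weights() (
  for w in "$1" "$2"; do
    case "$w" in
      ''|*[!0-9]*|0?*) exit 1 ;;   # empty, non-digit, or leading zero
    esac
  done
  [ $(( $1 + $2 )) -gt 0 ]         # reject the all-zero case
)
```

For example, `valid_weights 100 0` and `valid_weights 50 50` succeed, while `valid_weights 0 0` and `valid_weights -1 100` fail.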

4. Status Reference

| Field | Type | Description |
| --- | --- | --- |
| phase | string | Current phase (Pending, Deploying, Ready, Degraded, Failed) |
| message | string | Human-readable status message |
| xdsReady | bool | xDS control plane is ready |
| envoyReady | bool | Envoy proxy is ready |
| xdsEndpoint | string | xDS endpoint address |
| envoyEndpoint | string | Envoy endpoint address |
| trafficWeights.blue | uint32 | Current blue weight |
| trafficWeights.green | uint32 | Current green weight |
| discoveredServices | []DiscoveredService | Services discovered by xDS |
| conditions | []Condition | Detailed conditions |
| observedGeneration | int64 | Generation observed by controller |
| lastTransitionTime | Time | Last status transition |
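
The status fields can be read with `kubectl` and a JSONPath expression, which is convenient in deployment scripts that need to wait for the Ready phase. The following is a sketch rather than a supported tool: the function names `ti_phase` and `ti_wait_ready` are hypothetical, the attempt count and delay defaults are arbitrary, and Degraded is treated as "keep waiting", which may not suit every setup.

```shell
# Print status.phase of a TrafficInfra. $1 = name, $2 = namespace.
ti_phase() (
  kubectl get ti "$1" -n "$2" -o jsonpath='{.status.phase}'
)

# Poll until the phase is Ready.
# $3 = max attempts (default 30), $4 = seconds between attempts (default 5).
# Returns 0 on Ready, 1 on Failed or when attempts run out.
ti_wait_ready() (
  name="$1"; ns="$2"; attempts="${3:-30}"; delay="${4:-5}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    phase="$(ti_phase "$name" "$ns")"
    case "$phase" in
      Ready)  exit 0 ;;
      Failed) echo "TrafficInfra $name is Failed" >&2; exit 1 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out; last phase: ${phase:-unknown}" >&2
  exit 1
)
```

A typical call would be `ti_wait_ready traffic workspace-prod`, using the names from the examples in this document.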

5. Example CR

Minimal Example (POC)

apiVersion: e6data.io/v1alpha2
kind: TrafficInfra
metadata:
  name: traffic
  namespace: workspace-freshworks-prod
spec:
  xds:
    replicas: 2
    # discovery.services not needed - operator auto-registers planner services
  envoy:
    replicas: 2
    service:
      type: LoadBalancer

Note: discovery.services and trafficDefaults are optional. The operator automatically registers planner services with xDS when QueryService is created, and manages traffic weights during blue-green deployments.

Production Example (with AuthGateway)

For production with authentication, use TrafficInfra with ClusterIP and route through AuthGateway:

apiVersion: e6data.io/v1alpha2
kind: TrafficInfra
metadata:
  name: traffic
  namespace: workspace-prod
spec:
  xds:
    replicas: 3
    image:
      tag: "1.0.14"
    resources:
      cpu: "500m"
      memory: "512Mi"
    discovery:
      services:
        - myapp-planner-blue
        - myapp-planner-green
    pollInterval: 3

  envoy:
    replicas: 3
    maxReplicas: 20
    resources:
      cpu: "1000m"
      memory: "1Gi"
    hpa:
      enabled: true
      targetCPUUtilization: 60
      targetMemoryUtilization: 70
    service:
      type: ClusterIP  # Internal only, accessed via AuthGateway
      annotations: {}

  trafficDefaults:
    blueWeight: 100
    greenWeight: 0
---
# AuthGateway routes external traffic through Pomerium for authentication
apiVersion: e6data.io/v1alpha1
kind: AuthGateway
metadata:
  name: auth
  namespace: workspace-prod
spec:
  domain: grpc.mycompany.com
  authentication:
    enabled: true
    idp:
      provider: google
      credentialsSecretRef:
        name: pomerium-idp-credentials
    policy:
      allowedDomains:
        - "mycompany.com"
  tls:
    certManager:
      enabled: true
      issuerRef:
        name: letsencrypt-prod
  services:
    - name: query
      subdomain: query
      backend:
        serviceName: traffic-envoy  # Points to TrafficInfra's Envoy
        servicePort: 8080
      timeout: "300s"

6. Traffic Management

Automatic Traffic Weight Management

The operator automatically manages traffic weights during QueryService blue-green deployments:

  1. When QueryService enters the "Switching" phase, the operator calls the xDS admin API
  2. Traffic weights are set to route 100% of traffic to the new active strategy
  3. No manual intervention required for standard deployments

Manual Traffic Weight Control (Optional)

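For canary deployments or manual control, traffic weights can be updated via the xDS admin API. In these commands, substitute the hostname of your xDS Service (named {name}-xds, see Child Resources Created) for xds-control-plane, and your ports.admin value for 18080 if you changed the default: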

# Get current weights
curl http://xds-control-plane:18080/traffic

# Set 50/50 split (canary)
curl -X POST "http://xds-control-plane:18080/traffic?blue=50&green=50"

# Switch to green
curl -X POST "http://xds-control-plane:18080/traffic?blue=0&green=100"

# Rollback to blue
curl -X POST "http://xds-control-plane:18080/traffic?blue=100&green=0"

Dynamic Service Registration API

The xDS control plane supports dynamic service registration:

# List registered services
curl http://xds-control-plane:18080/services

# Add a service manually
curl -X PUT "http://xds-control-plane:18080/services?service=myapp-planner-blue"

# Remove a service
curl -X DELETE "http://xds-control-plane:18080/services?service=myapp-planner-blue"
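
Since planner services follow the {cluster}-planner-blue / {cluster}-planner-green naming pattern, manual registration of both can be scripted. The sketch below only prints the commands (a dry run) so they can be reviewed first; the function name `register_planners_plan` and the example base URL are assumptions.

```shell
# Print the curl commands that would register both planner services of a
# cluster with the xDS admin API. Nothing is sent.
# $1 = xDS admin base URL (e.g. http://traffic-xds:18080), $2 = cluster name
register_planners_plan() (
  base="$1"; cluster="$2"
  for color in blue green; do
    printf 'curl -X PUT "%s/services?service=%s-planner-%s"\n' \
      "$base" "$cluster" "$color"
  done
)
```

Piping the output to `sh` executes the printed commands once they look right, for example `register_planners_plan http://traffic-xds:18080 myapp | sh`.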

Blue-Green Deployment Workflow

Automatic (Default):

  1. Update QueryService spec (triggers new deployment to inactive strategy)
  2. Operator waits for new deployment to be ready
  3. Operator automatically switches traffic via xDS
  4. Old deployment remains as standby

Manual Canary:

  1. Deploy new version to inactive strategy (update QueryService)
  2. Gradually shift traffic: 90/10 -> 70/30 -> 50/50 -> 30/70 -> 0/100
  3. Monitor metrics at each step
  4. If issues occur, roll back by shifting traffic back
  5. Once verified, the old strategy becomes standby for next deployment
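
The manual canary schedule can be written down as a list of admin API calls, one per step, to be run by hand with a metrics check in between. This sketch prints the commands without sending them; the function name `canary_plan` and the example hostname `traffic-xds` (a TrafficInfra named `traffic`) are assumptions.

```shell
# Print one curl command per canary step, shifting traffic from blue to
# green in the order 90/10, 70/30, 50/50, 30/70, 0/100. Nothing is sent.
# $1 = xDS admin base URL, e.g. http://traffic-xds:18080
canary_plan() (
  base="$1"
  for step in 90/10 70/30 50/50 30/70 0/100; do
    blue="${step%/*}"     # part before the slash
    green="${step#*/}"    # part after the slash
    printf 'curl -X POST "%s/traffic?blue=%s&green=%s"\n' \
      "$base" "$blue" "$green"
  done
)
```

Running the printed lines one at a time keeps the monitoring pause between steps explicit. Rolling back is the first documented rollback command again: blue=100, green=0.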


7. Relationship with QueryService

TrafficInfra replaces the HAProxy component that was previously part of QueryService:

| Old (QueryService + HAProxy) | New (QueryService + TrafficInfra) |
| --- | --- |
| Static HAProxy configuration | Dynamic xDS service discovery |
| HAProxy deployment per QueryService | Shared Envoy deployment |
| Manual config reload | Real-time endpoint updates |
| Limited traffic splitting | Weighted routing via API |

QueryService Changes

  • Planner services are now headless for xDS endpoint discovery
  • gRPC port (9001) added to planner services
  • HAProxy deployment removed (cleanup on reconcile)

8. Troubleshooting

Common Issues

| Symptom | Possible Cause | Solution |
| --- | --- | --- |
| xDS not discovering endpoints | Services not found | Verify discovery.services matches QueryService planner service names |
| Envoy not receiving config | xDS unreachable | Check xDS service and pod logs |
| Traffic not routing | Weights all zero | Set valid weights via admin API |
| High latency | Single Envoy replica | Increase replicas and enable HPA |

Debugging Commands

# Check xDS control plane status
kubectl logs -l app=xds-control-plane -n <namespace>

# Check Envoy config
kubectl exec -it <envoy-pod> -n <namespace> -- curl localhost:9901/config_dump

# Check discovered endpoints
curl http://<xds-service>:18080/endpoints

# Check current traffic weights
curl http://<xds-service>:18080/traffic
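
The admin API URLs above resolve only from inside the cluster. From a workstation, one option is a temporary `kubectl port-forward` to the xDS Service. The sketch below assumes the Service is named {name}-xds and exposes the admin port under the same number as ports.admin; the function name `xds_admin_get` and the `XDS_ADMIN_PORT` / `XDS_PF_WAIT` variables are hypothetical.

```shell
# Forward the xDS admin port to localhost, run one GET, then clean up.
# $1 = TrafficInfra name, $2 = namespace, $3 = admin path (e.g. /traffic)
xds_admin_get() (
  name="$1"; ns="$2"; path="$3"
  port="${XDS_ADMIN_PORT:-18080}"     # assumed to match ports.admin
  wait_s="${XDS_PF_WAIT:-2}"          # crude wait for the tunnel to open
  kubectl port-forward "svc/${name}-xds" "${port}:${port}" -n "$ns" \
    >/dev/null 2>&1 &
  pf_pid=$!
  # Stop the port-forward when this subshell exits, whatever the outcome.
  trap 'kill "$pf_pid" 2>/dev/null' EXIT
  sleep "$wait_s"
  curl -fsS "http://localhost:${port}${path}"
)
```

For example, `xds_admin_get traffic workspace-prod /endpoints` fetches the discovered endpoints. A retry loop around the `curl` call would be more robust than the fixed sleep.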

9. Singleton Constraint

Only one TrafficInfra is allowed per namespace. The webhook validates this constraint on create.

If you need multiple traffic configurations, use a separate namespace for each QueryService.
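
Before applying a new TrafficInfra manifest, it can help to check whether the target namespace already has one, rather than waiting for the webhook to reject the create. This is a minimal sketch; the function name `ti_namespace_free` is hypothetical.

```shell
# Succeed only if namespace $1 currently contains no TrafficInfra.
ti_namespace_free() (
  # "-o name" prints one line per resource and nothing when there are none.
  count="$(kubectl get ti -n "$1" -o name | wc -l | tr -d ' ')"
  [ "$count" -eq 0 ]
)
```

For example: `ti_namespace_free workspace-prod && kubectl apply -f trafficinfra.yaml`, where `trafficinfra.yaml` is a placeholder file name.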