TrafficInfra
- API Version: e6data.io/v1alpha2
- Kind: TrafficInfra
- Short Names: ti
1. Purpose
TrafficInfra manages the traffic infrastructure for routing gRPC queries to e6data query services. This includes:
- xDS Control Plane: Dynamic service discovery using Kubernetes Endpoints
- Envoy Proxy: High-performance gRPC load balancing with weighted routing
TrafficInfra replaces the HAProxy component that was previously part of QueryService. It provides:
- Dynamic endpoint discovery (no static configuration)
- Weighted blue-green traffic routing via the xDS admin API
- Horizontal autoscaling for Envoy proxies
- In-flight query affinity via plannerip header routing
Note: For authentication, use the separate AuthGateway CRD which manages Pomerium for identity-aware access control.
Note: Infrastructure settings (tolerations, nodeSelector, affinity, imagePullSecrets) are inherited from NamespaceConfig in the same namespace.
2. High-level Behavior
When you create a TrafficInfra CR, the operator:
- Inherits infrastructure settings from NamespaceConfig (tolerations, node selectors, affinity)
- Deploys xDS Control Plane which watches Kubernetes Endpoints for planner services
- Deploys Envoy Proxy configured to receive dynamic configuration from xDS
- Creates LoadBalancer Service for external gRPC access (or ClusterIP if using AuthGateway)
Prerequisites
- NamespaceConfig must exist in the same namespace
- QueryService planner services must exist (e.g., {cluster}-planner-blue, {cluster}-planner-green)
Architecture
The xDS control plane:
1. Watches Kubernetes Endpoints for configured planner services
2. Builds Envoy cluster configurations with endpoint addresses
3. Applies traffic weights for blue-green routing
4. Pushes configuration updates to Envoy via ADS (Aggregated Discovery Service)
Child Resources Created
| Resource Type | Name Pattern | Purpose |
|---|---|---|
| ServiceAccount | {name}-xds | xDS control plane identity |
| Role | {name}-xds-role | RBAC for endpoint/service access |
| RoleBinding | {name}-xds-rolebinding | Binds role to service account |
| Deployment | {name}-xds | xDS control plane pods |
| Service | {name}-xds | xDS gRPC and admin endpoints |
| ConfigMap | {name}-envoy-bootstrap | Envoy bootstrap configuration |
| Deployment | {name}-envoy | Envoy proxy pods |
| Service | {name}-envoy | External gRPC endpoint |
| HPA | {name}-envoy-hpa | Envoy autoscaling (if enabled) |
3. Spec Reference
3.1 Top-level Fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| xds | XDSSpec | Yes | - | xDS control plane configuration |
| envoy | EnvoySpec | Yes | - | Envoy proxy configuration |
| trafficDefaults | TrafficDefaultsSpec | No | blue=100, green=0 | Initial traffic weights |
Note: For authentication, use the separate AuthGateway CRD.
3.2 XDS (XDSSpec)
Configuration for the xDS control plane.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| replicas | int32 | No | 2 | Number of xDS replicas (2+ for HA) |
| image.repository | string | No | us-docker.pkg.dev/e6data-analytics/e6data | Image registry |
| image.name | string | No | xds-control-plane | Image name |
| image.tag | string | No | 1.0.14 | Image tag |
| image.pullPolicy | string | No | IfNotPresent | Image pull policy |
| resources.cpu | string | No | 200m | CPU request |
| resources.memory | string | No | 256Mi | Memory request |
| ports.grpc | int32 | No | 18000 | xDS ADS port |
| ports.admin | int32 | No | 18080 | Admin API port |
| discovery.services | []string | No | [] | Planner service names to watch (auto-registered by operator) |
| pollInterval | int32 | No | 5 | Endpoint polling interval (seconds) |
| nodeID | string | No | envoy-proxy | Expected Envoy node ID |
Discovery Services
The discovery.services field specifies which Kubernetes Services to watch for endpoints. In most cases you don't need to configure this: the operator automatically registers planner services with xDS when QueryService creates them.
Automatic Registration (Recommended): When you create a QueryService, the operator:
1. Creates headless planner services ({name}-planner-blue, {name}-planner-green)
2. Calls the xDS /services API to register them for endpoint discovery
3. Manages traffic weights automatically during blue-green deployments
Manual Configuration (Optional): If you need to manually specify services:
discovery:
  services:
    - freshworks-planner-blue # Blue strategy planner
    - freshworks-planner-green # Green strategy planner
The xDS control plane will:
1. Watch these services using the Kubernetes Endpoints API
2. Build weighted clusters for blue and green
3. Apply traffic weights when routing to planners
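Because xDS builds its clusters from the Endpoints behind each planner Service, it can help to confirm that those Services have ready addresses before looking at xDS itself. The following is a minimal sketch, assuming `kubectl` access to the namespace; the function names `planner_endpoints` and `check_planners` are hypothetical helpers, not part of the operator.

```shell
# Hypothetical helper: print the ready pod IPs behind one planner Service.
# $1 = namespace, $2 = service name (e.g. freshworks-planner-blue)
planner_endpoints() {
  kubectl get endpoints "$2" -n "$1" \
    -o jsonpath='{.subsets[*].addresses[*].ip}'
}

# Report every listed Service; return non-zero if any has no ready addresses.
# $1 = namespace, remaining arguments = service names
check_planners() {
  ns="$1"; shift
  missing=0
  for svc in "$@"; do
    ips="$(planner_endpoints "$ns" "$svc")"
    if [ -z "$ips" ]; then
      echo "no ready endpoints: $svc"
      missing=1
    else
      echo "$svc: $ips"
    fi
  done
  return "$missing"
}
```

For example, `check_planners workspace-freshworks-prod freshworks-planner-blue freshworks-planner-green` lists the addresses xDS should be able to discover for each strategy.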
3.3 Envoy (EnvoySpec)
Configuration for the Envoy proxy.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| replicas | int32 | No | 2 | Number of Envoy replicas |
| maxReplicas | int32 | No | 10 | Maximum replicas for HPA |
| image.repository | string | No | envoyproxy | Image registry |
| image.name | string | No | envoy | Image name |
| image.tag | string | No | v1.31-latest | Image tag |
| image.pullPolicy | string | No | IfNotPresent | Image pull policy |
| resources.cpu | string | No | 500m | CPU request |
| resources.memory | string | No | 512Mi | Memory request |
| ports.grpc | int32 | No | 8080 | gRPC proxy port |
| ports.admin | int32 | No | 9901 | Envoy admin port |
| hpa.enabled | bool | No | true | Enable HPA |
| hpa.targetCPUUtilization | int32 | No | 70 | CPU target for scaling |
| hpa.targetMemoryUtilization | int32 | No | 80 | Memory target for scaling |
| service.type | string | No | LoadBalancer | Service type |
| service.annotations | map | No | {} | Service annotations |
3.4 TrafficDefaults
Initial traffic weights for blue-green routing.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| blueWeight | uint32 | No | 100 | Weight for blue strategy |
| greenWeight | uint32 | No | 0 | Weight for green strategy |
4. Status Reference
| Field | Type | Description |
|---|---|---|
| phase | string | Current phase (Pending, Deploying, Ready, Degraded, Failed) |
| message | string | Human-readable status message |
| xdsReady | bool | xDS control plane is ready |
| envoyReady | bool | Envoy proxy is ready |
| xdsEndpoint | string | xDS endpoint address |
| envoyEndpoint | string | Envoy endpoint address |
| trafficWeights.blue | uint32 | Current blue weight |
| trafficWeights.green | uint32 | Current green weight |
| discoveredServices | []DiscoveredService | Services discovered by xDS |
| conditions | []Condition | Detailed conditions |
| observedGeneration | int64 | Generation observed by controller |
| lastTransitionTime | Time | Last status transition |
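The phase field can be polled from a script, for instance in a deployment pipeline that should wait until traffic routing is available. The sketch below is a hypothetical helper (the name `wait_for_ti_ready` and the `POLL_SECONDS` variable are assumptions for illustration); it reads `.status.phase` through the `ti` short name and stops early on `Failed`.

```shell
# Hypothetical helper: succeed once the TrafficInfra reports phase "Ready".
# $1 = namespace, $2 = TrafficInfra name, $3 = max attempts (default 30)
wait_for_ti_ready() {
  attempts="${3:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    phase="$(kubectl get ti "$2" -n "$1" -o jsonpath='{.status.phase}')"
    echo "phase: ${phase:-<none>}"
    case "$phase" in
      Ready) return 0 ;;   # both xDS and Envoy are expected to be up
      Failed) return 1 ;;  # no point in waiting further
    esac
    i=$((i + 1))
    sleep "${POLL_SECONDS:-5}"
  done
  return 1
}
```

A call such as `wait_for_ti_ready workspace-prod traffic` returns zero only when the resource reaches Ready within the attempt budget; the other status fields (message, conditions) remain the place to look when it does not.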
5. Example CR
Minimal Example (POC)
apiVersion: e6data.io/v1alpha2
kind: TrafficInfra
metadata:
  name: traffic
  namespace: workspace-freshworks-prod
spec:
  xds:
    replicas: 2
    # discovery.services not needed - operator auto-registers planner services
  envoy:
    replicas: 2
    service:
      type: LoadBalancer
Note: discovery.services and trafficDefaults are optional. The operator automatically registers planner services with xDS when a QueryService is created, and manages traffic weights during blue-green deployments.
Production Example (with AuthGateway)
For production with authentication, use TrafficInfra with ClusterIP and route through AuthGateway:
apiVersion: e6data.io/v1alpha2
kind: TrafficInfra
metadata:
  name: traffic
  namespace: workspace-prod
spec:
  xds:
    replicas: 3
    image:
      tag: "1.0.14"
    resources:
      cpu: "500m"
      memory: "512Mi"
    discovery:
      services:
        - myapp-planner-blue
        - myapp-planner-green
    pollInterval: 3
  envoy:
    replicas: 3
    maxReplicas: 20
    resources:
      cpu: "1000m"
      memory: "1Gi"
    hpa:
      enabled: true
      targetCPUUtilization: 60
      targetMemoryUtilization: 70
    service:
      type: ClusterIP # Internal only, accessed via AuthGateway
      annotations: {}
  trafficDefaults:
    blueWeight: 100
    greenWeight: 0
---
# AuthGateway routes external traffic through Pomerium for authentication
apiVersion: e6data.io/v1alpha1
kind: AuthGateway
metadata:
  name: auth
  namespace: workspace-prod
spec:
  domain: grpc.mycompany.com
  authentication:
    enabled: true
    idp:
      provider: google
      credentialsSecretRef:
        name: pomerium-idp-credentials
    policy:
      allowedDomains:
        - "mycompany.com"
  tls:
    certManager:
      enabled: true
      issuerRef:
        name: letsencrypt-prod
  services:
    - name: query
      subdomain: query
      backend:
        serviceName: traffic-envoy # Points to TrafficInfra's Envoy
        servicePort: 8080
      timeout: "300s"
6. Traffic Management
Automatic Traffic Weight Management
The operator automatically manages traffic weights during QueryService blue-green deployments:
- When QueryService enters the "Switching" phase, the operator calls the xDS admin API
- Traffic weights are set to route 100% of traffic to the new active strategy
- No manual intervention required for standard deployments
Manual Traffic Weight Control (Optional)
For canary deployments or manual control, traffic weights can be updated via the xDS admin API:
# Get current weights
curl http://xds-control-plane:18080/traffic
# Set 50/50 split (canary)
curl -X POST "http://xds-control-plane:18080/traffic?blue=50&green=50"
# Switch to green
curl -X POST "http://xds-control-plane:18080/traffic?blue=0&green=100"
# Rollback to blue
curl -X POST "http://xds-control-plane:18080/traffic?blue=100&green=0"
Dynamic Service Registration API
The xDS control plane supports dynamic service registration:
# List registered services
curl http://xds-control-plane:18080/services
# Add a service manually
curl -X PUT "http://xds-control-plane:18080/services?service=myapp-planner-blue"
# Remove a service
curl -X DELETE "http://xds-control-plane:18080/services?service=myapp-planner-blue"
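When registering by hand, both strategies of a cluster usually need to be added together. The sketch below wraps the PUT call shown above in a loop over the {cluster}-planner-blue and {cluster}-planner-green naming pattern; the function name `register_planners` and the `XDS_ADMIN` variable are assumptions for illustration, and the host must be reachable from wherever the script runs.

```shell
# Base URL of the xDS admin API; the default mirrors the host used above.
XDS_ADMIN="${XDS_ADMIN:-http://xds-control-plane:18080}"

# Hypothetical helper: register both planner Services of one cluster.
# $1 = cluster name (the {cluster} part of {cluster}-planner-blue/green)
register_planners() {
  for color in blue green; do
    # -f makes curl fail on HTTP errors so the loop stops at the first problem
    curl -fsS -X PUT "$XDS_ADMIN/services?service=$1-planner-$color" || return 1
  done
}
```

Running `register_planners myapp` is equivalent to issuing the two PUT requests for myapp-planner-blue and myapp-planner-green one after the other.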
Blue-Green Deployment Workflow
Automatic (Default):
1. Update QueryService spec (triggers new deployment to inactive strategy)
2. Operator waits for new deployment to be ready
3. Operator automatically switches traffic via xDS
4. Old deployment remains as standby
Manual Canary:
1. Deploy new version to inactive strategy (update QueryService)
2. Gradually shift traffic: 90/10 -> 70/30 -> 50/50 -> 30/70 -> 0/100
3. Monitor metrics at each step
4. If issues occur, rollback by shifting traffic back
5. Once verified, the old strategy becomes standby for next deployment
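The manual canary steps can be scripted against the /traffic endpoint. What follows is a minimal sketch rather than a supported tool: `XDS_ADMIN`, `PAUSE_SECONDS`, and `DRY_RUN` are assumed variable names, the pause stands in for whatever metric checks you run between steps, and the script assumes blue is currently active and green is the target.

```shell
# Base URL of the xDS admin API; the default mirrors the host used above.
XDS_ADMIN="${XDS_ADMIN:-http://xds-control-plane:18080}"

# Print the admin URL for a blue/green split; reject non-integers and 0/0.
traffic_url() {
  for w in "$1" "$2"; do
    case "$w" in
      ''|*[!0-9]*) echo "invalid weight: '$w'" >&2; return 1 ;;
    esac
  done
  if [ "$1" -eq 0 ] && [ "$2" -eq 0 ]; then
    echo "weights must not both be zero" >&2
    return 1
  fi
  printf '%s/traffic?blue=%s&green=%s\n' "$XDS_ADMIN" "$1" "$2"
}

# Walk blue/green through 90/10 -> 70/30 -> 50/50 -> 30/70 -> 0/100.
shift_to_green() {
  for step in 90:10 70:30 50:50 30:70 0:100; do
    url="$(traffic_url "${step%%:*}" "${step##*:}")" || return 1
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "POST $url"            # show what would be sent, change nothing
    else
      curl -fsS -X POST "$url" || return 1
      sleep "${PAUSE_SECONDS:-300}"  # time to watch metrics before the next step
    fi
  done
}
```

Running `DRY_RUN=1 shift_to_green` first prints the five requests without sending them. If metrics degrade partway through, interrupt the script and issue the rollback request (blue=100, green=0) by hand.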
7. Relationship with QueryService
TrafficInfra replaces the HAProxy component that was previously part of QueryService:
| Old (QueryService + HAProxy) | New (QueryService + TrafficInfra) |
|---|---|
| Static HAProxy configuration | Dynamic xDS service discovery |
| HAProxy deployment per QueryService | Shared Envoy deployment |
| Manual config reload | Real-time endpoint updates |
| Limited traffic splitting | Weighted routing via API |
QueryService Changes
- Planner services are now headless for xDS endpoint discovery
- gRPC port (9001) added to planner services
- HAProxy deployment removed (cleanup on reconcile)
8. Troubleshooting
Common Issues
| Symptom | Possible Cause | Solution |
|---|---|---|
| xDS not discovering endpoints | Services not found | Verify discovery.services matches QueryService planner service names |
| Envoy not receiving config | xDS unreachable | Check xDS service and pod logs |
| Traffic not routing | Weights all zero | Set valid weights via admin API |
| High latency | Single Envoy replica | Increase replicas and enable HPA |
Debugging Commands
# Check xDS control plane status
kubectl logs -l app=xds-control-plane -n <namespace>
# Check Envoy config
kubectl exec -it <envoy-pod> -n <namespace> -- curl localhost:9901/config_dump
# Check discovered endpoints
curl http://<xds-service>:18080/endpoints
# Check current traffic weights
curl http://<xds-service>:18080/traffic
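The curl commands above only resolve from inside the cluster. From a workstation, one option is to forward the admin port first. This sketch assumes the xDS Service follows the {name}-xds pattern from the child resources table and listens on the default admin port 18080; the function name `forward_xds_admin` is hypothetical.

```shell
# Hypothetical helper: forward the xDS admin port to localhost:18080.
# $1 = namespace, $2 = TrafficInfra name (Service assumed to be {name}-xds)
forward_xds_admin() {
  kubectl port-forward "svc/$2-xds" 18080:18080 -n "$1"
}
```

With `forward_xds_admin workspace-prod traffic` running in another terminal, the same requests work against http://localhost:18080, for example `curl http://localhost:18080/endpoints`.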
9. Singleton Constraint
Only one TrafficInfra is allowed per namespace. The webhook validates this constraint on create.
If you need multiple traffic configurations, use separate namespaces for each QueryService.
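Since a second TrafficInfra in the same namespace is rejected at create time, a script can check for an existing one before applying a manifest. This is a small sketch under the assumption that `kubectl get ti` is permitted in the namespace; `count_ti` is a hypothetical helper name.

```shell
# Hypothetical pre-check: count TrafficInfra objects in a namespace.
# $1 = namespace
count_ti() {
  # -o name prints one line per object, so the line count is the object count
  kubectl get ti -n "$1" -o name | wc -l | tr -d ' '
}
```

If `count_ti workspace-prod` prints a value other than 0, a TrafficInfra already exists there and a new one should go into a different namespace.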