Node Requirements Guide¶
This guide documents the node configuration, taints, tolerations, and ServiceAccount requirements for e6data components.
Quick Reference¶
| Component | Node Scheduling | ServiceAccount Required | Cloud Access |
|---|---|---|---|
| MetadataServices | Custom tolerations & nodeSelector | Yes | S3/GCS/Azure |
| QueryService | Custom tolerations & nodeSelector | Yes | S3/GCS/Azure |
| Pool Nodes | Custom tolerations & nodeSelector | Inherits from QueryService | S3/GCS/Azure |
| MonitoringServices | Optional (runs anywhere) | Yes (auto-created) | None |
1. Node Taints and Tolerations¶
1.1 Overview¶
The operator supports any custom tolerations you provide. You can use your existing node taints and the operator will schedule pods accordingly.
Key Principle: You define your node taints, then configure the CR with matching tolerations.
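As a sketch of that flow (the node name `worker-1` and the `dedicated=e6data` taint are illustrative, not required by the operator), you would taint your nodes first and then mirror the taint in the CR's `tolerations`:

```shell
# Taint a node so only pods tolerating dedicated=e6data can schedule on it
# ("worker-1" is a placeholder node name)
kubectl taint nodes worker-1 dedicated=e6data:NoSchedule

# Confirm the taint took effect
kubectl get node worker-1 -o jsonpath='{.spec.taints}'
```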
1.2 Using Custom Tolerations¶
Specify any tolerations in your CR that match your node taints:
```yaml
apiVersion: e6data.io/v1alpha1
kind: MetadataServices
metadata:
  name: analytics-prod
spec:
  tenant: customer-a
  storageBackend: s3a://my-bucket
  storage:
    imageTag: "3.0.217"
  # Your custom tolerations - match your node taints
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "e6data"
      effect: "NoSchedule"
    - key: "workload-type"
      operator: "Equal"
      value: "analytics"
      effect: "NoSchedule"
```
Common toleration patterns:
```yaml
# Tolerate any taint with a specific key
tolerations:
  - key: "dedicated"
    operator: "Exists"
    effect: "NoSchedule"

# Tolerate a specific key-value pair
tolerations:
  - key: "team"
    operator: "Equal"
    value: "data-platform"
    effect: "NoSchedule"

# Tolerate spot/preemptible instances
tolerations:
  - key: "kubernetes.io/preemptible"
    operator: "Exists"
    effect: "NoSchedule"
```
1.3 Automatic Tolerations (Built-in)¶
The operator automatically adds these tolerations (in addition to any you specify):
Workspace Toleration (always added):
```yaml
tolerations:
  - key: "e6data-workspace-name"
    operator: "Equal"
    value: "<workspace>"  # From spec.workspace or CR name
    effect: "NoSchedule"
```
Azure Spot Toleration (when cloud=AZURE):
```yaml
tolerations:
  - key: "kubernetes.azure.com/scalesetpriority"
    operator: "Equal"
    value: "spot"
    effect: "NoSchedule"
```
1.4 Example: Using Your Existing Node Taints¶
If your cluster already has tainted nodes:
```shell
# Your existing node setup
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
# NAME       TAINTS
# worker-1   [dedicated=analytics:NoSchedule]
# worker-2   [dedicated=analytics:NoSchedule]
```
Configure the CR to match:
```yaml
apiVersion: e6data.io/v1alpha1
kind: MetadataServices
metadata:
  name: analytics-prod
spec:
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "analytics"
      effect: "NoSchedule"
```
2. Node Selectors¶
2.1 Using Custom Node Selectors¶
Specify any node selectors to target specific nodes:
```yaml
apiVersion: e6data.io/v1alpha1
kind: MetadataServices
metadata:
  name: analytics-prod
spec:
  nodeSelector:
    node-pool: "e6data-storage"
    instance-type: "memory-optimized"
    topology.kubernetes.io/zone: "us-east-1a"
```
Common node selector patterns:
```yaml
# Target a specific node pool
nodeSelector:
  node-pool: "analytics"

# Target by instance type
nodeSelector:
  node.kubernetes.io/instance-type: "r5.4xlarge"

# Target by zone
nodeSelector:
  topology.kubernetes.io/zone: "us-west-2a"

# Multiple selectors (AND logic)
nodeSelector:
  team: "data-platform"
  environment: "production"
```
2.2 Automatic Node Selectors (GCP Only)¶
For GCP clusters, the operator automatically adds a workspace node selector:
This is in addition to any custom selectors you provide.
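The automatically added selector is not shown here; based on the workspace node label used in the GKE setup later in this guide (`e6data-workspace-name=analytics-prod`), it presumably takes this form:

```yaml
nodeSelector:
  e6data-workspace-name: "<workspace>"  # From spec.workspace or CR name
```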
2.3 Example: Target Your Existing Node Pool¶
If you have labeled nodes:
```shell
# Your existing node labels
kubectl get nodes --show-labels | grep node-pool
# worker-1   node-pool=analytics-storage
# worker-2   node-pool=analytics-storage
```
Configure the CR to match:
```yaml
apiVersion: e6data.io/v1alpha1
kind: MetadataServices
metadata:
  name: analytics-prod
spec:
  nodeSelector:
    node-pool: "analytics-storage"
```
3. Karpenter Integration¶
3.1 NodePool Affinity¶
When using Karpenter for node auto-provisioning, specify the name of the Karpenter NodePool:
```yaml
apiVersion: e6data.io/v1alpha1
kind: MetadataServices
metadata:
  name: analytics-prod
spec:
  karpenterNodePool: "e6data-storage"
```
The operator adds node affinity for the Karpenter NodePool:
```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: "karpenter.sh/nodepool"
              operator: "In"
              values: ["e6data-storage"]
```
3.2 Karpenter NodePool Example¶
```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: e6data-storage
spec:
  template:
    spec:
      taints:
        - key: "e6data-workspace-name"
          value: "analytics-prod"
          effect: "NoSchedule"
      requirements:
        - key: "kubernetes.io/arch"
          operator: In
          values: ["amd64"]
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["on-demand"]
        - key: "node.kubernetes.io/instance-type"
          operator: In
          values: ["r5.4xlarge", "r5.8xlarge", "r6i.4xlarge", "r6i.8xlarge"]
```
4. ServiceAccount Requirements¶
4.1 ServiceAccount Naming¶
The operator creates or uses a ServiceAccount for each MetadataServices/QueryService:
| spec.serviceAccount | Resulting ServiceAccount Name |
|---|---|
| Not specified | Uses CR name (metadata.name) |
| Specified | Uses specified value |
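For instance, to pin the ServiceAccount name explicitly (the name `my-custom-sa` is illustrative):

```yaml
apiVersion: e6data.io/v1alpha1
kind: MetadataServices
metadata:
  name: analytics-prod  # Without spec.serviceAccount, the SA would be named "analytics-prod"
spec:
  serviceAccount: my-custom-sa  # Overrides the default CR-name-based SA name
```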
4.2 Auto-Created RBAC¶
By default (autoCreateRBAC: true), the operator creates:
- ServiceAccount with the appropriate name
- Role with minimal permissions
- RoleBinding linking them
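A quick way to confirm what was created (assuming a CR named `analytics-prod` in namespace `workspace-prod`):

```shell
# List the auto-created ServiceAccount, Role, and RoleBinding
kubectl get serviceaccount,role,rolebinding -n workspace-prod | grep analytics-prod
```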
Disable auto-creation:
```yaml
apiVersion: e6data.io/v1alpha1
kind: MetadataServices
metadata:
  name: analytics-prod
spec:
  autoCreateRBAC: false  # You must create the SA and RBAC manually
  serviceAccount: my-custom-sa
```
4.3 Required Cloud IAM Permissions¶
The ServiceAccount needs cloud storage access. Configure via:
AWS IRSA¶
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: analytics-prod  # Must match CR name or spec.serviceAccount
  namespace: workspace-prod
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::123456789012:role/e6data-storage-role"
```
Required IAM Policy:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::e6data-bucket",
        "arn:aws:s3:::e6data-bucket/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetPartitions",
        "glue:BatchGetPartition"
      ],
      "Resource": "*"
    }
  ]
}
```
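One way to attach this policy to the IRSA role, sketched with the aws CLI (the role and policy names follow the examples in this guide; `policy.json` is assumed to be the document above saved to disk):

```shell
# Attach the S3/Glue policy inline to the IRSA role
aws iam put-role-policy \
  --role-name e6data-storage-role \
  --policy-name e6data-storage-access \
  --policy-document file://policy.json
```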
GCP Workload Identity¶
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: analytics-prod
  namespace: workspace-prod
  annotations:
    iam.gke.io/gcp-service-account: "e6data-sa@project-id.iam.gserviceaccount.com"
```
Required GCP Roles:
- `roles/storage.objectViewer` (read)
- `roles/storage.objectCreator` (write)
- `roles/bigquery.dataViewer` (for BigQuery catalogs)
Azure Workload Identity¶
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: analytics-prod
  namespace: workspace-prod
  annotations:
    azure.workload.identity/client-id: "12345678-1234-1234-1234-123456789012"
  labels:
    azure.workload.identity/use: "true"
```
Required Azure Roles:
- `Storage Blob Data Reader` (read)
- `Storage Blob Data Contributor` (write)
5. Complete Node Setup Example¶
5.1 AWS EKS Setup¶
```shell
# 1. Create node group with taints
eksctl create nodegroup \
  --cluster my-cluster \
  --name e6data-storage \
  --node-type r5.4xlarge \
  --nodes 3 \
  --taints e6data-workspace-name=analytics-prod:NoSchedule

# 2. Create IAM role for IRSA
aws iam create-role \
  --role-name e6data-storage-role \
  --assume-role-policy-document file://trust-policy.json

# 3. Associate with ServiceAccount
eksctl create iamserviceaccount \
  --name analytics-prod \
  --namespace workspace-prod \
  --cluster my-cluster \
  --role-name e6data-storage-role \
  --approve
```
5.2 GCP GKE Setup¶
```shell
# 1. Create node pool with taints
gcloud container node-pools create e6data-storage \
  --cluster my-cluster \
  --machine-type n2-highmem-16 \
  --num-nodes 3 \
  --node-taints e6data-workspace-name=analytics-prod:NoSchedule \
  --node-labels e6data-workspace-name=analytics-prod

# 2. Set up Workload Identity
gcloud iam service-accounts create e6data-sa
gcloud iam service-accounts add-iam-policy-binding \
  e6data-sa@project-id.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:project-id.svc.id.goog[workspace-prod/analytics-prod]"

# 3. Grant storage access
gcloud storage buckets add-iam-policy-binding gs://e6data-bucket \
  --member "serviceAccount:e6data-sa@project-id.iam.gserviceaccount.com" \
  --role roles/storage.objectViewer
```
5.3 Azure AKS Setup¶
```shell
# 1. Create node pool with taints ("<resource-group>" is a placeholder)
az aks nodepool add \
  --resource-group <resource-group> \
  --cluster-name my-cluster \
  --name e6datastorage \
  --node-vm-size Standard_E16s_v4 \
  --node-count 3 \
  --node-taints e6data-workspace-name=analytics-prod:NoSchedule
```
```shell
# 2. Enable Workload Identity
az aks update \
  --resource-group <resource-group> \
  --name my-cluster \
  --enable-oidc-issuer \
  --enable-workload-identity
```
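Step 3 below references a user-assigned managed identity named `e6data-identity`; if it does not exist yet, it can be created first (the resource group name is a placeholder):

```shell
# Create the user-assigned managed identity referenced by the federated credential
az identity create \
  --name e6data-identity \
  --resource-group <resource-group>
```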
```shell
# 3. Create federated credential
az identity federated-credential create \
  --name e6data-federated \
  --identity-name e6data-identity \
  --resource-group <resource-group> \
  --issuer "$(az aks show --resource-group <resource-group> --name my-cluster --query "oidcIssuerProfile.issuerUrl" -o tsv)" \
  --subject system:serviceaccount:workspace-prod:analytics-prod
```
6. Troubleshooting¶
6.1 Pods Stuck in Pending¶
```shell
# Check pod events
kubectl describe pod -l app.kubernetes.io/name=storage -n workspace-prod

# Common issues:
# - No nodes with a matching taint/toleration pair
# - Insufficient resources
# - Missing node labels (GCP)
```
Fix: Verify taints and labels:
```shell
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
kubectl get nodes --show-labels | grep e6data-workspace-name
```
6.2 Permission Denied on S3/GCS¶
```shell
# Check ServiceAccount annotations
kubectl get sa analytics-prod -n workspace-prod -o yaml

# Test IRSA (AWS). Note: kubectl run's --serviceaccount flag was removed in
# kubectl 1.24+, so set the ServiceAccount via --overrides instead.
kubectl run test-aws --rm -it --restart=Never \
  --image=amazon/aws-cli \
  --overrides='{"spec":{"serviceAccountName":"analytics-prod"}}' \
  -- s3 ls s3://e6data-bucket
```
6.3 Wrong Workspace Toleration¶
```shell
# Check pod tolerations ("<pod-name>" is a placeholder)
kubectl get pod <pod-name> -n workspace-prod -o jsonpath='{.spec.tolerations}' | jq

# Verify CR workspace field
kubectl get metadataservices analytics-prod -n workspace-prod -o jsonpath='{.spec.workspace}'
```
7. Best Practices¶
- Use consistent naming: Keep CR name, workspace, and ServiceAccount names aligned
- Dedicated node pools: Create separate node pools for e6data workloads
- Resource isolation: Use taints to prevent other workloads from scheduling on e6data nodes
- Least privilege IAM: Grant only required cloud storage permissions
- Monitor node resources: Ensure nodes have sufficient memory for storage/schema services