Node Requirements Guide¶
This guide documents the node configuration, taints, tolerations, and ServiceAccount requirements for e6data components.
Quick Reference¶
| Component | Node Scheduling | ServiceAccount Required | Cloud Access |
|---|---|---|---|
| MetadataServices | Custom tolerations & nodeSelector | Yes | S3/GCS/Azure |
| QueryService | Custom tolerations & nodeSelector | Yes | S3/GCS/Azure |
| Pool Nodes | Custom tolerations & nodeSelector | Inherits from QueryService | S3/GCS/Azure |
| MonitoringServices | Optional (runs anywhere) | Yes (auto-created) | None |
1. Node Taints and Tolerations¶
1.1 Overview¶
The operator supports any custom tolerations you provide. You can use your existing node taints and the operator will schedule pods accordingly.
Key Principle: You define your node taints, then configure the CR with matching tolerations.
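As a sketch of that flow (the node name `worker-1` and the `dedicated=e6data` taint are illustrative, not required by the operator), you would taint your nodes first and then mirror the taint in the CR's `tolerations`:

```shell
# Taint a node so only pods tolerating dedicated=e6data can schedule on it
# ("worker-1" is a placeholder node name)
kubectl taint nodes worker-1 dedicated=e6data:NoSchedule

# Confirm the taint took effect
kubectl get node worker-1 -o jsonpath='{.spec.taints}'
```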
1.2 Using Custom Tolerations¶
Specify any tolerations in your CR that match your node taints:
```yaml
apiVersion: e6data.io/v1alpha1
kind: MetadataServices
metadata:
  name: analytics-prod
spec:
  tenant: customer-a
  storageBackend: s3a://my-bucket
  storage:
    imageTag: "3.0.217"
  # Your custom tolerations - match your node taints
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "e6data"
      effect: "NoSchedule"
    - key: "workload-type"
      operator: "Equal"
      value: "analytics"
      effect: "NoSchedule"
```
Common toleration patterns:
```yaml
# Tolerate any taint with a specific key
tolerations:
  - key: "dedicated"
    operator: "Exists"
    effect: "NoSchedule"

# Tolerate a specific key-value pair
tolerations:
  - key: "team"
    operator: "Equal"
    value: "data-platform"
    effect: "NoSchedule"

# Tolerate spot/preemptible instances
tolerations:
  - key: "kubernetes.io/preemptible"
    operator: "Exists"
    effect: "NoSchedule"
```
1.3 Automatic Tolerations (Built-in)¶
The operator automatically adds these tolerations (in addition to any you specify):
Workspace Toleration (always added):
```yaml
tolerations:
  - key: "e6data-workspace-name"
    operator: "Equal"
    value: "<workspace>"  # From spec.workspace or CR name
    effect: "NoSchedule"
```
Azure Spot Toleration (when cloud=AZURE):
```yaml
tolerations:
  - key: "kubernetes.azure.com/scalesetpriority"
    operator: "Equal"
    value: "spot"
    effect: "NoSchedule"
```
1.4 Example: Using Your Existing Node Taints¶
If your cluster already has tainted nodes:
```shell
# Your existing node setup
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
# NAME       TAINTS
# worker-1   [dedicated=analytics:NoSchedule]
# worker-2   [dedicated=analytics:NoSchedule]
```
Configure the CR to match:
```yaml
apiVersion: e6data.io/v1alpha1
kind: MetadataServices
metadata:
  name: analytics-prod
spec:
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "analytics"
      effect: "NoSchedule"
```
2. Node Selectors¶
2.1 Using Custom Node Selectors¶
Specify any node selectors to target specific nodes:
```yaml
apiVersion: e6data.io/v1alpha1
kind: MetadataServices
metadata:
  name: analytics-prod
spec:
  nodeSelector:
    node-pool: "e6data-storage"
    instance-type: "memory-optimized"
    topology.kubernetes.io/zone: "us-east-1a"
```
Common node selector patterns:
```yaml
# Target a specific node pool
nodeSelector:
  node-pool: "analytics"

# Target by instance type
nodeSelector:
  node.kubernetes.io/instance-type: "r5.4xlarge"

# Target by zone
nodeSelector:
  topology.kubernetes.io/zone: "us-west-2a"

# Multiple selectors (AND logic)
nodeSelector:
  team: "data-platform"
  environment: "production"
```
2.2 Automatic Node Selectors (GCP Only)¶
For GCP clusters, the operator automatically adds a workspace node selector:
This is in addition to any custom selectors you provide.
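The automatically added selector is not shown here; based on the workspace node label used in the GKE setup later in this guide (`e6data-workspace-name=analytics-prod`), it presumably takes this form:

```yaml
nodeSelector:
  e6data-workspace-name: "<workspace>"  # From spec.workspace or CR name
```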
2.3 Example: Target Your Existing Node Pool¶
If you have labeled nodes:
```shell
# Your existing node labels
kubectl get nodes --show-labels | grep node-pool
# worker-1   node-pool=analytics-storage
# worker-2   node-pool=analytics-storage
```
Configure the CR to match:
```yaml
apiVersion: e6data.io/v1alpha1
kind: MetadataServices
metadata:
  name: analytics-prod
spec:
  nodeSelector:
    node-pool: "analytics-storage"
```
3. Karpenter Integration¶
3.1 NodePool Affinity¶
When using Karpenter for node auto-provisioning, specify the name of the Karpenter NodePool:
```yaml
apiVersion: e6data.io/v1alpha1
kind: MetadataServices
metadata:
  name: analytics-prod
spec:
  karpenterNodePool: "e6data-storage"
```
The operator adds node affinity for the Karpenter NodePool:
```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: "karpenter.sh/nodepool"
              operator: "In"
              values: ["e6data-storage"]
```
3.2 Karpenter NodePool Example¶
```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: e6data-storage
spec:
  template:
    spec:
      taints:
        - key: "e6data-workspace-name"
          value: "analytics-prod"
          effect: "NoSchedule"
      requirements:
        - key: "kubernetes.io/arch"
          operator: In
          values: ["amd64"]
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["on-demand"]
        - key: "node.kubernetes.io/instance-type"
          operator: In
          values: ["r5.4xlarge", "r5.8xlarge", "r6i.4xlarge", "r6i.8xlarge"]
```
4. ServiceAccount Requirements¶
4.1 ServiceAccount Naming¶
The operator creates or uses a ServiceAccount for each MetadataServices/QueryService:
| spec.serviceAccount | Resulting ServiceAccount Name |
|---|---|
| Not specified | Uses CR name (metadata.name) |
| Specified | Uses specified value |
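For instance, to pin the ServiceAccount name explicitly (the name `my-custom-sa` is illustrative):

```yaml
apiVersion: e6data.io/v1alpha1
kind: MetadataServices
metadata:
  name: analytics-prod  # Without spec.serviceAccount, the SA would be named "analytics-prod"
spec:
  serviceAccount: my-custom-sa  # Overrides the default CR-name-based SA name
```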
4.2 Auto-Created RBAC¶
By default (autoCreateRBAC: true), the operator creates:
- ServiceAccount with the appropriate name
- Role with minimal permissions
- RoleBinding linking them
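A quick way to confirm what was created (assuming a CR named `analytics-prod` in namespace `workspace-prod`):

```shell
# List the auto-created ServiceAccount, Role, and RoleBinding
kubectl get serviceaccount,role,rolebinding -n workspace-prod | grep analytics-prod
```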
Disable auto-creation:
```yaml
apiVersion: e6data.io/v1alpha1
kind: MetadataServices
metadata:
  name: analytics-prod
spec:
  autoCreateRBAC: false  # You must create the SA and RBAC manually
  serviceAccount: my-custom-sa
```
4.3 Required Cloud IAM Permissions¶
The ServiceAccount needs cloud storage access. Configure via:
AWS IRSA¶
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: analytics-prod  # Must match CR name or spec.serviceAccount
  namespace: workspace-prod
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::123456789012:role/e6data-storage-role"
```
Required IAM Policy:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::e6data-bucket",
        "arn:aws:s3:::e6data-bucket/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetPartitions",
        "glue:BatchGetPartition"
      ],
      "Resource": "*"
    }
  ]
}
```
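One way to attach this policy to the IRSA role, sketched with the aws CLI (the role and policy names follow the examples in this guide; `policy.json` is assumed to be the document above saved to disk):

```shell
# Attach the S3/Glue policy inline to the IRSA role
aws iam put-role-policy \
  --role-name e6data-storage-role \
  --policy-name e6data-storage-access \
  --policy-document file://policy.json
```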
GCP Workload Identity¶
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: analytics-prod
  namespace: workspace-prod
  annotations:
    iam.gke.io/gcp-service-account: "e6data-sa@project-id.iam.gserviceaccount.com"
```
Required GCP Roles:
- `roles/storage.objectViewer` (read)
- `roles/storage.objectCreator` (write)
- `roles/bigquery.dataViewer` (for BigQuery catalogs)
Azure Workload Identity¶
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: analytics-prod
  namespace: workspace-prod
  annotations:
    azure.workload.identity/client-id: "12345678-1234-1234-1234-123456789012"
  labels:
    azure.workload.identity/use: "true"
```
Required Azure Roles:
- `Storage Blob Data Reader` (read)
- `Storage Blob Data Contributor` (write)
5. Complete Node Setup Example¶
5.1 AWS EKS Setup¶
```shell
# 1. Create node group with taints
eksctl create nodegroup \
  --cluster my-cluster \
  --name e6data-storage \
  --node-type r5.4xlarge \
  --nodes 3 \
  --taints e6data-workspace-name=analytics-prod:NoSchedule

# 2. Create IAM role for IRSA
aws iam create-role \
  --role-name e6data-storage-role \
  --assume-role-policy-document file://trust-policy.json

# 3. Associate with ServiceAccount
eksctl create iamserviceaccount \
  --name analytics-prod \
  --namespace workspace-prod \
  --cluster my-cluster \
  --role-name e6data-storage-role \
  --approve
```
5.2 GCP GKE Setup¶
```shell
# 1. Create node pool with taints
gcloud container node-pools create e6data-storage \
  --cluster my-cluster \
  --machine-type n2-highmem-16 \
  --num-nodes 3 \
  --node-taints e6data-workspace-name=analytics-prod:NoSchedule \
  --node-labels e6data-workspace-name=analytics-prod

# 2. Set up Workload Identity
gcloud iam service-accounts create e6data-sa
gcloud iam service-accounts add-iam-policy-binding \
  e6data-sa@project-id.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:project-id.svc.id.goog[workspace-prod/analytics-prod]"

# 3. Grant storage access
gcloud storage buckets add-iam-policy-binding gs://e6data-bucket \
  --member "serviceAccount:e6data-sa@project-id.iam.gserviceaccount.com" \
  --role roles/storage.objectViewer
```
5.3 Azure AKS Setup¶
```shell
# 1. Create node pool with taints ("<resource-group>" is a placeholder)
az aks nodepool add \
  --resource-group <resource-group> \
  --cluster-name my-cluster \
  --name e6datastorage \
  --node-vm-size Standard_E16s_v4 \
  --node-count 3 \
  --node-taints e6data-workspace-name=analytics-prod:NoSchedule
```
```shell
# 2. Enable Workload Identity
az aks update \
  --resource-group <resource-group> \
  --name my-cluster \
  --enable-oidc-issuer \
  --enable-workload-identity
```
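Step 3 below references a user-assigned managed identity named `e6data-identity`; if it does not exist yet, it can be created first (the resource group name is a placeholder):

```shell
# Create the user-assigned managed identity referenced by the federated credential
az identity create \
  --name e6data-identity \
  --resource-group <resource-group>
```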
```shell
# 3. Create federated credential
az identity federated-credential create \
  --name e6data-federated \
  --identity-name e6data-identity \
  --resource-group <resource-group> \
  --issuer "$(az aks show --resource-group <resource-group> --name my-cluster --query "oidcIssuerProfile.issuerUrl" -o tsv)" \
  --subject system:serviceaccount:workspace-prod:analytics-prod
```
6. Troubleshooting¶
6.1 Pods Stuck in Pending¶
```shell
# Check pod events
kubectl describe pod -l app.kubernetes.io/name=storage -n workspace-prod

# Common issues:
# - No nodes with a matching taint/toleration pair
# - Insufficient resources
# - Missing node labels (GCP)
```
Fix: Verify taints and labels:
```shell
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
kubectl get nodes --show-labels | grep e6data-workspace-name
```
6.2 Permission Denied on S3/GCS¶
```shell
# Check ServiceAccount annotations
kubectl get sa analytics-prod -n workspace-prod -o yaml

# Test IRSA (AWS). Note: kubectl run's --serviceaccount flag was removed in
# kubectl 1.24+, so set the ServiceAccount via --overrides instead.
kubectl run test-aws --rm -it --restart=Never \
  --image=amazon/aws-cli \
  --overrides='{"spec":{"serviceAccountName":"analytics-prod"}}' \
  -- s3 ls s3://e6data-bucket
```
6.3 Wrong Workspace Toleration¶
```shell
# Check pod tolerations ("<pod-name>" is a placeholder)
kubectl get pod <pod-name> -n workspace-prod -o jsonpath='{.spec.tolerations}' | jq

# Verify CR workspace field
kubectl get metadataservices analytics-prod -n workspace-prod -o jsonpath='{.spec.workspace}'
```
7. Best Practices¶
- Use consistent naming: Keep CR name, workspace, and ServiceAccount names aligned
- Dedicated node pools: Create separate node pools for e6data workloads
- Resource isolation: Use taints to prevent other workloads from scheduling on e6data nodes
- Least privilege IAM: Grant only required cloud storage permissions
- Monitor node resources: Ensure nodes have sufficient memory for storage/schema services