Node Requirements Guide

This guide documents the node configuration, taints, tolerations, and ServiceAccount requirements for e6data components.


Quick Reference

| Component          | Node Scheduling                   | ServiceAccount Required     | Cloud Access |
|--------------------|-----------------------------------|-----------------------------|--------------|
| MetadataServices   | Custom tolerations & nodeSelector | Yes                         | S3/GCS/Azure |
| QueryService       | Custom tolerations & nodeSelector | Yes                         | S3/GCS/Azure |
| Pool Nodes         | Custom tolerations & nodeSelector | Inherits from QueryService  | S3/GCS/Azure |
| MonitoringServices | Optional (runs anywhere)          | Yes (auto-created)          | None         |

1. Node Taints and Tolerations

1.1 Overview

The operator accepts any custom tolerations you provide: you can keep your existing node taints, and the operator will configure pods to tolerate them.

Key Principle: You define your node taints, then configure the CR with matching tolerations.
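For example, a dedicated=e6data:NoSchedule taint (matched by the tolerations in the CR below) can be applied with a sketch like this; the node names are illustrative:

```shell
# Taint nodes so only pods tolerating dedicated=e6data can schedule there
kubectl taint nodes worker-1 worker-2 dedicated=e6data:NoSchedule
```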

1.2 Using Custom Tolerations

Specify any tolerations in your CR that match your node taints:

apiVersion: e6data.io/v1alpha1
kind: MetadataServices
metadata:
  name: analytics-prod
spec:
  tenant: customer-a
  storageBackend: s3a://my-bucket
  storage:
    imageTag: "3.0.217"

  # Your custom tolerations - match your node taints
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "e6data"
      effect: "NoSchedule"
    - key: "workload-type"
      operator: "Equal"
      value: "analytics"
      effect: "NoSchedule"

Common toleration patterns:

# Tolerate any taint with a specific key
tolerations:
  - key: "dedicated"
    operator: "Exists"
    effect: "NoSchedule"

# Tolerate specific key-value pair
tolerations:
  - key: "team"
    operator: "Equal"
    value: "data-platform"
    effect: "NoSchedule"

# Tolerate spot/preemptible instances
tolerations:
  - key: "kubernetes.io/preemptible"
    operator: "Exists"
    effect: "NoSchedule"

1.3 Automatic Tolerations (Built-in)

The operator automatically adds these tolerations (in addition to any you specify):

Workspace Toleration (always added):

tolerations:
  - key: "e6data-workspace-name"
    operator: "Equal"
    value: "<workspace>"  # From spec.workspace or CR name
    effect: "NoSchedule"

Azure Spot Toleration (when cloud=AZURE):

tolerations:
  - key: "kubernetes.azure.com/scalesetpriority"
    operator: "Equal"
    value: "spot"
    effect: "NoSchedule"

1.4 Example: Using Your Existing Node Taints

If your cluster already has tainted nodes:

# Your existing node setup
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
# NAME        TAINTS
# worker-1    [dedicated=analytics:NoSchedule]
# worker-2    [dedicated=analytics:NoSchedule]

Configure the CR to match:

apiVersion: e6data.io/v1alpha1
kind: MetadataServices
metadata:
  name: analytics-prod
spec:
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "analytics"
      effect: "NoSchedule"

2. Node Selectors

2.1 Using Custom Node Selectors

Specify any node selectors to target specific nodes:

apiVersion: e6data.io/v1alpha1
kind: MetadataServices
metadata:
  name: analytics-prod
spec:
  nodeSelector:
    node-pool: "e6data-storage"
    instance-type: "memory-optimized"
    topology.kubernetes.io/zone: "us-east-1a"

Common node selector patterns:

# Target specific node pool
nodeSelector:
  node-pool: "analytics"

# Target by instance type
nodeSelector:
  node.kubernetes.io/instance-type: "r5.4xlarge"

# Target by zone
nodeSelector:
  topology.kubernetes.io/zone: "us-west-2a"

# Multiple selectors (AND logic)
nodeSelector:
  team: "data-platform"
  environment: "production"

2.2 Automatic Node Selectors (GCP Only)

For GCP clusters, the operator automatically adds a workspace node selector:

nodeSelector:
  e6data-workspace-name: "<workspace>"

This is in addition to any custom selectors you provide.
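If the nodes in your GCP pool were created without this label, it can be added manually so the automatic selector matches (the node name is illustrative):

```shell
# Label an existing node with the workspace name the operator selects on
kubectl label nodes worker-1 e6data-workspace-name=analytics-prod
```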

2.3 Example: Target Your Existing Node Pool

If you have labeled nodes:

# Your existing node labels
kubectl get nodes --show-labels | grep node-pool
# worker-1   node-pool=analytics-storage
# worker-2   node-pool=analytics-storage

Configure the CR to match:

apiVersion: e6data.io/v1alpha1
kind: MetadataServices
metadata:
  name: analytics-prod
spec:
  nodeSelector:
    node-pool: "analytics-storage"

3. Karpenter Integration

3.1 NodePool Affinity

When using Karpenter for node auto-provisioning, specify the name of the Karpenter NodePool:

apiVersion: e6data.io/v1alpha1
kind: MetadataServices
metadata:
  name: analytics-prod
spec:
  karpenterNodePool: "e6data-storage"

The operator adds node affinity for the Karpenter NodePool:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: "karpenter.sh/nodepool"
              operator: "In"
              values: ["e6data-storage"]

3.2 Karpenter NodePool Example

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: e6data-storage
spec:
  template:
    spec:
      taints:
        - key: "e6data-workspace-name"
          value: "analytics-prod"
          effect: "NoSchedule"
      requirements:
        - key: "kubernetes.io/arch"
          operator: In
          values: ["amd64"]
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["on-demand"]
        - key: "node.kubernetes.io/instance-type"
          operator: In
          values: ["r5.4xlarge", "r5.8xlarge", "r6i.4xlarge", "r6i.8xlarge"]
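After Karpenter provisions capacity, a quick way to confirm which pool each node came from is to surface the karpenter.sh/nodepool label as a column:

```shell
# List nodes with their Karpenter NodePool label
kubectl get nodes -L karpenter.sh/nodepool
```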

4. ServiceAccount Requirements

4.1 ServiceAccount Naming

The operator creates or uses a ServiceAccount for each MetadataServices/QueryService:

| spec.serviceAccount | Resulting ServiceAccount Name     |
|---------------------|-----------------------------------|
| Not specified       | Uses the CR name (metadata.name)  |
| Specified           | Uses the specified value          |

4.2 Auto-Created RBAC

By default (autoCreateRBAC: true), the operator creates:

  1. ServiceAccount with the appropriate name
  2. Role with minimal permissions
  3. RoleBinding linking them

Disable auto-creation:

apiVersion: e6data.io/v1alpha1
kind: MetadataServices
metadata:
  name: analytics-prod
spec:
  autoCreateRBAC: false  # You must create SA and RBAC manually
  serviceAccount: my-custom-sa
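With auto-creation disabled, the ServiceAccount must exist before the CR is reconciled. A minimal manifest matching the CR above (cloud identity annotations from section 4.3 still apply, and any Role/RoleBinding rules depend on your operator version):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-custom-sa        # must match spec.serviceAccount
  namespace: workspace-prod # namespace of the CR
```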

4.3 Required Cloud IAM Permissions

The ServiceAccount needs cloud storage access. Configure via:

AWS IRSA

apiVersion: v1
kind: ServiceAccount
metadata:
  name: analytics-prod  # Must match CR name or spec.serviceAccount
  namespace: workspace-prod
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::123456789012:role/e6data-storage-role"

Required IAM Policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::e6data-bucket",
        "arn:aws:s3:::e6data-bucket/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetPartitions",
        "glue:BatchGetPartition"
      ],
      "Resource": "*"
    }
  ]
}
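One way to attach this policy to the IRSA role is as an inline policy; the policy name is illustrative, and storage-policy.json is assumed to hold the JSON above:

```shell
# Attach the storage policy shown above as an inline policy on the IRSA role
aws iam put-role-policy \
  --role-name e6data-storage-role \
  --policy-name e6data-storage-access \
  --policy-document file://storage-policy.json
```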

GCP Workload Identity

apiVersion: v1
kind: ServiceAccount
metadata:
  name: analytics-prod
  namespace: workspace-prod
  annotations:
    iam.gke.io/gcp-service-account: "e6data-sa@project-id.iam.gserviceaccount.com"

Required GCP Roles:

  • roles/storage.objectViewer (read)
  • roles/storage.objectCreator (write)
  • roles/bigquery.dataViewer (for BigQuery catalogs)

Azure Workload Identity

apiVersion: v1
kind: ServiceAccount
metadata:
  name: analytics-prod
  namespace: workspace-prod
  annotations:
    azure.workload.identity/client-id: "12345678-1234-1234-1234-123456789012"
  labels:
    azure.workload.identity/use: "true"

Required Azure Roles:

  • Storage Blob Data Reader (read)
  • Storage Blob Data Contributor (write)
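Assuming the managed identity from the ServiceAccount annotation above, these roles can be granted at the storage account scope (subscription, resource group, and storage account names are placeholders):

```shell
# Grant blob read/write on the storage account to the workload identity
az role assignment create \
  --assignee "12345678-1234-1234-1234-123456789012" \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"
```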

5. Complete Node Setup Example

5.1 AWS EKS Setup

# 1. Create node group with taints
eksctl create nodegroup \
  --cluster my-cluster \
  --name e6data-storage \
  --node-type r5.4xlarge \
  --nodes 3 \
  --taints e6data-workspace-name=analytics-prod:NoSchedule

# 2. Create IAM role for IRSA
aws iam create-role \
  --role-name e6data-storage-role \
  --assume-role-policy-document file://trust-policy.json

# 3. Associate with ServiceAccount
eksctl create iamserviceaccount \
  --name analytics-prod \
  --namespace workspace-prod \
  --cluster my-cluster \
  --role-name e6data-storage-role \
  --approve
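Step 2 above reads trust-policy.json; a representative IRSA trust policy looks like the following (the account ID and OIDC provider URL are placeholders for your cluster's values):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE:sub": "system:serviceaccount:workspace-prod:analytics-prod"
        }
      }
    }
  ]
}
```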

5.2 GCP GKE Setup

# 1. Create node pool with taints
gcloud container node-pools create e6data-storage \
  --cluster my-cluster \
  --machine-type n2-highmem-16 \
  --num-nodes 3 \
  --node-taints e6data-workspace-name=analytics-prod:NoSchedule \
  --node-labels e6data-workspace-name=analytics-prod

# 2. Setup Workload Identity
gcloud iam service-accounts create e6data-sa

gcloud iam service-accounts add-iam-policy-binding \
  e6data-sa@project-id.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:project-id.svc.id.goog[workspace-prod/analytics-prod]"

# 3. Grant storage access (both roles from section 4.3: read and write)
gcloud storage buckets add-iam-policy-binding gs://e6data-bucket \
  --member "serviceAccount:e6data-sa@project-id.iam.gserviceaccount.com" \
  --role roles/storage.objectViewer

gcloud storage buckets add-iam-policy-binding gs://e6data-bucket \
  --member "serviceAccount:e6data-sa@project-id.iam.gserviceaccount.com" \
  --role roles/storage.objectCreator

5.3 Azure AKS Setup

# 1. Create node pool with taints
az aks nodepool add \
  --resource-group my-resource-group \
  --cluster-name my-cluster \
  --name e6datastorage \
  --node-vm-size Standard_E16s_v4 \
  --node-count 3 \
  --node-taints e6data-workspace-name=analytics-prod:NoSchedule

# 2. Enable Workload Identity
az aks update \
  --resource-group my-resource-group \
  --name my-cluster \
  --enable-oidc-issuer \
  --enable-workload-identity

# 3. Create the managed identity and its federated credential
az identity create \
  --resource-group my-resource-group \
  --name e6data-identity

az identity federated-credential create \
  --name e6data-federated \
  --identity-name e6data-identity \
  --resource-group my-resource-group \
  --issuer $(az aks show --resource-group my-resource-group --name my-cluster --query "oidcIssuerProfile.issuerUrl" -o tsv) \
  --subject system:serviceaccount:workspace-prod:analytics-prod

6. Troubleshooting

6.1 Pods Stuck in Pending

# Check pod events
kubectl describe pod -l app.kubernetes.io/name=storage -n workspace-prod

# Common issues:
# - No nodes with matching taint
# - Insufficient resources
# - Missing node labels (GCP)

Fix: Verify taints and labels:

kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
kubectl get nodes --show-labels | grep e6data-workspace-name

6.2 Permission Denied on S3/GCS

# Check ServiceAccount annotations
kubectl get sa analytics-prod -n workspace-prod -o yaml

# Test IRSA (AWS) - kubectl run no longer supports --serviceaccount, so use an override
kubectl run test-aws --rm -it --restart=Never \
  --image=amazon/aws-cli \
  --overrides='{"spec":{"serviceAccountName":"analytics-prod"}}' \
  -- s3 ls s3://e6data-bucket
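An analogous check for GCP Workload Identity (the image and bucket are illustrative):

```shell
# Test Workload Identity (GCP)
kubectl run test-gcp --rm -it --restart=Never \
  --image=google/cloud-sdk:slim \
  --overrides='{"spec":{"serviceAccountName":"analytics-prod"}}' \
  -- gsutil ls gs://e6data-bucket
```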

6.3 Wrong Workspace Toleration

# Check pod tolerations
kubectl get pod <pod-name> -n workspace-prod -o jsonpath='{.spec.tolerations}' | jq

# Verify CR workspace field
kubectl get metadataservices analytics-prod -n workspace-prod -o jsonpath='{.spec.workspace}'
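The toleration check can also be rehearsed without a cluster; here is a sketch with jq, where the sample JSON stands in for real `kubectl get pod ... -o json` output:

```shell
# Sample pod document (stand-in for kubectl output)
pod='{"spec":{"tolerations":[{"key":"e6data-workspace-name","operator":"Equal","value":"analytics-prod","effect":"NoSchedule"}]}}'

# Extract the workspace toleration value and compare it to spec.workspace / the CR name
ws=$(echo "$pod" | jq -r '.spec.tolerations[] | select(.key == "e6data-workspace-name") | .value')
echo "$ws"  # prints: analytics-prod
```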

7. Best Practices

  1. Use consistent naming: Keep CR name, workspace, and ServiceAccount names aligned
  2. Dedicated node pools: Create separate node pools for e6data workloads
  3. Resource isolation: Use taints to prevent other workloads from scheduling on e6data nodes
  4. Least privilege IAM: Grant only required cloud storage permissions
  5. Monitor node resources: Ensure nodes have sufficient memory for storage/schema services