AWS EKS Prerequisites¶
This guide covers all AWS-specific prerequisites for deploying the e6data Kubernetes Operator on Amazon EKS.
Quick Reference¶
| Requirement | Status | Notes |
|---|---|---|
| EKS 1.24+ | Required | Kubernetes cluster |
| OIDC Provider | Required | For IRSA or Pod Identity |
| Pod Identity Agent | Recommended | Simpler than IRSA (except GreptimeDB) |
| S3 Bucket | Required | Data lake storage |
| IAM Policies | Required | Least-privilege access |
| Karpenter 1.0+ | Recommended | ARM64 Graviton instances |
| AWS Glue | Optional | If using Glue catalog |
1. Authentication Options¶
e6data supports two authentication methods on AWS:
| Method | Recommended For | Notes |
|---|---|---|
| EKS Pod Identity | All workloads except GreptimeDB | Simpler setup, no OIDC annotation needed |
| IRSA | GreptimeDB, legacy clusters | Required for GreptimeDB (uses AWS SDK directly) |
1.1 EKS Pod Identity (Recommended)¶
EKS Pod Identity is the recommended approach for e6data workloads. It's simpler to configure and doesn't require ServiceAccount annotations.
Note: GreptimeDB requires IRSA because it uses the AWS SDK directly for S3 access.
Step 1: Install Pod Identity Agent¶
# Check if Pod Identity Agent is installed
aws eks describe-addon --cluster-name YOUR_CLUSTER --addon-name eks-pod-identity-agent
# If not installed, add it
aws eks create-addon \
--cluster-name YOUR_CLUSTER \
--addon-name eks-pod-identity-agent \
  --addon-version v1.3.2-eksbuild.2   # example version; list current ones with: aws eks describe-addon-versions --addon-name eks-pod-identity-agent
# Wait for addon to be active
aws eks wait addon-active \
--cluster-name YOUR_CLUSTER \
--addon-name eks-pod-identity-agent
Step 2: Create IAM Role for Pod Identity¶
Create pod-identity-trust-policy.json:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "pods.eks.amazonaws.com"
},
"Action": [
"sts:AssumeRole",
"sts:TagSession"
]
}
]
}
Create the role:
aws iam create-role \
--role-name e6data-workspace-role \
--assume-role-policy-document file://pod-identity-trust-policy.json \
--description "IAM role for e6data workspace pods"
Step 3: Create Pod Identity Association¶
# Create association for your workspace namespace
aws eks create-pod-identity-association \
--cluster-name YOUR_CLUSTER \
--namespace workspace-prod \
--service-account analytics-prod \
--role-arn arn:aws:iam::ACCOUNT_ID:role/e6data-workspace-role
# Verify
aws eks list-pod-identity-associations --cluster-name YOUR_CLUSTER
1.2 IRSA (Required for GreptimeDB)¶
IRSA is required for GreptimeDB and can be used for other workloads if preferred.
Step 1: Verify OIDC Provider¶
# Get OIDC issuer URL
OIDC_URL=$(aws eks describe-cluster --name YOUR_CLUSTER \
--query "cluster.identity.oidc.issuer" --output text)
echo "OIDC URL: $OIDC_URL"
# Extract OIDC ID
OIDC_ID=$(echo $OIDC_URL | cut -d'/' -f5)
echo "OIDC ID: $OIDC_ID"
# Check if provider exists
aws iam list-open-id-connect-providers | grep $OIDC_ID
If the OIDC provider doesn't exist, create it. One common way, assuming eksctl is installed, is to associate it with the cluster:
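eksctl utils associate-iam-oidc-provider \
  --cluster YOUR_CLUSTER \
  --region REGION \
  --approve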
Step 2: Create IRSA Trust Policy¶
Create irsa-trust-policy.json (replace placeholders):
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/oidc.eks.REGION.amazonaws.com/id/OIDC_ID"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"oidc.eks.REGION.amazonaws.com/id/OIDC_ID:aud": "sts.amazonaws.com",
"oidc.eks.REGION.amazonaws.com/id/OIDC_ID:sub": "system:serviceaccount:NAMESPACE:SERVICE_ACCOUNT"
}
}
}
]
}
For multiple namespaces, use StringLike with wildcards:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/oidc.eks.REGION.amazonaws.com/id/OIDC_ID"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringLike": {
"oidc.eks.REGION.amazonaws.com/id/OIDC_ID:sub": "system:serviceaccount:workspace-*:*"
},
"StringEquals": {
"oidc.eks.REGION.amazonaws.com/id/OIDC_ID:aud": "sts.amazonaws.com"
}
}
}
]
}
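The trust policy has to be attached to an IAM role before Step 3 can reference it. A minimal sketch, assuming a dedicated role for the GreptimeDB/IRSA workloads (the role name is illustrative; the annotation in Step 3 must point at whichever role carries this trust policy):
aws iam create-role \
  --role-name e6data-greptime-role \
  --assume-role-policy-document file://irsa-trust-policy.json \
  --description "IAM role assumed via IRSA by e6data pods"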
Step 3: Annotate ServiceAccount¶
apiVersion: v1
kind: ServiceAccount
metadata:
name: analytics-prod
namespace: workspace-prod
annotations:
eks.amazonaws.com/role-arn: "arn:aws:iam::ACCOUNT_ID:role/e6data-workspace-role"
2. IAM Policies (Least Privilege)¶
2.1 Base S3 Policy (Required)¶
Create e6data-s3-policy.json (write access is scoped to the e6data-cache/ and e6data-metadata/ prefixes via the Resource ARNs):
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "S3ReadAccess",
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:GetObjectVersion",
"s3:GetObjectTagging",
"s3:ListBucket",
"s3:GetBucketLocation"
],
"Resource": [
"arn:aws:s3:::YOUR-DATA-BUCKET",
"arn:aws:s3:::YOUR-DATA-BUCKET/*"
]
},
{
"Sid": "S3WriteAccess",
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:DeleteObject"
],
"Resource": [
"arn:aws:s3:::YOUR-DATA-BUCKET/e6data-cache/*",
"arn:aws:s3:::YOUR-DATA-BUCKET/e6data-metadata/*"
],
"Condition": {
"StringLike": {
"s3:prefix": [
"e6data-cache/*",
"e6data-metadata/*"
]
}
}
}
]
}
2.2 AWS Glue Policy (If Using Glue Catalog)¶
Create e6data-glue-policy.json:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "GlueCatalogReadAccess",
"Effect": "Allow",
"Action": [
"glue:GetDatabase",
"glue:GetDatabases",
"glue:GetTable",
"glue:GetTables",
"glue:GetTableVersion",
"glue:GetTableVersions",
"glue:GetPartition",
"glue:GetPartitions",
"glue:BatchGetPartition",
"glue:GetCatalogImportStatus"
],
"Resource": [
"arn:aws:glue:REGION:ACCOUNT_ID:catalog",
"arn:aws:glue:REGION:ACCOUNT_ID:database/*",
"arn:aws:glue:REGION:ACCOUNT_ID:table/*/*"
]
}
]
}
Restrict to specific databases (recommended for production):
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "GlueCatalogReadAccess",
"Effect": "Allow",
"Action": [
"glue:GetDatabase",
"glue:GetDatabases",
"glue:GetTable",
"glue:GetTables",
"glue:GetTableVersion",
"glue:GetTableVersions",
"glue:GetPartition",
"glue:GetPartitions",
"glue:BatchGetPartition"
],
"Resource": [
"arn:aws:glue:REGION:ACCOUNT_ID:catalog",
"arn:aws:glue:REGION:ACCOUNT_ID:database/analytics_db",
"arn:aws:glue:REGION:ACCOUNT_ID:database/sales_db",
"arn:aws:glue:REGION:ACCOUNT_ID:table/analytics_db/*",
"arn:aws:glue:REGION:ACCOUNT_ID:table/sales_db/*"
]
}
]
}
2.3 GreptimeDB S3 Policy (For MonitoringServices)¶
GreptimeDB needs write access to its own bucket:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "GreptimeS3Access",
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject",
"s3:ListBucket",
"s3:GetBucketLocation"
],
"Resource": [
"arn:aws:s3:::YOUR-GREPTIME-BUCKET",
"arn:aws:s3:::YOUR-GREPTIME-BUCKET/*"
]
}
]
}
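This policy also has to be created and attached to the IRSA role used by GreptimeDB. A sketch, assuming the JSON above is saved as e6data-greptime-s3-policy.json and the role from section 1.2 is named e6data-greptime-role:
aws iam create-policy \
  --policy-name e6data-greptime-s3-policy \
  --policy-document file://e6data-greptime-s3-policy.json

aws iam attach-role-policy \
  --role-name e6data-greptime-role \
  --policy-arn arn:aws:iam::ACCOUNT_ID:policy/e6data-greptime-s3-policy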
2.4 Create and Attach Policies¶
# Create S3 policy
aws iam create-policy \
--policy-name e6data-s3-policy \
--policy-document file://e6data-s3-policy.json
# Create Glue policy (if using Glue)
aws iam create-policy \
--policy-name e6data-glue-policy \
--policy-document file://e6data-glue-policy.json
# Attach policies to role
aws iam attach-role-policy \
--role-name e6data-workspace-role \
--policy-arn arn:aws:iam::ACCOUNT_ID:policy/e6data-s3-policy
aws iam attach-role-policy \
--role-name e6data-workspace-role \
--policy-arn arn:aws:iam::ACCOUNT_ID:policy/e6data-glue-policy
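As a quick sanity check, confirm both policies are attached to the role:
aws iam list-attached-role-policies --role-name e6data-workspace-role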
3. Karpenter Setup (Recommended)¶
Karpenter provides dynamic node provisioning. We recommend ARM64 (Graviton) instances for cost-effectiveness.
3.1 Install Karpenter¶
export KARPENTER_VERSION="1.0.0"
export CLUSTER_NAME="YOUR_CLUSTER"
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
export CLUSTER_ENDPOINT="$(aws eks describe-cluster --name ${CLUSTER_NAME} --query "cluster.endpoint" --output text)"
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
--version ${KARPENTER_VERSION} \
--namespace karpenter --create-namespace \
--set settings.clusterName=${CLUSTER_NAME} \
--set settings.clusterEndpoint=${CLUSTER_ENDPOINT} \
--set settings.interruptionQueue=${CLUSTER_NAME} \
--wait
3.2 NodePool for e6data (ARM64 Graviton - Recommended)¶
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
name: e6data-compute
spec:
template:
metadata:
labels:
e6data.io/node-type: compute
spec:
requirements:
# Prefer ARM64 (Graviton) for cost savings
- key: kubernetes.io/arch
operator: In
values: ["arm64"]
- key: kubernetes.io/os
operator: In
values: ["linux"]
- key: karpenter.sh/capacity-type
operator: In
values: ["spot", "on-demand"]
# Use instance families, not specific types
- key: karpenter.k8s.aws/instance-family
operator: In
values:
- r7g # Graviton3 memory-optimized (best price/performance)
- r6g # Graviton2 memory-optimized
- m7g # Graviton3 general purpose
- m6g # Graviton2 general purpose
# Instance size range
- key: karpenter.k8s.aws/instance-size
operator: In
values:
- 4xlarge
- 8xlarge
- 12xlarge
- 16xlarge
# Taints for workload isolation
taints:
- key: e6data-workspace-name
value: "prod"
effect: NoSchedule
# Node expiry
expireAfter: 720h # 30 days
nodeClassRef:
group: karpenter.k8s.aws
kind: EC2NodeClass
name: e6data-graviton
# Resource limits
limits:
cpu: 2000
memory: 8000Gi
# Disruption settings
disruption:
consolidationPolicy: WhenEmptyOrUnderutilized
consolidateAfter: 5m
budgets:
- nodes: "10%"
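Because this NodePool taints its nodes with e6data-workspace-name=prod:NoSchedule, only pods that tolerate the taint will be scheduled there. A minimal pod-spec fragment, assuming your e6data workspace pods let you set tolerations and a node selector (exact field placement depends on your workspace configuration):
# Fragment of a pod template targeting the tainted Graviton nodes
tolerations:
  - key: e6data-workspace-name
    operator: Equal
    value: "prod"
    effect: NoSchedule
nodeSelector:
  e6data.io/node-type: compute
  kubernetes.io/arch: arm64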
3.3 EC2NodeClass for Graviton¶
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
name: e6data-graviton
spec:
amiSelectorTerms:
- alias: al2023@latest
role: KarpenterNodeRole-YOUR_CLUSTER
subnetSelectorTerms:
- tags:
karpenter.sh/discovery: YOUR_CLUSTER
securityGroupSelectorTerms:
- tags:
karpenter.sh/discovery: YOUR_CLUSTER
# Block devices
blockDeviceMappings:
- deviceName: /dev/xvda
ebs:
volumeSize: 100Gi
volumeType: gp3
iops: 3000
throughput: 125
deleteOnTermination: true
encrypted: true
# Instance store for NVMe cache (optional, for instances with local storage)
instanceStorePolicy: RAID0
# Metadata options
metadataOptions:
httpEndpoint: enabled
httpProtocolIPv6: disabled
httpPutResponseHopLimit: 2
httpTokens: required # IMDSv2
tags:
Environment: production
Team: data-platform
ManagedBy: karpenter
3.4 NodePool for AMD64 (If ARM64 Not Suitable)¶
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
name: e6data-compute-amd64
spec:
template:
spec:
requirements:
- key: kubernetes.io/arch
operator: In
values: ["amd64"]
- key: kubernetes.io/os
operator: In
values: ["linux"]
- key: karpenter.sh/capacity-type
operator: In
values: ["spot", "on-demand"]
- key: karpenter.k8s.aws/instance-family
operator: In
values:
- r7i # Intel memory-optimized
- r6i # Intel memory-optimized
- r5 # Intel memory-optimized (older)
- key: karpenter.k8s.aws/instance-size
operator: In
values:
- 4xlarge
- 8xlarge
- 12xlarge
- 16xlarge
taints:
- key: e6data-workspace-name
value: "prod"
effect: NoSchedule
nodeClassRef:
group: karpenter.k8s.aws
kind: EC2NodeClass
name: e6data-amd64
limits:
cpu: 1000
memory: 4000Gi
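After picking a NodePool/EC2NodeClass pair, apply the manifests and check that Karpenter accepts them (file names are placeholders for wherever you saved the YAML above):
kubectl apply -f e6data-ec2nodeclass.yaml
kubectl apply -f e6data-nodepool.yaml
# Both resources should report Ready once accepted
kubectl get ec2nodeclasses,nodepools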
4. Verification¶
4.1 Test Pod Identity¶
# Create test pod
# kubectl 1.24+ removed the --serviceaccount flag; set the ServiceAccount via --overrides
kubectl run test-pod-identity --rm -it --restart=Never \
  --namespace=workspace-prod \
  --overrides='{"spec": {"serviceAccountName": "analytics-prod"}}' \
  --image=amazon/aws-cli \
  -- sts get-caller-identity
# Expected: Shows assumed role ARN
4.2 Test S3 Access¶
kubectl run test-s3 --rm -it --restart=Never \
  --namespace=workspace-prod \
  --overrides='{"spec": {"serviceAccountName": "analytics-prod"}}' \
  --image=amazon/aws-cli \
  -- s3 ls s3://YOUR-DATA-BUCKET/ --max-items 5
4.3 Test Glue Access¶
kubectl run test-glue --rm -it --restart=Never \
  --namespace=workspace-prod \
  --overrides='{"spec": {"serviceAccountName": "analytics-prod"}}' \
  --image=amazon/aws-cli \
  -- glue get-databases
4.4 Verify Karpenter¶
# Check Karpenter pods
kubectl get pods -n karpenter
# Check NodePools
kubectl get nodepools
# Check EC2NodeClasses
kubectl get ec2nodeclasses
# Watch provisioned nodes
kubectl get nodes -l karpenter.sh/nodepool=e6data-compute -w
5. Best Practices¶
5.1 Security¶
- Use Pod Identity for most workloads (simpler, more secure)
- Use IRSA only for GreptimeDB
- Least privilege: Only grant required S3 buckets and Glue databases
- IMDSv2: Always use httpTokens: required in EC2NodeClass
- Encryption: Enable EBS encryption
5.2 Cost Optimization¶
- ARM64 (Graviton): 20-40% cheaper than comparable x86 instances
- Spot instances: Use for fault-tolerant workloads
- Instance families: Let Karpenter choose optimal size within family
- Consolidation: Enable WhenEmptyOrUnderutilized for auto-rightsizing
5.3 Performance¶
- Instance size: Use 4xlarge or larger for query executors
- Memory-optimized: Use r-family instances for data workloads
- NVMe: Use instances with local NVMe for caching (r5d, r6gd)
6. S3 Bucket Setup¶
e6data requires an S3 bucket for metadata storage. Follow these security best practices when creating the bucket.
6.1 Create Bucket¶
# Set variables
BUCKET_NAME="e6-workspace-metadata"
REGION="us-east-1"
# Create bucket (us-east-1 doesn't need LocationConstraint)
aws s3api create-bucket \
--bucket ${BUCKET_NAME} \
--region ${REGION}
# For other regions, use:
# aws s3api create-bucket \
# --bucket ${BUCKET_NAME} \
# --region ${REGION} \
# --create-bucket-configuration LocationConstraint=${REGION}
6.2 Enable Security Settings¶
# Block all public access
aws s3api put-public-access-block \
--bucket ${BUCKET_NAME} \
--public-access-block-configuration '{
"BlockPublicAcls": true,
"IgnorePublicAcls": true,
"BlockPublicPolicy": true,
"RestrictPublicBuckets": true
}'
# Enable server-side encryption (AES256)
aws s3api put-bucket-encryption \
--bucket ${BUCKET_NAME} \
--server-side-encryption-configuration '{
"Rules": [{
"ApplyServerSideEncryptionByDefault": {
"SSEAlgorithm": "AES256"
}
}]
}'
6.3 Add Bucket Policy¶
Deny any requests that don't use HTTPS:
aws s3api put-bucket-policy \
--bucket ${BUCKET_NAME} \
--policy '{
"Version": "2012-10-17",
"Statement": [{
"Sid": "DenyInsecureTransport",
"Effect": "Deny",
"Principal": "*",
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::'${BUCKET_NAME}'",
"arn:aws:s3:::'${BUCKET_NAME}'/*"
],
"Condition": {
"Bool": {
"aws:SecureTransport": "false"
}
}
}]
}'
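You can read the settings back to confirm they were applied:
aws s3api get-public-access-block --bucket ${BUCKET_NAME}
aws s3api get-bucket-encryption --bucket ${BUCKET_NAME}
aws s3api get-bucket-policy --bucket ${BUCKET_NAME}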
6.4 Optional: VPC Endpoint Restriction¶
For enhanced security, restrict bucket access to your VPC endpoint. Note that put-bucket-policy replaces the entire bucket policy, so include the DenyInsecureTransport statement from 6.3 in the same document if you applied it:
aws s3api put-bucket-policy \
--bucket ${BUCKET_NAME} \
--policy '{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowAccessFromVPCEOnly",
"Effect": "Allow",
"Principal": "*",
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::'${BUCKET_NAME}'",
"arn:aws:s3:::'${BUCKET_NAME}'/*"
],
"Condition": {
"StringEquals": {
"aws:SourceVpce": "YOUR_VPC_ENDPOINT_ID"
}
}
},
{
"Sid": "DenyOutsideVPCE",
"Effect": "Deny",
"Principal": "*",
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::'${BUCKET_NAME}'",
"arn:aws:s3:::'${BUCKET_NAME}'/*"
],
"Condition": {
"StringNotEquals": {
"aws:SourceVpce": "YOUR_VPC_ENDPOINT_ID"
}
}
}
]
}'
7. EC2NodeClass with NVMe Instance Store¶
For instances with NVMe instance store (like c7gd, r7gd, i8g), configure the userData script to automatically set up the local storage.
7.1 NVMe RAID0 Configuration¶
This userData script automatically detects NVMe instance store drives and configures them as RAID0 for optimal performance:
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
name: e6data-nvme
spec:
amiSelectorTerms:
- alias: al2023@latest
role: KarpenterNodeRole-YOUR_CLUSTER
subnetSelectorTerms:
- tags:
karpenter.sh/discovery: YOUR_CLUSTER
securityGroupSelectorTerms:
- tags:
aws:eks:cluster-name: YOUR_CLUSTER
blockDeviceMappings:
- deviceName: /dev/xvda
ebs:
volumeSize: 100Gi
volumeType: gp3
kubelet:
maxPods: 18
metadataOptions:
httpEndpoint: enabled
httpProtocolIPv6: disabled
httpPutResponseHopLimit: 1
httpTokens: required
userData: |
mount_location="/app/tmp"
mkdir -p $mount_location
yum install nvme-cli -y
# Check if NVMe instance store drives are present
if nvme list | grep -q "Amazon EC2 NVMe Instance Storage"; then
nvme_drives=$(nvme list | grep "Amazon EC2 NVMe Instance Storage" | cut -d " " -f 1 || true)
readarray -t nvme_drives <<< "$nvme_drives"
num_drives=${#nvme_drives[@]}
if [ $num_drives -gt 1 ]; then
# Multiple NVMe drives - create RAID0 array for maximum performance
yum install mdadm -y
mdadm --create /dev/md0 --level=0 --name=md0 --raid-devices=$num_drives "${nvme_drives[@]}"
mkfs.ext4 /dev/md0
mount /dev/md0 $mount_location
mdadm --detail --scan >> /etc/mdadm.conf
echo /dev/md0 $mount_location ext4 defaults,noatime 0 2 >> /etc/fstab
else
# Single NVMe drive - format and mount directly
for disk in "${nvme_drives[@]}"; do
mkfs.ext4 -F $disk
mount $disk $mount_location
echo $disk $mount_location ext4 defaults,noatime 0 2 >> /etc/fstab
done
fi
else
echo "No NVMe drives detected. Skipping NVMe configuration."
fi
chmod 777 $mount_location
tags:
Environment: production
ManagedBy: karpenter
7.2 Recommended Instance Families with NVMe¶
| Instance Family | Architecture | NVMe Storage | Use Case |
|---|---|---|---|
| c7gd | ARM64 | Yes | Compute-intensive workloads |
| r7gd | ARM64 | Yes | Memory-intensive workloads |
| i8g | ARM64 | Yes | Storage-intensive workloads |
| c6gd | ARM64 | Yes | General compute |
| r6gd | ARM64 | Yes | General memory |
| m7gd | ARM64 | Yes | Balanced workloads |
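If you want to check which sizes in a family carry instance store in your region, the EC2 API can list them; a sketch using a standard describe-instance-types call (the r7gd filter is just an example):
aws ec2 describe-instance-types \
  --filters "Name=instance-storage-supported,Values=true" \
  --query "InstanceTypes[?contains(InstanceType, 'r7gd')].[InstanceType, InstanceStorageInfo.TotalSizeInGB]" \
  --output table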
7.3 NodePool for NVMe Instances¶
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
name: e6data-nvme-pool
spec:
template:
spec:
requirements:
- key: kubernetes.io/arch
operator: In
values: ["arm64"]
- key: karpenter.k8s.aws/instance-family
operator: In
values:
- c7gd
- r7gd
- i8g
- m7gd
- key: karpenter.k8s.aws/instance-size
operator: NotIn
values:
- metal
- key: karpenter.sh/capacity-type
operator: In
values:
- spot
- on-demand
taints:
- key: workspace-name
value: "YOUR_WORKSPACE"
effect: NoSchedule
nodeClassRef:
group: karpenter.k8s.aws
kind: EC2NodeClass
name: e6data-nvme
limits:
cpu: 5000
disruption:
consolidationPolicy: WhenEmpty
consolidateAfter: 30s
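To roll this out, apply both manifests and, once a node from the pool is Ready, verify the instance-store mount created by the userData script. This is a sketch: NODE_NAME is whatever node Karpenter provisions, and kubectl debug exposes the host filesystem under /host.
kubectl apply -f e6data-nvme-nodeclass.yaml
kubectl apply -f e6data-nvme-nodepool.yaml
kubectl get nodes -l karpenter.sh/nodepool=e6data-nvme-pool
# Check the NVMe mount at /app/tmp on the host
kubectl debug node/NODE_NAME -it --image=busybox -- df -h /host/app/tmp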