AWS EKS Prerequisites¶
This guide covers all AWS-specific prerequisites for deploying the e6data Kubernetes Operator on Amazon EKS.
Quick Reference¶
| Requirement | Status | Notes |
|---|---|---|
| EKS 1.24+ | Required | Kubernetes cluster |
| OIDC Provider | Required | For IRSA or Pod Identity |
| Pod Identity Agent | Recommended | Simpler than IRSA (except GreptimeDB) |
| S3 Bucket | Required | Data lake storage |
| IAM Policies | Required | Least-privilege access |
| Karpenter 1.0+ | Recommended | ARM64 Graviton instances |
| AWS Glue | Optional | If using Glue catalog |
1. Authentication Options¶
e6data supports two authentication methods on AWS:
| Method | Recommended For | Notes |
|---|---|---|
| EKS Pod Identity | All workloads except GreptimeDB | Simpler setup, no OIDC annotation needed |
| IRSA | GreptimeDB, legacy clusters | Required for GreptimeDB (uses AWS SDK directly) |
1.1 EKS Pod Identity (Recommended)¶
EKS Pod Identity is the recommended approach for e6data workloads. It's simpler to configure and doesn't require ServiceAccount annotations.
Note: GreptimeDB requires IRSA because it uses the AWS SDK directly for S3 access.
Step 1: Install Pod Identity Agent¶
# Check if Pod Identity Agent is installed
aws eks describe-addon --cluster-name YOUR_CLUSTER --addon-name eks-pod-identity-agent
# If not installed, add it
aws eks create-addon \
--cluster-name YOUR_CLUSTER \
--addon-name eks-pod-identity-agent \
  --addon-version v1.3.2-eksbuild.2   # example version; list current ones with: aws eks describe-addon-versions --addon-name eks-pod-identity-agent
# Wait for addon to be active
aws eks wait addon-active \
--cluster-name YOUR_CLUSTER \
--addon-name eks-pod-identity-agent
Step 2: Create IAM Role for Pod Identity¶
Create pod-identity-trust-policy.json:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "pods.eks.amazonaws.com"
},
"Action": [
"sts:AssumeRole",
"sts:TagSession"
]
}
]
}
Create the role:
aws iam create-role \
--role-name e6data-workspace-role \
--assume-role-policy-document file://pod-identity-trust-policy.json \
--description "IAM role for e6data workspace pods"
Step 3: Create Pod Identity Association¶
# Create association for your workspace namespace
aws eks create-pod-identity-association \
--cluster-name YOUR_CLUSTER \
--namespace workspace-prod \
--service-account analytics-prod \
--role-arn arn:aws:iam::ACCOUNT_ID:role/e6data-workspace-role
# Verify
aws eks list-pod-identity-associations --cluster-name YOUR_CLUSTER
1.2 IRSA (Required for GreptimeDB)¶
IRSA is required for GreptimeDB and can be used for other workloads if preferred.
Step 1: Verify OIDC Provider¶
# Get OIDC issuer URL
OIDC_URL=$(aws eks describe-cluster --name YOUR_CLUSTER \
--query "cluster.identity.oidc.issuer" --output text)
echo "OIDC URL: $OIDC_URL"
# Extract OIDC ID
OIDC_ID=$(echo $OIDC_URL | cut -d'/' -f5)
echo "OIDC ID: $OIDC_ID"
# Check if provider exists
aws iam list-open-id-connect-providers | grep $OIDC_ID
If the OIDC provider doesn't exist, create it. One common way, assuming eksctl is installed, is to associate it with the cluster:
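eksctl utils associate-iam-oidc-provider \
  --cluster YOUR_CLUSTER \
  --region REGION \
  --approve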
Step 2: Create IRSA Trust Policy¶
Create irsa-trust-policy.json (replace placeholders):
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/oidc.eks.REGION.amazonaws.com/id/OIDC_ID"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"oidc.eks.REGION.amazonaws.com/id/OIDC_ID:aud": "sts.amazonaws.com",
"oidc.eks.REGION.amazonaws.com/id/OIDC_ID:sub": "system:serviceaccount:NAMESPACE:SERVICE_ACCOUNT"
}
}
}
]
}
For multiple namespaces, use StringLike with wildcards:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/oidc.eks.REGION.amazonaws.com/id/OIDC_ID"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringLike": {
"oidc.eks.REGION.amazonaws.com/id/OIDC_ID:sub": "system:serviceaccount:workspace-*:*"
},
"StringEquals": {
"oidc.eks.REGION.amazonaws.com/id/OIDC_ID:aud": "sts.amazonaws.com"
}
}
}
]
}
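The trust policy has to be attached to an IAM role before Step 3 can reference it. A minimal sketch, assuming a dedicated role for the GreptimeDB/IRSA workloads (the role name is illustrative; the annotation in Step 3 must point at whichever role carries this trust policy):
aws iam create-role \
  --role-name e6data-greptime-role \
  --assume-role-policy-document file://irsa-trust-policy.json \
  --description "IAM role assumed via IRSA by e6data pods"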
Step 3: Annotate ServiceAccount¶
apiVersion: v1
kind: ServiceAccount
metadata:
name: analytics-prod
namespace: workspace-prod
annotations:
eks.amazonaws.com/role-arn: "arn:aws:iam::ACCOUNT_ID:role/e6data-workspace-role"
2. IAM Policies (Least Privilege)¶
2.1 Base S3 Policy (Required)¶
Create e6data-s3-policy.json (write access is scoped to the e6data-cache/ and e6data-metadata/ prefixes via the Resource ARNs):
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "S3ReadAccess",
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:GetObjectVersion",
"s3:GetObjectTagging",
"s3:ListBucket",
"s3:GetBucketLocation"
],
"Resource": [
"arn:aws:s3:::YOUR-DATA-BUCKET",
"arn:aws:s3:::YOUR-DATA-BUCKET/*"
]
},
{
"Sid": "S3WriteAccess",
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:DeleteObject"
],
"Resource": [
"arn:aws:s3:::YOUR-DATA-BUCKET/e6data-cache/*",
"arn:aws:s3:::YOUR-DATA-BUCKET/e6data-metadata/*"
],
"Condition": {
"StringLike": {
"s3:prefix": [
"e6data-cache/*",
"e6data-metadata/*"
]
}
}
}
]
}
2.2 AWS Glue Policy (If Using Glue Catalog)¶
Create e6data-glue-policy.json:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "GlueCatalogReadAccess",
"Effect": "Allow",
"Action": [
"glue:GetDatabase",
"glue:GetDatabases",
"glue:GetTable",
"glue:GetTables",
"glue:GetTableVersion",
"glue:GetTableVersions",
"glue:GetPartition",
"glue:GetPartitions",
"glue:BatchGetPartition",
"glue:GetCatalogImportStatus"
],
"Resource": [
"arn:aws:glue:REGION:ACCOUNT_ID:catalog",
"arn:aws:glue:REGION:ACCOUNT_ID:database/*",
"arn:aws:glue:REGION:ACCOUNT_ID:table/*/*"
]
}
]
}
Restrict to specific databases (recommended for production):
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "GlueCatalogReadAccess",
"Effect": "Allow",
"Action": [
"glue:GetDatabase",
"glue:GetDatabases",
"glue:GetTable",
"glue:GetTables",
"glue:GetTableVersion",
"glue:GetTableVersions",
"glue:GetPartition",
"glue:GetPartitions",
"glue:BatchGetPartition"
],
"Resource": [
"arn:aws:glue:REGION:ACCOUNT_ID:catalog",
"arn:aws:glue:REGION:ACCOUNT_ID:database/analytics_db",
"arn:aws:glue:REGION:ACCOUNT_ID:database/sales_db",
"arn:aws:glue:REGION:ACCOUNT_ID:table/analytics_db/*",
"arn:aws:glue:REGION:ACCOUNT_ID:table/sales_db/*"
]
}
]
}
2.3 GreptimeDB S3 Policy (For MonitoringServices)¶
GreptimeDB needs write access to its own bucket:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "GreptimeS3Access",
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject",
"s3:ListBucket",
"s3:GetBucketLocation"
],
"Resource": [
"arn:aws:s3:::YOUR-GREPTIME-BUCKET",
"arn:aws:s3:::YOUR-GREPTIME-BUCKET/*"
]
}
]
}
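This policy also has to be created and attached to the IRSA role used by GreptimeDB. A sketch, assuming the JSON above is saved as e6data-greptime-s3-policy.json and the role from section 1.2 is named e6data-greptime-role:
aws iam create-policy \
  --policy-name e6data-greptime-s3-policy \
  --policy-document file://e6data-greptime-s3-policy.json

aws iam attach-role-policy \
  --role-name e6data-greptime-role \
  --policy-arn arn:aws:iam::ACCOUNT_ID:policy/e6data-greptime-s3-policy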
2.4 Create and Attach Policies¶
# Create S3 policy
aws iam create-policy \
--policy-name e6data-s3-policy \
--policy-document file://e6data-s3-policy.json
# Create Glue policy (if using Glue)
aws iam create-policy \
--policy-name e6data-glue-policy \
--policy-document file://e6data-glue-policy.json
# Attach policies to role
aws iam attach-role-policy \
--role-name e6data-workspace-role \
--policy-arn arn:aws:iam::ACCOUNT_ID:policy/e6data-s3-policy
aws iam attach-role-policy \
--role-name e6data-workspace-role \
--policy-arn arn:aws:iam::ACCOUNT_ID:policy/e6data-glue-policy
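As a quick sanity check, confirm both policies are attached to the role:
aws iam list-attached-role-policies --role-name e6data-workspace-role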
3. Karpenter Setup (Recommended)¶
Karpenter provides dynamic node provisioning. We recommend ARM64 (Graviton) instances for cost-effectiveness.
3.1 Install Karpenter¶
export KARPENTER_VERSION="1.0.0"
export CLUSTER_NAME="YOUR_CLUSTER"
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
export CLUSTER_ENDPOINT="$(aws eks describe-cluster --name ${CLUSTER_NAME} --query "cluster.endpoint" --output text)"
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
--version ${KARPENTER_VERSION} \
--namespace karpenter --create-namespace \
--set settings.clusterName=${CLUSTER_NAME} \
--set settings.clusterEndpoint=${CLUSTER_ENDPOINT} \
--set settings.interruptionQueue=${CLUSTER_NAME} \
--wait
3.2 NodePool for e6data (ARM64 Graviton - Recommended)¶
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
name: e6data-compute
spec:
template:
metadata:
labels:
e6data.io/node-type: compute
spec:
requirements:
# Prefer ARM64 (Graviton) for cost savings
- key: kubernetes.io/arch
operator: In
values: ["arm64"]
- key: kubernetes.io/os
operator: In
values: ["linux"]
- key: karpenter.sh/capacity-type
operator: In
values: ["spot", "on-demand"]
# Use instance families, not specific types
- key: karpenter.k8s.aws/instance-family
operator: In
values:
- r7g # Graviton3 memory-optimized (best price/performance)
- r6g # Graviton2 memory-optimized
- m7g # Graviton3 general purpose
- m6g # Graviton2 general purpose
# Instance size range
- key: karpenter.k8s.aws/instance-size
operator: In
values:
- 4xlarge
- 8xlarge
- 12xlarge
- 16xlarge
# Taints for workload isolation
taints:
- key: e6data-workspace-name
value: "prod"
effect: NoSchedule
# Node expiry
expireAfter: 720h # 30 days
nodeClassRef:
group: karpenter.k8s.aws
kind: EC2NodeClass
name: e6data-graviton
# Resource limits
limits:
cpu: 2000
memory: 8000Gi
# Disruption settings
disruption:
consolidationPolicy: WhenEmptyOrUnderutilized
consolidateAfter: 5m
budgets:
- nodes: "10%"
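Because this NodePool taints its nodes with e6data-workspace-name=prod:NoSchedule, only pods that tolerate the taint will be scheduled there. A minimal pod-spec fragment, assuming your e6data workspace pods let you set tolerations and a node selector (exact field placement depends on your workspace configuration):
# Fragment of a pod template targeting the tainted Graviton nodes
tolerations:
  - key: e6data-workspace-name
    operator: Equal
    value: "prod"
    effect: NoSchedule
nodeSelector:
  e6data.io/node-type: compute
  kubernetes.io/arch: arm64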
3.3 EC2NodeClass for Graviton¶
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
name: e6data-graviton
spec:
amiSelectorTerms:
- alias: al2023@latest
role: KarpenterNodeRole-YOUR_CLUSTER
subnetSelectorTerms:
- tags:
karpenter.sh/discovery: YOUR_CLUSTER
securityGroupSelectorTerms:
- tags:
karpenter.sh/discovery: YOUR_CLUSTER
# Block devices
blockDeviceMappings:
- deviceName: /dev/xvda
ebs:
volumeSize: 100Gi
volumeType: gp3
iops: 3000
throughput: 125
deleteOnTermination: true
encrypted: true
# Instance store for NVMe cache (optional, for instances with local storage)
instanceStorePolicy: RAID0
# Metadata options
metadataOptions:
httpEndpoint: enabled
httpProtocolIPv6: disabled
httpPutResponseHopLimit: 2
httpTokens: required # IMDSv2
tags:
Environment: production
Team: data-platform
ManagedBy: karpenter
3.4 NodePool for AMD64 (If ARM64 Not Suitable)¶
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
name: e6data-compute-amd64
spec:
template:
spec:
requirements:
- key: kubernetes.io/arch
operator: In
values: ["amd64"]
- key: kubernetes.io/os
operator: In
values: ["linux"]
- key: karpenter.sh/capacity-type
operator: In
values: ["spot", "on-demand"]
- key: karpenter.k8s.aws/instance-family
operator: In
values:
- r7i # Intel memory-optimized
- r6i # Intel memory-optimized
- r5 # Intel memory-optimized (older)
- key: karpenter.k8s.aws/instance-size
operator: In
values:
- 4xlarge
- 8xlarge
- 12xlarge
- 16xlarge
taints:
- key: e6data-workspace-name
value: "prod"
effect: NoSchedule
nodeClassRef:
group: karpenter.k8s.aws
kind: EC2NodeClass
name: e6data-amd64
limits:
cpu: 1000
memory: 4000Gi
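After picking a NodePool/EC2NodeClass pair, apply the manifests and check that Karpenter accepts them (file names are placeholders for wherever you saved the YAML above):
kubectl apply -f e6data-ec2nodeclass.yaml
kubectl apply -f e6data-nodepool.yaml
# Both resources should report Ready once accepted
kubectl get ec2nodeclasses,nodepools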
4. Verification¶
4.1 Test Pod Identity¶
# Create test pod
# kubectl 1.24+ removed the --serviceaccount flag; set the ServiceAccount via --overrides
kubectl run test-pod-identity --rm -it --restart=Never \
  --namespace=workspace-prod \
  --overrides='{"spec": {"serviceAccountName": "analytics-prod"}}' \
  --image=amazon/aws-cli \
  -- sts get-caller-identity
# Expected: Shows assumed role ARN
4.2 Test S3 Access¶
kubectl run test-s3 --rm -it --restart=Never \
  --namespace=workspace-prod \
  --overrides='{"spec": {"serviceAccountName": "analytics-prod"}}' \
  --image=amazon/aws-cli \
  -- s3 ls s3://YOUR-DATA-BUCKET/ --max-items 5
4.3 Test Glue Access¶
kubectl run test-glue --rm -it --restart=Never \
  --namespace=workspace-prod \
  --overrides='{"spec": {"serviceAccountName": "analytics-prod"}}' \
  --image=amazon/aws-cli \
  -- glue get-databases
4.4 Verify Karpenter¶
# Check Karpenter pods
kubectl get pods -n karpenter
# Check NodePools
kubectl get nodepools
# Check EC2NodeClasses
kubectl get ec2nodeclasses
# Watch provisioned nodes
kubectl get nodes -l karpenter.sh/nodepool=e6data-compute -w
5. Best Practices¶
5.1 Security¶
- Use Pod Identity for most workloads (simpler, more secure)
- Use IRSA only for GreptimeDB
- Least privilege: Only grant required S3 buckets and Glue databases
- IMDSv2: Always use httpTokens: required in EC2NodeClass
- Encryption: Enable EBS encryption
5.2 Cost Optimization¶
- ARM64 (Graviton): 20-40% cheaper than comparable x86 instances
- Spot instances: Use for fault-tolerant workloads
- Instance families: Let Karpenter choose optimal size within family
- Consolidation: Enable WhenEmptyOrUnderutilized for auto-rightsizing
5.3 Performance¶
- Instance size: Use 4xlarge or larger for query executors
- Memory-optimized: Use r-family instances for data workloads
- NVMe: Use instances with local NVMe for caching (r5d, r6gd)
6. S3 Bucket Setup¶
e6data requires an S3 bucket for metadata storage. Follow these security best practices when creating the bucket.
6.1 Create Bucket¶
# Set variables
BUCKET_NAME="e6-workspace-metadata"
REGION="us-east-1"
# Create bucket (us-east-1 doesn't need LocationConstraint)
aws s3api create-bucket \
--bucket ${BUCKET_NAME} \
--region ${REGION}
# For other regions, use:
# aws s3api create-bucket \
# --bucket ${BUCKET_NAME} \
# --region ${REGION} \
# --create-bucket-configuration LocationConstraint=${REGION}
6.2 Enable Security Settings¶
# Block all public access
aws s3api put-public-access-block \
--bucket ${BUCKET_NAME} \
--public-access-block-configuration '{
"BlockPublicAcls": true,
"IgnorePublicAcls": true,
"BlockPublicPolicy": true,
"RestrictPublicBuckets": true
}'
# Enable server-side encryption (AES256)
aws s3api put-bucket-encryption \
--bucket ${BUCKET_NAME} \
--server-side-encryption-configuration '{
"Rules": [{
"ApplyServerSideEncryptionByDefault": {
"SSEAlgorithm": "AES256"
}
}]
}'
6.3 Add Bucket Policy¶
Deny any requests that don't use HTTPS:
aws s3api put-bucket-policy \
--bucket ${BUCKET_NAME} \
--policy '{
"Version": "2012-10-17",
"Statement": [{
"Sid": "DenyInsecureTransport",
"Effect": "Deny",
"Principal": "*",
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::'${BUCKET_NAME}'",
"arn:aws:s3:::'${BUCKET_NAME}'/*"
],
"Condition": {
"Bool": {
"aws:SecureTransport": "false"
}
}
}]
}'
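You can read the settings back to confirm they were applied:
aws s3api get-public-access-block --bucket ${BUCKET_NAME}
aws s3api get-bucket-encryption --bucket ${BUCKET_NAME}
aws s3api get-bucket-policy --bucket ${BUCKET_NAME}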
6.4 Optional: VPC Endpoint Restriction¶
For enhanced security, restrict bucket access to your VPC endpoint. Note that put-bucket-policy replaces the entire bucket policy, so include the DenyInsecureTransport statement from 6.3 in the same document if you applied it:
aws s3api put-bucket-policy \
--bucket ${BUCKET_NAME} \
--policy '{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowAccessFromVPCEOnly",
"Effect": "Allow",
"Principal": "*",
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::'${BUCKET_NAME}'",
"arn:aws:s3:::'${BUCKET_NAME}'/*"
],
"Condition": {
"StringEquals": {
"aws:SourceVpce": "YOUR_VPC_ENDPOINT_ID"
}
}
},
{
"Sid": "DenyOutsideVPCE",
"Effect": "Deny",
"Principal": "*",
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::'${BUCKET_NAME}'",
"arn:aws:s3:::'${BUCKET_NAME}'/*"
],
"Condition": {
"StringNotEquals": {
"aws:SourceVpce": "YOUR_VPC_ENDPOINT_ID"
}
}
}
]
}'
7. EC2NodeClass with NVMe Instance Store¶
For instances with NVMe instance store (like c7gd, r7gd, i8g), configure the userData script to automatically set up the local storage.
7.1 NVMe RAID0 Configuration¶
This userData script automatically detects NVMe instance store drives and configures them as RAID0 for optimal performance:
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
name: e6data-nvme
spec:
amiSelectorTerms:
- alias: al2023@latest
role: KarpenterNodeRole-YOUR_CLUSTER
subnetSelectorTerms:
- tags:
karpenter.sh/discovery: YOUR_CLUSTER
securityGroupSelectorTerms:
- tags:
aws:eks:cluster-name: YOUR_CLUSTER
blockDeviceMappings:
- deviceName: /dev/xvda
ebs:
volumeSize: 100Gi
volumeType: gp3
kubelet:
maxPods: 18
metadataOptions:
httpEndpoint: enabled
httpProtocolIPv6: disabled
httpPutResponseHopLimit: 1
httpTokens: required
userData: |
mount_location="/app/tmp"
mkdir -p $mount_location
yum install nvme-cli -y
# Check if NVMe instance store drives are present
if nvme list | grep -q "Amazon EC2 NVMe Instance Storage"; then
nvme_drives=$(nvme list | grep "Amazon EC2 NVMe Instance Storage" | cut -d " " -f 1 || true)
readarray -t nvme_drives <<< "$nvme_drives"
num_drives=${#nvme_drives[@]}
if [ $num_drives -gt 1 ]; then
# Multiple NVMe drives - create RAID0 array for maximum performance
yum install mdadm -y
mdadm --create /dev/md0 --level=0 --name=md0 --raid-devices=$num_drives "${nvme_drives[@]}"
mkfs.ext4 /dev/md0
mount /dev/md0 $mount_location
mdadm --detail --scan >> /etc/mdadm.conf
echo /dev/md0 $mount_location ext4 defaults,noatime 0 2 >> /etc/fstab
else
# Single NVMe drive - format and mount directly
for disk in "${nvme_drives[@]}"; do
mkfs.ext4 -F $disk
mount $disk $mount_location
echo $disk $mount_location ext4 defaults,noatime 0 2 >> /etc/fstab
done
fi
else
echo "No NVMe drives detected. Skipping NVMe configuration."
fi
chmod 777 $mount_location
tags:
Environment: production
ManagedBy: karpenter
7.2 Recommended Instance Families with NVMe¶
| Instance Family | Architecture | NVMe Storage | Use Case |
|---|---|---|---|
| c7gd | ARM64 | Yes | Compute-intensive workloads |
| r7gd | ARM64 | Yes | Memory-intensive workloads |
| i8g | ARM64 | Yes | Storage-intensive workloads |
| c6gd | ARM64 | Yes | General compute |
| r6gd | ARM64 | Yes | General memory |
| m7gd | ARM64 | Yes | Balanced workloads |
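If you want to check which sizes in a family carry instance store in your region, the EC2 API can list them; a sketch using a standard describe-instance-types call (the r7gd filter is just an example):
aws ec2 describe-instance-types \
  --filters "Name=instance-storage-supported,Values=true" \
  --query "InstanceTypes[?contains(InstanceType, 'r7gd')].[InstanceType, InstanceStorageInfo.TotalSizeInGB]" \
  --output table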
7.3 NodePool for NVMe Instances¶
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
name: e6data-nvme-pool
spec:
template:
spec:
requirements:
- key: kubernetes.io/arch
operator: In
values: ["arm64"]
- key: karpenter.k8s.aws/instance-family
operator: In
values:
- c7gd
- r7gd
- i8g
- m7gd
- key: karpenter.k8s.aws/instance-size
operator: NotIn
values:
- metal
- key: karpenter.sh/capacity-type
operator: In
values:
- spot
- on-demand
taints:
- key: workspace-name
value: "YOUR_WORKSPACE"
effect: NoSchedule
nodeClassRef:
group: karpenter.k8s.aws
kind: EC2NodeClass
name: e6data-nvme
limits:
cpu: 5000
disruption:
consolidationPolicy: WhenEmpty
consolidateAfter: 30s
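To roll this out, apply both manifests and, once a node from the pool is Ready, verify the instance-store mount created by the userData script. This is a sketch: NODE_NAME is whatever node Karpenter provisions, and kubectl debug exposes the host filesystem under /host.
kubectl apply -f e6data-nvme-nodeclass.yaml
kubectl apply -f e6data-nvme-nodepool.yaml
kubectl get nodes -l karpenter.sh/nodepool=e6data-nvme-pool
# Check the NVMe mount at /app/tmp on the host
kubectl debug node/NODE_NAME -it --image=busybox -- df -h /host/app/tmp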