# AWS Complete Onboarding Guide
This guide walks through deploying the e6data operator and a workspace on AWS EKS. Follow the steps in order for a complete production deployment.
## Prerequisites
Before starting, ensure you have:
| Requirement | Description |
|---|---|
| EKS Cluster | Kubernetes 1.24+ with OIDC provider enabled |
| Karpenter | v0.32+ installed and configured |
| EKS Pod Identity Agent | Addon installed on the cluster |
| AWS CLI | Configured with appropriate permissions |
| kubectl | Connected to your EKS cluster |
| Helm | v3.8+ installed |
## Step 1: Create Operator NodePool and EC2NodeClass
Create dedicated nodes for the e6data operator with taints to isolate operator workloads.
# operator-nodepool.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: e6operator
spec:
  disruption:
    budgets:
      - nodes: 10%
    consolidateAfter: 30s
    consolidationPolicy: WhenEmptyOrUnderutilized
  limits:
    cpu: "100"
    memory: 100Gi
  template:
    metadata:
      labels:
        app: e6operator
    spec:
      expireAfter: 720h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: e6operator
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - spot
            - on-demand
        - key: kubernetes.io/arch
          operator: In
          values:
            - arm64
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - t4g.medium
            - t4g.large
            - t4g.xlarge
        - key: topology.kubernetes.io/zone
          operator: In
          values:
            - <YOUR_ZONE_A>
            - <YOUR_ZONE_B>
            - <YOUR_ZONE_C>
      taints:
        - effect: NoSchedule
          key: workload
          value: e6operator
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: e6operator
spec:
  amiSelectorTerms:
    - alias: al2023@latest
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 50Gi
        volumeType: gp3
  detailedMonitoring: false
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 1
    httpTokens: required
  role: <YOUR_KARPENTER_NODE_ROLE>
  securityGroupSelectorTerms:
    - tags:
        aws:eks:cluster-name: <YOUR_CLUSTER_NAME>
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: <YOUR_CLUSTER_NAME>
  tags:
    ManagedBy: karpenter
    Name: e6operator
    app: e6data
Apply the configuration:
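# Replace the <...> placeholders (zones, node role, cluster name) before applying
kubectl apply -f operator-nodepool.yaml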
## Step 2: Install Cert-Manager with Tolerations
Install cert-manager configured to run on the operator's tainted nodes.
Create a values file:
# cert-manager-values.yaml
tolerations:
  - key: workload
    operator: Equal
    value: e6operator
    effect: NoSchedule
nodeSelector:
  app: e6operator
webhook:
  tolerations:
    - key: workload
      operator: Equal
      value: e6operator
      effect: NoSchedule
  nodeSelector:
    app: e6operator
cainjector:
  tolerations:
    - key: workload
      operator: Equal
      value: e6operator
      effect: NoSchedule
  nodeSelector:
    app: e6operator
startupapicheck:
  tolerations:
    - key: workload
      operator: Equal
      value: e6operator
      effect: NoSchedule
  nodeSelector:
    app: e6operator
Install cert-manager:
helm install cert-manager oci://quay.io/jetstack/charts/cert-manager \
--namespace cert-manager \
--create-namespace \
--set crds.enabled=true \
--set prometheus.enabled=false \
--set webhook.timeoutSeconds=4 \
-f cert-manager-values.yaml
Verify installation:
kubectl wait --for=condition=Available --timeout=120s -n cert-manager \
deployment/cert-manager deployment/cert-manager-webhook deployment/cert-manager-cainjector
## Step 3: Create Image Pull Secret
Create the image pull secret for accessing e6data container images.
# Create operator namespace
kubectl create namespace e6operator
# Create secret
kubectl create secret docker-registry gcr-key \
--namespace e6operator \
--docker-server=us-docker.pkg.dev \
--docker-username=_json_key \
--docker-password="$(cat /path/to/service-account.json)" \
--docker-email=your-email@example.com
## Step 4: Install CRDs
Install the e6data Custom Resource Definitions:
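The CRD manifests ship alongside the operator chart used in Step 5; the `crds/` path below follows the standard Helm chart layout and may differ in your release bundle:
kubectl apply -f ./e6-operator/helm/e6-operator/crds/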
Verify CRDs are installed:
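kubectl get crds | grep e6data.io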
Expected output:
authgateways.e6data.io
catalogrefreshschedules.e6data.io
catalogrefreshes.e6data.io
e6catalogs.e6data.io
e6consoles.e6data.io
governances.e6data.io
metadataservices.e6data.io
monitoringservices.e6data.io
namespaceconfigs.e6data.io
pools.e6data.io
queryservices.e6data.io
trafficinfras.e6data.io
## Step 5: Install Operator
Create the operator Helm values:
# operator-values.yaml
replicaCount: 2
image:
  repository: us-docker.pkg.dev/e6data-analytics/e6data/e6-operator
  pullPolicy: IfNotPresent
imagePullSecrets:
  - name: gcr-key
serviceMonitor:
  enabled: false
tolerations:
  - key: workload
    operator: Equal
    value: e6operator
    effect: NoSchedule
nodeSelector:
  app: e6operator
karpenter:
  enabled: true
Install the operator:
helm install e6-operator ./e6-operator/helm/e6-operator \
--namespace e6operator \
-f operator-values.yaml
Verify the operator is running:
kubectl get pods -n e6operator
kubectl logs -n e6operator -l app.kubernetes.io/name=e6-operator --tail=50
## Step 6: Create Workspace Namespace and RBAC
Create a workspace namespace and configure RBAC for engine and monitoring service accounts.
# workspace-rbac.yaml
---
# Create namespace
apiVersion: v1
kind: Namespace
metadata:
  name: <WORKSPACE_NAME>
---
# Engine ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: <WORKSPACE_NAME>-engine
  namespace: <WORKSPACE_NAME>
---
# Engine Role (Namespace-scoped)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: <WORKSPACE_NAME>-engine-role
  namespace: <WORKSPACE_NAME>
rules:
  # Pod status (read-only)
  - apiGroups: [""]
    resources: ["pods", "pods/status", "endpoints"]
    verbs: ["get", "list", "watch"]
  # Events (for observability)
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["get", "list", "watch", "create", "patch"]
  # Service discovery
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["get", "list", "watch"]
  # Deployment status (read-only)
  - apiGroups: ["apps"]
    resources: ["deployments", "deployments/status", "replicasets", "replicasets/status"]
    verbs: ["get", "list", "watch"]
  # Governance policies
  - apiGroups: ["e6data.io"]
    resources: ["governances"]
    verbs: ["get", "list", "watch"]
---
# Engine RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: <WORKSPACE_NAME>-engine-role-binding
  namespace: <WORKSPACE_NAME>
subjects:
  - kind: ServiceAccount
    name: <WORKSPACE_NAME>-engine
    namespace: <WORKSPACE_NAME>
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: <WORKSPACE_NAME>-engine-role
---
# Monitoring ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: <WORKSPACE_NAME>-monitoring
  namespace: <WORKSPACE_NAME>
---
# Monitoring ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: <WORKSPACE_NAME>-monitoring-role
rules:
  # Pod/service discovery and metrics
  - apiGroups: [""]
    resources: ["nodes", "nodes/proxy", "services", "endpoints", "pods", "events"]
    verbs: ["get", "list", "watch"]
  # Deployment tracking
  - apiGroups: ["apps"]
    resources: ["replicasets"]
    verbs: ["get", "list", "watch"]
  # Prometheus metrics scraping
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
# Monitoring ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: <WORKSPACE_NAME>-monitoring-role-binding
subjects:
  - kind: ServiceAccount
    name: <WORKSPACE_NAME>-monitoring
    namespace: <WORKSPACE_NAME>
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: <WORKSPACE_NAME>-monitoring-role
Apply the RBAC configuration:
# Replace <WORKSPACE_NAME> with your workspace name (e.g., workspace1)
sed 's/<WORKSPACE_NAME>/workspace1/g' workspace-rbac.yaml | kubectl apply -f -
Create the image pull secret in the workspace namespace:
kubectl create secret docker-registry gcr-key \
--namespace <WORKSPACE_NAME> \
--docker-server=us-docker.pkg.dev \
--docker-username=_json_key \
--docker-password="$(cat /path/to/service-account.json)" \
--docker-email=your-email@example.com
## Step 7: Create S3 Bucket for Metadata
Create an S3 bucket for workspace metadata storage with security best practices.
# Set variables
BUCKET_NAME="e6-<WORKSPACE_NAME>-metadata"
REGION="us-east-1"
# Create bucket (for regions other than us-east-1, also pass
# --create-bucket-configuration LocationConstraint=${REGION})
aws s3api create-bucket \
--bucket ${BUCKET_NAME} \
--region ${REGION}
# Block public access
aws s3api put-public-access-block \
--bucket ${BUCKET_NAME} \
--public-access-block-configuration '{
"BlockPublicAcls": true,
"IgnorePublicAcls": true,
"BlockPublicPolicy": true,
"RestrictPublicBuckets": true
}'
# Enable server-side encryption
aws s3api put-bucket-encryption \
--bucket ${BUCKET_NAME} \
--server-side-encryption-configuration '{
"Rules": [{
"ApplyServerSideEncryptionByDefault": {
"SSEAlgorithm": "AES256"
}
}]
}'
# Add bucket policy to deny insecure transport
aws s3api put-bucket-policy \
--bucket ${BUCKET_NAME} \
--policy '{
"Version": "2012-10-17",
"Statement": [{
"Sid": "DenyInsecureTransport",
"Effect": "Deny",
"Principal": "*",
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::'${BUCKET_NAME}'",
"arn:aws:s3:::'${BUCKET_NAME}'/*"
],
"Condition": {
"Bool": {
"aws:SecureTransport": "false"
}
}
}]
}'
## Step 8: Create IAM Roles and Pod Identity Associations
### 8.1 Create Trust Policy
Create trust-policy.json:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "pods.eks.amazonaws.com"
      },
      "Action": ["sts:AssumeRole", "sts:TagSession"]
    }
  ]
}
### 8.2 Create S3 Access Policy
Create s3-access-policy.json:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadWriteMetadataBucket",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<METADATA_BUCKET>",
        "arn:aws:s3:::<METADATA_BUCKET>/*"
      ]
    },
    {
      "Sid": "ReadOnlyDataBuckets",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::*"]
    }
  ]
}
### 8.3 Create Glue Read Policy
Create glue-read-policy.json:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GlueCatalogReadOnly",
      "Effect": "Allow",
      "Action": [
        "glue:GetCatalogImportStatus",
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetTableVersion",
        "glue:GetTableVersions",
        "glue:GetPartition",
        "glue:GetPartitions",
        "glue:GetUserDefinedFunction",
        "glue:GetUserDefinedFunctions",
        "glue:SearchTables",
        "glue:GetDataCatalogEncryptionSettings"
      ],
      "Resource": "*"
    }
  ]
}
### 8.4 Create S3 Monitoring Policy
Create s3-monitoring-policy.json:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadWriteMetadataBucket",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<METADATA_BUCKET>",
        "arn:aws:s3:::<METADATA_BUCKET>/*"
      ]
    }
  ]
}
### 8.5 Create Engine IAM Role and Pod Identity
# Set variables
WORKSPACE_NAME="<WORKSPACE_NAME>"
CLUSTER_NAME="<YOUR_CLUSTER_NAME>"
ACCOUNT_ID="<YOUR_AWS_ACCOUNT_ID>"
# Create engine role
aws iam create-role \
--role-name ${WORKSPACE_NAME}-engine-access-role \
--assume-role-policy-document file://trust-policy.json
# Create and attach S3 policy
aws iam create-policy \
--policy-name ${WORKSPACE_NAME}-engine-s3-policy \
--policy-document file://s3-access-policy.json
aws iam attach-role-policy \
--role-name ${WORKSPACE_NAME}-engine-access-role \
--policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/${WORKSPACE_NAME}-engine-s3-policy
# Create and attach Glue policy
aws iam create-policy \
--policy-name ${WORKSPACE_NAME}-engine-glue-policy \
--policy-document file://glue-read-policy.json
aws iam attach-role-policy \
--role-name ${WORKSPACE_NAME}-engine-access-role \
--policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/${WORKSPACE_NAME}-engine-glue-policy
# Create Pod Identity Association for engine
aws eks create-pod-identity-association \
--cluster-name ${CLUSTER_NAME} \
--namespace ${WORKSPACE_NAME} \
--service-account ${WORKSPACE_NAME}-engine \
--role-arn arn:aws:iam::${ACCOUNT_ID}:role/${WORKSPACE_NAME}-engine-access-role
### 8.6 Create Monitoring IAM Role and Pod Identity
# Create monitoring role
aws iam create-role \
--role-name ${WORKSPACE_NAME}-monitoring-access-role \
--assume-role-policy-document file://trust-policy.json
# Create and attach monitoring S3 policy
aws iam create-policy \
--policy-name ${WORKSPACE_NAME}-monitoring-s3-policy \
--policy-document file://s3-monitoring-policy.json
aws iam attach-role-policy \
--role-name ${WORKSPACE_NAME}-monitoring-access-role \
--policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/${WORKSPACE_NAME}-monitoring-s3-policy
# Create Pod Identity Association for monitoring
aws eks create-pod-identity-association \
--cluster-name ${CLUSTER_NAME} \
--namespace ${WORKSPACE_NAME} \
--service-account ${WORKSPACE_NAME}-monitoring \
--role-arn arn:aws:iam::${ACCOUNT_ID}:role/${WORKSPACE_NAME}-monitoring-access-role
Verify Pod Identity Associations:
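aws eks list-pod-identity-associations --cluster-name ${CLUSTER_NAME} --namespace ${WORKSPACE_NAME}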
## Step 9: Create Workspace NodePool and EC2NodeClass
Create compute nodes for the workspace with NVMe instance store support.
# workspace-nodepool.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  labels:
    workspace-name: <WORKSPACE_NAME>
  name: <WORKSPACE_NAME>-nodepool
spec:
  disruption:
    budgets:
      - nodes: 100%
        reasons:
          - Empty
      - nodes: "0"
        reasons:
          - Drifted
    consolidateAfter: 30s
    consolidationPolicy: WhenEmpty
  limits:
    cpu: 10000
  template:
    metadata:
      labels:
        workspace-name: <WORKSPACE_NAME>
    spec:
      expireAfter: 720h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: <WORKSPACE_NAME>-nodeclass
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values:
            - c7g
            - c7gd
            - c8g
            - r7g
            - r7gd
            - r8g
            - m7g
            - m7gd
            - i8g
        - key: topology.kubernetes.io/zone
          operator: In
          values:
            - <YOUR_ZONE_A>
            - <YOUR_ZONE_B>
            - <YOUR_ZONE_C>
        - key: karpenter.k8s.aws/instance-size
          operator: NotIn
          values:
            - metal
      taints:
        - effect: NoSchedule
          key: workspace-name
          value: <WORKSPACE_NAME>
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: <WORKSPACE_NAME>-nodeclass
spec:
  amiSelectorTerms:
    - alias: al2023@latest
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
  kubelet:
    maxPods: 18
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 1
    httpTokens: required
  role: <YOUR_KARPENTER_NODE_ROLE>
  securityGroupSelectorTerms:
    - tags:
        aws:eks:cluster-name: <YOUR_CLUSTER_NAME>
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: <YOUR_CLUSTER_NAME>
  tags:
    Name: <WORKSPACE_NAME>
    app: e6data
    namespace: <WORKSPACE_NAME>
  userData: |
    #!/bin/bash
    mount_location="/app/tmp"
    mkdir -p $mount_location
    yum install nvme-cli -y
    # Check if NVMe instance store drives are present
    if nvme list | grep -q "Amazon EC2 NVMe Instance Storage"; then
      nvme_drives=$(nvme list | grep "Amazon EC2 NVMe Instance Storage" | cut -d " " -f 1 || true)
      readarray -t nvme_drives <<< "$nvme_drives"
      num_drives=${#nvme_drives[@]}
      if [ $num_drives -gt 1 ]; then
        # Multiple NVMe drives - create RAID0 array
        yum install mdadm -y
        mdadm --create /dev/md0 --level=0 --name=md0 --raid-devices=$num_drives "${nvme_drives[@]}"
        mkfs.ext4 /dev/md0
        mount /dev/md0 $mount_location
        mdadm --detail --scan >> /etc/mdadm.conf
        echo /dev/md0 $mount_location ext4 defaults,noatime 0 2 >> /etc/fstab
      else
        # Single NVMe drive - format and mount directly
        for disk in "${nvme_drives[@]}"; do
          mkfs.ext4 -F $disk
          mount $disk $mount_location
          echo $disk $mount_location ext4 defaults,noatime 0 2 >> /etc/fstab
        done
      fi
    else
      echo "No NVMe drives detected. Skipping NVMe configuration."
    fi
    chmod 777 $mount_location
Apply the configuration:
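# Substitute <WORKSPACE_NAME> as in Step 6 (fill the zone/role/cluster placeholders first)
sed 's/<WORKSPACE_NAME>/workspace1/g' workspace-nodepool.yaml | kubectl apply -f -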
## Step 10: Deploy NamespaceConfig
Configure shared settings for the workspace:
# namespaceconfig.yaml
apiVersion: e6data.io/v1alpha1
kind: NamespaceConfig
metadata:
  name: <WORKSPACE_NAME>-config
  namespace: <WORKSPACE_NAME>
spec:
  cloud: AWS
  imagePullSecrets:
    - gcr-key
  karpenterNodePool: <WORKSPACE_NAME>-nodepool
  serviceAccounts:
    data: <WORKSPACE_NAME>-engine
    monitoring: <WORKSPACE_NAME>-monitoring
  storageBackend: s3a://<METADATA_BUCKET>
Apply:
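# Substitute <WORKSPACE_NAME> and <METADATA_BUCKET> (bucket name from Step 7), then apply
sed -e 's/<WORKSPACE_NAME>/workspace1/g' -e 's/<METADATA_BUCKET>/e6-workspace1-metadata/g' \
namespaceconfig.yaml | kubectl apply -f -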
## Step 11: Deploy MetadataServices
Deploy the metadata storage and schema services:
# metadataservices.yaml
apiVersion: e6data.io/v1alpha1
kind: MetadataServices
metadata:
  name: <WORKSPACE_NAME>-mds
  namespace: <WORKSPACE_NAME>
spec:
  workspace: <WORKSPACE_NAME>
  tenant: <YOUR_TENANT>
  storage:
    replicas: 2
    resources:
      memory: "8Gi"
      cpu: "4"
  schema:
    replicas: 2
    resources:
      memory: "8Gi"
      cpu: "4"
Apply and watch status:
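# Fill in the <...> placeholders first
kubectl apply -f metadataservices.yaml
# Watch rollout ("mds" is the short name used in the Verification section)
kubectl get mds -n <WORKSPACE_NAME> -w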
## Step 12: Register E6Catalog
Register your data catalog (AWS Glue example):
# e6catalog.yaml
apiVersion: e6data.io/v1alpha1
kind: E6Catalog
metadata:
  name: glue-catalog
  namespace: <WORKSPACE_NAME>
spec:
  catalogType: GLUE
  metadataServicesRef: <WORKSPACE_NAME>-mds
  isDefault: true
  connectionMetadata:
    catalogConnection:
      glueConnection:
        region: <YOUR_AWS_REGION>
Apply:
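# Fill in the <...> placeholders first
kubectl apply -f e6catalog.yaml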
## Step 13: Deploy QueryService
Deploy the query execution cluster:
# queryservice.yaml
apiVersion: e6data.io/v1alpha1
kind: QueryService
metadata:
  name: <WORKSPACE_NAME>-cluster
  namespace: <WORKSPACE_NAME>
spec:
  alias: <WORKSPACE_NAME>
  workspace: <WORKSPACE_NAME>
  planner:
    resources:
      memory: "8Gi"
      cpu: "4"
  queue:
    resources:
      memory: "4Gi"
      cpu: "2"
  executor:
    replicas: 2
    resources:
      memory: "32Gi"
      cpu: "16"
    autoscaling:
      enabled: true
      minReplicas: 1
      maxReplicas: 10
Apply and watch status:
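kubectl apply -f queryservice.yaml
# Watch rollout ("qs" is the short name used in the Verification section)
kubectl get qs -n <WORKSPACE_NAME> -w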
## Step 14: Deploy TrafficInfra
Deploy the Envoy-based traffic infrastructure:
# trafficinfra.yaml
apiVersion: e6data.io/v1alpha2
kind: TrafficInfra
metadata:
  name: <WORKSPACE_NAME>-traffic
  namespace: <WORKSPACE_NAME>
spec:
  envoy:
    replicas: 2
    resources:
      cpu: "500m"
      memory: "512Mi"
  xds:
    resources:
      cpu: "100m"
      memory: "128Mi"
Apply:
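kubectl apply -f trafficinfra.yaml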
## Step 15: Deploy AuthGateway (Optional)
Deploy authentication gateway with AWS NLB:
# authgateway.yaml
apiVersion: e6data.io/v1alpha1
kind: AuthGateway
metadata:
name: <WORKSPACE_NAME>-auth
namespace: <WORKSPACE_NAME>
spec:
domain: <YOUR_DOMAIN>
replicas: 2
resources:
cpu: 200m
memory: 256Mi
service:
type: LoadBalancer
loadBalancerClass: service.k8s.aws/nlb
annotations:
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-alpn-policy: HTTP2Only
service.beta.kubernetes.io/aws-load-balancer-ssl-cert: <YOUR_ACM_CERTIFICATE_ARN>
service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
services:
- name: query
enabled: true
isGRPC: true
subdomain: query
timeout: 30s
backend:
serviceName: <WORKSPACE_NAME>-traffic-envoy
servicePort: 8080
Apply:
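# Fill in the domain and ACM certificate placeholders first
kubectl apply -f authgateway.yaml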
## Step 16: Configure Governance (Optional)
Set up data access control policies:
# governance.yaml
apiVersion: e6data.io/v1alpha1
kind: Governance
metadata:
  name: <WORKSPACE_NAME>-governance
  namespace: <WORKSPACE_NAME>
spec:
  policies:
    - name: allow-analysts-read
      policyType: GRANT_ACCESS
      effect: ALLOW
      principals:
        users:
          - analyst@company.com
        groups:
          - data-analysts
      resources:
        - catalog: glue-catalog
          database: analytics_db
          table: "*"
      actions:
        - SELECT
    - name: mask-pii-columns
      policyType: COLUMN_MASKING
      maskType: MASK_HASH
      principals:
        groups:
          - data-analysts
      resources:
        - catalog: glue-catalog
          database: customers_db
          table: users
          columns:
            - email
            - phone
Apply:
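kubectl apply -f governance.yaml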
## Verification
### Check All Components
# Operator
kubectl get pods -n e6operator
# Workspace components
kubectl get all -n <WORKSPACE_NAME>
# CRD statuses
kubectl get mds,qs,e6cat,trafficinfra,authgateway -n <WORKSPACE_NAME>
### Test Connectivity
# Get the LoadBalancer endpoint
kubectl get svc -n <WORKSPACE_NAME> -l app.kubernetes.io/component=authgateway
# Test query endpoint (if AuthGateway deployed)
# Connect via JDBC/ODBC to the LoadBalancer endpoint
## Next Steps
- Configure TLS for AuthGateway
- Set up Autoscaling
- Enable Query History with GreptimeDB
- Configure Monitoring
- Troubleshooting Guide