# AWS Complete Onboarding Guide
This guide walks through deploying the e6data operator and a workspace on AWS EKS. Follow the steps in order for a complete production deployment.
## Prerequisites
Before starting, ensure you have:
| Requirement | Description |
|---|---|
| EKS Cluster | Kubernetes 1.24+ with OIDC provider enabled |
| Karpenter | v0.32+ installed and configured |
| EKS Pod Identity Agent | Addon installed on the cluster |
| AWS CLI | Configured with appropriate permissions |
| kubectl | Connected to your EKS cluster |
| Helm | v3.8+ installed |
## Step 1: Create Operator NodePool and EC2NodeClass
Create dedicated nodes for the e6data operator with taints to isolate operator workloads.
# operator-nodepool.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: e6operator
spec:
  disruption:
    budgets:
      - nodes: 10%
    consolidateAfter: 30s
    consolidationPolicy: WhenEmptyOrUnderutilized
  limits:
    cpu: "100"
    memory: 100Gi
  template:
    metadata:
      labels:
        app: e6operator
    spec:
      expireAfter: 720h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: e6operator
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - spot
            - on-demand
        - key: kubernetes.io/arch
          operator: In
          values:
            - arm64
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - t4g.medium
            - t4g.large
            - t4g.xlarge
        - key: topology.kubernetes.io/zone
          operator: In
          values:
            - <YOUR_ZONE_A>
            - <YOUR_ZONE_B>
            - <YOUR_ZONE_C>
      taints:
        - effect: NoSchedule
          key: workload
          value: e6operator
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: e6operator
spec:
  amiSelectorTerms:
    - alias: al2023@latest
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 50Gi
        volumeType: gp3
  detailedMonitoring: false
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 1
    httpTokens: required
  role: <YOUR_KARPENTER_NODE_ROLE>
  securityGroupSelectorTerms:
    - tags:
        aws:eks:cluster-name: <YOUR_CLUSTER_NAME>
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: <YOUR_CLUSTER_NAME>
  tags:
    ManagedBy: karpenter
    Name: e6operator
    app: e6data
Apply the configuration:
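# Replace the <...> placeholders (zones, node role, cluster name) before applying
kubectl apply -f operator-nodepool.yaml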
## Step 2: Install Cert-Manager with Tolerations
Install cert-manager configured to run on the operator's tainted nodes.
Create a values file:
# cert-manager-values.yaml
tolerations:
  - key: workload
    operator: Equal
    value: e6operator
    effect: NoSchedule
nodeSelector:
  app: e6operator
webhook:
  tolerations:
    - key: workload
      operator: Equal
      value: e6operator
      effect: NoSchedule
  nodeSelector:
    app: e6operator
cainjector:
  tolerations:
    - key: workload
      operator: Equal
      value: e6operator
      effect: NoSchedule
  nodeSelector:
    app: e6operator
startupapicheck:
  tolerations:
    - key: workload
      operator: Equal
      value: e6operator
      effect: NoSchedule
  nodeSelector:
    app: e6operator
Install cert-manager:
helm install cert-manager oci://quay.io/jetstack/charts/cert-manager \
--namespace cert-manager \
--create-namespace \
--set crds.enabled=true \
--set prometheus.enabled=false \
--set webhook.timeoutSeconds=4 \
-f cert-manager-values.yaml
Verify installation:
kubectl wait --for=condition=Available --timeout=120s -n cert-manager \
deployment/cert-manager deployment/cert-manager-webhook deployment/cert-manager-cainjector
## Step 3: Create Image Pull Secret
Create the image pull secret for accessing e6data container images.
# Create operator namespace
kubectl create namespace e6operator
# Create secret
kubectl create secret docker-registry gcr-key \
--namespace e6operator \
--docker-server=us-docker.pkg.dev \
--docker-username=_json_key \
--docker-password="$(cat /path/to/service-account.json)" \
--docker-email=your-email@example.com
## Step 4: Install CRDs
Install the e6data Custom Resource Definitions:
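The CRD manifests ship alongside the operator chart used in Step 5; the `crds/` path below follows the standard Helm chart layout and may differ in your release bundle:
kubectl apply -f ./e6-operator/helm/e6-operator/crds/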
Verify CRDs are installed:
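kubectl get crds | grep e6data.io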
Expected output:
authgateways.e6data.io
catalogrefreshschedules.e6data.io
catalogrefreshes.e6data.io
e6catalogs.e6data.io
e6consoles.e6data.io
governances.e6data.io
metadataservices.e6data.io
monitoringservices.e6data.io
namespaceconfigs.e6data.io
pools.e6data.io
queryservices.e6data.io
trafficinfras.e6data.io
## Step 5: Install Operator
Create the operator Helm values:
# operator-values.yaml
replicaCount: 2
image:
  repository: us-docker.pkg.dev/e6data-analytics/e6data/e6-operator
  pullPolicy: IfNotPresent
imagePullSecrets:
  - name: gcr-key
serviceMonitor:
  enabled: false
tolerations:
  - key: workload
    operator: Equal
    value: e6operator
    effect: NoSchedule
nodeSelector:
  app: e6operator
karpenter:
  enabled: true
Install the operator:
helm install e6-operator ./e6-operator/helm/e6-operator \
--namespace e6operator \
-f operator-values.yaml
Verify the operator is running:
kubectl get pods -n e6operator
kubectl logs -n e6operator -l app.kubernetes.io/name=e6-operator --tail=50
## Step 6: Create Workspace Namespace and RBAC
Create a workspace namespace and configure RBAC for engine and monitoring service accounts.
# workspace-rbac.yaml
---
# Create namespace
apiVersion: v1
kind: Namespace
metadata:
  name: <WORKSPACE_NAME>
---
# Engine ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: <WORKSPACE_NAME>-engine
  namespace: <WORKSPACE_NAME>
---
# Engine Role (Namespace-scoped)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: <WORKSPACE_NAME>-engine-role
  namespace: <WORKSPACE_NAME>
rules:
  # Pod status (read-only)
  - apiGroups: [""]
    resources: ["pods", "pods/status", "endpoints"]
    verbs: ["get", "list", "watch"]
  # Events (for observability)
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["get", "list", "watch", "create", "patch"]
  # Service discovery
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["get", "list", "watch"]
  # Deployment status (read-only)
  - apiGroups: ["apps"]
    resources: ["deployments", "deployments/status", "replicasets", "replicasets/status"]
    verbs: ["get", "list", "watch"]
  # Governance policies
  - apiGroups: ["e6data.io"]
    resources: ["governances"]
    verbs: ["get", "list", "watch"]
---
# Engine RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: <WORKSPACE_NAME>-engine-role-binding
  namespace: <WORKSPACE_NAME>
subjects:
  - kind: ServiceAccount
    name: <WORKSPACE_NAME>-engine
    namespace: <WORKSPACE_NAME>
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: <WORKSPACE_NAME>-engine-role
---
# Monitoring ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: <WORKSPACE_NAME>-monitoring
  namespace: <WORKSPACE_NAME>
---
# Monitoring ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: <WORKSPACE_NAME>-monitoring-role
rules:
  # Pod/service discovery and metrics
  - apiGroups: [""]
    resources: ["nodes", "nodes/proxy", "services", "endpoints", "pods", "events"]
    verbs: ["get", "list", "watch"]
  # Deployment tracking
  - apiGroups: ["apps"]
    resources: ["replicasets"]
    verbs: ["get", "list", "watch"]
  # Prometheus metrics scraping
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
# Monitoring ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: <WORKSPACE_NAME>-monitoring-role-binding
subjects:
  - kind: ServiceAccount
    name: <WORKSPACE_NAME>-monitoring
    namespace: <WORKSPACE_NAME>
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: <WORKSPACE_NAME>-monitoring-role
Apply the RBAC configuration:
# Replace <WORKSPACE_NAME> with your workspace name (e.g., workspace1)
sed 's/<WORKSPACE_NAME>/workspace1/g' workspace-rbac.yaml | kubectl apply -f -
Create the image pull secret in the workspace namespace:
kubectl create secret docker-registry gcr-key \
--namespace <WORKSPACE_NAME> \
--docker-server=us-docker.pkg.dev \
--docker-username=_json_key \
--docker-password="$(cat /path/to/service-account.json)" \
--docker-email=your-email@example.com
## Step 7: Create S3 Bucket for Metadata
Create an S3 bucket for workspace metadata storage with security best practices.
# Set variables
BUCKET_NAME="e6-<WORKSPACE_NAME>-metadata"
REGION="us-east-1"
# Create bucket (for regions other than us-east-1, also pass
# --create-bucket-configuration LocationConstraint=${REGION})
aws s3api create-bucket \
--bucket ${BUCKET_NAME} \
--region ${REGION}
# Block public access
aws s3api put-public-access-block \
--bucket ${BUCKET_NAME} \
--public-access-block-configuration '{
"BlockPublicAcls": true,
"IgnorePublicAcls": true,
"BlockPublicPolicy": true,
"RestrictPublicBuckets": true
}'
# Enable server-side encryption
aws s3api put-bucket-encryption \
--bucket ${BUCKET_NAME} \
--server-side-encryption-configuration '{
"Rules": [{
"ApplyServerSideEncryptionByDefault": {
"SSEAlgorithm": "AES256"
}
}]
}'
# Add bucket policy to deny insecure transport
aws s3api put-bucket-policy \
--bucket ${BUCKET_NAME} \
--policy '{
"Version": "2012-10-17",
"Statement": [{
"Sid": "DenyInsecureTransport",
"Effect": "Deny",
"Principal": "*",
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::'${BUCKET_NAME}'",
"arn:aws:s3:::'${BUCKET_NAME}'/*"
],
"Condition": {
"Bool": {
"aws:SecureTransport": "false"
}
}
}]
}'
## Step 8: Create IAM Roles and Pod Identity Associations
### 8.1 Create Trust Policy
Create trust-policy.json:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "pods.eks.amazonaws.com"
      },
      "Action": ["sts:AssumeRole", "sts:TagSession"]
    }
  ]
}
### 8.2 Create S3 Access Policy
Create s3-access-policy.json:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadWriteMetadataBucket",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<METADATA_BUCKET>",
        "arn:aws:s3:::<METADATA_BUCKET>/*"
      ]
    },
    {
      "Sid": "ReadOnlyDataBuckets",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::*"]
    }
  ]
}
### 8.3 Create Glue Read Policy
Create glue-read-policy.json:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GlueCatalogReadOnly",
      "Effect": "Allow",
      "Action": [
        "glue:GetCatalogImportStatus",
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetTableVersion",
        "glue:GetTableVersions",
        "glue:GetPartition",
        "glue:GetPartitions",
        "glue:GetUserDefinedFunction",
        "glue:GetUserDefinedFunctions",
        "glue:SearchTables",
        "glue:GetDataCatalogEncryptionSettings"
      ],
      "Resource": "*"
    }
  ]
}
### 8.4 Create S3 Monitoring Policy
Create s3-monitoring-policy.json:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadWriteMetadataBucket",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<METADATA_BUCKET>",
        "arn:aws:s3:::<METADATA_BUCKET>/*"
      ]
    }
  ]
}
### 8.5 Create Engine IAM Role and Pod Identity
# Set variables
WORKSPACE_NAME="<WORKSPACE_NAME>"
CLUSTER_NAME="<YOUR_CLUSTER_NAME>"
ACCOUNT_ID="<YOUR_AWS_ACCOUNT_ID>"
# Create engine role
aws iam create-role \
--role-name ${WORKSPACE_NAME}-engine-access-role \
--assume-role-policy-document file://trust-policy.json
# Create and attach S3 policy
aws iam create-policy \
--policy-name ${WORKSPACE_NAME}-engine-s3-policy \
--policy-document file://s3-access-policy.json
aws iam attach-role-policy \
--role-name ${WORKSPACE_NAME}-engine-access-role \
--policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/${WORKSPACE_NAME}-engine-s3-policy
# Create and attach Glue policy
aws iam create-policy \
--policy-name ${WORKSPACE_NAME}-engine-glue-policy \
--policy-document file://glue-read-policy.json
aws iam attach-role-policy \
--role-name ${WORKSPACE_NAME}-engine-access-role \
--policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/${WORKSPACE_NAME}-engine-glue-policy
# Create Pod Identity Association for engine
aws eks create-pod-identity-association \
--cluster-name ${CLUSTER_NAME} \
--namespace ${WORKSPACE_NAME} \
--service-account ${WORKSPACE_NAME}-engine \
--role-arn arn:aws:iam::${ACCOUNT_ID}:role/${WORKSPACE_NAME}-engine-access-role
### 8.6 Create Monitoring IAM Role and Pod Identity
# Create monitoring role
aws iam create-role \
--role-name ${WORKSPACE_NAME}-monitoring-access-role \
--assume-role-policy-document file://trust-policy.json
# Create and attach monitoring S3 policy
aws iam create-policy \
--policy-name ${WORKSPACE_NAME}-monitoring-s3-policy \
--policy-document file://s3-monitoring-policy.json
aws iam attach-role-policy \
--role-name ${WORKSPACE_NAME}-monitoring-access-role \
--policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/${WORKSPACE_NAME}-monitoring-s3-policy
# Create Pod Identity Association for monitoring
aws eks create-pod-identity-association \
--cluster-name ${CLUSTER_NAME} \
--namespace ${WORKSPACE_NAME} \
--service-account ${WORKSPACE_NAME}-monitoring \
--role-arn arn:aws:iam::${ACCOUNT_ID}:role/${WORKSPACE_NAME}-monitoring-access-role
Verify Pod Identity Associations:
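aws eks list-pod-identity-associations --cluster-name ${CLUSTER_NAME} --namespace ${WORKSPACE_NAME}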
## Step 9: Create Workspace NodePool and EC2NodeClass
Create compute nodes for the workspace with NVMe instance store support.
# workspace-nodepool.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  labels:
    workspace-name: <WORKSPACE_NAME>
  name: <WORKSPACE_NAME>-nodepool
spec:
  disruption:
    budgets:
      - nodes: 100%
        reasons:
          - Empty
      - nodes: "0"
        reasons:
          - Drifted
    consolidateAfter: 30s
    consolidationPolicy: WhenEmpty
  limits:
    cpu: 10000
  template:
    metadata:
      labels:
        workspace-name: <WORKSPACE_NAME>
    spec:
      expireAfter: 720h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: <WORKSPACE_NAME>-nodeclass
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values:
            - c7g
            - c7gd
            - c8g
            - r7g
            - r7gd
            - r8g
            - m7g
            - m7gd
            - i8g
        - key: topology.kubernetes.io/zone
          operator: In
          values:
            - <YOUR_ZONE_A>
            - <YOUR_ZONE_B>
            - <YOUR_ZONE_C>
        - key: karpenter.k8s.aws/instance-size
          operator: NotIn
          values:
            - metal
      taints:
        - effect: NoSchedule
          key: workspace-name
          value: <WORKSPACE_NAME>
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: <WORKSPACE_NAME>-nodeclass
spec:
  amiSelectorTerms:
    - alias: al2023@latest
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
  kubelet:
    maxPods: 18
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 1
    httpTokens: required
  role: <YOUR_KARPENTER_NODE_ROLE>
  securityGroupSelectorTerms:
    - tags:
        aws:eks:cluster-name: <YOUR_CLUSTER_NAME>
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: <YOUR_CLUSTER_NAME>
  tags:
    Name: <WORKSPACE_NAME>
    app: e6data
    namespace: <WORKSPACE_NAME>
  userData: |
    #!/bin/bash
    mount_location="/app/tmp"
    mkdir -p $mount_location
    yum install nvme-cli -y
    # Check if NVMe instance store drives are present
    if nvme list | grep -q "Amazon EC2 NVMe Instance Storage"; then
      nvme_drives=$(nvme list | grep "Amazon EC2 NVMe Instance Storage" | cut -d " " -f 1 || true)
      readarray -t nvme_drives <<< "$nvme_drives"
      num_drives=${#nvme_drives[@]}
      if [ $num_drives -gt 1 ]; then
        # Multiple NVMe drives - create RAID0 array
        yum install mdadm -y
        mdadm --create /dev/md0 --level=0 --name=md0 --raid-devices=$num_drives "${nvme_drives[@]}"
        mkfs.ext4 /dev/md0
        mount /dev/md0 $mount_location
        mdadm --detail --scan >> /etc/mdadm.conf
        echo /dev/md0 $mount_location ext4 defaults,noatime 0 2 >> /etc/fstab
      else
        # Single NVMe drive - format and mount directly
        for disk in "${nvme_drives[@]}"; do
          mkfs.ext4 -F $disk
          mount $disk $mount_location
          echo $disk $mount_location ext4 defaults,noatime 0 2 >> /etc/fstab
        done
      fi
    else
      echo "No NVMe drives detected. Skipping NVMe configuration."
    fi
    chmod 777 $mount_location
Apply the configuration:
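# Substitute <WORKSPACE_NAME> as in Step 6 (fill the zone/role/cluster placeholders first)
sed 's/<WORKSPACE_NAME>/workspace1/g' workspace-nodepool.yaml | kubectl apply -f -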
## Step 10: Deploy NamespaceConfig
Configure shared settings for the workspace:
# namespaceconfig.yaml
apiVersion: e6data.io/v1alpha1
kind: NamespaceConfig
metadata:
  name: <WORKSPACE_NAME>-config
  namespace: <WORKSPACE_NAME>
spec:
  cloud: AWS
  imagePullSecrets:
    - gcr-key
  karpenterNodePool: <WORKSPACE_NAME>-nodepool
  serviceAccounts:
    data: <WORKSPACE_NAME>-engine
    monitoring: <WORKSPACE_NAME>-monitoring
  storageBackend: s3a://<METADATA_BUCKET>
Apply:
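# Substitute <WORKSPACE_NAME> and <METADATA_BUCKET> (bucket name from Step 7), then apply
sed -e 's/<WORKSPACE_NAME>/workspace1/g' -e 's/<METADATA_BUCKET>/e6-workspace1-metadata/g' \
namespaceconfig.yaml | kubectl apply -f -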
## Step 11: Deploy MetadataServices
Deploy the metadata storage and schema services:
# metadataservices.yaml
apiVersion: e6data.io/v1alpha1
kind: MetadataServices
metadata:
  name: <WORKSPACE_NAME>-mds
  namespace: <WORKSPACE_NAME>
spec:
  workspace: <WORKSPACE_NAME>
  tenant: <YOUR_TENANT>
  storage:
    replicas: 2
    resources:
      memory: "8Gi"
      cpu: "4"
  schema:
    replicas: 2
    resources:
      memory: "8Gi"
      cpu: "4"
Apply and watch status:
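# Fill in the <...> placeholders first
kubectl apply -f metadataservices.yaml
# Watch rollout ("mds" is the short name used in the Verification section)
kubectl get mds -n <WORKSPACE_NAME> -w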
## Step 12: Register E6Catalog
Register your data catalog (AWS Glue example):
# e6catalog.yaml
apiVersion: e6data.io/v1alpha1
kind: E6Catalog
metadata:
  name: glue-catalog
  namespace: <WORKSPACE_NAME>
spec:
  catalogType: GLUE
  metadataServicesRef: <WORKSPACE_NAME>-mds
  isDefault: true
  connectionMetadata:
    catalogConnection:
      glueConnection:
        region: <YOUR_AWS_REGION>
Apply:
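# Fill in the <...> placeholders first
kubectl apply -f e6catalog.yaml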
## Step 13: Deploy QueryService
Deploy the query execution cluster:
# queryservice.yaml
apiVersion: e6data.io/v1alpha1
kind: QueryService
metadata:
  name: <WORKSPACE_NAME>-cluster
  namespace: <WORKSPACE_NAME>
spec:
  alias: <WORKSPACE_NAME>
  workspace: <WORKSPACE_NAME>
  planner:
    resources:
      memory: "8Gi"
      cpu: "4"
  queue:
    resources:
      memory: "4Gi"
      cpu: "2"
  executor:
    replicas: 2
    resources:
      memory: "32Gi"
      cpu: "16"
    autoscaling:
      enabled: true
      minReplicas: 1
      maxReplicas: 10
Apply and watch status:
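kubectl apply -f queryservice.yaml
# Watch rollout ("qs" is the short name used in the Verification section)
kubectl get qs -n <WORKSPACE_NAME> -w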
## Step 14: Deploy TrafficInfra
Deploy the Envoy-based traffic infrastructure:
# trafficinfra.yaml
apiVersion: e6data.io/v1alpha2
kind: TrafficInfra
metadata:
  name: <WORKSPACE_NAME>-traffic
  namespace: <WORKSPACE_NAME>
spec:
  envoy:
    replicas: 2
    resources:
      cpu: "500m"
      memory: "512Mi"
  xds:
    resources:
      cpu: "100m"
      memory: "128Mi"
Apply:
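kubectl apply -f trafficinfra.yaml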
## Step 15: Deploy AuthGateway (Optional)
Deploy authentication gateway with AWS NLB:
# authgateway.yaml
apiVersion: e6data.io/v1alpha1
kind: AuthGateway
metadata:
name: <WORKSPACE_NAME>-auth
namespace: <WORKSPACE_NAME>
spec:
domain: <YOUR_DOMAIN>
replicas: 2
resources:
cpu: 200m
memory: 256Mi
service:
type: LoadBalancer
loadBalancerClass: service.k8s.aws/nlb
annotations:
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-alpn-policy: HTTP2Only
service.beta.kubernetes.io/aws-load-balancer-ssl-cert: <YOUR_ACM_CERTIFICATE_ARN>
service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
services:
- name: query
enabled: true
isGRPC: true
subdomain: query
timeout: 30s
backend:
serviceName: <WORKSPACE_NAME>-traffic-envoy
servicePort: 8080
Apply:
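# Fill in the domain and ACM certificate placeholders first
kubectl apply -f authgateway.yaml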
## Step 16: Configure Governance (Optional)
Set up data access control policies:
# governance.yaml
apiVersion: e6data.io/v1alpha1
kind: Governance
metadata:
  name: <WORKSPACE_NAME>-governance
  namespace: <WORKSPACE_NAME>
spec:
  policies:
    - name: allow-analysts-read
      policyType: GRANT_ACCESS
      effect: ALLOW
      principals:
        users:
          - analyst@company.com
        groups:
          - data-analysts
      resources:
        - catalog: glue-catalog
          database: analytics_db
          table: "*"
      actions:
        - SELECT
    - name: mask-pii-columns
      policyType: COLUMN_MASKING
      maskType: MASK_HASH
      principals:
        groups:
          - data-analysts
      resources:
        - catalog: glue-catalog
          database: customers_db
          table: users
          columns:
            - email
            - phone
Apply:
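kubectl apply -f governance.yaml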
## Verification
### Check All Components
# Operator
kubectl get pods -n e6operator
# Workspace components
kubectl get all -n <WORKSPACE_NAME>
# CRD statuses
kubectl get mds,qs,e6cat,trafficinfra,authgateway -n <WORKSPACE_NAME>
### Test Connectivity
# Get the LoadBalancer endpoint
kubectl get svc -n <WORKSPACE_NAME> -l app.kubernetes.io/component=authgateway
# Test query endpoint (if AuthGateway deployed)
# Connect via JDBC/ODBC to the LoadBalancer endpoint
## Next Steps
- Configure TLS for AuthGateway
- Set up Autoscaling
- Enable Query History with GreptimeDB
- Configure Monitoring
- Troubleshooting Guide