AWS Complete Onboarding Guide

This guide walks through deploying the e6data operator and a workspace on AWS EKS. Follow the steps in order for a complete production deployment.


Prerequisites

Before starting, ensure you have:

Requirement              Description
EKS Cluster              Kubernetes 1.24+ with OIDC provider enabled
Karpenter                v0.32+ installed and configured
EKS Pod Identity Agent   Addon installed on the cluster
AWS CLI                  Configured with appropriate permissions
kubectl                  Connected to your EKS cluster
Helm                     v3.8+ installed
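
A quick pre-flight check (a minimal sketch; substitute your cluster name, and adjust the Karpenter namespace if it is installed in kube-system):

# Confirm the OIDC provider and the Pod Identity Agent addon
aws eks describe-cluster --name <YOUR_CLUSTER_NAME> \
  --query "cluster.identity.oidc.issuer" --output text
aws eks describe-addon --cluster-name <YOUR_CLUSTER_NAME> \
  --addon-name eks-pod-identity-agent

# Confirm Karpenter is running, and check kubectl and Helm versions
kubectl get pods -n karpenter
kubectl version
helm version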

Step 1: Create Operator NodePool and EC2NodeClass

Create a dedicated node pool for the e6data operator, tainted so that only operator workloads are scheduled onto it.

# operator-nodepool.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: e6operator
spec:
  disruption:
    budgets:
      - nodes: 10%
    consolidateAfter: 30s
    consolidationPolicy: WhenEmptyOrUnderutilized
  limits:
    cpu: "100"
    memory: 100Gi
  template:
    metadata:
      labels:
        app: e6operator
    spec:
      expireAfter: 720h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: e6operator
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - spot
            - on-demand
        - key: kubernetes.io/arch
          operator: In
          values:
            - arm64
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - t4g.medium
            - t4g.large
            - t4g.xlarge
        - key: topology.kubernetes.io/zone
          operator: In
          values:
            - <YOUR_ZONE_A>
            - <YOUR_ZONE_B>
            - <YOUR_ZONE_C>
      taints:
        - effect: NoSchedule
          key: workload
          value: e6operator

---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: e6operator
spec:
  amiSelectorTerms:
    - alias: al2023@latest
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 50Gi
        volumeType: gp3
  detailedMonitoring: false
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 1
    httpTokens: required
  role: <YOUR_KARPENTER_NODE_ROLE>
  securityGroupSelectorTerms:
    - tags:
        aws:eks:cluster-name: <YOUR_CLUSTER_NAME>
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: <YOUR_CLUSTER_NAME>
  tags:
    ManagedBy: karpenter
    Name: e6operator
    app: e6data

Apply the configuration:

kubectl apply -f operator-nodepool.yaml
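
Optionally, confirm both resources were created; the EC2NodeClass should report Ready once its AMI, subnets, and security groups resolve:

kubectl get nodepool e6operator
kubectl get ec2nodeclass e6operator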

Step 2: Install Cert-Manager with Tolerations

Install cert-manager configured to run on the operator's tainted nodes.

Create a values file:

# cert-manager-values.yaml
tolerations:
  - key: workload
    operator: Equal
    value: e6operator
    effect: NoSchedule

nodeSelector:
  app: e6operator

webhook:
  tolerations:
    - key: workload
      operator: Equal
      value: e6operator
      effect: NoSchedule
  nodeSelector:
    app: e6operator

cainjector:
  tolerations:
    - key: workload
      operator: Equal
      value: e6operator
      effect: NoSchedule
  nodeSelector:
    app: e6operator

startupapicheck:
  tolerations:
    - key: workload
      operator: Equal
      value: e6operator
      effect: NoSchedule
  nodeSelector:
    app: e6operator

Install cert-manager:

helm install cert-manager oci://quay.io/jetstack/charts/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --set crds.enabled=true \
  --set prometheus.enabled=false \
  --set webhook.timeoutSeconds=4 \
  -f cert-manager-values.yaml

Verify installation:

kubectl wait --for=condition=Available --timeout=120s -n cert-manager \
  deployment/cert-manager deployment/cert-manager-webhook deployment/cert-manager-cainjector

Step 3: Create Image Pull Secret

Create the image pull secret for accessing e6data container images.

# Create operator namespace
kubectl create namespace e6operator

# Create secret
kubectl create secret docker-registry gcr-key \
  --namespace e6operator \
  --docker-server=us-docker.pkg.dev \
  --docker-username=_json_key \
  --docker-password="$(cat /path/to/service-account.json)" \
  --docker-email=your-email@example.com
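
As a quick sanity check, confirm the secret exists and decodes to a Docker config:

kubectl get secret gcr-key -n e6operator
kubectl get secret gcr-key -n e6operator \
  -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | head -c 200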

Step 4: Install CRDs

Install the e6data Custom Resource Definitions:

helm install e6-operator-crds ./e6-operator/helm/e6-operator-crds \
  --namespace e6operator

Verify CRDs are installed:

kubectl get crds | grep e6data

Expected output:

authgateways.e6data.io
catalogrefreshschedules.e6data.io
catalogrefreshes.e6data.io
e6catalogs.e6data.io
e6consoles.e6data.io
governances.e6data.io
metadataservices.e6data.io
monitoringservices.e6data.io
namespaceconfigs.e6data.io
pools.e6data.io
queryservices.e6data.io
trafficinfras.e6data.io


Step 5: Install Operator

Create the operator Helm values:

# operator-values.yaml
replicaCount: 2

image:
  repository: us-docker.pkg.dev/e6data-analytics/e6data/e6-operator
  pullPolicy: IfNotPresent

imagePullSecrets:
  - name: gcr-key

serviceMonitor:
  enabled: false

tolerations:
  - key: workload
    operator: Equal
    value: e6operator
    effect: NoSchedule

nodeSelector:
  app: e6operator

karpenter:
  enabled: true

Install the operator:

helm install e6-operator ./e6-operator/helm/e6-operator \
  --namespace e6operator \
  -f operator-values.yaml

Verify the operator is running:

kubectl get pods -n e6operator
kubectl logs -n e6operator -l app.kubernetes.io/name=e6-operator --tail=50
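
You can also wait on the operator Deployment directly (assuming the chart applies the same app.kubernetes.io/name=e6-operator label to the Deployment as to its pods):

kubectl wait --for=condition=Available --timeout=180s -n e6operator \
  deployment -l app.kubernetes.io/name=e6-operator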

Step 6: Create Workspace Namespace and RBAC

Create a workspace namespace and configure RBAC for engine and monitoring service accounts.

# workspace-rbac.yaml
---
# Create namespace
apiVersion: v1
kind: Namespace
metadata:
  name: <WORKSPACE_NAME>

---
# Engine ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: <WORKSPACE_NAME>-engine
  namespace: <WORKSPACE_NAME>

---
# Engine Role (Namespace-scoped)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: <WORKSPACE_NAME>-engine-role
  namespace: <WORKSPACE_NAME>
rules:
  # Pod status (read-only)
  - apiGroups: [""]
    resources: ["pods", "pods/status", "endpoints"]
    verbs: ["get", "list", "watch"]
  # Events (for observability)
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["get", "list", "watch", "create", "patch"]
  # Service discovery
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["get", "list", "watch"]
  # Deployment status (read-only)
  - apiGroups: ["apps"]
    resources: ["deployments", "deployments/status", "replicasets", "replicasets/status"]
    verbs: ["get", "list", "watch"]
  # Governance policies
  - apiGroups: ["e6data.io"]
    resources: ["governances"]
    verbs: ["get", "list", "watch"]

---
# Engine RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: <WORKSPACE_NAME>-engine-role-binding
  namespace: <WORKSPACE_NAME>
subjects:
  - kind: ServiceAccount
    name: <WORKSPACE_NAME>-engine
    namespace: <WORKSPACE_NAME>
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: <WORKSPACE_NAME>-engine-role

---
# Monitoring ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: <WORKSPACE_NAME>-monitoring
  namespace: <WORKSPACE_NAME>

---
# Monitoring ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: <WORKSPACE_NAME>-monitoring-role
rules:
  # Pod/service discovery and metrics
  - apiGroups: [""]
    resources: ["nodes", "nodes/proxy", "services", "endpoints", "pods", "events"]
    verbs: ["get", "list", "watch"]
  # Deployment tracking
  - apiGroups: ["apps"]
    resources: ["replicasets"]
    verbs: ["get", "list", "watch"]
  # Prometheus metrics scraping
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]

---
# Monitoring ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: <WORKSPACE_NAME>-monitoring-role-binding
subjects:
  - kind: ServiceAccount
    name: <WORKSPACE_NAME>-monitoring
    namespace: <WORKSPACE_NAME>
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: <WORKSPACE_NAME>-monitoring-role

Apply the RBAC configuration:

# Replace <WORKSPACE_NAME> with your workspace name (e.g., workspace1)
sed 's/<WORKSPACE_NAME>/workspace1/g' workspace-rbac.yaml | kubectl apply -f -
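
To spot-check the RBAC (shown here for workspace1), verify that the engine service account can list pods but cannot, for example, delete deployments:

kubectl auth can-i list pods -n workspace1 \
  --as=system:serviceaccount:workspace1:workspace1-engine
kubectl auth can-i delete deployments -n workspace1 \
  --as=system:serviceaccount:workspace1:workspace1-engine

The first command should print yes and the second no.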

Create the image pull secret in the workspace namespace:

kubectl create secret docker-registry gcr-key \
  --namespace <WORKSPACE_NAME> \
  --docker-server=us-docker.pkg.dev \
  --docker-username=_json_key \
  --docker-password="$(cat /path/to/service-account.json)" \
  --docker-email=your-email@example.com

Step 7: Create S3 Bucket for Metadata

Create an S3 bucket for workspace metadata storage, with public access blocked, default encryption at rest, and TLS-only access enforced.

# Set variables
BUCKET_NAME="e6-<WORKSPACE_NAME>-metadata"
REGION="us-east-1"

# Create bucket (for regions other than us-east-1, also pass
# --create-bucket-configuration LocationConstraint=${REGION})
aws s3api create-bucket \
  --bucket ${BUCKET_NAME} \
  --region ${REGION}

# Block public access
aws s3api put-public-access-block \
  --bucket ${BUCKET_NAME} \
  --public-access-block-configuration '{
    "BlockPublicAcls": true,
    "IgnorePublicAcls": true,
    "BlockPublicPolicy": true,
    "RestrictPublicBuckets": true
  }'

# Enable server-side encryption
aws s3api put-bucket-encryption \
  --bucket ${BUCKET_NAME} \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "AES256"
      }
    }]
  }'

# Add bucket policy to deny insecure transport
aws s3api put-bucket-policy \
  --bucket ${BUCKET_NAME} \
  --policy '{
    "Version": "2012-10-17",
    "Statement": [{
      "Sid": "DenyInsecureTransport",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::'${BUCKET_NAME}'",
        "arn:aws:s3:::'${BUCKET_NAME}'/*"
      ],
      "Condition": {
        "Bool": {
          "aws:SecureTransport": "false"
        }
      }
    }]
  }'
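
As an optional confirmation, read back the settings you just applied:

aws s3api get-public-access-block --bucket ${BUCKET_NAME}
aws s3api get-bucket-encryption --bucket ${BUCKET_NAME}
aws s3api get-bucket-policy --bucket ${BUCKET_NAME}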

Step 8: Create IAM Roles and Pod Identity Associations

8.1 Create Trust Policy

Create trust-policy.json:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "pods.eks.amazonaws.com"
      },
      "Action": ["sts:AssumeRole", "sts:TagSession"]
    }
  ]
}

8.2 Create S3 Access Policy

Create s3-access-policy.json:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadWriteMetadataBucket",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<METADATA_BUCKET>",
        "arn:aws:s3:::<METADATA_BUCKET>/*"
      ]
    },
    {
      "Sid": "ReadOnlyDataBuckets",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::*"]
    }
  ]
}

8.3 Create Glue Read Policy

Create glue-read-policy.json:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GlueCatalogReadOnly",
      "Effect": "Allow",
      "Action": [
        "glue:GetCatalogImportStatus",
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetTableVersion",
        "glue:GetTableVersions",
        "glue:GetPartition",
        "glue:GetPartitions",
        "glue:GetUserDefinedFunction",
        "glue:GetUserDefinedFunctions",
        "glue:SearchTables",
        "glue:GetDataCatalogEncryptionSettings"
      ],
      "Resource": "*"
    }
  ]
}

8.4 Create S3 Monitoring Policy

Create s3-monitoring-policy.json:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadWriteMetadataBucket",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<METADATA_BUCKET>",
        "arn:aws:s3:::<METADATA_BUCKET>/*"
      ]
    }
  ]
}

8.5 Create Engine IAM Role and Pod Identity

# Set variables
WORKSPACE_NAME="<WORKSPACE_NAME>"
CLUSTER_NAME="<YOUR_CLUSTER_NAME>"
ACCOUNT_ID="<YOUR_AWS_ACCOUNT_ID>"

# Create engine role
aws iam create-role \
  --role-name ${WORKSPACE_NAME}-engine-access-role \
  --assume-role-policy-document file://trust-policy.json

# Create and attach S3 policy
aws iam create-policy \
  --policy-name ${WORKSPACE_NAME}-engine-s3-policy \
  --policy-document file://s3-access-policy.json

aws iam attach-role-policy \
  --role-name ${WORKSPACE_NAME}-engine-access-role \
  --policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/${WORKSPACE_NAME}-engine-s3-policy

# Create and attach Glue policy
aws iam create-policy \
  --policy-name ${WORKSPACE_NAME}-engine-glue-policy \
  --policy-document file://glue-read-policy.json

aws iam attach-role-policy \
  --role-name ${WORKSPACE_NAME}-engine-access-role \
  --policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/${WORKSPACE_NAME}-engine-glue-policy

# Create Pod Identity Association for engine
aws eks create-pod-identity-association \
  --cluster-name ${CLUSTER_NAME} \
  --namespace ${WORKSPACE_NAME} \
  --service-account ${WORKSPACE_NAME}-engine \
  --role-arn arn:aws:iam::${ACCOUNT_ID}:role/${WORKSPACE_NAME}-engine-access-role

8.6 Create Monitoring IAM Role and Pod Identity

# Create monitoring role
aws iam create-role \
  --role-name ${WORKSPACE_NAME}-monitoring-access-role \
  --assume-role-policy-document file://trust-policy.json

# Create and attach monitoring S3 policy
aws iam create-policy \
  --policy-name ${WORKSPACE_NAME}-monitoring-s3-policy \
  --policy-document file://s3-monitoring-policy.json

aws iam attach-role-policy \
  --role-name ${WORKSPACE_NAME}-monitoring-access-role \
  --policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/${WORKSPACE_NAME}-monitoring-s3-policy

# Create Pod Identity Association for monitoring
aws eks create-pod-identity-association \
  --cluster-name ${CLUSTER_NAME} \
  --namespace ${WORKSPACE_NAME} \
  --service-account ${WORKSPACE_NAME}-monitoring \
  --role-arn arn:aws:iam::${ACCOUNT_ID}:role/${WORKSPACE_NAME}-monitoring-access-role

Verify Pod Identity Associations:

aws eks list-pod-identity-associations --cluster-name ${CLUSTER_NAME}
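
You can also confirm which policies ended up attached to each role:

aws iam list-attached-role-policies \
  --role-name ${WORKSPACE_NAME}-engine-access-role
aws iam list-attached-role-policies \
  --role-name ${WORKSPACE_NAME}-monitoring-access-role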

Step 9: Create Workspace NodePool and EC2NodeClass

Create compute nodes for the workspace with NVMe instance store support.

# workspace-nodepool.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  labels:
    workspace-name: <WORKSPACE_NAME>
  name: <WORKSPACE_NAME>-nodepool
spec:
  disruption:
    budgets:
      - nodes: 100%
        reasons:
          - Empty
      - nodes: "0"
        reasons:
          - Drifted
    consolidateAfter: 30s
    consolidationPolicy: WhenEmpty
  limits:
    cpu: 10000
  template:
    metadata:
      labels:
        workspace-name: <WORKSPACE_NAME>
    spec:
      expireAfter: 720h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: <WORKSPACE_NAME>-nodeclass
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values:
            - c7g
            - c7gd
            - c8g
            - r7g
            - r7gd
            - r8g
            - m7g
            - m7gd
            - i8g
        - key: topology.kubernetes.io/zone
          operator: In
          values:
            - <YOUR_ZONE_A>
            - <YOUR_ZONE_B>
            - <YOUR_ZONE_C>
        - key: karpenter.k8s.aws/instance-size
          operator: NotIn
          values:
            - metal
      taints:
        - effect: NoSchedule
          key: workspace-name
          value: <WORKSPACE_NAME>

---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: <WORKSPACE_NAME>-nodeclass
spec:
  amiSelectorTerms:
    - alias: al2023@latest
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
  kubelet:
    maxPods: 18
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 1
    httpTokens: required
  role: <YOUR_KARPENTER_NODE_ROLE>
  securityGroupSelectorTerms:
    - tags:
        aws:eks:cluster-name: <YOUR_CLUSTER_NAME>
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: <YOUR_CLUSTER_NAME>
  tags:
    Name: <WORKSPACE_NAME>
    app: e6data
    namespace: <WORKSPACE_NAME>
  userData: |
    mount_location="/app/tmp"
    mkdir -p $mount_location
    yum install nvme-cli -y

    # Check if NVMe instance store drives are present
    if nvme list | grep -q "Amazon EC2 NVMe Instance Storage"; then
        nvme_drives=$(nvme list | grep "Amazon EC2 NVMe Instance Storage" | cut -d " " -f 1 || true)
        readarray -t nvme_drives <<< "$nvme_drives"
        num_drives=${#nvme_drives[@]}

        if [ $num_drives -gt 1 ]; then
            # Multiple NVMe drives - create RAID0 array
            yum install mdadm -y
            mdadm --create /dev/md0 --level=0 --name=md0 --raid-devices=$num_drives "${nvme_drives[@]}"
            mkfs.ext4 /dev/md0
            mount /dev/md0 $mount_location
            mdadm --detail --scan >> /etc/mdadm.conf
            echo /dev/md0 $mount_location ext4 defaults,noatime 0 2 >> /etc/fstab
        else
            # Single NVMe drive - format and mount directly
            for disk in "${nvme_drives[@]}"; do
                mkfs.ext4 -F $disk
                mount $disk $mount_location
                echo $disk $mount_location ext4 defaults,noatime 0 2 >> /etc/fstab
            done
        fi
    else
        echo "No NVMe drives detected. Skipping NVMe configuration."
    fi

    chmod 777 $mount_location

Apply the configuration:

sed 's/<WORKSPACE_NAME>/workspace1/g' workspace-nodepool.yaml | kubectl apply -f -
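
As with the operator pool, optionally confirm both resources exist and that the EC2NodeClass resolves its AMI, subnets, and security groups (workspace1 shown as an example):

kubectl get nodepool workspace1-nodepool
kubectl get ec2nodeclass workspace1-nodeclass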

Step 10: Deploy NamespaceConfig

Configure shared settings for the workspace:

# namespaceconfig.yaml
apiVersion: e6data.io/v1alpha1
kind: NamespaceConfig
metadata:
  name: <WORKSPACE_NAME>-config
  namespace: <WORKSPACE_NAME>
spec:
  cloud: AWS
  imagePullSecrets:
    - gcr-key
  karpenterNodePool: <WORKSPACE_NAME>-nodepool
  serviceAccounts:
    data: <WORKSPACE_NAME>-engine
    monitoring: <WORKSPACE_NAME>-monitoring
  storageBackend: s3a://<METADATA_BUCKET>

Apply:

kubectl apply -f namespaceconfig.yaml
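
Confirm the resource was accepted (the namespaceconfigs CRD was installed in Step 4):

kubectl get namespaceconfigs -n <WORKSPACE_NAME>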

Step 11: Deploy MetadataServices

Deploy the metadata storage and schema services:

# metadataservices.yaml
apiVersion: e6data.io/v1alpha1
kind: MetadataServices
metadata:
  name: <WORKSPACE_NAME>-mds
  namespace: <WORKSPACE_NAME>
spec:
  workspace: <WORKSPACE_NAME>
  tenant: <YOUR_TENANT>

  storage:
    replicas: 2
    resources:
      memory: "8Gi"
      cpu: "4"

  schema:
    replicas: 2
    resources:
      memory: "8Gi"
      cpu: "4"

Apply and watch status:

kubectl apply -f metadataservices.yaml
kubectl get mds -n <WORKSPACE_NAME> -w
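
If the services do not come up, describe the resource to inspect its events and status, and check the pods in the workspace namespace:

kubectl describe mds <WORKSPACE_NAME>-mds -n <WORKSPACE_NAME>
kubectl get pods -n <WORKSPACE_NAME>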

Step 12: Register E6Catalog

Register your data catalog (AWS Glue example):

# e6catalog.yaml
apiVersion: e6data.io/v1alpha1
kind: E6Catalog
metadata:
  name: glue-catalog
  namespace: <WORKSPACE_NAME>
spec:
  catalogType: GLUE
  metadataServicesRef: <WORKSPACE_NAME>-mds
  isDefault: true
  connectionMetadata:
    catalogConnection:
      glueConnection:
        region: <YOUR_AWS_REGION>

Apply:

kubectl apply -f e6catalog.yaml
kubectl get e6cat -n <WORKSPACE_NAME>

Step 13: Deploy QueryService

Deploy the query execution cluster:

# queryservice.yaml
apiVersion: e6data.io/v1alpha1
kind: QueryService
metadata:
  name: <WORKSPACE_NAME>-cluster
  namespace: <WORKSPACE_NAME>
spec:
  alias: <WORKSPACE_NAME>
  workspace: <WORKSPACE_NAME>

  planner:
    resources:
      memory: "8Gi"
      cpu: "4"

  queue:
    resources:
      memory: "4Gi"
      cpu: "2"

  executor:
    replicas: 2
    resources:
      memory: "32Gi"
      cpu: "16"
    autoscaling:
      enabled: true
      minReplicas: 1
      maxReplicas: 10

Apply and watch status:

kubectl apply -f queryservice.yaml
kubectl get qs -n <WORKSPACE_NAME> -w
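
Executor pods run on the workspace node pool from Step 9, so you can also watch Karpenter provision nodes carrying the workspace-name label (workspace1 shown as an example):

kubectl get nodes -l workspace-name=workspace1 -w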

Step 14: Deploy TrafficInfra

Deploy the Envoy-based traffic infrastructure:

# trafficinfra.yaml
apiVersion: e6data.io/v1alpha2
kind: TrafficInfra
metadata:
  name: <WORKSPACE_NAME>-traffic
  namespace: <WORKSPACE_NAME>
spec:
  envoy:
    replicas: 2
    resources:
      cpu: "500m"
      memory: "512Mi"

  xds:
    resources:
      cpu: "100m"
      memory: "128Mi"

Apply:

kubectl apply -f trafficinfra.yaml
kubectl get trafficinfra -n <WORKSPACE_NAME>

Step 15: Deploy AuthGateway (Optional)

Deploy the authentication gateway behind an AWS Network Load Balancer (NLB):

# authgateway.yaml
apiVersion: e6data.io/v1alpha1
kind: AuthGateway
metadata:
  name: <WORKSPACE_NAME>-auth
  namespace: <WORKSPACE_NAME>
spec:
  domain: <YOUR_DOMAIN>
  replicas: 2
  resources:
    cpu: 200m
    memory: 256Mi
  service:
    type: LoadBalancer
    loadBalancerClass: service.k8s.aws/nlb
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
      service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
      service.beta.kubernetes.io/aws-load-balancer-alpn-policy: HTTP2Only
      service.beta.kubernetes.io/aws-load-balancer-ssl-cert: <YOUR_ACM_CERTIFICATE_ARN>
      service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
  services:
    - name: query
      enabled: true
      isGRPC: true
      subdomain: query
      timeout: 30s
      backend:
        serviceName: <WORKSPACE_NAME>-traffic-envoy
        servicePort: 8080

Apply:

kubectl apply -f authgateway.yaml
kubectl get authgateway -n <WORKSPACE_NAME>
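
Once the NLB is provisioned, fetch its DNS name and point your domain's DNS records at it (this uses the component label shown in the Verification section; adjust the selector if the service created by the AuthGateway is labeled differently):

kubectl get svc -n <WORKSPACE_NAME> \
  -l app.kubernetes.io/component=authgateway \
  -o jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}'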

Step 16: Configure Governance (Optional)

Set up data access control policies:

# governance.yaml
apiVersion: e6data.io/v1alpha1
kind: Governance
metadata:
  name: <WORKSPACE_NAME>-governance
  namespace: <WORKSPACE_NAME>
spec:
  policies:
    - name: allow-analysts-read
      policyType: GRANT_ACCESS
      effect: ALLOW
      principals:
        users:
          - analyst@company.com
        groups:
          - data-analysts
      resources:
        - catalog: glue-catalog
          database: analytics_db
          table: "*"
      actions:
        - SELECT

    - name: mask-pii-columns
      policyType: COLUMN_MASKING
      maskType: MASK_HASH
      principals:
        groups:
          - data-analysts
      resources:
        - catalog: glue-catalog
          database: customers_db
          table: users
          columns:
            - email
            - phone

Apply:

kubectl apply -f governance.yaml
kubectl get governance -n <WORKSPACE_NAME>

Verification

Check All Components

# Operator
kubectl get pods -n e6operator

# Workspace components
kubectl get all -n <WORKSPACE_NAME>

# CRD statuses
kubectl get mds,qs,e6cat,trafficinfra,authgateway -n <WORKSPACE_NAME>

Test Connectivity

# Get the LoadBalancer endpoint
kubectl get svc -n <WORKSPACE_NAME> -l app.kubernetes.io/component=authgateway

# Test query endpoint (if AuthGateway deployed)
# Connect via JDBC/ODBC to the LoadBalancer endpoint
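
Without an AuthGateway, you can still reach the query endpoint from your workstation by port-forwarding the Envoy service from Step 14 (the service name and port below match the backend referenced in Step 15; adjust them if your TrafficInfra exposes a different service):

# Forward the Envoy listener locally
kubectl port-forward svc/<WORKSPACE_NAME>-traffic-envoy 8080:8080 -n <WORKSPACE_NAME>

# In another shell, confirm the port answers
nc -zv localhost 8080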

Next Steps