Azure AKS Prerequisites

This guide covers all Azure-specific prerequisites for deploying the e6data Kubernetes Operator on Azure Kubernetes Service (AKS).


Quick Reference

Requirement          Status        Notes
AKS 1.24+            Required      Kubernetes cluster
Workload Identity    Required      For Azure AD authentication
Azure Blob Storage   Required      Data lake storage
Azure RBAC           Required      Least-privilege access
Karpenter 1.0+       Recommended   Dynamic node provisioning (ARM64 Ampere Altra)
Azure Synapse        Optional      If using Synapse metastore
Databricks Unity     Optional      If using Unity Catalog

1. Workload Identity Setup

Azure Workload Identity is the recommended authentication method for AKS workloads.

1.1 Enable Workload Identity on AKS Cluster

# For new cluster
az aks create \
  --resource-group YOUR_RG \
  --name YOUR_CLUSTER \
  --enable-oidc-issuer \
  --enable-workload-identity \
  --location eastus

# For existing cluster
az aks update \
  --resource-group YOUR_RG \
  --name YOUR_CLUSTER \
  --enable-oidc-issuer \
  --enable-workload-identity

1.2 Get OIDC Issuer URL

export AKS_OIDC_ISSUER=$(az aks show \
  --resource-group YOUR_RG \
  --name YOUR_CLUSTER \
  --query "oidcIssuerProfile.issuerUrl" \
  --output tsv)

echo "OIDC Issuer: $AKS_OIDC_ISSUER"

1.3 Create Managed Identity

# Create managed identity
az identity create \
  --name e6data-workspace-identity \
  --resource-group YOUR_RG \
  --location eastus

# Get identity details
export IDENTITY_CLIENT_ID=$(az identity show \
  --name e6data-workspace-identity \
  --resource-group YOUR_RG \
  --query "clientId" \
  --output tsv)

export IDENTITY_PRINCIPAL_ID=$(az identity show \
  --name e6data-workspace-identity \
  --resource-group YOUR_RG \
  --query "principalId" \
  --output tsv)

echo "Client ID: $IDENTITY_CLIENT_ID"
echo "Principal ID: $IDENTITY_PRINCIPAL_ID"

1.4 Create Federated Credential

# Create federated credential for Kubernetes service account
az identity federated-credential create \
  --name e6data-workspace-fedcred \
  --identity-name e6data-workspace-identity \
  --resource-group YOUR_RG \
  --issuer "${AKS_OIDC_ISSUER}" \
  --subject "system:serviceaccount:workspace-prod:analytics-prod" \
  --audience "api://AzureADTokenExchange"

For multiple namespaces:

# Create federated credentials for each namespace
for NS in workspace-prod workspace-staging workspace-dev; do
  SA_NAME="analytics-${NS#workspace-}"
  az identity federated-credential create \
    --name "e6data-${NS}-fedcred" \
    --identity-name e6data-workspace-identity \
    --resource-group YOUR_RG \
    --issuer "${AKS_OIDC_ISSUER}" \
    --subject "system:serviceaccount:${NS}:${SA_NAME}" \
    --audience "api://AzureADTokenExchange"
done

1.5 Create and Annotate Kubernetes ServiceAccount

apiVersion: v1
kind: ServiceAccount
metadata:
  name: analytics-prod
  namespace: workspace-prod
  annotations:
    azure.workload.identity/client-id: "${IDENTITY_CLIENT_ID}"
  labels:
    azure.workload.identity/use: "true"

Apply with kubectl:

kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: analytics-prod
  namespace: workspace-prod
  annotations:
    azure.workload.identity/client-id: "${IDENTITY_CLIENT_ID}"
  labels:
    azure.workload.identity/use: "true"
EOF
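
To confirm the pieces line up before moving on, you can list the federated credentials on the identity and inspect the ServiceAccount; a quick check using the names from the steps above:

# The subject should read system:serviceaccount:workspace-prod:analytics-prod
az identity federated-credential list \
  --identity-name e6data-workspace-identity \
  --resource-group YOUR_RG \
  --query "[].{name:name, subject:subject}" \
  --output table

# The ServiceAccount should carry the client-id annotation and the use label
kubectl get serviceaccount analytics-prod -n workspace-prod -o yaml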

2. Azure RBAC (Least Privilege)

2.1 Storage Blob Access (Required)

Create custom role for blob storage:

# Create custom role definition
cat > e6data-storage-role.json <<EOF
{
  "Name": "e6data Storage Access",
  "Description": "Least privilege blob storage access for e6data workloads",
  "Actions": [],
  "NotActions": [],
  "DataActions": [
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/move/action",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/add/action"
  ],
  "NotDataActions": [],
  "AssignableScopes": [
    "/subscriptions/YOUR_SUBSCRIPTION_ID"
  ]
}
EOF

# Create the custom role
az role definition create --role-definition e6data-storage-role.json
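
If you use the custom role instead of the built-in roles shown in 2.2, bind it to the managed identity; a sketch, assuming IDENTITY_PRINCIPAL_ID from step 1.3 and STORAGE_ACCOUNT_ID as exported in 2.2:

# Assign the custom role at storage-account scope
az role assignment create \
  --role "e6data Storage Access" \
  --assignee "${IDENTITY_PRINCIPAL_ID}" \
  --scope "${STORAGE_ACCOUNT_ID}"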

2.2 Assign Storage Roles

export STORAGE_ACCOUNT_ID=$(az storage account show \
  --name YOUR_STORAGE_ACCOUNT \
  --resource-group YOUR_RG \
  --query "id" \
  --output tsv)

# Grant Storage Blob Data Reader for data access
az role assignment create \
  --role "Storage Blob Data Reader" \
  --assignee "${IDENTITY_PRINCIPAL_ID}" \
  --scope "${STORAGE_ACCOUNT_ID}"

# Grant Storage Blob Data Contributor for cache/metadata writes
# Scope to specific containers for least privilege
az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee "${IDENTITY_PRINCIPAL_ID}" \
  --scope "${STORAGE_ACCOUNT_ID}/blobServices/default/containers/e6data-cache"

az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee "${IDENTITY_PRINCIPAL_ID}" \
  --scope "${STORAGE_ACCOUNT_ID}/blobServices/default/containers/e6data-metadata"

2.3 Storage Access Policy JSON (For Terraform/ARM)

Create e6data-storage-policy.json:

{
  "properties": {
    "roleName": "e6data Storage Access",
    "description": "Least privilege storage access for e6data workloads",
    "type": "CustomRole",
    "permissions": [
      {
        "actions": [],
        "notActions": [],
        "dataActions": [
          "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
          "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write",
          "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete"
        ],
        "notDataActions": []
      }
    ],
    "assignableScopes": [
      "/subscriptions/YOUR_SUBSCRIPTION_ID/resourceGroups/YOUR_RG/providers/Microsoft.Storage/storageAccounts/YOUR_STORAGE_ACCOUNT"
    ]
  }
}

2.4 Azure Synapse/SQL Access (If Using Azure Metastore)

# Grant Synapse SQL Administrator on the workspace (Synapse RBAC role, if using Synapse)
az synapse role assignment create \
  --workspace-name YOUR_SYNAPSE_WORKSPACE \
  --role "Synapse SQL Administrator" \
  --assignee "${IDENTITY_PRINCIPAL_ID}"

# Or grant the narrower Synapse Artifact User role for read-oriented access
az synapse role assignment create \
  --workspace-name YOUR_SYNAPSE_WORKSPACE \
  --role "Synapse Artifact User" \
  --assignee "${IDENTITY_PRINCIPAL_ID}"

2.5 Databricks Unity Catalog Access (If Using Unity Catalog)

For Unity Catalog, you need to configure access in Databricks workspace:

# The managed identity's client ID doubles as its application ID.
# Confirm it via the identity's service principal (managed identities have no app registration):
export APP_ID=$(az ad sp show --id $IDENTITY_CLIENT_ID --query appId -o tsv)

# In Databricks, add the service principal
# This is typically done through Databricks UI or API:
# 1. Go to Admin Console > Service Principals
# 2. Add service principal with Application ID
# 3. Grant access to Unity Catalog

Unity Catalog permissions (set in Databricks):

  • USE CATALOG on the catalog
  • USE SCHEMA on schemas
  • SELECT on tables
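
As a hedged sketch (DATABRICKS_HOST, DATABRICKS_TOKEN, and the catalog/schema/table names below are placeholders), the service principal can be registered through the Databricks SCIM API and then granted the privileges listed above with SQL:

# Register the managed identity's application (client) ID as a Databricks service principal
curl -X POST "${DATABRICKS_HOST}/api/2.0/preview/scim/v2/ServicePrincipals" \
  -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d "{
    \"schemas\": [\"urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal\"],
    \"applicationId\": \"${APP_ID}\",
    \"displayName\": \"e6data-workspace-identity\"
  }"

# Then grant Unity Catalog privileges from a Databricks SQL warehouse or notebook:
#   GRANT USE CATALOG ON CATALOG your_catalog TO `${APP_ID}`;
#   GRANT USE SCHEMA ON SCHEMA your_catalog.your_schema TO `${APP_ID}`;
#   GRANT SELECT ON TABLE your_catalog.your_schema.your_table TO `${APP_ID}`;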

2.6 GreptimeDB Storage Access

# Create separate identity for GreptimeDB
az identity create \
  --name e6data-greptime-identity \
  --resource-group YOUR_RG \
  --location eastus

GREPTIME_PRINCIPAL_ID=$(az identity show \
  --name e6data-greptime-identity \
  --resource-group YOUR_RG \
  --query "principalId" \
  --output tsv)

# Grant full blob access to GreptimeDB container
az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee "${GREPTIME_PRINCIPAL_ID}" \
  --scope "${STORAGE_ACCOUNT_ID}/blobServices/default/containers/greptime-data"

2.7 Complete IAM Setup Script

#!/bin/bash
set -e

SUBSCRIPTION_ID="your-subscription-id"
RESOURCE_GROUP="your-rg"
CLUSTER_NAME="your-cluster"
STORAGE_ACCOUNT="your-storage-account"
LOCATION="eastus"

# Get OIDC issuer
AKS_OIDC_ISSUER=$(az aks show \
  --resource-group $RESOURCE_GROUP \
  --name $CLUSTER_NAME \
  --query "oidcIssuerProfile.issuerUrl" \
  --output tsv)

# Create managed identity
az identity create \
  --name e6data-workspace-identity \
  --resource-group $RESOURCE_GROUP \
  --location $LOCATION

IDENTITY_CLIENT_ID=$(az identity show \
  --name e6data-workspace-identity \
  --resource-group $RESOURCE_GROUP \
  --query "clientId" \
  --output tsv)

IDENTITY_PRINCIPAL_ID=$(az identity show \
  --name e6data-workspace-identity \
  --resource-group $RESOURCE_GROUP \
  --query "principalId" \
  --output tsv)

# Create federated credential
az identity federated-credential create \
  --name e6data-workspace-fedcred \
  --identity-name e6data-workspace-identity \
  --resource-group $RESOURCE_GROUP \
  --issuer "${AKS_OIDC_ISSUER}" \
  --subject "system:serviceaccount:workspace-prod:analytics-prod" \
  --audience "api://AzureADTokenExchange"

# Get storage account ID
STORAGE_ACCOUNT_ID=$(az storage account show \
  --name $STORAGE_ACCOUNT \
  --resource-group $RESOURCE_GROUP \
  --query "id" \
  --output tsv)

# Grant storage access
az role assignment create \
  --role "Storage Blob Data Reader" \
  --assignee "${IDENTITY_PRINCIPAL_ID}" \
  --scope "${STORAGE_ACCOUNT_ID}"

echo "Setup complete. Create Kubernetes SA with:"
echo "  annotation: azure.workload.identity/client-id: $IDENTITY_CLIENT_ID"
echo "  label: azure.workload.identity/use: true"

3. Karpenter Setup (Recommended)

Azure supports Karpenter for dynamic node provisioning on AKS.

3.1 Install Karpenter on AKS

export KARPENTER_VERSION="1.0.0"
export CLUSTER_NAME="your-cluster"
export RESOURCE_GROUP="your-rg"
export SUBSCRIPTION_ID="your-subscription-id"
export LOCATION="eastus"

# Install Karpenter for Azure (karpenter-provider-azure) from Microsoft's OCI chart registry
helm upgrade --install karpenter oci://mcr.microsoft.com/aks/karpenter/karpenter \
  --namespace karpenter --create-namespace \
  --version ${KARPENTER_VERSION} \
  --set settings.azure.clusterName=${CLUSTER_NAME} \
  --set settings.azure.resourceGroup=${RESOURCE_GROUP} \
  --set settings.azure.subscriptionID=${SUBSCRIPTION_ID} \
  --wait

3.2 NodePool for ARM64 (Ampere Altra)

Azure offers Dpsv5 and Epsv5 series VMs with Ampere Altra ARM64 processors:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: e6data-compute
spec:
  template:
    metadata:
      labels:
        e6data.io/node-type: compute
    spec:
      requirements:
        # Prefer ARM64 (Ampere Altra) for cost savings
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.azure.com/sku-family
          operator: In
          values:
            - Dpsv5    # ARM64 general purpose (best price/performance)
            - Epsv5    # ARM64 memory-optimized
            - Dpdsv5   # ARM64 with local SSD
            - Epdsv5   # ARM64 memory-optimized with local SSD
        - key: karpenter.azure.com/sku-cpu
          operator: In
          values:
            - "16"
            - "32"
            - "48"
            - "64"
        # Capacity type
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]

      # Taints for workload isolation
      taints:
        - key: e6data-workspace-name
          value: "prod"
          effect: NoSchedule
        # Azure spot instance toleration (auto-added by e6 operator)
        - key: kubernetes.azure.com/scalesetpriority
          value: spot
          effect: NoSchedule

      # Node expiry
      expireAfter: 720h  # 30 days

      nodeClassRef:
        group: karpenter.azure.com
        kind: AKSNodeClass
        name: e6data-arm64

  # Resource limits
  limits:
    cpu: 2000
    memory: 8000Gi

  # Disruption settings
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5m
    budgets:
      - nodes: "10%"

3.3 AKSNodeClass

apiVersion: karpenter.azure.com/v1alpha1
kind: AKSNodeClass
metadata:
  name: e6data-arm64
spec:
  # Image reference
  imageFamily: Ubuntu2204

  # Network configuration
  vnetSubnetID: /subscriptions/SUBSCRIPTION_ID/resourceGroups/YOUR_RG/providers/Microsoft.Network/virtualNetworks/YOUR_VNET/subnets/YOUR_SUBNET

  # OS disk configuration
  osDiskSizeGB: 128
  osDiskType: Managed
  osDiskStorageAccountType: Premium_LRS

  # Data disk for caching (optional)
  dataDisks:
    - diskSizeGB: 256
      storageAccountType: Premium_LRS
      lun: 0
      caching: ReadWrite

  # Tags
  tags:
    environment: production
    team: data-platform
    managed-by: karpenter

3.4 NodePool for AMD64/Intel (If ARM64 Not Suitable)

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: e6data-compute-amd64
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.azure.com/sku-family
          operator: In
          values:
            - Dsv5     # Intel general purpose
            - Esv5     # Intel memory-optimized
            - Ddsv5    # Intel with local SSD
            - Edsv5    # Intel memory-optimized with local SSD
            - Dasv5    # AMD general purpose
            - Easv5    # AMD memory-optimized
        - key: karpenter.azure.com/sku-cpu
          operator: In
          values:
            - "16"
            - "32"
            - "48"
            - "64"
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]

      taints:
        - key: e6data-workspace-name
          value: "prod"
          effect: NoSchedule

      nodeClassRef:
        group: karpenter.azure.com
        kind: AKSNodeClass
        name: e6data-amd64

  limits:
    cpu: 1000
    memory: 4000Gi

4. Alternative: AKS Node Auto-Provisioning

AKS offers built-in Node Auto-Provisioning (NAP), a managed, Karpenter-based alternative to self-hosting Karpenter.

4.1 Enable Node Auto-Provisioning

az aks update \
  --resource-group YOUR_RG \
  --name YOUR_CLUSTER \
  --node-provisioning-mode Auto

4.2 NAP vs Karpenter

Feature          NAP             Karpenter
Management       Azure-managed   Self-managed
ARM64 support    Yes             Yes
Customization    Limited         Full
Spot support     Yes             Yes
Multi-AZ         Automatic       Configurable
Learning curve   Lower           Higher

Recommendation: Use Karpenter for more control, NAP for simpler deployments.


5. Spot Instance Configuration

Azure Spot VMs can reduce compute costs by up to 90% compared to pay-as-you-go pricing.

5.1 Configure Spot Priority

In the NodePool spec:

spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]  # Prefer spot, fallback to on-demand

5.2 Spot Eviction Handling

The e6data operator automatically adds spot toleration:

tolerations:
  - key: kubernetes.azure.com/scalesetpriority
    value: spot
    effect: NoSchedule
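
To confirm where nodes actually landed, check the capacity-type labels on the provisioned nodes; a quick check for Karpenter-managed nodes:

# Karpenter records the capacity type; AKS labels spot nodes with scalesetpriority=spot
kubectl get nodes \
  -L karpenter.sh/capacity-type \
  -L kubernetes.azure.com/scalesetpriority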

6. Verification

6.1 Test Workload Identity

# Create a test pod using the annotated ServiceAccount.
# kubectl run no longer supports --serviceaccount, so set it via --overrides;
# the azure.workload.identity/use label is required for credential injection.
kubectl run test-wi --rm -it --restart=Never \
  --namespace=workspace-prod \
  --image=mcr.microsoft.com/azure-cli \
  --overrides='{"apiVersion":"v1","metadata":{"labels":{"azure.workload.identity/use":"true"}},"spec":{"serviceAccountName":"analytics-prod"}}' \
  -- sh -c 'az login --federated-token "$(cat $AZURE_FEDERATED_TOKEN_FILE)" \
       --service-principal -u "$AZURE_CLIENT_ID" -t "$AZURE_TENANT_ID" && az account show'

# Expected: shows the managed identity's subscription and tenant context

6.2 Test Blob Storage Access

kubectl run test-storage --rm -it --restart=Never \
  --namespace=workspace-prod \
  --image=mcr.microsoft.com/azure-cli \
  --overrides='{"apiVersion":"v1","metadata":{"labels":{"azure.workload.identity/use":"true"}},"spec":{"serviceAccountName":"analytics-prod"}}' \
  -- sh -c 'az login --federated-token "$(cat $AZURE_FEDERATED_TOKEN_FILE)" \
       --service-principal -u "$AZURE_CLIENT_ID" -t "$AZURE_TENANT_ID" && \
     az storage blob list \
       --account-name YOUR_STORAGE_ACCOUNT \
       --container-name YOUR_CONTAINER \
       --auth-mode login \
       --num-results 5'

6.3 Verify Karpenter

# Check Karpenter pods
kubectl get pods -n karpenter

# Check NodePools
kubectl get nodepools

# Check AKSNodeClasses
kubectl get aksnodeclasses

# Watch provisioned nodes
kubectl get nodes -l karpenter.sh/nodepool=e6data-compute -w

7. Best Practices

7.1 Security

  • Workload Identity: Always use Workload Identity (never use cluster identity)
  • Least privilege: Only grant required storage containers and scopes
  • Private endpoints: Use private endpoints for storage accounts in production (see the sketch after this list)
  • Managed identities: Use separate identities for workspace vs GreptimeDB
  • Azure Policy: Enable Azure Policy for AKS compliance
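
A sketch of the private-endpoint recommendation (VNet, subnet, and endpoint names are placeholders; DNS wiring differs per environment):

# Create a private endpoint for the storage account's blob service
az network private-endpoint create \
  --name e6data-storage-pe \
  --resource-group YOUR_RG \
  --vnet-name YOUR_VNET \
  --subnet YOUR_SUBNET \
  --private-connection-resource-id "${STORAGE_ACCOUNT_ID}" \
  --group-id blob \
  --connection-name e6data-storage-pe-conn

# A privatelink.blob.core.windows.net private DNS zone linked to the VNet is also
# needed so the blob hostname resolves to the private IP.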

7.2 Cost Optimization

  • ARM64 (Ampere Altra): 20-40% cheaper than comparable x86 instances
  • Spot VMs: Use for fault-tolerant workloads (up to 90% savings)
  • SKU families: Let Karpenter choose optimal size within family
  • Reserved instances: For predictable workloads
  • Auto-scaling: Enable the cluster autoscaler or Karpenter (example below)
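
If you are not using Karpenter or NAP, the built-in cluster autoscaler can be enabled per node pool; a minimal example, assuming an existing pool named nodepool1:

# Enable the cluster autoscaler on an existing AKS node pool
az aks nodepool update \
  --resource-group YOUR_RG \
  --cluster-name YOUR_CLUSTER \
  --name nodepool1 \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 10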

7.3 Performance

  • VM size: Use 16 vCPU or larger for query executors
  • Memory-optimized: Use E-series for large datasets
  • Premium storage: Use Premium SSD for OS disk
  • Ephemeral OS disk: Consider for stateless workloads (example below)
  • Proximity placement groups: For latency-sensitive workloads
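
A sketch of the ephemeral OS disk recommendation for a static node pool (pool name and VM size are placeholders; the VM's local/cache disk must be at least as large as the OS disk):

# Create a node pool that uses an ephemeral OS disk
az aks nodepool add \
  --resource-group YOUR_RG \
  --cluster-name YOUR_CLUSTER \
  --name e6ephemeral \
  --node-vm-size Standard_D16ds_v5 \
  --node-osdisk-type Ephemeral \
  --node-osdisk-size 128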

7.4 High Availability

  • Availability Zones: Deploy across multiple AZs (example below)
  • Regional clusters: Use regional clusters for zone redundancy
  • Zone-redundant storage: Use ZRS for storage accounts
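
A sketch of zone spreading for a static node pool (zone numbers depend on the region; for Karpenter-provisioned nodes, add a topology.kubernetes.io/zone requirement to the NodePool instead):

# Spread a node pool across availability zones 1-3
az aks nodepool add \
  --resource-group YOUR_RG \
  --cluster-name YOUR_CLUSTER \
  --name e6zonal \
  --node-count 3 \
  --zones 1 2 3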