Azure AKS Prerequisites

This guide covers all Azure-specific prerequisites for deploying the e6data Kubernetes Operator on Azure Kubernetes Service (AKS).


Quick Reference

Requirement          Status        Notes
AKS 1.24+            Required      Kubernetes cluster
Workload Identity    Required      For Azure AD authentication
Azure Blob Storage   Required      Data lake storage
Azure RBAC           Required      Least-privilege access
Karpenter 1.0+       Recommended   Dynamic node provisioning (ARM64 Ampere Altra)
Azure Synapse        Optional      If using Synapse metastore
Databricks Unity     Optional      If using Unity Catalog

1. Workload Identity Setup

Azure Workload Identity is the recommended authentication method for AKS workloads.

1.1 Enable Workload Identity on AKS Cluster

# For new cluster
az aks create \
  --resource-group YOUR_RG \
  --name YOUR_CLUSTER \
  --enable-oidc-issuer \
  --enable-workload-identity \
  --location eastus

# For existing cluster
az aks update \
  --resource-group YOUR_RG \
  --name YOUR_CLUSTER \
  --enable-oidc-issuer \
  --enable-workload-identity

1.2 Get OIDC Issuer URL

export AKS_OIDC_ISSUER=$(az aks show \
  --resource-group YOUR_RG \
  --name YOUR_CLUSTER \
  --query "oidcIssuerProfile.issuerUrl" \
  --output tsv)

echo "OIDC Issuer: $AKS_OIDC_ISSUER"

1.3 Create Managed Identity

# Create managed identity
az identity create \
  --name e6data-workspace-identity \
  --resource-group YOUR_RG \
  --location eastus

# Get identity details
export IDENTITY_CLIENT_ID=$(az identity show \
  --name e6data-workspace-identity \
  --resource-group YOUR_RG \
  --query "clientId" \
  --output tsv)

export IDENTITY_PRINCIPAL_ID=$(az identity show \
  --name e6data-workspace-identity \
  --resource-group YOUR_RG \
  --query "principalId" \
  --output tsv)

echo "Client ID: $IDENTITY_CLIENT_ID"
echo "Principal ID: $IDENTITY_PRINCIPAL_ID"

1.4 Create Federated Credential

# Create federated credential for Kubernetes service account
az identity federated-credential create \
  --name e6data-workspace-fedcred \
  --identity-name e6data-workspace-identity \
  --resource-group YOUR_RG \
  --issuer "${AKS_OIDC_ISSUER}" \
  --subject "system:serviceaccount:workspace-prod:analytics-prod" \
  --audience "api://AzureADTokenExchange"

For multiple namespaces:

# Create federated credentials for each namespace
for NS in workspace-prod workspace-staging workspace-dev; do
  SA_NAME="analytics-${NS#workspace-}"
  az identity federated-credential create \
    --name "e6data-${NS}-fedcred" \
    --identity-name e6data-workspace-identity \
    --resource-group YOUR_RG \
    --issuer "${AKS_OIDC_ISSUER}" \
    --subject "system:serviceaccount:${NS}:${SA_NAME}" \
    --audience "api://AzureADTokenExchange"
done

1.5 Create and Annotate Kubernetes ServiceAccount

apiVersion: v1
kind: ServiceAccount
metadata:
  name: analytics-prod
  namespace: workspace-prod
  annotations:
    azure.workload.identity/client-id: "${IDENTITY_CLIENT_ID}"
  labels:
    azure.workload.identity/use: "true"

Apply with kubectl:

kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: analytics-prod
  namespace: workspace-prod
  annotations:
    azure.workload.identity/client-id: "${IDENTITY_CLIENT_ID}"
  labels:
    azure.workload.identity/use: "true"
EOF
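
To confirm the pieces line up before moving on, you can list the federated credentials on the identity and inspect the ServiceAccount; a quick check using the names from the steps above:

# The subject should read system:serviceaccount:workspace-prod:analytics-prod
az identity federated-credential list \
  --identity-name e6data-workspace-identity \
  --resource-group YOUR_RG \
  --query "[].{name:name, subject:subject}" \
  --output table

# The ServiceAccount should carry the client-id annotation and the use label
kubectl get serviceaccount analytics-prod -n workspace-prod -o yaml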

2. Azure RBAC (Least Privilege)

2.1 Storage Blob Access (Required)

Create custom role for blob storage:

# Create custom role definition
cat > e6data-storage-role.json <<EOF
{
  "Name": "e6data Storage Access",
  "Description": "Least privilege blob storage access for e6data workloads",
  "Actions": [],
  "NotActions": [],
  "DataActions": [
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/move/action",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/add/action"
  ],
  "NotDataActions": [],
  "AssignableScopes": [
    "/subscriptions/YOUR_SUBSCRIPTION_ID"
  ]
}
EOF

# Create the custom role
az role definition create --role-definition e6data-storage-role.json
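
If you use the custom role instead of the built-in roles shown in 2.2, bind it to the managed identity; a sketch, assuming IDENTITY_PRINCIPAL_ID from step 1.3 and STORAGE_ACCOUNT_ID as exported in 2.2:

# Assign the custom role at storage-account scope
az role assignment create \
  --role "e6data Storage Access" \
  --assignee "${IDENTITY_PRINCIPAL_ID}" \
  --scope "${STORAGE_ACCOUNT_ID}"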

2.2 Assign Storage Roles

export STORAGE_ACCOUNT_ID=$(az storage account show \
  --name YOUR_STORAGE_ACCOUNT \
  --resource-group YOUR_RG \
  --query "id" \
  --output tsv)

# Grant Storage Blob Data Reader for data access
az role assignment create \
  --role "Storage Blob Data Reader" \
  --assignee "${IDENTITY_PRINCIPAL_ID}" \
  --scope "${STORAGE_ACCOUNT_ID}"

# Grant Storage Blob Data Contributor for cache/metadata writes
# Scope to specific containers for least privilege
az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee "${IDENTITY_PRINCIPAL_ID}" \
  --scope "${STORAGE_ACCOUNT_ID}/blobServices/default/containers/e6data-cache"

az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee "${IDENTITY_PRINCIPAL_ID}" \
  --scope "${STORAGE_ACCOUNT_ID}/blobServices/default/containers/e6data-metadata"

2.3 Storage Access Policy JSON (For Terraform/ARM)

Create e6data-storage-policy.json:

{
  "properties": {
    "roleName": "e6data Storage Access",
    "description": "Least privilege storage access for e6data workloads",
    "type": "CustomRole",
    "permissions": [
      {
        "actions": [],
        "notActions": [],
        "dataActions": [
          "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
          "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write",
          "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete"
        ],
        "notDataActions": []
      }
    ],
    "assignableScopes": [
      "/subscriptions/YOUR_SUBSCRIPTION_ID/resourceGroups/YOUR_RG/providers/Microsoft.Storage/storageAccounts/YOUR_STORAGE_ACCOUNT"
    ]
  }
}

2.4 Azure Synapse/SQL Access (If Using Azure Metastore)

# Grant Synapse SQL Administrator on the workspace (Synapse RBAC role, if using Synapse)
az synapse role assignment create \
  --workspace-name YOUR_SYNAPSE_WORKSPACE \
  --role "Synapse SQL Administrator" \
  --assignee "${IDENTITY_PRINCIPAL_ID}"

# Or grant the narrower Synapse Artifact User role for read-oriented access
az synapse role assignment create \
  --workspace-name YOUR_SYNAPSE_WORKSPACE \
  --role "Synapse Artifact User" \
  --assignee "${IDENTITY_PRINCIPAL_ID}"

2.5 Databricks Unity Catalog Access (If Using Unity Catalog)

For Unity Catalog, you need to configure access in Databricks workspace:

# The managed identity's client ID doubles as its application ID.
# Confirm it via the identity's service principal (managed identities have no app registration):
export APP_ID=$(az ad sp show --id $IDENTITY_CLIENT_ID --query appId -o tsv)

# In Databricks, add the service principal
# This is typically done through Databricks UI or API:
# 1. Go to Admin Console > Service Principals
# 2. Add service principal with Application ID
# 3. Grant access to Unity Catalog

Unity Catalog permissions (set in Databricks):

  • USE CATALOG on the catalog
  • USE SCHEMA on schemas
  • SELECT on tables
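
As a hedged sketch (DATABRICKS_HOST, DATABRICKS_TOKEN, and the catalog/schema/table names below are placeholders), the service principal can be registered through the Databricks SCIM API and then granted the privileges listed above with SQL:

# Register the managed identity's application (client) ID as a Databricks service principal
curl -X POST "${DATABRICKS_HOST}/api/2.0/preview/scim/v2/ServicePrincipals" \
  -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d "{
    \"schemas\": [\"urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal\"],
    \"applicationId\": \"${APP_ID}\",
    \"displayName\": \"e6data-workspace-identity\"
  }"

# Then grant Unity Catalog privileges from a Databricks SQL warehouse or notebook:
#   GRANT USE CATALOG ON CATALOG your_catalog TO `${APP_ID}`;
#   GRANT USE SCHEMA ON SCHEMA your_catalog.your_schema TO `${APP_ID}`;
#   GRANT SELECT ON TABLE your_catalog.your_schema.your_table TO `${APP_ID}`;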

2.6 GreptimeDB Storage Access

# Create separate identity for GreptimeDB
az identity create \
  --name e6data-greptime-identity \
  --resource-group YOUR_RG \
  --location eastus

GREPTIME_PRINCIPAL_ID=$(az identity show \
  --name e6data-greptime-identity \
  --resource-group YOUR_RG \
  --query "principalId" \
  --output tsv)

# Grant full blob access to GreptimeDB container
az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee "${GREPTIME_PRINCIPAL_ID}" \
  --scope "${STORAGE_ACCOUNT_ID}/blobServices/default/containers/greptime-data"

2.7 Complete IAM Setup Script

#!/bin/bash
set -e

SUBSCRIPTION_ID="your-subscription-id"
RESOURCE_GROUP="your-rg"
CLUSTER_NAME="your-cluster"
STORAGE_ACCOUNT="your-storage-account"
LOCATION="eastus"

# Get OIDC issuer
AKS_OIDC_ISSUER=$(az aks show \
  --resource-group $RESOURCE_GROUP \
  --name $CLUSTER_NAME \
  --query "oidcIssuerProfile.issuerUrl" \
  --output tsv)

# Create managed identity
az identity create \
  --name e6data-workspace-identity \
  --resource-group $RESOURCE_GROUP \
  --location $LOCATION

IDENTITY_CLIENT_ID=$(az identity show \
  --name e6data-workspace-identity \
  --resource-group $RESOURCE_GROUP \
  --query "clientId" \
  --output tsv)

IDENTITY_PRINCIPAL_ID=$(az identity show \
  --name e6data-workspace-identity \
  --resource-group $RESOURCE_GROUP \
  --query "principalId" \
  --output tsv)

# Create federated credential
az identity federated-credential create \
  --name e6data-workspace-fedcred \
  --identity-name e6data-workspace-identity \
  --resource-group $RESOURCE_GROUP \
  --issuer "${AKS_OIDC_ISSUER}" \
  --subject "system:serviceaccount:workspace-prod:analytics-prod" \
  --audience "api://AzureADTokenExchange"

# Get storage account ID
STORAGE_ACCOUNT_ID=$(az storage account show \
  --name $STORAGE_ACCOUNT \
  --resource-group $RESOURCE_GROUP \
  --query "id" \
  --output tsv)

# Grant storage access
az role assignment create \
  --role "Storage Blob Data Reader" \
  --assignee "${IDENTITY_PRINCIPAL_ID}" \
  --scope "${STORAGE_ACCOUNT_ID}"

echo "Setup complete. Create Kubernetes SA with:"
echo "  annotation: azure.workload.identity/client-id: $IDENTITY_CLIENT_ID"
echo "  label: azure.workload.identity/use: true"

3. Karpenter Setup (Recommended)

Azure supports Karpenter for dynamic node provisioning on AKS.

3.1 Install Karpenter on AKS

export KARPENTER_VERSION="1.0.0"
export CLUSTER_NAME="your-cluster"
export RESOURCE_GROUP="your-rg"
export SUBSCRIPTION_ID="your-subscription-id"
export LOCATION="eastus"

# Install Karpenter for Azure (karpenter-provider-azure) from Microsoft's OCI chart registry
helm upgrade --install karpenter oci://mcr.microsoft.com/aks/karpenter/karpenter \
  --namespace karpenter --create-namespace \
  --version ${KARPENTER_VERSION} \
  --set settings.azure.clusterName=${CLUSTER_NAME} \
  --set settings.azure.resourceGroup=${RESOURCE_GROUP} \
  --set settings.azure.subscriptionID=${SUBSCRIPTION_ID} \
  --wait

3.2 NodePool for ARM64 (Ampere Altra)

Azure offers Dpsv5 and Epsv5 series VMs with Ampere Altra ARM64 processors:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: e6data-compute
spec:
  template:
    metadata:
      labels:
        e6data.io/node-type: compute
    spec:
      requirements:
        # Prefer ARM64 (Ampere Altra) for cost savings
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.azure.com/sku-family
          operator: In
          values:
            - Dpsv5    # ARM64 general purpose (best price/performance)
            - Epsv5    # ARM64 memory-optimized
            - Dpdsv5   # ARM64 with local SSD
            - Epdsv5   # ARM64 memory-optimized with local SSD
        - key: karpenter.azure.com/sku-cpu
          operator: In
          values:
            - "16"
            - "32"
            - "48"
            - "64"
        # Capacity type
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]

      # Taints for workload isolation
      taints:
        - key: e6data-workspace-name
          value: "prod"
          effect: NoSchedule
        # Azure spot instance toleration (auto-added by e6 operator)
        - key: kubernetes.azure.com/scalesetpriority
          value: spot
          effect: NoSchedule

      # Node expiry
      expireAfter: 720h  # 30 days

      nodeClassRef:
        group: karpenter.azure.com
        kind: AKSNodeClass
        name: e6data-arm64

  # Resource limits
  limits:
    cpu: 2000
    memory: 8000Gi

  # Disruption settings
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5m
    budgets:
      - nodes: "10%"

3.3 AKSNodeClass

apiVersion: karpenter.azure.com/v1alpha1
kind: AKSNodeClass
metadata:
  name: e6data-arm64
spec:
  # Image reference
  imageFamily: Ubuntu2204

  # Network configuration
  vnetSubnetID: /subscriptions/SUBSCRIPTION_ID/resourceGroups/YOUR_RG/providers/Microsoft.Network/virtualNetworks/YOUR_VNET/subnets/YOUR_SUBNET

  # OS disk configuration
  osDiskSizeGB: 128
  osDiskType: Managed
  osDiskStorageAccountType: Premium_LRS

  # Data disk for caching (optional)
  dataDisks:
    - diskSizeGB: 256
      storageAccountType: Premium_LRS
      lun: 0
      caching: ReadWrite

  # Tags
  tags:
    environment: production
    team: data-platform
    managed-by: karpenter

3.4 NodePool for AMD64/Intel (If ARM64 Not Suitable)

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: e6data-compute-amd64
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.azure.com/sku-family
          operator: In
          values:
            - Dsv5     # Intel general purpose
            - Esv5     # Intel memory-optimized
            - Ddsv5    # Intel with local SSD
            - Edsv5    # Intel memory-optimized with local SSD
            - Dasv5    # AMD general purpose
            - Easv5    # AMD memory-optimized
        - key: karpenter.azure.com/sku-cpu
          operator: In
          values:
            - "16"
            - "32"
            - "48"
            - "64"
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]

      taints:
        - key: e6data-workspace-name
          value: "prod"
          effect: NoSchedule

      nodeClassRef:
        group: karpenter.azure.com
        kind: AKSNodeClass
        name: e6data-amd64

  limits:
    cpu: 1000
    memory: 4000Gi

4. Alternative: AKS Node Auto-Provisioning

AKS offers built-in Node Auto-Provisioning (NAP), a managed, Karpenter-based alternative to self-hosting Karpenter.

4.1 Enable Node Auto-Provisioning

az aks update \
  --resource-group YOUR_RG \
  --name YOUR_CLUSTER \
  --node-provisioning-mode Auto

4.2 NAP vs Karpenter

Feature          NAP             Karpenter
Management       Azure-managed   Self-managed
ARM64 support    Yes             Yes
Customization    Limited         Full
Spot support     Yes             Yes
Multi-AZ         Automatic       Configurable
Learning curve   Lower           Higher

Recommendation: Use Karpenter for more control, NAP for simpler deployments.


5. Spot Instance Configuration

Azure Spot VMs can reduce compute costs by up to 90% compared to pay-as-you-go pricing.

5.1 Configure Spot Priority

In the NodePool spec:

spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]  # Prefer spot, fallback to on-demand

5.2 Spot Eviction Handling

The e6data operator automatically adds spot toleration:

tolerations:
  - key: kubernetes.azure.com/scalesetpriority
    value: spot
    effect: NoSchedule
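
To confirm where nodes actually landed, check the capacity-type labels on the provisioned nodes; a quick check for Karpenter-managed nodes:

# Karpenter records the capacity type; AKS labels spot nodes with scalesetpriority=spot
kubectl get nodes \
  -L karpenter.sh/capacity-type \
  -L kubernetes.azure.com/scalesetpriority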

6. Verification

6.1 Test Workload Identity

# Create a test pod using the annotated ServiceAccount.
# kubectl run no longer supports --serviceaccount, so set it via --overrides;
# the azure.workload.identity/use label is required for credential injection.
kubectl run test-wi --rm -it --restart=Never \
  --namespace=workspace-prod \
  --image=mcr.microsoft.com/azure-cli \
  --overrides='{"apiVersion":"v1","metadata":{"labels":{"azure.workload.identity/use":"true"}},"spec":{"serviceAccountName":"analytics-prod"}}' \
  -- sh -c 'az login --federated-token "$(cat $AZURE_FEDERATED_TOKEN_FILE)" \
       --service-principal -u "$AZURE_CLIENT_ID" -t "$AZURE_TENANT_ID" && az account show'

# Expected: shows the managed identity's subscription and tenant context

6.2 Test Blob Storage Access

kubectl run test-storage --rm -it --restart=Never \
  --namespace=workspace-prod \
  --image=mcr.microsoft.com/azure-cli \
  --overrides='{"apiVersion":"v1","metadata":{"labels":{"azure.workload.identity/use":"true"}},"spec":{"serviceAccountName":"analytics-prod"}}' \
  -- sh -c 'az login --federated-token "$(cat $AZURE_FEDERATED_TOKEN_FILE)" \
       --service-principal -u "$AZURE_CLIENT_ID" -t "$AZURE_TENANT_ID" && \
     az storage blob list \
       --account-name YOUR_STORAGE_ACCOUNT \
       --container-name YOUR_CONTAINER \
       --auth-mode login \
       --num-results 5'

6.3 Verify Karpenter

# Check Karpenter pods
kubectl get pods -n karpenter

# Check NodePools
kubectl get nodepools

# Check AKSNodeClasses
kubectl get aksnodeclasses

# Watch provisioned nodes
kubectl get nodes -l karpenter.sh/nodepool=e6data-compute -w

7. Best Practices

7.1 Security

  • Workload Identity: Always use Workload Identity (never use cluster identity)
  • Least privilege: Only grant required storage containers and scopes
  • Private endpoints: Use private endpoints for storage accounts in production (see the sketch after this list)
  • Managed identities: Use separate identities for workspace vs GreptimeDB
  • Azure Policy: Enable Azure Policy for AKS compliance
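
A sketch of the private-endpoint recommendation (VNet, subnet, and endpoint names are placeholders; DNS wiring differs per environment):

# Create a private endpoint for the storage account's blob service
az network private-endpoint create \
  --name e6data-storage-pe \
  --resource-group YOUR_RG \
  --vnet-name YOUR_VNET \
  --subnet YOUR_SUBNET \
  --private-connection-resource-id "${STORAGE_ACCOUNT_ID}" \
  --group-id blob \
  --connection-name e6data-storage-pe-conn

# A privatelink.blob.core.windows.net private DNS zone linked to the VNet is also
# needed so the blob hostname resolves to the private IP.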

7.2 Cost Optimization

  • ARM64 (Ampere Altra): 20-40% cheaper than comparable x86 instances
  • Spot VMs: Use for fault-tolerant workloads (up to 90% savings)
  • SKU families: Let Karpenter choose optimal size within family
  • Reserved instances: For predictable workloads
  • Auto-scaling: Enable the cluster autoscaler or Karpenter (example below)
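
If you are not using Karpenter or NAP, the built-in cluster autoscaler can be enabled per node pool; a minimal example, assuming an existing pool named nodepool1:

# Enable the cluster autoscaler on an existing AKS node pool
az aks nodepool update \
  --resource-group YOUR_RG \
  --cluster-name YOUR_CLUSTER \
  --name nodepool1 \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 10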

7.3 Performance

  • VM size: Use 16 vCPU or larger for query executors
  • Memory-optimized: Use E-series for large datasets
  • Premium storage: Use Premium SSD for OS disk
  • Ephemeral OS disk: Consider for stateless workloads (example below)
  • Proximity placement groups: For latency-sensitive workloads
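
A sketch of the ephemeral OS disk recommendation for a static node pool (pool name and VM size are placeholders; the VM's local/cache disk must be at least as large as the OS disk):

# Create a node pool that uses an ephemeral OS disk
az aks nodepool add \
  --resource-group YOUR_RG \
  --cluster-name YOUR_CLUSTER \
  --name e6ephemeral \
  --node-vm-size Standard_D16ds_v5 \
  --node-osdisk-type Ephemeral \
  --node-osdisk-size 128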

7.4 High Availability

  • Availability Zones: Deploy across multiple AZs (example below)
  • Regional clusters: Use regional clusters for zone redundancy
  • Zone-redundant storage: Use ZRS for storage accounts
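
A sketch of zone spreading for a static node pool (zone numbers depend on the region; for Karpenter-provisioned nodes, add a topology.kubernetes.io/zone requirement to the NodePool instead):

# Spread a node pool across availability zones 1-3
az aks nodepool add \
  --resource-group YOUR_RG \
  --cluster-name YOUR_CLUSTER \
  --name e6zonal \
  --node-count 3 \
  --zones 1 2 3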