Azure AKS Prerequisites¶
This guide covers all Azure-specific prerequisites for deploying the e6data Kubernetes Operator on Azure Kubernetes Service (AKS).
Quick Reference¶
| Requirement | Status | Notes |
|---|---|---|
| AKS 1.24+ | Required | Kubernetes cluster |
| Workload Identity | Required | For Azure AD authentication |
| Azure Blob Storage | Required | Data lake storage |
| Azure RBAC | Required | Least-privilege access |
| Karpenter 1.0+ | Recommended | Dynamic node provisioning (ARM64 Ampere Altra) |
| Azure Synapse | Optional | If using Synapse metastore |
| Databricks Unity | Optional | If using Unity Catalog |
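The commands in this guide assume an authenticated Azure CLI session on the intended subscription; a quick pre-flight check (the subscription ID is a placeholder):
# Confirm the active subscription before running any setup commands
az account show --query "{name:name, id:id}" --output table
# Switch if necessary
az account set --subscription YOUR_SUBSCRIPTION_ID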
1. Workload Identity Setup¶
Azure Workload Identity is the recommended authentication method for AKS workloads.
1.1 Enable Workload Identity on AKS Cluster¶
# For new cluster
az aks create \
--resource-group YOUR_RG \
--name YOUR_CLUSTER \
--enable-oidc-issuer \
--enable-workload-identity \
--location eastus
# For existing cluster
az aks update \
--resource-group YOUR_RG \
--name YOUR_CLUSTER \
--enable-oidc-issuer \
--enable-workload-identity
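To confirm both features took effect, query the cluster profile (the field paths below reflect current az CLI output and may differ on older CLI versions):
az aks show \
  --resource-group YOUR_RG \
  --name YOUR_CLUSTER \
  --query "{oidcEnabled: oidcIssuerProfile.enabled, workloadIdentity: securityProfile.workloadIdentity.enabled}" \
  --output table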
1.2 Get OIDC Issuer URL¶
export AKS_OIDC_ISSUER=$(az aks show \
--resource-group YOUR_RG \
--name YOUR_CLUSTER \
--query "oidcIssuerProfile.issuerUrl" \
--output tsv)
echo "OIDC Issuer: $AKS_OIDC_ISSUER"
1.3 Create Managed Identity¶
# Create managed identity
az identity create \
--name e6data-workspace-identity \
--resource-group YOUR_RG \
--location eastus
# Get identity details
export IDENTITY_CLIENT_ID=$(az identity show \
--name e6data-workspace-identity \
--resource-group YOUR_RG \
--query "clientId" \
--output tsv)
export IDENTITY_PRINCIPAL_ID=$(az identity show \
--name e6data-workspace-identity \
--resource-group YOUR_RG \
--query "principalId" \
--output tsv)
echo "Client ID: $IDENTITY_CLIENT_ID"
echo "Principal ID: $IDENTITY_PRINCIPAL_ID"
1.4 Create Federated Credential¶
# Create federated credential for Kubernetes service account
az identity federated-credential create \
--name e6data-workspace-fedcred \
--identity-name e6data-workspace-identity \
--resource-group YOUR_RG \
--issuer "${AKS_OIDC_ISSUER}" \
--subject "system:serviceaccount:workspace-prod:analytics-prod" \
--audience "api://AzureADTokenExchange"
For multiple namespaces:
# Create federated credentials for each namespace
for NS in workspace-prod workspace-staging workspace-dev; do
SA_NAME="analytics-${NS#workspace-}"
az identity federated-credential create \
--name "e6data-${NS}-fedcred" \
--identity-name e6data-workspace-identity \
--resource-group YOUR_RG \
--issuer "${AKS_OIDC_ISSUER}" \
--subject "system:serviceaccount:${NS}:${SA_NAME}" \
--audience "api://AzureADTokenExchange"
done
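Verify the federated credentials were created:
az identity federated-credential list \
  --identity-name e6data-workspace-identity \
  --resource-group YOUR_RG \
  --query "[].{name:name, subject:subject}" \
  --output table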
1.5 Create and Annotate Kubernetes ServiceAccount¶
apiVersion: v1
kind: ServiceAccount
metadata:
name: analytics-prod
namespace: workspace-prod
annotations:
azure.workload.identity/client-id: "${IDENTITY_CLIENT_ID}"
labels:
azure.workload.identity/use: "true"
Apply with kubectl:
kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
name: analytics-prod
namespace: workspace-prod
annotations:
azure.workload.identity/client-id: "${IDENTITY_CLIENT_ID}"
labels:
azure.workload.identity/use: "true"
EOF
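Confirm the annotation and label landed:
kubectl get serviceaccount analytics-prod \
  --namespace workspace-prod \
  --output yaml
# Expect the azure.workload.identity/client-id annotation
# and the azure.workload.identity/use=true label in the output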
2. Azure RBAC (Least Privilege)¶
2.1 Storage Blob Access (Required)¶
Create custom role for blob storage:
# Create custom role definition
cat > e6data-storage-role.json <<EOF
{
"Name": "e6data Storage Access",
"Description": "Least privilege blob storage access for e6data workloads",
"Actions": [],
"NotActions": [],
"DataActions": [
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write",
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete",
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/move/action",
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/add/action"
],
"NotDataActions": [],
"AssignableScopes": [
"/subscriptions/YOUR_SUBSCRIPTION_ID"
]
}
EOF
# Create the custom role
az role definition create --role-definition @e6data-storage-role.json
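Confirm the role definition exists:
az role definition list \
  --name "e6data Storage Access" \
  --query "[].roleName" \
  --output tsv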
2.2 Assign Storage Roles¶
export STORAGE_ACCOUNT_ID=$(az storage account show \
--name YOUR_STORAGE_ACCOUNT \
--resource-group YOUR_RG \
--query "id" \
--output tsv)
# Grant Storage Blob Data Reader for data access
az role assignment create \
--role "Storage Blob Data Reader" \
--assignee "${IDENTITY_PRINCIPAL_ID}" \
--scope "${STORAGE_ACCOUNT_ID}"
# Grant Storage Blob Data Contributor for cache/metadata writes
# Scope to specific containers for least privilege
az role assignment create \
--role "Storage Blob Data Contributor" \
--assignee "${IDENTITY_PRINCIPAL_ID}" \
--scope "${STORAGE_ACCOUNT_ID}/blobServices/default/containers/e6data-cache"
az role assignment create \
--role "Storage Blob Data Contributor" \
--assignee "${IDENTITY_PRINCIPAL_ID}" \
--scope "${STORAGE_ACCOUNT_ID}/blobServices/default/containers/e6data-metadata"
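Before moving on, review everything the identity has been granted:
# List role assignments for the workspace identity across all scopes
az role assignment list \
  --assignee "${IDENTITY_PRINCIPAL_ID}" \
  --all \
  --output table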
2.3 Storage Access Policy JSON (For Terraform/ARM)¶
Create e6data-storage-policy.json:
{
"properties": {
"roleName": "e6data Storage Access",
"description": "Least privilege storage access for e6data workloads",
"type": "CustomRole",
"permissions": [
{
"actions": [],
"notActions": [],
"dataActions": [
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write",
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete"
],
"notDataActions": []
}
],
"assignableScopes": [
"/subscriptions/YOUR_SUBSCRIPTION_ID/resourceGroups/YOUR_RG/providers/Microsoft.Storage/storageAccounts/YOUR_STORAGE_ACCOUNT"
]
}
}
2.4 Azure Synapse/SQL Access (If Using Azure Metastore)¶
# Synapse RBAC roles are assigned with az synapse, not az role assignment
# Grant Synapse SQL Administrator (if using Synapse)
az synapse role assignment create \
  --workspace-name YOUR_SYNAPSE_WORKSPACE \
  --role "Synapse SQL Administrator" \
  --assignee "${IDENTITY_PRINCIPAL_ID}"
# Or grant Synapse Artifact User for more limited access
az synapse role assignment create \
  --workspace-name YOUR_SYNAPSE_WORKSPACE \
  --role "Synapse Artifact User" \
  --assignee "${IDENTITY_PRINCIPAL_ID}"
2.5 Databricks Unity Catalog Access (If Using Unity Catalog)¶
For Unity Catalog, you need to configure access in the Databricks workspace:
# A managed identity has a service principal but no app registration, so use
# az ad sp show (not az ad app show); its clientId is the Application ID Databricks needs
export APP_ID=$(az ad sp show --id $IDENTITY_CLIENT_ID --query appId -o tsv)
# In Databricks, add the service principal
# This is typically done through Databricks UI or API:
# 1. Go to Admin Console > Service Principals
# 2. Add service principal with Application ID
# 3. Grant access to Unity Catalog
Unity Catalog permissions (set in Databricks):
- USE CATALOG on the catalog
- USE SCHEMA on schemas
- SELECT on tables
2.6 GreptimeDB Storage Access¶
# Create separate identity for GreptimeDB
az identity create \
--name e6data-greptime-identity \
--resource-group YOUR_RG \
--location eastus
GREPTIME_PRINCIPAL_ID=$(az identity show \
--name e6data-greptime-identity \
--resource-group YOUR_RG \
--query "principalId" \
--output tsv)
# Grant full blob access to GreptimeDB container
az role assignment create \
--role "Storage Blob Data Contributor" \
--assignee "${GREPTIME_PRINCIPAL_ID}" \
--scope "${STORAGE_ACCOUNT_ID}/blobServices/default/containers/greptime-data"
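Like the workspace identity, this identity needs a federated credential before pods can use it. A sketch, assuming GreptimeDB runs under the greptime ServiceAccount in the greptime namespace (adjust both to match your deployment):
az identity federated-credential create \
  --name e6data-greptime-fedcred \
  --identity-name e6data-greptime-identity \
  --resource-group YOUR_RG \
  --issuer "${AKS_OIDC_ISSUER}" \
  --subject "system:serviceaccount:greptime:greptime" \
  --audience "api://AzureADTokenExchange"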
2.7 Complete IAM Setup Script¶
#!/bin/bash
set -e
SUBSCRIPTION_ID="your-subscription-id"
RESOURCE_GROUP="your-rg"
CLUSTER_NAME="your-cluster"
STORAGE_ACCOUNT="your-storage-account"
LOCATION="eastus"
# Get OIDC issuer
AKS_OIDC_ISSUER=$(az aks show \
--resource-group $RESOURCE_GROUP \
--name $CLUSTER_NAME \
--query "oidcIssuerProfile.issuerUrl" \
--output tsv)
# Create managed identity
az identity create \
--name e6data-workspace-identity \
--resource-group $RESOURCE_GROUP \
--location $LOCATION
IDENTITY_CLIENT_ID=$(az identity show \
--name e6data-workspace-identity \
--resource-group $RESOURCE_GROUP \
--query "clientId" \
--output tsv)
IDENTITY_PRINCIPAL_ID=$(az identity show \
--name e6data-workspace-identity \
--resource-group $RESOURCE_GROUP \
--query "principalId" \
--output tsv)
# Create federated credential
az identity federated-credential create \
--name e6data-workspace-fedcred \
--identity-name e6data-workspace-identity \
--resource-group $RESOURCE_GROUP \
--issuer "${AKS_OIDC_ISSUER}" \
--subject "system:serviceaccount:workspace-prod:analytics-prod" \
--audience "api://AzureADTokenExchange"
# Get storage account ID
STORAGE_ACCOUNT_ID=$(az storage account show \
--name $STORAGE_ACCOUNT \
--resource-group $RESOURCE_GROUP \
--query "id" \
--output tsv)
# Grant storage access
az role assignment create \
--role "Storage Blob Data Reader" \
--assignee "${IDENTITY_PRINCIPAL_ID}" \
--scope "${STORAGE_ACCOUNT_ID}"
echo "Setup complete. Create Kubernetes SA with:"
echo " annotation: azure.workload.identity/client-id: $IDENTITY_CLIENT_ID"
echo " label: azure.workload.identity/use: true"
3. Karpenter Setup (Recommended)¶
AKS supports Karpenter for dynamic node provisioning through the open-source Azure provider (karpenter-provider-azure).
3.1 Install Karpenter on AKS¶
export KARPENTER_VERSION="1.0.0"
export CLUSTER_NAME="your-cluster"
export RESOURCE_GROUP="your-rg"
export SUBSCRIPTION_ID="your-subscription-id"
export LOCATION="eastus"
# Install Karpenter from the Azure provider's OCI registry
# (charts.karpenter.sh is the deprecated AWS repo; check the
# karpenter-provider-azure docs for the values your chart version expects)
helm upgrade --install karpenter oci://mcr.microsoft.com/aks/karpenter/karpenter \
  --namespace karpenter --create-namespace \
  --version ${KARPENTER_VERSION} \
  --set settings.azure.clusterName=${CLUSTER_NAME} \
  --set settings.azure.resourceGroup=${RESOURCE_GROUP} \
  --set settings.azure.subscriptionID=${SUBSCRIPTION_ID} \
  --wait
3.2 NodePool for e6data (ARM64 Ampere Altra - Recommended)¶
Azure offers Dpsv5 and Epsv5 series with Ampere Altra ARM64 processors:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
name: e6data-compute
spec:
template:
metadata:
labels:
e6data.io/node-type: compute
spec:
requirements:
# Prefer ARM64 (Ampere Altra) for cost savings
- key: kubernetes.io/arch
operator: In
values: ["arm64"]
- key: kubernetes.io/os
operator: In
values: ["linux"]
- key: karpenter.azure.com/sku-family
operator: In
values:
- Dpsv5 # ARM64 general purpose (best price/performance)
- Epsv5 # ARM64 memory-optimized
- Dpdsv5 # ARM64 with local SSD
- Epdsv5 # ARM64 memory-optimized with local SSD
- key: karpenter.azure.com/sku-cpu
operator: In
values:
- "16"
- "32"
- "48"
- "64"
# Capacity type
- key: karpenter.sh/capacity-type
operator: In
values: ["spot", "on-demand"]
# Taints for workload isolation
taints:
- key: e6data-workspace-name
value: "prod"
effect: NoSchedule
# Azure spot instance toleration (added automatically by the e6data operator)
- key: kubernetes.azure.com/scalesetpriority
value: spot
effect: NoSchedule
# Node expiry
expireAfter: 720h # 30 days
nodeClassRef:
group: karpenter.azure.com
kind: AKSNodeClass
name: e6data-arm64
# Resource limits
limits:
cpu: 2000
memory: 8000Gi
# Disruption settings
disruption:
consolidationPolicy: WhenEmptyOrUnderutilized
consolidateAfter: 5m
budgets:
- nodes: "10%"
3.3 AKSNodeClass¶
apiVersion: karpenter.azure.com/v1alpha1
kind: AKSNodeClass
metadata:
name: e6data-arm64
spec:
# Image reference
imageFamily: Ubuntu2204
# Network configuration
vnetSubnetID: /subscriptions/SUBSCRIPTION_ID/resourceGroups/YOUR_RG/providers/Microsoft.Network/virtualNetworks/YOUR_VNET/subnets/YOUR_SUBNET
# OS disk configuration
osDiskSizeGB: 128
osDiskType: Managed
osDiskStorageAccountType: Premium_LRS
# Data disk for caching (optional)
dataDisks:
- diskSizeGB: 256
storageAccountType: Premium_LRS
lun: 0
caching: ReadWrite
# Tags
tags:
environment: production
team: data-platform
managed-by: karpenter
3.4 NodePool for AMD64/Intel (If ARM64 Not Suitable)¶
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
name: e6data-compute-amd64
spec:
template:
spec:
requirements:
- key: kubernetes.io/arch
operator: In
values: ["amd64"]
- key: kubernetes.io/os
operator: In
values: ["linux"]
- key: karpenter.azure.com/sku-family
operator: In
values:
- Dsv5 # Intel general purpose
- Esv5 # Intel memory-optimized
- Ddsv5 # Intel with local SSD
- Edsv5 # Intel memory-optimized with local SSD
- Dasv5 # AMD general purpose
- Easv5 # AMD memory-optimized
- key: karpenter.azure.com/sku-cpu
operator: In
values:
- "16"
- "32"
- "48"
- "64"
- key: karpenter.sh/capacity-type
operator: In
values: ["spot", "on-demand"]
taints:
- key: e6data-workspace-name
value: "prod"
effect: NoSchedule
nodeClassRef:
group: karpenter.azure.com
kind: AKSNodeClass
name: e6data-amd64
limits:
cpu: 1000
memory: 4000Gi
4. Alternative: AKS Node Auto-Provisioning¶
AKS offers built-in Node Auto-Provisioning (NAP), a managed alternative that is itself built on Karpenter.
4.1 Enable Node Auto-Provisioning¶
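Enable NAP on an existing cluster (on older az CLI versions this flag sits behind the aks-preview extension):
az aks update \
  --resource-group YOUR_RG \
  --name YOUR_CLUSTER \
  --node-provisioning-mode Auto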
4.2 NAP vs Karpenter¶
| Feature | NAP | Karpenter |
|---|---|---|
| Management | Azure-managed | Self-managed |
| ARM64 support | Yes | Yes |
| Customization | Limited | Full |
| Spot support | Yes | Yes |
| Multi-AZ | Automatic | Configurable |
| Learning curve | Lower | Higher |
Recommendation: Use Karpenter for more control, NAP for simpler deployments.
5. Spot Instance Configuration¶
Azure Spot VMs can reduce costs by up to 90% compared with pay-as-you-go pricing, in exchange for possible eviction when Azure reclaims capacity.
5.1 Configure Spot Priority¶
In the NodePool spec:
spec:
template:
spec:
requirements:
- key: karpenter.sh/capacity-type
operator: In
values: ["spot", "on-demand"] # Prefer spot, fallback to on-demand
5.2 Spot Eviction Handling¶
The e6data operator automatically adds the spot toleration to workload pods; it is the same toleration shown in the NodePool above:
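tolerations:
- key: kubernetes.azure.com/scalesetpriority
  operator: Equal
  value: spot
  effect: NoSchedule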
6. Verification¶
6.1 Test Workload Identity¶
# Create a test pod (kubectl 1.24+ removed --serviceaccount; use --overrides)
kubectl run test-wi --rm -it --restart=Never \
  --namespace=workspace-prod \
  --labels=azure.workload.identity/use=true \
  --overrides='{"apiVersion":"v1","spec":{"serviceAccountName":"analytics-prod"}}' \
  --image=mcr.microsoft.com/azure-cli \
  -- bash -c 'az login --service-principal \
       -u $AZURE_CLIENT_ID -t $AZURE_TENANT_ID \
       --federated-token "$(cat $AZURE_FEDERATED_TOKEN_FILE)" \
       && az account show'
# Expected: subscription details for the managed identity
6.2 Test Blob Storage Access¶
kubectl run test-storage --rm -it --restart=Never \
  --namespace=workspace-prod \
  --labels=azure.workload.identity/use=true \
  --overrides='{"apiVersion":"v1","spec":{"serviceAccountName":"analytics-prod"}}' \
  --image=mcr.microsoft.com/azure-cli \
  -- bash -c 'az login --service-principal \
       -u $AZURE_CLIENT_ID -t $AZURE_TENANT_ID \
       --federated-token "$(cat $AZURE_FEDERATED_TOKEN_FILE)" \
       && az storage blob list \
            --account-name YOUR_STORAGE_ACCOUNT \
            --container-name YOUR_CONTAINER \
            --auth-mode login \
            --num-results 5'
6.3 Verify Karpenter¶
# Check Karpenter pods
kubectl get pods -n karpenter
# Check NodePools
kubectl get nodepools
# Check AKSNodeClasses
kubectl get aksnodeclasses
# Watch provisioned nodes
kubectl get nodes -l karpenter.sh/nodepool=e6data-compute -w
7. Best Practices¶
7.1 Security¶
- Workload Identity: Always use Workload Identity (never use cluster identity)
- Least privilege: Only grant required storage containers and scopes
- Private endpoints: Use private endpoints for storage accounts in production (sketch below)
- Managed identities: Use separate identities for workspace vs GreptimeDB
- Azure Policy: Enable Azure Policy for AKS compliance
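A minimal sketch of the private-endpoint recommendation above, assuming an existing VNet and subnet (the endpoint and VNet names are placeholders, and the required Private DNS zone wiring is omitted):
az network private-endpoint create \
  --name e6data-storage-pe \
  --resource-group YOUR_RG \
  --vnet-name YOUR_VNET \
  --subnet YOUR_SUBNET \
  --private-connection-resource-id "${STORAGE_ACCOUNT_ID}" \
  --group-id blob \
  --connection-name e6data-storage-pe-conn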
7.2 Cost Optimization¶
- ARM64 (Ampere Altra): 20-40% cheaper than comparable x86 instances
- Spot VMs: Use for fault-tolerant workloads (up to 90% savings)
- SKU families: Let Karpenter choose optimal size within family
- Reserved instances: For predictable workloads
- Auto-scaling: Enable cluster autoscaler or Karpenter
7.3 Performance¶
- VM size: Use 16 vCPU or larger for query executors
- Memory-optimized: Use E-series for large datasets
- Premium storage: Use Premium SSD for OS disk
- Ephemeral OS disk: Consider for stateless workloads
- Proximity placement groups: For latency-sensitive workloads
7.4 High Availability¶
- Availability Zones: Deploy across multiple AZs
- Regional clusters: Use regional clusters for zone redundancy
- Zone-redundant storage: Use ZRS for storage accounts (example below)
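A sketch of the zone-redundant storage recommendation (the account name is a placeholder; pick the ZRS SKU that matches your performance tier):
az storage account create \
  --name YOUR_STORAGE_ACCOUNT \
  --resource-group YOUR_RG \
  --location eastus \
  --sku Standard_ZRS \
  --kind StorageV2 \
  --min-tls-version TLS1_2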