MetadataServices¶
API Version: e6data.io/v1alpha1 Kind: MetadataServices Short Names: mds, metadata
1. Purpose¶
MetadataServices manages the storage service and schema service components of the e6data analytics platform. These services are responsible for:
- Storage Service: Handles table metadata caching, partition discovery, and data file location resolution across cloud object stores (S3, GCS, Azure Blob)
- Schema Service: Provides schema inference, column statistics, and metadata for query optimization
Create a MetadataServices resource when setting up a new e6data workspace. This is typically the first CRD you deploy after NamespaceConfig, as QueryService and E6Catalog depend on it.
Note: Infrastructure settings (cloud, storage backend, tolerations, node selectors, image pull secrets) are now managed by NamespaceConfig. MetadataServices inherits these settings automatically.
2. High-level Behavior¶
When you create a MetadataServices CR, the operator:
- Inherits infrastructure settings from NamespaceConfig in the same namespace
- Creates ConfigMaps with auto-populated configuration variables (CLOUD, WORKSPACE, E6_BUCKET, etc.)
- Deploys Storage Service (primary, and optionally secondary for HA)
- Deploys Schema Service for schema inference
- Creates Services (ClusterIP) for internal access
- Implements Blue-Green deployment for zero-downtime updates
- Tracks release history (last 10 releases) for rollback support
Prerequisites¶
- NamespaceConfig must exist in the same namespace (provides cloud, storage backend, scheduling config)
Child Resources Created¶
| Resource Type | Name Pattern | Purpose |
|---|---|---|
| Deployment | {name}-storage-{blue\|green} | Storage service pods |
| Deployment | {name}-storage-secondary-{blue\|green} | Secondary storage (if HA enabled) |
| Deployment | {name}-schema-{blue\|green} | Schema service pods |
| ConfigMap | {name}-storage-config-{blue\|green} | Storage config.properties |
| ConfigMap | {name}-schema-config-{blue\|green} | Schema config.properties |
| ConfigMap | {name}-common-config | Active strategy routing |
| Secret | {name}-common-secret | Shared secrets |
| Service | {name}-storage | Storage service endpoint |
| Service | {name}-storage-secondary | Secondary storage endpoint |
| Service | {name}-schema | Schema service endpoint |
| ServiceAccount | {workspace} | Pod identity (if autoCreateRBAC) |
External Dependencies¶
- Object Storage: S3, GCS, or Azure Blob bucket (specified in storageBackend)
- IAM/Workload Identity: Service account with read access to the data lake
- Kubernetes: 1.24+ recommended
3. Spec Reference¶
3.1 Top-level Fields¶
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| workspace | string | No | CR name | Workspace name (used for namespacing and node scheduling) |
| tenant | string | Yes | - | Tenant identifier (customer/organization ID) |
| releaseVersion | string | No | Auto-generated | Version identifier for tracking releases |
| storage | StorageSpec | No | See defaults | Storage service configuration |
| schema | SchemaSpec | No | See defaults | Schema service configuration |
| podAnnotations | map[string]string | No | {} | Annotations for all pods (Prometheus scraping, etc.) |
| governance | GovernanceSpec | No | disabled | Data governance configuration |
Inherited from NamespaceConfig¶
The following fields are inherited from NamespaceConfig and are no longer specified in MetadataServices:
| Field | Description |
|---|---|
| cloud | Cloud provider (AWS, GCP, AZURE) |
| storageBackend | Object storage path (s3a://, gs://, abfs://) |
| s3Endpoint | Custom S3 endpoint for S3-compatible storage |
| imageRepository | Container registry path |
| imagePullSecrets | Secrets for private registries |
| tolerations | Pod tolerations for scheduling |
| nodeSelector | Node labels for pod placement |
| affinity | Advanced scheduling rules |
| karpenterNodePool | Karpenter NodePool name |
| serviceAccount | ServiceAccount for pods (via serviceAccounts.data) |
3.2 Storage (StorageSpec)¶
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| imageTag | string | Yes | - | Image tag/version (e.g., 3.0.217) |
| replicas | int32 | No | 1 | Number of storage pods |
| resources | ResourceSpec | No | - | CPU/Memory limits |
| ports | PortSpec | No | See defaults | Service ports |
| environmentVariables | map[string]string | No | {} | Container environment variables |
| configVariables | map[string]string | No | {} | config.properties entries |
| ha | HASpec | No | disabled | High availability (secondary storage) |
Auto-populated Environment Variables:
- IS_KUBE=true
- POD_NAME, POD_IP, NAMESPACE (from pod metadata)
- JAVA_TOOL_OPTIONS (auto-calculated Xmx/Xms at 80% of memory)

Auto-populated Config Variables:
- CLOUD, ALIAS, WORKSPACE, E6_BUCKET
- STORAGE_SERVICE_HOST, SCHEMA_SERVICE_HOST
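To make the mappings concrete, the rendered config.properties for a CR named analytics-prod might look roughly like this (the values below are illustrative assumptions, not captured operator output):

```properties
# Auto-populated by the operator (illustrative)
CLOUD=AWS
ALIAS=analytics-prod
WORKSPACE=analytics-prod
E6_BUCKET=s3a://acme-data-lake
STORAGE_SERVICE_HOST=analytics-prod-storage
SCHEMA_SERVICE_HOST=analytics-prod-schema
# User-supplied entries from spec.storage.configVariables are appended
ENABLE_TABLES_BACKGROUND_REFRESH=true
```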
3.3 Schema (SchemaSpec)¶
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| imageTag | string | Yes | - | Image tag/version |
| replicas | int32 | No | 1 | Number of schema pods |
| resources | ResourceSpec | No | 30Gi memory, 16 CPU | CPU/Memory limits |
| ports | PortSpec | No | See defaults | Service ports |
| environmentVariables | map[string]string | No | {} | Container environment variables |
| configVariables | map[string]string | No | {} | config.properties entries |
3.4 ResourceSpec¶
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| memory | string | Yes | - | Memory limit (e.g., 8Gi, 30Gi) |
| cpu | string | Yes | - | CPU limit (e.g., 2, 16) |
3.5 PortSpec¶
| Field | Type | Default | Description |
|---|---|---|---|
| thrift | int32 | 9005 (storage), 9006 (schema) | Thrift RPC port |
| web | int32 | 8081 | HTTP API port |
| metrics | int32 | 9090 | Prometheus metrics port |
3.6 Governance (GovernanceSpec)¶
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| enabled | bool | No | false | Enable governance integration |
| provider | string | No | ranger | Provider: ranger, unity, lakeformation |
| policyPath | string | No | - | Path in bucket for Ranger policies |
| unity | UnityGovernanceSpec | No | - | Unity Catalog settings |
| lakeFormation | LakeFormationGovernanceSpec | No | - | AWS Lake Formation settings |
| filtering | FilteringSpec | No | all enabled | Catalog/schema/table/column filtering |
| queryRewriting | QueryRewritingSpec | No | enabled | Row-level filtering and column masking |
3.7 HA (HASpec)¶
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| enabled | bool | No | false | Deploy secondary storage for HA |
| replicas | int32 | No | Primary replicas | Override replica count |
| resources | ResourceSpec | No | Primary resources | Override resources |
3.8 ConfigVariables Reference¶
ConfigVariables are written to a config.properties file mounted in the container. These configure storage and schema service behavior.
Auto-Populated Variables (Do Not Override)¶
The operator automatically sets these values. Specifying them in configVariables will be ignored:
| Variable | Auto-Value | Description |
|---|---|---|
| CLOUD | NamespaceConfig cloud (or auto-detected) | Cloud provider (AWS/GCP/AZURE) |
| ALIAS | spec.workspace | Alias (same as workspace) |
| WORKSPACE | spec.workspace | Workspace name |
| E6_BUCKET | NamespaceConfig storageBackend | Object storage path |
| STORAGE_SERVICE_HOST | {name}-storage | Storage service hostname |
| SCHEMA_SERVICE_HOST | {name}-schema | Schema service hostname |
Common Storage ConfigVariables¶
| Variable | Type | Default | Description |
|---|---|---|---|
| ENABLE_TABLES_BACKGROUND_REFRESH | bool | true | Enable background table refresh |
| BACKGROUND_REFRESH_TABLE_ACCESS_WINDOW_MINUTES | int | 60 | Window for recently accessed tables |
| MAX_TABLES_TO_REFRESH | int | 100 | Max tables per background refresh cycle |
| REFRESH_TIMEOUT_SECONDS | int | 1800 | Refresh operation timeout (30 min) |
| ENABLE_RANGER_AUTH | bool | false | Enable Ranger authorization |
| ENABLE_SCHEMA_AUTHZ | bool | false | Enable schema authorization |
| PERMISSIONS_REFRESH_INTERVAL_SECONDS | int | 30 | Permissions cache refresh interval |
| DELTA_READER_THREADPOOL_SIZE | int | 1000 | Delta reader thread pool size |
| DELTA_SKIP_TABLE_UUID_CHECK | bool | true | Skip Delta table UUID validation |
| DELTA_TABLE_PARTITION_SOFT_REFRESH_DURATION_SECONDS | int | 300 | Delta partition soft refresh |
| ENABLE_ICEBERG_POSITIONAL_DELETES | bool | false | Support Iceberg positional deletes |
| INITIALIZE_TABLES_WITH_PARTITIONS_ON_STARTUP | bool | true | Load partitions on startup |
| IS_128BIT_NUMERIC_SUPPORTED | bool | true | Support 128-bit decimals |
| ENABLE_V2 | bool | true | Enable V2 API |
| FETCH_PERMISSION_FROM_UNITY_CATALOG | bool | false | Fetch permissions from Unity |
Common Schema ConfigVariables¶
| Variable | Type | Default | Description |
|---|---|---|---|
| SCHEMA_CACHE_TTL_SECONDS | int | 3600 | Schema cache time-to-live |
| MAX_SCHEMA_CACHE_SIZE | int | 10000 | Maximum schemas in cache |
| ENABLE_COLUMN_STATISTICS | bool | true | Collect column statistics |
| STATISTICS_SAMPLE_ROWS | int | 10000 | Rows to sample for statistics |
3.9 EnvironmentVariables Reference¶
EnvironmentVariables are set as container environment variables.
Auto-Populated Variables (Do Not Override)¶
The operator automatically sets these values from pod metadata:
| Variable | Auto-Value | Description |
|---|---|---|
| IS_KUBE | "true" | Indicates Kubernetes environment |
| POD_NAME | Pod metadata | Current pod name |
| POD_IP | Pod status | Current pod IP |
| NAMESPACE | Pod metadata | Current namespace |
| JAVA_TOOL_OPTIONS | Auto-calculated | JVM options (80% of memory) |
Note on JAVA_TOOL_OPTIONS: The operator automatically calculates JVM heap settings based on container memory:
- Xmx and Xms are set to 80% of resources.memory
- Example: for 30Gi memory → -Xmx24G -Xms24G
- Also includes -Djava.io.tmpdir=/tmp, OOM exit settings, G1GC config, and the JMX agent
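The sizing rule can be sketched in a few lines of Python. This is a hypothetical reimplementation of the arithmetic for illustration, not the operator's actual code:

```python
import re

def java_heap_opts(memory: str, fraction: float = 0.8) -> str:
    """Approximate the operator's heap sizing: Xmx/Xms at 80% of the
    container memory limit (illustrative reimplementation)."""
    match = re.fullmatch(r"(\d+)(Gi|Mi)", memory)
    if not match:
        raise ValueError(f"unsupported memory format: {memory}")
    value, unit = int(match.group(1)), match.group(2)
    mebibytes = value * 1024 if unit == "Gi" else value
    heap_mb = int(mebibytes * fraction)
    # Express whole gibibytes as G for readability, otherwise use M
    heap = f"{heap_mb // 1024}G" if heap_mb % 1024 == 0 else f"{heap_mb}M"
    return f"-Xmx{heap} -Xms{heap}"

print(java_heap_opts("30Gi"))  # -Xmx24G -Xms24G
```

For the 30Gi example above, 30 × 1024 MiB × 0.8 = 24576 MiB, which is exactly 24G.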
Common EnvironmentVariables¶
| Variable | Type | Default | Description |
|---|---|---|---|
| E6_LOGGING_LEVEL | string | E6_INFO | Log level: E6_DEBUG, E6_INFO, E6_WARN, E6_ERROR |
| LOG_FORMAT | string | json | Log format: json, text |
| TZ | string | UTC | Timezone |
| ENABLE_JMX | bool | true | Enable JMX metrics |
Example: Custom Configuration¶
```yaml
storage:
  imageTag: "3.0.217"
  resources:
    memory: "30Gi"
    cpu: "16"
  configVariables:
    ENABLE_TABLES_BACKGROUND_REFRESH: "true"
    BACKGROUND_REFRESH_TABLE_ACCESS_WINDOW_MINUTES: "120"
    MAX_TABLES_TO_REFRESH: "200"
    ENABLE_RANGER_AUTH: "true"
    ENABLE_SCHEMA_AUTHZ: "true"
  environmentVariables:
    E6_LOGGING_LEVEL: "E6_DEBUG"
schema:
  imageTag: "3.0.217"
  resources:
    memory: "30Gi"
    cpu: "16"
  configVariables:
    SCHEMA_CACHE_TTL_SECONDS: "7200"
    ENABLE_COLUMN_STATISTICS: "true"
  environmentVariables:
    E6_LOGGING_LEVEL: "E6_INFO"
```
4. Example Manifests¶
Important: Before creating MetadataServices, ensure a NamespaceConfig exists in the same namespace with cloud, storage backend, and scheduling settings.
4.1 Minimal Example¶
```yaml
# First, create NamespaceConfig
apiVersion: e6data.io/v1alpha1
kind: NamespaceConfig
metadata:
  name: config
  namespace: workspace-analytics-prod
spec:
  storageBackend: s3a://acme-data-lake
---
# Then, create MetadataServices
apiVersion: e6data.io/v1alpha1
kind: MetadataServices
metadata:
  name: analytics-prod
  namespace: workspace-analytics-prod
spec:
  workspace: analytics-prod
  tenant: acme-corp
  storage:
    imageTag: "3.0.217"
    resources:
      memory: "8Gi"
      cpu: "4"
  schema:
    imageTag: "3.0.217"
    resources:
      memory: "16Gi"
      cpu: "8"
```
4.2 Production Example (Full Configuration)¶
```yaml
# NamespaceConfig with all infrastructure settings
apiVersion: e6data.io/v1alpha1
kind: NamespaceConfig
metadata:
  name: config
  namespace: workspace-analytics-prod
spec:
  cloud: AWS
  storageBackend: s3a://acme-data-lake-prod
  imageRepository: us-docker.pkg.dev/e6data-analytics/e6-engine
  imagePullSecrets:
    - e6data-registry-secret
  serviceAccounts:
    data: analytics-prod-sa
  karpenterNodePool: metadata-services
  tolerations:
    - key: "e6data-workspace-name"
      operator: "Equal"
      value: "analytics-prod"
      effect: "NoSchedule"
  nodeSelector:
    e6data-workspace-name: analytics-prod
---
# MetadataServices - now simplified (inherits from NamespaceConfig)
apiVersion: e6data.io/v1alpha1
kind: MetadataServices
metadata:
  name: analytics-prod
  namespace: workspace-analytics-prod
  labels:
    e6data.io/workspace: analytics-prod
    e6data.io/environment: production
spec:
  workspace: analytics-prod
  tenant: acme-corp
  # Storage service configuration
  storage:
    imageTag: "3.0.217"
    replicas: 2
    resources:
      memory: "30Gi"
      cpu: "16"
    ports:
      thrift: 9005
      web: 8081
      metrics: 9090
    environmentVariables:
      E6_LOGGING_LEVEL: "E6_INFO"
    configVariables:
      ENABLE_TABLES_BACKGROUND_REFRESH: "true"
      BACKGROUND_REFRESH_TABLE_ACCESS_WINDOW_MINUTES: "60"
      MAX_TABLES_TO_REFRESH: "100"
      REFRESH_TIMEOUT_SECONDS: "1800"
    # High availability with secondary storage
    ha:
      enabled: true
      replicas: 2
      resources:
        memory: "30Gi"
        cpu: "16"
  # Schema service configuration
  schema:
    imageTag: "3.0.217"
    replicas: 2
    resources:
      memory: "30Gi"
      cpu: "16"
  # Governance configuration
  governance:
    enabled: true
    provider: ranger
    policyPath: "governance/policies"
    filtering:
      catalog: true
      schema: true
      table: true
      column: true
    queryRewriting:
      enabled: true
  # Pod annotations for Prometheus scraping
  podAnnotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8081"
    prometheus.io/path: "/metrics"
```
4.3 S3-Compatible Storage (Linode/Wasabi/MinIO)¶
```yaml
# NamespaceConfig for S3-compatible storage
apiVersion: e6data.io/v1alpha1
kind: NamespaceConfig
metadata:
  name: config
  namespace: workspace-analytics
spec:
  cloud: AWS  # Use AWS for S3-compatible storage
  storageBackend: s3a://my-bucket
  s3Endpoint: https://us-east-1.linodeobjects.com
---
apiVersion: e6data.io/v1alpha1
kind: MetadataServices
metadata:
  name: analytics-linode
  namespace: workspace-analytics
spec:
  workspace: analytics-linode
  tenant: startup-corp
  storage:
    imageTag: "3.0.217"
    resources:
      memory: "8Gi"
      cpu: "4"
  schema:
    imageTag: "3.0.217"
    resources:
      memory: "16Gi"
      cpu: "8"
```
5. Status & Lifecycle¶
5.1 Status Fields¶
| Field | Type | Description |
|---|---|---|
| phase | string | Current lifecycle phase |
| message | string | Human-readable status message |
| ready | bool | true when all components are ready |
| storageDeployment | DeploymentStatus | Storage deployment status |
| secondaryStorageDeployment | DeploymentStatus | Secondary storage status (if HA enabled) |
| schemaDeployment | DeploymentStatus | Schema deployment status |
| observedGeneration | int64 | Last observed spec generation |
| activeStrategy | string | Current active deployment (blue or green) |
| activeReleaseVersion | string | Currently running version |
| pendingStrategy | string | Deployment being prepared |
| deploymentPhase | string | Blue-green phase: Stable, Deploying, Switching, Draining, Cleanup |
| releaseHistory | []ReleaseRecord | Last 10 releases for rollback |
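For orientation, the status of a healthy resource might look roughly like this (an illustrative sketch with assumed values, not captured operator output):

```yaml
status:
  phase: Running
  message: "All components healthy"
  ready: true
  observedGeneration: 4
  activeStrategy: blue
  activeReleaseVersion: "3.0.217"
  deploymentPhase: Stable
  storageDeployment:
    readyReplicas: 2
  schemaDeployment:
    readyReplicas: 2
```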
5.2 Phase Values¶
| Phase | Description |
|---|---|
| Pending | Resource created, waiting to reconcile |
| Creating | Initial deployment in progress |
| Running | All components healthy and serving |
| Updating | Blue-green update in progress |
| Failed | Deployment failed (check conditions) |
| Terminating | Deletion in progress |
| Degraded | Partially healthy (some pods unhealthy) |
5.3 Deployment Phases (Blue-Green)¶
| Phase | Description |
|---|---|
| Stable | Single strategy active, no changes pending |
| Deploying | New strategy being deployed |
| Switching | Traffic switching to new strategy |
| Cleanup | Old strategy resources being removed |
5.4 Conditions¶
| Type | Description |
|---|---|
| Ready | All deployments are ready |
| StorageReady | Storage service is healthy |
| SchemaReady | Schema service is healthy |
| SecondaryStorageReady | Secondary storage is healthy (if HA) |
| Progressing | Reconciliation in progress |
| Available | At least one pod is available |
6. Related Resources¶
Dependencies¶
| CRD | Relationship |
|---|---|
| NamespaceConfig | Required - provides cloud, storage, scheduling configuration |
CRDs that Reference MetadataServices¶
| CRD | Reference Field | Relationship |
|---|---|---|
| E6Catalog | spec.metadataServicesRef | Discovers storage service endpoint |
| QueryService | Same namespace | Uses same NamespaceConfig settings |
Labels Applied to Child Resources¶
```
app.kubernetes.io/name: {storage|schema|storage-secondary}
app.kubernetes.io/instance: {cr-name}
app.kubernetes.io/component: {storage|schema}
app.kubernetes.io/managed-by: e6-operator
e6data.io/workspace: {workspace}
e6data.io/strategy: {blue|green}
```
7. Troubleshooting¶
7.1 Common Issues¶
Storage Service CrashLoopBackOff¶
Symptoms:
```shell
$ kubectl get pods -l app.kubernetes.io/name=storage
NAME                              READY   STATUS             RESTARTS
analytics-prod-storage-blue-xxx   0/1     CrashLoopBackOff   5
```
Possible Causes:
1. Invalid storageBackend path
2. Missing IAM permissions for S3/GCS/Azure
3. Incorrect s3Endpoint for S3-compatible storage
4. Java heap too large for container memory
Suggested Checks:
```shell
# Check pod logs
kubectl logs -l app.kubernetes.io/name=storage --tail=100

# Verify storage backend access
kubectl exec -it analytics-prod-storage-blue-xxx -- aws s3 ls s3://bucket

# Check Java options
kubectl get cm analytics-prod-storage-config-blue -o yaml | grep JAVA_TOOL_OPTIONS
```
Pods Stuck in Pending¶
Symptoms: Pods remain in Pending and are never scheduled.
Possible Causes:
1. Insufficient cluster resources
2. NodeSelector/tolerations don't match any nodes
3. Karpenter provisioner not ready
Suggested Checks:
```shell
# Check pod events
kubectl describe pod analytics-prod-storage-blue-xxx

# Check node availability
kubectl get nodes -l e6data-workspace-name=analytics-prod

# Check Karpenter provisioner
kubectl get nodepools
```
Blue-Green Stuck in Deploying¶
Symptoms: status.deploymentPhase remains Deploying and pendingStrategy never clears.
Possible Causes:
1. New deployment pods failing health checks
2. Insufficient resources for new strategy
3. Image pull failures
Suggested Checks:
```shell
# Check both strategies
kubectl get deploy -l e6data.io/workspace=analytics-prod

# Check pending strategy pods
kubectl get pods -l e6data.io/strategy=green

# Force rollback via annotation (if needed)
kubectl annotate metadataservices analytics-prod e6data.io/rollback-to=previous
```
7.2 Useful Commands¶
```shell
# Get MetadataServices status
kubectl get mds analytics-prod -o yaml

# Watch deployment progress
kubectl get mds -w

# Check all resources created by operator
kubectl get all -l app.kubernetes.io/instance=analytics-prod

# View release history
kubectl get mds analytics-prod -o jsonpath='{.status.releaseHistory[*].version}'

# Trigger manual rollback
kubectl annotate mds analytics-prod e6data.io/rollback-to=v1.0.0

# Check operator logs
kubectl logs -n e6-operator-system -l app=e6-operator --tail=200
```
8. Validation Webhooks¶
MetadataServices enforces 30+ validation checks at apply time, including:
| Check | Error Message |
|---|---|
| Missing workspace | spec.workspace is required |
| Missing tenant | spec.tenant is required |
| Invalid cloud | spec.cloud must be AWS, GCP, or AZURE |
| Invalid storageBackend | spec.storageBackend must start with s3a://, gs://, or abfs:// |
| StorageBackend/cloud mismatch | spec.storageBackend (s3a://) requires cloud=AWS |
| Missing imageTag | spec.storage.imageTag is required |
| "latest" tag in production | WARNING: Using "latest" tag is not recommended |
| Minimum resources | memory must be at least 1Gi |
| Port conflicts | thrift, web, and metrics ports must be different |
| Immutable fields on update | spec.workspace cannot be changed |
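For example, a manifest like the following would be rejected at apply time. This is a hypothetical example constructed to trip two of the checks above; the inline comments note which rule each problem violates:

```yaml
apiVersion: e6data.io/v1alpha1
kind: MetadataServices
metadata:
  name: bad-example
  namespace: workspace-analytics
spec:
  workspace: bad-example
  # tenant omitted -> "spec.tenant is required"
  storage:
    imageTag: "3.0.217"
    ports:
      thrift: 8081
      web: 8081  # -> "thrift, web, and metrics ports must be different"
```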