
Status Fields and Diagnostics Guide

This guide explains how to interpret status fields, phases, and diagnostics across all e6data CRDs.


Quick Reference: kubectl Commands

# List all resources with status
kubectl get mds,qs,e6cat,pool -n workspace-prod

# Detailed status for a resource
kubectl describe mds my-metadata -n workspace-prod

# Get specific status field
kubectl get mds my-metadata -n workspace-prod -o jsonpath='{.status.phase}'

# Watch status changes in real-time
kubectl get mds -n workspace-prod -w

# Get full status as YAML
kubectl get mds my-metadata -n workspace-prod -o yaml | yq '.status'

MetadataServices Status

Phase Values

| Phase       | Description                                 | Action                    |
|-------------|---------------------------------------------|---------------------------|
| Pending     | CR created, waiting to start reconciliation | Wait for operator         |
| Creating    | First deployment in progress                | Wait ~2-5 minutes         |
| Running     | All components healthy and serving          | Normal operation          |
| Updating    | Blue-green deployment in progress           | Wait ~2-5 minutes         |
| Failed      | Deployment failed (pods not starting)       | Check pod logs            |
| Degraded    | Partial failure (some pods unhealthy)       | Check specific component  |
| Terminating | Being deleted, cleanup in progress          | Wait for finalizer        |
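
A quick way to scan where each MetadataServices resource sits in this lifecycle is a custom-columns listing of phase, readiness, and active strategy. This is a minimal sketch using the status fields documented in the next section; resource and namespace names are the placeholders used throughout this guide.

# List MetadataServices with phase, readiness, and active strategy
kubectl get mds -n workspace-prod \
  -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,READY:.status.ready,ACTIVE:.status.activeStrategy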

Status Fields Explained

status:
  phase: Running              # Current lifecycle phase
  ready: true                 # Overall readiness (all components healthy)
  message: "All services running"  # Human-readable status message

  # Blue-green deployment tracking
  activeStrategy: blue        # Currently serving traffic (blue or green)
  pendingStrategy: ""         # Strategy being deployed (empty when stable)
  deploymentPhase: Stable     # Stable|Deploying|Switching|Draining|Cleanup
  activeReleaseVersion: "v1.0.462"  # Current active version

  # Per-component status
  storageDeployment:
    name: my-metadata-storage-blue
    ready: true
    replicas: 2
    readyReplicas: 2          # Should equal replicas when healthy

  secondaryStorageDeployment:  # Only if HA enabled
    name: my-metadata-secondary-storage-blue
    ready: true
    replicas: 1
    readyReplicas: 1

  schemaDeployment:
    name: my-metadata-schema-blue
    ready: true
    replicas: 1
    readyReplicas: 1

  # Release history (last 10 deployments)
  releaseHistory:
    - version: "v1.0.462"
      strategy: blue
      storageTag: "1.0.462-4730d5a"
      schemaTag: "1.0.562-5a58ed2"
      timestamp: "2024-12-09T10:30:00Z"
      status: Active          # Active|Superseded|Failed
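
To inspect the blue-green bookkeeping without reading the full object, the fields above can be pulled out with jsonpath or jq. A sketch, assuming the field paths shown in the example above:

# Current active strategy and deployment phase
kubectl get mds my-metadata -n workspace-prod \
  -o jsonpath='{.status.activeStrategy}{"\t"}{.status.deploymentPhase}{"\n"}'

# Release history as JSON (last 10 deployments)
kubectl get mds my-metadata -n workspace-prod -o json | jq '.status.releaseHistory'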

Diagnosing Issues

# Check which pods are unhealthy
kubectl get pods -l app.kubernetes.io/instance=my-metadata -n workspace-prod

# Check pod events
kubectl describe pod my-metadata-storage-blue-xxx -n workspace-prod

# Check container logs
kubectl logs my-metadata-storage-blue-xxx -n workspace-prod

# Check whether readiness probes are failing
kubectl get pods -n workspace-prod -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].ready}{"\n"}{end}'

QueryService Status

Phase Values

| Phase     | Description                              | Action                   |
|-----------|------------------------------------------|--------------------------|
| Waiting   | Waiting for MetadataServices to be ready | Check MDS status         |
| Deploying | Initial deployment or update in progress | Wait ~3-5 minutes        |
| Ready     | All components healthy                   | Normal operation         |
| Updating  | Blue-green update in progress            | Wait ~3-5 minutes        |
| Failed    | Deployment failed                        | Check component logs     |
| Degraded  | Some components unhealthy                | Check specific component |
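
When a QueryService sits in Waiting, the usual cause is the MetadataServices it depends on. A hedged example that reads the QueryService message alongside the MDS phases (resource names are placeholders):

# Why is the QueryService waiting?
kubectl get qs my-cluster -n workspace-prod \
  -o jsonpath='{.status.phase}{": "}{.status.message}{"\n"}'

# Is the MetadataServices it depends on ready?
kubectl get mds -n workspace-prod \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'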

Status Fields Explained

status:
  phase: Ready
  ready: true
  message: "Query cluster ready"

  # Blue-green deployment
  activeStrategy: blue
  pendingStrategy: ""
  deploymentPhase: Stable
  activeReleaseVersion: "v1.0.1160"

  # Component statuses
  plannerDeployment:
    ready: true
    replicas: 1
    readyReplicas: 1

  queueDeployment:
    ready: true
    replicas: 1
    readyReplicas: 1

  executorDeployment:
    ready: true
    replicas: 4
    readyReplicas: 4

  # Pool executor status (if using Pool)
  poolExecutorDeployment:
    ready: true
    replicas: 2
    readyReplicas: 2
  poolName: "burst-pool"
  poolNamespace: "e6-pools"
  regularExecutorReplicas: 4    # Executors on regular nodes
  poolExecutorReplicas: 2       # Executors on pool nodes

  # Service endpoints (traffic routing handled by Envoy + xDS)
  plannerService: "my-cluster-planner-blue.workspace-prod.svc:10001"
  queueService: "my-cluster-queue-blue.workspace-prod.svc:10003"

  # Scaling history (last 20 operations)
  scalingHistory:
    - timestamp: "2024-12-09T10:30:00Z"
      component: executor
      oldReplicas: 2
      newReplicas: 4
      trigger: autoscaling-api    # autoscaling-api|kubectl|manual
      strategy: blue

  # Suspension history (last 20 operations)
  suspensionHistory:
    - timestamp: "2024-12-09T08:00:00Z"
      action: suspend             # suspend|resume
      trigger: auto-suspension-api
      strategy: blue
      componentsSuspended: [planner, queue, executor]
      preSuspensionReplicas:
        plannerReplicas: 1
        queueReplicas: 1
        executorReplicas: 4
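
The scaling and suspension histories are plain lists, so jq works well for pulling out recent entries. A minimal sketch, assuming the field names shown above:

# Most recent scaling operation
kubectl get qs my-cluster -n workspace-prod -o json | jq '.status.scalingHistory | last'

# All suspend/resume events with their triggers
kubectl get qs my-cluster -n workspace-prod -o json | \
  jq '.status.suspensionHistory[] | {timestamp, action, trigger}'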

Diagnosing Issues

# Check all QueryService components
kubectl get pods -l app.kubernetes.io/instance=my-cluster -n workspace-prod

# Check Envoy proxy (traffic routing)
kubectl logs -l e6data.io/component=envoy -n workspace-prod --tail=50

# Check planner for query errors
kubectl logs -l app=planner -n workspace-prod --tail=100 | grep -i error

# Check executor health
kubectl get pods -l app=executor -n workspace-prod -o wide

E6Catalog Status

Phase Values

| Phase      | Description                       | Action                |
|------------|-----------------------------------|-----------------------|
| Waiting    | Waiting for MetadataServices      | Check MDS status      |
| Creating   | Catalog registration in progress  | Wait ~1-2 minutes     |
| Ready      | Catalog registered and accessible | Normal operation      |
| Updating   | Catalog update in progress        | Wait ~1-2 minutes     |
| Refreshing | Metadata refresh in progress      | Wait for completion   |
| Deleting   | Catalog being removed             | Wait for finalizer    |
| Failed     | Operation failed                  | Check operationStatus |

Status Fields Explained

status:
  phase: Ready

  # Storage service being used
  activeStorageService: "my-metadata-storage-blue"
  storageServiceEndpoint: "http://my-metadata-storage-blue.workspace-prod.svc:8081"

  # Catalog information from API
  catalogDetails:
    catalogName: "data-lake"
    catalogType: "GLUE"
    isDefault: true
    status: "ACTIVE"
    createdAt: "2024-12-09T10:00:00Z"
    updatedAt: "2024-12-09T10:30:00Z"

  # Last refresh timestamp
  lastRefreshTime: "2024-12-09T10:30:00Z"

  # Current operation status (populated during async operations)
  operationStatus:
    operation: update           # create|update|refresh
    status: success             # in_progress|success|partial_success|failed
    message: "Catalog updated successfully"
    startTime: "2024-12-09T10:28:00Z"
    lastUpdated: "2024-12-09T10:30:00Z"
    totalDBsRefreshed: 15
    totalTablesRefreshed: 234

    # Only populated on failure or partial success
    diagnosticsFilePath: "s3://bucket/diagnostics/catalog-update-2024-12-09.json"
    failures:
      - type: table
        name: "db1.problematic_table"
        reason: "Schema inference failed: unsupported data type"
      - type: database
        name: "restricted_db"
        reason: "Access denied"

Operation Status Values

| Status          | Description                 | Meaning                        |
|-----------------|-----------------------------|--------------------------------|
| in_progress     | Operation running           | Poll again in 10 seconds       |
| success         | All items succeeded         | Operation complete             |
| partial_success | Some items failed           | Catalog usable, check failures |
| failed          | Operation failed completely | Check error message and logs   |
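
Because catalog operations are asynchronous, it can be convenient to poll operationStatus.status until it leaves in_progress. A small bash sketch against the field paths above, using the 10-second interval suggested in the table:

# Poll until the catalog operation finishes (success, partial_success, or failed)
# Note: exits immediately if operationStatus is not populated yet
while true; do
  STATUS=$(kubectl get e6cat my-catalog -n workspace-prod \
    -o jsonpath='{.status.operationStatus.status}')
  echo "operation status: $STATUS"
  [ "$STATUS" != "in_progress" ] && break
  sleep 10
done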

Diagnosing Issues

# Check operation status
kubectl get e6cat my-catalog -n workspace-prod -o jsonpath='{.status.operationStatus}'

# View failures inline
kubectl get e6cat my-catalog -o jsonpath='{.status.operationStatus.failures}' | jq

# Get diagnostics file path
kubectl get e6cat my-catalog -o jsonpath='{.status.operationStatus.diagnosticsFilePath}'

# Download and view diagnostics file (AWS S3)
aws s3 cp s3://bucket/diagnostics/catalog-update-2024-12-09.json - | jq

# Check storage service logs for catalog operations
kubectl logs -l app.kubernetes.io/name=storage -n workspace-prod | grep -i catalog

Diagnostics File Structure

{
  "operation": "update",
  "catalogName": "data-lake",
  "startTime": "2024-12-09T10:28:00Z",
  "endTime": "2024-12-09T10:30:00Z",
  "summary": {
    "totalDatabases": 16,
    "successfulDatabases": 15,
    "failedDatabases": 1,
    "totalTables": 250,
    "successfulTables": 234,
    "failedTables": 16
  },
  "failures": [
    {
      "type": "database",
      "name": "restricted_db",
      "reason": "Access denied: IAM role lacks glue:GetDatabase permission",
      "timestamp": "2024-12-09T10:28:15Z"
    },
    {
      "type": "table",
      "database": "db1",
      "name": "problematic_table",
      "reason": "Schema inference failed: column 'data' has unsupported type 'struct<nested:array<map<string,int>>>'",
      "timestamp": "2024-12-09T10:28:45Z"
    }
  ],
  "successful": [
    {"type": "database", "name": "db1"},
    {"type": "database", "name": "db2"},
    // ...
  ]
}
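
Once downloaded, the diagnostics file can be summarized with jq rather than read end to end. A sketch, assuming the structure shown above and the S3 path from the earlier example:

# Count failures by type (database vs table)
aws s3 cp s3://bucket/diagnostics/catalog-update-2024-12-09.json - | \
  jq '.failures | group_by(.type) | map({type: .[0].type, count: length})'

# Print one line per failure with its reason
aws s3 cp s3://bucket/diagnostics/catalog-update-2024-12-09.json - | \
  jq -r '.failures[] | "\(.type) \(.name): \(.reason)"'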

CatalogRefresh Status

Phase Values

| Phase          | Description                                       | Action                    |
|----------------|---------------------------------------------------|---------------------------|
| Pending        | Waiting to start (another refresh may be running) | Wait for lock             |
| Running        | Refresh operation in progress                     | Wait for completion       |
| Succeeded      | All databases/tables refreshed successfully       | Complete                  |
| PartialSuccess | Some items failed but catalog is usable           | Check failures            |
| Failed         | Refresh failed completely                         | Check error message       |
| TimedOut       | Exceeded configured timeout                       | Retry with longer timeout |
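
A custom-columns view makes it easy to scan recent refreshes and their timing. A minimal sketch using the status fields described in the next section:

# Refresh runs with phase and timing
kubectl get catalogrefresh -n workspace-prod \
  -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,START:.status.startTime,DONE:.status.completionTime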

Status Fields Explained

status:
  phase: Succeeded

  # Timing
  startTime: "2024-12-09T10:00:00Z"
  completionTime: "2024-12-09T10:05:32Z"

  # Results
  databasesRefreshed: 15
  tablesRefreshed: 234
  message: "Refresh completed in 5m32s"

  # Diagnostics (for partial_success or failed)
  diagnosticsFilePath: "s3://bucket/diagnostics/refresh-2024-12-09.json"
  failures:
    - type: table
      name: "db1.broken_table"
      reason: "Parquet file corrupted"

Diagnosing Issues

# Check refresh status
kubectl get catalogrefresh -n workspace-prod

# View detailed status
kubectl describe catalogrefresh refresh-20241209 -n workspace-prod

# Check for failures
kubectl get catalogrefresh refresh-20241209 -o jsonpath='{.status.failures}' | jq

# If timed out, check what was in progress
kubectl logs -l app.kubernetes.io/name=storage -n workspace-prod | grep -i refresh

CatalogRefreshSchedule Status

Status Fields Explained

status:
  # Next scheduled run
  nextScheduledTime: "2024-12-10T02:00:00Z"

  # Currently running refresh (if any)
  activeRefreshes:
    - name: "nightly-refresh-20241209-020000"
      startTime: "2024-12-09T02:00:00Z"

  # Statistics
  statistics:
    totalRuns: 45
    successfulRuns: 42
    partialSuccessRuns: 2
    failedRuns: 1
    totalDatabasesRefreshed: 675
    totalTablesRefreshed: 10530
    averageRefreshDurationSeconds: 332

  # Recent history (last 5 runs)
  recentHistory:
    - name: "nightly-refresh-20241209-020000"
      phase: Succeeded
      startTime: "2024-12-09T02:00:00Z"
      completionTime: "2024-12-09T02:05:32Z"
      databasesRefreshed: 15
      tablesRefreshed: 234
    - name: "nightly-refresh-20241208-020000"
      phase: PartialSuccess
      startTime: "2024-12-08T02:00:00Z"
      completionTime: "2024-12-08T02:06:15Z"
      databasesRefreshed: 14
      tablesRefreshed: 220
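
The statistics block lends itself to quick arithmetic with jq, for example a rough success rate across all runs. A sketch, assuming the counters shown above:

# Approximate success rate (%) for a schedule
kubectl get catalogrefreshschedule nightly-refresh -n workspace-prod -o json | \
  jq '.status.statistics | (.successfulRuns / .totalRuns * 100 | round)'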

Diagnosing Issues

# List all scheduled refreshes
kubectl get catalogrefreshschedule -n workspace-prod

# Check recent runs
kubectl get catalogrefresh -n workspace-prod --sort-by=.metadata.creationTimestamp

# Check schedule statistics
kubectl get crs nightly-refresh -o jsonpath='{.status.statistics}' | jq

# View failed runs
kubectl get catalogrefresh -n workspace-prod -o json | \
  jq '.items[] | select(.status.phase == "Failed") | {name: .metadata.name, message: .status.message}'

Pool Status

Phase Values

| Phase     | Description                 | Action               |
|-----------|-----------------------------|----------------------|
| Pending   | Pool created, initializing  | Wait for Karpenter   |
| Creating  | Creating NodePool/NodeClass | Wait ~1-2 minutes    |
| Active    | Pool ready for allocations  | Normal operation     |
| Suspended | Pool manually suspended     | Resume when needed   |
| Deleting  | Being deleted               | Wait for finalizer   |
| Failed    | NodePool creation failed    | Check Karpenter logs |

Status Fields Explained

status:
  phase: Active

  # Node provisioning
  nodePoolName: "burst-pool-nodepool"
  nodeClassName: "burst-pool-ec2nodeclass"  # AWS

  # Capacity
  availableExecutors: 16
  allocatedExecutors: 6

  # Attached QueryServices
  attachedQueryServices:
    - name: analytics-cluster
      namespace: workspace-prod
      allocatedExecutors: 4
      compatible: true
    - name: reporting-cluster
      namespace: workspace-prod
      allocatedExecutors: 2
      compatible: true

  # Warmup status
  warmupDaemonSets:
    - name: burst-pool-warmup-executor-1-0-2123
      imageTag: "1.0.2123-abe4ff294"
      readyNodes: 4
      desiredNodes: 4
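
Free capacity is simply availableExecutors minus allocatedExecutors, which jq can compute directly. A sketch against the fields above; the pool name and namespace are the placeholders used in the examples:

# Free executor capacity in the pool
kubectl get pool burst-pool -n e6-pools -o json | \
  jq '.status | {available: .availableExecutors, allocated: .allocatedExecutors, free: (.availableExecutors - .allocatedExecutors)}'

# Which QueryServices are attached, and how many executors has each taken?
kubectl get pool burst-pool -n e6-pools -o json | \
  jq '.status.attachedQueryServices[] | {name, namespace, allocatedExecutors}'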

Diagnosing Issues

# Check pool status
kubectl get pool -A

# Check Karpenter NodePool
kubectl get nodepool burst-pool-nodepool -o yaml

# Check if nodes are being provisioned
kubectl get nodes -l karpenter.sh/nodepool=burst-pool-nodepool

# Check warmup DaemonSets
kubectl get daemonset -l e6data.io/pool=burst-pool

# Check Karpenter logs for provisioning issues
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter | grep burst-pool

Common Diagnostic Patterns

Check All E6 Resources

# Get all e6data resources in a namespace
kubectl get mds,qs,e6cat,catalogrefresh,crs,pool,gov -n workspace-prod

# Get all resources across all namespaces
kubectl get mds,qs,e6cat,pool -A
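
To spot anything that is not in a healthy phase across the whole cluster, the per-resource phases listed earlier can be filtered with jq. A hedged sketch, assuming Running, Ready, and Active are the healthy steady states for the kinds queried here:

# Flag resources that are not in a healthy phase
kubectl get mds,qs,e6cat,pool -A -o json | \
  jq -r '.items[]
    | select(.status.phase as $p | ["Running","Ready","Active"] | index($p) | not)
    | "\(.kind)\t\(.metadata.namespace)/\(.metadata.name)\t\(.status.phase)"'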

Collect Diagnostic Bundle

#!/bin/bash
# Collect e6data CRs, pods, events, and operator logs into a single file
NAMESPACE="${1:?Usage: $0 <namespace>}"
OUTPUT="e6-diagnostic-$(date +%Y%m%d-%H%M%S).yaml"

echo "Collecting diagnostics for namespace: $NAMESPACE"

{
  echo "--- MetadataServices ---"
  kubectl get mds -n "$NAMESPACE" -o yaml

  echo "--- QueryServices ---"
  kubectl get qs -n "$NAMESPACE" -o yaml

  echo "--- E6Catalogs ---"
  kubectl get e6cat -n "$NAMESPACE" -o yaml

  echo "--- Pods ---"
  kubectl get pods -n "$NAMESPACE" -o yaml

  echo "--- Events ---"
  kubectl get events -n "$NAMESPACE" --sort-by='.lastTimestamp'

  echo "--- Operator Logs ---"
  kubectl logs -n e6-operator-system -l app=e6-operator --tail=500
} > "$OUTPUT"

echo "Diagnostics saved to: $OUTPUT"

Watch for Status Changes

# Watch all resources
watch -n 2 "kubectl get mds,qs,e6cat -n workspace-prod"

# Watch with custom columns
kubectl get qs -n workspace-prod -w \
  -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,EXECUTORS:.status.executorDeployment.readyReplicas