
GreptimeDBCluster

API Version: e6data.io/v1alpha1 Kind: GreptimeDBCluster Short Names: gdb, greptime


1. Purpose

GreptimeDBCluster deploys a GreptimeDB time-series database cluster for storing and querying operational metrics, including:

  • Query history: Real-time query event storage from QueryService
  • Metrics storage: Prometheus-compatible metrics backend
  • Time-series analytics: SQL and PromQL query support

GreptimeDB provides multiple query interfaces:

  • HTTP SQL API
  • gRPC
  • MySQL wire protocol
  • PostgreSQL wire protocol
  • Prometheus remote read/write


2. High-level Behavior

When you create a GreptimeDBCluster CR, the operator:

  1. Detects cloud provider for storage backend configuration
  2. Deploys etcd for metadata persistence
  3. Deploys Meta service for cluster coordination
  4. Deploys Datanodes for data storage and processing
  5. Deploys Frontend for query routing
  6. Optionally deploys Grafana for visualization
  7. Creates Services for all endpoints

Child Resources Created

Resource Type Name Pattern Purpose
StatefulSet {name}-etcd Etcd cluster
Deployment {name}-meta Meta coordination service
StatefulSet {name}-datanode Data storage and processing
Deployment {name}-frontend Query endpoint
Deployment {name}-grafana Visualization (optional)
Service {name}-frontend Query endpoints
Service {name}-meta Internal meta service
Service {name}-etcd Internal etcd
Service {name}-grafana Grafana UI
ConfigMap {name}-grafana-datasource GreptimeDB datasource
PVC {name}-etcd-data-* Etcd storage
PVC {name}-datanode-data-* Datanode cache
ServiceAccount greptime Pod identity

Architecture

                    ┌─────────────────────┐
                    │      Frontend       │
                    │  (HTTP/gRPC/MySQL/  │
                    │   PostgreSQL/Prom)  │
                    └──────────┬──────────┘
              ┌────────────────┼────────────────┐
              │                │                │
              ▼                ▼                ▼
       ┌──────────┐     ┌──────────┐     ┌──────────┐
       │ Datanode │     │ Datanode │     │ Datanode │
       └────┬─────┘     └────┬─────┘     └────┬─────┘
            │                │                │
            └────────────────┼────────────────┘
                    ┌────────┴────────┐
                    │  Object Storage │
                    │  (S3/GCS/Azure) │
                    └─────────────────┘
                    ┌────────┴────────┐
                    │      Meta       │
                    │  (Coordination) │
                    └────────┬────────┘
                    ┌────────┴────────┐
                    │      Etcd       │
                    │  (Metadata)     │
                    └─────────────────┘

3. Spec Reference

3.1 Top-level Fields

Field Type Required Default Description
storage GreptimeStorageSpec Yes - Object storage configuration
frontend GreptimeFrontendSpec No See defaults Frontend configuration
datanode GreptimeDatanodeSpec No See defaults Datanode configuration
meta GreptimeMetaSpec No See defaults Meta service configuration
etcd GreptimeEtcdSpec No See defaults Etcd configuration
grafana GrafanaSpec No Enabled Grafana configuration
image GreptimeImageSpec No See defaults Common image settings
cloud string No Auto-detected Cloud provider
tolerations []Toleration No [] Pod tolerations
nodeSelector map[string]string No {} Node selection
affinity Affinity No - Affinity rules
karpenterNodePool string No - Karpenter NodePool
serviceAccount string No greptime ServiceAccount name
autoCreateRBAC bool No true Auto-create RBAC
imagePullSecrets []string No [] Registry secrets
podAnnotations map[string]string No {} Pod annotations

3.2 Storage

Field Type Required Default Description
backend string Yes - Storage type: s3, gcs, azure
bucket string Yes - Bucket name
region string Yes - Cloud region
endpoint string No - Custom S3 endpoint (for S3-compatible storage)
prefix string No - Object key prefix
accessKeyId string No - Access key (for non-IRSA setups)
secretAccessKey string No - Secret key (for non-IRSA setups)
secretRef string No - Secret containing credentials
useIRSA bool No true Use IAM Roles for SA (AWS only)

Authentication Methods

1. IRSA (AWS - Recommended)

For AWS, use IAM Roles for Service Accounts:

storage:
  backend: s3
  bucket: my-bucket
  region: us-east-1
  useIRSA: true  # Default - uses ServiceAccount with IRSA annotation
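
With useIRSA enabled, the pods run under the greptime ServiceAccount, which must carry the EKS role annotation granting access to the bucket. A minimal sketch of the annotated ServiceAccount with a hypothetical role ARN (depending on your setup, the operator-created ServiceAccount may need this annotation added separately):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: greptime
  namespace: monitoring
  annotations:
    # Hypothetical role ARN - replace with an IAM role that can access the bucket
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/greptime-s3-access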

2. Static Credentials (Non-AWS S3-compatible)

For Linode, DigitalOcean, MinIO, Wasabi, and other S3-compatible storage:

storage:
  backend: s3
  bucket: my-bucket
  region: us-east-1  # Use any valid region string
  endpoint: "https://us-east-1.linodeobjects.com"  # Required for non-AWS
  useIRSA: false
  accessKeyId: "YOUR_ACCESS_KEY"
  secretAccessKey: "YOUR_SECRET_KEY"

3. Secret Reference

For better security, store credentials in a Kubernetes Secret:

# Create secret first
apiVersion: v1
kind: Secret
metadata:
  name: s3-credentials
  namespace: monitoring
type: Opaque
stringData:
  accessKeyId: "YOUR_ACCESS_KEY"
  secretAccessKey: "YOUR_SECRET_KEY"
---
# Reference in GreptimeDBCluster
storage:
  backend: s3
  bucket: my-bucket
  region: us-east-1
  endpoint: "https://us-east-1.linodeobjects.com"
  secretRef: s3-credentials
  useIRSA: false
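
The same Secret can also be created imperatively instead of from a manifest; the key names must match what the cluster expects (accessKeyId, secretAccessKey):

kubectl create secret generic s3-credentials \
  --namespace monitoring \
  --from-literal=accessKeyId=YOUR_ACCESS_KEY \
  --from-literal=secretAccessKey=YOUR_SECRET_KEY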

S3-Compatible Endpoints

Provider Endpoint Format
Linode Object Storage https://{region}.linodeobjects.com
DigitalOcean Spaces https://{region}.digitaloceanspaces.com
Wasabi https://s3.{region}.wasabisys.com
MinIO https://{your-minio-server}:9000
Backblaze B2 https://s3.{region}.backblazeb2.com

3.3 Frontend

Field Type Required Default Description
replicas int32 No 2 Number of replicas
imageTag string No From image.tag Image tag override
resources GreptimeResourceSpec No - CPU/Memory
ports.http int32 No 4000 HTTP SQL API
ports.grpc int32 No 4001 gRPC
ports.mysql int32 No 4002 MySQL protocol
ports.postgresql int32 No 4003 PostgreSQL protocol
ports.prometheus int32 No 4004 Prometheus remote
service.type string No ClusterIP Service type
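
As a starting point for overrides, here is a frontend fragment that sets explicit resources and spells out the default ports from the table above:

frontend:
  replicas: 2
  resources:
    cpu: "2"
    memory: "4Gi"
  ports:
    http: 4000
    grpc: 4001
    mysql: 4002
    postgresql: 4003
    prometheus: 4004
  service:
    type: ClusterIP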

3.4 Datanode

Field Type Required Default Description
replicas int32 No 3 Number of replicas
imageTag string No From image.tag Image tag override
resources GreptimeResourceSpec No - CPU/Memory
localCache.size string No 50Gi Local cache volume
localCache.storageClass string No gp3 Storage class
ports.grpc int32 No 4001 Internal gRPC
ports.http int32 No 4000 Health/metrics
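
Datanode data lives in object storage; the PVC only holds a local cache, so size it for your hot working set rather than total data volume. A fragment based on the fields above:

datanode:
  replicas: 3
  resources:
    cpu: "4"
    memory: "16Gi"
  localCache:
    size: 200Gi        # local cache only; long-term data stays in object storage
    storageClass: gp3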

3.5 Meta

Field Type Required Default Description
replicas int32 No 1 Replicas (1 for dev, 3 for HA)
imageTag string No From image.tag Image tag override
resources GreptimeResourceSpec No - CPU/Memory
ports.grpc int32 No 3002 Metadata gRPC
ports.http int32 No 4000 Health/metrics

3.6 Etcd

Field Type Required Default Description
replicas int32 No 1 Replicas (1 for dev, 3 for HA)
image string No quay.io/coreos/etcd:v3.5.9 Etcd image
resources GreptimeResourceSpec No - CPU/Memory
storage.size string No 10Gi Data volume size
storage.storageClass string No gp3 Storage class
ports.client int32 No 2379 Client port
ports.peer int32 No 2380 Peer port
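
Meta and etcd form the cluster's control plane. For high availability, run three replicas of each and give etcd persistent storage, as in this fragment:

meta:
  replicas: 3
  resources:
    cpu: "1"
    memory: "2Gi"
etcd:
  replicas: 3
  resources:
    cpu: "500m"
    memory: "1Gi"
  storage:
    size: 20Gi
    storageClass: gp3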

3.7 GreptimeResourceSpec (Resources)

All components (frontend, datanode, meta, etcd, grafana) use GreptimeResourceSpec for resource configuration:

Field Type Required Default Description
cpu string No - CPU limit (e.g., "2", "500m")
memory string No - Memory limit (e.g., "4Gi", "512Mi")
cpuRequest string No Same as cpu CPU request (if different from limit)
memoryRequest string No Same as memory Memory request (if different from limit)

Example - Configuring Resources:

apiVersion: e6data.io/v1alpha1
kind: GreptimeDBCluster
spec:
  frontend:
    replicas: 3
    resources:
      cpu: "2"
      memory: "4Gi"
      cpuRequest: "500m"      # Lower request for burstable
      memoryRequest: "2Gi"

  datanode:
    replicas: 5
    resources:
      cpu: "4"
      memory: "16Gi"

  meta:
    replicas: 3
    resources:
      cpu: "1"
      memory: "2Gi"

  etcd:
    replicas: 3
    resources:
      cpu: "500m"
      memory: "1Gi"

  grafana:
    resources:
      cpu: "200m"
      memory: "256Mi"

Recommended Resources by Environment:

Component Development (CPU / Memory) Production (CPU / Memory)
Frontend 500m/1Gi 2/4Gi
Datanode 1/4Gi 4/16Gi
Meta 250m/512Mi 1/2Gi
Etcd 250m/512Mi 500m/2Gi
Grafana 100m/128Mi 200m/256Mi

3.8 Grafana

Field Type Required Default Description
enabled bool No true Deploy Grafana
replicas int32 No 1 Replicas
image string No grafana/grafana:latest Grafana image
resources GreptimeResourceSpec No - CPU/Memory
adminPassword string No greptime Admin password
service.type string No ClusterIP Service type
dashboardConfigMaps []string No [] Custom dashboards
externalGrafana.enabled bool No false Use external Grafana
externalGrafana.url string No - External Grafana URL
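
To load your own dashboards into the bundled Grafana, reference ConfigMaps containing dashboard JSON via dashboardConfigMaps (the ConfigMap name below is a placeholder):

grafana:
  enabled: true
  replicas: 1
  dashboardConfigMaps:
    - my-greptime-dashboards   # placeholder ConfigMap name
  service:
    type: ClusterIP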

3.9 Image

Field Type Required Default Description
repository string No greptime/greptimedb Image repository
tag string No v0.17.0 Image tag
pullPolicy string No IfNotPresent Pull policy
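
To pull GreptimeDB from a private mirror, override the repository and reference registry credentials via imagePullSecrets (registry host and secret name below are placeholders):

image:
  repository: registry.example.com/greptime/greptimedb
  tag: v0.17.0
  pullPolicy: IfNotPresent
imagePullSecrets:
  - my-registry-secret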

4. Example Manifests

4.1 Minimal Development Cluster

apiVersion: e6data.io/v1alpha1
kind: GreptimeDBCluster
metadata:
  name: greptime-dev
  namespace: monitoring
spec:
  storage:
    backend: s3
    bucket: my-greptime-data
    region: us-east-1

  # Single replicas for dev
  frontend:
    replicas: 1
  datanode:
    replicas: 1
  meta:
    replicas: 1
  etcd:
    replicas: 1

4.2 Production HA Cluster

apiVersion: e6data.io/v1alpha1
kind: GreptimeDBCluster
metadata:
  name: greptime-prod
  namespace: monitoring
  labels:
    e6data.io/environment: production
spec:
  # Object storage
  storage:
    backend: s3
    bucket: prod-greptime-data
    region: us-east-1
    prefix: "greptime/prod/"
    useIRSA: true  # Use IAM Roles for Service Accounts

  # Image configuration
  image:
    repository: greptime/greptimedb
    tag: v0.17.0
    pullPolicy: IfNotPresent

  # Frontend - query endpoint
  frontend:
    replicas: 3
    resources:
      cpu: "2"
      memory: "4Gi"
    service:
      type: LoadBalancer
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-internal: "true"

  # Datanode - data processing
  datanode:
    replicas: 5
    resources:
      cpu: "4"
      memory: "16Gi"
    localCache:
      size: 100Gi
      storageClass: gp3

  # Meta - coordination
  meta:
    replicas: 3
    resources:
      cpu: "1"
      memory: "2Gi"

  # Etcd - metadata persistence
  etcd:
    replicas: 3
    resources:
      cpu: "1"
      memory: "2Gi"
    storage:
      size: 20Gi
      storageClass: gp3

  # Grafana for visualization
  grafana:
    enabled: true
    replicas: 2
    adminPassword: "changeme-in-production"
    service:
      type: LoadBalancer

  # Scheduling
  serviceAccount: greptime
  autoCreateRBAC: true

  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "monitoring"
      effect: "NoSchedule"

  nodeSelector:
    workload-type: monitoring

  podAnnotations:
    prometheus.io/scrape: "true"

4.3 GCS Backend (GCP)

apiVersion: e6data.io/v1alpha1
kind: GreptimeDBCluster
metadata:
  name: greptime-gcp
  namespace: monitoring
spec:
  cloud: GCP

  storage:
    backend: gcs
    bucket: my-greptime-bucket
    region: us-central1
    useIRSA: true  # Workload Identity

  frontend:
    replicas: 2
  datanode:
    replicas: 3
  meta:
    replicas: 1
  etcd:
    replicas: 1

4.4 Azure Blob Backend

apiVersion: e6data.io/v1alpha1
kind: GreptimeDBCluster
metadata:
  name: greptime-azure
  namespace: monitoring
spec:
  cloud: AZURE

  storage:
    backend: azure
    bucket: my-container  # Azure container name
    region: eastus
    secretRef: azure-storage-secret  # Contains storageAccountName, storageAccountKey

  frontend:
    replicas: 2
  datanode:
    replicas: 3

4.5 External Grafana Integration

apiVersion: e6data.io/v1alpha1
kind: GreptimeDBCluster
metadata:
  name: greptime-cluster
  namespace: monitoring
spec:
  storage:
    backend: s3
    bucket: greptime-data
    region: us-east-1

  frontend:
    replicas: 2
  datanode:
    replicas: 3

  # Don't deploy local Grafana
  grafana:
    enabled: false
    externalGrafana:
      enabled: true
      url: https://grafana.example.com
      apiKeySecretRef: grafana-api-key  # Secret with apiKey

4.6 S3-Compatible Storage (MinIO/Linode)

apiVersion: e6data.io/v1alpha1
kind: GreptimeDBCluster
metadata:
  name: greptime-minio
  namespace: monitoring
spec:
  storage:
    backend: s3
    bucket: greptime
    region: us-east-1
    endpoint: https://minio.example.com  # Custom endpoint
    secretRef: minio-credentials  # Contains accessKeyId, secretAccessKey
    useIRSA: false  # Can't use IRSA with non-AWS S3

  frontend:
    replicas: 1
  datanode:
    replicas: 2

5. Status & Lifecycle

5.1 Status Fields

Field Type Description
phase string Current lifecycle phase
message string Status message
ready bool All components ready
frontend ComponentStatus Frontend status
datanode ComponentStatus Datanode status
meta ComponentStatus Meta status
etcd ComponentStatus Etcd status
grafana ComponentStatus Grafana status
endpoints GreptimeEndpoints Connection endpoints
conditions []Condition Detailed conditions

5.2 Phase Values

Phase Description
Pending Initial state
Initializing Deploying components
Running All components healthy
Degraded Some components unhealthy
Failed Deployment failed
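
To block a deployment script until the cluster reports Running, a jsonpath wait works (requires kubectl 1.23+):

kubectl wait gdb/greptime-prod \
  --for=jsonpath='{.status.phase}'=Running \
  --timeout=10m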

5.3 Endpoints

status:
  endpoints:
    http: "greptime-prod-frontend.monitoring.svc:4000"
    grpc: "greptime-prod-frontend.monitoring.svc:4001"
    mysql: "greptime-prod-frontend.monitoring.svc:4002"
    postgresql: "greptime-prod-frontend.monitoring.svc:4003"
    prometheus: "greptime-prod-frontend.monitoring.svc:4004"
    grafana: "greptime-prod-grafana.monitoring.svc:3000"

6. Integration with QueryService

QueryService can send query history to GreptimeDB:

apiVersion: e6data.io/v1alpha1
kind: QueryService
metadata:
  name: analytics-cluster
spec:
  # ... other config ...
  queryHistory:
    enabled: true
    greptimeRef:
      name: greptime-prod
      namespace: monitoring
      database: public
      table: query_history

7. Troubleshooting

7.1 Common Issues

Cluster Stuck in Initializing

Symptoms:

$ kubectl get gdb
NAME            PHASE          READY
greptime-prod   Initializing   false

Checks:

# Check component status
kubectl get gdb greptime-prod -o jsonpath='{.status}'

# Check pods
kubectl get pods -l app.kubernetes.io/instance=greptime-prod

# Check etcd first (required for meta)
kubectl logs -l app.kubernetes.io/name=etcd --tail=50

# Check meta service
kubectl logs -l app.kubernetes.io/name=meta --tail=50

Storage Access Denied

Symptoms: Datanodes failing with S3/GCS permission errors.

Checks:

# Verify IRSA annotation on ServiceAccount
kubectl get sa greptime -o yaml

# Test storage access from a datanode pod (only works if the aws CLI is present in the image)
kubectl exec -it greptime-prod-datanode-0 -- aws s3 ls s3://bucket/

# Check for credential secret
kubectl get secret -l app.kubernetes.io/instance=greptime-prod

Datanode PVC Pending

Symptoms: Datanode pods stuck in Pending.

Checks:

# Check PVC status
kubectl get pvc -l app.kubernetes.io/instance=greptime-prod

# Check storage class exists
kubectl get sc gp3

# Check events
kubectl describe pvc greptime-prod-datanode-data-0

7.2 Useful Commands

# Get cluster status
kubectl get gdb greptime-prod -o yaml

# Check all components
kubectl get all -l app.kubernetes.io/instance=greptime-prod

# Get endpoints
kubectl get gdb greptime-prod -o jsonpath='{.status.endpoints}'

# Connect via MySQL protocol
kubectl port-forward svc/greptime-prod-frontend 4002:4002
mysql -h 127.0.0.1 -P 4002

# Connect via PostgreSQL protocol
kubectl port-forward svc/greptime-prod-frontend 4003:4003
psql -h 127.0.0.1 -p 4003 -U root public

# Test HTTP API
kubectl port-forward svc/greptime-prod-frontend 4000:4000
curl http://localhost:4000/v1/sql -d "sql=SHOW TABLES"

# Access Grafana
kubectl port-forward svc/greptime-prod-grafana 3000:3000
# Open http://localhost:3000 (admin/greptime)

8. Query Examples

8.1 SQL via HTTP API

# Create table (--data-urlencode handles newlines and special characters in the SQL)
curl -X POST http://localhost:4000/v1/sql --data-urlencode "sql=
CREATE TABLE query_history (
  timestamp TIMESTAMP TIME INDEX,
  cluster_id STRING,
  query_id STRING,
  duration_ms INT64,
  status STRING,
  PRIMARY KEY (cluster_id, query_id)
)"

# Insert data (SQL string literals use single quotes)
curl -X POST http://localhost:4000/v1/sql --data-urlencode "sql=
INSERT INTO query_history VALUES
  (now(), 'cluster-1', 'q-001', 1500, 'success'),
  (now(), 'cluster-1', 'q-002', 2300, 'success')"

# Query data
curl -X POST http://localhost:4000/v1/sql --data-urlencode "sql=
SELECT cluster_id, avg(duration_ms) AS avg_duration
FROM query_history
WHERE timestamp > now() - INTERVAL '1 hour'
GROUP BY cluster_id"

8.2 Prometheus Remote Write

# prometheus.yml
remote_write:
  - url: http://greptime-prod-frontend:4004/v1/prometheus/write

8.3 MySQL Protocol

-- Connect: mysql -h greptime-frontend -P 4002 -u root

SHOW DATABASES;
USE public;
SHOW TABLES;

SELECT * FROM query_history
WHERE timestamp > '2024-01-01'
ORDER BY timestamp DESC
LIMIT 100;
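
8.4 PromQL via HTTP API

Section 1 notes PromQL support; below is a sketch of a range query, assuming the frontend serves the Prometheus-compatible query API under /v1/prometheus on the HTTP port. The metric name is illustrative; adjust the path and parameters to your GreptimeDB version.

# Range query following Prometheus HTTP API conventions
curl -G http://localhost:4000/v1/prometheus/api/v1/query_range \
  --data-urlencode 'query=avg_over_time(duration_ms[5m])' \
  --data-urlencode 'start=2024-01-01T00:00:00Z' \
  --data-urlencode 'end=2024-01-01T01:00:00Z' \
  --data-urlencode 'step=60s'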