GreptimeDBCluster¶
API Version: e6data.io/v1alpha1 Kind: GreptimeDBCluster Short Names: gdb, greptime
1. Purpose¶
GreptimeDBCluster deploys a GreptimeDB time-series database cluster for storing and querying operational metrics, including:
- Query history: Real-time query event storage from QueryService
- Metrics storage: Prometheus-compatible metrics backend
- Time-series analytics: SQL and PromQL query support
GreptimeDB provides multiple query interfaces:
- HTTP SQL API
- gRPC
- MySQL wire protocol
- PostgreSQL wire protocol
- Prometheus remote read/write
2. High-level Behavior¶
When you create a GreptimeDBCluster CR, the operator:
- Detects cloud provider for storage backend configuration
- Deploys etcd for metadata persistence
- Deploys Meta service for cluster coordination
- Deploys Datanodes for data storage and processing
- Deploys Frontend for query routing
- Optionally deploys Grafana for visualization
- Creates Services for all endpoints
Child Resources Created¶
| Resource Type | Name Pattern | Purpose |
|---|---|---|
| StatefulSet | {name}-etcd | Etcd cluster |
| Deployment | {name}-meta | Meta coordination service |
| StatefulSet | {name}-datanode | Data storage and processing |
| Deployment | {name}-frontend | Query endpoint |
| Deployment | {name}-grafana | Visualization (optional) |
| Service | {name}-frontend | Query endpoints |
| Service | {name}-meta | Internal meta service |
| Service | {name}-etcd | Internal etcd |
| Service | {name}-grafana | Grafana UI |
| ConfigMap | {name}-grafana-datasource | GreptimeDB datasource |
| PVC | {name}-etcd-data-* | Etcd storage |
| PVC | {name}-datanode-data-* | Datanode cache |
| ServiceAccount | greptime | Pod identity |
Architecture¶
```
        ┌─────────────────────┐
        │      Frontend       │
        │  (HTTP/gRPC/MySQL/  │
        │   PostgreSQL/Prom)  │
        └──────────┬──────────┘
                   │
     ┌─────────────┼─────────────┐
     │             │             │
     ▼             ▼             ▼
┌──────────┐  ┌──────────┐  ┌──────────┐
│ Datanode │  │ Datanode │  │ Datanode │
└────┬─────┘  └────┬─────┘  └────┬─────┘
     │             │             │
     └─────────────┼─────────────┘
                   │
          ┌────────┴────────┐
          │  Object Storage │
          │  (S3/GCS/Azure) │
          └─────────────────┘
                   ▲
          ┌────────┴────────┐
          │      Meta       │
          │  (Coordination) │
          └────────┬────────┘
                   │
          ┌────────┴────────┐
          │      Etcd       │
          │    (Metadata)   │
          └─────────────────┘
```
3. Spec Reference¶
3.1 Top-level Fields¶
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| storage | GreptimeStorageSpec | Yes | - | Object storage configuration |
| frontend | GreptimeFrontendSpec | No | See defaults | Frontend configuration |
| datanode | GreptimeDatanodeSpec | No | See defaults | Datanode configuration |
| meta | GreptimeMetaSpec | No | See defaults | Meta service configuration |
| etcd | GreptimeEtcdSpec | No | See defaults | Etcd configuration |
| grafana | GrafanaSpec | No | Enabled | Grafana configuration |
| image | GreptimeImageSpec | No | See defaults | Common image settings |
| cloud | string | No | Auto-detected | Cloud provider |
| tolerations | []Toleration | No | [] | Pod tolerations |
| nodeSelector | map[string]string | No | {} | Node selection |
| affinity | Affinity | No | - | Affinity rules |
| karpenterNodePool | string | No | - | Karpenter NodePool |
| serviceAccount | string | No | greptime | ServiceAccount name |
| autoCreateRBAC | bool | No | true | Auto-create RBAC |
| imagePullSecrets | []string | No | [] | Registry secrets |
| podAnnotations | map[string]string | No | {} | Pod annotations |
3.2 Storage¶
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| backend | string | Yes | - | Storage type: s3, gcs, azure |
| bucket | string | Yes | - | Bucket name |
| region | string | Yes | - | Cloud region |
| endpoint | string | No | - | Custom S3 endpoint (for S3-compatible storage) |
| prefix | string | No | - | Object key prefix |
| accessKeyId | string | No | - | Access key (for non-IRSA setups) |
| secretAccessKey | string | No | - | Secret key (for non-IRSA setups) |
| secretRef | string | No | - | Secret containing credentials |
| useIRSA | bool | No | true | Use IAM Roles for Service Accounts (AWS only) |
Authentication Methods¶
1. IRSA (AWS - Recommended)
For AWS, use IAM Roles for Service Accounts:
```yaml
storage:
  backend: s3
  bucket: my-bucket
  region: us-east-1
  useIRSA: true  # Default - uses ServiceAccount with IRSA annotation
```
2. Static Credentials (Non-AWS S3-compatible)
For Linode, DigitalOcean, MinIO, Wasabi, and other S3-compatible storage:
```yaml
storage:
  backend: s3
  bucket: my-bucket
  region: us-east-1                                # Use any valid region string
  endpoint: "https://us-east-1.linodeobjects.com"  # Required for non-AWS
  useIRSA: false
  accessKeyId: "YOUR_ACCESS_KEY"
  secretAccessKey: "YOUR_SECRET_KEY"
```
3. Secret Reference
For better security, store credentials in a Kubernetes Secret:
```yaml
# Create the secret first
apiVersion: v1
kind: Secret
metadata:
  name: s3-credentials
  namespace: monitoring
type: Opaque
stringData:
  accessKeyId: "YOUR_ACCESS_KEY"
  secretAccessKey: "YOUR_SECRET_KEY"
---
# Reference it in the GreptimeDBCluster
storage:
  backend: s3
  bucket: my-bucket
  region: us-east-1
  endpoint: "https://us-east-1.linodeobjects.com"
  secretRef: s3-credentials
  useIRSA: false
```
S3-Compatible Endpoints¶
| Provider | Endpoint Format |
|---|---|
| Linode Object Storage | https://{region}.linodeobjects.com |
| DigitalOcean Spaces | https://{region}.digitaloceanspaces.com |
| Wasabi | https://s3.{region}.wasabisys.com |
| MinIO | https://{your-minio-server}:9000 |
| Backblaze B2 | https://s3.{region}.backblazeb2.com |
3.3 Frontend¶
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| replicas | int32 | No | 2 | Number of replicas |
| imageTag | string | No | From image.tag | Image tag override |
| resources | GreptimeResourceSpec | No | - | CPU/Memory |
| ports.http | int32 | No | 4000 | HTTP SQL API |
| ports.grpc | int32 | No | 4001 | gRPC |
| ports.mysql | int32 | No | 4002 | MySQL protocol |
| ports.postgresql | int32 | No | 4003 | PostgreSQL protocol |
| ports.prometheus | int32 | No | 4004 | Prometheus remote read/write |
| service.type | string | No | ClusterIP | Service type |
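For instance, a frontend exposed through a LoadBalancer can combine the fields above as in the following sketch (replica count and resources are illustrative):

```yaml
frontend:
  replicas: 2
  resources:
    cpu: "1"
    memory: "2Gi"
  ports:
    mysql: 4002   # default, shown explicitly
  service:
    type: LoadBalancer
```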
3.4 Datanode¶
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| replicas | int32 | No | 3 | Number of replicas |
| imageTag | string | No | From image.tag | Image tag override |
| resources | GreptimeResourceSpec | No | - | CPU/Memory |
| localCache.size | string | No | 50Gi | Local cache volume |
| localCache.storageClass | string | No | gp3 | Storage class |
| ports.grpc | int32 | No | 4001 | Internal gRPC |
| ports.http | int32 | No | 4000 | Health/metrics |
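A datanode block tuned for a larger local cache might look like this sketch (sizes are illustrative; size the cache for your hot working set):

```yaml
datanode:
  replicas: 3
  resources:
    cpu: "4"
    memory: "16Gi"
  localCache:
    size: 200Gi        # hot-data cache in front of object storage
    storageClass: gp3
```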
3.5 Meta¶
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| replicas | int32 | No | 1 | Replicas (1 for dev, 3 for HA) |
| imageTag | string | No | From image.tag | Image tag override |
| resources | GreptimeResourceSpec | No | - | CPU/Memory |
| ports.grpc | int32 | No | 3002 | Metadata gRPC |
| ports.http | int32 | No | 4000 | Health/metrics |
3.6 Etcd¶
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| replicas | int32 | No | 1 | Replicas (1 for dev, 3 for HA) |
| image | string | No | quay.io/coreos/etcd:v3.5.9 | Etcd image |
| resources | GreptimeResourceSpec | No | - | CPU/Memory |
| storage.size | string | No | 10Gi | Data volume size |
| storage.storageClass | string | No | gp3 | Storage class |
| ports.client | int32 | No | 2379 | Client port |
| ports.peer | int32 | No | 2380 | Peer port |
3.7 GreptimeResourceSpec (Resources)¶
All components (frontend, datanode, meta, etcd, grafana) use GreptimeResourceSpec for resource configuration:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| cpu | string | No | - | CPU limit (e.g., "2", "500m") |
| memory | string | No | - | Memory limit (e.g., "4Gi", "512Mi") |
| cpuRequest | string | No | Same as cpu | CPU request (if different from limit) |
| memoryRequest | string | No | Same as memory | Memory request (if different from limit) |
Example - Configuring Resources:
```yaml
apiVersion: e6data.io/v1alpha1
kind: GreptimeDBCluster
spec:
  frontend:
    replicas: 3
    resources:
      cpu: "2"
      memory: "4Gi"
      cpuRequest: "500m"   # Lower request for burstable workloads
      memoryRequest: "2Gi"
  datanode:
    replicas: 5
    resources:
      cpu: "4"
      memory: "16Gi"
  meta:
    replicas: 3
    resources:
      cpu: "1"
      memory: "2Gi"
  etcd:
    replicas: 3
    resources:
      cpu: "500m"
      memory: "1Gi"
  grafana:
    resources:
      cpu: "200m"
      memory: "256Mi"
```
Recommended Resources by Environment:
| Component | Development (CPU / Memory) | Production (CPU / Memory) |
|---|---|---|
| Frontend | 500m/1Gi | 2/4Gi |
| Datanode | 1/4Gi | 4/16Gi |
| Meta | 250m/512Mi | 1/2Gi |
| Etcd | 250m/512Mi | 500m/2Gi |
| Grafana | 100m/128Mi | 200m/256Mi |
3.8 Grafana¶
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| enabled | bool | No | true | Deploy Grafana |
| replicas | int32 | No | 1 | Replicas |
| image | string | No | grafana/grafana:latest | Grafana image |
| resources | GreptimeResourceSpec | No | - | CPU/Memory |
| adminPassword | string | No | greptime | Admin password |
| service.type | string | No | ClusterIP | Service type |
| dashboardConfigMaps | []string | No | [] | Custom dashboards |
| externalGrafana.enabled | bool | No | false | Use external Grafana |
| externalGrafana.url | string | No | - | External Grafana URL |
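As a sketch, enabling the bundled Grafana with custom dashboards loaded from a ConfigMap (the ConfigMap name below is hypothetical and must exist in the same namespace):

```yaml
grafana:
  enabled: true
  adminPassword: "use-a-strong-password"    # override the default
  dashboardConfigMaps:
    - greptime-custom-dashboards    # hypothetical ConfigMap holding dashboard JSON
  service:
    type: ClusterIP
```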
3.9 Image¶
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| repository | string | No | greptime/greptimedb | Image repository |
| tag | string | No | v0.17.0 | Image tag |
| pullPolicy | string | No | IfNotPresent | Pull policy |
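To pull from a private mirror, combine these fields with the top-level imagePullSecrets field (the registry and secret names below are hypothetical):

```yaml
image:
  repository: registry.example.com/mirror/greptimedb  # hypothetical mirror
  tag: v0.17.0
  pullPolicy: IfNotPresent
imagePullSecrets:
  - registry-credentials  # hypothetical pull secret in the same namespace
```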
4. Example Manifests¶
4.1 Minimal Development Cluster¶
```yaml
apiVersion: e6data.io/v1alpha1
kind: GreptimeDBCluster
metadata:
  name: greptime-dev
  namespace: monitoring
spec:
  storage:
    backend: s3
    bucket: my-greptime-data
    region: us-east-1
  # Single replicas for dev
  frontend:
    replicas: 1
  datanode:
    replicas: 1
  meta:
    replicas: 1
  etcd:
    replicas: 1
```
4.2 Production HA Cluster¶
```yaml
apiVersion: e6data.io/v1alpha1
kind: GreptimeDBCluster
metadata:
  name: greptime-prod
  namespace: monitoring
  labels:
    e6data.io/environment: production
spec:
  # Object storage
  storage:
    backend: s3
    bucket: prod-greptime-data
    region: us-east-1
    prefix: "greptime/prod/"
    useIRSA: true  # Use IAM Roles for Service Accounts
  # Image configuration
  image:
    repository: greptime/greptimedb
    tag: v0.17.0
    pullPolicy: IfNotPresent
  # Frontend - query endpoint
  frontend:
    replicas: 3
    resources:
      cpu: "2"
      memory: "4Gi"
    service:
      type: LoadBalancer
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-internal: "true"
  # Datanode - data processing
  datanode:
    replicas: 5
    resources:
      cpu: "4"
      memory: "16Gi"
    localCache:
      size: 100Gi
      storageClass: gp3
  # Meta - coordination
  meta:
    replicas: 3
    resources:
      cpu: "1"
      memory: "2Gi"
  # Etcd - metadata persistence
  etcd:
    replicas: 3
    resources:
      cpu: "1"
      memory: "2Gi"
    storage:
      size: 20Gi
      storageClass: gp3
  # Grafana for visualization
  grafana:
    enabled: true
    replicas: 2
    adminPassword: "changeme-in-production"
    service:
      type: LoadBalancer
  # Scheduling
  serviceAccount: greptime
  autoCreateRBAC: true
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "monitoring"
      effect: "NoSchedule"
  nodeSelector:
    workload-type: monitoring
  podAnnotations:
    prometheus.io/scrape: "true"
```
4.3 GCS Backend (GCP)¶
```yaml
apiVersion: e6data.io/v1alpha1
kind: GreptimeDBCluster
metadata:
  name: greptime-gcp
  namespace: monitoring
spec:
  cloud: GCP
  storage:
    backend: gcs
    bucket: my-greptime-bucket
    region: us-central1
    useIRSA: true  # Workload Identity
  frontend:
    replicas: 2
  datanode:
    replicas: 3
  meta:
    replicas: 1
  etcd:
    replicas: 1
```
4.4 Azure Blob Backend¶
```yaml
apiVersion: e6data.io/v1alpha1
kind: GreptimeDBCluster
metadata:
  name: greptime-azure
  namespace: monitoring
spec:
  cloud: AZURE
  storage:
    backend: azure
    bucket: my-container              # Azure container name
    region: eastus
    secretRef: azure-storage-secret   # Contains storageAccountName, storageAccountKey
  frontend:
    replicas: 2
  datanode:
    replicas: 3
```
4.5 External Grafana Integration¶
```yaml
apiVersion: e6data.io/v1alpha1
kind: GreptimeDBCluster
metadata:
  name: greptime-cluster
  namespace: monitoring
spec:
  storage:
    backend: s3
    bucket: greptime-data
    region: us-east-1
  frontend:
    replicas: 2
  datanode:
    replicas: 3
  # Don't deploy a local Grafana
  grafana:
    enabled: false
    externalGrafana:
      enabled: true
      url: https://grafana.example.com
      apiKeySecretRef: grafana-api-key  # Secret with apiKey
```
4.6 S3-Compatible Storage (MinIO/Linode)¶
```yaml
apiVersion: e6data.io/v1alpha1
kind: GreptimeDBCluster
metadata:
  name: greptime-minio
  namespace: monitoring
spec:
  storage:
    backend: s3
    bucket: greptime
    region: us-east-1
    endpoint: https://minio.example.com  # Custom endpoint
    secretRef: minio-credentials         # Contains accessKeyId, secretAccessKey
    useIRSA: false                       # IRSA is AWS-only
  frontend:
    replicas: 1
  datanode:
    replicas: 2
```
5. Status & Lifecycle¶
5.1 Status Fields¶
| Field | Type | Description |
|---|---|---|
| phase | string | Current lifecycle phase |
| message | string | Status message |
| ready | bool | All components ready |
| frontend | ComponentStatus | Frontend status |
| datanode | ComponentStatus | Datanode status |
| meta | ComponentStatus | Meta status |
| etcd | ComponentStatus | Etcd status |
| grafana | ComponentStatus | Grafana status |
| endpoints | GreptimeEndpoints | Connection endpoints |
| conditions | []Condition | Detailed conditions |
5.2 Phase Values¶
| Phase | Description |
|---|---|
| Pending | Initial state |
| Initializing | Deploying components |
| Running | All components healthy |
| Degraded | Some components unhealthy |
| Failed | Deployment failed |
5.3 Endpoints¶
```yaml
status:
  endpoints:
    http: "greptime-prod-frontend.monitoring.svc:4000"
    grpc: "greptime-prod-frontend.monitoring.svc:4001"
    mysql: "greptime-prod-frontend.monitoring.svc:4002"
    postgresql: "greptime-prod-frontend.monitoring.svc:4003"
    prometheus: "greptime-prod-frontend.monitoring.svc:4004"
    grafana: "greptime-prod-grafana.monitoring.svc:3000"
```
6. Related Resources¶
Integration with QueryService¶
QueryService can send query history to GreptimeDB:
```yaml
apiVersion: e6data.io/v1alpha1
kind: QueryService
metadata:
  name: analytics-cluster
spec:
  # ... other config ...
  queryHistory:
    enabled: true
    greptimeRef:
      name: greptime-prod
      namespace: monitoring
      database: public
      table: query_history
```
7. Troubleshooting¶
7.1 Common Issues¶
Cluster Stuck in Initializing¶
Symptoms: status.phase stays Initializing and status.ready remains false; not all pods become Ready.
Checks:
```bash
# Check component status
kubectl get gdb greptime-prod -o jsonpath='{.status}'

# Check pods
kubectl get pods -l app.kubernetes.io/instance=greptime-prod

# Check etcd first (meta depends on it)
kubectl logs -l app.kubernetes.io/name=etcd --tail=50

# Check the meta service
kubectl logs -l app.kubernetes.io/name=meta --tail=50
```
Storage Access Denied¶
Symptoms: Datanodes failing with S3/GCS permission errors.
Checks:
```bash
# Verify the IRSA annotation on the ServiceAccount
kubectl get sa greptime -o yaml

# Test storage access from a datanode pod
kubectl exec -it greptime-prod-datanode-0 -- aws s3 ls s3://bucket/

# Check for a credential secret
kubectl get secret -l app.kubernetes.io/instance=greptime-prod
```
Datanode PVC Pending¶
Symptoms: Datanode pods stuck in Pending.
Checks:
```bash
# Check PVC status
kubectl get pvc -l app.kubernetes.io/instance=greptime-prod

# Check that the storage class exists
kubectl get sc gp3

# Check events
kubectl describe pvc greptime-prod-datanode-data-0
```
7.2 Useful Commands¶
```bash
# Get cluster status
kubectl get gdb greptime-prod -o yaml

# Check all components
kubectl get all -l app.kubernetes.io/instance=greptime-prod

# Get endpoints
kubectl get gdb greptime-prod -o jsonpath='{.status.endpoints}'

# Connect via the MySQL protocol
kubectl port-forward svc/greptime-prod-frontend 4002:4002
mysql -h 127.0.0.1 -P 4002

# Connect via the PostgreSQL protocol
kubectl port-forward svc/greptime-prod-frontend 4003:4003
psql -h 127.0.0.1 -p 4003 -U root public

# Test the HTTP API
kubectl port-forward svc/greptime-prod-frontend 4000:4000
curl http://localhost:4000/v1/sql -d "sql=SHOW TABLES"

# Access Grafana
kubectl port-forward svc/greptime-prod-grafana 3000:3000
# Open http://localhost:3000 (admin/greptime)
```
8. Query Examples¶
8.1 SQL via HTTP API¶
```bash
# Create a table
curl -X POST http://localhost:4000/v1/sql -d "sql=
CREATE TABLE query_history (
  timestamp TIMESTAMP TIME INDEX,
  cluster_id STRING,
  query_id STRING,
  duration_ms INT64,
  status STRING,
  PRIMARY KEY (cluster_id, query_id)
)"

# Insert data (SQL string literals use single quotes)
curl -X POST http://localhost:4000/v1/sql -d "sql=
INSERT INTO query_history VALUES
  (now(), 'cluster-1', 'q-001', 1500, 'success'),
  (now(), 'cluster-1', 'q-002', 2300, 'success')"

# Query data
curl -X POST http://localhost:4000/v1/sql -d "sql=
SELECT cluster_id, avg(duration_ms) AS avg_duration
FROM query_history
WHERE timestamp > now() - INTERVAL '1 hour'
GROUP BY cluster_id"
```
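8.2 Prometheus Remote Write¶
Prometheus can also push metrics into the cluster over the remote-write interface exposed on the frontend's prometheus port (4004 by default). A minimal remote_write sketch for prometheus.yml, assuming the in-cluster endpoint from section 5.3 and the default public database (verify the URL path against the GreptimeDB version you run):

```yaml
remote_write:
  - url: "http://greptime-prod-frontend.monitoring.svc:4004/v1/prometheus/write?db=public"
```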