Query History Guide¶
This guide covers query event logging and history tracking in QueryService.
Overview¶
Query history captures detailed information about every query executed through QueryService:
- Query text and parameters
- Execution timing and duration
- Resource usage
- Success/failure status
- User and cluster context
Architecture¶
┌─────────────────────────────────────────────────────────────────────────┐
│                            QueryService Pod                             │
│                                                                         │
│  ┌────────────────────┐        ┌─────────────────────────────────────┐  │
│  │   Queue Service    │        │           Vector Sidecar            │  │
│  │                    │        │                                     │  │
│  │ Query Events:      │  HTTP  │ Sources:                            │  │
│  │ - query_start      │────────│   http_server (:2020)               │  │
│  │ - query_complete   │        │                                     │  │
│  │ - query_failed     │        │ Sinks:                              │  │
│  │                    │        │   S3 (archival, JSON/gzip)          │──┼──▶ S3
│  └────────────────────┘        │   GreptimeDB (real-time, optional)  │──┼──▶ GreptimeDB
│                                └─────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────┘
Configuration¶
Basic S3 Setup¶
apiVersion: e6data.io/v1alpha1
kind: QueryService
metadata:
  name: analytics-cluster
  namespace: workspace-prod
spec:
  alias: analytics
  workspace: analytics-prod

  # Query history configuration
  queryHistory:
    enabled: true

    # S3 destination (bucket borrowed from MetadataServices)
    s3Prefix: "query-history"      # Key prefix
    s3Partition: "hour"            # Partition by hour/day

    # Vector HTTP server port (Queue sends events here)
    httpPort: 2020

    # Queue service settings
    timeoutSeconds: 5
    messageQueueCapacity: 10000
    numPublishingThreads: 10
    maxPayloadBatchSize: 100

    # S3 sink settings
    batchTimeoutSecs: 30
    batchMaxBytes: 10485760        # 10MB
    compression: "gzip"

    # Vector sidecar resources
    resources:
      cpu: "200m"
      memory: "256Mi"

  # ... rest of QueryService spec
With GreptimeDB (Real-time Queries)¶
apiVersion: e6data.io/v1alpha1
kind: QueryService
metadata:
  name: analytics-cluster
  namespace: workspace-prod
spec:
  queryHistory:
    enabled: true
    s3Prefix: "query-history"

    # GreptimeDB for real-time queries
    greptimeRef:
      name: greptime-prod
      namespace: greptime-system
      database: "public"         # GreptimeDB database
      table: "query_history"     # Table for events

  # ... rest of spec
Spec Reference¶
QueryHistorySpec¶
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| enabled | bool | No | true | Enable query history |
| image | QueryHistoryImageSpec | No | - | Vector image config |
| s3Prefix | string | No | query-history | S3 key prefix |
| s3Partition | string | No | hour | Time partition (hour/day) |
| httpPort | int32 | No | 2020 | Vector HTTP server port |
| timeoutSeconds | int32 | No | 5 | Queue publish timeout (seconds) |
| messageQueueCapacity | int32 | No | 10000 | Queue buffer size |
| numPublishingThreads | int32 | No | 10 | Concurrent publishers |
| maxPayloadBatchSize | int32 | No | 100 | Events per batch |
| batchTimeoutSecs | int32 | No | 30 | S3 batch flush timeout (seconds) |
| batchMaxBytes | int64 | No | 10485760 | S3 batch size limit (bytes) |
| compression | string | No | gzip | Compression (gzip/zstd/none) |
| resources | ResourceSpec | No | - | Vector container resources |
| greptimeRef | QueryHistoryGreptimeRef | No | - | GreptimeDB config |
QueryHistoryImageSpec¶
| Field | Type | Default | Description |
|---|---|---|---|
| repository | string | timberio/vector | Image repository |
| tag | string | 0.41.0-alpine | Image tag |
| pullPolicy | string | IfNotPresent | Pull policy |
QueryHistoryGreptimeRef¶
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | Yes | - | GreptimeDBCluster CR name |
| namespace | string | No | QueryService namespace | GreptimeDBCluster namespace |
| database | string | No | public | Database name |
| table | string | No | query_history | Table name |
Data Flow¶
1. Query Execution¶
When a query executes:
- Queue service receives the query from the Planner
- Query events are generated (start, complete/failed)
- Events are sent to the Vector sidecar via HTTP (http://localhost:2020/query_history); see the sketch below
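A minimal sketch of this hand-off, assuming the Vector `http_server` source accepts JSON request bodies on `httpPort`. The payload mirrors the `query_start` example later in this guide, and Python/`requests` is used purely for illustration (the Queue service itself is not written in Python):

```python
import requests

# Illustrative only: POST a query_start event to the Vector sidecar's
# http_server source (default port 2020, see httpPort in QueryHistorySpec).
event = {
    "timestamp": "2024-12-09T10:30:00.123Z",
    "event_type": "query_start",
    "cluster_id": "analytics-cluster",
    "workspace": "analytics-prod",
    "query_id": "q-12345678",
    "user_id": "user@example.com",
    "query_text": "SELECT * FROM sales WHERE date > '2024-01-01'",
}

resp = requests.post(
    "http://localhost:2020/query_history",
    json=event,   # JSON body; Vector batches it and ships it to S3/GreptimeDB
    timeout=5,    # mirrors timeoutSeconds in the spec
)
resp.raise_for_status()
```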
2. Event Processing¶
Vector sidecar processes events:
- Receives HTTP JSON events
- Enriches with cluster metadata
- Batches events for efficiency
- Writes to S3 (always) and GreptimeDB (if configured)
3. Storage Layout¶
S3 Structure:
s3://bucket/query-history/
├── year=2024/
│ ├── month=12/
│ │ ├── day=09/
│ │ │ ├── hour=10/
│ │ │ │ ├── events-001.json.gz
│ │ │ │ ├── events-002.json.gz
│ │ │ │ └── events-003.json.gz
GreptimeDB Table:
CREATE TABLE query_history (
    timestamp TIMESTAMP TIME INDEX,
    event_type STRING,
    cluster_id STRING,
    workspace STRING,
    query_id STRING PRIMARY KEY,
    user_id STRING,
    query_text STRING,
    status STRING,
    duration_ms INT64,
    rows_returned INT64,
    bytes_scanned INT64,
    error_message STRING
);
Query Event Schema¶
Common Fields¶
| Field | Type | Description |
|---|---|---|
| timestamp | ISO 8601 | Event timestamp |
| event_type | string | query_start, query_complete, or query_failed |
| cluster_id | string | QueryService cluster identifier |
| workspace | string | Workspace name |
| query_id | string | Unique query identifier |
| user_id | string | User who submitted the query |
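For tooling that consumes these events, the common envelope can be sketched as a Python `TypedDict`. The field names come from the table above; the Python types are assumptions, and each event type adds its own fields, as the examples below show:

```python
from typing import TypedDict

class QueryEventEnvelope(TypedDict):
    """Fields shared by every query history event (illustrative sketch)."""
    timestamp: str    # ISO 8601, e.g. "2024-12-09T10:30:00.123Z"
    event_type: str   # "query_start" | "query_complete" | "query_failed"
    cluster_id: str   # QueryService cluster identifier
    workspace: str    # Workspace name
    query_id: str     # Unique query identifier
    user_id: str      # User who submitted the query
```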
query_start Event¶
{
"timestamp": "2024-12-09T10:30:00.123Z",
"event_type": "query_start",
"cluster_id": "analytics-cluster",
"workspace": "analytics-prod",
"query_id": "q-12345678",
"user_id": "user@example.com",
"query_text": "SELECT * FROM sales WHERE date > '2024-01-01'",
"catalog": "data-lake",
"database": "sales"
}
query_complete Event¶
{
"timestamp": "2024-12-09T10:30:02.456Z",
"event_type": "query_complete",
"cluster_id": "analytics-cluster",
"workspace": "analytics-prod",
"query_id": "q-12345678",
"user_id": "user@example.com",
"duration_ms": 2333,
"rows_returned": 1500,
"bytes_scanned": 52428800,
"executors_used": 4,
"queue_wait_ms": 50,
"planning_ms": 150,
"execution_ms": 2133
}
query_failed Event¶
{
"timestamp": "2024-12-09T10:30:01.789Z",
"event_type": "query_failed",
"cluster_id": "analytics-cluster",
"workspace": "analytics-prod",
"query_id": "q-12345679",
"user_id": "user@example.com",
"error_code": "SYNTAX_ERROR",
"error_message": "Syntax error at line 1, column 15: unexpected token 'FORM'"
}
Querying History¶
From S3 (Archival)¶
# List files
aws s3 ls s3://bucket/query-history/year=2024/month=12/day=09/hour=10/
# Download and decompress
aws s3 cp s3://bucket/query-history/year=2024/month=12/day=09/hour=10/events-001.json.gz - | \
gunzip | jq '.[] | select(.event_type == "query_complete")'
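The archive can also be read programmatically. A minimal boto3 sketch, assuming (as the `jq` example above does) that each object holds a gzipped JSON array; the bucket name and prefix are placeholders:

```python
import gzip
import json

import boto3

BUCKET = "my-metadata-bucket"   # placeholder: the MetadataServices bucket
PREFIX = "query-history/year=2024/month=12/day=09/hour=10/"

s3 = boto3.client("s3")

# Walk the hour partition and print completed queries with their durations.
for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        events = json.loads(gzip.decompress(body))   # assumes a JSON array per file
        for event in events:
            if event.get("event_type") == "query_complete":
                print(event["query_id"], event.get("duration_ms"))
```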
From GreptimeDB (Real-time)¶
-- Recent queries by user
SELECT query_id, query_text, duration_ms, status
FROM query_history
WHERE user_id = 'user@example.com'
AND timestamp > now() - interval '1 hour'
ORDER BY timestamp DESC
LIMIT 100;
-- Average query duration by hour
SELECT
date_bin(INTERVAL '1 hour', timestamp) as hour,
count(*) as query_count,
avg(duration_ms) as avg_duration
FROM query_history
WHERE event_type = 'query_complete'
AND timestamp > now() - interval '24 hours'
GROUP BY 1
ORDER BY 1;
-- Slowest queries
SELECT query_id, query_text, duration_ms, bytes_scanned
FROM query_history
WHERE event_type = 'query_complete'
AND timestamp > now() - interval '24 hours'
ORDER BY duration_ms DESC
LIMIT 10;
-- Error rate
SELECT
count(*) FILTER (WHERE event_type = 'query_failed') as failed,
count(*) FILTER (WHERE event_type = 'query_complete') as succeeded,
count(*) FILTER (WHERE event_type = 'query_failed')::float / count(*) * 100 as error_rate
FROM query_history
WHERE event_type IN ('query_complete', 'query_failed')
AND timestamp > now() - interval '1 hour';
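The same statements can be issued over GreptimeDB's HTTP SQL endpoint (the `/v1/sql` endpoint also used in the troubleshooting section below). A minimal sketch, assuming the frontend is reachable on port 4000 (for example via `kubectl port-forward`) and that `greptimeRef.database` is `public`:

```python
import requests

GREPTIME_URL = "http://localhost:4000/v1/sql"   # e.g. after kubectl port-forward

sql = """
SELECT query_id, query_text, duration_ms
FROM query_history
WHERE event_type = 'query_complete'
ORDER BY duration_ms DESC
LIMIT 10
"""

# The statement is sent as form data; the target database is a query parameter
# (GreptimeDB defaults to 'public' when it is omitted).
resp = requests.post(GREPTIME_URL, params={"db": "public"}, data={"sql": sql}, timeout=10)
resp.raise_for_status()

# Rows come back as JSON; print the raw response for inspection.
print(resp.json())
```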
Monitoring & Troubleshooting¶
Check Vector Sidecar¶
# Verify Vector is running in Queue pod
kubectl get pods -l app=queue -o jsonpath='{.items[*].spec.containers[*].name}'
# Check Vector logs
kubectl logs -l app=queue -c vector --tail=100
# Check for HTTP errors
kubectl logs -l app=queue -c vector | grep -i error
Check Queue Publishing¶
# Queue logs for publishing status
kubectl logs -l app=queue -c queue | grep -i "query_history\|vector"
Verify S3 Output¶
# List recent files
BUCKET=$(kubectl get mds -o jsonpath='{.spec.storageBackend}' | sed 's|s3a://||')
aws s3 ls s3://$BUCKET/query-history/ --recursive | tail -20
# Check file contents
aws s3 cp s3://$BUCKET/query-history/year=2024/month=12/day=09/hour=10/events-001.json.gz - | \
gunzip | head -5
Check GreptimeDB¶
# Port-forward to GreptimeDB
kubectl port-forward svc/greptime-prod-frontend 4000:4000 -n greptime-system
# Query recent events
curl -X POST http://localhost:4000/v1/sql -d 'sql=SELECT count(*) FROM query_history'
Best Practices¶
1. Resource Sizing¶
| Query Volume | Vector CPU | Vector Memory |
|---|---|---|
| < 100 qps | 100m | 128Mi |
| 100-500 qps | 200m | 256Mi |
| 500-1000 qps | 500m | 512Mi |
| > 1000 qps | 1000m | 1Gi |
2. Retention Strategy¶
- S3: Keep forever (low cost); use lifecycle policies for older data (see the sketch after this list)
- GreptimeDB: Keep 7-30 days for real-time queries
- Use compression: Always enable gzip/zstd
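A minimal sketch of the S3 lifecycle policy mentioned above, using boto3. The bucket name and transition windows are placeholders, and the prefix should match `queryHistory.s3Prefix`:

```python
import boto3

s3 = boto3.client("s3")

# Placeholder bucket and retention windows; older partitions are moved to
# cheaper storage classes instead of being deleted.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-metadata-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "query-history-tiering",
                "Filter": {"Prefix": "query-history/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 90, "StorageClass": "GLACIER"},
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    },
)
```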
3. Query Optimization¶
- Use GreptimeDB for recent data (< 7 days)
- Use S3 + Athena/Spark for historical analysis
- Partition wisely: hourly for high volume, daily for low volume
4. Security¶
- Redact sensitive data before logging (see the sketch after this list)
- Control access via IAM policies
- Encrypt at rest in both S3 and GreptimeDB
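One possible approach to the redaction point above, sketched in Python: mask literal values in `query_text` before the event is emitted. The function name and regexes are illustrative, not part of QueryService; adapt them to your own policy (for example, hashing instead of masking):

```python
import re

# Mask quoted string literals and bare numbers so values such as emails,
# IDs, or dates never reach the history stream.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
_NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")

def redact_query_text(query_text: str) -> str:
    """Return query_text with literal values replaced by placeholders."""
    redacted = _STRING_LITERAL.sub("'?'", query_text)
    return _NUMBER_LITERAL.sub("?", redacted)

# Example:
#   redact_query_text("SELECT * FROM sales WHERE date > '2024-01-01' AND amount > 100")
#   -> "SELECT * FROM sales WHERE date > '?' AND amount > ?"
```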
Related Documentation¶
- QueryService - Full QueryService spec
- GreptimeDBCluster - GreptimeDB setup
- MonitoringServices - General log/metrics collection
- Autoscaling - Executor autoscaling configuration