Query History Guide

This guide covers query event logging and history tracking in QueryService.


Overview

Query history captures detailed information about every query executed through QueryService:

  • Query text and parameters
  • Execution timing and duration
  • Resource usage
  • Success/failure status
  • User and cluster context

Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                          QueryService Pod                               │
│                                                                          │
│  ┌────────────────────┐        ┌─────────────────────────────────────┐  │
│  │    Queue Service   │        │         Vector Sidecar              │  │
│  │                    │        │                                      │  │
│  │  Query Events:     │  HTTP  │  Sources:                           │  │
│  │  - query_start     │────────│    http_server (:2020)              │  │
│  │  - query_complete  │        │                                      │  │
│  │  - query_failed    │        │  Sinks:                             │  │
│  │                    │        │    S3 (archival, JSON/gzip)         │──┼──▶ S3
│  └────────────────────┘        │    GreptimeDB (real-time, optional) │──┼──▶ GreptimeDB
│                                └─────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────┘

Configuration

Basic S3 Setup

apiVersion: e6data.io/v1alpha1
kind: QueryService
metadata:
  name: analytics-cluster
  namespace: workspace-prod
spec:
  alias: analytics
  workspace: analytics-prod

  # Query history configuration
  queryHistory:
    enabled: true

    # S3 destination (bucket inherited from MetadataServices)
    s3Prefix: "query-history"        # Key prefix
    s3Partition: "hour"              # Partition by hour/day

    # Vector HTTP server port (Queue sends events here)
    httpPort: 2020

    # Queue service settings
    timeoutSeconds: 5
    messageQueueCapacity: 10000
    numPublishingThreads: 10
    maxPayloadBatchSize: 100

    # S3 sink settings
    batchTimeoutSecs: 30
    batchMaxBytes: 10485760          # 10MB
    compression: "gzip"

    # Vector sidecar resources
    resources:
      cpu: "200m"
      memory: "256Mi"

  # ... rest of QueryService spec
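
A quick way to apply and verify (a sketch; the lowercase queryservice resource name follows the CR kind above and may differ in your install):

# Apply the manifest and confirm the CR was accepted
kubectl apply -f queryservice.yaml
kubectl get queryservice analytics-cluster -n workspace-prod

# Confirm the Vector sidecar was injected alongside the Queue container
kubectl get pods -n workspace-prod -l app=queue \
  -o jsonpath='{.items[*].spec.containers[*].name}'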

With GreptimeDB (Real-time Queries)

apiVersion: e6data.io/v1alpha1
kind: QueryService
metadata:
  name: analytics-cluster
  namespace: workspace-prod
spec:
  queryHistory:
    enabled: true
    s3Prefix: "query-history"

    # GreptimeDB for real-time queries
    greptimeRef:
      name: greptime-prod
      namespace: greptime-system
      database: "public"           # GreptimeDB database
      table: "query_history"       # Table for events

  # ... rest of spec
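
Before enabling greptimeRef, it is worth confirming that the referenced GreptimeDBCluster exists and is healthy (assuming the standard GreptimeDB operator CRD):

kubectl get greptimedbcluster greptime-prod -n greptime-system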

Spec Reference

QueryHistorySpec

Field                 Type                     Required  Default        Description
enabled               bool                     No        true           Enable query history
image                 QueryHistoryImageSpec    No        -              Vector image config
s3Prefix              string                   No        query-history  S3 key prefix
s3Partition           string                   No        hour           Time partition (hour/day)
httpPort              int32                    No        2020           Vector HTTP server port
timeoutSeconds        int32                    No        5              Queue publish timeout
messageQueueCapacity  int32                    No        10000          Queue buffer size
numPublishingThreads  int32                    No        10             Concurrent publishers
maxPayloadBatchSize   int32                    No        100            Events per batch
batchTimeoutSecs      int32                    No        30             S3 batch flush timeout
batchMaxBytes         int64                    No        10485760       S3 batch size limit (10 MB)
compression           string                   No        gzip           Compression (gzip/zstd/none)
resources             ResourceSpec             No        -              Vector container resources
greptimeRef           QueryHistoryGreptimeRef  No        -              GreptimeDB config

QueryHistoryImageSpec

Field       Type    Default          Description
repository  string  timberio/vector  Image repository
tag         string  0.41.0-alpine    Image tag
pullPolicy  string  IfNotPresent     Pull policy

QueryHistoryGreptimeRef

Field      Type    Required  Default         Description
name       string  Yes       -               GreptimeDBCluster CR name
namespace  string  No        Same namespace  GreptimeDBCluster namespace
database   string  No        public          Database name
table      string  No        query_history   Table name

Data Flow

1. Query Execution

When a query executes:

  1. The Queue service receives the query from the Planner
  2. Query events are generated (start, then complete or failed)
  3. Events are sent to the Vector sidecar via HTTP (http://localhost:2020/query_history), as sketched below
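
For illustration, an event POST can be reproduced by hand against the sidecar from inside the Queue container (a minimal sketch; the payload the Queue service actually sends may carry additional fields):

# Hand-built query_start event sent to the Vector HTTP source
curl -s -X POST http://localhost:2020/query_history \
  -H 'Content-Type: application/json' \
  -d '{
    "timestamp": "2024-12-09T10:30:00.123Z",
    "event_type": "query_start",
    "cluster_id": "analytics-cluster",
    "workspace": "analytics-prod",
    "query_id": "q-test-0001",
    "user_id": "user@example.com"
  }'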

2. Event Processing

The Vector sidecar processes incoming events:

  1. Receives HTTP JSON events
  2. Enriches with cluster metadata
  3. Batches events for efficiency
  4. Writes to S3 (always) and GreptimeDB (if configured)

3. Storage Layout

S3 Structure:

s3://bucket/query-history/
├── year=2024/
│   ├── month=12/
│   │   ├── day=09/
│   │   │   ├── hour=10/
│   │   │   │   ├── events-001.json.gz
│   │   │   │   ├── events-002.json.gz
│   │   │   │   └── events-003.json.gz
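
With hourly partitioning, the current partition can be listed directly (assuming UTC-based partition values and the $BUCKET derivation shown under Troubleshooting):

# List the partition for the current UTC hour
aws s3 ls "s3://$BUCKET/query-history/year=$(date -u +%Y)/month=$(date -u +%m)/day=$(date -u +%d)/hour=$(date -u +%H)/"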

GreptimeDB Table:

CREATE TABLE query_history (
  timestamp TIMESTAMP TIME INDEX,
  cluster_id STRING,
  workspace STRING,
  query_id STRING PRIMARY KEY,
  user_id STRING,
  event_type STRING,    -- query_start / query_complete / query_failed
  query_text STRING,
  status STRING,
  duration_ms INT64,
  rows_returned INT64,
  bytes_scanned INT64,
  error_message STRING
);
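
If the table must exist before the first write, it can be created through GreptimeDB's HTTP SQL endpoint (a sketch reusing the port-forward from Troubleshooting; GreptimeDB can also create tables automatically on ingestion):

# Create the table via the HTTP SQL API
kubectl port-forward svc/greptime-prod-frontend 4000:4000 -n greptime-system &
curl -X POST http://localhost:4000/v1/sql \
  --data-urlencode "sql=CREATE TABLE IF NOT EXISTS query_history (timestamp TIMESTAMP TIME INDEX, cluster_id STRING, workspace STRING, query_id STRING PRIMARY KEY, user_id STRING, event_type STRING, query_text STRING, status STRING, duration_ms INT64, rows_returned INT64, bytes_scanned INT64, error_message STRING)"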


Query Event Schema

Common Fields

Field       Type      Description
timestamp   ISO 8601  Event timestamp
event_type  string    query_start, query_complete, or query_failed
cluster_id  string    QueryService cluster identifier
workspace   string    Workspace name
query_id    string    Unique query identifier
user_id     string    User who submitted the query

query_start Event

{
  "timestamp": "2024-12-09T10:30:00.123Z",
  "event_type": "query_start",
  "cluster_id": "analytics-cluster",
  "workspace": "analytics-prod",
  "query_id": "q-12345678",
  "user_id": "user@example.com",
  "query_text": "SELECT * FROM sales WHERE date > '2024-01-01'",
  "catalog": "data-lake",
  "database": "sales"
}

query_complete Event

{
  "timestamp": "2024-12-09T10:30:02.456Z",
  "event_type": "query_complete",
  "cluster_id": "analytics-cluster",
  "workspace": "analytics-prod",
  "query_id": "q-12345678",
  "user_id": "user@example.com",
  "duration_ms": 2333,
  "rows_returned": 1500,
  "bytes_scanned": 52428800,
  "executors_used": 4,
  "queue_wait_ms": 50,
  "planning_ms": 150,
  "execution_ms": 2133
}

query_failed Event

{
  "timestamp": "2024-12-09T10:30:01.789Z",
  "event_type": "query_failed",
  "cluster_id": "analytics-cluster",
  "workspace": "analytics-prod",
  "query_id": "q-12345679",
  "user_id": "user@example.com",
  "error_code": "SYNTAX_ERROR",
  "error_message": "Syntax error at line 1, column 15: unexpected token 'FORM'"
}

Querying History

From S3 (Archival)

# List files
aws s3 ls s3://bucket/query-history/year=2024/month=12/day=09/hour=10/

# Download and decompress (files are newline-delimited JSON)
aws s3 cp s3://bucket/query-history/year=2024/month=12/day=09/hour=10/events-001.json.gz - | \
  gunzip | jq 'select(.event_type == "query_complete")'

From GreptimeDB (Real-time)

-- Recent queries by user
SELECT query_id, query_text, duration_ms, status
FROM query_history
WHERE user_id = 'user@example.com'
  AND timestamp > now() - interval '1 hour'
ORDER BY timestamp DESC
LIMIT 100;

-- Average query duration by hour
SELECT
  date_bin(INTERVAL '1 hour', timestamp) as hour,
  count(*) as query_count,
  avg(duration_ms) as avg_duration
FROM query_history
WHERE event_type = 'query_complete'
  AND timestamp > now() - interval '24 hours'
GROUP BY 1
ORDER BY 1;

-- Slowest queries
SELECT query_id, query_text, duration_ms, bytes_scanned
FROM query_history
WHERE event_type = 'query_complete'
  AND timestamp > now() - interval '24 hours'
ORDER BY duration_ms DESC
LIMIT 10;

-- Error rate (terminal events only, so query_start rows don't inflate the denominator)
SELECT
  count(*) FILTER (WHERE event_type = 'query_failed') as failed,
  count(*) FILTER (WHERE event_type = 'query_complete') as succeeded,
  count(*) FILTER (WHERE event_type = 'query_failed')::float / count(*) * 100 as error_rate
FROM query_history
WHERE timestamp > now() - interval '1 hour'
  AND event_type IN ('query_complete', 'query_failed');

Monitoring & Troubleshooting

Check Vector Sidecar

# Verify Vector is running in Queue pod
kubectl get pods -l app=queue -o jsonpath='{.items[*].spec.containers[*].name}'

# Check Vector logs
kubectl logs -l app=queue -c vector --tail=100

# Check for HTTP errors
kubectl logs -l app=queue -c vector | grep -i error

Check Queue Publishing

# Queue logs for publishing status
kubectl logs -l app=queue -c queue | grep -i "query_history\|vector"

Verify S3 Output

# List recent files
BUCKET=$(kubectl get mds -o jsonpath='{.items[0].spec.storageBackend}' | sed 's|s3a://||')
aws s3 ls s3://$BUCKET/query-history/ --recursive | tail -20

# Check file contents
aws s3 cp s3://$BUCKET/query-history/year=2024/month=12/day=09/hour=10/events-001.json.gz - | \
  gunzip | head -5

Check GreptimeDB

# Port-forward to GreptimeDB
kubectl port-forward svc/greptime-prod-frontend 4000:4000 -n greptime-system

# Query recent events
curl -X POST http://localhost:4000/v1/sql -d 'sql=SELECT count(*) FROM query_history'

Best Practices

1. Resource Sizing

Query Volume  Vector CPU  Vector Memory
< 100 qps     100m        128Mi
100-500 qps   200m        256Mi
500-1000 qps  500m        512Mi
> 1000 qps    1000m       1Gi
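
Resources can be adjusted in place as volume grows (a sketch; assumes the QueryService CR accepts a merge patch on spec.queryHistory.resources):

# Bump the Vector sidecar to the 500-1000 qps tier
kubectl patch queryservice analytics-cluster -n workspace-prod \
  --type merge \
  -p '{"spec":{"queryHistory":{"resources":{"cpu":"500m","memory":"512Mi"}}}}'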

2. Retention Strategy

  1. S3: Keep indefinitely (low cost); use lifecycle policies for older data (see the example after this list)
  2. GreptimeDB: Keep 7-30 days for real-time queries
  3. Use compression: Always enable gzip/zstd
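
An example of the lifecycle approach from item 1: transition query-history objects to Glacier after 90 days (the cutoff and storage class are illustrative):

# Tier older query-history objects to Glacier
aws s3api put-bucket-lifecycle-configuration --bucket "$BUCKET" \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "query-history-tiering",
      "Filter": {"Prefix": "query-history/"},
      "Status": "Enabled",
      "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}]
    }]
  }'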

3. Query Optimization

  1. Use GreptimeDB for recent data (< 7 days)
  2. Use S3 + Athena/Spark for historical analysis
  3. Partition wisely: hourly for high volume, daily for low volume

4. Security

  1. Redact sensitive data before logging
  2. Control access via IAM policies
  3. Encrypt at rest in both S3 and GreptimeDB (see the S3 example below)
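
For item 3, default encryption can be enforced on the history bucket (a sketch; substitute a KMS key ID if you use a customer-managed key):

# Enable SSE-KMS default encryption on the bucket
aws s3api put-bucket-encryption --bucket "$BUCKET" \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "aws:kms"}
    }]
  }'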