Query History Guide¶
This guide covers query event logging and history tracking in QueryService.
Overview¶
Query history captures detailed information about every query executed through QueryService:
- Query text and parameters
- Execution timing and duration
- Resource usage
- Success/failure status
- User and cluster context
Architecture¶
┌─────────────────────────────────────────────────────────────────────────┐
│                            QueryService Pod                             │
│                                                                         │
│  ┌────────────────────┐        ┌─────────────────────────────────────┐  │
│  │   Queue Service    │        │           Vector Sidecar            │  │
│  │                    │        │                                     │  │
│  │ Query Events:      │  HTTP  │ Sources:                            │  │
│  │ - query_start      │────────│   http_server (:2020)               │  │
│  │ - query_complete   │        │                                     │  │
│  │ - query_failed     │        │ Sinks:                              │  │
│  │                    │        │   S3 (archival, JSON/gzip)          │──┼──▶ S3
│  └────────────────────┘        │   GreptimeDB (real-time, optional)  │──┼──▶ GreptimeDB
│                                └─────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────┘
Configuration¶
Basic S3 Setup¶
apiVersion: e6data.io/v1alpha1
kind: QueryService
metadata:
  name: analytics-cluster
  namespace: workspace-prod
spec:
  alias: analytics
  workspace: analytics-prod

  # Query history configuration
  queryHistory:
    enabled: true

    # S3 destination (bucket borrowed from MetadataServices)
    s3Prefix: "query-history"      # Key prefix
    s3Partition: "hour"            # Partition by hour/day

    # Vector HTTP server port (Queue sends events here)
    httpPort: 2020

    # Queue service settings
    timeoutSeconds: 5
    messageQueueCapacity: 10000
    numPublishingThreads: 10
    maxPayloadBatchSize: 100

    # S3 sink settings
    batchTimeoutSecs: 30
    batchMaxBytes: 10485760        # 10MB
    compression: "gzip"

    # Vector sidecar resources
    resources:
      cpu: "200m"
      memory: "256Mi"

  # ... rest of QueryService spec
With GreptimeDB (Real-time Queries)¶
apiVersion: e6data.io/v1alpha1
kind: QueryService
metadata:
  name: analytics-cluster
  namespace: workspace-prod
spec:
  queryHistory:
    enabled: true
    s3Prefix: "query-history"

    # GreptimeDB for real-time queries
    greptimeRef:
      name: greptime-prod
      namespace: greptime-system
      database: "public"         # GreptimeDB database
      table: "query_history"     # Table for events

  # ... rest of spec
Spec Reference¶
QueryHistorySpec¶
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| enabled | bool | No | true | Enable query history |
| image | QueryHistoryImageSpec | No | - | Vector image config |
| s3Prefix | string | No | query-history | S3 key prefix |
| s3Partition | string | No | hour | Time partition (hour/day) |
| httpPort | int32 | No | 2020 | Vector HTTP server port |
| timeoutSeconds | int32 | No | 5 | Queue publish timeout (seconds) |
| messageQueueCapacity | int32 | No | 10000 | Queue buffer size |
| numPublishingThreads | int32 | No | 10 | Concurrent publishers |
| maxPayloadBatchSize | int32 | No | 100 | Events per batch |
| batchTimeoutSecs | int32 | No | 30 | S3 batch flush timeout (seconds) |
| batchMaxBytes | int64 | No | 10485760 | S3 batch size limit (bytes) |
| compression | string | No | gzip | Compression (gzip/zstd/none) |
| resources | ResourceSpec | No | - | Vector container resources |
| greptimeRef | QueryHistoryGreptimeRef | No | - | GreptimeDB config |
QueryHistoryImageSpec¶
| Field | Type | Default | Description |
|---|---|---|---|
| repository | string | timberio/vector | Image repository |
| tag | string | 0.41.0-alpine | Image tag |
| pullPolicy | string | IfNotPresent | Pull policy |
QueryHistoryGreptimeRef¶
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | Yes | - | GreptimeDBCluster CR name |
| namespace | string | No | QueryService namespace | GreptimeDBCluster namespace |
| database | string | No | public | Database name |
| table | string | No | query_history | Table name |
Data Flow¶
1. Query Execution¶
When a query executes:
- Queue service receives the query from the Planner
- Query events are generated (start, complete/failed)
- Events are sent to the Vector sidecar via HTTP (http://localhost:2020/query_history); see the sketch below
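A minimal sketch of this hand-off, assuming the Vector `http_server` source accepts JSON request bodies on `httpPort`. The payload mirrors the `query_start` example later in this guide, and Python/`requests` is used purely for illustration (the Queue service itself is not written in Python):

```python
import requests

# Illustrative only: POST a query_start event to the Vector sidecar's
# http_server source (default port 2020, see httpPort in QueryHistorySpec).
event = {
    "timestamp": "2024-12-09T10:30:00.123Z",
    "event_type": "query_start",
    "cluster_id": "analytics-cluster",
    "workspace": "analytics-prod",
    "query_id": "q-12345678",
    "user_id": "user@example.com",
    "query_text": "SELECT * FROM sales WHERE date > '2024-01-01'",
}

resp = requests.post(
    "http://localhost:2020/query_history",
    json=event,   # JSON body; Vector batches it and ships it to S3/GreptimeDB
    timeout=5,    # mirrors timeoutSeconds in the spec
)
resp.raise_for_status()
```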
2. Event Processing¶
Vector sidecar processes events:
- Receives HTTP JSON events
- Enriches with cluster metadata
- Batches events for efficiency
- Writes to S3 (always) and GreptimeDB (if configured)
3. Storage Layout¶
S3 Structure:
s3://bucket/query-history/
├── year=2024/
│ ├── month=12/
│ │ ├── day=09/
│ │ │ ├── hour=10/
│ │ │ │ ├── events-001.json.gz
│ │ │ │ ├── events-002.json.gz
│ │ │ │ └── events-003.json.gz
GreptimeDB Table:
CREATE TABLE query_history (
    timestamp TIMESTAMP TIME INDEX,
    event_type STRING,
    cluster_id STRING,
    workspace STRING,
    query_id STRING PRIMARY KEY,
    user_id STRING,
    query_text STRING,
    status STRING,
    duration_ms INT64,
    rows_returned INT64,
    bytes_scanned INT64,
    error_message STRING
);
Query Event Schema¶
Common Fields¶
| Field | Type | Description |
|---|---|---|
| timestamp | ISO 8601 | Event timestamp |
| event_type | string | query_start, query_complete, or query_failed |
| cluster_id | string | QueryService cluster identifier |
| workspace | string | Workspace name |
| query_id | string | Unique query identifier |
| user_id | string | User who submitted the query |
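For tooling that consumes these events, the common envelope can be sketched as a Python `TypedDict`. The field names come from the table above; the Python types are assumptions, and each event type adds its own fields, as the examples below show:

```python
from typing import TypedDict

class QueryEventEnvelope(TypedDict):
    """Fields shared by every query history event (illustrative sketch)."""
    timestamp: str    # ISO 8601, e.g. "2024-12-09T10:30:00.123Z"
    event_type: str   # "query_start" | "query_complete" | "query_failed"
    cluster_id: str   # QueryService cluster identifier
    workspace: str    # Workspace name
    query_id: str     # Unique query identifier
    user_id: str      # User who submitted the query
```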
query_start Event¶
{
"timestamp": "2024-12-09T10:30:00.123Z",
"event_type": "query_start",
"cluster_id": "analytics-cluster",
"workspace": "analytics-prod",
"query_id": "q-12345678",
"user_id": "user@example.com",
"query_text": "SELECT * FROM sales WHERE date > '2024-01-01'",
"catalog": "data-lake",
"database": "sales"
}
query_complete Event¶
{
"timestamp": "2024-12-09T10:30:02.456Z",
"event_type": "query_complete",
"cluster_id": "analytics-cluster",
"workspace": "analytics-prod",
"query_id": "q-12345678",
"user_id": "user@example.com",
"duration_ms": 2333,
"rows_returned": 1500,
"bytes_scanned": 52428800,
"executors_used": 4,
"queue_wait_ms": 50,
"planning_ms": 150,
"execution_ms": 2133
}
query_failed Event¶
{
"timestamp": "2024-12-09T10:30:01.789Z",
"event_type": "query_failed",
"cluster_id": "analytics-cluster",
"workspace": "analytics-prod",
"query_id": "q-12345679",
"user_id": "user@example.com",
"error_code": "SYNTAX_ERROR",
"error_message": "Syntax error at line 1, column 15: unexpected token 'FORM'"
}
Querying History¶
From S3 (Archival)¶
# List files
aws s3 ls s3://bucket/query-history/year=2024/month=12/day=09/hour=10/
# Download and decompress
aws s3 cp s3://bucket/query-history/year=2024/month=12/day=09/hour=10/events-001.json.gz - | \
gunzip | jq '.[] | select(.event_type == "query_complete")'
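The archive can also be read programmatically. A minimal boto3 sketch, assuming (as the `jq` example above does) that each object holds a gzipped JSON array; the bucket name and prefix are placeholders:

```python
import gzip
import json

import boto3

BUCKET = "my-metadata-bucket"   # placeholder: the MetadataServices bucket
PREFIX = "query-history/year=2024/month=12/day=09/hour=10/"

s3 = boto3.client("s3")

# Walk the hour partition and print completed queries with their durations.
for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        events = json.loads(gzip.decompress(body))   # assumes a JSON array per file
        for event in events:
            if event.get("event_type") == "query_complete":
                print(event["query_id"], event.get("duration_ms"))
```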
From GreptimeDB (Real-time)¶
-- Recent queries by user
SELECT query_id, query_text, duration_ms, status
FROM query_history
WHERE user_id = 'user@example.com'
AND timestamp > now() - interval '1 hour'
ORDER BY timestamp DESC
LIMIT 100;
-- Average query duration by hour
SELECT
date_bin(INTERVAL '1 hour', timestamp) as hour,
count(*) as query_count,
avg(duration_ms) as avg_duration
FROM query_history
WHERE event_type = 'query_complete'
AND timestamp > now() - interval '24 hours'
GROUP BY 1
ORDER BY 1;
-- Slowest queries
SELECT query_id, query_text, duration_ms, bytes_scanned
FROM query_history
WHERE event_type = 'query_complete'
AND timestamp > now() - interval '24 hours'
ORDER BY duration_ms DESC
LIMIT 10;
-- Error rate
SELECT
count(*) FILTER (WHERE event_type = 'query_failed') as failed,
count(*) FILTER (WHERE event_type = 'query_complete') as succeeded,
count(*) FILTER (WHERE event_type = 'query_failed')::float / count(*) * 100 as error_rate
FROM query_history
WHERE event_type IN ('query_complete', 'query_failed')
AND timestamp > now() - interval '1 hour';
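The same statements can be issued over GreptimeDB's HTTP SQL endpoint (the `/v1/sql` endpoint also used in the troubleshooting section below). A minimal sketch, assuming the frontend is reachable on port 4000 (for example via `kubectl port-forward`) and that `greptimeRef.database` is `public`:

```python
import requests

GREPTIME_URL = "http://localhost:4000/v1/sql"   # e.g. after kubectl port-forward

sql = """
SELECT query_id, query_text, duration_ms
FROM query_history
WHERE event_type = 'query_complete'
ORDER BY duration_ms DESC
LIMIT 10
"""

# The statement is sent as form data; the target database is a query parameter
# (GreptimeDB defaults to 'public' when it is omitted).
resp = requests.post(GREPTIME_URL, params={"db": "public"}, data={"sql": sql}, timeout=10)
resp.raise_for_status()

# Rows come back as JSON; print the raw response for inspection.
print(resp.json())
```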
Monitoring & Troubleshooting¶
Check Vector Sidecar¶
# Verify Vector is running in Queue pod
kubectl get pods -l app=queue -o jsonpath='{.items[*].spec.containers[*].name}'
# Check Vector logs
kubectl logs -l app=queue -c vector --tail=100
# Check for HTTP errors
kubectl logs -l app=queue -c vector | grep -i error
Check Queue Publishing¶
# Queue logs for publishing status
kubectl logs -l app=queue -c queue | grep -i "query_history\|vector"
Verify S3 Output¶
# List recent files
BUCKET=$(kubectl get mds -o jsonpath='{.spec.storageBackend}' | sed 's|s3a://||')
aws s3 ls s3://$BUCKET/query-history/ --recursive | tail -20
# Check file contents
aws s3 cp s3://$BUCKET/query-history/year=2024/month=12/day=09/hour=10/events-001.json.gz - | \
gunzip | head -5
Check GreptimeDB¶
# Port-forward to GreptimeDB
kubectl port-forward svc/greptime-prod-frontend 4000:4000 -n greptime-system
# Query recent events
curl -X POST http://localhost:4000/v1/sql -d 'sql=SELECT count(*) FROM query_history'
Best Practices¶
1. Resource Sizing¶
| Query Volume | Vector CPU | Vector Memory |
|---|---|---|
| < 100 qps | 100m | 128Mi |
| 100-500 qps | 200m | 256Mi |
| 500-1000 qps | 500m | 512Mi |
| > 1000 qps | 1000m | 1Gi |
2. Retention Strategy¶
- S3: Keep forever (low cost); use lifecycle policies for older data (see the sketch after this list)
- GreptimeDB: Keep 7-30 days for real-time queries
- Use compression: Always enable gzip/zstd
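A minimal sketch of the S3 lifecycle policy mentioned above, using boto3. The bucket name and transition windows are placeholders, and the prefix should match `queryHistory.s3Prefix`:

```python
import boto3

s3 = boto3.client("s3")

# Placeholder bucket and retention windows; older partitions are moved to
# cheaper storage classes instead of being deleted.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-metadata-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "query-history-tiering",
                "Filter": {"Prefix": "query-history/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 90, "StorageClass": "GLACIER"},
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    },
)
```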
3. Query Optimization¶
- Use GreptimeDB for recent data (< 7 days)
- Use S3 + Athena/Spark for historical analysis
- Partition wisely: hourly for high volume, daily for low volume
4. Security¶
- Redact sensitive data before logging (see the sketch after this list)
- Control access via IAM policies
- Encrypt at rest in both S3 and GreptimeDB
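One possible approach to the redaction point above, sketched in Python: mask literal values in `query_text` before the event is emitted. The function name and regexes are illustrative, not part of QueryService; adapt them to your own policy (for example, hashing instead of masking):

```python
import re

# Mask quoted string literals and bare numbers so values such as emails,
# IDs, or dates never reach the history stream.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
_NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")

def redact_query_text(query_text: str) -> str:
    """Return query_text with literal values replaced by placeholders."""
    redacted = _STRING_LITERAL.sub("'?'", query_text)
    return _NUMBER_LITERAL.sub("?", redacted)

# Example:
#   redact_query_text("SELECT * FROM sales WHERE date > '2024-01-01' AND amount > 100")
#   -> "SELECT * FROM sales WHERE date > '?' AND amount > ?"
```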
Related Documentation¶
- QueryService - Full QueryService spec
- GreptimeDBCluster - GreptimeDB setup
- MonitoringServices - General log/metrics collection
- Autoscaling - Executor autoscaling configuration