Governance
API Version: e6data.io/v1alpha1
Kind: Governance
Short Names: gov, governance
1. Purpose
Governance defines data access control policies for e6data catalogs. It supports three types of policies:
- GRANT_ACCESS: Allow or deny access to databases, tables, or columns
- COLUMN_MASKING: Mask sensitive data in query results
- ROW_FILTERING: Filter rows based on user context
2. High-level Behavior
Kubernetes-Native Policy Sync
When you create a Governance CR, the storage service reads its policies directly from Kubernetes:
- Storage service polls all Governance CRs in the namespace every 30 seconds
- Merges policies from all Governance CRs for the Ranger authorization engine
- Applies policies to incoming queries in real-time
This approach is Kubernetes-native and doesn't require intermediate cloud storage for policies.
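The merge step can be pictured with a short Python sketch. It is not the storage service's implementation: it assumes that merging is a plain concatenation of each CR's `spec.policies`, and the `merge_policies` helper and the `sourceCR` bookkeeping key are hypothetical names introduced here for illustration only.

```python
from typing import Any


def merge_policies(governance_crs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Flatten spec.policies from several Governance CRs into one list.

    Assumption: merging is a simple concatenation in CR order. How the real
    service orders policies or resolves duplicate names is not documented here.
    """
    merged: list[dict[str, Any]] = []
    for cr in governance_crs:
        spec = cr.get("spec", {})
        # `policies` is optional and defaults to an empty list.
        for policy in spec.get("policies", []):
            merged.append({
                **policy,
                "catalogName": spec.get("catalogName"),
                # Hypothetical key: remember which CR the policy came from.
                "sourceCR": cr["metadata"]["name"],
            })
    return merged


sales = {
    "metadata": {"name": "sales-access"},
    "spec": {"catalogName": "data-lake",
             "policies": [{"name": "sales-analyst-access", "type": "GRANT_ACCESS"}]},
}
pii = {
    "metadata": {"name": "pii-masking"},
    "spec": {"catalogName": "data-lake",
             "policies": [{"name": "mask-ssn", "type": "COLUMN_MASKING"}]},
}
for policy in merge_policies([sales, pii]):
    print(policy["sourceCR"], policy["name"])
```

Keeping the source CR name next to each policy is a design convenience in this sketch: it makes it easier to trace an unexpected decision back to the manifest that caused it.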
Architecture
┌─────────────────────────────────────────────────────────────┐
│                    Kubernetes Namespace                     │
│                                                             │
│  ┌──────────────┐   polls every 30s    ┌────────────────┐   │
│  │  Governance  │ ◄─────────────────   │    Storage     │   │
│  │  CR (sales)  │                      │    Service     │   │
│  └──────────────┘                      │                │   │
│                                        │ ┌────────────┐ │   │
│  ┌──────────────┐                      │ │ Governance │ │   │
│  │  Governance  │ ◄─────────────────   │ │ CRWatcher  │ │   │
│  │  CR (pii)    │                      │ └────────────┘ │   │
│  └──────────────┘                      │       │        │   │
│                                        │       ▼        │   │
│                                        │ ┌────────────┐ │   │
│                                        │ │   Ranger   │ │   │
│                                        │ │ Authorizer │ │   │
│                                        │ └────────────┘ │   │
│                                        └────────────────┘   │
└─────────────────────────────────────────────────────────────┘
RBAC Requirements
The storage service ServiceAccount needs permission to read Governance CRs:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: governance-reader
  namespace: <workspace-namespace>
rules:
  - apiGroups: ["e6data.io"]
    resources: ["governances"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: storage-governance-reader
  namespace: <workspace-namespace>
subjects:
  - kind: ServiceAccount
    name: <storage-service-account>
    namespace: <workspace-namespace>
roleRef:
  kind: Role
  name: governance-reader
  apiGroup: rbac.authorization.k8s.io
3. Spec Reference
3.1 Top-level Fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| catalogName | string | Yes | - | E6Catalog name this governance applies to (must exist in same namespace) |
| policies | []Policy | No | [] | List of governance policies |
Note: The catalogName field references an E6Catalog CR by name in the same namespace. The catalog must exist before creating governance policies for it.
3.2 Policy
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | Yes | - | Policy name (unique identifier) |
| type | string | Yes | - | GRANT_ACCESS, COLUMN_MASKING, or ROW_FILTERING |
| resources | []Resource | Yes | - | Resources this policy applies to |
| users | []string | No | [] | Users this policy applies to |
| groups | []string | No | [] | Groups this policy applies to |
| allow | bool | No | false | Allow access (for GRANT_ACCESS) |
| maskType | string | No | - | Masking type (for COLUMN_MASKING) |
| rowFilter | string | No | - | Filter expression (for ROW_FILTERING) |
3.3 Resource
| Field | Type | Required | Description |
|---|---|---|---|
| catalog | string | No | Catalog name (default: all) |
| database | string | No | Database name (default: all) |
| table | string | No | Table name (default: all) |
| column | string | No | Column name (for column policies) |
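One way to read the "default: all" behaviour in the table above is sketched below in Python. The `resource_matches` helper is hypothetical, and the matching rules it encodes are assumptions rather than documented behaviour: an omitted field or `"*"` matches anything, and every other value is compared by exact, case-sensitive equality.

```python
WILDCARD = "*"


def resource_matches(resource: dict[str, str], target: dict[str, str]) -> bool:
    """Return True when every field set on `resource` matches `target`.

    Assumptions: an omitted field or "*" matches anything, and all other
    values are compared by exact, case-sensitive equality.
    """
    for field in ("catalog", "database", "table", "column"):
        wanted = resource.get(field)
        if wanted is None or wanted == WILDCARD:
            continue  # an unset field means "all"
        if target.get(field) != wanted:
            return False
    return True


target = {"catalog": "data-lake", "database": "hr", "table": "salaries", "column": "amount"}
print(resource_matches({"database": "hr"}, target))                        # True
print(resource_matches({"database": "hr", "table": "employees"}, target))  # False
```

Under these assumptions, a resource that sets only `column` (as in the combined example in section 4.4) would apply to that column in every table.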
3.4 Mask Types
| Type | Description | Example Output |
|---|---|---|
| MASK | Replace with asterisks | **** |
| MASK_HASH | SHA-256 hash | a1b2c3d4... |
| MASK_SHOW_LAST_4 | Show last 4 characters | ****1234 |
| MASK_SHOW_FIRST_4 | Show first 4 characters | 1234**** |
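The following Python sketch reproduces the example outputs in the table above so the four mask types can be compared side by side. It illustrates the intended effect only, not the engine's masking code. The `apply_mask` helper is hypothetical, and several details are assumptions: the hidden part is always rendered as four asterisks, the hash is a hex digest of the UTF-8 bytes, and values of four characters or fewer are masked entirely.

```python
import hashlib


def apply_mask(mask_type: str, value: str) -> str:
    """Mask `value` in the style of the example outputs in the mask type table.

    Assumptions: the masked part is always four asterisks, the hash is a hex
    digest of the UTF-8 bytes, and values of four characters or fewer are
    masked entirely so the "shown" part cannot reveal the whole value.
    """
    stars = "****"
    if mask_type == "MASK":
        return stars
    if mask_type == "MASK_HASH":
        return hashlib.sha256(value.encode("utf-8")).hexdigest()
    if mask_type in ("MASK_SHOW_LAST_4", "MASK_SHOW_FIRST_4"):
        if len(value) <= 4:
            return stars
        if mask_type == "MASK_SHOW_LAST_4":
            return stars + value[-4:]
        return value[:4] + stars
    raise ValueError(f"unknown maskType: {mask_type}")


print(apply_mask("MASK_SHOW_LAST_4", "123-45-1234"))  # ****1234
print(apply_mask("MASK_SHOW_FIRST_4", "1234567890"))  # 1234****
```

Because a hash is deterministic, `MASK_HASH` keeps equal inputs equal after masking, which is why it suits the "hash email for analytics" case in section 4.2: masked values can still be grouped or joined.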
4. Example Manifests
4.1 Grant Access Policy
apiVersion: e6data.io/v1alpha1
kind: Governance
metadata:
  name: sales-access
  namespace: workspace-analytics-prod
spec:
  catalogName: data-lake
  policies:
    # Allow data-analysts group access to sales database
    - name: sales-analyst-access
      type: GRANT_ACCESS
      allow: true
      resources:
        - database: sales
      groups:
        - data-analysts
        - business-intelligence
    # Deny access to sensitive HR tables
    - name: deny-hr-tables
      type: GRANT_ACCESS
      allow: false
      resources:
        - database: hr
          table: salaries
        - database: hr
          table: performance_reviews
      groups:
        - data-analysts # Analysts can't see HR data
4.2 Column Masking Policy
apiVersion: e6data.io/v1alpha1
kind: Governance
metadata:
  name: pii-masking
  namespace: workspace-analytics-prod
spec:
  catalogName: data-lake
  policies:
    # Mask SSN - show last 4 digits
    - name: mask-ssn
      type: COLUMN_MASKING
      maskType: MASK_SHOW_LAST_4
      resources:
        - database: customers
          table: users
          column: ssn
      groups:
        - data-analysts
    # Fully mask credit card numbers
    - name: mask-credit-card
      type: COLUMN_MASKING
      maskType: MASK
      resources:
        - database: payments
          table: transactions
          column: credit_card_number
      groups:
        - data-analysts
        - business-intelligence
    # Hash email for analytics
    - name: hash-email
      type: COLUMN_MASKING
      maskType: MASK_HASH
      resources:
        - database: customers
          table: users
          column: email
      groups:
        - marketing-analytics
4.3 Row Filtering Policy
apiVersion: e6data.io/v1alpha1
kind: Governance
metadata:
  name: regional-filtering
  namespace: workspace-analytics-prod
spec:
  catalogName: data-lake
  policies:
    # US team can only see US data
    - name: us-region-filter
      type: ROW_FILTERING
      rowFilter: "region = 'US'"
      resources:
        - database: sales
          table: orders
      groups:
        - us-sales-team
    # EU team can only see EU data
    - name: eu-region-filter
      type: ROW_FILTERING
      rowFilter: "region = 'EU'"
      resources:
        - database: sales
          table: orders
      groups:
        - eu-sales-team
    # Filter by user's department
    - name: department-filter
      type: ROW_FILTERING
      rowFilter: "department = {user.department}" # Dynamic filter
      resources:
        - database: hr
          table: employees
      groups:
        - managers
4.4 Combined Policies
apiVersion: e6data.io/v1alpha1
kind: Governance
metadata:
  name: comprehensive-governance
  namespace: workspace-analytics-prod
spec:
  catalogName: data-lake
  policies:
    # Access control
    - name: analytics-team-access
      type: GRANT_ACCESS
      allow: true
      resources:
        - database: analytics
        - database: marketing
        - database: sales
      groups:
        - analytics-team
    - name: deny-finance-pii
      type: GRANT_ACCESS
      allow: false
      resources:
        - database: finance
          table: payroll
      groups:
        - analytics-team
    # Column masking
    - name: mask-all-ssn
      type: COLUMN_MASKING
      maskType: MASK_SHOW_LAST_4
      resources:
        - column: ssn # Any table with ssn column
      groups:
        - analytics-team
    - name: mask-phone-numbers
      type: COLUMN_MASKING
      maskType: MASK
      resources:
        - database: customers
          column: phone
      groups:
        - analytics-team
    # Row filtering
    - name: active-customers-only
      type: ROW_FILTERING
      rowFilter: "status = 'active'"
      resources:
        - database: customers
          table: profiles
      groups:
        - marketing-team
4.5 User-Specific Policy
apiVersion: e6data.io/v1alpha1
kind: Governance
metadata:
  name: admin-override
  namespace: workspace-analytics-prod
spec:
  catalogName: data-lake
  policies:
    # Specific user gets full access (no masking)
    - name: admin-full-access
      type: GRANT_ACCESS
      allow: true
      resources:
        - catalog: "*" # All catalogs
      users:
        - admin@company.com
        - dba@company.com
    # Specific user exempted from row filters
    - name: compliance-full-view
      type: ROW_FILTERING
      rowFilter: "1=1" # No filter (see all rows)
      resources:
        - database: "*"
      users:
        - compliance@company.com
5. Status & Lifecycle
5.1 Status Fields
| Field | Type | Description |
|---|---|---|
| phase | string | Current phase |
| message | string | Status message |
| lastSyncTime | Time | When policies were last synced |
| policyCount | int | Number of policies synced |
| bucketPath | string | Object storage path for policies |
5.2 Phase Values
| Phase | Description |
|---|---|
| Pending | Initial state |
| Syncing | Uploading policies to storage |
| Synced | Policies successfully synced |
| Failed | Sync failed |
5.3 Example Status
status:
  phase: Synced
  message: "8 policies synced successfully"
  lastSyncTime: "2024-01-15T10:30:00Z"
  policyCount: 8
  bucketPath: "s3://data-lake/governance/policies"
6. Related Resources
Dependencies
| CRD | Relationship |
|---|---|
| E6Catalog | Governance targets a specific catalog by name |
| MetadataServices | Enables governance with governance.enabled: true |
Integration with MetadataServices
The MetadataServices CR must have governance enabled:
apiVersion: e6data.io/v1alpha1
kind: MetadataServices
metadata:
  name: analytics-prod
spec:
  # ... other config ...
  governance:
    enabled: true
    provider: ranger # Uses K8s-native Governance CRs
    filtering:
      catalog: true
      schema: true
      table: true
      column: true
    queryRewriting:
      enabled: true # Required for ROW_FILTERING and COLUMN_MASKING
When governance.enabled is true and provider is ranger, the storage service is configured with RANGER_POLICY_SOURCE=K8S and automatically watches Governance CRs in the namespace.
7. Troubleshooting
7.1 Common Issues
Policies Not Applying
Symptoms: Queries return data that should be masked/filtered.
Checks:
# Verify Governance CR exists
kubectl get gov sales-access -o yaml
# Verify MetadataServices governance is enabled
kubectl get mds analytics-prod -o jsonpath='{.spec.governance}'
# Check storage service logs for governance polling
kubectl logs -l app.kubernetes.io/name=storage | grep -i governance
# Verify RANGER_POLICY_SOURCE is set
kubectl get configmap analytics-prod-storage-blue -o yaml | grep RANGER_POLICY_SOURCE
RBAC Permission Denied
Symptoms: Storage service logs show "forbidden" errors when reading Governance CRs.
Cause: Storage service ServiceAccount doesn't have permission to read Governance CRs.
Fix: Create the required RBAC (see RBAC Requirements in section 2).
# Check if storage SA can read governance
kubectl auth can-i get governances.e6data.io --as=system:serviceaccount:<ns>:<storage-sa> -n <ns>
# Verify RoleBinding exists
kubectl get rolebinding -n <namespace> | grep governance
Policy Count Mismatch
Symptoms: Expected policies not being applied.
Check:
# List all Governance CRs in namespace
kubectl get gov -n <namespace>
# View policies in each CR
kubectl get gov sales-access -o jsonpath='{.spec.policies[*].name}'
# Check storage logs for policy count
kubectl logs -l app.kubernetes.io/name=storage | grep -i "Synced.*policies"
Governance CR Not Picked Up
Symptoms: New Governance CR created but policies not applied.
Cause: Storage service polls every 30 seconds. Wait for the next sync cycle.
Check:
# Watch storage logs for sync
kubectl logs -f -l app.kubernetes.io/name=storage | grep -i governance
7.2 Useful Commands
# Get governance CR details
kubectl get gov sales-access -o yaml
# List all Governance CRs in the namespace
kubectl get gov
# View policies defined in a Governance CR
kubectl get gov sales-access -o jsonpath='{.spec.policies}' | jq
# Check storage service config
kubectl get configmap analytics-prod-storage-blue -o yaml
# Verify RBAC for storage service
kubectl auth can-i list governances.e6data.io \
--as=system:serviceaccount:<ns>:<storage-sa> -n <ns>
# Watch storage service logs for governance sync
kubectl logs -f -l app.kubernetes.io/name=storage --tail=100 | grep -i governance
# Restart storage to force immediate sync
kubectl rollout restart deployment analytics-prod-storage-blue
8. Best Practices
8.1 Policy Organization
- Separate Governance CRs by concern:
  - pii-masking: All column masking policies
  - access-control: All GRANT_ACCESS policies
  - regional-filtering: All row filtering policies
- Use groups over users for maintainability
- Document policy intent in CR annotations
8.2 Security Recommendations
- Deny by default: Don't grant broad access; be explicit
- Mask PII always: SSN, credit cards, phone numbers
- Audit policy changes: Use GitOps for governance CRs
- Test policies: Verify masking/filtering in non-prod first
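Some of these recommendations can be checked mechanically before a manifest is applied. The Python sketch below is a hypothetical pre-commit check, not part of e6data: `lint_policies` and its two rules are assumptions introduced for illustration. It operates on the `spec.policies` list of a Governance CR after the YAML has been loaded into plain dictionaries.

```python
def lint_policies(policies: list[dict]) -> list[str]:
    """Return warnings for policies that go against the recommendations above."""
    warnings = []
    for policy in policies:
        name = policy.get("name", "<unnamed>")
        if policy.get("type") == "GRANT_ACCESS" and policy.get("allow", False):
            for resource in policy.get("resources", []):
                # No database (or "*") means the grant covers every database.
                if resource.get("database") in (None, "*"):
                    warnings.append(f"{name}: broad allow without a specific database")
                    break
        # Rule from 8.1: prefer groups over individual users.
        if policy.get("users") and not policy.get("groups"):
            warnings.append(f"{name}: targets users only; prefer groups")
    return warnings


admin = {
    "name": "admin-full-access",
    "type": "GRANT_ACCESS",
    "allow": True,
    "resources": [{"catalog": "*"}],
    "users": ["admin@company.com", "dba@company.com"],
}
for warning in lint_policies([admin]):
    print(warning)
# admin-full-access: broad allow without a specific database
# admin-full-access: targets users only; prefer groups
```

A warning here is a prompt for review rather than an error: an administrator override such as the one in section 4.5 may be intentional.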
8.3 Row Filter Expressions
| Pattern | Example | Description |
|---|---|---|
| Exact match | region = 'US' | Only US region |
| Multiple values | region IN ('US', 'CA') | US or Canada |
| Date range | created_at > '2024-01-01' | Recent data only |
| User context | owner = {user.email} | User's own data |
| Boolean | is_public = true | Public records only |
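The user-context pattern relies on a placeholder such as `{user.email}` being replaced with the querying user's attribute. This page does not document how the engine performs that substitution, so the Python sketch below only illustrates the idea. The `render_row_filter` helper is hypothetical, and it assumes that attributes are inserted as single-quoted SQL string literals with embedded quotes doubled.

```python
import re

_PLACEHOLDER = re.compile(r"\{user\.(\w+)\}")


def render_row_filter(template: str, user: dict[str, str]) -> str:
    """Expand {user.<attribute>} placeholders into quoted SQL string literals.

    Assumption: attributes are substituted as single-quoted literals with
    embedded quotes doubled. The engine's real substitution rules may differ.
    """
    def replace(match: re.Match) -> str:
        attribute = match.group(1)
        if attribute not in user:
            raise KeyError(f"user has no attribute {attribute!r}")
        return "'" + user[attribute].replace("'", "''") + "'"

    return _PLACEHOLDER.sub(replace, template)


print(render_row_filter("owner = {user.email}", {"email": "ana@company.com"}))
# owner = 'ana@company.com'
```

Raising an error for a missing attribute, rather than substituting an empty string, is a deliberate choice in this sketch: a filter that silently matches nothing or everything is harder to debug than a failed query.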