Governance

API Version: e6data.io/v1alpha1
Kind: Governance
Short Names: gov, governance


1. Purpose

Governance defines data access control policies for e6data catalogs. It supports three types of policies:

  • GRANT_ACCESS: Allow or deny access to databases, tables, or columns
  • COLUMN_MASKING: Mask sensitive data in query results
  • ROW_FILTERING: Filter rows based on user context

2. High-level Behavior

Kubernetes-Native Policy Sync

When you create a Governance CR, the storage service reads its policies directly from Kubernetes:

  1. The storage service polls all Governance CRs in the namespace every 30 seconds
  2. It merges the policies from all Governance CRs for the Ranger authorization engine
  3. It applies the merged policies to incoming queries in real time

This approach is Kubernetes-native and doesn't require intermediate cloud storage for policies.
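
As a minimal sketch of the merge step, assume two Governance CRs in the same namespace both target the same catalog. Following the steps above, the storage service would read both on its next poll and pass the combined policy set to the authorization engine. All names below (CR names, namespace, catalog, groups, tables) are illustrative assumptions.

```yaml
# Two independent Governance CRs in one namespace.
# All names are hypothetical; only fields from the Spec Reference are used.
apiVersion: e6data.io/v1alpha1
kind: Governance
metadata:
  name: team-access
  namespace: example-namespace
spec:
  catalogName: example-catalog
  policies:
    - name: allow-analysts-sales
      type: GRANT_ACCESS
      allow: true
      resources:
        - database: sales
      groups:
        - data-analysts
---
apiVersion: e6data.io/v1alpha1
kind: Governance
metadata:
  name: team-masking
  namespace: example-namespace
spec:
  catalogName: example-catalog
  policies:
    - name: mask-customer-email
      type: COLUMN_MASKING
      maskType: MASK_HASH
      resources:
        - database: sales
          table: customers
          column: email
      groups:
        - data-analysts
```

With both CRs applied, the expectation is that members of data-analysts can query the sales database but see hashed values for the customers.email column.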

Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Kubernetes Namespace                      │
│                                                              │
│  ┌──────────────┐   polls every 30s   ┌────────────────┐    │
│  │  Governance  │ ◄─────────────────  │ Storage        │    │
│  │  CR (sales)  │                     │ Service        │    │
│  └──────────────┘                     │                │    │
│                                       │ ┌────────────┐ │    │
│  ┌──────────────┐                     │ │ Governance │ │    │
│  │  Governance  │ ◄─────────────────  │ │ CRWatcher  │ │    │
│  │  CR (pii)    │                     │ └────────────┘ │    │
│  └──────────────┘                     │       │        │    │
│                                       │       ▼        │    │
│                                       │ ┌────────────┐ │    │
│                                       │ │  Ranger    │ │    │
│                                       │ │ Authorizer │ │    │
│                                       │ └────────────┘ │    │
│                                       └────────────────┘    │
└─────────────────────────────────────────────────────────────┘

RBAC Requirements

The storage service ServiceAccount needs permission to read Governance CRs:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: governance-reader
  namespace: <workspace-namespace>
rules:
- apiGroups: ["e6data.io"]
  resources: ["governances"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: storage-governance-reader
  namespace: <workspace-namespace>
subjects:
- kind: ServiceAccount
  name: <storage-service-account>
  namespace: <workspace-namespace>
roleRef:
  kind: Role
  name: governance-reader
  apiGroup: rbac.authorization.k8s.io

3. Spec Reference

3.1 Top-level Fields

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| catalogName | string | Yes | - | E6Catalog name this governance applies to (must exist in same namespace) |
| policies | []Policy | No | [] | List of governance policies |

Note: The catalogName field references an E6Catalog CR by name in the same namespace. The catalog must exist before creating governance policies for it.

3.2 Policy

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | Yes | - | Policy name (unique identifier) |
| type | string | Yes | - | GRANT_ACCESS, COLUMN_MASKING, or ROW_FILTERING |
| resources | []Resource | Yes | - | Resources this policy applies to |
| users | []string | No | [] | Users this policy applies to |
| groups | []string | No | [] | Groups this policy applies to |
| allow | bool | No | false | Allow access (for GRANT_ACCESS) |
| maskType | string | No | - | Masking type (for COLUMN_MASKING) |
| rowFilter | string | No | - | Filter expression (for ROW_FILTERING) |

3.3 Resource

| Field | Type | Required | Description |
|---|---|---|---|
| catalog | string | No | Catalog name (default: all) |
| database | string | No | Database name (default: all) |
| table | string | No | Table name (default: all) |
| column | string | No | Column name (for column policies) |
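
The fields above can be combined to scope a policy at different granularities. The fragment below is a sketch that uses only the fields from the table; the database, table, and column names are hypothetical. The entries are placed side by side purely for comparison, and a real policy would typically use one granularity.

```yaml
# Illustrative `resources` entries; all names are hypothetical.
resources:
  # Database only: table is omitted, so it defaults to all tables
  - database: sales
  # Database and table
  - database: sales
    table: orders
  # A single column (for column policies such as COLUMN_MASKING)
  - database: sales
    table: customers
    column: email
```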

3.4 Mask Types

| Type | Description | Example Output |
|---|---|---|
| MASK | Replace with asterisks | **** |
| MASK_HASH | SHA-256 hash | a1b2c3d4... |
| MASK_SHOW_LAST_4 | Show last 4 characters | ****1234 |
| MASK_SHOW_FIRST_4 | Show first 4 characters | 1234**** |
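
As a sketch of how a mask type is selected, the policy fragment below sets maskType to MASK_SHOW_FIRST_4 on a single column. The policy, database, table, column, and group names are hypothetical.

```yaml
# Hypothetical policy fragment (belongs under spec.policies).
- name: mask-account-number
  type: COLUMN_MASKING
  maskType: MASK_SHOW_FIRST_4   # one of the mask types listed above
  resources:
    - database: payments
      table: accounts
      column: account_number
  groups:
    - data-analysts
```

Going by the table above, members of the listed group would see only the first four characters of each value, in the form 1234****.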

4. Example Manifests

4.1 Grant Access Policy

apiVersion: e6data.io/v1alpha1
kind: Governance
metadata:
  name: sales-access
  namespace: workspace-analytics-prod
spec:
  catalogName: data-lake

  policies:
    # Allow data-analysts group access to sales database
    - name: sales-analyst-access
      type: GRANT_ACCESS
      allow: true
      resources:
        - database: sales
      groups:
        - data-analysts
        - business-intelligence

    # Deny access to sensitive HR tables
    - name: deny-hr-tables
      type: GRANT_ACCESS
      allow: false
      resources:
        - database: hr
          table: salaries
        - database: hr
          table: performance_reviews
      groups:
        - data-analysts  # Analysts can't see HR data

4.2 Column Masking Policy

apiVersion: e6data.io/v1alpha1
kind: Governance
metadata:
  name: pii-masking
  namespace: workspace-analytics-prod
spec:
  catalogName: data-lake

  policies:
    # Mask SSN - show last 4 digits
    - name: mask-ssn
      type: COLUMN_MASKING
      maskType: MASK_SHOW_LAST_4
      resources:
        - database: customers
          table: users
          column: ssn
      groups:
        - data-analysts

    # Fully mask credit card numbers
    - name: mask-credit-card
      type: COLUMN_MASKING
      maskType: MASK
      resources:
        - database: payments
          table: transactions
          column: credit_card_number
      groups:
        - data-analysts
        - business-intelligence

    # Hash email for analytics
    - name: hash-email
      type: COLUMN_MASKING
      maskType: MASK_HASH
      resources:
        - database: customers
          table: users
          column: email
      groups:
        - marketing-analytics

4.3 Row Filtering Policy

apiVersion: e6data.io/v1alpha1
kind: Governance
metadata:
  name: regional-filtering
  namespace: workspace-analytics-prod
spec:
  catalogName: data-lake

  policies:
    # US team can only see US data
    - name: us-region-filter
      type: ROW_FILTERING
      rowFilter: "region = 'US'"
      resources:
        - database: sales
          table: orders
      groups:
        - us-sales-team

    # EU team can only see EU data
    - name: eu-region-filter
      type: ROW_FILTERING
      rowFilter: "region = 'EU'"
      resources:
        - database: sales
          table: orders
      groups:
        - eu-sales-team

    # Filter by user's department
    - name: department-filter
      type: ROW_FILTERING
      rowFilter: "department = {user.department}"  # Dynamic filter
      resources:
        - database: hr
          table: employees
      groups:
        - managers

4.4 Combined Policies

apiVersion: e6data.io/v1alpha1
kind: Governance
metadata:
  name: comprehensive-governance
  namespace: workspace-analytics-prod
spec:
  catalogName: data-lake

  policies:
    # Access control
    - name: analytics-team-access
      type: GRANT_ACCESS
      allow: true
      resources:
        - database: analytics
        - database: marketing
        - database: sales
      groups:
        - analytics-team

    - name: deny-finance-pii
      type: GRANT_ACCESS
      allow: false
      resources:
        - database: finance
          table: payroll
      groups:
        - analytics-team

    # Column masking
    - name: mask-all-ssn
      type: COLUMN_MASKING
      maskType: MASK_SHOW_LAST_4
      resources:
        - column: ssn  # Any table with ssn column
      groups:
        - analytics-team

    - name: mask-phone-numbers
      type: COLUMN_MASKING
      maskType: MASK
      resources:
        - database: customers
          column: phone
      groups:
        - analytics-team

    # Row filtering
    - name: active-customers-only
      type: ROW_FILTERING
      rowFilter: "status = 'active'"
      resources:
        - database: customers
          table: profiles
      groups:
        - marketing-team

4.5 User-Specific Policy

apiVersion: e6data.io/v1alpha1
kind: Governance
metadata:
  name: admin-override
  namespace: workspace-analytics-prod
spec:
  catalogName: data-lake

  policies:
    # Specific user gets full access (no masking)
    - name: admin-full-access
      type: GRANT_ACCESS
      allow: true
      resources:
        - catalog: "*"  # All catalogs
      users:
        - admin@company.com
        - dba@company.com

    # Specific user exempted from row filters
    - name: compliance-full-view
      type: ROW_FILTERING
      rowFilter: "1=1"  # No filter (see all rows)
      resources:
        - database: "*"
      users:
        - compliance@company.com

5. Status & Lifecycle

5.1 Status Fields

| Field | Type | Description |
|---|---|---|
| phase | string | Current phase |
| message | string | Status message |
| lastSyncTime | Time | When policies were last synced |
| policyCount | int | Number of policies synced |
| bucketPath | string | Object storage path for policies |

5.2 Phase Values

| Phase | Description |
|---|---|
| Pending | Initial state |
| Syncing | Uploading policies to storage |
| Synced | Policies successfully synced |
| Failed | Sync failed |

5.3 Example Status

status:
  phase: Synced
  message: "8 policies synced successfully"
  lastSyncTime: "2024-01-15T10:30:00Z"
  policyCount: 8
  bucketPath: "s3://data-lake/governance/policies"

6. Dependencies

| CRD | Relationship |
|---|---|
| E6Catalog | Governance targets a specific catalog by name |
| MetadataServices | Enables governance with governance.enabled: true |

Integration with MetadataServices

The MetadataServices CR must have governance enabled:

apiVersion: e6data.io/v1alpha1
kind: MetadataServices
metadata:
  name: analytics-prod
spec:
  # ... other config ...
  governance:
    enabled: true
    provider: ranger  # Uses K8s-native Governance CRs
    filtering:
      catalog: true
      schema: true
      table: true
      column: true
    queryRewriting:
      enabled: true  # Required for ROW_FILTERING and COLUMN_MASKING

When governance.enabled: true with provider: ranger, the storage service automatically watches Governance CRs in the namespace via RANGER_POLICY_SOURCE=K8S.


7. Troubleshooting

7.1 Common Issues

Policies Not Applying

Symptoms: Queries return data that should be masked/filtered.

Checks:

# Verify Governance CR exists
kubectl get gov sales-access -o yaml

# Verify MetadataServices governance is enabled
kubectl get mds analytics-prod -o jsonpath='{.spec.governance}'

# Check storage service logs for governance polling
kubectl logs -l app.kubernetes.io/name=storage | grep -i governance

# Verify RANGER_POLICY_SOURCE is set
kubectl get configmap analytics-prod-storage-blue -o yaml | grep RANGER_POLICY_SOURCE

RBAC Permission Denied

Symptoms: Storage service logs show "forbidden" errors when reading Governance CRs.

Cause: Storage service ServiceAccount doesn't have permission to read Governance CRs.

Fix: Create the required RBAC (see RBAC Requirements in section 2).

# Check if storage SA can read governance
kubectl auth can-i get governances.e6data.io --as=system:serviceaccount:<ns>:<storage-sa> -n <ns>

# Verify RoleBinding exists
kubectl get rolebinding -n <namespace> | grep governance

Policy Count Mismatch

Symptoms: Expected policies not being applied.

Check:

# List all Governance CRs in namespace
kubectl get gov -n <namespace>

# View policies in each CR
kubectl get gov sales-access -o jsonpath='{.spec.policies[*].name}'

# Check storage logs for policy count
kubectl logs -l app.kubernetes.io/name=storage | grep -i "Synced.*policies"

Governance CR Not Picked Up

Symptoms: New Governance CR created but policies not applied.

Cause: Storage service polls every 30 seconds. Wait for the next sync cycle.

Check:

# Watch storage logs for sync
kubectl logs -f -l app.kubernetes.io/name=storage | grep -i governance

7.2 Useful Commands

# Get governance CR details
kubectl get gov sales-access -o yaml

# List all Governance CRs in the current namespace
kubectl get gov

# View policies defined in a Governance CR
kubectl get gov sales-access -o jsonpath='{.spec.policies}' | jq

# Check storage service config
kubectl get configmap analytics-prod-storage-blue -o yaml

# Verify RBAC for storage service
kubectl auth can-i list governances.e6data.io \
  --as=system:serviceaccount:<ns>:<storage-sa> -n <ns>

# Watch storage service logs for governance sync
kubectl logs -f -l app.kubernetes.io/name=storage --tail=100 | grep -i governance

# Restart storage to force immediate sync
kubectl rollout restart deployment analytics-prod-storage-blue

8. Best Practices

8.1 Policy Organization

  1. Separate Governance CRs by concern:
       • pii-masking - All column masking policies
       • access-control - All GRANT_ACCESS policies
       • regional-filtering - All row filtering policies

  2. Use groups over users for maintainability

  3. Document policy intent in CR annotations:

    metadata:
      annotations:
        e6data.io/description: "GDPR compliance - mask EU customer PII"
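
Taken together, these practices might produce a CR like the sketch below: one concern per CR, groups instead of individual users, and the intent recorded in an annotation. The namespace, catalog, policy, table, and group names are hypothetical.

```yaml
# Sketch of a single-concern Governance CR; all names are hypothetical.
apiVersion: e6data.io/v1alpha1
kind: Governance
metadata:
  name: pii-masking            # one CR per concern
  namespace: example-namespace
  annotations:
    e6data.io/description: "GDPR compliance - mask EU customer PII"
spec:
  catalogName: example-catalog
  policies:
    - name: mask-eu-customer-phone
      type: COLUMN_MASKING
      maskType: MASK
      resources:
        - database: customers
          table: eu_profiles
          column: phone
      groups:                  # groups rather than individual users
        - analytics-team
```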
    

8.2 Security Recommendations

  1. Deny by default: Don't grant broad access; be explicit
  2. Mask PII always: SSN, credit cards, phone numbers
  3. Audit policy changes: Use GitOps for governance CRs
  4. Test policies: Verify masking/filtering in non-prod first

8.3 Row Filter Expressions

| Pattern | Example | Description |
|---|---|---|
| Exact match | region = 'US' | Only US region |
| Multiple values | region IN ('US', 'CA') | US or Canada |
| Date range | created_at > '2024-01-01' | Recent data only |
| User context | owner = {user.email} | User's own data |
| Boolean | is_public = true | Public records only |
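
The patterns above are written as the rowFilter value of a ROW_FILTERING policy. The fragments below are a sketch of two of them in context; the policy, database, table, and group names are hypothetical. Wrapping the expression in double quotes lets the single-quoted SQL literals and the {user.email} placeholder pass through YAML unchanged.

```yaml
# Hypothetical policy fragments (belong under spec.policies).
- name: north-america-orders
  type: ROW_FILTERING
  rowFilter: "region IN ('US', 'CA')"   # Multiple values pattern
  resources:
    - database: sales
      table: orders
  groups:
    - na-sales-team

- name: own-documents-only
  type: ROW_FILTERING
  rowFilter: "owner = {user.email}"     # User context pattern
  resources:
    - database: docs
      table: documents
  groups:
    - employees
```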