Policy YAML Reference
Policies are the primary input to the Metatate governance pipeline. Each policy is a YAML document that declares metadata, scope, and a set of typed instructions.
Top-Level Structure
```yaml
metadata:
  id: "policy-uuid"
  name: "Customer PII Protection"
  version: "1.0"
  description: "Classification and masking rules for customer PII"
  owner: "data-governance-team"
  tags:
    - pii
    - gdpr
    - customer-data

scope:
  tables:
    - "DB_NAME.SCHEMA_NAME.CUSTOMERS"
    - "DB_NAME.SCHEMA_NAME.ORDERS"

instructions:
  - type: classification
    # ...
  - type: masking
    # ...
```
Metadata
| Field | Type | Required | Description |
|---|---|---|---|
| `id` | string | No | Unique identifier for the policy. Auto-generated if omitted. |
| `name` | string | Yes | Human-readable policy name. |
| `version` | string | Yes | Semantic version string. |
| `description` | string | No | Explanation of the policy's purpose. |
| `owner` | string | No | Team or individual responsible for this policy. |
| `tags` | string[] | No | Labels for filtering and grouping policies. |
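The sketch below checks the metadata rules in the table above. It assumes the policy YAML has already been parsed into plain Python dicts and lists, for example with PyYAML's `yaml.safe_load`. The function `check_metadata` is hypothetical. Using a random UUID for an omitted `id` is also an assumption: the reference says only that the id is auto-generated.

```python
import uuid

# Fields that must be present; `id` is auto-generated when omitted.
REQUIRED_METADATA = ("name", "version")


def check_metadata(policy: dict) -> dict:
    """Validate the metadata section and return a copy with defaults applied."""
    for section in ("metadata", "scope", "instructions"):
        if section not in policy:
            raise ValueError(f"missing top-level section: {section}")

    meta = dict(policy["metadata"])  # shallow copy; the input is not modified
    for name in REQUIRED_METADATA:
        value = meta.get(name)
        # A quoted "1.0" parses as a string; an unquoted 1.0 would be a float.
        if not isinstance(value, str) or not value:
            raise ValueError(f"metadata.{name} must be a non-empty string")

    # Hypothetical id scheme: the reference does not say how ids are generated.
    meta.setdefault("id", str(uuid.uuid4()))

    tags = meta.get("tags", [])
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        raise ValueError("metadata.tags must be a list of strings")
    return meta
```

Quoting `version` in YAML, as the examples do, keeps it a string rather than a number.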
Scope
The scope section defines which tables the policy applies to.
| Field | Type | Required | Description |
|---|---|---|---|
| `tables` | string[] | Yes | Fully qualified table names (`DATABASE.SCHEMA.TABLE`). |
All instructions in the policy apply to every table listed in `scope`, unless an instruction narrows its target with its own instruction-level `scope` (see Instruction-Level Scope Override).
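A minimal sketch of validating the `scope` section follows. It assumes the YAML is already parsed into a dict. The helpers `parse_table_name` and `check_scope` are hypothetical. The sketch treats an empty `tables` list as invalid, which is an assumption. It also splits names naively on `.`, so quoted identifiers that contain dots are not handled.

```python
def parse_table_name(fqn: str) -> tuple[str, str, str]:
    """Split a DATABASE.SCHEMA.TABLE name into its three parts."""
    parts = fqn.split(".")
    # Naive split: assumes identifiers themselves contain no dots.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a fully qualified table name: {fqn!r}")
    return parts[0], parts[1], parts[2]


def check_scope(scope: dict) -> list[tuple[str, str, str]]:
    """Validate scope.tables and return the parsed table names."""
    tables = scope.get("tables")
    # Assumption: an empty list is treated the same as a missing one.
    if not tables:
        raise ValueError("scope.tables is required and must not be empty")
    return [parse_table_name(t) for t in tables]
```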
Instructions
Each instruction is a typed directive. The type field determines which parameters are valid.
Common Fields
Every instruction shares these fields:
| Field | Type | Required | Description |
|---|---|---|---|
| `type` | string | Yes | One of: `classification`, `masking`, `usage_guidance`, `ai_governance`, `retention`, `access_control`. |
| `title` | string | Yes | Short label for the instruction. |
| `description` | string | No | Detailed explanation. |
| `priority` | integer | No | `1` (highest) to `10` (lowest). Determines precedence when multiple policies apply. Default: `5`. |
| `parameters` | object | Yes | Type-specific configuration. See below. |
| `scope` | object | No | Instruction-level scope override. Narrows the tables or columns this instruction targets. |
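The sketch below applies the common-field rules from the table above: the allowed `type` values, the required `title` and `parameters`, and a `priority` from 1 to 10 that defaults to 5. It then picks the highest-precedence instruction among competing candidates. It assumes parsed YAML dicts. The function names are hypothetical. Breaking ties by keeping the first candidate is also an assumption, because the reference does not define tie-breaking.

```python
INSTRUCTION_TYPES = {
    "classification", "masking", "usage_guidance",
    "ai_governance", "retention", "access_control",
}


def normalize_instruction(instr: dict) -> dict:
    """Validate the common fields and apply the default priority of 5."""
    out = dict(instr)
    if out.get("type") not in INSTRUCTION_TYPES:
        raise ValueError(f"unknown instruction type: {out.get('type')!r}")
    if not out.get("title"):
        raise ValueError("instruction title is required")
    if not isinstance(out.get("parameters"), dict):
        raise ValueError("instruction parameters must be a mapping")
    out.setdefault("priority", 5)
    p = out["priority"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(p, bool) or not isinstance(p, int) or not 1 <= p <= 10:
        raise ValueError(f"priority must be an integer from 1 to 10, got {p!r}")
    return out


def winning_instruction(candidates: list[dict]) -> dict:
    """Return the candidate with the lowest priority number (1 wins).

    min() keeps the first of several equal minima; this tie-break is assumed.
    """
    normalized = [normalize_instruction(c) for c in candidates]
    return min(normalized, key=lambda i: i["priority"])
```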
Instruction Types
classification
Assigns data categories and sensitivity levels to columns.
```yaml
- type: classification
  title: "PII Classification"
  description: "Identify and classify personally identifiable information"
  priority: 1
  parameters:
    columns:
      - name: EMAIL
        data_type_id: email_address
        data_type_label: "Email Address"
        sensitivity: high
        confidence: 0.95
        category: personal_identifier
        subcategory: contact_info
      - name: PHONE_NUMBER
        data_type_id: phone_number
        data_type_label: "Phone Number"
        sensitivity: high
        confidence: 0.90
        category: personal_identifier
        subcategory: contact_info
```
Parameters:
| Field | Type | Description |
|---|---|---|
| `columns` | array | List of column classification definitions. |
| `columns[].name` | string | Column name. |
| `columns[].data_type_id` | string | Canonical data type identifier (e.g., `email_address`, `ssn`). |
| `columns[].data_type_label` | string | Human-readable label. |
| `columns[].sensitivity` | string | One of: `low`, `medium`, `high`, `critical`. |
| `columns[].confidence` | float | Classification confidence (`0.0` to `1.0`). |
| `columns[].category` | string | Top-level data category. |
| `columns[].subcategory` | string | Narrower data category. |
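A minimal sketch of checking a classification instruction's `columns` against the constraints in the table above. It assumes parsed YAML. The function names are hypothetical. Treating `confidence` as optional and ordering the sensitivity levels from `low` to `critical` are assumptions; `highest_sensitivity` is an illustrative roll-up, not a documented feature.

```python
# Sensitivity levels, ordered from least to most sensitive (assumed order).
SENSITIVITY_LEVELS = ("low", "medium", "high", "critical")


def check_classification(parameters: dict) -> dict[str, dict]:
    """Validate classification columns and index them by column name."""
    by_name = {}
    for col in parameters.get("columns", []):
        name = col.get("name")
        if not name:
            raise ValueError("every classified column needs a name")
        if col.get("sensitivity") not in SENSITIVITY_LEVELS:
            raise ValueError(f"{name}: invalid sensitivity {col.get('sensitivity')!r}")
        conf = col.get("confidence")
        # Assumption: confidence may be omitted, but must be in [0.0, 1.0] if set.
        if conf is not None and not 0.0 <= float(conf) <= 1.0:
            raise ValueError(f"{name}: confidence must be between 0.0 and 1.0")
        by_name[name] = col
    return by_name


def highest_sensitivity(by_name: dict[str, dict]) -> str:
    """Hypothetical roll-up: the most sensitive level among the columns.

    Expects at least one column; max() raises ValueError on an empty input.
    """
    return max((c["sensitivity"] for c in by_name.values()),
               key=SENSITIVITY_LEVELS.index)
```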
masking
Defines how sensitive columns should be masked or tokenized.
```yaml
- type: masking
  title: "Email Masking"
  priority: 2
  parameters:
    columns:
      - name: EMAIL
        masking_type: partial
        config:
          show_first: 2
          show_last: 0
          mask_char: "*"
          preserve_domain: true
        exempt_roles:
          - DATA_ENGINEER
          - COMPLIANCE_OFFICER
```
Parameters:
| Field | Type | Description |
|---|---|---|
| `columns` | array | List of column masking definitions. |
| `columns[].name` | string | Column name. Must have a corresponding classification. |
| `columns[].masking_type` | string | One of: `full`, `partial`, `hash`, `tokenize`, `redact`. |
| `columns[].config` | object | Type-specific masking configuration. |
| `columns[].exempt_roles` | string[] | Roles that see unmasked data. |
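The reference does not define exactly how `partial` masking interprets its `config`, so the sketch below is one plausible reading, not the pipeline's implementation. In this reading, `show_first` and `show_last` keep characters visible, and `mask_char` replaces the rest. `preserve_domain` masks only the local part of an email address. Two further behaviours are assumptions: a value that would be fully revealed is masked entirely instead, and a `full` mask with no `replacement` falls back to `mask_char`-style asterisks. Only `partial` and `full` are sketched, and all function names are hypothetical.

```python
def partial_mask(value: str, show_first: int = 0, show_last: int = 0,
                 mask_char: str = "*", preserve_domain: bool = False) -> str:
    """Sketch of `partial` masking; the exact semantics are assumed."""
    suffix = ""
    if preserve_domain and "@" in value:
        # Mask only the local part of an email address and keep "@domain".
        value, domain = value.rsplit("@", 1)
        suffix = "@" + domain
    n = len(value)
    if show_first + show_last >= n:
        # Assumed safety rule: never reveal the whole value.
        return mask_char * n + suffix
    end = n - show_last
    return value[:show_first] + mask_char * (end - show_first) + value[end:] + suffix


def apply_masking(value: str, masking_type: str, config: dict) -> str:
    """Dispatch on masking_type; hash, tokenize and redact are not sketched."""
    if masking_type == "partial":
        return partial_mask(value, **config)
    if masking_type == "full":
        # Assumed fallback when no replacement string is configured.
        return config.get("replacement", "*" * len(value))
    raise NotImplementedError(masking_type)


def render(value: str, role: str, column: dict) -> str:
    """Return the value a given role would see for one masked column."""
    if role in column.get("exempt_roles", []):
        return value  # exempt roles see unmasked data
    return apply_masking(value, column["masking_type"], column.get("config", {}))
```

With the `EMAIL` rule from the example above, a `DATA_ANALYST` would see `jane.doe@example.com` as `ja******@example.com`. A `DATA_ENGINEER` would see the original value.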
usage_guidance
Declares acceptable and prohibited uses for the data.
```yaml
- type: usage_guidance
  title: "Customer Data Usage Rules"
  priority: 3
  parameters:
    allowed_purposes:
      - "Customer support operations"
      - "Order fulfillment"
    prohibited_uses:
      - "Third-party marketing"
      - "Automated profiling without consent"
    conditions:
      - "Requires data processing agreement for external sharing"
      - "Must anonymize before use in analytics"
```
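The entries in `usage_guidance` are free text, and the reference does not say how a requested purpose is matched against them. The hypothetical helper below uses exact, case-insensitive matching, which is an assumption. It checks prohibited uses first, so a purpose listed in both places counts as prohibited (also assumed). It does not evaluate `conditions`, which still need to be checked separately.

```python
def check_purpose(parameters: dict, purpose: str) -> str:
    """Classify a stated purpose as 'prohibited', 'allowed', or 'unlisted'."""
    key = purpose.strip().casefold()
    prohibited = {p.strip().casefold() for p in parameters.get("prohibited_uses", [])}
    allowed = {p.strip().casefold() for p in parameters.get("allowed_purposes", [])}
    # Assumption: a prohibition wins if a purpose appears in both lists.
    if key in prohibited:
        return "prohibited"
    if key in allowed:
        return "allowed"
    return "unlisted"
```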
ai_governance
Controls how AI systems may interact with the data.
```yaml
- type: ai_governance
  title: "AI Training Restrictions"
  priority: 2
  parameters:
    allow_training: false
    allow_inference: true
    allow_embedding: false
    restrictions:
      - "No use in generative model training"
      - "Inference results must not be stored longer than session"
    required_safeguards:
      - "Output filtering for PII leakage"
```
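A minimal sketch of evaluating an `ai_governance` instruction for one kind of AI use. It assumes parsed YAML. The names `AIDecision` and `evaluate_ai_use` and the operation keys are hypothetical. The sketch treats a missing `allow_*` flag as "not allowed", and attaches `restrictions` and `required_safeguards` as obligations to an allowed use. Both behaviours are assumptions.

```python
from dataclasses import dataclass, field

# Maps an operation name (hypothetical) to its flag in the instruction.
AI_OPERATIONS = {
    "training": "allow_training",
    "inference": "allow_inference",
    "embedding": "allow_embedding",
}


@dataclass
class AIDecision:
    allowed: bool
    obligations: list[str] = field(default_factory=list)


def evaluate_ai_use(parameters: dict, operation: str) -> AIDecision:
    """Decide whether an AI operation is allowed and what it must respect."""
    flag = AI_OPERATIONS[operation]  # KeyError for unknown operations
    # Assumption: a missing flag means "not allowed" (deny by default).
    allowed = parameters.get(flag, False) is True
    obligations: list[str] = []
    if allowed:
        obligations = (list(parameters.get("restrictions", []))
                       + list(parameters.get("required_safeguards", [])))
    return AIDecision(allowed, obligations)
```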
retention
Specifies data lifecycle and retention requirements.
```yaml
- type: retention
  title: "GDPR Retention Policy"
  priority: 1
  parameters:
    period: "36 months"
    trigger: "account_closure"
    action: "delete"
    exceptions:
      - "Legal hold overrides deletion"
      - "Aggregated statistics may be retained indefinitely"
```
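The sketch below computes when a retention `action` falls due, counted from the date the `trigger` event occurred. It assumes `period` strings look like `"36 months"`, meaning an integer followed by days, weeks, months, or years. That format is inferred from the example, not specified. The function names are hypothetical. Month arithmetic clamps to the last day of the target month, and `exceptions` such as a legal hold are not modelled.

```python
import calendar
import re
from datetime import date, timedelta

# Assumed period format: "<integer> <unit>", e.g. "36 months" or "1 year".
_PERIOD = re.compile(r"^\s*(\d+)\s+(day|week|month|year)s?\s*$", re.IGNORECASE)


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping to the last day of the target month."""
    total = d.month - 1 + months
    year, month = d.year + total // 12, total % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def retention_deadline(period: str, trigger_date: date) -> date:
    """Return the date on which the retention action becomes due."""
    m = _PERIOD.match(period)
    if not m:
        raise ValueError(f"unsupported period: {period!r}")
    n, unit = int(m.group(1)), m.group(2).lower()
    if unit == "day":
        return trigger_date + timedelta(days=n)
    if unit == "week":
        return trigger_date + timedelta(weeks=n)
    if unit == "month":
        return add_months(trigger_date, n)
    return add_months(trigger_date, 12 * n)  # years
```

For the example above, an account closed on 2024-03-15 would be due for deletion on 2027-03-15.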
access_control
Defines role-based access recommendations.
```yaml
- type: access_control
  title: "Tiered Access Control"
  priority: 2
  parameters:
    roles:
      - role: DATA_ANALYST
        access_level: masked
        conditions:
          - "Must complete PII training"
      - role: DATA_ENGINEER
        access_level: full
        conditions: []
      - role: PUBLIC
        access_level: denied
```
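A minimal sketch of resolving one role's access level from an `access_control` instruction. It assumes parsed YAML. The sketch only recognises the three access levels used in the example (`full`, `masked`, `denied`); whether other levels exist is not stated. Two deny-by-default choices are assumptions, not documented behaviour: roles that are not listed are denied, and so is a role whose `conditions` have not all been met. The function name is hypothetical.

```python
# Access levels that appear in the example above.
ACCESS_LEVELS = ("full", "masked", "denied")


def resolve_access(parameters: dict, role: str,
                   satisfied: frozenset[str] = frozenset()) -> str:
    """Return the access level for `role`, given the conditions already met."""
    for entry in parameters.get("roles", []):
        if entry.get("role") == role:
            level = entry.get("access_level", "denied")
            if level not in ACCESS_LEVELS:
                raise ValueError(f"unknown access_level: {level!r}")
            # Assumption: any unmet condition downgrades the role to denied.
            unmet = [c for c in entry.get("conditions", []) if c not in satisfied]
            return "denied" if unmet else level
    return "denied"  # assumption: unlisted roles get no access
```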
Instruction-Level Scope Override
Any instruction can narrow its scope to specific tables or columns within the policy's table list:
```yaml
- type: classification
  title: "SSN Classification"
  scope:
    tables:
      - "DB_NAME.SCHEMA_NAME.CUSTOMERS"
    columns:
      - SSN
      - TAX_ID
  parameters:
    # ...
```
When scope is provided at the instruction level, it must be a subset of the policy-level scope.
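The subset rule can be checked as in the sketch below, assuming parsed YAML. The function name is hypothetical. The sketch also assumes that an instruction-level `scope` without `tables` (for example, one that only lists `columns`) inherits the policy's tables. Column names are not checked, because that would require the table schemas.

```python
from typing import Optional


def check_instruction_scope(policy_scope: dict,
                            instr_scope: Optional[dict]) -> list[str]:
    """Return the tables an instruction targets, enforcing the subset rule."""
    policy_tables = list(policy_scope.get("tables", []))
    if not instr_scope:
        return policy_tables  # no override: the policy-level scope applies
    # Assumption: a scope that lists only columns inherits the policy tables.
    tables = instr_scope.get("tables", policy_tables)
    allowed = set(policy_tables)
    outside = [t for t in tables if t not in allowed]
    if outside:
        raise ValueError(f"instruction scope is not a subset of policy scope: {outside}")
    return list(tables)
```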
Complete Example
```yaml
metadata:
  id: "pol-customer-pii-001"
  name: "Customer PII Protection"
  version: "2.1"
  description: "Comprehensive PII governance for customer-facing tables"
  owner: "data-governance-team"
  tags:
    - pii
    - gdpr
    - production

scope:
  tables:
    - "ANALYTICS_DB.PUBLIC.CUSTOMERS"
    - "ANALYTICS_DB.PUBLIC.CUSTOMER_CONTACTS"

instructions:
  - type: classification
    title: "PII Column Classification"
    priority: 1
    parameters:
      columns:
        - name: EMAIL
          data_type_id: email_address
          data_type_label: "Email Address"
          sensitivity: high
          confidence: 0.95
          category: personal_identifier
          subcategory: contact_info
        - name: SSN
          data_type_id: social_security_number
          data_type_label: "Social Security Number"
          sensitivity: critical
          confidence: 1.0
          category: personal_identifier
          subcategory: government_id

  - type: masking
    title: "PII Masking Rules"
    priority: 1
    parameters:
      columns:
        - name: EMAIL
          masking_type: partial
          config:
            show_first: 2
            mask_char: "*"
            preserve_domain: true
          exempt_roles:
            - COMPLIANCE_OFFICER
        - name: SSN
          masking_type: full
          config:
            replacement: "***-**-****"
          exempt_roles: []

  - type: usage_guidance
    title: "Customer Data Usage"
    priority: 3
    parameters:
      allowed_purposes:
        - "Customer support"
        - "Order processing"
      prohibited_uses:
        - "Third-party data sales"
        - "Unsolicited marketing"

  - type: ai_governance
    title: "AI Restrictions"
    priority: 2
    parameters:
      allow_training: false
      allow_inference: true
      allow_embedding: false
      restrictions:
        - "No PII in model training data"

  - type: retention
    title: "Data Retention"
    priority: 1
    parameters:
      period: "36 months"
      trigger: "account_closure"
      action: "delete"
```
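The masking table requires every masked column to have a corresponding classification. The sketch below checks that rule across a whole policy, assuming the YAML is parsed into dicts. The function name is hypothetical. For simplicity, it matches columns by name across all instructions and ignores instruction-level `scope`; a stricter check would compare table and column pairs.

```python
def unclassified_masked_columns(policy: dict) -> list[str]:
    """List masked column names that no classification instruction covers."""
    classified: set[str] = set()
    masked: list[str] = []
    for instr in policy.get("instructions", []):
        params = instr.get("parameters") or {}
        columns = params.get("columns", [])
        if instr.get("type") == "classification":
            classified.update(c["name"] for c in columns)
        elif instr.get("type") == "masking":
            masked.extend(c["name"] for c in columns)
    # Simplification: names are compared policy-wide, ignoring scope overrides.
    return [name for name in masked if name not in classified]
```

Applied to the complete example above, the result is an empty list, because both `EMAIL` and `SSN` are classified before they are masked.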