
Policy YAML Reference

Policies are the primary input to the Metatate governance pipeline. Each policy is a YAML document that declares metadata, scope, and a set of typed instructions.

Top-Level Structure

```yaml
metadata:
  id: "policy-uuid"
  name: "Customer PII Protection"
  version: "1.0"
  description: "Classification and masking rules for customer PII"
  owner: "data-governance-team"
  tags:
    - pii
    - gdpr
    - customer-data

scope:
  tables:
    - "DB_NAME.SCHEMA_NAME.CUSTOMERS"
    - "DB_NAME.SCHEMA_NAME.ORDERS"

instructions:
  - type: classification
    # ...
  - type: masking
    # ...
```

Metadata

| Field | Type | Required | Description |
|---|---|---|---|
| id | string | No | Unique identifier for the policy. Auto-generated if omitted. |
| name | string | Yes | Human-readable policy name. |
| version | string | Yes | Semantic version string. |
| description | string | No | Explanation of the policy's purpose. |
| owner | string | No | Team or individual responsible for this policy. |
| tags | string[] | No | Labels for filtering and grouping policies. |
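A loader can enforce the metadata rules above before a policy enters the pipeline. The following is a minimal sketch that assumes the YAML has already been parsed into a Python dict. normalize_metadata is a hypothetical helper. Generating the missing id as a random UUID is an assumption based on the "policy-uuid" placeholder and is not documented behavior.

```python
import uuid

# Required metadata fields other than "id", which is auto-generated when omitted.
REQUIRED_METADATA_FIELDS = ("name", "version")


def normalize_metadata(metadata: dict) -> dict:
    """Hypothetical helper: validate policy metadata and fill in defaults.

    Raises ValueError if a required field is missing or empty.
    """
    missing = [field for field in REQUIRED_METADATA_FIELDS if not metadata.get(field)]
    if missing:
        raise ValueError(f"metadata is missing required fields: {missing}")
    result = dict(metadata)  # do not mutate the caller's dict
    # Assumption: the auto-generated id is a random UUID string.
    if not result.get("id"):
        result["id"] = str(uuid.uuid4())
    # Optional list field: normalize an absent value to an empty list.
    result.setdefault("tags", [])
    return result
```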

Scope

The scope section defines which tables the policy applies to.

| Field | Type | Required | Description |
|---|---|---|---|
| tables | string[] | Yes | Fully qualified table names (DATABASE.SCHEMA.TABLE). |

All instructions in the policy apply to every table listed in scope unless an instruction defines its own scope. An instruction-level scope can only narrow the policy-level scope (see Instruction-Level Scope Override).
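The three-part naming rule for scope.tables can be checked mechanically. This sketch assumes the scope section has been parsed into a Python dict. parse_table_name and validate_scope are hypothetical helpers, and the sketch does not handle quoted identifiers that contain dots.

```python
def parse_table_name(fqn: str) -> tuple[str, str, str]:
    """Hypothetical helper: split DATABASE.SCHEMA.TABLE into its parts."""
    parts = fqn.split(".")
    # Require exactly three non-empty parts. Quoted identifiers that contain
    # dots are not handled in this sketch.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected DATABASE.SCHEMA.TABLE, got {fqn!r}")
    database, schema, table = parts
    return database, schema, table


def validate_scope(scope: dict) -> list[tuple[str, str, str]]:
    """Check that scope.tables is a non-empty list of fully qualified names."""
    tables = scope.get("tables") or []
    if not tables:
        raise ValueError("scope.tables must list at least one table")
    return [parse_table_name(name) for name in tables]
```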

Instructions

Each instruction is a typed directive. The type field determines which parameters are valid.

Common Fields

Every instruction shares these fields:

| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | One of: classification, masking, usage_guidance, ai_governance, retention, access_control. |
| title | string | Yes | Short label for the instruction. |
| description | string | No | Detailed explanation. |
| priority | integer | No | 1 (highest) to 10 (lowest). Determines precedence when multiple policies apply. Default: 5. |
| parameters | object | Yes | Type-specific configuration. See Instruction Types below. |
| scope | object | No | Instruction-level scope override. Narrows the tables or columns this instruction targets. |
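The priority rules (1 is highest, 10 is lowest, default 5) can be expressed as a sort key. The following is a minimal sketch that assumes instructions are parsed dicts. winning_instruction is a hypothetical resolver. The documentation does not specify how ties are broken, so the sketch keeps the first candidate among equals.

```python
DEFAULT_PRIORITY = 5  # documented default when "priority" is omitted


def effective_priority(instruction: dict) -> int:
    """Return the instruction's priority after applying the default of 5."""
    priority = instruction.get("priority", DEFAULT_PRIORITY)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(priority, bool) or not isinstance(priority, int) or not 1 <= priority <= 10:
        raise ValueError(f"priority must be an integer from 1 to 10, got {priority!r}")
    return priority


def winning_instruction(candidates: list[dict]) -> dict:
    """Hypothetical resolver: pick the candidate with the highest precedence.

    1 is the highest priority, so the smallest number wins. min() returns the
    first of several equal candidates; real tie-breaking is not documented.
    """
    return min(candidates, key=effective_priority)
```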

Instruction Types

classification

Assigns data categories and sensitivity levels to columns.

```yaml
- type: classification
  title: "PII Classification"
  description: "Identify and classify personally identifiable information"
  priority: 1
  parameters:
    columns:
      - name: EMAIL
        data_type_id: email_address
        data_type_label: "Email Address"
        sensitivity: high
        confidence: 0.95
        category: personal_identifier
        subcategory: contact_info
      - name: PHONE_NUMBER
        data_type_id: phone_number
        data_type_label: "Phone Number"
        sensitivity: high
        confidence: 0.90
        category: personal_identifier
        subcategory: contact_info
```

Parameters:

| Field | Type | Description |
|---|---|---|
| columns | array | List of column classification definitions. |
| columns[].name | string | Column name. |
| columns[].data_type_id | string | Canonical data type identifier (e.g., email_address, ssn). |
| columns[].data_type_label | string | Human-readable label. |
| columns[].sensitivity | string | One of: low, medium, high, critical. |
| columns[].confidence | float | Classification confidence (0.0 to 1.0). |
| columns[].category | string | Top-level data category. |
| columns[].subcategory | string | Narrower data category. |
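The constraints in this table can be checked column by column. The following is a minimal sketch over parsed dicts. validate_classification_column is a hypothetical helper. Because the table does not state which column fields are required, treating name, sensitivity, and confidence as required is an assumption.

```python
SENSITIVITY_LEVELS = ("low", "medium", "high", "critical")


def validate_classification_column(column: dict) -> list[str]:
    """Hypothetical helper: return a list of problems (empty if valid).

    Assumption: name, sensitivity and confidence are treated as required.
    """
    errors = []
    if not column.get("name"):
        errors.append("name is required")
    if column.get("sensitivity") not in SENSITIVITY_LEVELS:
        errors.append(f"sensitivity must be one of {SENSITIVITY_LEVELS}")
    confidence = column.get("confidence")
    # Reject bools explicitly, since True/False are ints in Python.
    if (
        isinstance(confidence, bool)
        or not isinstance(confidence, (int, float))
        or not 0.0 <= confidence <= 1.0
    ):
        errors.append("confidence must be a number from 0.0 to 1.0")
    return errors
```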

masking

Defines how sensitive columns should be masked or tokenized.

```yaml
- type: masking
  title: "Email Masking"
  priority: 2
  parameters:
    columns:
      - name: EMAIL
        masking_type: partial
        config:
          show_first: 2
          show_last: 0
          mask_char: "*"
          preserve_domain: true
        exempt_roles:
          - DATA_ENGINEER
          - COMPLIANCE_OFFICER
```

Parameters:

| Field | Type | Description |
|---|---|---|
| columns | array | List of column masking definitions. |
| columns[].name | string | Column name. Must have a corresponding classification. |
| columns[].masking_type | string | One of: full, partial, hash, tokenize, redact. |
| columns[].config | object | Type-specific masking configuration. |
| columns[].exempt_roles | string[] | Roles that see unmasked data. |
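The config keys in the example above suggest how partial masking behaves, but their exact semantics are not documented. The following sketch assumes that show_first and show_last count visible characters, that mask_char replaces the rest, and that preserve_domain leaves the "@domain" part of an email visible. mask_partial and apply_masking are hypothetical helpers. The sketch covers only the full and partial types.

```python
def mask_partial(value: str, config: dict) -> str:
    """Hypothetical partial masking; key semantics are assumptions."""
    show_first = config.get("show_first", 0)
    show_last = config.get("show_last", 0)
    mask_char = config.get("mask_char", "*")
    suffix = ""
    # Assumed meaning of preserve_domain: keep "@domain" of an email visible.
    if config.get("preserve_domain") and "@" in value:
        value, domain = value.rsplit("@", 1)
        suffix = "@" + domain
    # If the visible parts would cover the whole value, mask everything instead.
    if show_first + show_last >= len(value):
        return mask_char * len(value) + suffix
    hidden = len(value) - show_first - show_last
    return value[:show_first] + mask_char * hidden + value[len(value) - show_last:] + suffix


def apply_masking(value: str, column: dict, role: str) -> str:
    """Mask one value for a viewer with the given role."""
    if role in (column.get("exempt_roles") or []):
        return value  # exempt roles see unmasked data
    config = column.get("config") or {}
    masking_type = column["masking_type"]
    if masking_type == "partial":
        return mask_partial(value, config)
    if masking_type == "full":
        # Assumed fallback when no replacement string is configured.
        return config.get("replacement") or "*" * len(value)
    raise NotImplementedError(f"{masking_type} masking is not covered by this sketch")
```

Under these assumptions, the EMAIL rule above turns jane.doe@example.com into ja******@example.com for a DATA_ANALYST, while a DATA_ENGINEER sees the original value.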

usage_guidance

Declares acceptable and prohibited uses for the data.

```yaml
- type: usage_guidance
  title: "Customer Data Usage Rules"
  priority: 3
  parameters:
    allowed_purposes:
      - "Customer support operations"
      - "Order fulfillment"
    prohibited_uses:
      - "Third-party marketing"
      - "Automated profiling without consent"
    conditions:
      - "Requires data processing agreement for external sharing"
      - "Must anonymize before use in analytics"
```
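Tooling that consumes this guidance might compare a stated purpose against the two lists. The following is a minimal sketch with several assumptions. check_purpose is hypothetical, purposes are matched as exact free-text strings (trimmed, case-insensitive), prohibitions take precedence over allowances, and the conditions list is left for human review rather than evaluated.

```python
def check_purpose(params: dict, purpose: str) -> str:
    """Hypothetical helper: return "prohibited", "allowed", or "unlisted".

    Assumption: purposes are compared as exact strings after trimming
    whitespace and ignoring case. The "conditions" list is not evaluated.
    """
    wanted = purpose.strip().casefold()

    def normalize(items):
        return {item.strip().casefold() for item in items or []}

    # Check prohibitions first so they win over an overlapping allowance.
    if wanted in normalize(params.get("prohibited_uses")):
        return "prohibited"
    if wanted in normalize(params.get("allowed_purposes")):
        return "allowed"
    return "unlisted"
```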

ai_governance

Controls how AI systems may interact with the data.

```yaml
- type: ai_governance
  title: "AI Training Restrictions"
  priority: 2
  parameters:
    allow_training: false
    allow_inference: true
    allow_embedding: false
    restrictions:
      - "No use in generative model training"
      - "Inference results must not be stored longer than session"
    required_safeguards:
      - "Output filtering for PII leakage"
```
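The three allow_* flags map naturally to a gate that an AI integration checks before touching the data. The following is a minimal sketch over the parsed parameters. AI_FLAGS and is_ai_operation_allowed are hypothetical names. Treating a missing flag as "not allowed" is an assumption, because the documentation does not give defaults.

```python
# Hypothetical mapping from an AI operation name to its ai_governance flag.
AI_FLAGS = {
    "training": "allow_training",
    "inference": "allow_inference",
    "embedding": "allow_embedding",
}


def is_ai_operation_allowed(params: dict, operation: str) -> bool:
    """Return True only if the matching allow_* flag is explicitly true.

    Assumption: a missing flag means "not allowed" (fail closed).
    Raises KeyError for operations not listed in AI_FLAGS.
    """
    return params.get(AI_FLAGS[operation]) is True
```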

retention

Specifies data lifecycle and retention requirements.

```yaml
- type: retention
  title: "GDPR Retention Policy"
  priority: 1
  parameters:
    period: "36 months"
    trigger: "account_closure"
    action: "delete"
    exceptions:
      - "Legal hold overrides deletion"
      - "Aggregated statistics may be retained indefinitely"
```
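The period and trigger fields together determine when the action becomes due. The following is a worked sketch of that date arithmetic with several assumptions. It assumes period is written as "<number> <unit>" with days, months, or years as the unit, and that the caller supplies the date the trigger event (for example account_closure) occurred. retention_deadline and add_months are hypothetical helpers. The sketch also clamps to the last day of the month, so Jan 31 plus one month gives Feb 28 or 29, which is a design choice rather than documented behavior.

```python
import calendar
from datetime import date, timedelta


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping the day to the target month's length."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def retention_deadline(period: str, trigger_date: date) -> date:
    """Hypothetical helper: date on which the retention action becomes due."""
    amount_text, unit = period.split()  # assumed "<number> <unit>" format
    amount = int(amount_text)
    unit = unit.lower().rstrip("s")  # "months" -> "month"
    if unit == "day":
        return trigger_date + timedelta(days=amount)
    if unit == "month":
        return add_months(trigger_date, amount)
    if unit == "year":
        return add_months(trigger_date, 12 * amount)
    raise ValueError(f"unsupported retention unit in {period!r}")
```

For the example policy, an account closed on 2024-03-15 would reach its deletion date on 2027-03-15. Exceptions such as a legal hold would still need to be checked before the delete action runs.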

access_control

Defines role-based access recommendations.

```yaml
- type: access_control
  title: "Tiered Access Control"
  priority: 2
  parameters:
    roles:
      - role: DATA_ANALYST
        access_level: masked
        conditions:
          - "Must complete PII training"
      - role: DATA_ENGINEER
        access_level: full
        conditions: []
      - role: PUBLIC
        access_level: denied
```
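A consumer of these recommendations needs to look up a role and read off its access level and conditions. The following is a minimal sketch over the parsed parameters. resolve_access is a hypothetical helper. Denying roles that are not listed is an assumption, because the documentation does not say what happens to unlisted roles.

```python
def resolve_access(params: dict, role: str) -> tuple[str, list[str]]:
    """Hypothetical helper: return (access_level, conditions) for a role."""
    for entry in params.get("roles") or []:
        if entry["role"] == role:
            # PUBLIC in the example omits "conditions", so default to [].
            return entry["access_level"], list(entry.get("conditions") or [])
    # Assumption: a role that is not listed is denied (fail closed).
    return "denied", []
```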

Instruction-Level Scope Override

Any instruction can narrow its scope to specific tables or columns within the policy's table list:

```yaml
- type: classification
  title: "SSN Classification"
  scope:
    tables:
      - "DB_NAME.SCHEMA_NAME.CUSTOMERS"
    columns:
      - SSN
      - TAX_ID
  parameters:
    # ...
```

When scope is provided at the instruction level, it must be a subset of the policy-level scope.
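The subset rule, together with the default of inheriting every policy table, can be checked as follows. This sketch works on parsed dicts. effective_tables is a hypothetical helper, and it assumes table names are compared exactly as written (case-sensitive). Column-level narrowing is not validated here because that would require the table schema.

```python
def effective_tables(policy_scope: dict, instruction: dict) -> list[str]:
    """Hypothetical helper: tables an instruction actually targets.

    Without an instruction-level tables list, the instruction inherits every
    policy-level table. With one, it must be a subset of the policy tables.
    """
    policy_tables = list(policy_scope["tables"])
    override = (instruction.get("scope") or {}).get("tables")
    if not override:
        return policy_tables
    outside = [table for table in override if table not in policy_tables]
    if outside:
        raise ValueError(f"instruction scope is not a subset of policy scope: {outside}")
    return list(override)
```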

Complete Example

```yaml
metadata:
  id: "pol-customer-pii-001"
  name: "Customer PII Protection"
  version: "2.1"
  description: "Comprehensive PII governance for customer-facing tables"
  owner: "data-governance-team"
  tags:
    - pii
    - gdpr
    - production

scope:
  tables:
    - "ANALYTICS_DB.PUBLIC.CUSTOMERS"
    - "ANALYTICS_DB.PUBLIC.CUSTOMER_CONTACTS"

instructions:
  - type: classification
    title: "PII Column Classification"
    priority: 1
    parameters:
      columns:
        - name: EMAIL
          data_type_id: email_address
          data_type_label: "Email Address"
          sensitivity: high
          confidence: 0.95
          category: personal_identifier
          subcategory: contact_info
        - name: SSN
          data_type_id: social_security_number
          data_type_label: "Social Security Number"
          sensitivity: critical
          confidence: 1.0
          category: personal_identifier
          subcategory: government_id

  - type: masking
    title: "PII Masking Rules"
    priority: 1
    parameters:
      columns:
        - name: EMAIL
          masking_type: partial
          config:
            show_first: 2
            mask_char: "*"
            preserve_domain: true
          exempt_roles:
            - COMPLIANCE_OFFICER
        - name: SSN
          masking_type: full
          config:
            replacement: "***-**-****"
          exempt_roles: []

  - type: usage_guidance
    title: "Customer Data Usage"
    priority: 3
    parameters:
      allowed_purposes:
        - "Customer support"
        - "Order processing"
      prohibited_uses:
        - "Third-party data sales"
        - "Unsolicited marketing"

  - type: ai_governance
    title: "AI Restrictions"
    priority: 2
    parameters:
      allow_training: false
      allow_inference: true
      allow_embedding: false
      restrictions:
        - "No PII in model training data"

  - type: retention
    title: "Data Retention"
    priority: 1
    parameters:
      period: "36 months"
      trigger: "account_closure"
      action: "delete"
```
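A policy like the complete example can also be checked as a whole. This sketch assumes the YAML has been parsed into a Python dict. validate_policy is a hypothetical helper. It checks that each instruction has a known type, a title, and parameters, and it enforces the masking rule that every masked column has a corresponding classification. Interpreting "corresponding" as "classified by an instruction in the same policy" is an assumption.

```python
INSTRUCTION_TYPES = {
    "classification", "masking", "usage_guidance",
    "ai_governance", "retention", "access_control",
}


def validate_policy(policy: dict) -> list[str]:
    """Hypothetical whole-policy check; returns a list of error messages."""
    errors = []
    classified = set()
    masked = []
    for i, instruction in enumerate(policy.get("instructions") or []):
        kind = instruction.get("type")
        if kind not in INSTRUCTION_TYPES:
            errors.append(f"instructions[{i}]: unknown type {kind!r}")
        for field in ("title", "parameters"):
            if field not in instruction:
                errors.append(f"instructions[{i}]: missing {field!r}")
        # "parameters:" followed only by a comment parses as None, so guard it.
        columns = (instruction.get("parameters") or {}).get("columns") or []
        if kind == "classification":
            classified.update(column["name"] for column in columns)
        elif kind == "masking":
            masked.extend(column["name"] for column in columns)
    # Assumption: the classification must come from the same policy.
    for name in masked:
        if name not in classified:
            errors.append(f"masked column {name!r} has no classification in this policy")
    return errors
```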