Policy YAML Reference

Policies are the primary input to the Metatate governance pipeline. Each policy is a YAML document that declares metadata, scope, and a set of typed instructions.

Top-Level Structure

metadata:
  id: "policy-uuid"
  name: "Customer PII Protection"
  version: "1.0"
  description: "Classification and masking rules for customer PII"
  owner: "data-governance-team"
  tags:
    - pii
    - gdpr
    - customer-data

scope:
  tables:
    - "DB_NAME.SCHEMA_NAME.CUSTOMERS"
    - "DB_NAME.SCHEMA_NAME.ORDERS"

instructions:
  - type: classification
    # ...
  - type: masking
    # ...
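
As a rough illustration of this layout, the sketch below checks that a parsed policy contains the three top-level sections. It assumes the YAML has already been loaded into a Python dict by a parser of your choice, and it assumes all three sections are mandatory. `missing_top_level_keys` is a hypothetical helper, not part of Metatate.

```python
# Assumption: metadata, scope, and instructions are all mandatory sections.
REQUIRED_TOP_LEVEL_KEYS = ("metadata", "scope", "instructions")


def missing_top_level_keys(policy: dict) -> list[str]:
    """Return the top-level sections that are absent from a parsed policy."""
    return [key for key in REQUIRED_TOP_LEVEL_KEYS if key not in policy]


# A parsed policy, shaped like the YAML above.
policy = {
    "metadata": {"name": "Customer PII Protection", "version": "1.0"},
    "scope": {"tables": ["DB_NAME.SCHEMA_NAME.CUSTOMERS"]},
    "instructions": [{"type": "classification"}, {"type": "masking"}],
}

print(missing_top_level_keys(policy))            # []
print(missing_top_level_keys({"metadata": {}}))  # ['scope', 'instructions']
```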

Metadata

Field	Type	Required	Description
`id`	string	No	Unique identifier for the policy. Auto-generated if omitted.
`name`	string	Yes	Human-readable policy name.
`version`	string	Yes	Version string, for example `"1.0"`. Quote the value; unquoted, YAML parses `1.0` as a float rather than a string.
`description`	string	No	Explanation of the policy's purpose.
`owner`	string	No	Team or individual responsible for this policy.
`tags`	string[]	No	Labels for filtering and grouping policies.
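
The field rules in this table can be checked mechanically. The following is a minimal sketch, assuming the metadata block has already been parsed into a dict. It treats `id` as optional because the table says it is auto-generated when omitted. `validate_metadata` is a hypothetical name used only for illustration.

```python
def validate_metadata(metadata: dict) -> list[str]:
    """Return a list of problems found in a parsed metadata block."""
    errors = []
    # name and version must be present and must be non-empty strings.
    for field in ("name", "version"):
        value = metadata.get(field)
        if not isinstance(value, str) or not value:
            errors.append(f"{field}: required non-empty string")
    # Optional string fields only need the right type when present.
    for field in ("id", "description", "owner"):
        if field in metadata and not isinstance(metadata[field], str):
            errors.append(f"{field}: must be a string")
    tags = metadata.get("tags", [])
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        errors.append("tags: must be a list of strings")
    return errors


print(validate_metadata({"name": "Customer PII Protection", "version": "1.0"}))
# []

# An unquoted version in YAML arrives as a float and is rejected here.
print(validate_metadata({"name": "Customer PII Protection", "version": 1.0}))
# ['version: required non-empty string']
```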

Scope

The scope section defines which tables the policy applies to.

Field	Type	Required	Description
`tables`	string[]	Yes	Fully qualified table names (`DATABASE.SCHEMA.TABLE`).

Every instruction applies to all tables listed in `scope` unless the instruction narrows this with its own `scope` field (see Instruction-Level Scope Override).
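
To catch table names that are not fully qualified, a check along these lines may help. It is a sketch under the assumption that each of the three parts consists only of letters, digits, and underscores. Quoted identifiers or other characters your warehouse permits are not handled, and `invalid_tables` is a hypothetical helper.

```python
import re

# Assumption: DATABASE.SCHEMA.TABLE, each part made of letters, digits, underscores.
FULLY_QUALIFIED = re.compile(r"\w+\.\w+\.\w+")


def invalid_tables(tables: list[str]) -> list[str]:
    """Return the entries that are not in DATABASE.SCHEMA.TABLE form."""
    return [name for name in tables if not FULLY_QUALIFIED.fullmatch(name)]


print(invalid_tables([
    "DB_NAME.SCHEMA_NAME.CUSTOMERS",
    "SCHEMA_NAME.ORDERS",
    "ORDERS",
]))
# ['SCHEMA_NAME.ORDERS', 'ORDERS']
```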

Instructions

Each instruction is a typed directive. The `type` field determines which `parameters` are valid.

Common Fields

Every instruction shares these fields:

Field	Type	Required	Description
`type`	string	Yes	One of: `classification`, `masking`, `usage_guidance`, `ai_governance`, `retention`, `access_control`
`title`	string	Yes	Short label for the instruction.
`description`	string	No	Detailed explanation.
`priority`	integer	No	1 (highest) to 10 (lowest). Determines precedence when multiple policies apply. Default: 5.
`parameters`	object	Yes	Type-specific configuration. See below.
`scope`	object	No	Instruction-level scope override. Narrows the tables or columns this instruction targets.
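
The common-field rules can be sketched as a small validator plus a precedence sort. This assumes instructions are already parsed into dicts. The function names are hypothetical, and sorting ascending by `priority` is simply one way to read "1 (highest) to 10 (lowest)".

```python
INSTRUCTION_TYPES = {
    "classification", "masking", "usage_guidance",
    "ai_governance", "retention", "access_control",
}
DEFAULT_PRIORITY = 5  # documented default when priority is omitted


def validate_instruction(instruction: dict) -> list[str]:
    """Return problems with the fields shared by every instruction."""
    errors = []
    if instruction.get("type") not in INSTRUCTION_TYPES:
        errors.append("type: unknown or missing")
    if not isinstance(instruction.get("title"), str):
        errors.append("title: required string")
    if not isinstance(instruction.get("parameters"), dict):
        errors.append("parameters: required object")
    priority = instruction.get("priority", DEFAULT_PRIORITY)
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_int = isinstance(priority, int) and not isinstance(priority, bool)
    if not is_int or not 1 <= priority <= 10:
        errors.append("priority: integer from 1 to 10")
    return errors


def by_precedence(instructions: list[dict]) -> list[dict]:
    """Order instructions so that priority 1 comes first."""
    return sorted(instructions, key=lambda i: i.get("priority", DEFAULT_PRIORITY))


print(validate_instruction({"type": "masking", "title": "Email Masking", "parameters": {}}))
# []
print(validate_instruction({"type": "audit", "title": "x", "parameters": {}, "priority": 11}))
# ['type: unknown or missing', 'priority: integer from 1 to 10']
```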

Instruction Types

classification

Assigns data categories and sensitivity levels to columns.

- type: classification
  title: "PII Classification"
  description: "Identify and classify personally identifiable information"
  priority: 1
  parameters:
    columns:
      - name: EMAIL
        data_type_id: email_address
        data_type_label: "Email Address"
        sensitivity: high
        confidence: 0.95
        category: personal_identifier
        subcategory: contact_info
      - name: PHONE_NUMBER
        data_type_id: phone_number
        data_type_label: "Phone Number"
        sensitivity: high
        confidence: 0.90
        category: personal_identifier
        subcategory: contact_info

Parameters:

Field	Type	Description
`columns`	array	List of column classification definitions.
`columns[].name`	string	Column name.
`columns[].data_type_id`	string	Canonical data type identifier (e.g., `email_address`, `ssn`).
`columns[].data_type_label`	string	Human-readable label.
`columns[].sensitivity`	string	One of: `low`, `medium`, `high`, `critical`.
`columns[].confidence`	float	Classification confidence (0.0 to 1.0).
`columns[].category`	string	Top-level data category.
`columns[].subcategory`	string	Narrower data category.
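
The value constraints on `sensitivity` and `confidence` lend themselves to a simple check. The sketch below validates those two fields only when they are present, since the table does not say which fields are mandatory. `validate_classified_column` is a hypothetical helper.

```python
SENSITIVITY_LEVELS = ("low", "medium", "high", "critical")


def validate_classified_column(column: dict) -> list[str]:
    """Check sensitivity and confidence on one parsed column definition."""
    name = column.get("name", "<unnamed>")
    errors = []
    sensitivity = column.get("sensitivity")
    if sensitivity is not None and sensitivity not in SENSITIVITY_LEVELS:
        errors.append(f"{name}: sensitivity must be one of {', '.join(SENSITIVITY_LEVELS)}")
    confidence = column.get("confidence")
    if confidence is not None:
        # Accept ints such as 1 as well as floats, but not booleans.
        is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
        if not is_number or not 0.0 <= confidence <= 1.0:
            errors.append(f"{name}: confidence must be between 0.0 and 1.0")
    return errors


print(validate_classified_column({"name": "EMAIL", "sensitivity": "high", "confidence": 0.95}))
# []
print(validate_classified_column({"name": "SSN", "sensitivity": "severe", "confidence": 1.5}))
# ['SSN: sensitivity must be one of low, medium, high, critical',
#  'SSN: confidence must be between 0.0 and 1.0']
```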

masking

Defines how sensitive columns should be masked or tokenized.

- type: masking
  title: "Email Masking"
  priority: 2
  parameters:
    columns:
      - name: EMAIL
        masking_type: partial
        config:
          show_first: 2
          show_last: 0
          mask_char: "*"
          preserve_domain: true
        exempt_roles:
          - DATA_ENGINEER
          - COMPLIANCE_OFFICER

Parameters:

Field	Type	Description
`columns`	array	List of column masking definitions.
`columns[].name`	string	Column name. Must have a corresponding classification.
`columns[].masking_type`	string	One of: `full`, `partial`, `hash`, `tokenize`, `redact`.
`columns[].config`	object	Type-specific masking configuration.
`columns[].exempt_roles`	string[]	Roles that see unmasked data.
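
This reference does not spell out how `partial` masking interprets its `config`. One plausible reading, offered purely as an assumption, is that `show_first` and `show_last` count the characters left visible, `mask_char` replaces the rest, and `preserve_domain` leaves everything from the `@` onward untouched. The sketch below implements that reading. `mask_partial` is a hypothetical function and may differ from what Metatate actually generates.

```python
def mask_partial(
    value: str,
    show_first: int = 0,
    show_last: int = 0,
    mask_char: str = "*",
    preserve_domain: bool = False,
) -> str:
    """Mask the middle of a value, optionally keeping an email domain intact."""
    local, domain = value, ""
    if preserve_domain and "@" in value:
        # Assumption: only the part before the last "@" is masked.
        local, _, host = value.rpartition("@")
        domain = "@" + host
    visible = show_first + show_last
    if visible >= len(local):
        # Too short to reveal anything safely: mask the whole local part.
        return mask_char * len(local) + domain
    head = local[:show_first]
    tail = local[len(local) - show_last:] if show_last else ""
    return head + mask_char * (len(local) - visible) + tail + domain


print(mask_partial("jane.doe@example.com", show_first=2, show_last=0,
                   mask_char="*", preserve_domain=True))
# ja******@example.com
print(mask_partial("4111111111111111", show_last=4))
# ************1111
```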

usage_guidance

Declares acceptable and prohibited uses for the data.

- type: usage_guidance
  title: "Customer Data Usage Rules"
  priority: 3
  parameters:
    allowed_purposes:
      - "Customer support operations"
      - "Order fulfillment"
    prohibited_uses:
      - "Third-party marketing"
      - "Automated profiling without consent"
    conditions:
      - "Requires data processing agreement for external sharing"
      - "Must anonymize before use in analytics"

ai_governance

Controls how AI systems may interact with the data.

- type: ai_governance
  title: "AI Training Restrictions"
  priority: 2
  parameters:
    allow_training: false
    allow_inference: true
    allow_embedding: false
    restrictions:
      - "No use in generative model training"
      - "Inference results must not be stored longer than session"
    required_safeguards:
      - "Output filtering for PII leakage"
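
A consumer of these flags might gate AI operations roughly as follows. This is a sketch: the operation names and the deny-by-default treatment of a missing flag are assumptions, and `ai_operation_allowed` is a hypothetical helper.

```python
# Maps an operation name (assumed) to the documented boolean flag.
AI_FLAGS = {
    "training": "allow_training",
    "inference": "allow_inference",
    "embedding": "allow_embedding",
}


def ai_operation_allowed(parameters: dict, operation: str) -> bool:
    """Return True only if the matching allow_* flag is explicitly true."""
    flag = AI_FLAGS[operation]  # raises KeyError for an unknown operation
    # Assumption: a missing flag is treated as "not allowed".
    return parameters.get(flag, False) is True


parameters = {"allow_training": False, "allow_inference": True, "allow_embedding": False}
print(ai_operation_allowed(parameters, "inference"))  # True
print(ai_operation_allowed(parameters, "training"))   # False
print(ai_operation_allowed({}, "embedding"))          # False
```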

retention

Specifies data lifecycle and retention requirements.

- type: retention
  title: "GDPR Retention Policy"
  priority: 1
  parameters:
    period: "36 months"
    trigger: "account_closure"
    action: "delete"
    exceptions:
      - "Legal hold overrides deletion"
      - "Aggregated statistics may be retained indefinitely"
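
To make the `period` and `trigger` pairing concrete, the sketch below computes the date on which the `action` would fall due, counting from the trigger event. The grammar of `period` is not defined here, so the code assumes the `"<N> months"` form shown in the example. Both function names are hypothetical.

```python
import calendar
from datetime import date


def parse_period_months(period: str) -> int:
    """Parse a period such as '36 months' (assumed format) into a month count."""
    count, unit = period.split()
    if unit not in ("month", "months"):
        raise ValueError(f"unsupported period unit: {unit!r}")
    return int(count)


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of the target month."""
    index = start.year * 12 + (start.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


account_closed = date(2024, 2, 29)  # the trigger event, e.g. account_closure
due = add_months(account_closed, parse_period_months("36 months"))
print(due)  # 2027-02-28
```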

access_control

Defines role-based access recommendations.

- type: access_control
  title: "Tiered Access Control"
  priority: 2
  parameters:
    roles:
      - role: DATA_ANALYST
        access_level: masked
        conditions:
          - "Must complete PII training"
      - role: DATA_ENGINEER
        access_level: full
        conditions: []
      - role: PUBLIC
        access_level: denied
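
Looking up the access level for a role could be sketched like this. Falling back to `denied` for roles that are not listed is an assumption rather than documented behavior, and `access_level_for` is a hypothetical helper.

```python
def access_level_for(parameters: dict, role: str, default: str = "denied") -> str:
    """Return the access level configured for a role, or the default."""
    for entry in parameters.get("roles", []):
        if entry.get("role") == role:
            return entry.get("access_level", default)
    # Assumption: roles that are not listed get the most restrictive level.
    return default


parameters = {
    "roles": [
        {"role": "DATA_ANALYST", "access_level": "masked",
         "conditions": ["Must complete PII training"]},
        {"role": "DATA_ENGINEER", "access_level": "full", "conditions": []},
        {"role": "PUBLIC", "access_level": "denied"},
    ]
}
print(access_level_for(parameters, "DATA_ANALYST"))  # masked
print(access_level_for(parameters, "INTERN"))        # denied
```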

Instruction-Level Scope Override

Any instruction can narrow its scope to specific tables or columns within the policy's table list:

- type: classification
  title: "SSN Classification"
  scope:
    tables:
      - "DB_NAME.SCHEMA_NAME.CUSTOMERS"
    columns:
      - SSN
      - TAX_ID
  parameters:
    # ...

An instruction-level `scope` must be a subset of the policy-level scope: every table it lists must also appear in the policy's `scope.tables`.
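
The subset rule can be verified with a short check. The sketch assumes table names are compared as exact strings, so case and quoting differences are not normalized. `tables_outside_policy_scope` is a hypothetical helper.

```python
def tables_outside_policy_scope(policy_scope: dict, instruction_scope: dict) -> list[str]:
    """Return instruction-level tables that the policy-level scope does not list."""
    allowed = set(policy_scope.get("tables", []))
    return [t for t in instruction_scope.get("tables", []) if t not in allowed]


policy_scope = {
    "tables": ["DB_NAME.SCHEMA_NAME.CUSTOMERS", "DB_NAME.SCHEMA_NAME.ORDERS"]
}

print(tables_outside_policy_scope(
    policy_scope,
    {"tables": ["DB_NAME.SCHEMA_NAME.CUSTOMERS"], "columns": ["SSN", "TAX_ID"]},
))
# []
print(tables_outside_policy_scope(
    policy_scope,
    {"tables": ["DB_NAME.SCHEMA_NAME.PAYMENTS"]},
))
# ['DB_NAME.SCHEMA_NAME.PAYMENTS']
```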

Complete Example

metadata:
  id: "pol-customer-pii-001"
  name: "Customer PII Protection"
  version: "2.1"
  description: "Comprehensive PII governance for customer-facing tables"
  owner: "data-governance-team"
  tags:
    - pii
    - gdpr
    - production

scope:
  tables:
    - "ANALYTICS_DB.PUBLIC.CUSTOMERS"
    - "ANALYTICS_DB.PUBLIC.CUSTOMER_CONTACTS"

instructions:
  - type: classification
    title: "PII Column Classification"
    priority: 1
    parameters:
      columns:
        - name: EMAIL
          data_type_id: email_address
          data_type_label: "Email Address"
          sensitivity: high
          confidence: 0.95
          category: personal_identifier
          subcategory: contact_info
        - name: SSN
          data_type_id: social_security_number
          data_type_label: "Social Security Number"
          sensitivity: critical
          confidence: 1.0
          category: personal_identifier
          subcategory: government_id

  - type: masking
    title: "PII Masking Rules"
    priority: 1
    parameters:
      columns:
        - name: EMAIL
          masking_type: partial
          config:
            show_first: 2
            mask_char: "*"
            preserve_domain: true
          exempt_roles:
            - COMPLIANCE_OFFICER
        - name: SSN
          masking_type: full
          config:
            replacement: "***-**-****"
          exempt_roles: []

  - type: usage_guidance
    title: "Customer Data Usage"
    priority: 3
    parameters:
      allowed_purposes:
        - "Customer support"
        - "Order processing"
      prohibited_uses:
        - "Third-party data sales"
        - "Unsolicited marketing"

  - type: ai_governance
    title: "AI Restrictions"
    priority: 2
    parameters:
      allow_training: false
      allow_inference: true
      allow_embedding: false
      restrictions:
        - "No PII in model training data"

  - type: retention
    title: "Data Retention"
    priority: 1
    parameters:
      period: "36 months"
      trigger: "account_closure"
      action: "delete"
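
The masking parameters state that each masked column must have a corresponding classification. For a full policy like the one above, that rule can be checked with a sketch such as the following. It matches columns by name only and ignores instruction-level scope, which is a simplification. `unclassified_masked_columns` is a hypothetical helper.

```python
def unclassified_masked_columns(instructions: list[dict]) -> list[str]:
    """Return masked column names that no classification instruction covers."""
    classified, masked = set(), set()
    for instruction in instructions:
        columns = instruction.get("parameters", {}).get("columns", [])
        names = {column["name"] for column in columns}
        if instruction.get("type") == "classification":
            classified |= names
        elif instruction.get("type") == "masking":
            masked |= names
    return sorted(masked - classified)


instructions = [
    {"type": "classification",
     "parameters": {"columns": [{"name": "EMAIL"}, {"name": "SSN"}]}},
    {"type": "masking",
     "parameters": {"columns": [{"name": "EMAIL"}, {"name": "SSN"}]}},
]
print(unclassified_masked_columns(instructions))  # []

# Masking a column that was never classified is reported.
instructions[1]["parameters"]["columns"].append({"name": "PHONE_NUMBER"})
print(unclassified_masked_columns(instructions))  # ['PHONE_NUMBER']
```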
