Volume 04

AI Data Governance

End-to-end data governance for AI systems, covering data classification, ownership and stewardship, lifecycle management, privacy controls, cross-border transfers, and data quality and lineage.

4.1 Data Classification

All data used in AI systems shall be classified according to sensitivity, regulatory requirements, and business criticality. Classification determines handling requirements, access controls, retention periods, and disposal methods.

Classification Schema

Level	Definition	AI Handling Requirements	Examples
Public	Data intended for public disclosure	Minimal controls; standard validation	Published research, public datasets, marketing content
Internal	Data for internal business use only	Access control, usage logging, version control	Internal reports, operational metrics, employee data
Confidential	Data whose unauthorised disclosure would harm the organisation or individuals	Encryption at rest and in transit, need-to-know access, audit trails, DLP controls	Customer PII, financial data, proprietary models, training datasets
Restricted	Data whose unauthorised disclosure would cause severe harm or violate law	Air-gapped environments, multi-factor authentication, full audit logging, encryption with HSM-managed keys, legal hold capability	Health records, biometrics, government classified data, criminal records

Re-classification Requirement

Data classification shall be reviewed whenever data is combined, transformed, or used for a new purpose. AI-generated synthetic data derived from Confidential or Restricted source data inherits the source classification unless a formal declassification assessment is completed and approved by the Data Governance Committee.
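The inheritance rule above can be sketched as a small function. This is a minimal illustration, not a prescribed tool: `Classification` and `classify_derived` are hypothetical names, and the sketch assumes a high-water-mark default (combined or derived data takes the most sensitive source level), which the policy implies for synthetic data but does not state for every combination. `approved_level` stands in for the outcome of a declassification assessment approved by the Data Governance Committee.

```python
from enum import IntEnum
from typing import Iterable, Optional


class Classification(IntEnum):
    # Levels from the Classification Schema, ordered least to most sensitive.
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


def classify_derived(
    sources: Iterable[Classification],
    approved_level: Optional[Classification] = None,
) -> Classification:
    """Classification for data combined, transformed or synthesised from `sources`.

    Assumption: without an approved assessment, derived data takes the most
    sensitive source level (high-water mark). `approved_level` is a hypothetical
    input recording a level approved by the Data Governance Committee.
    """
    levels = list(sources)
    if not levels:
        raise ValueError("at least one source classification is required")
    if approved_level is not None:
        # Only a formally approved assessment may change the inherited level.
        return approved_level
    return max(levels)
```

For example, joining an Internal operational table with a Confidential customer table yields `Classification.CONFIDENTIAL`, and synthetic data generated from Restricted records stays `RESTRICTED` unless an approved level is supplied.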

4.2 Data Ownership & Stewardship

Every dataset used in AI systems shall have a designated Data Owner and Data Steward. These roles are accountable for data quality, classification accuracy, lifecycle management, and compliance.

Role Definitions

Role	Accountability	Typical Holder
Data Owner	Business accountability for data accuracy, classification, usage authorisation, and retention	Business unit executive or product owner
Data Steward	Operational accountability for data quality, metadata, lineage, and day-to-day governance	Data analyst, data engineer, or domain specialist
Data Custodian	Technical accountability for storage, backup, security, and infrastructure	IT operations or cloud infrastructure team
Data User	Compliance with handling requirements, reporting anomalies, and appropriate use	Any employee or contractor accessing the data

All datasets in the AI Asset Register have an assigned Data Owner and Data Steward.

Ownership is documented in the Data Catalogue with contact details and delegation rules.

Data Owners review and approve all new AI use cases involving their data.

Data Stewards perform monthly quality assessments and report findings.

Ownership transfers are documented and approved when personnel change roles.
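A periodic check of the first control above might look like the following sketch. `DatasetEntry` is a hypothetical stand-in for whatever record shape the AI Asset Register or Data Catalogue actually uses; the function only reports datasets missing either required role.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DatasetEntry:
    # Hypothetical register record; a real catalogue holds far more metadata.
    dataset_id: str
    data_owner: Optional[str] = None
    data_steward: Optional[str] = None


def find_unassigned(register: List[DatasetEntry]) -> Dict[str, List[str]]:
    """Map each dataset_id lacking a Data Owner or Data Steward to the missing roles."""
    gaps: Dict[str, List[str]] = {}
    for entry in register:
        roles = (("Data Owner", entry.data_owner), ("Data Steward", entry.data_steward))
        # Treat None, empty and whitespace-only values as unassigned.
        missing = [role for role, holder in roles if not (holder or "").strip()]
        if missing:
            gaps[entry.dataset_id] = missing
    return gaps
```

Run against a register in which `ds-002` has an owner but no steward, it returns `{'ds-002': ['Data Steward']}`, which can feed the Data Steward's monthly report.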

4.3 Data Lifecycle Management

AI data shall be managed through a complete lifecycle from creation or acquisition through to secure disposal. Each phase has defined controls, responsibilities, and evidence requirements.

Lifecycle Phases

Create / Acquire: Data is generated internally or sourced from third parties. Controls: provenance verification, quality assessment, classification, ingress security scanning.
Store: Data is persisted in approved repositories. Controls: encryption, access control, backup, geographic compliance, retention labelling.
Process: Data is transformed, cleaned, feature-engineered, or used for training. Controls: processing logs, lineage tracking, bias detection, privacy-preserving techniques.
Share: Data is transferred between systems, teams, or external parties. Controls: data sharing agreements, anonymisation review, transfer mechanism validation, recipient verification.
Archive: Data is moved to long-term storage after active use. Controls: integrity verification, encryption, access restriction, retention schedule enforcement.
Dispose: Data is securely destroyed at end of retention. Controls: cryptographic erasure, physical destruction certificates, disposal logs, verification sampling.
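One way to make the phase controls enforceable is to treat them as exit criteria: data does not leave a phase until evidence exists for each listed control. The sketch below assumes exactly that, with control names copied from the list above; `Phase`, `REQUIRED_CONTROLS`, `missing_evidence` and `can_exit` are illustrative names. It also assumes every listed control applies, whereas some may be alternatives in practice (for example, cryptographic erasure for cloud storage versus a physical destruction certificate for retired media), so a real gate would need per-asset rules.

```python
from enum import Enum
from typing import Dict, Iterable, List, Set


class Phase(Enum):
    CREATE_ACQUIRE = "Create / Acquire"
    STORE = "Store"
    PROCESS = "Process"
    SHARE = "Share"
    ARCHIVE = "Archive"
    DISPOSE = "Dispose"


# Controls per phase, taken from the Lifecycle Phases list.
REQUIRED_CONTROLS: Dict[Phase, Set[str]] = {
    Phase.CREATE_ACQUIRE: {"provenance verification", "quality assessment",
                           "classification", "ingress security scanning"},
    Phase.STORE: {"encryption", "access control", "backup",
                  "geographic compliance", "retention labelling"},
    Phase.PROCESS: {"processing logs", "lineage tracking", "bias detection",
                    "privacy-preserving techniques"},
    Phase.SHARE: {"data sharing agreements", "anonymisation review",
                  "transfer mechanism validation", "recipient verification"},
    Phase.ARCHIVE: {"integrity verification", "encryption", "access restriction",
                    "retention schedule enforcement"},
    Phase.DISPOSE: {"cryptographic erasure", "physical destruction certificates",
                    "disposal logs", "verification sampling"},
}


def missing_evidence(phase: Phase, evidence: Iterable[str]) -> List[str]:
    """Controls for `phase` with no recorded evidence, in alphabetical order."""
    return sorted(REQUIRED_CONTROLS[phase] - set(evidence))


def can_exit(phase: Phase, evidence: Iterable[str]) -> bool:
    # Assumption: every listed control must be evidenced before leaving a phase.
    return not missing_evidence(phase, evidence)
```

For instance, a dataset in Store with evidence only for encryption, access control and backup is held back, and `missing_evidence` names `geographic compliance` and `retention labelling` as the outstanding items.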

4.4 Privacy Controls & Cross-Border Transfers

AI systems processing personal data shall comply with the Privacy Act 1988 (Cth), including the Australian Privacy Principles (APPs), the EU General Data Protection Regulation (GDPR) where applicable, and all other relevant privacy legislation.

Privacy by Design Controls

Control	Implementation	Verification
Data Minimisation	Collect only data necessary for the specified AI purpose	DPIA review, data inventory audit
Purpose Limitation	Use data only for purposes disclosed at collection	Usage monitoring, access review
Anonymisation / Pseudonymisation	Apply before model training where direct identifiers are not required	Re-identification risk assessment
Consent Management	Record and manage consent for AI-specific uses, including automated decision-making	Consent register, opt-out mechanism testing
Individual Rights	Enable access, correction, erasure, and portability requests	Request handling SLA, process documentation
Privacy Impact Assessment	Conduct a DPIA for all high-risk AI processing of personal data	DPIA completed before deployment
Cross-Border Transfers	Use approved transfer mechanisms (adequacy decisions, SCCs, BCRs, certification)	Legal review, transfer impact assessment
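Data minimisation and pseudonymisation can be combined into one pre-training step. The sketch below is illustrative only: the purpose register `APPROVED_FIELDS`, the field names, and the choice of keyed HMAC-SHA256 as the pseudonymisation technique are assumptions, not requirements of this policy. Because the same key always produces the same pseudonym, records remain joinable; the key must be held separately from the data. Under the GDPR, pseudonymised data remains personal data, so the re-identification risk assessment still applies.

```python
import hashlib
import hmac
from typing import Any, Dict

# Hypothetical purpose register: fields approved for each disclosed AI purpose.
APPROVED_FIELDS = {
    "churn-model-training": {"customer_id", "tenure_months", "plan", "monthly_spend"},
}
# Hypothetical list of direct identifiers to pseudonymise before training.
DIRECT_IDENTIFIERS = {"customer_id"}


def minimise(record: Dict[str, Any], purpose: str) -> Dict[str, Any]:
    """Data Minimisation: keep only fields approved for `purpose`."""
    allowed = APPROVED_FIELDS[purpose]
    return {k: v for k, v in record.items() if k in allowed}


def pseudonymise(record: Dict[str, Any], key: bytes) -> Dict[str, Any]:
    """Replace direct identifiers with keyed HMAC-SHA256 pseudonyms (hex strings)."""
    out = dict(record)  # leave the caller's record unchanged
    for field in DIRECT_IDENTIFIERS:
        if field in out:
            digest = hmac.new(key, str(out[field]).encode("utf-8"), hashlib.sha256)
            out[field] = digest.hexdigest()
    return out
```

A customer record containing an email address passes through `minimise` without the email, and `pseudonymise` then swaps `customer_id` for a 64-character hex pseudonym that changes if the key is rotated.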

Cross-Border Data Transfers

AI training data, model weights, and inference inputs containing personal data shall not be transferred to jurisdictions without adequate privacy protections unless Standard Contractual Clauses (SCCs) or Binding Corporate Rules (BCRs) are in place and a Transfer Impact Assessment (TIA) confirms sufficient protection.
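The transfer rule reduces to a simple decision, sketched below. The `TransferRequest` fields are hypothetical inputs: whether a jurisdiction has adequate protections, whether SCCs or BCRs are in place, and whether the TIA confirms sufficient protection are legal determinations made outside the code. The function encodes only this rule; classification and other controls still apply to every transfer.

```python
from dataclasses import dataclass


@dataclass
class TransferRequest:
    # Hypothetical inputs; each flag records the outcome of a legal review.
    contains_personal_data: bool
    destination_adequate: bool
    sccs_in_place: bool = False
    bcrs_in_place: bool = False
    tia_confirms_protection: bool = False


def transfer_permitted(req: TransferRequest) -> bool:
    """Cross-border rule: adequate destinations pass; otherwise SCCs or BCRs
    AND a TIA confirming sufficient protection are both required."""
    if not req.contains_personal_data or req.destination_adequate:
        # This rule only restricts personal data sent to non-adequate jurisdictions.
        return True
    has_mechanism = req.sccs_in_place or req.bcrs_in_place
    return has_mechanism and req.tia_confirms_protection
```

For example, a transfer of personal data to a non-adequate jurisdiction with SCCs signed but no confirming TIA is refused: `transfer_permitted(TransferRequest(True, False, sccs_in_place=True))` returns `False`.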

4.5 Data Quality & Lineage

Data quality directly determines AI system reliability, fairness, and safety. The organisation shall establish, measure, and maintain data quality standards for all AI-relevant datasets.

Data Quality Dimensions

Dimension	Definition	Measurement
Accuracy	Data correctly represents the real-world entity or event	Error rate, comparison to ground truth, expert sampling
Completeness	All required data elements are present	Null rate, coverage percentage, mandatory field compliance
Consistency	Data is uniform across systems and over time	Cross-system reconciliation, temporal stability metrics
Timeliness	Data is sufficiently current for its purpose	Age metrics, refresh frequency, staleness alerts
Validity	Data conforms to defined formats, ranges, and rules	Schema validation, range checks, format compliance
Uniqueness	No unintended duplicates exist	Duplicate detection rate, deduplication coverage
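Three of the dimensions above (completeness, uniqueness and validity) can be measured directly from the data; accuracy, consistency and timeliness need ground truth, a second system, or timestamps. The functions below are a minimal sketch over lists of dictionaries; the function names and any field names or ranges passed to them are illustrative assumptions, and acceptable thresholds would come from each dataset's documented quality rules.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def completeness(rows: List[Row], required: Sequence[str]) -> float:
    """Completeness: share of required cells that are present (not None)."""
    total = len(rows) * len(required)
    if total == 0:
        return 1.0
    present = sum(1 for row in rows for f in required if row.get(f) is not None)
    return present / total


def duplicate_rate(rows: List[Row], key_fields: Sequence[str]) -> float:
    """Uniqueness: share of rows whose key repeats an earlier row."""
    if not rows:
        return 0.0
    seen, dups = set(), 0
    for row in rows:
        key = tuple(row.get(f) for f in key_fields)
        if key in seen:
            dups += 1
        else:
            seen.add(key)
    return dups / len(rows)


def validity_rate(rows: List[Row], field: str, lo: float, hi: float) -> float:
    """Validity: share of non-null values of `field` within [lo, hi]."""
    values = [row[field] for row in rows if row.get(field) is not None]
    if not values:
        return 1.0
    return sum(lo <= v <= hi for v in values) / len(values)
```

On four rows where one `age` and one `postcode` are missing, `completeness(rows, ["age", "postcode"])` returns 0.75; if one `id` appears twice, `duplicate_rate(rows, ["id"])` returns 0.25.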

Data quality rules are defined and documented for each AI dataset.

Quality metrics are calculated automatically and reported monthly.

Data lineage is tracked from source to model training to inference.

Quality exceptions trigger alerts to Data Stewards within 24 hours.

Training datasets are versioned and linked to model versions.
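Linking training datasets to model versions requires a stable dataset identifier. A content hash is one common choice; the sketch below assumes rows are JSON-serialisable dictionaries and uses an in-memory `LineageRegistry` as a hypothetical stand-in for a real model registry or data catalogue. Key order within a row does not change the fingerprint, but row order does, so datasets should be exported in a deterministic order.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


def dataset_fingerprint(rows: List[Dict[str, Any]]) -> str:
    """SHA-256 over canonical JSON rows; insensitive to key order, sensitive to row order."""
    h = hashlib.sha256()
    for row in rows:
        h.update(json.dumps(row, sort_keys=True, separators=(",", ":")).encode("utf-8"))
        h.update(b"\n")  # row delimiter; json.dumps never emits a raw newline
    return h.hexdigest()


@dataclass
class LineageRegistry:
    # Hypothetical in-memory registry: model version -> training data lineage.
    links: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def register(self, model_version: str, dataset_name: str,
                 rows: List[Dict[str, Any]], source: str) -> str:
        version = dataset_fingerprint(rows)
        self.links[model_version] = {
            "dataset": dataset_name,
            "dataset_version": version,
            "source": source,
        }
        return version

    def trained_on(self, model_version: str) -> Dict[str, str]:
        return self.links[model_version]
```

Registering a model version against its training extract returns the dataset fingerprint, and `trained_on` later recovers the dataset name, fingerprint and source for audit.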