AI Data Governance
End-to-end data governance for AI systems, covering classification, ownership, lifecycle management, privacy controls, cross-border transfer protocols, and data quality and lineage.
4.1 Data Classification
All data used in AI systems shall be classified according to sensitivity, regulatory requirements, and business criticality. Classification determines handling requirements, access controls, retention periods, and disposal methods.
Classification Schema
| Level | Definition | AI Handling Requirements | Examples |
|---|---|---|---|
| Public | Data intended for public disclosure | Minimal controls; standard validation | Published research, public datasets, marketing content |
| Internal | Data for internal business use only | Access control, usage logging, version control | Internal reports, operational metrics, employee data |
| Confidential | Data whose unauthorised disclosure would harm the organisation or individuals | Encryption at rest and in transit, need-to-know access, audit trails, DLP controls | Customer PII, financial data, proprietary models, training datasets |
| Restricted | Data whose unauthorised disclosure would cause severe harm or violate law | Air-gapped environments, multi-factor authentication, full audit logging, encryption with HSM-managed keys, legal hold capability | Health records, biometrics, government classified data, criminal records |
Re-classification Requirement
Data classification shall be reviewed whenever data is combined, transformed, or used for a new purpose. AI-generated synthetic data derived from Confidential or Restricted source data inherits the source classification unless a formal declassification assessment is completed and approved by the Data Governance Committee.
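A catalogue or pipeline could apply this rule automatically when it registers derived data. The following is a minimal Python sketch, not a mandated implementation. It assumes the four levels in the schema are ordered by sensitivity and that derived data defaults to the highest level among its sources (a "high-water mark"). The policy states inheritance explicitly only for synthetic data, so applying the same default to all combined or transformed data is an assumption. `Classification` and `derived_classification` are hypothetical names.

```python
from __future__ import annotations

from enum import IntEnum


class Classification(IntEnum):
    """The four schema levels, ordered from least to most sensitive."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


def derived_classification(
    sources: list[Classification],
    declassified_to: Classification | None = None,
) -> Classification:
    """Classify data derived from `sources` (combined, transformed, or synthetic).

    Assumption: the default is the highest source level ("high-water mark").
    Pass `declassified_to` only when the Data Governance Committee has approved
    a formal declassification assessment for this derived dataset.
    """
    if not sources:
        raise ValueError("derived data must record at least one source")
    if declassified_to is not None:
        return declassified_to
    return max(sources)
```

For example, joining an Internal dataset with a Confidential one yields Confidential unless an approved declassification sets a different level. The check supports, but does not replace, the classification review this section requires.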
4.2 Data Ownership & Stewardship
Every dataset used in AI systems shall have a designated Data Owner and Data Steward. These roles are accountable for data quality, classification accuracy, lifecycle management, and compliance.
Role Definitions
| Role | Accountability | Typical Holder |
|---|---|---|
| Data Owner | Business accountability for data accuracy, classification, usage authorisation, and retention | Business unit executive or product owner |
| Data Steward | Operational accountability for data quality, metadata, lineage, and day-to-day governance | Data analyst, data engineer, or domain specialist |
| Data Custodian | Technical accountability for storage, backup, security, and infrastructure | IT operations or cloud infrastructure team |
| Data User | Compliance with handling requirements, appropriate use, and reporting of anomalies | Any employee or contractor accessing the data |
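The requirement that every dataset has a designated Data Owner and Data Steward can be checked against a dataset register. The sketch below assumes a hypothetical `DatasetRecord` register entry and flags any dataset missing either of the two mandatory roles; the Custodian and User roles are not checked because this section does not require them to be named per dataset.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical register entry for a dataset used in an AI system."""
    name: str
    data_owner: str | None = None
    data_steward: str | None = None


def ownership_gaps(records: list[DatasetRecord]) -> dict[str, list[str]]:
    """Map each dataset name to the mandatory roles it is missing."""
    gaps: dict[str, list[str]] = {}
    for rec in records:
        missing = [
            role
            for role, holder in (("Data Owner", rec.data_owner), ("Data Steward", rec.data_steward))
            if not holder  # None or empty string both count as "not designated"
        ]
        if missing:
            gaps[rec.name] = missing
    return gaps
```

A register job could run this before a dataset is approved for training and route any gaps to the relevant business unit.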
4.3 Data Lifecycle Management
AI data shall be managed through a complete lifecycle from creation or acquisition through to secure disposal. Each phase has defined controls, responsibilities, and evidence requirements.
Lifecycle Phases
- Create / Acquire: Data is generated internally or sourced from third parties. Controls: provenance verification, quality assessment, classification, ingress security scanning.
- Store: Data is persisted in approved repositories. Controls: encryption, access control, backup, geographic compliance, retention labelling.
- Process: Data is transformed, cleaned, feature-engineered, or used for training. Controls: processing logs, lineage tracking, bias detection, privacy-preserving techniques.
- Share: Data is transferred between systems, teams, or external parties. Controls: data sharing agreements, anonymisation review, transfer mechanism validation, recipient verification.
- Archive: Data is moved to long-term storage after active use. Controls: integrity verification, encryption, access restriction, retention schedule enforcement.
- Dispose: Data is securely destroyed at the end of its retention period. Controls: cryptographic erasure, physical destruction certificates, disposal logs, verification sampling.
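Because each phase carries a defined set of controls, evidence collection can be tracked phase by phase. This Python sketch copies the control names from the list above into a hypothetical `REQUIRED_CONTROLS` mapping and reports which controls have no recorded evidence. It treats every listed control as mandatory; in practice some are likely alternatives (for example, cryptographic erasure for stored data versus destruction certificates for physical media), so the mapping would need refinement.

```python
from __future__ import annotations

from enum import Enum


class Phase(Enum):
    CREATE_ACQUIRE = "Create / Acquire"
    STORE = "Store"
    PROCESS = "Process"
    SHARE = "Share"
    ARCHIVE = "Archive"
    DISPOSE = "Dispose"


# Control names copied from the lifecycle list; every control is treated as mandatory.
REQUIRED_CONTROLS = {
    Phase.CREATE_ACQUIRE: frozenset({"provenance verification", "quality assessment",
                                     "classification", "ingress security scanning"}),
    Phase.STORE: frozenset({"encryption", "access control", "backup",
                            "geographic compliance", "retention labelling"}),
    Phase.PROCESS: frozenset({"processing logs", "lineage tracking", "bias detection",
                              "privacy-preserving techniques"}),
    Phase.SHARE: frozenset({"data sharing agreements", "anonymisation review",
                            "transfer mechanism validation", "recipient verification"}),
    Phase.ARCHIVE: frozenset({"integrity verification", "encryption", "access restriction",
                              "retention schedule enforcement"}),
    Phase.DISPOSE: frozenset({"cryptographic erasure", "physical destruction certificates",
                              "disposal logs", "verification sampling"}),
}


def missing_evidence(phase: Phase, evidenced_controls) -> list[str]:
    """Return, sorted, the controls for `phase` that have no recorded evidence."""
    return sorted(REQUIRED_CONTROLS[phase] - set(evidenced_controls))
```

An empty result would indicate that evidence exists for every listed control in that phase.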
4.4 Privacy Controls & Cross-Border Transfers
AI systems processing personal data shall comply with the Privacy Act 1988 (Cth) and the Australian Privacy Principles (APPs) it contains, the EU General Data Protection Regulation (GDPR) where applicable, and all other relevant privacy legislation.
Privacy by Design Controls
| Control | Implementation | Verification |
|---|---|---|
| Data Minimisation | Collect only data necessary for the specified AI purpose | Data Protection Impact Assessment (DPIA) review, data inventory audit |
| Purpose Limitation | Use data only for purposes disclosed at collection | Usage monitoring, access review |
| Anonymisation / Pseudonymisation | Apply before model training where direct identifiers are not required | Re-identification risk assessment |
| Consent Management | Record and manage consent for AI-specific uses, including automated decision-making | Consent register, opt-out mechanism testing |
| Individual Rights | Enable access, correction, erasure, and portability requests | Request handling SLA, process documentation |
| Privacy Impact Assessment | Conduct a DPIA for all high-risk AI processing of personal data | DPIA completed before deployment |
| Cross-Border Transfers | Use approved transfer mechanisms (adequacy decisions, SCCs, BCRs, certification) | Legal review, transfer impact assessment |
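Pseudonymisation is commonly implemented with a keyed hash: the same identifier always maps to the same token, so records can still be linked for training, but the mapping cannot be recomputed without the key. The sketch below uses HMAC-SHA256 from the Python standard library. The function names, field names, and key handling are assumptions; the key would need to be stored separately from the pseudonymised data (for example, under the Data Custodian's control).

```python
from __future__ import annotations

import hashlib
import hmac


def pseudonymise(identifier: str, key: bytes) -> str:
    """Return a keyed-hash (HMAC-SHA256) pseudonym for a direct identifier.

    The same identifier and key always give the same 64-character hex token,
    so pseudonymised records can still be joined; without the key, tokens
    cannot be recomputed from guessed identifiers.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise_records(
    records: list[dict], identifier_fields: list[str], key: bytes
) -> list[dict]:
    """Return copies of `records` with the named direct-identifier fields replaced."""
    result = []
    for record in records:
        copy = dict(record)  # leave the caller's records unchanged
        for field in identifier_fields:
            if copy.get(field) is not None:
                copy[field] = pseudonymise(str(copy[field]), key)
        result.append(copy)
    return result
```

Under the GDPR, pseudonymised data remains personal data, so this technique supports, but does not replace, the re-identification risk assessment listed in the table.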
Cross-Border Data Transfers
AI training data, model weights, and inference inputs containing personal data shall not be transferred to jurisdictions without adequate privacy protections unless Standard Contractual Clauses (SCCs) or Binding Corporate Rules (BCRs) are in place and a Transfer Impact Assessment (TIA) confirms sufficient protection.
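The transfer rule is a conjunction of conditions and can be encoded as a pre-transfer gate. In this minimal sketch, the adequacy flag, the SCC and BCR flags, and the TIA outcome are assumed to be supplied by legal review; `TransferRequest` and `transfer_permitted` are hypothetical names, not part of any existing system.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TransferRequest:
    """Hypothetical pre-transfer record; all flags are assumed to come from legal review."""
    destination: str
    contains_personal_data: bool
    destination_adequate: bool  # jurisdiction assessed as providing adequate protection
    sccs_in_place: bool = False
    bcrs_in_place: bool = False
    tia_confirms_protection: bool = False


def transfer_permitted(req: TransferRequest) -> tuple[bool, str]:
    """Apply the cross-border rule to training data, model weights, or inference inputs."""
    if not req.contains_personal_data:
        return True, "no personal data; outside the scope of this rule"
    if req.destination_adequate:
        return True, "destination provides adequate privacy protection"
    if not (req.sccs_in_place or req.bcrs_in_place):
        return False, "blocked: no SCCs or BCRs in place"
    if not req.tia_confirms_protection:
        return False, "blocked: TIA has not confirmed sufficient protection"
    return True, "permitted: SCCs/BCRs in place and TIA confirms protection"
```

Returning a reason alongside the decision gives the audit trail a record of why each transfer was allowed or blocked.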
4.5 Data Quality & Lineage
Data quality directly determines AI system reliability, fairness, and safety. The organisation shall establish and maintain data quality standards for all AI-relevant datasets and regularly measure those datasets against them.
Data Quality Dimensions
| Dimension | Definition | Measurement |
|---|---|---|
| Accuracy | Data correctly represents the real-world entity or event | Error rate, comparison to ground truth, expert sampling |
| Completeness | All required data elements are present | Null rate, coverage percentage, mandatory field compliance |
| Consistency | Data is uniform across systems and over time | Cross-system reconciliation, temporal stability metrics |
| Timeliness | Data is sufficiently current for its purpose | Age metrics, refresh frequency, staleness alerts |
| Validity | Data conforms to defined formats, ranges, and rules | Schema validation, range checks, format compliance |
| Uniqueness | No unintended duplicates exist | Duplicate detection rate, deduplication coverage |
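Several of these measurements reduce to simple ratios over a dataset. The following Python sketch computes proxies for completeness (null rate), uniqueness (duplicate rate), validity (range checks), and timeliness (age threshold) over rows held as dictionaries. The function name, field names, and thresholds are assumptions. Accuracy and consistency are not covered because they need ground truth or a second system to reconcile against.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def quality_report(
    rows: list[dict],
    required: list[str],
    key: str,
    ranges: dict[str, tuple[float, float]],
    timestamp_field: str,
    max_age: timedelta,
    now: datetime,
) -> dict[str, float]:
    """Compute simple proxies for four of the data quality dimensions."""
    if not rows:
        raise ValueError("no rows to measure")
    n = len(rows)

    # Completeness: share of required fields that are present and non-null.
    filled = sum(1 for r in rows for f in required if r.get(f) is not None)
    completeness = filled / (n * len(required))

    # Uniqueness: share of rows that repeat an earlier row's key value.
    duplicate_rate = (n - len({r.get(key) for r in rows})) / n

    # Validity: share of non-null range-checked values inside their allowed range.
    checked = in_range = 0
    for r in rows:
        for field, (lo, hi) in ranges.items():
            value = r.get(field)
            if value is None:
                continue  # missing values are counted under completeness instead
            checked += 1
            if lo <= value <= hi:
                in_range += 1
    validity = in_range / checked if checked else 1.0

    # Timeliness: share of rows whose timestamp is no older than max_age.
    fresh = sum(
        1 for r in rows
        if r.get(timestamp_field) is not None and now - r[timestamp_field] <= max_age
    )
    timeliness = fresh / n

    return {
        "completeness": completeness,
        "duplicate_rate": duplicate_rate,
        "validity": validity,
        "timeliness": timeliness,
    }
```

A pipeline could compare each metric with a threshold set by the Data Steward and raise an alert (for example, a staleness alert for timeliness) when a threshold is breached.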