AI Data Governance

End-to-end data governance for AI systems including classification, ownership, lifecycle management, privacy controls, and cross-border transfer protocols.

4.1 Data Classification

All data used in AI systems shall be classified according to sensitivity, regulatory requirements, and business criticality. Classification determines handling requirements, access controls, retention periods, and disposal methods.

Classification Schema

| Level | Definition | AI Handling Requirements | Examples |
| --- | --- | --- | --- |
| Public | Data intended for public disclosure | Minimal controls; standard validation | Published research, public datasets, marketing content |
| Internal | Data for internal business use only | Access control, usage logging, version control | Internal reports, operational metrics, employee data |
| Confidential | Data whose unauthorised disclosure would harm the organisation or individuals | Encryption at rest and in transit, need-to-know access, audit trails, DLP controls | Customer PII, financial data, proprietary models, training datasets |
| Restricted | Data whose unauthorised disclosure would cause severe harm or violate law | Air-gapped environments, multi-factor authentication, full audit logging, encryption with HSM, legal hold capability | Health records, biometrics, government classified data, criminal records |
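As an illustration only, the schema can be held as an ordered enumeration with a lookup of handling requirements, so that a gap check becomes a set difference. This is a minimal sketch: the names `Classification`, `HANDLING_REQUIREMENTS`, and `missing_controls`, and the snake_case control labels, are hypothetical. It also assumes each level's requirements are exactly those listed in its own row, since the schema does not state whether they are cumulative.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Sensitivity levels from the schema, ordered least to most sensitive."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


# Handling requirements copied from the schema table. The snake_case labels
# are illustrative (assumed), not an established control vocabulary.
HANDLING_REQUIREMENTS = {
    Classification.PUBLIC: {"standard_validation"},
    Classification.INTERNAL: {
        "access_control", "usage_logging", "version_control",
    },
    Classification.CONFIDENTIAL: {
        "encryption_at_rest", "encryption_in_transit",
        "need_to_know_access", "audit_trails", "dlp_controls",
    },
    Classification.RESTRICTED: {
        "air_gapped_environment", "multi_factor_authentication",
        "full_audit_logging", "hsm_encryption", "legal_hold_capability",
    },
}


def missing_controls(level, controls_in_place):
    """Return the controls required for `level` that are not yet in place."""
    return HANDLING_REQUIREMENTS[level] - set(controls_in_place)
```

For example, `missing_controls(Classification.INTERNAL, ["access_control"])` returns the two Internal controls still outstanding.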

Re-classification Requirement

Data classification shall be reviewed whenever data is combined, transformed, or used for a new purpose. AI-generated synthetic data derived from Confidential or Restricted source data inherits the source classification unless a formal declassification assessment is completed and approved by the Data Governance Committee.
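The inheritance rule can be sketched as a small function. Two assumptions go beyond the text above: combined data is provisionally given the highest classification among its sources (a common "high-water mark" convention, which the policy does not itself prescribe; it only requires a review), and an approved declassification is represented simply by passing the approved level. The names `derived_classification` and `approved_declassification` are hypothetical.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Sensitivity levels, ordered least to most sensitive."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


def derived_classification(source_levels, approved_declassification=None):
    """Provisional classification for combined, transformed, or synthetic data.

    Assumption: derived data inherits the most sensitive source level unless
    the Data Governance Committee has approved a declassification, in which
    case the approved level is used instead.
    """
    levels = list(source_levels)
    if not levels:
        raise ValueError("at least one source classification is required")
    if approved_declassification is not None:
        return approved_declassification
    return max(levels)
```

A synthetic dataset built from Internal and Restricted sources would therefore start as Restricted, and would only move lower once an approved level is supplied.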

4.2 Data Ownership & Stewardship

Every dataset used in AI systems shall have a designated Data Owner and Data Steward. These roles are accountable for data quality, classification accuracy, lifecycle management, and compliance.

Role Definitions

| Role | Accountability | Typical Holder |
| --- | --- | --- |
| Data Owner | Business accountability for data accuracy, classification, usage authorisation, and retention | Business unit executive or product owner |
| Data Steward | Operational accountability for data quality, metadata, lineage, and day-to-day governance | Data analyst, data engineer, or domain specialist |
| Data Custodian | Technical accountability for storage, backup, security, and infrastructure | IT operations or cloud infrastructure team |
| Data User | Compliance with handling requirements, reporting anomalies, and appropriate use | Any employee or contractor accessing the data |

- All datasets in the AI Asset Register have an assigned Data Owner and Data Steward.
- Ownership is documented in the Data Catalogue with contact details and delegation rules.
- Data Owners review and approve all new AI use cases involving their data.
- Data Stewards perform monthly quality assessments and report findings.
- Ownership transfers are documented and approved when personnel change roles.
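The first requirement, that every registered dataset has both a Data Owner and a Data Steward, lends itself to an automated check. The sketch below assumes the AI Asset Register can be exported as a list of records. The `DatasetRecord` fields and the `ownership_gaps` function are hypothetical and do not describe any particular catalogue product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetRecord:
    """One entry of a hypothetical AI Asset Register export."""
    name: str
    data_owner: Optional[str] = None
    data_steward: Optional[str] = None


def ownership_gaps(register):
    """Map each dataset name to the roles that have no assigned holder."""
    gaps = {}
    for record in register:
        roles = (
            ("Data Owner", record.data_owner),
            ("Data Steward", record.data_steward),
        )
        missing = [role for role, holder in roles if not holder]
        if missing:
            gaps[record.name] = missing
    return gaps
```

An empty result means the register satisfies the requirement. Any entry in the result names a dataset that needs an assignment before it is used in an AI system.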

4.3 Data Lifecycle Management

AI data shall be managed through a complete lifecycle from creation or acquisition through to secure disposal. Each phase has defined controls, responsibilities, and evidence requirements.

Lifecycle Phases

  1. Create / Acquire: Data is generated internally or sourced from third parties. Controls: provenance verification, quality assessment, classification, ingress security scanning.
  2. Store: Data is persisted in approved repositories. Controls: encryption, access control, backup, geographic compliance, retention labelling.
  3. Process: Data is transformed, cleaned, feature-engineered, or used for training. Controls: processing logs, lineage tracking, bias detection, privacy-preserving techniques.
  4. Share: Data is transferred between systems, teams, or external parties. Controls: data sharing agreements, anonymisation review, transfer mechanism validation, recipient verification.
  5. Archive: Data is moved to long-term storage after active use. Controls: integrity verification, encryption, access restriction, retention schedule enforcement.
  6. Dispose: Data is securely destroyed at end of retention. Controls: cryptographic erasure, physical destruction certificates, disposal logs, verification sampling.
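The six phases and their controls can be captured as data so that outstanding evidence for a phase is computed rather than tracked by hand. This is a minimal sketch: the structure `LIFECYCLE_CONTROLS` and the function `outstanding_controls` are hypothetical names, and the control strings are copied from the list above.

```python
# Controls per lifecycle phase, as listed in the policy text above.
LIFECYCLE_CONTROLS = {
    "Create / Acquire": {
        "provenance verification", "quality assessment",
        "classification", "ingress security scanning",
    },
    "Store": {
        "encryption", "access control", "backup",
        "geographic compliance", "retention labelling",
    },
    "Process": {
        "processing logs", "lineage tracking",
        "bias detection", "privacy-preserving techniques",
    },
    "Share": {
        "data sharing agreements", "anonymisation review",
        "transfer mechanism validation", "recipient verification",
    },
    "Archive": {
        "integrity verification", "encryption",
        "access restriction", "retention schedule enforcement",
    },
    "Dispose": {
        "cryptographic erasure", "physical destruction certificates",
        "disposal logs", "verification sampling",
    },
}


def outstanding_controls(phase, evidence):
    """Return, sorted, the controls for `phase` that lack recorded evidence."""
    if phase not in LIFECYCLE_CONTROLS:
        raise ValueError(f"unknown lifecycle phase: {phase!r}")
    return sorted(LIFECYCLE_CONTROLS[phase] - set(evidence))
```

A dataset in the Store phase with evidence for only encryption and backup would report the remaining three controls as outstanding.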

4.4 Privacy Controls & Cross-Border Transfers

AI systems processing personal data must comply with the Australian Privacy Act 1988, the Australian Privacy Principles (APPs), the GDPR where applicable, and all other relevant privacy legislation.

Privacy by Design Controls

| Control | Implementation | Verification |
| --- | --- | --- |
| Data Minimisation | Collect only data necessary for the specified AI purpose | DPIA review, data inventory audit |
| Purpose Limitation | Use data only for purposes disclosed at collection | Usage monitoring, access review |
| Anonymisation / Pseudonymisation | Apply before model training where direct identifiers are not required | Re-identification risk assessment |
| Consent Management | Record and manage consent for AI-specific uses, including automated decision-making | Consent register, opt-out mechanism testing |
| Individual Rights | Enable access, correction, erasure, and portability requests | Request handling SLA, process documentation |
| Privacy Impact Assessment | Conduct DPIA for all high-risk AI processing of personal data | DPIA completion before deployment |
| Cross-Border Transfers | Use approved transfer mechanisms (SCCs, adequacy decisions, certification) | Legal review, transfer impact assessment |
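One common way to apply the Anonymisation / Pseudonymisation control before training is to replace direct identifiers with keyed hashes. The sketch below uses HMAC-SHA256 from the Python standard library. The function names, the choice of HMAC, and the field names in the usage note are illustrative assumptions, not a method prescribed by this policy. Pseudonymised data can still be re-identified by anyone holding the key, so the re-identification risk assessment listed in the table remains necessary.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed, deterministic token."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def pseudonymise_record(record, direct_identifiers, secret_key):
    """Return a copy of `record` with the named identifier fields tokenised.

    Fields not listed in `direct_identifiers` are passed through unchanged.
    """
    return {
        field: pseudonymise(str(value), secret_key)
        if field in direct_identifiers else value
        for field, value in record.items()
    }
```

Calling `pseudonymise_record({"email": "a@example.com", "age": 41}, {"email"}, key)` leaves `age` untouched and replaces `email` with a 64-character hexadecimal token. The same key and input always yield the same token, which keeps joins across datasets possible while the key is held separately under stricter access control.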

Cross-Border Data Transfers

AI training data, model weights, and inference inputs that contain personal data shall not be transferred to a jurisdiction without adequate privacy protections unless both of the following hold: Standard Contractual Clauses (SCCs) or Binding Corporate Rules (BCRs) are in place, and a Transfer Impact Assessment (TIA) confirms sufficient protection.
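The transfer rule reduces to a short boolean check, sketched here under one reading of the text: an SCC or BCR is required together with a confirming TIA whenever the destination lacks adequate protections. The `TransferRequest` fields and the `transfer_permitted` function are hypothetical, and a real decision would still go through legal review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferRequest:
    """Facts about a proposed transfer (hypothetical field names)."""
    contains_personal_data: bool
    destination_adequate: bool
    sccs_in_place: bool = False
    bcrs_in_place: bool = False
    tia_confirms_protection: bool = False


def transfer_permitted(request: TransferRequest) -> bool:
    """Apply the cross-border rule to a single transfer request."""
    if not request.contains_personal_data:
        # The rule only covers data containing personal data.
        return True
    if request.destination_adequate:
        return True
    has_mechanism = request.sccs_in_place or request.bcrs_in_place
    return has_mechanism and request.tia_confirms_protection
```

Under this reading, SCCs alone are not enough: a request with `sccs_in_place=True` but no confirming TIA is refused.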

4.5 Data Quality & Lineage

Data quality directly determines AI system reliability, fairness, and safety. The organisation shall establish, measure, and maintain data quality standards for all AI-relevant datasets.

Data Quality Dimensions

| Dimension | Definition | Measurement |
| --- | --- | --- |
| Accuracy | Data correctly represents the real-world entity or event | Error rate, comparison to ground truth, expert sampling |
| Completeness | All required data elements are present | Null rate, coverage percentage, mandatory field compliance |
| Consistency | Data is uniform across systems and over time | Cross-system reconciliation, temporal stability metrics |
| Timeliness | Data is sufficiently current for its purpose | Age metrics, refresh frequency, staleness alerts |
| Validity | Data conforms to defined formats, ranges, and rules | Schema validation, range checks, format compliance |
| Uniqueness | No unintended duplicates exist | Duplicate detection rate, deduplication coverage |

- Data quality rules are defined and documented for each AI dataset.
- Quality metrics are calculated automatically and reported monthly.
- Data lineage is tracked from source to model training to inference.
- Quality exceptions trigger alerts to Data Stewards within 24 hours.
- Training datasets are versioned and linked to model versions.
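Two of the measurements in the table, null rate (Completeness) and duplicate rate (Uniqueness), are simple enough to compute automatically. The sketch below assumes a dataset is available as a list of dictionaries. The function names and the default thresholds are hypothetical placeholders for the documented quality rules of each dataset.

```python
def null_rate(rows, field):
    """Fraction of rows in which `field` is missing or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(field) is None)
    return missing / len(rows)


def duplicate_rate(rows, key_fields):
    """Fraction of rows that repeat an earlier row's key."""
    if not rows:
        return 0.0
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row.get(field) for field in key_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / len(rows)


def quality_exceptions(rows, required_fields, key_fields,
                       max_null_rate=0.01, max_duplicate_rate=0.0):
    """List human-readable exceptions; thresholds are illustrative only."""
    exceptions = []
    for field in required_fields:
        rate = null_rate(rows, field)
        if rate > max_null_rate:
            exceptions.append(f"completeness: {field} null rate {rate:.1%}")
    rate = duplicate_rate(rows, key_fields)
    if rate > max_duplicate_rate:
        exceptions.append(f"uniqueness: duplicate rate {rate:.1%}")
    return exceptions
```

A non-empty list from `quality_exceptions` is the kind of result that would feed the alert to Data Stewards. How the alert is delivered within the 24-hour window is outside this sketch.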