AI Incident Response

Comprehensive incident response capabilities covering prompt injection, model poisoning, hallucination management, data leakage, and regulatory notification.

6.1 AI Incident Response Plan

The AI Incident Response Plan establishes the organisational structure, procedures, and resources for detecting, responding to, and recovering from AI-related security incidents. It integrates with the enterprise Incident Response Plan and addresses AI-specific incident types.

Incident Response Phases

Preparation: Maintain incident response capabilities, training, tools, and documentation. Conduct regular tabletop exercises and red-team simulations.
Detection & Analysis: Monitor AI systems for anomalous behaviour, security alerts, and user reports. Validate and classify incidents by severity.
Containment: Implement short-term containment to stop immediate damage. Develop a long-term containment strategy for sustained operations.
Eradication: Remove the root cause of the incident, including compromised models, poisoned data, or injected prompts.
Recovery: Restore affected systems to normal operations with enhanced monitoring. Validate integrity before full restoration.
Post-Incident: Conduct a lessons-learned review, update controls, and communicate findings to stakeholders.

Phase	Time Objective	Key Actions	Owner
Detection	< 15 minutes	Alert triage, initial validation, severity assignment	SOC Analyst
Containment	< 1 hour	Isolate affected systems, preserve evidence, notify IR lead	Incident Manager
Eradication	< 24 hours	Remove root cause, patch vulnerabilities, clean compromised artefacts	Technical Lead
Recovery	< 72 hours	Restore services, validate integrity, enhanced monitoring	Operations Lead
Post-Incident	< 14 days	Root cause analysis, lessons learned, control updates	Incident Manager
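
The time objectives in the table can be tracked mechanically. The following is a minimal sketch, not a prescribed implementation. It assumes every objective is measured from the time of the initial alert, which the table does not state, and the function names are illustrative.

```python
from datetime import datetime, timedelta

# Time objectives copied from the phase table. Measuring each one from the
# initial alert is an assumption of this sketch, not a stated requirement.
PHASE_OBJECTIVES = {
    "Detection": timedelta(minutes=15),
    "Containment": timedelta(hours=1),
    "Eradication": timedelta(hours=24),
    "Recovery": timedelta(hours=72),
    "Post-Incident": timedelta(days=14),
}


def phase_deadlines(alerted_at: datetime) -> dict:
    """Return the latest acceptable completion time for each phase."""
    return {phase: alerted_at + limit for phase, limit in PHASE_OBJECTIVES.items()}


def overdue_phases(alerted_at: datetime, completed: set, now: datetime) -> list:
    """List phases that are not yet completed and whose deadline has passed."""
    deadlines = phase_deadlines(alerted_at)
    return [p for p, due in deadlines.items() if p not in completed and now > due]
```

For example, an incident alerted at 09:00 with only Detection completed would report Containment as overdue at 10:30.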

6.2 Severity Classification

All AI incidents shall be classified by severity to ensure appropriate resource allocation, escalation, and communication.

Severity Levels

Severity	Definition	Examples	Response Time	Escalation
Critical (SEV-1)	Active exploitation causing severe business impact, data breach, or safety risk	Active prompt injection on production LLM handling PII; model producing harmful safety-critical outputs	< 15 minutes	CEO, Board, Regulators
High (SEV-2)	Confirmed compromise with significant impact potential	Confirmed data poisoning in training pipeline; unauthorised model access	< 1 hour	C-Suite, Steering Committee
Medium (SEV-3)	Suspicious activity or limited impact incident	Anomalous inference patterns; isolated hallucination in customer-facing system	< 4 hours	Director, Security Lead
Low (SEV-4)	Minor anomaly or policy violation with minimal impact	Failed login attempts; minor policy deviation detected in audit	< 24 hours	Manager, Team Lead
Informational (SEV-5)	No impact; observational or preventive	Threat intelligence alert with no confirmed exposure; control improvement opportunity	< 72 hours	Analyst
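
The severity table can be encoded as a lookup so that tooling applies response times and escalation paths consistently. This sketch is illustrative only: the boolean triage flags passed to `classify` are a hypothetical simplification of the definitions above, and real classification would require analyst judgement.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import IntEnum


class Severity(IntEnum):
    CRITICAL = 1       # SEV-1
    HIGH = 2           # SEV-2
    MEDIUM = 3         # SEV-3
    LOW = 4            # SEV-4
    INFORMATIONAL = 5  # SEV-5


@dataclass(frozen=True)
class Policy:
    response_time: timedelta
    escalation: tuple


# Response times and escalation paths copied from the severity table.
POLICIES = {
    Severity.CRITICAL: Policy(timedelta(minutes=15), ("CEO", "Board", "Regulators")),
    Severity.HIGH: Policy(timedelta(hours=1), ("C-Suite", "Steering Committee")),
    Severity.MEDIUM: Policy(timedelta(hours=4), ("Director", "Security Lead")),
    Severity.LOW: Policy(timedelta(hours=24), ("Manager", "Team Lead")),
    Severity.INFORMATIONAL: Policy(timedelta(hours=72), ("Analyst",)),
}


def classify(*, active_exploitation=False, confirmed_compromise=False,
             suspicious_activity=False, minor_anomaly=False) -> Severity:
    """Pick the most severe level whose (hypothetical) triage flag is set."""
    if active_exploitation:
        return Severity.CRITICAL
    if confirmed_compromise:
        return Severity.HIGH
    if suspicious_activity:
        return Severity.MEDIUM
    if minor_anomaly:
        return Severity.LOW
    return Severity.INFORMATIONAL
```

Checking flags from most to least severe means an incident that matches several definitions is always assigned the higher level.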

6.3 Prompt Injection Response

Prompt injection is a critical attack vector against large language models. This playbook provides step-by-step response procedures for confirmed or suspected prompt injection incidents.

Response Playbook

Detect: Monitor for anomalous outputs, user reports of unexpected behaviour, or automated alerts from input validation systems.
Confirm: Reproduce the suspicious input in a sandboxed environment. Document the exact prompt, model version, and system configuration.
Contain: Immediately disable the affected model endpoint or route traffic to a fallback model. Preserve logs and model state for investigation.
Assess Impact: Determine whether the injection resulted in data exfiltration, unauthorised actions, or harmful output generation.
Notify: If personal data or regulated information was accessed, start the applicable regulatory notification clock (for example, 72 hours from becoming aware of the breach under GDPR).
Remediate: Patch prompt templates, strengthen input validation, update system prompts, and retrain or fine-tune if model weights are compromised.
Validate: Test fixes in staging with red-team prompt injection techniques before restoring production traffic.
Document: Complete the Incident Report (Template 9.13) and update the Prompt Register with the attack vector and mitigations.
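
The Contain and Validate steps can be supported by a routing switch placed in front of the model endpoint. The sketch below is one possible shape under stated assumptions: `ModelRouter`, its methods, and the callable-based model interface are hypothetical names introduced for illustration and do not refer to any specific product or library.

```python
from typing import Callable


class ModelRouter:
    """Routes prompts to the primary model unless it has been isolated."""

    def __init__(self, primary: Callable[[str], str], fallback: Callable[[str], str]):
        self._primary = primary
        self._fallback = fallback
        self._isolated = False
        self.audit_log = []  # retained as evidence for the investigation

    def isolate(self, reason: str) -> None:
        """Contain: send all traffic to the fallback model."""
        self._isolated = True
        self.audit_log.append(f"ISOLATED: {reason}")

    def restore(self, validated_in_staging: bool) -> None:
        """Validate: refuse to restore until staging red-team tests have passed."""
        if not validated_in_staging:
            raise RuntimeError("Fix must pass staging red-team tests before restore")
        self._isolated = False
        self.audit_log.append("RESTORED")

    def handle(self, prompt: str) -> str:
        target = self._fallback if self._isolated else self._primary
        return target(prompt)
```

Keeping the switch outside the model itself allows isolation without redeploying, which helps when the containment objective is measured in minutes.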

Immediate Action

Upon confirmation of prompt injection on a production system handling sensitive data or critical functions: isolate the endpoint within 15 minutes, preserve all logs, and escalate to the Incident Manager and CISO immediately.

6.4 Hallucination Management

AI hallucination (the generation of plausible but factually incorrect or unsupported content) poses significant risks for organisations using AI in decision-support, customer-facing, or content-generation roles.

Hallucination Response Procedures

Detection: Implement automated fact-checking, confidence scoring, and retrieval-augmented generation (RAG) grounding checks.
User Reporting: Establish clear channels for users to report suspected hallucinations with structured feedback forms.
Impact Assessment: Classify hallucinations by severity: factual error, harmful misinformation, or safety-critical incorrect advice.
Correction: For high-impact hallucinations, issue corrections to affected users, update knowledge bases, and retrain or fine-tune models.
Prevention: Enhance RAG retrieval quality, implement citation requirements, constrain output to verified sources, and use ensemble models for critical queries.
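
A grounding check of the kind named under Detection can be approximated very roughly by comparing each output sentence with the retrieved sources. The sketch below uses simple word overlap as a stand-in. This is an assumption made for illustration: production systems would more likely use entailment models or citation verification, and the 0.6 threshold is arbitrary.

```python
import re


def _tokens(text: str) -> set:
    """Lower-cased alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def ungrounded_sentences(answer: str, sources: list, min_overlap: float = 0.6) -> list:
    """Return answer sentences whose words are mostly absent from the sources."""
    source_tokens = set().union(*(_tokens(s) for s in sources))
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = _tokens(sentence)
        if not words:
            continue
        overlap = len(words & source_tokens) / len(words)
        if overlap < min_overlap:
            flagged.append(sentence)
    return flagged
```

Flagged sentences are candidates for review rather than confirmed hallucinations, since word overlap can miss paraphrases and can also pass sentences that reuse source words incorrectly.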

All customer-facing AI outputs include appropriate confidence indicators and disclaimers.

Hallucination rates are measured and reported monthly by model and use case.

AI outputs for regulated domains require human verification, even when the model's confidence is high.

Retrieval sources are authoritative, current, and validated.

Fallback to human review is enforced when confidence scores fall below thresholds.
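
Two of these controls, the human-review fallback and the monthly rate report, lend themselves to a short sketch. The threshold value, the record layout, and the function names below are assumptions made for illustration. The code also assumes that outputs in regulated domains always go to human review, regardless of confidence.

```python
from collections import defaultdict


def requires_human_review(confidence: float, regulated_domain: bool,
                          threshold: float = 0.8) -> bool:
    """Regulated domains always need review; otherwise apply the threshold."""
    if regulated_domain:
        return True
    return confidence < threshold


def monthly_hallucination_rates(records) -> dict:
    """Compute hallucination rate per (month, model, use_case).

    `records` is an iterable of (month, model, use_case, was_hallucination)
    tuples, a hypothetical layout for reviewed outputs.
    """
    totals = defaultdict(lambda: [0, 0])  # [hallucinations, reviewed outputs]
    for month, model, use_case, was_hallucination in records:
        counts = totals[(month, model, use_case)]
        counts[0] += int(was_hallucination)
        counts[1] += 1
    return {key: bad / total for key, (bad, total) in totals.items()}
```

The rate is only as reliable as the review sample behind it, so the report should state how outputs were selected for review.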

6.5 Data Leakage & Insider Threat Response

Data leakage from AI systems can occur through model memorisation, prompt echoing, insecure API responses, or insider actions. This section establishes detection and response procedures.

Data Leakage Response

Detect: Monitor API responses for unexpected inclusion of training data, PII, or confidential content. Use DLP tools on AI outputs.
Confirm: Reproduce the leakage in a controlled environment. Identify the source data, model version, and triggering input.
Contain: Restrict access to the affected model, disable the vulnerable endpoint, and block the triggering input patterns.
Assess Scope: Determine which data elements were exposed, to whom, and for how long. Review access logs comprehensively.
Notify: If personal data was leaked, notify the Privacy Officer immediately. Comply with mandatory breach notification timelines.
Remediate: Retrain or fine-tune the model with differential privacy, implement output filtering, and strengthen data sanitisation.
Review: Assess whether the leakage indicates a broader control failure requiring systemic improvement.
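
Output filtering, as named in the Detect and Remediate steps, can start with pattern-based scanning of model responses. The sketch below is deliberately minimal. The two patterns are illustrative assumptions and do not replace a dedicated DLP tool, which would cover far more data types and reduce false positives.

```python
import re

# Illustrative patterns only; a real deployment would rely on a DLP product.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "card_number": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def scan_output(text: str) -> dict:
    """Return the matches found for each pattern, omitting empty results."""
    findings = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            findings[label] = matches
    return findings


def redact(text: str) -> str:
    """Replace every match with a labelled placeholder before returning output."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{label}]", text)
    return text
```

Logging the findings from `scan_output` alongside the model version and triggering input gives the Confirm and Assess Scope steps concrete evidence to work from.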

Insider Threat Considerations

Insider threats to AI systems include malicious model extraction, data theft, sabotage of training data, and unauthorised model deployment. Implement least-privilege access, segregation of duties, anomaly detection on developer actions, and regular insider risk assessments.
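
Least-privilege access and segregation of duties can be expressed as simple checks in a deployment pipeline. The role names, permissions, and function names in this sketch are hypothetical. In practice they would come from the organisation's identity and access management system.

```python
# Hypothetical role-to-permission map for illustration only.
ROLE_PERMISSIONS = {
    "ml_engineer": {"train_model", "read_training_data"},
    "release_manager": {"approve_deployment", "deploy_model"},
}


def is_permitted(role: str, action: str) -> bool:
    """Least privilege: allow only actions explicitly granted to the role."""
    return action in ROLE_PERMISSIONS.get(role, set())


def violates_segregation_of_duties(author: str, approver: str) -> bool:
    """Segregation of duties: the author of a change must not approve it."""
    return author == approver
```

Unknown roles receive no permissions by default, so a misconfigured account fails closed rather than open.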

6.6 Regulatory Notification & Communications

Timely, accurate, and coordinated communication is essential during AI incidents. This section establishes notification requirements, communication templates, and approval workflows.

Notification Matrix

Stakeholder	Trigger	Timeline	Channel	Owner
C-Suite / Board	Any SEV-1 or SEV-2 incident	Within 1 hour	Secure call + written summary	Incident Manager
Regulators (OAIC / ICO)	Personal data breach affecting one or more individuals	72 hours (GDPR) / as soon as practicable (Australian NDB scheme)	Formal notification	Privacy Officer / Legal
Affected Individuals	High risk to rights and freedoms	Without undue delay	Direct communication	Customer Success / Legal
Cyber Insurance	Any incident likely to trigger coverage	Within 24 hours	Formal claim notification	Risk Manager
Law Enforcement	Criminal activity suspected	As appropriate	Formal referral	General Counsel
Media / Public	Incident of public interest	After legal review and Board approval	Approved statement	Communications Director
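
The fixed timelines in the matrix can be turned into concrete deadlines once an incident is opened. This sketch encodes only the rows that have a numeric limit. Rows such as "without undue delay" or "as appropriate" need human judgement and are deliberately left out. The function name and its parameters are illustrative assumptions.

```python
from datetime import datetime, timedelta


def notification_deadlines(aware_at: datetime, severity: int,
                           personal_data_breach: bool,
                           insurance_relevant: bool) -> dict:
    """Return notification deadlines for the rows with a fixed time limit.

    `aware_at` is when the organisation became aware of the incident;
    `severity` is the SEV number (1 to 5).
    """
    deadlines = {}
    if severity in (1, 2):
        deadlines["C-Suite / Board"] = aware_at + timedelta(hours=1)
    if personal_data_breach:
        deadlines["Regulators (GDPR)"] = aware_at + timedelta(hours=72)
    if insurance_relevant:
        deadlines["Cyber Insurance"] = aware_at + timedelta(hours=24)
    return deadlines
```

For a SEV-1 personal data breach identified at 08:00 on 1 March, this yields a Board deadline of 09:00 the same day and a GDPR regulator deadline of 08:00 on 4 March.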

Communication Discipline

All external communications regarding AI incidents require pre-approval from Legal and the Communications Director. Unauthorised statements by employees are prohibited and may result in disciplinary action.