Adversarial Risks: Prompt Injection, Model Inversion & IP Exposure
AI systems face deliberate attacks that can subvert their behaviour, extract training data, and expose sensitive information. Understand these risks so you can defend against them.
Prompt Injection Demo
Prompt injection is the most common attack against LLM applications. An attacker embeds malicious instructions within input data, attempting to override the system\'s intended behaviour. Explore the two main types below.
Direct Injection
The attacker types malicious instructions directly into the prompt, attempting to override the system prompt or safety guardrails.
Defences
- • Instruction hierarchy: system prompts take priority over user input
- • Input validation and filtering of known injection patterns
- • Output monitoring to detect anomalous responses
- • Sandboxed execution for AI-generated code
Model Inversion & Data Extraction
Beyond prompt injection, AI models face risks of leaking training data. Attackers can potentially extract memorised content or reconstruct sensitive information from model outputs.
Training Data Extraction
Models can memorise and regurgitate snippets of training data. In 2023, a widely-reported bug temporarily exposed conversation histories from other users. Researchers have also demonstrated extracting verbatim training data through carefully crafted prompts.
Model Inversion Attacks
By analysing a model\'s outputs across many queries, an attacker can reconstruct aspects of its training data. This is more relevant to fine-tuned models where the training dataset is smaller and more focused.
Consumer vs Enterprise Data Flows
Consumer Tier (Free / Pro)
Enterprise Tier
Key Insight
Enterprise tiers with training opt-out are the minimum for government use. But even with enterprise agreements, the sanitisation discipline from Lessons 8.1 and 8.2 still applies — belt and braces.
OWASP LLM Top 10 Explorer
The OWASP Top 10 for LLM Applications is the authoritative reference for AI security risks. Click each item to expand its description, defence-relevant example, and recommended mitigations.