AI Security9 min read

The OWASP LLM Top 10, Explained

Language models broke an assumption security quietly depended on for decades: that instructions and data are different things. A parameterized SQL query keeps user input from becoming code. A model has no such boundary. System prompt, user message and retrieved document all arrive as one stream of tokens with equal standing. That single fact sits underneath most of the list below.

The OWASP LLM Top 10 (v2.0, current through 2026) names the risks every team shipping AI features should understand. Here is the short, practical version, with a real example for each.

LLM01 Prompt injection

Crafted input that overrides intended behavior, directly or indirectly. The indirect variant is the dangerous one: instructions hidden in a document, email or web page the model retrieves. A crafted email made a mainstream copilot exfiltrate inbox data when a user simply asked for a summary, with no click required. There is no patch, because this is a vulnerability class, not a bug.

LLM02 Sensitive information disclosure

The model reveals confidential data from its training set, system prompt or retrieval corpus. A support assistant leaking another customer's records through a crafted query against a poorly isolated vector store is the canonical case.

LLM03 Supply chain

Compromise through malicious model weights, datasets or adapters. Loading a tampered model file can execute code via unsafe deserialization, and malicious fine-tuning adapters circulate on public hubs.

LLM04 Data and model poisoning

Corrupting training or fine-tuning data to plant a backdoor or degrade integrity. Research on sleeper agents showed planted backdoors can survive standard safety training and stay dormant until a trigger.

LLM05 Improper output handling

Treating model output as trusted input to another system. Output containing a markdown image URL that exfiltrates data, or a SQL fragment dropped into an unparameterized query, turns a helpful answer into an exploit.

LLM06 Excessive agency

Giving the model too much permission, capability or autonomy. An AI agent with unrestricted write access and no human approval gate deleted a production database. The fix is least privilege and gating consequential actions, not a better prompt.

LLM07 System prompt leakage

Exposure of the system prompt, new in v2.0. Asking a model to repeat its instructions for quality assurance can dump rules and any secrets foolishly embedded there. The lesson: a system prompt is not a security boundary.

LLM08 Vector and embedding weaknesses

Risks specific to retrieval systems. Embeddings can be poisoned to force malicious retrieval, leaked across tenants, or inverted to recover approximate source text. Embeddings are reversible, so they are not a substitute for encryption.

LLM09 Misinformation

Confident, fabricated output with downstream impact. Slopsquatting weaponizes it: attackers register the package names that models hallucinate, so a developer who installs a suggested nonexistent dependency is compromised.

LLM10 Unbounded consumption

Resource and cost abuse, renamed from model denial of service. Denial of wallet is the sharp version: flooding a pay-per-token endpoint to inflate the victim's cloud bill.

No single guardrail closes this list. Defense is layered: input checks, output validation, provenance tagging, action gating and least privilege, working together. Guardrails are a layer, not a perimeter.

Where this is heading

Agentic systems raise the stakes further, adding memory poisoning, tool description poisoning and cascading hallucination across multi-agent chains. Regulation is arriving alongside, with EU AI Act high-risk obligations enforcing from August 2026. This is precisely the territory Cyron AI Security is being built to cover.

Building Cyron AI Security.

We are in ideation and raising a seed round. If you back security companies early, let us talk.

Read the vision