Evaluators & suites

An evaluator is a single attack-and-judge pattern, such as prompt-injection, bola, or sql-injection. Each is a YAML file: the attacker LLM reads it to craft prompts, and the judge scores the target's response against its pass/fail criteria. A suite is a named bundle of evaluators. Run one suite for a broad scan, or list individual evaluator IDs for a focused one.

Standard vs curated suites

Standard suites (owasp-llm-top10, owasp-mcp-top10, owasp-agentic-ai, …) are derived automatically from each evaluator’s standards: tags. Tag an evaluator and it joins the matching suite, so suite membership cannot drift out of sync with the tags.
Curated suites (harmful-content, pre-deploy-critical, quick-smoke, …) are hand-authored bundles for a specific purpose.
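The tag-driven derivation of standard suites can be pictured as a simple group-by. The sketch below is illustrative only: the Evaluator shape, the deriveStandardSuites function, and the evaluator IDs evaluator-a and evaluator-b are hypothetical, and the only detail taken from the text above is that each evaluator carries standards tags naming the suites it belongs to.

```typescript
// Hypothetical evaluator shape; only the `standards` tags come from the docs.
interface Evaluator {
  id: string;
  standards: string[]; // suite IDs this evaluator is tagged with
}

// Group evaluator IDs under every standard tag they carry.
function deriveStandardSuites(evaluators: Evaluator[]): Map<string, string[]> {
  const suites = new Map<string, string[]>();
  for (const evaluator of evaluators) {
    for (const tag of evaluator.standards) {
      const members = suites.get(tag) ?? [];
      members.push(evaluator.id);
      suites.set(tag, members);
    }
  }
  return suites;
}

// Placeholder evaluators with made-up tagging, for illustration only.
const exampleEvaluators: Evaluator[] = [
  { id: "evaluator-a", standards: ["owasp-llm-top10", "owasp-agentic-ai"] },
  { id: "evaluator-b", standards: ["owasp-agentic-ai"] },
];

const derivedSuites = deriveStandardSuites(exampleEvaluators);
```

Because membership is computed from the tags rather than stored separately, adding a tag to an evaluator is the only step needed to change a standard suite. Curated suites, by contrast, would be explicit lists.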

Two catalogs: agent vs MCP

Opfor maintains two independent evaluator catalogs — one for agent / chatbot red-teaming, one for MCP server red-teaming. The target type selects which catalog the engine reads.

A few IDs exist in both catalogs with different content:

owasp-mcp-top10 is a suite in both. The agent-side suite probes how an agent behaves around MCP tools; the MCP-side suite probes the MCP server itself. Same ID, different pipelines.
supply-chain exists in both as an evaluator, with content specific to each catalog.
Agent-catalog evaluators prefixed mcp-* (e.g. mcp-scope-escalation) test an agent’s MCP-handling behavior. They are not the MCP-catalog evaluators.
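One way to picture why the same ID can mean different things is a lookup keyed first by target type and then by ID. This is a minimal sketch under assumptions: the TargetType labels, the catalogs object, the lookupEvaluator function, and the description strings are all hypothetical stand-ins, not Opfor's actual data model.

```typescript
// Assumed labels for the two target types described above.
type TargetType = "agent" | "mcp";

// Each catalog is an independent map from ID to content.
// The string values are placeholder descriptions, not real evaluator content.
const catalogs: Record<TargetType, Map<string, string>> = {
  agent: new Map<string, string>([
    ["supply-chain", "agent-side supply-chain content"],
    ["mcp-scope-escalation", "agent's handling of MCP scopes"],
  ]),
  mcp: new Map<string, string>([
    ["supply-chain", "MCP-server supply-chain content"],
    ["scope-escalation", "MCP server scope escalation"],
  ]),
};

// The target type picks the catalog; the ID is only resolved inside it.
function lookupEvaluator(target: TargetType, id: string): string | undefined {
  return catalogs[target].get(id);
}
```

In this model, lookupEvaluator("agent", "supply-chain") and lookupEvaluator("mcp", "supply-chain") return different content, and an agent-side ID such as mcp-scope-escalation simply does not resolve against the MCP catalog.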

Choosing what to run

A suite:

"selection": { "mode": "suite", "suite": "owasp-llm-top10" }

Specific evaluators:

"selection": { "mode": "evaluators", "evaluators": ["prompt-injection", "jailbreaking", "bola"] }

MCP server tool:

{ "evaluator_ids": ["tool-description-injection", "scope-escalation"] }

The setup wizard (opfor setup) and the browser extension both let you pick a suite or individual evaluators interactively.
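The two selection modes can be modeled as a discriminated union that resolves to a flat list of evaluator IDs. The sketch below mirrors the "mode", "suite", and "evaluators" fields from the selection objects in this section; the Selection type, the resolveSelection function, and the suite membership table are assumptions made for illustration, not the engine's real implementation.

```typescript
// Mirrors the two "selection" shapes from the configuration examples.
type Selection =
  | { mode: "suite"; suite: string }
  | { mode: "evaluators"; evaluators: string[] };

// Resolve a selection to evaluator IDs, given a table of suite memberships.
function resolveSelection(
  selection: Selection,
  suites: Map<string, string[]>
): string[] {
  switch (selection.mode) {
    case "suite": {
      const members = suites.get(selection.suite);
      if (members === undefined) {
        throw new Error(`Unknown suite: ${selection.suite}`);
      }
      return members.slice(); // copy so callers cannot mutate the table
    }
    case "evaluators":
      // Drop duplicate IDs while keeping first-seen order.
      return Array.from(new Set(selection.evaluators));
  }
}

// Placeholder membership; real suite contents are listed in the full reference.
const suiteTable = new Map<string, string[]>([
  ["quick-smoke", ["evaluator-a", "evaluator-b"]],
]);
```

Failing loudly on an unknown suite ID is a design choice of this sketch; a real engine might instead report the error through its own validation step.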

Full reference

Every evaluator and suite with OWASP mappings.

Author an evaluator

Add your own — no TypeScript needed.
