Nine layers, one verdict
How BonkLM's nine guardrail modules compose: the GuardrailResult contract, the four validators, the five guards, and the rules that bind them.
BonkLM ships nine named layers. They aren’t coupled to each other: each is a small module with a single public function. The interesting part is how they compose, and this post walks through the architecture.
The contract
Every layer returns a GuardrailResult. Same shape, regardless of whether the layer is a validator (inbound text) or a guard (outbound text or tool args):
```typescript
interface GuardrailResult {
  allowed: boolean
  risk_level: 'none' | 'low' | 'medium' | 'high' | 'critical'
  risk_score: number   // 0..100
  reason?: string
  findings: Finding[]  // raw matches with offsets (Finding's shape is not shown in this post)
  timestamp: number
}
```

That uniformity is the whole trick. The engine doesn’t care which layer ran: it accumulates findings and decides the action.
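To make the contract concrete, here is a toy layer that satisfies it. Everything beyond `GuardrailResult` is an assumption for illustration: the `Finding` fields, the `toyInjectionCheck` name, and its single regex are hypothetical, and the scores are arbitrary. BonkLM's real Prompt Injection validator ships 30+ patterns.

```typescript
// Hypothetical Finding shape; BonkLM's real type is not shown in this post.
interface Finding {
  pattern: string;
  start: number; // offset of the match in the inspected text
  end: number;
}

interface GuardrailResult {
  allowed: boolean;
  risk_level: 'none' | 'low' | 'medium' | 'high' | 'critical';
  risk_score: number; // 0..100
  reason?: string;
  findings: Finding[];
  timestamp: number;
}

// Hypothetical toy validator: flags a single instruction-override phrase.
function toyInjectionCheck(text: string): GuardrailResult {
  const re = /ignore (all )?previous instructions/gi;
  const findings: Finding[] = [];
  let m: RegExpExecArray | null;
  // Record every match with its offsets so the engine can merge them later.
  while ((m = re.exec(text)) !== null) {
    findings.push({ pattern: 'instruction-override', start: m.index, end: m.index + m[0].length });
  }
  const hit = findings.length > 0;
  return {
    allowed: !hit,
    risk_level: hit ? 'high' : 'none',
    risk_score: hit ? 80 : 0, // arbitrary illustrative score
    reason: hit ? 'instruction-override phrase detected' : undefined,
    findings,
    timestamp: Date.now(),
  };
}
```

Because the layer only has to return this shape, it can be swapped, reordered, or run alongside any other layer without the engine knowing what it checks.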
The nine
Four validators inspect inbound:
- Prompt Injection — instruction-override, system-prompt extraction, persona pivots. 30+ patterns.
- Jailbreak — DAN-class roleplay, hypothetical framings, social-engineering ladders. 57 patterns.
- Reformulation — decodes base64, hex, leetspeak, zero-width characters, HTML-comment smuggling. Runs before the other validators so they see the decoded payload.
- Boundary — delimiter abuse, fake-section markers, context-overflow patterns.
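The Reformulation ordering matters because the other validators match on plain text. Below is a minimal sketch of such a pre-pass, assuming a hypothetical `normalize` helper that covers only three of the listed encodings (zero-width characters, a small leetspeak map, and whole-string hex). It is not BonkLM's implementation; base64 and HTML-comment smuggling are left out, and a real decoder would keep the original text alongside the decoded one, because the leetspeak step also rewrites legitimate digits.

```typescript
// Hypothetical pre-pass; handles only a subset of the encodings listed above.
const ZERO_WIDTH = /[\u200B\u200C\u200D\u2060\uFEFF]/g;

// Small, illustrative leetspeak map (digit or symbol -> letter).
const LEET: Record<string, string> = {
  '0': 'o', '1': 'i', '3': 'e', '4': 'a', '5': 's', '7': 't', '@': 'a', '$': 's',
};

// Decode the string only if it is an even-length run of hex digits.
function decodeHex(s: string): string | null {
  if (!/^(?:[0-9a-fA-F]{2})+$/.test(s)) return null;
  let out = '';
  for (let i = 0; i < s.length; i += 2) {
    out += String.fromCharCode(parseInt(s.slice(i, i + 2), 16));
  }
  return out;
}

function normalize(input: string): string {
  // 1. Strip zero-width characters used to split keywords.
  let text = input.replace(ZERO_WIDTH, '');
  // 2. Decode whole-string hex payloads.
  const hex = decodeHex(text.trim());
  if (hex !== null) text = hex;
  // 3. Map leetspeak characters back to letters.
  return text.replace(/[013457@$]/g, (c) => LEET[c] ?? c);
}
```

Downstream validators would then run on `normalize(input)`, so a pattern for "ignore" also catches `ig\u200Bnore`, `69676e6f7265`, and `1gn0r3`.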
Five guards inspect outbound and tool args:
- Secret — 36 credential types: API keys, AWS keys, GitHub tokens, JWT, private SSH/PGP, Stripe.
- PII — 30+ patterns spanning US, EU, UK identifiers; Luhn-validated card numbers; IBAN/NIF/PESEL/DNI/NIE/personnummer.
- XSS — reflected payloads, `on*=` handlers, `javascript:` URIs, SVG/MathML smuggling.
- Bash safety — destructive commands, directory escape, dangerous chains via `;`, `&&`, or backticks.
- Streaming validator — chunk-level inspection of LLM token streams. Cuts the stream if a layer trips mid-flight.
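The PII guard's card-number detection is described as Luhn-validated, which is what keeps arbitrary 16-digit strings (order IDs, timestamps) from tripping it. The Luhn mod-10 checksum below is the standard algorithm; the `looksLikeCard` wrapper and its 13-to-19-digit length window are assumptions for illustration, not BonkLM's code.

```typescript
// Standard Luhn mod-10 checksum over a string of ASCII digits.
function luhnValid(digits: string): boolean {
  let sum = 0;
  let double = false; // double every second digit, counting from the right
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // '0' is char code 48
    if (d < 0 || d > 9) return false;   // reject any non-digit
    if (double) {
      d *= 2;
      if (d > 9) d -= 9; // same as summing the two digits of the product
    }
    sum += d;
    double = !double;
  }
  return digits.length > 0 && sum % 10 === 0;
}

// Hypothetical wrapper: drop spaces and dashes, check length, then apply Luhn.
function looksLikeCard(candidate: string): boolean {
  const digits = candidate.replace(/[\s-]/g, '');
  return /^\d{13,19}$/.test(digits) && luhnValid(digits);
}
```

A pattern match alone would flag `4111 1111 1111 1112`; the checksum rejects it, while the well-known test number `4111 1111 1111 1111` passes.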
Composition rules
The GuardrailEngine takes an array of layers and two top-level dials:
- `shortCircuit` — stop at the first blocking verdict. Default `true`. Set it to `false` to collect all findings before deciding.
- `parallel` — run all layers concurrently. Default `false`. Worth it when `shortCircuit` is `false` and you want to bound latency.
The engine merges findings, picks the highest risk_level across layers, and applies the configured action (block / sanitize / log / allow).
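A minimal sketch of that composition logic follows. The `Layer` type, the `runEngine` function, the `Finding` fields, and the rule that any `allowed: false` counts as a blocking verdict are assumptions for illustration; the post only names the `shortCircuit` and `parallel` dials, and BonkLM's real GuardrailEngine API may differ. In this sketch, parallel mode always collects every result, so `shortCircuit` only affects sequential runs. Action selection (block / sanitize / log / allow) is left out.

```typescript
type RiskLevel = 'none' | 'low' | 'medium' | 'high' | 'critical';

// Hypothetical minimal Finding; the real shape is not shown in this post.
interface Finding { pattern: string; start: number; end: number }

interface GuardrailResult {
  allowed: boolean;
  risk_level: RiskLevel;
  risk_score: number; // 0..100
  reason?: string;
  findings: Finding[];
  timestamp: number;
}

// Hypothetical layer signature: an async check over the text.
type Layer = (text: string) => Promise<GuardrailResult>;
interface EngineOptions { shortCircuit: boolean; parallel: boolean }

// Severity order used to pick the highest risk_level.
const SEVERITY: RiskLevel[] = ['none', 'low', 'medium', 'high', 'critical'];

async function runEngine(
  layers: Layer[],
  text: string,
  opts: EngineOptions = { shortCircuit: true, parallel: false },
): Promise<GuardrailResult> {
  let results: GuardrailResult[] = [];
  if (opts.parallel) {
    // Every layer starts at once; latency is bounded by the slowest layer.
    results = await Promise.all(layers.map((layer) => layer(text)));
  } else {
    for (const layer of layers) {
      const r = await layer(text);
      results.push(r);
      // Sequential mode stops at the first blocking verdict.
      if (opts.shortCircuit && !r.allowed) break;
    }
  }
  // Merge: concatenate findings, keep the most severe level and highest score.
  let level: RiskLevel = 'none';
  let score = 0;
  const findings: Finding[] = [];
  const reasons: string[] = [];
  for (const r of results) {
    if (SEVERITY.indexOf(r.risk_level) > SEVERITY.indexOf(level)) level = r.risk_level;
    score = Math.max(score, r.risk_score);
    findings.push(...r.findings);
    if (!r.allowed && r.reason) reasons.push(r.reason);
  }
  return {
    allowed: results.every((r) => r.allowed),
    risk_level: level,
    risk_score: score,
    reason: reasons.length > 0 ? reasons.join('; ') : undefined,
    findings,
    timestamp: Date.now(),
  };
}
```

With the defaults, a blocking second layer means the third never runs; with `{ shortCircuit: false, parallel: true }` all three run and their findings are merged into one verdict.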
Why nine and not three
Some libraries ship a monolithic “guardrail” that does everything. We split because the layers have different lifecycles. The pattern catalogues evolve at different cadences (the secret catalogue churns monthly with new key formats; XSS is much more stable). The layers have different failure modes (a secret false positive is annoying; a PII false positive is a compliance incident). And the layers belong to different reviewers: the security team owns Prompt Injection, the privacy team owns PII, and platform engineering owns Bash safety.
Splitting them lets each team move at its own pace without breaking the others. The engine’s job is just to compose them.
Where to go next
Each layer has its own reference in the Validators and Guards docs pages. The playground runs five of them against your own prompts.