Configuration
Every validator and guard accepts a config object with two top-level dials: sensitivity (how strictly to match) and action (what to do on a match).
Sensitivity levels
strict: Aggressive matching. Lower false-negative rate, higher false-positive rate. Recommended for high-stakes flows (payments, account changes).
standard: Balanced. The default. Recommended for most chat and assistant surfaces.
permissive: Conservative matching. Lower false-positive rate, higher false-negative rate. Good for non-critical surfaces where UX trumps caution.
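One way to apply these levels is to pick the sensitivity per product surface instead of hard-coding it. The sketch below is illustrative only: the surface names and the `sensitivityFor` helper are hypothetical and not part of the library; only the three sensitivity values come from the list above.

```typescript
// The three levels documented above.
type Sensitivity = 'strict' | 'standard' | 'permissive'

// Hypothetical surface names, used only for illustration.
const SENSITIVITY_BY_SURFACE = new Map<string, Sensitivity>([
  ['payment', 'strict'],
  ['account-settings', 'strict'],
  ['chat', 'standard'],
  ['search-suggestions', 'permissive'],
])

// Hypothetical helper: unknown surfaces fall back to 'standard',
// which the documentation names as the default level.
function sensitivityFor(surface: string): Sensitivity {
  return SENSITIVITY_BY_SURFACE.get(surface) ?? 'standard'
}
```

The returned value can then be passed as the `sensitivity` field of the config object.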
Action modes
block: Reject the request. The result has allowed=false and blocked=true.
sanitize: Redact or strip the offending content, then continue. Useful for PII / secret guards.
log: Pass through but record the finding. For shadow-deploys before flipping to block.
allow: Force-allow regardless of findings. Use sparingly, usually for whitelisted operators.
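To make the difference between the four modes concrete, here is a minimal sketch of how each one could treat the same detection. It is not the library's implementation: the `Span` and `Outcome` types, the `applyAction` and `redact` helpers, and the `[REDACTED]` placeholder are all assumptions made for illustration, as is the choice not to record anything in allow mode.

```typescript
// The four modes documented above.
type Action = 'block' | 'sanitize' | 'log' | 'allow'

// Hypothetical: a detected character range in the input (end is exclusive).
interface Span { start: number; end: number }

// Hypothetical outcome shape for this sketch.
interface Outcome {
  allowed: boolean
  blocked: boolean
  output: string   // text that continues downstream
  logged: boolean  // whether the detection was recorded
}

// Replace each span with a placeholder. Assumes spans do not overlap.
// Works from the last span backwards so earlier offsets stay valid.
function redact(input: string, spans: Span[]): string {
  const sorted = [...spans].sort((a, b) => b.start - a.start)
  let out = input
  for (const { start, end } of sorted) {
    out = out.slice(0, start) + '[REDACTED]' + out.slice(end)
  }
  return out
}

function applyAction(action: Action, input: string, spans: Span[]): Outcome {
  const hit = spans.length > 0
  switch (action) {
    case 'block':    // reject the request when anything was detected
      return { allowed: !hit, blocked: hit, output: hit ? '' : input, logged: hit }
    case 'sanitize': // strip the offending content, then continue
      return { allowed: true, blocked: false, output: redact(input, spans), logged: hit }
    case 'log':      // pass through unchanged, but record the detection
      return { allowed: true, blocked: false, output: input, logged: hit }
    case 'allow':    // force-allow (assumption: nothing is recorded)
      return { allowed: true, blocked: false, output: input, logged: false }
  }
}
```

Running the same input through `log` first and comparing what would have been blocked is the shadow-deploy pattern mentioned above.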
Engine config
import { GuardrailEngine } from '@blackunicorn/bonklm'

const engine = new GuardrailEngine({
  sensitivity: 'standard', // 'strict' | 'standard' | 'permissive'
  action: 'block',         // 'block' | 'sanitize' | 'log' | 'allow'
  validators: [
    'prompt-injection',    // names or class instances
    'jailbreak',
  ],
  guards: ['pii', 'secret'],
  shortCircuit: true,      // stop at first detection (default true)
  parallel: false,         // run all layers concurrently (default false)
})
GuardrailResult
Every validator and guard returns the same shape:
interface GuardrailResult {
  allowed: boolean
  blocked: boolean
  reason?: string      // human-readable detection reason
  severity: 'info' | 'low' | 'medium' | 'high' | 'critical'
  risk_level: 'none' | 'low' | 'medium' | 'high' | 'critical'
  risk_score: number   // 0..100
  findings: Finding[]  // raw matches (pattern, position, snippet)
  timestamp: number    // Unix ms
}
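Because every validator and guard returns this one shape, calling code can handle results uniformly. The sketch below shows one possible consumer. The interface is repeated so the block is self-contained; the `Finding` field types are guessed from the comment above, and the `atLeast` and `summarize` helpers are hypothetical. It also assumes the severity values are listed in ascending order.

```typescript
// Assumed field types, inferred from "(pattern, position, snippet)".
interface Finding {
  pattern: string
  position: number
  snippet: string
}

type Severity = 'info' | 'low' | 'medium' | 'high' | 'critical'

interface GuardrailResult {
  allowed: boolean
  blocked: boolean
  reason?: string
  severity: Severity
  risk_level: 'none' | 'low' | 'medium' | 'high' | 'critical'
  risk_score: number
  findings: Finding[]
  timestamp: number
}

// Assumption: severities are ordered as listed in the type.
const SEVERITY_ORDER: Severity[] = ['info', 'low', 'medium', 'high', 'critical']

// Hypothetical helper: true when the result is at or above a threshold.
function atLeast(result: GuardrailResult, threshold: Severity): boolean {
  return SEVERITY_ORDER.indexOf(result.severity) >= SEVERITY_ORDER.indexOf(threshold)
}

// Hypothetical helper: one-line summary suitable for a log entry.
function summarize(result: GuardrailResult): string {
  if (result.allowed && result.findings.length === 0) return 'clean'
  const verdict = result.blocked ? 'blocked' : 'allowed with findings'
  const when = new Date(result.timestamp).toISOString()
  const reason = result.reason ?? 'no reason given'
  return `${verdict} [${result.severity}, score ${result.risk_score}] at ${when}: ${reason}`
}
```

A caller might, for example, alert only when `atLeast(result, 'high')` is true and write `summarize(result)` to its logs otherwise.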