Configuration

Every validator and guard accepts a config object with two top-level dials: how strictly to match (sensitivity) and what to do on a match (action).

Sensitivity levels

strict

Aggressive matching. Lower false-negative rate, higher false-positive rate. The usual choice for high-stakes flows (payment, account changes).

standard

Balanced. The default. Recommended for most chat and assistant surfaces.

permissive

Conservative matching. Lower false-positive rate, higher false-negative rate. Good for non-critical surfaces where user experience matters more than caution.
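
As a rough illustration of matching a level to a surface, the sketch below maps surface names to sensitivity levels and falls back to standard for anything unlisted. The surface names and the sensitivityFor helper are hypothetical and not part of the library.

```typescript
// The three documented sensitivity levels.
type Sensitivity = 'strict' | 'standard' | 'permissive'

// Hypothetical surface names, chosen only for illustration.
const SENSITIVITY_BY_SURFACE = new Map<string, Sensitivity>([
  ['payment', 'strict'],            // high-stakes flow
  ['account-settings', 'strict'],   // high-stakes flow
  ['assistant-chat', 'standard'],   // typical chat surface
  ['search-suggest', 'permissive'], // non-critical surface
])

// Hypothetical helper: unlisted surfaces fall back to 'standard',
// which the docs describe as the default level.
function sensitivityFor(surface: string): Sensitivity {
  return SENSITIVITY_BY_SURFACE.get(surface) ?? 'standard'
}
```

Keeping the mapping in one place makes it easier to review which surfaces run at which level.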

Action modes

block

Reject the request. The result has allowed=false and blocked=true.

sanitize

Redact or strip the offending content, then continue. Useful for PII / secret guards.

log

Pass the request through but record the finding. Intended for shadow deployments before switching to block.

allow

Force-allow regardless of findings. Use sparingly, usually for whitelisted operators.
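
The four modes can be pictured with a small standalone sketch. It is a simplified model and not the library's implementation: the applyAction function, the Outcome shape, and the [REDACTED] placeholder are all assumptions. Only the allowed=false, blocked=true pair for block comes from the description above.

```typescript
type Action = 'block' | 'sanitize' | 'log' | 'allow'

// Hypothetical outcome shape for this sketch only.
interface Outcome {
  allowed: boolean
  blocked: boolean
  content: string    // what continues downstream
  recorded: boolean  // whether the finding was recorded
}

// Simplified model: `matches` holds the offending substrings already detected.
function applyAction(action: Action, content: string, matches: string[]): Outcome {
  if (matches.length === 0) {
    return { allowed: true, blocked: false, content, recorded: false }
  }
  switch (action) {
    case 'block':
      // Documented: allowed=false, blocked=true. Empty content is an assumption.
      return { allowed: false, blocked: true, content: '', recorded: false }
    case 'sanitize': {
      // Replace each offending substring, then continue (placeholder text is assumed).
      let redacted = content
      for (const m of matches) {
        if (m !== '') redacted = redacted.split(m).join('[REDACTED]')
      }
      return { allowed: true, blocked: false, content: redacted, recorded: false }
    }
    case 'log':
      // Pass through unchanged, but record the finding.
      return { allowed: true, blocked: false, content, recorded: true }
    case 'allow':
      // Force-allow regardless of findings.
      return { allowed: true, blocked: false, content, recorded: false }
  }
}

const sanitized = applyAction('sanitize', 'card 4111 1111', ['4111 1111'])
console.log(sanitized.content) // prints "card [REDACTED]"
```

In this model only log sets recorded, because it is the only mode described as recording findings. Whether the other modes also record them is not stated here.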

Engine config

engine.ts
import { GuardrailEngine } from '@blackunicorn/bonklm'

const engine = new GuardrailEngine({
  sensitivity: 'standard',     // 'strict' | 'standard' | 'permissive'
  action: 'block',             // 'block' | 'sanitize' | 'log' | 'allow'
  validators: [
    'prompt-injection',        // names or class instances
    'jailbreak',
  ],
  guards: ['pii', 'secret'],
  shortCircuit: true,          // stop at first detection (default true)
  parallel: false,             // run all layers concurrently (default false)
})
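
One way to reason about the defaults is to resolve a partial config into a full one. The sketch below does not import the library. EngineConfig, EngineConfigInput, and resolveConfig are hypothetical names that mirror the fields shown above, and the empty-array defaults for validators and guards are assumptions. It then derives an enforcing config from a log-mode config, following the shadow-deploy pattern.

```typescript
type Sensitivity = 'strict' | 'standard' | 'permissive'
type Action = 'block' | 'sanitize' | 'log' | 'allow'

// Hypothetical local mirror of the config fields (validator names only;
// the real option is described as also accepting class instances).
interface EngineConfig {
  sensitivity: Sensitivity
  action: Action
  validators: string[]
  guards: string[]
  shortCircuit: boolean
  parallel: boolean
}

// `action` stays required because no default for it is documented.
type EngineConfigInput = Pick<EngineConfig, 'action'> & Partial<Omit<EngineConfig, 'action'>>

function resolveConfig(input: EngineConfigInput): EngineConfig {
  return {
    sensitivity: input.sensitivity ?? 'standard', // documented default
    action: input.action,
    validators: input.validators ?? [],           // assumed default
    guards: input.guards ?? [],                   // assumed default
    shortCircuit: input.shortCircuit ?? true,     // documented default
    parallel: input.parallel ?? false,            // documented default
  }
}

// Shadow-deploy first: record findings without rejecting anything.
const shadowConfig = resolveConfig({
  action: 'log',
  validators: ['prompt-injection', 'jailbreak'],
  guards: ['pii', 'secret'],
})

// Later, enforce with the same settings.
const enforcedConfig: EngineConfig = { ...shadowConfig, action: 'block' }
```

Deriving the enforcing config from the shadow config keeps the two in step, so the only change at rollout is the action.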

GuardrailResult

Every validator and guard returns the same shape:

interface GuardrailResult {
  allowed: boolean
  blocked: boolean
  reason?: string                  // human-readable detection reason
  severity: 'info' | 'low' | 'medium' | 'high' | 'critical'
  risk_level: 'none' | 'low' | 'medium' | 'high' | 'critical'
  risk_score: number               // 0..100
  findings: Finding[]              // raw matches (pattern, position, snippet)
  timestamp: number                // Unix ms
}
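
Because every layer returns the same shape, results from several validators and guards can be handled uniformly. The sketch below redeclares the interface locally so it runs on its own. The Finding fields are assumed from the comment above, and pickWorst and describe are hypothetical helpers, not library functions.

```typescript
// Assumed shape, inferred from the "(pattern, position, snippet)" comment.
interface Finding {
  pattern: string
  position: number
  snippet: string
}

interface GuardrailResult {
  allowed: boolean
  blocked: boolean
  reason?: string
  severity: 'info' | 'low' | 'medium' | 'high' | 'critical'
  risk_level: 'none' | 'low' | 'medium' | 'high' | 'critical'
  risk_score: number // 0..100
  findings: Finding[]
  timestamp: number  // Unix ms
}

// Hypothetical helper: return the result with the highest risk_score,
// or undefined for an empty list. Ties keep the earlier result.
function pickWorst(results: GuardrailResult[]): GuardrailResult | undefined {
  return results.reduce<GuardrailResult | undefined>(
    (worst, r) => (worst === undefined || r.risk_score > worst.risk_score ? r : worst),
    undefined,
  )
}

// Hypothetical helper: one-line summary suitable for a log entry.
function describe(r: GuardrailResult): string {
  const verdict = r.blocked ? 'blocked' : 'allowed'
  return `${verdict} (${r.risk_level}, score ${r.risk_score}): ${r.reason ?? 'no reason given'}`
}
```

The optional reason field is why describe supplies fallback text instead of reading r.reason directly.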