Configuration

Every validator and guard accepts a config object with two top-level dials: how strictly to match (sensitivity), and what to do on a match (action).

Sensitivity levels

strict

Aggressive matching. Lower false-negative rate, higher false-positive rate. Recommended for high-stakes flows (payment, account changes).

standard

Balanced. The default. Recommended for most chat and assistant surfaces.

permissive

Conservative matching. Lower false-positive rate, higher false-negative rate. Good for non-critical surfaces where UX trumps caution.
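As a rough illustration of the guidance above, the sketch below maps application surfaces to a sensitivity level. The three sensitivity values come from this page; the `Surface` type, its members, and the `sensitivityFor` helper are hypothetical names used only for this example and are not part of the library.

```typescript
// Sensitivity values as documented above.
type Sensitivity = 'strict' | 'standard' | 'permissive'

// Hypothetical surface names, for illustration only.
type Surface = 'payment' | 'account-change' | 'chat' | 'assistant' | 'autocomplete'

// One possible mapping that follows the recommendations in this section.
const SENSITIVITY_BY_SURFACE: Record<Surface, Sensitivity> = {
  'payment': 'strict',          // high-stakes flow
  'account-change': 'strict',   // high-stakes flow
  'chat': 'standard',           // balanced default
  'assistant': 'standard',      // balanced default
  'autocomplete': 'permissive', // non-critical, UX-sensitive (assumed example)
}

function sensitivityFor(surface: Surface): Sensitivity {
  return SENSITIVITY_BY_SURFACE[surface]
}
```

Keeping the mapping in one table makes it easy to review which surfaces trade false positives for false negatives.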

Action modes

block

Reject the request. The result has allowed=false and blocked=true.

sanitize

Redact or strip the offending content, then continue. Useful for PII / secret guards.

log

Pass the request through but record the finding. Useful for shadow deployments before switching to block.

allow

Force-allow regardless of findings. Use sparingly, usually for whitelisted operators.
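The four modes can be pictured as a small decision function. The sketch below is not the library's implementation: `applyAction`, the `Outcome` shape, the `recorded` flag, and the `[REDACTED]` placeholder are all assumptions made for illustration. Only the `block` outcome (allowed=false, blocked=true) is stated on this page; the outcomes for the other modes are inferred from their descriptions.

```typescript
type Action = 'block' | 'sanitize' | 'log' | 'allow'

// Hypothetical outcome shape for this sketch.
interface Outcome {
  allowed: boolean
  blocked: boolean
  content: string   // content passed downstream
  recorded: boolean // whether the finding was recorded (assumed behavior)
}

// matches: literal substrings that a detector flagged in the content.
function applyAction(action: Action, content: string, matches: string[]): Outcome {
  if (matches.length === 0) {
    return { allowed: true, blocked: false, content, recorded: false }
  }
  switch (action) {
    case 'block':
      // Documented: allowed=false, blocked=true.
      return { allowed: false, blocked: true, content, recorded: true }
    case 'sanitize': {
      // Replace every flagged substring, then continue.
      let redacted = content
      for (const match of matches) {
        if (match.length > 0) redacted = redacted.split(match).join('[REDACTED]')
      }
      return { allowed: true, blocked: false, content: redacted, recorded: true }
    }
    case 'log':
      // Pass through unchanged, but record the finding.
      return { allowed: true, blocked: false, content, recorded: true }
    case 'allow':
      // Force-allow regardless of findings.
      return { allowed: true, blocked: false, content, recorded: false }
  }
}
```

For example, `applyAction('sanitize', 'my key is abc123', ['abc123'])` yields content `'my key is [REDACTED]'` with `allowed` set to `true`.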

Engine config

engine.ts

import { GuardrailEngine } from '@blackunicorn/bonklm'

const engine = new GuardrailEngine({
  sensitivity: 'standard',     // 'strict' | 'standard' | 'permissive'
  action: 'block',             // 'block' | 'sanitize' | 'log' | 'allow'
  validators: [
    'prompt-injection',        // names or class instances
    'jailbreak',
  ],
  guards: ['pii', 'secret'],
  shortCircuit: true,          // stop at first detection (default true)
  parallel: false,             // run all layers concurrently (default false)
})
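To make the documented defaults concrete, the sketch below models the options locally and shows a shadow-deploy configuration next to its enforcing counterpart. It does not import or call the library: `EngineOptions` and `resolveOptions` are hypothetical names, which fields are optional is an assumption, and validators are modelled as names only. The default values are the ones stated on this page (`sensitivity: 'standard'`, `shortCircuit: true`, `parallel: false`).

```typescript
type Sensitivity = 'strict' | 'standard' | 'permissive'
type Action = 'block' | 'sanitize' | 'log' | 'allow'

// Local model of the options shown above (hypothetical type, not exported by the library).
interface EngineOptions {
  sensitivity?: Sensitivity
  action: Action
  validators?: string[]
  guards?: string[]
  shortCircuit?: boolean
  parallel?: boolean
}

// Fill in the defaults documented on this page.
function resolveOptions(opts: EngineOptions): Required<EngineOptions> {
  return {
    sensitivity: opts.sensitivity ?? 'standard',
    action: opts.action,
    validators: opts.validators ?? [],
    guards: opts.guards ?? [],
    shortCircuit: opts.shortCircuit ?? true,
    parallel: opts.parallel ?? false,
  }
}

// Shadow deploy: record findings without rejecting anything.
const shadow: EngineOptions = {
  action: 'log',
  validators: ['prompt-injection', 'jailbreak'],
  guards: ['pii', 'secret'],
}

// Same layers, now enforcing.
const enforce: EngineOptions = { ...shadow, action: 'block' }
```

Deriving `enforce` from `shadow` keeps the two configurations identical apart from the action, so the shadow run measures the same layers that will later block.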

GuardrailResult

Every validator and guard returns the same shape:

enum Severity   { INFO = 'info', WARNING = 'warning', BLOCKED = 'blocked', CRITICAL = 'critical' }
enum RiskLevel  { LOW = 'LOW', MEDIUM = 'MEDIUM', HIGH = 'HIGH' }

interface GuardrailResult {
  allowed: boolean
  blocked: boolean
  reason?: string                  // human-readable detection reason
  severity: Severity               // 4 levels: info | warning | blocked | critical
  risk_level: RiskLevel            // 3 levels: LOW | MEDIUM | HIGH (>=10 MEDIUM, >=25 HIGH)
  risk_score: number               // cumulative — sum of finding weights (0–100+)
  findings: Finding[]              // raw matches (pattern, position, snippet)
  timestamp: number                // Unix ms
  subResults?: Array<{ key: string; result: GuardrailResult }>
  metadata?: Record<string, unknown>
}
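The thresholds in the `risk_level` comment can be read as a simple bucketing of `risk_score`. The sketch below illustrates that reading under stated assumptions: `WeightedFinding`, its `weight` field, and the helper names are hypothetical, since this page does not show the shape of `Finding`. Only the thresholds (>=10 MEDIUM, >=25 HIGH) and the "sum of finding weights" rule come from the interface comments.

```typescript
type RiskLevel = 'LOW' | 'MEDIUM' | 'HIGH'

// Hypothetical finding shape; the weight field is an assumption.
interface WeightedFinding {
  pattern: string
  weight: number
}

// risk_score is described as the cumulative sum of finding weights.
function riskScore(findings: WeightedFinding[]): number {
  return findings.reduce((sum, finding) => sum + finding.weight, 0)
}

// Thresholds from the interface comment: >=10 MEDIUM, >=25 HIGH.
function riskLevelFor(score: number): RiskLevel {
  if (score >= 25) return 'HIGH'
  if (score >= 10) return 'MEDIUM'
  return 'LOW'
}

const exampleFindings: WeightedFinding[] = [
  { pattern: 'ignore previous instructions', weight: 12 },
  { pattern: 'reveal system prompt', weight: 15 },
]
const exampleLevel = riskLevelFor(riskScore(exampleFindings)) // 12 + 15 = 27, so 'HIGH'
```

Because the score is cumulative, several low-weight findings can push a result into a higher risk level even when no single finding would.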