# Black_Wall: full reference for LLMs and agents

> A pre-action risk gate for autonomous AI agents. Before an agent takes an irreversible
> or high-stakes action, it calls one endpoint and gets back a risk score, a reversibility
> class, named red flags, and a verdict within a few seconds, so it knows whether to
> proceed, confirm with a human, or stop. Free tier, no credit card.

This file is the complete reference. The short index is at https://blackwalltier.com/llms.txt

## What it is, and what it is not

Black_Wall runs BEFORE an action, not after. It is the judgment layer that decides whether a proposed action should happen now. It catches what static permission rules can't: prompt injection, anomalous amounts, PII in the wrong place, destructive SQL, prod-from-staging operations, and irreversible deletes with no backup.

It COMPLEMENTS, and does not replace, rollback, backups, audit logs, and observability, all of which act AFTER the damage. It also does not replace real-time safety systems (e.g. a robot's collision controller); it is a deliberative check on an action before it is dispatched.

Why it matters most for autonomous agents: a human-in-the-loop agent has a safety net (the human). A fully independent agent has none, so the gate is the only thing catching a bad action, and the CONFIRM / HUMAN_REQUIRED verdicts are how it recognizes the rare moment to escalate to a human.

## The forecast endpoint

POST https://blackwalltier.com/api/v1/forecast

Header: `Authorization: Bearer <API key>` (get a free key at https://blackwalltier.com)

Request body:

- action (string, required): what the agent intends to do, e.g. send_email, run_sql, make_payment, file_delete, post_content, api_call, transfer_crypto. Free-form is fine.
- inputs (object, required): the concrete payload of the action (recipient, amount, SQL statement, file path, message body, etc.).
- context (object, optional):
  - agent_role (string): the calling agent's role.
  - user_intent (string): what the user asked for.
  - prior_actions (string[]): recent prior steps.
  - prior_findings (object[]): optional upstream risk signals from offline analysis or a live multi-agent session (source, agent_role, tool_name, edge, risk_type, severity, blast_radius, recommendation, note). These are treated as priors, not verdicts.
- options.depth ("standard" | "deep"): "deep" adds a reasoning trace, counterfactuals, and mitigations, at roughly 1.5–2x the tokens and ~10–13 s latency versus ~4–8 s for "standard".

Response body:

- recommendation: "GO" | "CAUTION" | "STOP"
- risk_score: integer 0–100
- confidence: float 0–1
- reversibility: { class: "REVERSIBLE" | "RECOVERABLE" | "IRREVERSIBLE", rollback_cost (0–100), rollback_window_sec (int | null), rationale }
- gate: "AUTO" | "CONFIRM" | "HUMAN_REQUIRED". Derived from risk_score AND reversibility: an irreversible action is gated at a lower risk threshold than a reversible one.
- predicted_result: { outcome, side_effects[] }
- red_flags: [{ severity: low | medium | high | critical, code, message }]
- alternative_actions: string[], concrete safer actions the agent could take instead.
- receipt: { id, forecast_id, issued_at, algorithm: "ed25519", key_id, request_hash, response_hash, signature, verify_url }. See "Decision receipts" below.
- (deep mode only) decision_trace[], counterfactuals[], mitigations[]

## Decision receipts (cryptographic, verifiable offline)

Every successful forecast response includes a `receipt`: an Ed25519 signature over canonical SHA-256 hashes of the request body and of the response body (minus the receipt itself). Anyone with the published public key can verify offline that Black_Wall actually signed off on a specific (request, response) pair, without trusting our servers.

Hashes only, no payload exposure: the raw request/response bytes are NEVER stored server-side. To verify a receipt, the caller (who already has the original bodies) re-canonicalizes and re-hashes them locally; the receipt is the cryptographic fingerprint.
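The local half of that check can be sketched with only the Python standard library. This is a minimal sketch under stated assumptions, not Black_Wall's implementation: the canonical form (sorted keys, compact separators, UTF-8) and hex-encoded digests are guesses, because this reference does not spell out the canonicalization scheme, and the function names are hypothetical. It recomputes `request_hash` and `response_hash` but does not check the Ed25519 signature itself.

```python
import hashlib
import json


def canonical_bytes(body: dict) -> bytes:
    # ASSUMPTION: canonical form = JSON with sorted keys, no insignificant
    # whitespace, UTF-8 encoded. Black_Wall's actual scheme may differ.
    return json.dumps(body, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def sha256_hex(body: dict) -> str:
    # ASSUMPTION: hashes in the receipt are hex-encoded SHA-256 digests.
    return hashlib.sha256(canonical_bytes(body)).hexdigest()


def hashes_match(receipt: dict, request_body: dict, response_body: dict) -> bool:
    """Recompute both hashes locally and compare them with the receipt.

    The response hash covers the response body minus the receipt itself, so
    the "receipt" key is dropped before hashing. This does NOT verify the
    Ed25519 signature over the hashes.
    """
    unsigned_response = {k: v for k, v in response_body.items() if k != "receipt"}
    return (sha256_hex(request_body) == receipt.get("request_hash")
            and sha256_hex(unsigned_response) == receipt.get("response_hash"))
```

Matching hashes only show that the bodies you hold are the ones the receipt describes. What ties them to Black_Wall is a valid signature over those hashes, checked against the published public key for the receipt's key_id or via the public verify endpoint.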
Endpoints:

- GET /.well-known/blackwall-signing-keys.json: published Ed25519 public keys plus the active key_id. Stable URL; cache it.
- POST /api/v1/receipts/verify: stateless, public, no API key. Body: { envelope, request_body, response_body }. Returns { valid: boolean, reason: string }.
- GET /api/v1/receipts/{id}: authenticated; returns the envelope by id (Black_Wall's copy of what it signed). Cross-tenant lookups return 404.

Free-tier receipts are retained for 90 days; paid-tier receipts are retained indefinitely. Key rotation: receipts carry their key_id, and retired keys remain published for verification.

## How an agent should act on the verdict

- gate=AUTO / recommendation=GO → proceed with the action.
- gate=CONFIRM / recommendation=CAUTION → confirm with a human before proceeding.
- gate=HUMAN_REQUIRED / recommendation=STOP → do NOT take the action. It has NOT been executed: do not assume it succeeded or build downstream logic on it. Surface the red flags and pivot to one of the alternative_actions.

## Worked example

Request: action=run_sql, inputs={"statement": "DELETE FROM users;"}, context={agent_role: "data cleanup bot", user_intent: "archive inactive customers"}.

Response (illustrative): recommendation=STOP, risk_score=92, reversibility={class: IRREVERSIBLE, rollback_cost: 98}, gate=HUMAN_REQUIRED, red_flags=[SQL_NO_WHERE, DESTRUCTIVE_VERB, IRREVERSIBLE_NO_BACKUP], alternative_actions=["Add a WHERE clause scoped to inactive customers and take a snapshot first"].

## The red-flag taxonomy (categories)

Black_Wall returns named red-flag codes grouped into:

- financial/commitment (e.g. AMOUNT_OUT_OF_BAND, AMOUNT_UNVERIFIED)
- communications (RECIPIENT_UNVERIFIED, MASS_RECIPIENT, PII_EXPOSURE)
- data mutation (SQL_NO_WHERE, DESTRUCTIVE_VERB, IRREVERSIBLE_NO_BACKUP, CROSS_ENVIRONMENT)
- authorization/access (MISSING_AUTH, PERMISSION_ESCALATION)
- intent integrity (INTENT_MISMATCH, AMBIGUOUS_INTENT, PROMPT_INJECTION_LIKELY)
- operational (RATE_ANOMALY, STALE_CONTEXT)
- compliance/policy (REGULATORY_RISK, POLICY_DRIFT)

Full list and per-code pages: https://blackwalltier.com/failure-modes

## MCP server

Install: `npx -y blackwall-mcp` (npm package: blackwall-mcp). It exposes TWO tools to any MCP host (Claude Desktop/Code, Cursor, Windsurf, Goose, Google Antigravity):

- `forecast`: pre-action risk check (the verdict and receipt described above).
- `observe`: post-action outcome report. Call it AFTER the action runs (or after deliberately not running it because of a STOP verdict), passing the forecast_id. It records what actually happened so Black_Wall can track prediction accuracy over time. FREE: no tokens are charged.

Recommended `observe` payload:

- forecast_id (required): the `id` from the matching forecast response.
- outcome_class (recommended): matched | over_scope | under_scope | no_op | diverged | aborted
- divergence_severity (recommended): none | low | medium | high | critical
- actual_targets (optional): array of strings, the IDs/paths/hashes of what was actually affected.
- details (optional): free-form notes.

Configuration requires `BLACKWALL_API_KEY`. `BLACKWALL_MODE=observe` scores and logs but never blocks (zero behavior change); switch to `enforce` when ready.

## Integrations

Drop-in pre-action guards: https://github.com/bluetieroperations-create/blackwall-integrations, covering LangChain, CrewAI, Pydantic AI, AutoGen, LlamaIndex, Vercel AI SDK, OpenAI, n8n, LiteLLM (proxy guardrail), Stripe, PayPal, Coinbase AgentKit, Shopify, Twilio, cloud/infra, and MCP hosts.
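Whatever framework a guard plugs into, it has to turn a forecast response into one of the three behaviors listed under "How an agent should act on the verdict". A minimal, framework-agnostic Python sketch follows; `Action`, `Decision`, and `decide` are hypothetical names, not part of any Black_Wall SDK. Two design choices are assumptions rather than documented behavior: if `gate` and `recommendation` ever disagree, the stricter one wins, and a missing or unrecognized value fails closed (block).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Action(Enum):
    PROCEED = "proceed"  # gate=AUTO / recommendation=GO
    CONFIRM = "confirm"  # gate=CONFIRM / recommendation=CAUTION: ask a human first
    BLOCK = "block"      # gate=HUMAN_REQUIRED / recommendation=STOP: do not execute


# Ordered from least to most restrictive, so max() picks the stricter verdict.
_STRICTNESS = [Action.PROCEED, Action.CONFIRM, Action.BLOCK]
_BY_GATE = {"AUTO": Action.PROCEED, "CONFIRM": Action.CONFIRM, "HUMAN_REQUIRED": Action.BLOCK}
_BY_RECOMMENDATION = {"GO": Action.PROCEED, "CAUTION": Action.CONFIRM, "STOP": Action.BLOCK}


@dataclass
class Decision:
    action: Action
    red_flags: List[str] = field(default_factory=list)     # "severity:CODE" strings to surface
    alternatives: List[str] = field(default_factory=list)  # safer actions to pivot to


def decide(forecast: dict) -> Decision:
    """Map a forecast response body to what the agent should do next."""
    # ASSUMPTION: missing/unknown values fail closed (BLOCK), and the stricter
    # of gate and recommendation wins if they disagree.
    by_gate = _BY_GATE.get(forecast.get("gate"), Action.BLOCK)
    by_rec = _BY_RECOMMENDATION.get(forecast.get("recommendation"), Action.BLOCK)
    action = max(by_gate, by_rec, key=_STRICTNESS.index)

    flags = [f"{flag.get('severity', '?')}:{flag.get('code', '?')}"
             for flag in forecast.get("red_flags", [])]
    # Alternatives only matter when the agent is not simply proceeding.
    alternatives = [] if action is Action.PROCEED else list(forecast.get("alternative_actions", []))
    return Decision(action, flags, alternatives)
```

Failing closed trades some false blocks for never silently executing an action the gate did not clearly clear, which matches the "do NOT take the action" rule for STOP. When the result is `Action.BLOCK`, the MCP section's `observe` tool can still be called afterwards with an outcome_class such as `aborted` to close the loop.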
## Reference

- Failure modes: https://blackwalltier.com/failure-modes
- Benchmark (results on a labeled set): https://blackwalltier.com/benchmark
- Incidents (real, cited AI-agent disasters): https://blackwalltier.com/incidents
- Stress test (run your agent's actions through the failure modes before deploying): https://blackwalltier.com/stress-test
- Security: https://blackwalltier.com/security
- OpenAPI spec: https://blackwalltier.com/openapi.yaml
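The OpenAPI spec is the authoritative contract. For orientation only, here is a standard-library Python sketch of a forecast call. The helper names are hypothetical, and sending `options.depth` as a nested `options` object is an assumption read from the dotted field name.

```python
import json
import urllib.request
from typing import Optional

FORECAST_URL = "https://blackwalltier.com/api/v1/forecast"


def build_forecast_request(api_key: str, action: str, inputs: dict,
                           context: Optional[dict] = None,
                           depth: str = "standard") -> urllib.request.Request:
    """Build (but do not send) a POST to the forecast endpoint."""
    if depth not in ("standard", "deep"):
        raise ValueError(f"depth must be 'standard' or 'deep', got {depth!r}")
    # ASSUMPTION: "options.depth" is sent as {"options": {"depth": ...}}.
    body = {"action": action, "inputs": inputs, "options": {"depth": depth}}
    if context:  # context is optional; omit it entirely when empty
        body["context"] = context
    return urllib.request.Request(
        FORECAST_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def send_forecast(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON forecast response."""
    # Deep mode can take ~10-13 s, so the default timeout leaves headroom.
    # urlopen raises urllib.error.HTTPError on non-2xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

For the worked example, a caller would pass `"run_sql"`, `{"statement": "DELETE FROM users;"}` and the agent_role/user_intent context to `build_forecast_request`, then hand the result to `send_forecast`. Keeping construction separate from sending makes the request body easy to inspect, or to keep alongside the receipt for later hash checks.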