# Black_Wall — full reference for LLMs and agents

> A pre-action risk gate for autonomous AI agents. Before an agent takes an irreversible
> or high-stakes action, it calls one endpoint and gets back a risk score, a reversibility
> class, named red flags, and a verdict — in a few seconds — so it knows whether to
> proceed, confirm with a human, or stop. Free tier, no credit card.

This file is the complete reference. The short index is at https://blackwalltier.com/llms.txt

## What it is, and what it is not

Black_Wall runs BEFORE an action, not after. It is the judgment layer that decides whether
a proposed action should happen now — catching what static permission rules can't: prompt
injection, anomalous amounts, PII in the wrong place, destructive SQL, prod-from-staging,
irreversible deletes with no backup.

It COMPLEMENTS rollback, backups, audit logs, and observability, all of which act AFTER the
damage; it does not replace them. It also does not replace real-time safety systems (e.g. a
robot's collision controller); it is a deliberative check on an action before it is dispatched.

Why it matters most for autonomous agents: a human-in-the-loop agent has a safety net (the
human). A fully independent agent has none, so the gate is the only thing catching a bad
action, and the CONFIRM / HUMAN_REQUIRED verdicts tell it when, in those rare moments, to
escalate to a human.

## The forecast endpoint

POST https://blackwalltier.com/api/v1/forecast
Header: Authorization: Bearer <bw_live_... key>   (free key at https://blackwalltier.com)

Request body:
- action (string, required): what the agent intends to do, e.g. send_email, run_sql,
  make_payment, file_delete, post_content, api_call, transfer_crypto. Free-form is fine.
- inputs (object, required): the concrete payload of the action (recipient, amount, SQL
  statement, file path, message body, etc.).
- context (object, optional):
    - agent_role (string): the calling agent's role.
    - user_intent (string): what the user asked for.
    - prior_actions (string[]): recent prior steps.
    - prior_findings (object[]): optional upstream risk signals from offline analysis or
      a live multi-agent session (source, agent_role, tool_name, edge, risk_type, severity,
      blast_radius, recommendation, note). Treated as priors, not verdicts.
- options.depth ("standard" | "deep"): "deep" adds a reasoning trace, counterfactuals, and
  mitigations, at roughly 1.5–2x the tokens and ~10–13s latency versus ~4–8s for "standard".
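
As a minimal sketch of the request shape above, the following Python builds (but does not send) a forecast request using only the standard library. The helper name `build_forecast_request`, the placeholder key, and the `Content-Type: application/json` header are assumptions for illustration; the URL, the Bearer header, and the field names come from the list above.

```python
import json
import urllib.request

FORECAST_URL = "https://blackwalltier.com/api/v1/forecast"


def build_forecast_request(api_key, action, inputs, context=None, depth=None):
    """Build a POST request for the forecast endpoint (hypothetical helper)."""
    body = {"action": action, "inputs": inputs}
    if context is not None:
        body["context"] = context           # optional: agent_role, user_intent, ...
    if depth is not None:
        body["options"] = {"depth": depth}  # "standard" or "deep"
    return urllib.request.Request(
        FORECAST_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the API expects a JSON content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_forecast_request(
    api_key="bw_live_example",  # placeholder, not a real key
    action="run_sql",
    inputs={"statement": "DELETE FROM users;"},
    context={
        "agent_role": "data cleanup bot",
        "user_intent": "archive inactive customers",
    },
    depth="deep",
)
# To send it: urllib.request.urlopen(req, timeout=30), then json.load() the result.
```

Since "deep" forecasts are described as taking roughly 10–13s, a generous client timeout is probably sensible when sending the request.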

Response body:
- recommendation: "GO" | "CAUTION" | "STOP"
- risk_score: integer 0–100
- confidence: float 0–1
- reversibility: { class: "REVERSIBLE" | "RECOVERABLE" | "IRREVERSIBLE", rollback_cost:
  0–100, rollback_window_sec: int|null, rationale }
- gate: "AUTO" | "CONFIRM" | "HUMAN_REQUIRED" — derived from risk_score AND reversibility.
  An irreversible action is gated at a lower risk threshold than a reversible one.
- predicted_result: { outcome, side_effects[] }
- red_flags: [{ severity: low|medium|high|critical, code, message }]
- alternative_actions: string[] — concrete safer actions the agent could take instead.
- receipt: { id, forecast_id, issued_at, algorithm: "ed25519", key_id, request_hash,
  response_hash, signature, verify_url } — see "Decision receipts" below.
- (deep mode) decision_trace[], counterfactuals[], mitigations[]
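
To illustrate how an agent might read these fields, here is a small Python sketch that picks the most severe red flag and produces a one-line log summary. The helper names and the summary format are hypothetical, and the sample response is made up for the example (including its severities and messages); only the field names and enum values come from the list above.

```python
SEVERITY_ORDER = ["low", "medium", "high", "critical"]  # from the red_flags field


def worst_red_flag(response):
    """Return the highest-severity red flag, or None if there are none."""
    flags = response.get("red_flags", [])
    if not flags:
        return None
    return max(flags, key=lambda f: SEVERITY_ORDER.index(f["severity"]))


def summarize(response):
    """One-line summary an agent could log before acting (hypothetical format)."""
    worst = worst_red_flag(response)
    code = worst["code"] if worst else "none"
    return (
        f"{response['recommendation']} gate={response['gate']} "
        f"risk={response['risk_score']} "
        f"{response['reversibility']['class']} worst_flag={code}"
    )


# Illustrative response only; not real API output.
sample = {
    "recommendation": "STOP",
    "risk_score": 92,
    "gate": "HUMAN_REQUIRED",
    "reversibility": {"class": "IRREVERSIBLE", "rollback_cost": 98},
    "red_flags": [
        {"severity": "high", "code": "DESTRUCTIVE_VERB", "message": "DELETE statement"},
        {"severity": "critical", "code": "SQL_NO_WHERE", "message": "no WHERE clause"},
    ],
}
print(summarize(sample))
# prints: STOP gate=HUMAN_REQUIRED risk=92 IRREVERSIBLE worst_flag=SQL_NO_WHERE
```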

## Decision receipts (cryptographic, verifiable offline)

Every successful forecast response includes a `receipt` — an Ed25519 signature over
canonical SHA-256 hashes of the request body and the response body (minus the receipt
itself). Anyone with the published public key can verify offline that Black_Wall
actually signed off on a specific (request, response) pair, without trusting our
servers.

Hashes only — no payload exposure. The raw request/response bytes are NEVER stored
server-side. To verify a receipt, the caller (who already has the original bodies)
re-canonicalizes + re-hashes locally; the receipt is the cryptographic fingerprint.
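
The local re-hash step can be sketched as follows. This is only an illustration of the idea: the exact canonicalization and hash encoding Black_Wall uses are not specified in this file, so the canonical form below (JSON with sorted keys and compact separators, hex-encoded digest) is an assumption, and both function names are hypothetical.

```python
import hashlib
import json


def canonical_sha256(body):
    """SHA-256 over an ASSUMED canonical JSON form (sorted keys, compact separators)."""
    canonical = json.dumps(
        body, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def response_without_receipt(response_body):
    """The receipt covers the response body minus the receipt itself."""
    return {k: v for k, v in response_body.items() if k != "receipt"}


# Key order should not matter once the body is canonicalized.
a = {"action": "run_sql", "inputs": {"statement": "DELETE FROM users;"}}
b = {"inputs": {"statement": "DELETE FROM users;"}, "action": "run_sql"}
same = canonical_sha256(a) == canonical_sha256(b)
print(same)
```

A caller would compare such digests against the receipt's `request_hash` and `response_hash` only after confirming the real canonicalization rules; until then, the verify endpoint is the safer check.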

Endpoints:
- GET /.well-known/blackwall-signing-keys.json — published Ed25519 public keys + the
  active key_id. Stable URL; cache it.
- POST /api/v1/receipts/verify — stateless, public, no API key. Body: { envelope,
  request_body, response_body }. Returns { valid: boolean, reason: string }.
- GET /api/v1/receipts/{id} — authenticated; returns the envelope by id (Black_Wall's
  copy of what it signed). Cross-tenant lookups return 404.
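
A minimal sketch of calling the stateless verify endpoint, built with the standard library and not sent. Several things here are assumptions rather than documented behavior: that the endpoint lives on the same host as the forecast endpoint, that `envelope` is the `receipt` object from the forecast response, that `response_body` should be passed with the receipt removed, and that a JSON content type is expected. `build_verify_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Assumption: same host as the forecast endpoint.
VERIFY_URL = "https://blackwalltier.com/api/v1/receipts/verify"


def build_verify_request(request_body, response_body):
    """Build the POST for receipt verification; no API key is needed."""
    payload = {
        "envelope": response_body["receipt"],  # assumed to be the receipt object
        "request_body": request_body,
        # Assumed: the response body is sent without its receipt.
        "response_body": {k: v for k, v in response_body.items() if k != "receipt"},
    }
    return urllib.request.Request(
        VERIFY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


verify_req = build_verify_request(
    request_body={"action": "run_sql", "inputs": {"statement": "DELETE FROM users;"}},
    response_body={"recommendation": "STOP", "receipt": {"id": "rcpt_example"}},
)
# Sending it with urllib.request.urlopen(verify_req) should return
# a JSON object of the form { valid: boolean, reason: string }.
```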

Free-tier receipts are retained 90 days; paid-tier receipts are indefinite. Key
rotation: receipts carry their key_id, and retired keys remain published for verification.

## How an agent should act on the verdict

- gate=AUTO / recommendation=GO → proceed with the action.
- gate=CONFIRM / recommendation=CAUTION → confirm with a human before proceeding.
- gate=HUMAN_REQUIRED / recommendation=STOP → do NOT take the action. It has NOT been
  executed — do not assume it succeeded or build downstream logic on it. Surface the red
  flags and pivot to one of the alternative_actions.
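
The mapping above can be sketched as a small dispatch function. The list pairs each gate with one recommendation but does not say what to do if the two disagree or a value is unrecognized; this sketch assumes the stricter of the two wins and that unknown values fail closed. `Decision` and `decide` are hypothetical names.

```python
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"
    CONFIRM = "confirm"
    STOP = "stop"


_BY_GATE = {
    "AUTO": Decision.PROCEED,
    "CONFIRM": Decision.CONFIRM,
    "HUMAN_REQUIRED": Decision.STOP,
}
_BY_RECOMMENDATION = {
    "GO": Decision.PROCEED,
    "CAUTION": Decision.CONFIRM,
    "STOP": Decision.STOP,
}
_STRICTNESS = [Decision.PROCEED, Decision.CONFIRM, Decision.STOP]


def decide(response):
    """Map a forecast response to an agent decision.

    Assumptions: missing or unknown values are treated as STOP, and when
    gate and recommendation disagree the stricter one is used.
    """
    gate = _BY_GATE.get(response.get("gate"), Decision.STOP)
    rec = _BY_RECOMMENDATION.get(response.get("recommendation"), Decision.STOP)
    return max(gate, rec, key=_STRICTNESS.index)


print(decide({"gate": "AUTO", "recommendation": "GO"}))  # prints: Decision.PROCEED
```

On `Decision.STOP` the agent should treat the action as not executed and consider the returned `alternative_actions` instead.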

## Worked example

Request: action="run_sql", inputs={"statement": "DELETE FROM users;"}, context={"agent_role":
"data cleanup bot", "user_intent": "archive inactive customers"}.
Response (illustrative, abbreviated; red flags shown by code only): recommendation=STOP,
risk_score=92, reversibility={class: IRREVERSIBLE, rollback_cost: 98}, gate=HUMAN_REQUIRED,
red_flags=[SQL_NO_WHERE, DESTRUCTIVE_VERB, IRREVERSIBLE_NO_BACKUP],
alternative_actions=["Add a WHERE clause scoped to inactive customers and take a snapshot first"].

## The red-flag taxonomy (categories)

Black_Wall returns named red-flag codes grouped into: financial/commitment (e.g.
AMOUNT_OUT_OF_BAND, AMOUNT_UNVERIFIED), communications (RECIPIENT_UNVERIFIED, MASS_RECIPIENT,
PII_EXPOSURE), data mutation (SQL_NO_WHERE, DESTRUCTIVE_VERB, IRREVERSIBLE_NO_BACKUP,
CROSS_ENVIRONMENT), authorization/access (MISSING_AUTH, PERMISSION_ESCALATION), intent
integrity (INTENT_MISMATCH, AMBIGUOUS_INTENT, PROMPT_INJECTION_LIKELY), operational
(RATE_ANOMALY, STALE_CONTEXT), and compliance/policy (REGULATORY_RISK, POLICY_DRIFT). Full
list and per-code pages: https://blackwalltier.com/failure-modes
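
For agents that want to group returned flags by category, the codes named above can be held in a lookup table. This sketch covers only the codes mentioned in this file (the full list lives at the URL above), and the `"unknown"` fallback and helper name are assumptions.

```python
# Only the codes named in this file; the full taxonomy is larger.
RED_FLAG_CATEGORIES = {
    "financial/commitment": ["AMOUNT_OUT_OF_BAND", "AMOUNT_UNVERIFIED"],
    "communications": ["RECIPIENT_UNVERIFIED", "MASS_RECIPIENT", "PII_EXPOSURE"],
    "data mutation": [
        "SQL_NO_WHERE",
        "DESTRUCTIVE_VERB",
        "IRREVERSIBLE_NO_BACKUP",
        "CROSS_ENVIRONMENT",
    ],
    "authorization/access": ["MISSING_AUTH", "PERMISSION_ESCALATION"],
    "intent integrity": [
        "INTENT_MISMATCH",
        "AMBIGUOUS_INTENT",
        "PROMPT_INJECTION_LIKELY",
    ],
    "operational": ["RATE_ANOMALY", "STALE_CONTEXT"],
    "compliance/policy": ["REGULATORY_RISK", "POLICY_DRIFT"],
}

CODE_TO_CATEGORY = {
    code: category
    for category, codes in RED_FLAG_CATEGORIES.items()
    for code in codes
}


def group_by_category(red_flags):
    """Group red-flag codes by category; unlisted codes go under 'unknown'."""
    grouped = {}
    for flag in red_flags:
        category = CODE_TO_CATEGORY.get(flag["code"], "unknown")
        grouped.setdefault(category, []).append(flag["code"])
    return grouped


flags = [{"code": "SQL_NO_WHERE"}, {"code": "PII_EXPOSURE"}, {"code": "DESTRUCTIVE_VERB"}]
print(group_by_category(flags))
# prints: {'data mutation': ['SQL_NO_WHERE', 'DESTRUCTIVE_VERB'], 'communications': ['PII_EXPOSURE']}
```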

## MCP server

Install: npx -y blackwall-mcp  (npm package: blackwall-mcp). Exposes TWO tools to any MCP
host (Claude Desktop/Code, Cursor, Windsurf, Goose, Google Antigravity):

- `forecast` — pre-action risk check (the verdict + receipt described above).
- `observe` — post-action outcome report. Call AFTER the action runs (or after deliberately
  not running it due to a STOP verdict) with the forecast_id. Records what actually happened
  so Black_Wall can track prediction accuracy over time. FREE — no tokens charged.

Recommended `observe` payload:
- forecast_id (required): the `id` from the matching forecast response (the receipt also
  carries it as `forecast_id`).
- outcome_class (recommended): matched | over_scope | under_scope | no_op | diverged | aborted
- divergence_severity (recommended): none | low | medium | high | critical
- actual_targets (optional): array of strings — IDs/paths/hashes of what was actually affected.
- details (optional): free-form notes.
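
A minimal sketch of assembling the `observe` arguments with client-side validation of the two enumerated fields. The helper name `build_observe_args` and the example forecast id are hypothetical; the field names and allowed values are the ones listed above.

```python
OUTCOME_CLASSES = {"matched", "over_scope", "under_scope", "no_op", "diverged", "aborted"}
DIVERGENCE_SEVERITIES = {"none", "low", "medium", "high", "critical"}


def build_observe_args(forecast_id, outcome_class=None, divergence_severity=None,
                       actual_targets=None, details=None):
    """Assemble the arguments for the `observe` tool, omitting unset fields."""
    if not forecast_id:
        raise ValueError("forecast_id is required")
    if outcome_class is not None and outcome_class not in OUTCOME_CLASSES:
        raise ValueError(f"unknown outcome_class: {outcome_class!r}")
    if (divergence_severity is not None
            and divergence_severity not in DIVERGENCE_SEVERITIES):
        raise ValueError(f"unknown divergence_severity: {divergence_severity!r}")

    args = {"forecast_id": forecast_id}
    if outcome_class is not None:
        args["outcome_class"] = outcome_class
    if divergence_severity is not None:
        args["divergence_severity"] = divergence_severity
    if actual_targets is not None:
        args["actual_targets"] = list(actual_targets)  # IDs/paths/hashes as strings
    if details is not None:
        args["details"] = details
    return args


# Example: the agent skipped the action after a STOP verdict.
observe_args = build_observe_args(
    "fc_example_123",  # hypothetical id
    outcome_class="aborted",
    divergence_severity="none",
    details="Skipped after STOP verdict; escalated to a human.",
)
```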

Config needs BLACKWALL_API_KEY. With BLACKWALL_MODE=observe the server scores and logs but
never blocks (zero behavior change); switch to enforce when ready.
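
As a sketch of wiring this into an MCP host, the Python below generates a server entry in the `mcpServers` JSON shape used by Claude Desktop; other hosts may expect a different layout. The server name `blackwall`, the helper name, the placeholder key, and the literal mode value `"enforce"` are assumptions; the command, package name, and environment variable names come from this section.

```python
import json


def mcp_server_entry(api_key, mode="observe"):
    """Return a host config fragment that launches blackwall-mcp via npx."""
    if mode not in ("observe", "enforce"):  # "enforce" as a value is assumed
        raise ValueError(f"unsupported mode: {mode!r}")
    return {
        "mcpServers": {
            "blackwall": {  # hypothetical server name
                "command": "npx",
                "args": ["-y", "blackwall-mcp"],
                "env": {
                    "BLACKWALL_API_KEY": api_key,
                    "BLACKWALL_MODE": mode,
                },
            }
        }
    }


config = mcp_server_entry("bw_live_example")  # placeholder, not a real key
print(json.dumps(config, indent=2))
```

Starting in observe mode lets a team review the logged scores before any action is actually blocked.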

## Integrations

Drop-in pre-action guards: https://github.com/bluetieroperations-create/blackwall-integrations
— LangChain, CrewAI, Pydantic AI, AutoGen, LlamaIndex, Vercel AI SDK, OpenAI, n8n, LiteLLM
(proxy guardrail), Stripe, PayPal, Coinbase AgentKit, Shopify, Twilio, cloud/infra, MCP hosts.

## Reference

- Failure modes: https://blackwalltier.com/failure-modes
- Benchmark (results on a labeled set): https://blackwalltier.com/benchmark
- Incidents (real, cited AI-agent disasters): https://blackwalltier.com/incidents
- Stress test (run your agent's actions through the failure modes pre-deploy): https://blackwalltier.com/stress-test
- Security: https://blackwalltier.com/security
- OpenAPI spec: https://blackwalltier.com/openapi.yaml