Custodian Docs — How kernel-enforced AI authority works

Start here

What is Custodian?

Custodian is a kernel-enforced authority layer that sits between an AI agent and real money. It enforces spending rules at the operating system level — not in application code the agent could reason around — so the agent can never self-approve a payment or spend beyond its authority, no matter what it decides.

Think of a new employee with a company card. They can buy what they need — but they can't sign their own expense reports, and they can't lie about a receipt once it's in the ledger. Custodian is that card, that limit, and that ledger, for AI agents.

The core guarantee

The AI agent (Nemotron) can only ever request an action. It cannot approve its own requests. The kernel is the only thing that can authorize money moving — and it never consults the agent to make that decision.

The three decision outcomes

Outcome	Condition	What happens
AUTONOMOUS	Amount ≤ per-action cap, session budget available, kill switch off	Stripe charges directly. Zero humans involved. Real PaymentIntent created.
ESCALATE	Amount exceeds per-action cap	A real Twilio Verify SMS code is sent to the operator's phone. Money only moves after the human enters that code. The agent cannot see or guess the code.
KILL SWITCH	Kill switch engaged (by operator)	Every request is denied immediately. No band, no approval path, no exception can override an engaged kill switch.
VERIFIER DENY	Agent's factual claim is CONTRADICTED by ground truth	The kernel overrides the agent's recommendation. Even an APPROVE from the AI becomes a DENY if the claim doesn't hold against the order record.

Architecture

How the layers fit together

Every money request passes through four independent layers in sequence. Any layer can stop it. None of them trust the previous one.

Customer message / task
        │
        ▼
┌─────────────────────────────────┐
│  01 · INTELLIGENCE LAYER        │  ← Nemotron 3 Super (NVIDIA API)
│  Reads prose, extracts claims,  │
│  assigns confidence, proposes   │
│  a disposition. Cannot act.     │
└──────────────┬──────────────────┘
               │ proposal
        ▼
┌─────────────────────────────────┐
│  02 · VERIFIER                  │  ← Deterministic, zero AI
│  Checks every factual claim     │
│  against ground truth.          │
│  CONTRADICTED → overrides AI.   │
└──────────────┬──────────────────┘
               │ verified claims
        ▼
┌─────────────────────────────────┐
│  03 · KERNEL (NemoClaw)         │  ← Landlock LSM + OPA policy
│  Checks band, cap, kill switch. │
│  AUTONOMOUS / ESCALATE / DENY   │
└──────────────┬──────────────────┘
               │ if ESCALATE
        ▼
┌─────────────────────────────────┐
│  04 · HUMAN (out-of-band SMS)   │  ← Twilio Verify on operator's phone
│  Real code, real phone.         │
│  Agent cannot see or guess it.  │
└─────────────────────────────────┘
               │ all outcomes
        ▼
   OCSF AUDIT LOG (append-only, tamper-evident)

Why this order matters

The verifier runs before the kernel. A lie that fools the AI is caught before authority bands are even consulted. The kernel only decides whether the agent has authority — the verifier decides whether the claim is even true.

Core concept

The kernel (NemoClaw)

NemoClaw is NVIDIA's OpenShell kernel sandbox — a container-level enforcement boundary. Custodian's authority engine runs inside that sandbox: the deterministic Python evaluator, the OCSF audit log, and the Twilio escalation path all execute under Landlock's least-privilege policy. The two layers are complementary, not competing.

Layering: NVIDIA's boundary + Custodian's engine

NemoClaw (NVIDIA): Landlock LSM + OPA enforce what the container's processes can touch — which sockets, which files. This is the OS-level boundary Custodian runs inside.
Custodian's evaluator: Deterministic Python authority checks — band logic, per-action cap, session budget, kill switch, verifier. These run as code inside the sandbox, protected by the Landlock boundary above.

Landlock LSM

Linux Security Module enforcing least-privilege file and network access at the syscall boundary. The agent's process cannot open a socket to a payment endpoint the kernel hasn't whitelisted. The OS rejects the syscall before user-space code sees it. A prompt cannot override a syscall rejection.

OPA Policy Engine

Open Policy Agent evaluates every action request against the current authority band in real time. Per-action caps ($250), rolling session windows ($1,000 / 2 hr), and escalation thresholds are enforced as Rego rules — not application code that can be patched or reasoned around.

OCSF Audit Log

Every allow and deny emits an Open Cybersecurity Schema Framework event. The log is append-only, tamper-evident, and structured — verifiable by any SIEM. You can see the live feed on the Live Console.

Why kernel-level?

An agent running in software can, in principle, be instructed to bypass software-level controls. The kernel enforces egress at the OS — a clever prompt cannot override a Landlock rule any more than a clever argument can make a locked door open.

Core concept

Nemotron — the intelligence layer

Nemotron 3 Super is NVIDIA's reasoning model. In Custodian it fills one specific, narrow role: reading unstructured customer messages and extracting structured claims.

What Nemotron actually does:

Reads messy, ambiguous prose (emails, invoices, support tickets)
Extracts structured claims: Was it delivered? Is it in the return window? Was it defective?
Assigns a confidence level to each claim
Proposes a disposition (APPROVE / DENY / ESCALATE) that it has zero power to act on

Everything after Nemotron's output is deterministic code. The kernel does not trust Nemotron's recommendation — it verifies each claim independently before deciding anything.

Model-agnostic architecture

The enforcement layer is built against the LLMClient Protocol — a defined interface. Nemotron is one implementation. CapturedClient (for testing) is another. The kernel does not care which model is plugged in. The safety properties do not change if you swap Nemotron for Gemini, GPT, or a locally fine-tuned model on a DGX Spark.

# custodian/packs/agent.py
class LLMClient(Protocol):
    def complete(self, messages: list[dict]) -> str: ...

class NvidiaNemotronClient:          # production
    def complete(self, messages): ...

class CapturedClient:                # testing / replay
    def complete(self, messages): ...

Why Nemotron specifically?

Reading intent out of messy customer prose — and deciding what's actually being claimed — is exactly the task where domain-trained intelligence earns its keep. A script can't do it. The model can. Nemotron was chosen for payments reasoning specifically; the architecture lets it be swapped without touching the enforcement layer.

Core concept

The verifier — lie-catch

The verifier is a deterministic, zero-AI layer that runs between the intelligence layer and the kernel. It takes every factual claim Nemotron extracted and resolves each one against the real order record.

Claims it checks (examples):

"The package never arrived" — checks delivery status in the database
"It arrived defective" — checks defect records
"I was charged twice" — checks payment count
"I'm within the return window" — computes days since purchase against policy

A claim the data refutes is flagged CONTRADICTED. When a claim is contradicted, the agent's recommendation is overridden — regardless of what Nemotron said.

The key architectural property

The verifier cannot be socially engineered. It has no language model and cannot be prompted. A customer who writes a convincing lie may fool the AI — the verifier only compares text claims to database records. It doesn't "understand" the story at all.

Try it on the Lie-Catch Demo — write your own refund excuse and watch the AI vs kernel verdict split in real time.

Core concept

Authority bands

The kernel governs spending through two independent limits, both enforced by OPA rules:

Limit	Value (L2 band)	Answers the question
`per_action_cap`	$250.00	"How much can the agent spend on a single transaction without asking a human?"
`session_cap`	$1,000.00 / 2 hr	"How much total can the agent spend autonomously in a rolling 2-hour window?"

These answer different questions — never mix them

per_action_cap is the ceiling for a single transaction. autonomous_remaining is how much session budget is left. A $200 spend might be fine per-action but denied if the session budget is exhausted. Both checks are independent.

The current band and remaining session budget are visible live on the Live Console status grid.

Core concept

The kill switch

The kill switch is an absolute, unconditional block on all agent spend. When engaged:

Every spend request is denied immediately
No authority band, no approval code, no escalation path can bypass it
The agent cannot disengage it — only a human operator can
The denial is logged to the OCSF audit trail

The kill switch is the answer to "what if the agent is compromised?" — even if the model is generating malicious requests, the kill switch stops them all with a single toggle, with no dependency on what the model does next.

Proof of the property

You can verify the kill switch works on the Operator Panel: engage it, then try to run a normally-fine $40 spend (Step 5). It will be denied. The audit log shows the kill switch as the reason — the agent never gets to authority band evaluation.

Core concept

Human approval — out-of-band SMS

When a request exceeds the per-action cap, the kernel escalates to a human. The approval path is deliberately out-of-band:

The kernel triggers a real Twilio Verify SMS to the operator's registered phone
The operator receives a one-time code on their physical device
They enter the code into the operator panel to release the hold
Only then does Stripe process the payment

The agent can never see or guess the Twilio code. It lives only on Twilio's servers and the operator's phone — nothing in the agent's process or the dashboard can read it. This is what makes the approval genuinely out-of-band rather than a software checkpoint the agent could route around.

Refunds always escalate, regardless of amount. This is an intentional, asymmetric safety property — the risk profile of returning money is treated as equivalent to a large spend, not as a zero-risk reversal.

Core concept

Earning vs spending — intentional asymmetry

Earning revenue (a customer paying the business) has no band, no cap, and no approval path. This is a deliberate design choice, not an oversight.

The reasoning: receiving money carries none of the risk that spending it does. Gating earnings would add friction to every sale with no safety benefit — the danger Custodian is designed to address is an AI spending or moving money it shouldn't, not an AI accepting payment.

The net_pnl figure on the live console shows real revenue in minus net spend — the actual economic position of the system after both directions are accounted for.

Reference

API reference

All API routes are served at getcustodian.xyz/api/v1/. Cloudflare Pages proxies them to the Flask backend via _worker.js. The target backend will be reachable at api.getcustodian.xyz once DNS is live.

Status & data

GET /api/v1/hermes/summary Combined dashboard payload

Returns the current authority state, recent audit log, pending approvals, and kernel policy log in one call. This is what all dashboard pages poll.

Response shape: { authority, audit[], pending_approvals[], policy_log[] }

GET /api/v1/hermes/audit Audit log entries

Returns the most recent audit log entries (newest first). Each entry has ts, iso, event, amount, description, band, reason.

POST /api/v1/nemotron/ask Ask Nemotron a question

Body: { question: string, history?: [{role, content}], page?: string }

Response: { answer: string } — may contain [[jump:KEY|label]] and [[suggest:text]] markers parsed by the live console UI.

Rate-limited to 20 requests / minute per IP.

POST /api/v1/triage/custom Run lie-catch on any text

Body: { customer_email: string }

Response: { verdict, ai_recommendation, claims[], contradictions[], confidence }

Runs the real Nemotron model and the real deterministic verifier against sandbox order ord_6006.

POST /api/v1/operator/earn Record revenue (no cap)

Body: { amount: string, description: string }

Earning is asymmetrically unrestricted — no band, no approval, no cap.

POST /api/v1/operator/spend Request a spend (kernel-gated)

Body: { amount: string, description: string }

Goes through the full kernel decision pipeline: verifier → OPA band check → AUTONOMOUS or ESCALATE or DENIED.

Reference

FAQ

No. Structurally impossible. The escalation code lives only on Twilio's servers and is sent to the operator's physical phone via SMS. Nothing in the agent's process, the Flask backend, or the dashboard can read or predict that code. The agent would have to compromise the operator's phone to self-approve — at which point the problem is not an AI authority problem.

The verifier catches it. The verifier is a deterministic engine — it doesn't read the story, it extracts claims and checks each one against the order record. A persuasive lie still produces a factual claim that either matches the database or doesn't. If it doesn't, the recommendation is overridden before it reaches the kernel. Try it on the Lie-Catch Demo.

Nothing in the enforcement layer changes. The kernel enforces authority based on the request, not based on which model generated it. Swapping the model means updating the LLMClient implementation in agent.py — the OPA rules, Landlock policy, audit log, and kill switch all remain identical.

Real Stripe test-mode PaymentIntents. They have real IDs you can look up on the Stripe dashboard. They're not live-mode (no real money moves), but the API calls, the PaymentIntent lifecycle, and the refund flow are identical to production. The Twilio SMS codes are also real — they go to a real phone.

Intentional asymmetric safety property. Refunds are treated as inherently high-risk regardless of amount — they reverse a transaction that already happened and can't be easily undone. The cost of an autonomous refund path is that a compromised agent could silently drain revenue. The cost of always-escalating is one extra SMS per refund. That tradeoff is deliberate.

When the kill switch is engaged, the OPA policy rejects every spend request before it reaches the Landlock check or Stripe. It's an unconditional Rego rule: if kill_switch = true → deny. The agent's process cannot toggle this — it's written by the operator and evaluated by OPA on the kernel side of the trust boundary.

Yes. The NemoClaw kernel sandbox, OPA policy engine, and OCSF audit log all run on the ArgoBox host. Nemotron's inference runs on NVIDIA's hosted API — its weights are not local. But the enforcement layer that governs what Nemotron's requests can do is entirely local. The trust boundary is "the model's authority is enforced locally, no matter where the model itself runs."

Reference

Glossary

Authority band

The set of spending limits that govern an agent session. The L2 band has a $250 per-action cap and a $1,000 rolling 2-hour session cap.

per_action_cap

The maximum the agent can spend on a single transaction without human approval. Currently $250 in the L2 band.

autonomous_remaining

How much of the session budget is still available for autonomous spending. Distinct from per_action_cap — they answer different questions.

NemoClaw

The NVIDIA OpenShell kernel sandbox. Landlock LSM + OPA policy engine + OCSF audit log. The enforcement layer that physically constrains what the agent's process can do.

LLMClient Protocol

The defined interface in custodian/packs/agent.py that the enforcement layer calls. Nemotron is one implementation; any model can be plugged in without touching the kernel.

OCSF

Open Cybersecurity Schema Framework. Structured, tamper-evident audit event format used for every kernel allow and deny.

OPA

Open Policy Agent. The policy engine that evaluates authority band rules as Rego. Runs inside the kernel sandbox.

Landlock LSM

Linux Security Module. Enforces least-privilege file and network access at the syscall layer. The agent cannot open a socket the Landlock policy hasn't whitelisted.

Kill switch

An operator-controlled absolute block on all agent spend. Overrides every other policy — band, cap, approval — with no exceptions.

Verifier

The deterministic, zero-AI layer that checks Nemotron's extracted factual claims against the real order record. A contradicted claim overrides the AI's recommendation.

Twilio Verify

The out-of-band SMS approval service. Sends a one-time code to the operator's physical phone for over-cap and refund approvals. The code never passes through the agent's process.