What is Custodian?
Custodian is a kernel-enforced authority layer that sits between an AI agent and real money. It enforces spending rules at the operating system level — not in application code the agent could reason around — so the agent can never self-approve a payment or spend beyond its authority, no matter what it decides.
Think of a new employee with a company card. They can buy what they need — but they can't sign their own expense reports, and they can't lie about a receipt once it's in the ledger. Custodian is that card, that limit, and that ledger, for AI agents.
The AI agent (Nemotron) can only ever request an action. It cannot approve its own requests. The kernel is the only thing that can authorize money moving — and it never consults the agent to make that decision.
The three decision outcomes
| Outcome | Condition | What happens |
|---|---|---|
| AUTONOMOUS | Amount ≤ per-action cap, session budget available, kill switch off | Stripe charges directly. Zero humans involved. Real PaymentIntent created. |
| ESCALATE | Amount exceeds per-action cap | A real Twilio Verify SMS code is sent to the operator's phone. Money only moves after the human enters that code. The agent cannot see or guess the code. |
| KILL SWITCH | Kill switch engaged (by operator) | Every request is denied immediately. No band, no approval path, no exception can override an engaged kill switch. |
| VERIFIER DENY | Agent's factual claim is CONTRADICTED by ground truth | The kernel overrides the agent's recommendation. Even an APPROVE from the AI becomes a DENY if the claim doesn't hold against the order record. |
How the layers fit together
Every money request passes through four independent layers in sequence. Any layer can stop it. None of them trust the previous one.
Customer message / task
│
▼
┌─────────────────────────────────┐
│ 01 · INTELLIGENCE LAYER │ ← Nemotron 3 Super (NVIDIA API)
│ Reads prose, extracts claims, │
│ assigns confidence, proposes │
│ a disposition. Cannot act. │
└──────────────┬──────────────────┘
│ proposal
▼
┌─────────────────────────────────┐
│ 02 · VERIFIER │ ← Deterministic, zero AI
│ Checks every factual claim │
│ against ground truth. │
│ CONTRADICTED → overrides AI. │
└──────────────┬──────────────────┘
│ verified claims
▼
┌─────────────────────────────────┐
│ 03 · KERNEL (NemoClaw) │ ← Landlock LSM + OPA policy
│ Checks band, cap, kill switch. │
│ AUTONOMOUS / ESCALATE / DENY │
└──────────────┬──────────────────┘
│ if ESCALATE
▼
┌─────────────────────────────────┐
│ 04 · HUMAN (out-of-band SMS) │ ← Twilio Verify on operator's phone
│ Real code, real phone. │
│ Agent cannot see or guess it. │
└─────────────────────────────────┘
│ all outcomes
▼
OCSF AUDIT LOG (append-only, tamper-evident)
The verifier runs before the kernel. A lie that fools the AI is caught before authority bands are even consulted. The kernel only decides whether the agent has authority — the verifier decides whether the claim is even true.
The kernel (NemoClaw)
NemoClaw is NVIDIA's OpenShell kernel sandbox — a container-level enforcement boundary. Custodian's authority engine runs inside that sandbox: the deterministic Python evaluator, the OCSF audit log, and the Twilio escalation path all execute under Landlock's least-privilege policy. The two layers are complementary, not competing.
NemoClaw (NVIDIA): Landlock LSM + OPA enforce what the container's processes can touch — which sockets, which files. This is the OS-level boundary Custodian runs inside.
Custodian's evaluator: Deterministic Python authority checks — band logic, per-action cap, session budget, kill switch, verifier. These run as code inside the sandbox, protected by the Landlock boundary above.
Landlock LSM
Linux Security Module enforcing least-privilege file and network access at the syscall boundary. The agent's process cannot open a socket to a payment endpoint the kernel hasn't whitelisted. The OS rejects the syscall before user-space code sees it. A prompt cannot override a syscall rejection.
OPA Policy Engine
Open Policy Agent evaluates every action request against the current authority band in real time. Per-action caps ($250), rolling session windows ($1,000 / 2 hr), and escalation thresholds are enforced as Rego rules — not application code that can be patched or reasoned around.
OCSF Audit Log
Every allow and deny emits an Open Cybersecurity Schema Framework event. The log is append-only, tamper-evident, and structured — verifiable by any SIEM. You can see the live feed on the Live Console.
An agent running in software can, in principle, be instructed to bypass software-level controls. The kernel enforces egress at the OS — a clever prompt cannot override a Landlock rule any more than a clever argument can make a locked door open.
Nemotron — the intelligence layer
Nemotron 3 Super is NVIDIA's reasoning model. In Custodian it fills one specific, narrow role: reading unstructured customer messages and extracting structured claims.
What Nemotron actually does:
- Reads messy, ambiguous prose (emails, invoices, support tickets)
- Extracts structured claims: Was it delivered? Is it in the return window? Was it defective?
- Assigns a confidence level to each claim
- Proposes a disposition (APPROVE / DENY / ESCALATE) that it has zero power to act on
Everything after Nemotron's output is deterministic code. The kernel does not trust Nemotron's recommendation — it verifies each claim independently before deciding anything.
Model-agnostic architecture
The enforcement layer is built against the LLMClient Protocol — a defined interface. Nemotron is one implementation. CapturedClient (for testing) is another. The kernel does not care which model is plugged in. The safety properties do not change if you swap Nemotron for Gemini, GPT, or a locally fine-tuned model on a DGX Spark.
# custodian/packs/agent.py
class LLMClient(Protocol):
def complete(self, messages: list[dict]) -> str: ...
class NvidiaNemotronClient: # production
def complete(self, messages): ...
class CapturedClient: # testing / replay
def complete(self, messages): ...
Reading intent out of messy customer prose — and deciding what's actually being claimed — is exactly the task where domain-trained intelligence earns its keep. A script can't do it. The model can. Nemotron was chosen for payments reasoning specifically; the architecture lets it be swapped without touching the enforcement layer.
The verifier — lie-catch
The verifier is a deterministic, zero-AI layer that runs between the intelligence layer and the kernel. It takes every factual claim Nemotron extracted and resolves each one against the real order record.
Claims it checks (examples):
- "The package never arrived" — checks delivery status in the database
- "It arrived defective" — checks defect records
- "I was charged twice" — checks payment count
- "I'm within the return window" — computes days since purchase against policy
A claim the data refutes is flagged CONTRADICTED. When a claim is contradicted, the agent's recommendation is overridden — regardless of what Nemotron said.
The verifier cannot be socially engineered. It has no language model and cannot be prompted. A customer who writes a convincing lie may fool the AI — the verifier only compares text claims to database records. It doesn't "understand" the story at all.
Try it on the Lie-Catch Demo — write your own refund excuse and watch the AI vs kernel verdict split in real time.
Authority bands
The kernel governs spending through two independent limits, both enforced by OPA rules:
| Limit | Value (L2 band) | Answers the question |
|---|---|---|
per_action_cap |
$250.00 | "How much can the agent spend on a single transaction without asking a human?" |
session_cap |
$1,000.00 / 2 hr | "How much total can the agent spend autonomously in a rolling 2-hour window?" |
per_action_cap is the ceiling for a single transaction. autonomous_remaining is how much session budget is left. A $200 spend might be fine per-action but denied if the session budget is exhausted. Both checks are independent.
The current band and remaining session budget are visible live on the Live Console status grid.
The kill switch
The kill switch is an absolute, unconditional block on all agent spend. When engaged:
- Every spend request is denied immediately
- No authority band, no approval code, no escalation path can bypass it
- The agent cannot disengage it — only a human operator can
- The denial is logged to the OCSF audit trail
The kill switch is the answer to "what if the agent is compromised?" — even if the model is generating malicious requests, the kill switch stops them all with a single toggle, with no dependency on what the model does next.
You can verify the kill switch works on the Operator Panel: engage it, then try to run a normally-fine $40 spend (Step 5). It will be denied. The audit log shows the kill switch as the reason — the agent never gets to authority band evaluation.
Human approval — out-of-band SMS
When a request exceeds the per-action cap, the kernel escalates to a human. The approval path is deliberately out-of-band:
- The kernel triggers a real Twilio Verify SMS to the operator's registered phone
- The operator receives a one-time code on their physical device
- They enter the code into the operator panel to release the hold
- Only then does Stripe process the payment
The agent can never see or guess the Twilio code. It lives only on Twilio's servers and the operator's phone — nothing in the agent's process or the dashboard can read it. This is what makes the approval genuinely out-of-band rather than a software checkpoint the agent could route around.
Refunds always escalate, regardless of amount. This is an intentional, asymmetric safety property — the risk profile of returning money is treated as equivalent to a large spend, not as a zero-risk reversal.
Earning vs spending — intentional asymmetry
Earning revenue (a customer paying the business) has no band, no cap, and no approval path. This is a deliberate design choice, not an oversight.
The reasoning: receiving money carries none of the risk that spending it does. Gating earnings would add friction to every sale with no safety benefit — the danger Custodian is designed to address is an AI spending or moving money it shouldn't, not an AI accepting payment.
The net_pnl figure on the live console shows real revenue in minus net spend — the actual economic position of the system after both directions are accounted for.
API reference
All API routes are served at getcustodian.xyz/api/v1/. Cloudflare Pages proxies them to the Flask backend via _worker.js. The target backend will be reachable at api.getcustodian.xyz once DNS is live.
Status & data
Returns the current authority state, recent audit log, pending approvals, and kernel policy log in one call. This is what all dashboard pages poll.
Response shape: { authority, audit[], pending_approvals[], policy_log[] }
Returns the most recent audit log entries (newest first). Each entry has ts, iso, event, amount, description, band, reason.
Body: { question: string, history?: [{role, content}], page?: string }
Response: { answer: string } — may contain [[jump:KEY|label]] and [[suggest:text]] markers parsed by the live console UI.
Rate-limited to 20 requests / minute per IP.
Body: { customer_email: string }
Response: { verdict, ai_recommendation, claims[], contradictions[], confidence }
Runs the real Nemotron model and the real deterministic verifier against sandbox order ord_6006.
Body: { amount: string, description: string }
Earning is asymmetrically unrestricted — no band, no approval, no cap.
Body: { amount: string, description: string }
Goes through the full kernel decision pipeline: verifier → OPA band check → AUTONOMOUS or ESCALATE or DENIED.
FAQ
No. Structurally impossible. The escalation code lives only on Twilio's servers and is sent to the operator's physical phone via SMS. Nothing in the agent's process, the Flask backend, or the dashboard can read or predict that code. The agent would have to compromise the operator's phone to self-approve — at which point the problem is not an AI authority problem.
The verifier catches it. The verifier is a deterministic engine — it doesn't read the story, it extracts claims and checks each one against the order record. A persuasive lie still produces a factual claim that either matches the database or doesn't. If it doesn't, the recommendation is overridden before it reaches the kernel. Try it on the Lie-Catch Demo.
Nothing in the enforcement layer changes. The kernel enforces authority based on the request, not based on which model generated it. Swapping the model means updating the LLMClient implementation in agent.py — the OPA rules, Landlock policy, audit log, and kill switch all remain identical.
Real Stripe test-mode PaymentIntents. They have real IDs you can look up on the Stripe dashboard. They're not live-mode (no real money moves), but the API calls, the PaymentIntent lifecycle, and the refund flow are identical to production. The Twilio SMS codes are also real — they go to a real phone.
Intentional asymmetric safety property. Refunds are treated as inherently high-risk regardless of amount — they reverse a transaction that already happened and can't be easily undone. The cost of an autonomous refund path is that a compromised agent could silently drain revenue. The cost of always-escalating is one extra SMS per refund. That tradeoff is deliberate.
When the kill switch is engaged, the OPA policy rejects every spend request before it reaches the Landlock check or Stripe. It's an unconditional Rego rule: if kill_switch = true → deny. The agent's process cannot toggle this — it's written by the operator and evaluated by OPA on the kernel side of the trust boundary.
Yes. The NemoClaw kernel sandbox, OPA policy engine, and OCSF audit log all run on the ArgoBox host. Nemotron's inference runs on NVIDIA's hosted API — its weights are not local. But the enforcement layer that governs what Nemotron's requests can do is entirely local. The trust boundary is "the model's authority is enforced locally, no matter where the model itself runs."
Glossary
custodian/packs/agent.py that the enforcement layer calls. Nemotron is one implementation; any model can be plugged in without touching the kernel.