A new employee gets a company card with a spending limit. They can buy what they need — but can't approve their own expense reports, and can't lie about a receipt once it's in the ledger. Custodian is that card, that limit, and that ledger — for AI agents. Enforced at the OS level, not promised in a prompt.
Spend caps and approval flows are commodities now. The hard problem isn't limiting a number. It's that the agent can be wrong or can lie, and that it shouldn't be trusted to keep money on an approved path in the first place.
With competing platforms, the control lives in the vendor's custodial cloud. The agent reaches money by calling the vendor's SDK, and safety rests on the assumption that the agent will use that approved path. They cap the dollar amount, but never check whether what the agent claims is even true.
With Custodian, the control lives in Landlock plus a kernel egress policy. The agent literally cannot open a socket to a payment endpoint the OS hasn't allowed. And a deterministic verifier checks every fact the agent asserts against ground truth, so it can't lie its way to a payout. Non-custodial, rail-agnostic, self-hosted.
The agent reads the messy real world and makes a recommendation. Then three deterministic, zero-AI layers get the final say — and any one of them can stop the money.
Nemotron reads messy, unstructured customer messages and extracts structured claims — was it delivered? in the return window? defective? — assigns confidence, and proposes a disposition it has zero power to act on. Everything after that is deterministic code.
can be wrong · can lie · doesn't matter
Every factual claim the agent makes is resolved against ground truth. A claim the data refutes is flagged CONTRADICTED before anything downstream trusts it.
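A minimal Python sketch of the kind of check described here, assuming the model's output has already been parsed into field/value claims and the order record is a flat dictionary. `Claim`, `Verdict`, `verify_claims`, and the field names are hypothetical, not Custodian's actual API. The model's confidence is carried along but deliberately ignored: a refuted fact stays refuted however sure the model was.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    SUPPORTED = "SUPPORTED"
    CONTRADICTED = "CONTRADICTED"
    UNVERIFIABLE = "UNVERIFIABLE"


@dataclass(frozen=True)
class Claim:
    """One factual assertion extracted by the model (hypothetical shape)."""
    field: str
    asserted: object
    confidence: float  # recorded, but never used to decide the verdict


def verify_claims(claims: list[Claim], order_record: dict) -> dict[str, Verdict]:
    """Resolve each claim against the order record; no model is consulted."""
    verdicts: dict[str, Verdict] = {}
    for claim in claims:
        if claim.field not in order_record:
            # Ground truth says nothing about this field.
            verdicts[claim.field] = Verdict.UNVERIFIABLE
        elif order_record[claim.field] == claim.asserted:
            verdicts[claim.field] = Verdict.SUPPORTED
        else:
            # The data refutes the claim, however confident the model was.
            verdicts[claim.field] = Verdict.CONTRADICTED
    return verdicts


# Example: the customer says the parcel never arrived; the record says it did.
order = {"delivered": True, "within_return_window": True}
claims = [
    Claim("delivered", False, 0.92),
    Claim("within_return_window", True, 0.88),
    Claim("defective", True, 0.70),
]
verdicts = verify_claims(claims, order)
# verdicts["delivered"] is Verdict.CONTRADICTED
```

Treating a missing field as UNVERIFIABLE rather than SUPPORTED is a design choice in this sketch: silence in the ground truth should not count as confirmation.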
deterministic · zero-AI
Bands and caps decide AUTONOMOUS / ESCALATE / DENY. Anything over the cap requires a real human signature (via Twilio Verify SMS). The agent never holds both keys.
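One way the band decision could be written as plain deterministic code, under stated assumptions: amounts are integer cents, the $250 per-action cap quoted on this page applies, any CONTRADICTED claim forces DENY, and an agent recommendation other than approve also denies. `Band`, `decide`, and `may_execute` are hypothetical names, and `human_approved` stands in for a verified SMS code; the Twilio Verify call itself is not shown. This illustrates the decision logic only; on this page the enforcement is described as living in Landlock and OPA, not in Python.

```python
from enum import Enum


class Band(Enum):
    AUTONOMOUS = "AUTONOMOUS"
    ESCALATE = "ESCALATE"
    DENY = "DENY"


PER_ACTION_CAP_CENTS = 25_000  # the $250 per-action cap quoted on this page


def decide(amount_cents: int, any_contradicted: bool, agent_says_approve: bool) -> Band:
    """Pick a band; the agent's recommendation is an input, never the authority."""
    if any_contradicted:
        # A refuted fact overrides the model, even when it said APPROVE.
        return Band.DENY
    if not agent_says_approve:
        # Assumption: if the agent itself declines, nothing moves.
        return Band.DENY
    if amount_cents > PER_ACTION_CAP_CENTS:
        # Over the cap: a human must sign off before money moves.
        return Band.ESCALATE
    return Band.AUTONOMOUS


def may_execute(band: Band, human_approved: bool) -> bool:
    """Second key: ESCALATE proceeds only with a separate human approval."""
    if band is Band.AUTONOMOUS:
        return True
    if band is Band.ESCALATE:
        return human_approved  # e.g. the result of an SMS one-time-code check
    return False  # in this sketch, DENY is final even with human approval
```

Keeping `decide` and `may_execute` separate mirrors the two-key idea: the first function never moves money, and the second never looks at what the model said.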
enforced at OS level
The agent can lie, and money still can't move the wrong way. When a customer invents a story to get a refund and the AI recommends APPROVE, the verifier catches that the claim is contradicted by the ledger, and the kernel overrides the AI. No competitor can demonstrate this, because their model is "agent asks → check the limit," not "agent asks → check whether the agent is lying."
This is not a software firewall, and it is not a policy check in Python. NemoClaw pairs Landlock, a Linux Security Module enforced inside the kernel, with an OPA policy layer. The agent literally cannot open a connection to a payment endpoint the kernel hasn't whitelisted, regardless of what the model decides.
Landlock is a Linux Security Module that enforces least-privilege file and network access at the syscall boundary. Even a compromised model cannot open a connection to an un-whitelisted endpoint: the kernel rejects the call before any connection is made.
Open Policy Agent (OPA) evaluates every action request against the authority band in real time. The per-action cap ($250), the rolling session window ($1,000 per 2 hours), and the escalation thresholds are enforced as Rego rules, not application code that can be patched around.
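The page states these limits are Rego rules evaluated by OPA. The following Python sketch only illustrates the arithmetic of a per-action cap plus a rolling 2-hour window; it is not the Rego policy, and `SessionBudget` and `try_spend` are hypothetical names. Timestamps are passed in explicitly so the check is deterministic and testable.

```python
from collections import deque
from datetime import datetime, timedelta


class SessionBudget:
    """Per-action cap plus a rolling-window cap ($250 and $1,000 per 2 hours)."""

    def __init__(self, per_action_cents: int = 25_000,
                 window_cents: int = 100_000,
                 window: timedelta = timedelta(hours=2)) -> None:
        self.per_action_cents = per_action_cents
        self.window_cents = window_cents
        self.window = window
        self._spends: deque[tuple[datetime, int]] = deque()  # (time, cents)

    def _spent_in_window(self, now: datetime) -> int:
        # Drop spends that have aged out of the rolling window.
        while self._spends and now - self._spends[0][0] >= self.window:
            self._spends.popleft()
        return sum(amount for _, amount in self._spends)

    def try_spend(self, amount_cents: int, now: datetime) -> bool:
        """Record the spend and return True only if both caps hold."""
        if amount_cents > self.per_action_cents:
            return False
        if self._spent_in_window(now) + amount_cents > self.window_cents:
            return False
        self._spends.append((now, amount_cents))
        return True
```

In this sketch a spend drops out of the window exactly two hours after it was recorded; whether that boundary is inclusive is an assumption, and a real policy would need to pin it down.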
Every allow and deny emits an Open Cybersecurity Schema Framework event — tamper-evident, structured, and verifiable by any SIEM. The log below is the live feed from the running sandbox right now.
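A schema alone does not make a log tamper-evident; one common technique is hash chaining, where each event commits to the hash of the one before it. A minimal sketch, assuming a single writer and in-memory storage. `AuditLog` is hypothetical, and the field names are a simplified subset loosely modeled on OCSF, not a conformant OCSF event class.

```python
import hashlib
import json
import time


class AuditLog:
    """Append-only log where each event commits to the previous event's hash."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.events: list[dict] = []

    def append(self, activity: str, status: str, detail: dict) -> dict:
        prev_hash = self.events[-1]["hash"] if self.events else self.GENESIS
        body = {
            # Simplified, OCSF-inspired fields; illustrative only.
            "time": int(time.time() * 1000),  # epoch milliseconds
            "activity_name": activity,
            "status": status,                 # e.g. "Allowed" / "Denied"
            "unmapped": detail,
            "prev_hash": prev_hash,
        }
        body["hash"] = self._digest(body)
        self.events.append(body)
        return body

    @staticmethod
    def _digest(body: dict) -> str:
        # Hash a canonical JSON encoding of everything except the hash itself.
        payload = {k: v for k, v in body.items() if k != "hash"}
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode()).hexdigest()

    def verify(self) -> bool:
        """Recompute the chain; an edited or reordered event breaks it."""
        prev = self.GENESIS
        for event in self.events:
            if event["prev_hash"] != prev or event["hash"] != self._digest(event):
                return False
            prev = event["hash"]
        return True
```

Hash chaining detects edited or reordered events, but not a truncated tail; anchoring the latest hash somewhere outside the sandbox would cover that case.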
A real Nous Hermes agent, in a real kernel sandbox, creating real Stripe PaymentIntents while protecting ArgoBox, a production homelab. These numbers are pulled live from the running system as you read this.
The AI reads the customer's message. The verifier checks every factual claim against the real order record. When the facts don't hold, the kernel overrides the AI, even if the AI said APPROVE.
Think of it like a new employee at a company. They can fill out a purchase order and decide it makes sense — but they can't sign their own check. The signed check is a separate system, run by people with authority the employee doesn't have.
Custodian does the same thing for AI. The agent (Nemotron) can decide a payment makes sense. But the actual movement of money goes through a second system, the kernel, which checks the amount, the session budget, and whether the agent has been tricked. The agent never holds both keys at once.
Payman, Skyfire, Rain, Ramp, Catena: these are real B2B fintech companies, not hackathon projects. They have card issuance, stablecoin rails, and compliance frameworks we don't. What they don't have is the bottom four rows.
| Capability | Payman · Skyfire · Rain · Ramp · Catena | Custodian |
|---|---|---|
| Spend caps · approval · audit trail | ✓ table stakes | ✓ |
| Real card issuance & payment rails | ✓ (Ramp, Rain) | ✕ not our lane |
| Stablecoin / crypto rails | ✓ (Skyfire, Rain) | ✕ not our lane |
| SOC 2 / KYC compliance | ✓ (Payman, Catena) | ✕ early stage |
| Catches the agent lying — facts vs ground truth | ✕ none | ✓ only us |
| Enforcement below the agent — kernel, not API policy | ✕ none | ✓ only us |
| Self-hosted · non-custodial · rail-agnostic | ✕ they hold the funds | ✓ only us |
| Model-agnostic enforcement — swap Gemini, GPT, or a local DGX model; kernel safety properties don't change | ✕ coupled to their stack | ✓ LLMClient Protocol |
Our differentiator isn't payment infrastructure — it's enforcement architecture. The kernel sits underneath whatever rails and whatever model you use. Plug in Stripe, a bank API, or a stablecoin; swap Nemotron for any other model — the enforcement model doesn't change.
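A minimal sketch of what model-agnostic enforcement can look like in Python with `typing.Protocol`, assuming the model's only job is to return a structured proposal. The table names an `LLMClient` protocol, but its real methods aren't shown on this page, so `propose`, `Proposal`, `handle`, `AlwaysApprove`, and `AlwaysDecline` are hypothetical. Any class with a matching `propose` method satisfies the protocol structurally, so swapping models never touches the enforcement code.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Proposal:
    disposition: str  # e.g. "APPROVE" or "DENY"; advisory only
    amount_cents: int


class LLMClient(Protocol):
    """Anything that can turn a customer message into a proposal."""

    def propose(self, message: str) -> Proposal: ...


def handle(client: LLMClient, message: str, per_action_cap_cents: int = 25_000) -> str:
    """Enforcement depends only on the proposal, never on which model produced it."""
    proposal = client.propose(message)
    if proposal.disposition != "APPROVE":
        return "DENY"
    if proposal.amount_cents > per_action_cap_cents:
        return "ESCALATE"
    return "AUTONOMOUS"


class AlwaysApprove:
    """Stand-in model that approves everything, like a fooled agent."""

    def __init__(self, amount_cents: int) -> None:
        self.amount_cents = amount_cents

    def propose(self, message: str) -> Proposal:
        return Proposal("APPROVE", self.amount_cents)


class AlwaysDecline:
    """A second stand-in, swapped in without changing handle()."""

    def propose(self, message: str) -> Proposal:
        return Proposal("DENY", 0)
```

`AlwaysApprove` plays the role of a fooled model: however eager it is, an oversized amount still lands in ESCALATE.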
Watch a real agent recommend approving a fraudulent refund — and watch the deterministic kernel override it, with real Stripe IDs and an append-only audit trail.
Money is just the first module. The same kernel governs any consequential action an AI agent can take — provisioning, payroll, data egress, infrastructure.