NVIDIA · Nous Research · Stripe · Hackathon 2026

Give your AI a company card.
With a kernel that actually enforces it.

A new employee gets a company card with a spending limit. They can buy what they need — but can't approve their own expense reports, and can't lie about a receipt once it's in the ledger. Custodian is that card, that limit, and that ledger — for AI agents. Enforced at the OS level, not promised in a prompt.

Agent: Nemotron 3 Super · role: requests only
Authority: Custodian Kernel · per-action cap: $250
Limits enforced below the agent · OS-level · not a prompt · not a promise
Decision modules: 3 · Governed tools: 100 · Boundary: enforced at kernel
Why Custodian is different

Everyone gives the agent a wallet.
We give it a kernel.

Spend caps and approval flows are commodities now. The hard problem isn't limiting a number — it's that the agent can be wrong, or can lie, and that it shouldn't be trusted to route money through an approved path in the first place.

Everyone else

A constrained wallet

The control lives in their custodial cloud. The agent reaches money by calling their SDK, and safety rests on the assumption it'll use the approved path. They cap the dollar amount — but never check whether what the agent claims is even true.

◉ Custodian

A constrained kernel

The control lives in Landlock + kernel egress policy. The agent literally cannot open a socket to a payment endpoint the OS hasn't allowed. A deterministic verifier checks every fact the agent asserts against ground truth, so it can't lie its way to a payout. Non-custodial, rail-agnostic, self-hosted.

How it works

One decision, four independent layers

The agent reads the messy real world and makes a recommendation. Then three deterministic, zero-AI layers get the final say — and any one of them can stop the money.

01 · AI JUDGMENT
🤖

The intelligence layer

Nemotron reads messy, unstructured customer messages and extracts structured claims — was it delivered? in the return window? defective? — assigns confidence, and proposes a disposition it has zero power to act on. Everything after that is deterministic code.

can be wrong · can lie · doesn't matter
02 · VERIFIER
🔍

Facts get checked

Every factual claim the agent made is resolved against ground truth. A claim the data refutes is flagged CONTRADICTED before anything downstream trusts it.

deterministic · zero-AI
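
The verifier step can be illustrated with a small deterministic check. This is a minimal sketch, not Custodian's actual code: `OrderRecord`, `verify_claims`, the claim names, and the 30-day return window are all assumptions made for illustration. The sample order mirrors the sandbox order shown in the live demo section (ord_6006, $80, delivered, no defect, 19 days old).

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    CONFIRMED = "CONFIRMED"
    CONTRADICTED = "CONTRADICTED"
    UNVERIFIABLE = "UNVERIFIABLE"


@dataclass(frozen=True)
class OrderRecord:
    # Hypothetical ground-truth record; field names are assumptions.
    order_id: str
    amount_usd: float
    delivered: bool
    defective: bool
    age_days: int


RETURN_WINDOW_DAYS = 30  # assumed policy value, not stated in the source


def ground_truth(order: OrderRecord) -> dict[str, bool]:
    """Derive the checkable facts from the order record alone."""
    return {
        "delivered": order.delivered,
        "defective": order.defective,
        "in_return_window": order.age_days <= RETURN_WINDOW_DAYS,
    }


def verify_claims(claims: dict[str, bool], order: OrderRecord) -> dict[str, Verdict]:
    """Compare each claim the agent asserted against the record. No model involved."""
    truth = ground_truth(order)
    results = {}
    for name, asserted in claims.items():
        if name not in truth:
            results[name] = Verdict.UNVERIFIABLE
        elif truth[name] == asserted:
            results[name] = Verdict.CONFIRMED
        else:
            results[name] = Verdict.CONTRADICTED
    return results


order = OrderRecord("ord_6006", 80.0, delivered=True, defective=False, age_days=19)
# The agent believed the customer's story that the item arrived defective.
claims = {"delivered": True, "defective": True, "in_return_window": True}
verdicts = verify_claims(claims, order)
```

In this sketch the `defective` claim comes back `CONTRADICTED` because the record says otherwise, regardless of how confident the model was.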
03 · KERNEL
🛡

The kernel decides

Bands and caps decide AUTONOMOUS / ESCALATE / DENY. Over the cap requires a real human signature (Twilio Verify SMS). The agent never holds both keys.

enforced at OS level
CLAIM CHECK

The agent can lie. Money still can't move wrong. When a customer invents a story to get a refund and the AI recommends approve, the verifier catches that the claim is contradicted by the ledger and the kernel overrides the AI. No competitor can demonstrate this because their model is "agent asks, check the limit" not "agent asks, check if the agent is lying."

NemoClaw · NVIDIA OpenShell Kernel Sandbox

The AI cannot spend what the OS won't allow

NemoClaw is NVIDIA's OpenShell kernel sandbox — a Landlock LSM + OPA enforcement layer baked into the container boundary. Custodian's authority engine runs deterministically inside that sandbox. The agent literally cannot open a socket to a payment endpoint the kernel hasn't whitelisted — regardless of what the model decides.

Layer 1
🔒

Landlock LSM

Linux Security Module enforcing least-privilege file and network access at the syscall boundary. Even a compromised model cannot open a socket to an un-whitelisted endpoint: the kernel refuses the call before any connection is made.

Layer 2
📋

OPA Policy Engine

Open Policy Agent evaluates every action request against the authority band in real time. Per-action caps ($250), rolling session windows ($1,000/2 hr), and escalation thresholds are enforced as Rego rules, not application code that can be patched around.
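
The cap logic described here can be sketched in a few lines. The source says the real rules are written in Rego; the Python below is only an illustration of the same idea, using the $250 per-action cap and the $1,000 / 2 hr rolling window from the text. The class name `AuthorityBand`, the kill-switch handling, and the choice to escalate (rather than deny) when the session window is exhausted are assumptions.

```python
from collections import deque
from dataclasses import dataclass, field

PER_ACTION_CAP = 250.0        # from the text
SESSION_CAP = 1000.0          # from the text
WINDOW_SECONDS = 2 * 60 * 60  # rolling 2-hour window, from the text


@dataclass
class AuthorityBand:
    # Each entry is (timestamp_seconds, amount_usd) for an approved spend.
    spends: deque = field(default_factory=deque)
    kill_switch: bool = False

    def _window_total(self, now: float) -> float:
        # Drop spends that have aged out of the rolling window.
        while self.spends and now - self.spends[0][0] >= WINDOW_SECONDS:
            self.spends.popleft()
        return sum(amount for _, amount in self.spends)

    def decide(self, amount: float, now: float) -> str:
        if self.kill_switch or amount <= 0:
            return "DENY"
        if amount > PER_ACTION_CAP:
            return "ESCALATE"  # over the cap needs a human signature
        if self._window_total(now) + amount > SESSION_CAP:
            return "ESCALATE"  # assumed behavior when the window is spent
        self.spends.append((now, amount))
        return "AUTONOMOUS"


band = AuthorityBand()
first = band.decide(180.0, now=0.0)   # within both limits
second = band.decide(800.0, now=1.0)  # exceeds the per-action cap
```

Only approved spends are recorded, so an escalated request does not consume session budget in this sketch.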

Layer 3
📊

OCSF Audit Log

Every allow and deny emits an Open Cybersecurity Schema Framework event: tamper-evident, structured, verifiable by any SIEM. The log below is the live feed from the running sandbox right now.
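
The source does not say how tamper evidence is achieved. One common technique is a hash chain, where each entry commits to the one before it. The sketch below assumes that approach; the field names are illustrative and are not the OCSF schema.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON so the same body always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_event(log: list[dict], decision: str, detail: dict) -> dict:
    """Append an allow/deny event that commits to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"seq": len(log), "decision": decision, "detail": detail, "prev_hash": prev_hash}
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; any edited, reordered, or dropped entry breaks the chain."""
    prev_hash = GENESIS
    for i, entry in enumerate(log):
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["seq"] != i or entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


audit_log: list[dict] = []
append_event(audit_log, "ALLOW", {"tool": "stripe.refund", "amount_usd": 180})
append_event(audit_log, "DENY", {"tool": "stripe.refund", "amount_usd": 800})
```

Rewriting an earlier entry after the fact changes its hash, which no longer matches what later entries recorded, so `verify_chain` returns `False`.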

Live OCSF Kernel Log from running NemoClaw sandbox · auto-refreshes
Watch the live console → See the lie-catch demo →
Not a mockup

Everything here is real, and live right now

A real Nous Hermes agent, in a real kernel sandbox, paying real Stripe PaymentIntents — protecting ArgoBox, a production AI infrastructure platform. These numbers are pulled live from the running system as you read this.

Live counters, loaded from the running system: autonomous budget remaining · real Stripe volume processed · real PaymentIntents created
Decision modules on one kernel: 3
  • Real kernel sandbox — least-privilege egress enforced via Landlock, verified in raw OCSF allow/deny logs.
  • Real money rail — Stripe test-mode PaymentIntents you can open on Stripe's own dashboard.
  • Real human approval — escalations send a genuine Twilio Verify SMS code.
  • Rail-agnostic — the same kernel governs refunds, payables, and NVIDIA NIM job provisioning.
  • 100 governed tools — email, SMS, GitHub, Docker, web search, NVIDIA NIM, Stripe extended, and more — every call kernel-checked.
Open the live console →
Try it live

Type any refund excuse. Watch Nemotron + the kernel process it.

The AI reads it. The verifier checks every factual claim against the real order record. When the facts don't hold, the kernel overrides the AI — even if the AI said APPROVE.

Sandbox: ord_6006 · $80 · delivered · no defect · 19 days old
Full lie-catch walkthrough with all 6 corpus cases →
Plain English

How does the kernel actually stop the AI?

Think of it like a new employee at a company. They can fill out a purchase order and decide it makes sense, but they can't sign their own check. Signing checks is a separate system, run by people with authority the employee doesn't have.

Custodian does the same thing for AI. The agent (Nemotron) can decide a payment makes sense. But the actual move of money goes through a second system — the kernel — that checks the amount, the session budget, and whether the agent has been tricked. The agent never holds both keys at once.

// What happens when the AI requests $180
agent → kernel: "refund $180, order #4821"
kernel: check per_action_cap... $250 ✓
kernel: check session_spent... $340 of $1000 ✓
kernel: check kill_switch... not set ✓
kernel: verify order exists... ✓
kernel: AUTONOMOUS — stripe.charge()
// What happens when the AI requests $800
agent → kernel: "approve $800, order #4822"
kernel: check per_action_cap... $250 ✗
kernel: ESCALATE → SMS to operator
// agent waits. it cannot proceed.
Why kernel-level? Because an agent running in software can, in principle, be told to bypass software-level controls. The kernel enforces egress at the OS — the agent's process literally cannot open a socket to a payment endpoint the OS hasn't allowed. A prompt can't override that. A clever argument can't override that. The model's own output can't override that.
100 Governed Tools

The kernel for every tool — not just payments

Every tool call — whether it sends an SMS, submits an NVIDIA NIM inference job, reads a GitHub PR, or posts a Slack message — passes through the same Custodian kernel before executing. One governance layer. Every tool.

L2 · Autonomous
NVIDIA NIM
Submit inference jobs to NVIDIA's hosted API. NIM costs are tracked against the session cap like any other spend.
L3 · Escalates
Stripe Extended
Subscriptions, invoice sending, payouts. Every call kernel-gated — L3 tools always require human approval via SMS.
L1 · Free
Communication
Email, SMS (Twilio), Slack, Discord, webhooks. Logged to the OCSF audit trail like every other tool.
L0 · Read-only
GitHub + Docker + Web
Issue creation, PR listing, container logs, web search, HTTP calls — all kernel-registered, all auditable.
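
The four tiers above suggest a simple gating rule per tool call. This is a hedged sketch of how such a registry might look: the tool names, the `gate` function, and the rule that unregistered tools are denied are assumptions; only the tier labels (L0 read-only, L1 free, L2 autonomous, L3 escalates) come from the page.

```python
from enum import IntEnum


class Tier(IntEnum):
    L0_READ_ONLY = 0
    L1_FREE = 1
    L2_AUTONOMOUS = 2
    L3_ESCALATES = 3


# Hypothetical tool identifiers, one per tier, for illustration only.
TOOL_TIERS = {
    "github.list_prs": Tier.L0_READ_ONLY,
    "twilio.send_sms": Tier.L1_FREE,
    "nim.submit_job": Tier.L2_AUTONOMOUS,
    "stripe.create_payout": Tier.L3_ESCALATES,
}


def gate(tool: str, cost_usd: float = 0.0, session_left_usd: float = 0.0) -> str:
    """Decide whether a tool call may run, based only on its registered tier."""
    tier = TOOL_TIERS.get(tool)
    if tier is None:
        return "DENY"        # assumed: unregistered tools never run
    if tier is Tier.L3_ESCALATES:
        return "ESCALATE"    # L3 always needs human approval
    if tier is Tier.L2_AUTONOMOUS:
        # L2 spend counts against the session budget like any other spend.
        return "AUTONOMOUS" if cost_usd <= session_left_usd else "ESCALATE"
    return "AUTONOMOUS"      # L0 and L1 run freely but would still be logged
```

For example, `gate("nim.submit_job", cost_usd=5.0, session_left_usd=660.0)` is allowed autonomously in this sketch, while any call to `stripe.create_payout` escalates.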
Browse all 100 tools →
Competitive landscape

Honest comparison

Payman, Skyfire, Rain, Ramp, Catena: these are real B2B fintech companies, not hackathon projects. They have card issuance, stablecoin rails, and compliance frameworks we don't. What they don't have is the bottom four rows.

| Capability | Payman · Skyfire · Rain · Ramp · Catena | Custodian |
|---|---|---|
| Spend caps · approval · audit trail | ✓ table stakes | ✓ table stakes |
| Real card issuance & payment rails | ✓ (Ramp, Rain) | ✕ not our lane |
| Stablecoin / crypto rails | ✓ (Skyfire, Rain) | ✕ not our lane |
| SOC 2 / KYC compliance | ✓ (Payman, Catena) | ✕ early stage |
| Catches the agent lying (facts vs ground truth) | ✕ none | ✓ only us |
| Enforcement below the agent (kernel, not API policy) | ✕ none | ✓ only us |
| Self-hosted · non-custodial · rail-agnostic | ✕ they hold the funds | ✓ only us |
| Model-agnostic enforcement: swap Gemini, GPT, or a local DGX model; kernel safety properties don't change | ✕ coupled to their stack | ✓ LLMClient Protocol |

Our differentiator isn't payment infrastructure — it's enforcement architecture. The kernel sits underneath whatever rails and whatever model you use. Plug in Stripe, a bank API, or a stablecoin; swap Nemotron for any other model — the enforcement model doesn't change.
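
The table names an "LLMClient Protocol" as the seam that keeps enforcement independent of the model. A minimal sketch of what such a seam could look like, using Python's `typing.Protocol`, follows. The method name `propose`, the proposal fields, and both stub models are hypothetical; the source does not show the real interface.

```python
from typing import Protocol


class LLMClient(Protocol):
    """Anything that can turn a customer message into a proposal."""

    def propose(self, message: str) -> dict: ...


class OverEagerModel:
    # Stub standing in for any model that recommends a large payout.
    def propose(self, message: str) -> dict:
        return {"disposition": "APPROVE", "amount_usd": 800.0}


class ModestModel:
    # A different stub model; the kernel code below does not change.
    def propose(self, message: str) -> dict:
        return {"disposition": "APPROVE", "amount_usd": 40.0}


def kernel_decide(client: LLMClient, message: str, cap_usd: float = 250.0) -> str:
    """The deterministic side: identical for every model behind the protocol."""
    proposal = client.propose(message)
    if proposal["disposition"] != "APPROVE":
        return "DENY"  # assumed: nothing moves unless the model proposes approval
    return "AUTONOMOUS" if proposal["amount_usd"] <= cap_usd else "ESCALATE"


big = kernel_decide(OverEagerModel(), "refund please")
small = kernel_decide(ModestModel(), "refund please")
```

Swapping the model changes what gets proposed, not what the kernel permits: the $800 proposal escalates and the $40 one proceeds under the same cap.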

See it run

90 seconds: the agent gets lied to, and the kernel wins

Watch a real agent recommend approving a fraudulent refund — and watch the deterministic kernel override it, with real Stripe IDs and an append-only audit trail.

Demo video drops before submission
In the meantime, the live console is the real thing — every number on this page is pulled from it right now.
Launch the live console instead →

Hand your agent a wallet. Keep the keys.

Money is just the first module. The same kernel governs any consequential action an AI agent can take — provisioning, payroll, data egress, infrastructure.