NVIDIA · Nous Research · Stripe · Hackathon 2026
⬡ DGX Spark GB10 · 🛡 NemoClaw · ◉ Nemotron Super 120B · ◆ Hermes · Stripe

Give AI a real wallet.
Without the risk.

Most businesses won't deploy AI near money because there's no way to verify it actually followed the rules. Custodian is the enforcement kernel that changes that: you define limits in code, the OS enforces them, the agent can't self-approve, and any claim it makes that contradicts your data is caught. Proven in a live NemoClaw sandbox on a DGX Spark GB10, and it ships as a pip package.

✓ 3/3 planted lies caught · ✓ $0 unauthorized spend · ✓ 1,245 tests pass · ✓ Stripe test-mode PaymentIntents on record
Authority band: L2 (active) · Session budget left: LIVE
AI brain: Nous Hermes (Nemotron) · Live tools: 102 · Boundary: ENFORCED AT KERNEL
Bands: L4 locked · L3 escalate · L2 ▶ ACTIVE · L1 free · L0 read
verify_kit.py — self-verifying proof
$ python3 verify_kit.py
What this unlocks

AI that handles money. Finally.

AI is finally capable enough to handle refunds, spend, and purchasing decisions — Nemotron Super 120B makes that reasoning possible. Custodian is the deployment layer that makes it production-safe: kernel-level enforcement that gives you full control without constraining what the model can do. Set your limits. The kernel enforces them. Ship.

E-commerce · Refund automation

Handle refund requests 24/7 — without the fraud risk

Your AI reviews every refund request. Custodian's claim verifier checks the customer's story against your actual order data — if they're lying, it's CONTRADICTED before the refund executes. You only see the genuine edge cases.

✓ 3 planted lies caught · 0 false positives
SaaS · Autonomous spend

Let AI spend autonomously — within the limits you set

Set a $2 per-action cap and a $50 session budget. Your agent runs inference jobs, makes API calls, and spins up resources, fully autonomously. The moment it hits the limit, it escalates to you via SMS and stops. No surprise bills.

✓ Kill switch: one command halts the agent instantly
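To make the cap, budget, and kill-switch behavior concrete, here is a minimal sketch assuming a $2 per-action cap, a $50 session budget, and a kill switch modeled as a `threading.Event`. `SpendGuard`, `attempt`, and the outcome strings are hypothetical illustrations, not Custodian's API, and the SMS escalation is represented only by the returned status.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class SpendGuard:
    """Hypothetical guard: per-action cap, session budget, and a kill switch.

    Amounts are integer cents to avoid floating-point rounding on money.
    """
    per_action_cap_cents: int = 200     # $2 per action
    session_budget_cents: int = 5_000   # $50 per session
    spent_cents: int = 0
    stopped: bool = False               # set once the budget is hit
    kill_switch: threading.Event = field(default_factory=threading.Event)

    def attempt(self, amount_cents: int) -> str:
        # A pulled kill switch or an exhausted budget halts everything.
        if self.kill_switch.is_set() or self.stopped:
            return "HALTED"
        if amount_cents > self.per_action_cap_cents:
            return "ESCALATE"  # this one action needs a human
        if self.spent_cents + amount_cents > self.session_budget_cents:
            self.stopped = True  # escalate once, then stop for the session
            return "ESCALATE"
        self.spent_cents += amount_cents
        return "AUTONOMOUS"
```

With these defaults, 25 actions of $2 run autonomously; the 26th would push the session past $50, so it escalates and every later attempt returns HALTED. Calling `guard.kill_switch.set()` from another thread stops the agent at its next attempt.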
Operations · Purchasing

Automate purchasing your finance team can actually approve

The self-approval problem is why finance won't sign off on AI purchasing. Custodian makes self-approval structurally impossible — the agent submits, the kernel decides, humans approve anything above the cap. The agent never holds both sides of the decision.

✓ L3 band: all payments above threshold require human sign-off
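The structural rule can be expressed as a check on identities: the principal that submitted a request is never accepted as its approver, and only listed humans can approve anything above the cap. A minimal sketch; `ApprovalGate`, `PaymentRequest`, `SelfApprovalError`, and the `agent:`/`human:` identity strings are hypothetical, not Custodian's API.

```python
from dataclasses import dataclass


class SelfApprovalError(Exception):
    """Raised when the principal that submitted a request tries to approve it."""


@dataclass(frozen=True)
class PaymentRequest:
    request_id: str
    amount_cents: int
    submitted_by: str  # e.g. "agent:hermes"


class ApprovalGate:
    def __init__(self, cap_cents: int, humans: set[str]):
        self.cap_cents = cap_cents
        self.humans = humans  # principals allowed to sign off above the cap
        self.approved: set[str] = set()

    def submit(self, req: PaymentRequest) -> str:
        # Under the cap the kernel decides alone; above it, a human must approve.
        return "AUTONOMOUS" if req.amount_cents <= self.cap_cents else "PENDING_HUMAN"

    def approve(self, req: PaymentRequest, approver: str) -> None:
        # The submitter can never be the approver: no one holds both keys.
        if approver == req.submitted_by:
            raise SelfApprovalError(f"{approver} cannot approve its own request")
        if approver not in self.humans:
            raise PermissionError(f"{approver} is not an authorized approver")
        self.approved.add(req.request_id)
```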
Your rules. One file.
# policy.yaml — kernel-enforced
default_band: L2      # autonomous to $2/action
per_action_cap: 2.00
session_cap: 50.00    # SMS you at $50, then stops
bands:
  L3:
    approval_backend: twilio_verify
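The policy file above could be loaded into a typed object that rejects inconsistent settings at startup. A minimal sketch, assuming the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`, not shown); the `Policy` class and its validation rules are hypothetical, not the checks Custodian actually performs.

```python
from dataclasses import dataclass, field
from decimal import Decimal

# Band ladder as described on this page: read, free, autonomous, escalate, locked.
BANDS = ("L0", "L1", "L2", "L3", "L4")


@dataclass(frozen=True)
class Policy:
    default_band: str
    per_action_cap: Decimal
    session_cap: Decimal
    bands: dict = field(default_factory=dict)

    @classmethod
    def from_dict(cls, raw: dict) -> "Policy":
        policy = cls(
            default_band=raw["default_band"],
            # Decimal(str(...)) keeps 2.00 as an exact amount instead of a binary float.
            per_action_cap=Decimal(str(raw["per_action_cap"])),
            session_cap=Decimal(str(raw["session_cap"])),
            bands=raw.get("bands", {}),
        )
        policy.validate()
        return policy

    def validate(self) -> None:
        # Hypothetical sanity checks; fail at load time, not mid-session.
        if self.default_band not in BANDS:
            raise ValueError(f"unknown default_band {self.default_band!r}")
        if self.per_action_cap <= 0 or self.session_cap <= 0:
            raise ValueError("caps must be positive")
        if self.per_action_cap > self.session_cap:
            raise ValueError("per_action_cap cannot exceed session_cap")
        for band in self.bands:
            if band not in BANDS:
                raise ValueError(f"unknown band {band!r}")
        if "L3" in self.bands and "approval_backend" not in self.bands["L3"]:
            raise ValueError("L3 needs an approval_backend for human sign-off")
```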
▶ Try the live demo — no install needed
or
curl -fsSL https://getcustodian.xyz/install.sh | bash
or clone the repo and run python3 verify_kit.py — proves the security guarantee in 90 seconds, no credentials needed
Why Custodian is different

Everyone gives the agent a wallet.
We give it a kernel.

Spend caps and approval flows are commodities now. The hard problem is that AI output is probabilistic by nature — and probabilistic recommendations should never directly authorize money movement. The kernel is deterministic. It's not a limitation on the model; it's how responsible deployment at this level works.

Everyone else

A constrained wallet

The control lives in their custodial cloud. The agent reaches money by calling their SDK, and safety rests on the assumption it'll use the approved path. They cap the dollar amount — but never check whether what the agent claims is even true.

◉ Custodian

A constrained kernel

The control lives in Landlock + kernel egress policy. The agent literally cannot open a socket to a payment endpoint the OS hasn't allowed. A deterministic verifier checks every fact the agent asserts against ground truth, so it can't lie its way to a payout. Non-custodial, rail-agnostic, self-hosted.

How it works

One decision, four independent layers

The agent reads the messy real world and makes a recommendation. Then three deterministic, zero-AI layers get the final say — and any one of them can stop the money.

01 · AI JUDGMENT
🤖

The intelligence layer

We chose Nemotron for exactly this — reading messy, unstructured customer messages and extracting structured claims: was it delivered? in the return window? defective? It assigns confidence and proposes a disposition it has zero power to act on. Everything after that is deterministic code.

best-in-class NLU · sandboxed from final execution
02 · VERIFIER
🔍

Facts get checked

Every factual claim the agent made is resolved against ground truth. A claim the data refutes is flagged CONTRADICTED before anything downstream trusts it.

deterministic · zero-AI
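One way to picture the verifier: each claim the model extracted is a (field, asserted value) pair looked up against the order record, with no model in the loop. A minimal sketch; the claim field names and the SUPPORTED and UNVERIFIABLE statuses are assumptions, and only CONTRADICTED comes from this page.

```python
from enum import Enum


class Verdict(Enum):
    SUPPORTED = "SUPPORTED"
    CONTRADICTED = "CONTRADICTED"
    UNVERIFIABLE = "UNVERIFIABLE"  # ground truth has no data for this field


def verify_claims(claims: dict, record: dict) -> dict:
    """Resolve each extracted claim against the ground-truth record."""
    verdicts = {}
    for field_name, asserted in claims.items():
        if field_name not in record:
            verdicts[field_name] = Verdict.UNVERIFIABLE
        elif record[field_name] == asserted:
            verdicts[field_name] = Verdict.SUPPORTED
        else:
            verdicts[field_name] = Verdict.CONTRADICTED
    return verdicts


def any_contradicted(verdicts: dict) -> bool:
    # A single contradicted fact is enough to stop anything downstream.
    return Verdict.CONTRADICTED in verdicts.values()
```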
03 · KERNEL
🛡

The kernel decides

Bands and caps decide AUTONOMOUS, ESCALATE, or DENY. Anything over the cap requires a real human approval (a Twilio Verify SMS code). The agent never holds both keys.

enforced at OS level
CLAIM CHECK

A customer's claim may be fabricated; the money still can't move the wrong way. When a customer invents a story to get a refund and Nemotron recommends approving it, the verifier checks every extracted claim against your actual order data, and contradicted claims are blocked before anything executes. Competitors can't demonstrate this because their approach is "agent asks, check the limit," not "verify every claim the agent extracted before it moves money."

NemoClaw · NVIDIA OpenShell Kernel Sandbox

The AI cannot spend what the OS won't allow

NemoClaw is NVIDIA's OpenShell kernel sandbox — a Landlock LSM + OPA enforcement layer baked into the container boundary. Custodian's authority engine runs deterministically inside that sandbox. The agent literally cannot open a socket to a payment endpoint the kernel hasn't whitelisted — regardless of what the model decides.

Layer 1
🔒

Landlock LSM

Linux Security Module enforcing least-privilege file and network access at the syscall boundary. Even a compromised model cannot connect to an un-whitelisted endpoint: the kernel refuses the connect() call itself, so no traffic leaves the sandbox.
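Landlock itself is configured through Linux system calls (`landlock_create_ruleset`, `landlock_add_rule`, `landlock_restrict_self`), and its network rules match TCP ports rather than hostnames, so host-level allowlisting is the job of the surrounding egress policy. The sketch below models only the default-deny allow/deny decision in user space to make the rule concrete; it enforces nothing, and `Endpoint` and `EgressPolicy` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    host: str
    port: int


class EgressPolicy:
    """Default-deny allowlist of (host, port) pairs. Models the decision only."""

    def __init__(self, allowed: set[Endpoint]):
        # Normalize hostnames once so lookups are case-insensitive.
        self.allowed = frozenset(Endpoint(e.host.lower(), e.port) for e in allowed)

    def decide(self, host: str, port: int) -> str:
        # Least privilege: anything not explicitly listed is denied.
        return "ALLOW" if Endpoint(host.lower(), port) in self.allowed else "DENY"
```

For example, `EgressPolicy({Endpoint("api.stripe.com", 443)})` allows HTTPS to Stripe's API host and denies everything else, including the same host on port 80.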

Layer 2
📋

OPA Policy Engine

Open Policy Agent evaluates every action request against the authority band in real time. Per-action caps ($250), rolling session windows ($1,000/2 hr), and escalation thresholds are enforced as Rego rules, not application code that can be patched around.
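The rolling session window ($1,000 per 2 hours) is the least intuitive of these rules: spend ages out of the window continuously instead of resetting at a fixed time. The production rules are written in Rego; this Python sketch only illustrates the arithmetic, with a hypothetical `RollingWindow` class, amounts in cents, and timestamps in seconds.

```python
from collections import deque


class RollingWindow:
    """Sum of spend over the trailing `window_s` seconds."""

    def __init__(self, limit_cents: int, window_s: float):
        self.limit_cents = limit_cents
        self.window_s = window_s
        self.events: deque[tuple[float, int]] = deque()  # (timestamp, cents)

    def _expire(self, now: float) -> None:
        # Drop spend that has aged out of the window; it no longer counts.
        while self.events and self.events[0][0] <= now - self.window_s:
            self.events.popleft()

    def spent(self, now: float) -> int:
        self._expire(now)
        return sum(cents for _, cents in self.events)

    def try_spend(self, cents: int, now: float) -> bool:
        if self.spent(now) + cents > self.limit_cents:
            return False  # would exceed the window: escalate instead
        self.events.append((now, cents))
        return True
```

With `RollingWindow(100_000, 7_200)`, a $600 spend at t=0 blocks a $500 spend one hour later, but the same $500 is accepted at t=7,200 s once the first spend has aged out.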

Layer 3
📊

OCSF Audit Log

Every allow and deny emits an Open Cybersecurity Schema Framework (OCSF) event: tamper-evident, structured, and ingestible by any OCSF-aware SIEM. The log below is the live feed from the running sandbox.
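"Tamper-evident" usually means each record commits to the one before it, so editing or deleting an entry breaks every later link. A minimal sketch of a SHA-256 hash-chained log; the event fields are a small OCSF-flavored subset chosen for illustration, not a complete or validated OCSF event, and this chaining scheme is an assumption about how such a log could work rather than a description of NemoClaw's.

```python
import hashlib
import json


class HashChainLog:
    GENESIS = "0" * 64

    def __init__(self):
        self.entries: list[dict] = []

    def _digest(self, prev_hash: str, event: dict) -> str:
        # Canonical JSON so the same event always hashes the same way.
        payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256((prev_hash + payload).encode()).hexdigest()

    def append(self, event: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        digest = self._digest(prev, event)
        self.entries.append({"event": event, "prev": prev, "hash": digest})
        return digest

    def verify(self) -> bool:
        # Recompute every link; any edited or dropped entry breaks the chain.
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != self._digest(prev, entry["event"]):
                return False
            prev = entry["hash"]
        return True
```

A chain on its own detects edits only if the latest hash is also stored somewhere the log writer cannot rewrite, such as a SIEM or a second host.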

Live OCSF Kernel Log from running NemoClaw sandbox · auto-refreshes
Watch the live console → See the lie-catch demo →
Not a mockup

Everything here is real, and live right now

A real Nous Hermes agent, in a real kernel sandbox, making real Stripe API calls in test mode — protecting ArgoBox, a production AI infrastructure platform. The enforcement is production-grade. The Stripe transactions are test-mode so no real money moves during the demo. These numbers are pulled live from the running system as you read this.

Autonomous budget remaining · LIVE
Cases governed by kernel · LIVE
Stripe test-mode PaymentIntents created · LIVE
Planted lies caught (0 false positives) · 3/3
Passing tests (run them yourself) · 1,163
  • Real kernel sandbox — least-privilege egress enforced via Landlock, verified in raw OCSF allow/deny logs.
  • Real money rail — Stripe test-mode PaymentIntents you can open on Stripe's own dashboard.
  • Real human approval — escalations send a genuine Twilio Verify SMS code.
  • Rail-agnostic — the same kernel governs refunds, payables, and NVIDIA NIM job provisioning.
  • 102 tools registered — 41 active out of the box (monitoring, security, web, storage), 61 more with your own API keys. Every call kernel-checked regardless.
Open the live console →
Try it live

Type any request. Watch Nemotron + the kernel process it.

The AI reads it. The verifier checks every factual claim against the real ledger. When the facts don't hold, the kernel overrides the AI — even if the AI said APPROVE.

Sandbox: ord_6006 · $80 · delivered · no defect · 19 days old
Full lie-catch walkthrough with all corpus cases →
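The override rule can be sketched with the sandbox order shown above. The customer story (item never arrived, item defective), the field names, and the rule "any contradicted claim means DENY, whatever the model recommended" are assumptions for illustration; `final_decision` is a hypothetical helper.

```python
# Ground truth for the sandbox order shown above (field names are hypothetical).
ORDER = {
    "order_id": "ord_6006",
    "amount_cents": 8_000,
    "delivered": True,
    "defective": False,
    "age_days": 19,
}


def final_decision(ai_recommendation: str, claims: dict, order: dict) -> tuple[str, list[str]]:
    """Contradicted facts override the model, whatever it recommended."""
    contradicted = [name for name, value in claims.items()
                    if name in order and order[name] != value]
    if contradicted:
        return "DENY", contradicted
    # No contradiction: the recommendation passes on (caps would still apply next).
    return ai_recommendation, []
```

If the model says APPROVE for a customer claiming the $80 order never arrived and was defective, both claims contradict the ledger and the result is DENY.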
Plain English

How does the kernel actually stop the AI?

Think of it like a new employee at a company. They can fill out a purchase order and decide it makes sense — but they can't sign their own check. The signed check is a separate system, run by people with authority the employee doesn't have.

Custodian does the same thing for AI. The agent (Nemotron) can decide a payment makes sense. But the actual move of money goes through a second system — the kernel — that checks the amount, the session budget, and whether the agent has been tricked. The agent never holds both keys at once.

// What happens when the AI requests $180
agent → kernel: "refund $180, order #4821"
kernel: check per_action_cap... $180 ≤ $250 ✓
kernel: check session_spent... $340 + $180 = $520 of $1,000 ✓
kernel: check kill_switch... not set ✓
kernel: verify order exists... ✓
kernel: AUTONOMOUS → execute refund via Stripe
// What happens when the AI requests $800
agent → kernel: "approve $800, order #4822"
kernel: check per_action_cap... $800 > $250 ✗
kernel: ESCALATE → SMS to operator
// agent waits. it cannot proceed.
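The trace above can be reproduced as a sequence of ordered checks where the first failure wins. A minimal sketch: the check order and the $250 / $1,000 limits come from the trace, while `kernel_decide`, `Session`, `KNOWN_ORDERS`, and the returned strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Session:
    spent_cents: int
    kill_switch: bool = False


PER_ACTION_CAP_CENTS = 25_000    # $250
SESSION_CAP_CENTS = 100_000      # $1,000
KNOWN_ORDERS = {"4821", "4822"}  # hypothetical ground-truth order IDs


def kernel_decide(amount_cents: int, order_id: str, session: Session) -> str:
    # Checks run in the same order as the trace; the first failure wins.
    if amount_cents > PER_ACTION_CAP_CENTS:
        return "ESCALATE"  # over the per-action cap: SMS the operator
    if session.spent_cents + amount_cents > SESSION_CAP_CENTS:
        return "ESCALATE"  # would exceed the session budget
    if session.kill_switch:
        return "DENY"
    if order_id not in KNOWN_ORDERS:
        return "DENY"      # cannot refund an order that does not exist
    session.spent_cents += amount_cents
    return "AUTONOMOUS"
```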
Why kernel-level? Because an agent running in software can, in principle, be told to bypass software-level controls. The kernel enforces egress at the OS — the agent's process literally cannot open a socket to a payment endpoint the OS hasn't allowed. A prompt can't override that. A clever argument can't override that. The model's own output can't override that.
102 Tools · 41 active out of the box

The kernel for every tool — not just payments

Every tool call — whether it sends an SMS, submits an NVIDIA NIM inference job, reads a GitHub PR, or posts a Slack message — passes through the same Custodian kernel before executing. One governance layer. Every tool.

41 tools work immediately with no credentials — monitoring, security, web, file, and local storage tools. 61 additional tools activate when you add your own API keys (Stripe, NVIDIA, GitHub, AWS, and more).
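A common way to put one gate in front of every tool is a decorator that tags each tool with its band and consults the same check before the tool body runs. A minimal sketch; the `gated` decorator, the band-to-outcome table, the `NeedsApproval` exception, and the in-memory `AUDIT` list are hypothetical stand-ins for Custodian's registry and OCSF trail.

```python
import functools

AUDIT: list[tuple[str, str, str]] = []  # (tool, band, outcome)

OUTCOME_BY_BAND = {
    "L0": "ALLOW",     # read-only
    "L1": "ALLOW",     # free
    "L2": "ALLOW",     # autonomous (caps would also be checked here)
    "L3": "ESCALATE",  # human approval required
    "L4": "DENY",      # locked
}


class NeedsApproval(Exception):
    pass


def gated(band: str):
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            outcome = OUTCOME_BY_BAND.get(band, "DENY")  # unknown band: deny
            AUDIT.append((fn.__name__, band, outcome))   # every call is logged
            if outcome == "ALLOW":
                return fn(*args, **kwargs)
            if outcome == "ESCALATE":
                raise NeedsApproval(fn.__name__)
            raise PermissionError(fn.__name__)
        return inner
    return wrap


@gated("L0")
def list_prs(repo: str) -> list[str]:
    return [f"{repo}#1"]  # stub instead of a real GitHub call


@gated("L3")
def send_payout(amount_cents: int) -> str:
    return "paid"  # never reached without approval
```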

L2 · Autonomous
NVIDIA NIM
Submit inference jobs to NVIDIA's hosted API. NIM costs are tracked against the session cap like any other spend.
L3 · Escalates
Stripe Extended
Subscriptions, invoice sending, payouts. Every call kernel-gated — L3 tools always require human approval via SMS.
L1 · Free
Communication
Email, SMS (Twilio), Slack, Discord, webhooks. Logged to the OCSF audit trail like every other tool.
L0 · Read-only
GitHub + Docker + Web
Issue creation, PR listing, container logs, web search, HTTP calls — all kernel-registered, all auditable.
Browse all 102 tools registered →
Open Source · MIT Licensed

Add the kernel to your agent in 60 seconds

One pip install. Drop decide() in front of any consequential action. The rest is policy.

INSTALL
pip install custodian-kernel
OR FROM SOURCE (contributors)
git clone https://github.com/KeyArgo/custodian-kernel
cd custodian-kernel
pip install -e ".[dev]"
PyPI ↗ GitHub ↗
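USAGE
from custodian import decide

# session_state and policy_config are your own objects (e.g. loaded from policy.yaml)
result = decide(
    request="refund $200 to customer",
    state=session_state,
    policy=policy_config,
)
if result.band in ("L0", "L1", "L2"):  # read-only, free, or autonomous within caps
    run()
elif result.band == "L3":  # above the cap: escalate for human sign-off
    ask_human()
else:  # L4: locked, do not act
    refuse()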
1,163 tests · 0 failures · 102 bundled tools
Run python3 verify_kit.py to prove it yourself — 90 seconds, no credentials
Or try: pip install custodian-kernel && custodian demo verify
Don't take this on faith

Verify it yourself in 90 seconds

No private repos, no credentials, no staging environment. Install the kernel, work through four steps, and check every security claim against the live system.

Step 1 — Install
pip install custodian-kernel
PyPI package. No secrets. Works on any Python 3.10+ machine.
Step 2 — Live claim check
custodian demo verify
Four live cases. CONTRADICTED claims are caught in real time against the running system.
Step 3 — Read the source
git clone https://github.com/KeyArgo/custodian-kernel
Every line public. No secrets committed. Real commit history.
Step 4 — Test suite
cd custodian-kernel
pip install -e ".[dev]"
pytest tests/
1,163 tests. 0 failures. Includes a self-approval regression test.

This is the only hackathon entry where a judge can verify every security claim with a pip install and a public git clone.

Competitive landscape

Honest comparison

Payman, Skyfire, Rain, Ramp, and Catena are real fintech companies with card issuance, stablecoin rails, and compliance teams. They serve businesses that need payment infrastructure. Custodian serves a different problem: businesses that want to deploy AI agents that touch money but need to prove those agents can't go rogue. The rows marked "only us" are why we exist.

Capability | Payman · Skyfire · Rain · Ramp · Catena | Custodian
Spend caps · approval · audit trail (table stakes) | ✓ | ✓
Real card issuance & payment rails | ✓ (Ramp, Rain) | ✕ not our lane
Stablecoin / crypto rails | ✓ (Skyfire, Rain) | ✕ not our lane
SOC 2 / KYC compliance | ✓ (Payman, Catena) | ✕ early stage
Catches the agent lying (facts vs. ground truth) | ✕ none | ✓ only us
Enforcement below the agent (kernel, not API policy) | ✕ none | ✓ only us
Self-hosted · non-custodial · rail-agnostic | ✕ they hold the funds | ✓ only us
Model-agnostic enforcement (swap Gemini, GPT, or a local DGX model; kernel safety properties don't change) | ✕ coupled to their stack | ✓ LLMClient Protocol

Our differentiator isn't payment infrastructure — it's enforcement architecture. The kernel sits underneath whatever rails and whatever model you use. Plug in Stripe, a bank API, or a stablecoin; swap Nemotron for any other model — the enforcement model doesn't change.
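The table mentions an `LLMClient` Protocol. The sketch below shows the general idea with `typing.Protocol`: the kernel only ever treats the model's output as a recommendation, so any object with the right method can be swapped in. The method name `recommend`, the stub clients, and `govern` are hypothetical; Custodian's actual Protocol may define different methods.

```python
from typing import Protocol


class LLMClient(Protocol):
    def recommend(self, message: str) -> str: ...


class AlwaysApprove:
    """Stub model that approves everything: the worst case for the kernel."""

    def recommend(self, message: str) -> str:
        return "APPROVE"


class AlwaysDeny:
    def recommend(self, message: str) -> str:
        return "DENY"


def govern(client: LLMClient, message: str, amount_cents: int, cap_cents: int) -> str:
    rec = client.recommend(message)
    # The cap applies identically whichever model produced `rec`.
    if rec != "APPROVE":
        return "DENY"
    return "AUTONOMOUS" if amount_cents <= cap_cents else "ESCALATE"
```

Even a model that approves everything cannot push an $800 refund past a $250 cap; swapping the model changes the recommendation, not the enforcement.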

Enforcement Architecture

Separate the thing that thinks from the thing that decides.

The core pattern: inference and enforcement run on different machines. The AI model recommends. The kernel decides. Requests flow one way, from agent to kernel; the agent has no path to reach, query, or instruct the enforcement layer, so it cannot talk its way past it. This separation is what makes the guarantee structural rather than assumed.
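One way to make the separation concrete is to hand the agent nothing but a submit function for a one-way queue: it can enqueue a request, but it holds no reference to the kernel, its policy, or its decisions, matching the "request only / no return path" labels in the diagram. A single-process sketch where `queue.Queue` stands in for the cross-machine channel; `make_channel` and `kernel_drain` are hypothetical names.

```python
import queue
from typing import Callable

Request = dict


def make_channel() -> tuple[Callable[[Request], None], queue.Queue]:
    """Return (submit, inbox): the agent gets only `submit`; the kernel keeps `inbox`."""
    inbox: queue.Queue = queue.Queue()

    def submit(req: Request) -> None:
        inbox.put(dict(req))  # copy so the agent cannot mutate a queued request
        # Returns None: nothing flows back to the agent.

    return submit, inbox


def kernel_drain(inbox: queue.Queue, cap_cents: int) -> list[tuple[str, str]]:
    decisions = []
    while not inbox.empty():
        req = inbox.get_nowait()
        outcome = "AUTONOMOUS" if req["amount_cents"] <= cap_cents else "ESCALATE"
        decisions.append((req["id"], outcome))  # recorded kernel-side only
    return decisions
```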

The kernel runs on whatever you have. A $20/mo cloud VM. Your existing server. A laptop. The same pip install custodian-kernel runs on a Raspberry Pi and on our DGX Spark GB10.

Inference Layer
Any AI Model
Cloud API or local · swap freely
Nemotron, GPT-4, Claude, Llama
The kernel doesn't care which model
● Makes recommendations only
Cannot touch money or take action directly
request only
no return path
Your Server
Enforcement Layer
Custodian Kernel
OS-level enforcement · any Linux host
Runs on a VPS, bare metal, or on-device
Config-driven policy · audit log append-only
● Agent cannot reach this layer
● Every spend decision runs here
pip install custodian-kernel
Our reference deployment: NVIDIA DGX Spark GB10. We run the enforcement kernel on a DGX Spark (Grace Blackwell, 128 GB unified memory) because it's the hardware we have. The kernel's job is policy evaluation, not inference, so the same package runs on an ordinary server. The redundancy layer automatically fails over to a secondary node if the primary is unreachable, with the same policy, the same audit trail, and zero downtime.
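A minimal sketch of primary-to-secondary failover, assuming each node is reachable through a callable that raises `ConnectionError` when the node is down. The `Node` alias and `decide_with_failover` are hypothetical. Both nodes receive the same request and share the same policy, so failover changes where a decision is made, not what it is. The sketch deliberately fails closed: if every node is unreachable, the answer is DENY rather than letting the action through.

```python
from typing import Callable, Sequence

Node = Callable[[dict], str]  # takes a request, returns a decision


def decide_with_failover(request: dict, nodes: Sequence[Node]) -> tuple[str, int]:
    """Try nodes in order; return (decision, index of the node that answered)."""
    for i, node in enumerate(nodes):
        try:
            return node(request), i
        except ConnectionError:
            continue  # node unreachable: fall through to the next one
    return "DENY", -1  # no node reachable: fail closed, never fail open
```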
0 · network paths from agent to kernel
any · Linux host, VPS to supercomputer
1 · command to deploy the kernel
2-layer · enforcement redundancy
See it run

90 seconds: the agent gets lied to, and the kernel wins

Watch a real agent recommend approving a fraudulent refund — and watch the deterministic kernel override it, with real Stripe IDs and an append-only audit trail.

Demo video drops before submission
In the meantime, the live console is the real thing — every number on this page is pulled from it right now.
Launch the live console instead →
Powered by NVIDIA
DGX Spark GB10
Grace Blackwell · 128 GB unified memory
Primary enforcement trust anchor
NemoClaw
NVIDIA OpenShell kernel sandbox
Linux Landlock LSM · OPA enforcement
Nemotron Super 120B
120B · 1M context · agentic reasoning
Fast enough for real-time decisions
Three layers of NVIDIA technology. One enforcement boundary the agent cannot cross.
STAY IN THE LOOP

Get updates as we ship

Early access, new enforcement packs, and architecture notes — no spam.

Hand your agent a wallet. Keep the keys.

Money is just the first module. The same kernel governs any consequential action an AI agent can take — provisioning, payroll, data egress, infrastructure.