NVIDIA · Nous Research · Stripe · Hackathon 2026
⬡ DGX Spark GB10 · 🛡 NemoClaw · ◉ Nemotron Super 120B · ◆ Hermes · Stripe

Let AI handle refunds, spend, and purchasing.
Without the risk.

Most businesses won't deploy AI near money because there is no way to verify that it actually followed the rules. Custodian is the enforcement kernel that changes that. You set limits in a YAML file and the OS enforces them: the agent cannot exceed them, lie about them, or approve its own escalation. This is proven, not promised. A Nous Hermes agent on NVIDIA Nemotron Super 120B runs inside a NemoClaw kernel sandbox on a DGX Spark GB10, and the same enforcement ships as a pip package you can run anywhere.

HERMES AGENT · Nemotron Super 120B · ROLE: REQUESTS ONLY
+
NEMOCLAW KERNEL · Custodian Authority · PER-ACTION CAP: $250
LIMITS ENFORCED BELOW THE AGENT · OS-LEVEL · NOT A PROMPT · NOT A PROMISE
✓ 3/3 planted lies caught · ✓ $0 unauthorized spend · ✓ 1,163 tests pass · ✓ real Stripe PI on record
Authority band · Session budget left
AI brain: Nous Hermes (Nemotron)
Decision modules: 3
Live tools: 102
Boundary: ENFORCED AT KERNEL
verify_kit.py — self-verifying proof
$ python3 verify_kit.py
What this unlocks

AI that handles money. Finally.

You've been holding back on AI because you can't trust it with refunds, spend, or purchasing decisions. Custodian is the missing piece — a kernel-level enforcement layer that makes deploying AI near money actually viable. Set your rules. The kernel enforces them. Ship.

E-commerce · Refund automation

Handle refund requests 24/7 — without the fraud risk

Your AI reviews every refund request. Custodian's claim verifier checks the customer's story against your actual order data. If the story doesn't match the record, the claim is marked CONTRADICTED before the refund executes. You only see the genuine edge cases.

✓ 3 planted lies caught · 0 false positives
SaaS · Autonomous spend

Let AI spend autonomously — within the limits you set

Set a $2 per-action cap and $50 session budget. Your agent runs inference jobs, makes API calls, spins up resources — fully autonomously. The moment it hits the limit, it escalates to you via SMS and stops. No surprise bills.

✓ Kill switch: one command halts the agent instantly
Operations · Purchasing

Automate purchasing your finance team can actually approve

The self-approval problem is why finance won't sign off on AI purchasing. Custodian makes self-approval structurally impossible — the agent submits, the kernel decides, humans approve anything above the cap. The agent never holds both sides of the decision.

✓ L3 band: all payments above threshold require human sign-off
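The "agent never holds both sides" rule can be illustrated with a minimal Python sketch. This is not Custodian's code: the `PurchaseRequest` fields, the `approve` function, and the identities used are all hypothetical, chosen only to show an approval step that rejects the requester and any other agent identity.

```python
from dataclasses import dataclass
from typing import Optional


class SelfApprovalError(Exception):
    """Raised when an agent identity tries to approve a request."""


@dataclass
class PurchaseRequest:
    # Field names are illustrative, not Custodian's schema.
    requester_id: str
    amount_usd: float
    approved_by: Optional[str] = None


def approve(request: PurchaseRequest, approver_id: str, agent_ids: set[str]) -> PurchaseRequest:
    """Record an approval only when it comes from a distinct, non-agent identity."""
    if approver_id == request.requester_id:
        raise SelfApprovalError("requester cannot approve its own request")
    if approver_id in agent_ids:
        raise SelfApprovalError("agent identities cannot approve requests")
    request.approved_by = approver_id
    return request


agents = {"hermes-agent"}
req = PurchaseRequest(requester_id="hermes-agent", amount_usd=800.0)

try:
    approve(req, "hermes-agent", agents)  # the agent tries to sign its own request
except SelfApprovalError as exc:
    print("blocked:", exc)

approve(req, "ops-lead@example.com", agents)  # a human signs instead
print("approved by:", req.approved_by)
```

The point of the design is that approval is a separate code path keyed on identity, so no output from the requesting agent can satisfy it.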
Your rules. One file.
# policy.yaml — kernel-enforced
default_band: L2      # autonomous to $2/action
per_action_cap: 2.00
session_cap: 50.00    # SMS you at $50, then stops
bands:
  L3:
    approval_backend: twilio_verify
▶ Try the live demo — no install needed
or
curl -fsSL https://getcustodian.xyz/install.sh | bash
or clone and run python3 verify_kit.py — proves the security guarantee in 90 seconds
Why Custodian is different

Everyone gives the agent a wallet.
We give it a kernel.

Spend caps and approval flows are commodities now. The hard problem isn't limiting a number — it's that the agent can be wrong, or can lie, and that it shouldn't be trusted to route money through an approved path in the first place.

Everyone else

A constrained wallet

The control lives in their custodial cloud. The agent reaches money by calling their SDK, and safety rests on the assumption it'll use the approved path. They cap the dollar amount — but never check whether what the agent claims is even true.

◉ Custodian

A constrained kernel

The control lives in Landlock + kernel egress policy. The agent literally cannot open a socket to a payment endpoint the OS hasn't allowed. A deterministic verifier checks every fact the agent asserts against ground truth, so it can't lie its way to a payout. Non-custodial, rail-agnostic, self-hosted.

How it works

One decision, four independent layers

The agent reads the messy real world and makes a recommendation. Then three deterministic, zero-AI layers get the final say — and any one of them can stop the money.

01 · AI JUDGMENT
🤖

The intelligence layer

Nemotron reads messy, unstructured customer messages and extracts structured claims — was it delivered? in the return window? defective? — assigns confidence, and proposes a disposition it has zero power to act on. Everything after that is deterministic code.

can be wrong · can lie · doesn't matter
02 · VERIFIER
🔍

Facts get checked

Every factual claim the agent made is resolved against ground truth. A claim the data refutes is flagged CONTRADICTED before anything downstream trusts it.

deterministic · zero-AI
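A deterministic claim check of this kind might look like the following Python sketch. It is an illustration under stated assumptions, not Custodian's verifier: the `OrderRecord` fields, the claim names, the `UNVERIFIABLE` verdict, and the 30-day return window are all hypothetical. The order values mirror the sandbox order shown elsewhere on this page (ord_6006, $80, delivered, no defect, 19 days old).

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    SUPPORTED = "SUPPORTED"
    CONTRADICTED = "CONTRADICTED"
    UNVERIFIABLE = "UNVERIFIABLE"  # hypothetical verdict for claims with no ground truth


@dataclass(frozen=True)
class OrderRecord:
    # Hypothetical ground-truth record; field names are assumptions.
    order_id: str
    amount_usd: float
    delivered: bool
    defect_reported: bool
    age_days: int


RETURN_WINDOW_DAYS = 30  # assumed policy value, not taken from the source


def ground_truth(order: OrderRecord) -> dict[str, bool]:
    """Derive the checkable facts from the order record."""
    return {
        "delivered": order.delivered,
        "defective": order.defect_reported,
        "in_return_window": order.age_days <= RETURN_WINDOW_DAYS,
    }


def verify_claims(claims: dict[str, bool], order: OrderRecord) -> dict[str, Verdict]:
    """Compare each asserted claim with the record. No model is involved."""
    truth = ground_truth(order)
    verdicts: dict[str, Verdict] = {}
    for name, asserted in claims.items():
        if name not in truth:
            verdicts[name] = Verdict.UNVERIFIABLE
        elif truth[name] == asserted:
            verdicts[name] = Verdict.SUPPORTED
        else:
            verdicts[name] = Verdict.CONTRADICTED
    return verdicts


order = OrderRecord("ord_6006", 80.0, delivered=True, defect_reported=False, age_days=19)
# Claims the agent extracted from a customer who says the item never arrived and was broken.
agent_claims = {"delivered": False, "defective": True, "in_return_window": True}

verdicts = verify_claims(agent_claims, order)
blocked = any(v is Verdict.CONTRADICTED for v in verdicts.values())
for name, verdict in verdicts.items():
    print(f"{name}: {verdict.value}")
print("refund blocked:", blocked)
```

Because the check is a plain comparison against stored data, the same inputs always produce the same verdicts, which is what lets a downstream layer trust it.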
03 · KERNEL
🛡

The kernel decides

Bands and caps determine the outcome: AUTONOMOUS, ESCALATE, or DENY. Anything over the cap requires a real human signature (Twilio Verify SMS). The agent never holds both keys.

enforced at OS level
CLAIM CHECK

The agent can lie. Money still can't move wrong. When a customer invents a story to get a refund and the AI recommends approval, the verifier catches that the claim is contradicted by the ledger, and the kernel overrides the AI. No competitor can demonstrate this, because their model is "agent asks, check the limit," not "agent asks, check whether the agent is lying."

NemoClaw · NVIDIA OpenShell Kernel Sandbox

The AI cannot spend what the OS won't allow

NemoClaw is NVIDIA's OpenShell kernel sandbox — a Landlock LSM + OPA enforcement layer baked into the container boundary. Custodian's authority engine runs deterministically inside that sandbox. The agent literally cannot open a socket to a payment endpoint the kernel hasn't whitelisted — regardless of what the model decides.

Layer 1
🔒

Landlock LSM

Linux Security Module enforcing least-privilege file and network access at the syscall boundary. Even a compromised model cannot open a connection to an un-whitelisted endpoint: the kernel denies the call itself, so nothing the process does in user space can route around it.

Layer 2
📋

OPA Policy Engine

Open Policy Agent evaluates every action request against the authority band in real time. Per-action caps ($250), rolling session windows ($1,000/2 hr), and escalation thresholds are enforced as Rego rules, not application code that can be patched around.
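The page describes these limits as Rego rules. The Python sketch below only illustrates the arithmetic of a per-action cap combined with a rolling window, using the $250 and $1,000 per 2 hours figures quoted above. The `RollingBudget` class and its method names are hypothetical, and timestamps are passed in explicitly so the behavior is reproducible.

```python
from collections import deque
from decimal import Decimal


class RollingBudget:
    """Per-action cap plus a rolling spend window (illustrative only)."""

    def __init__(self, per_action_cap: Decimal, window_cap: Decimal, window_seconds: int):
        self.per_action_cap = per_action_cap
        self.window_cap = window_cap
        self.window_seconds = window_seconds
        self._spend: deque[tuple[float, Decimal]] = deque()  # (timestamp, amount)

    def _expire(self, now: float) -> None:
        # Drop spends that have aged out of the window.
        while self._spend and now - self._spend[0][0] >= self.window_seconds:
            self._spend.popleft()

    def spent(self, now: float) -> Decimal:
        self._expire(now)
        return sum((amount for _, amount in self._spend), Decimal("0"))

    def try_spend(self, amount: Decimal, now: float) -> bool:
        """Return True and record the spend only if both limits hold."""
        if amount > self.per_action_cap:
            return False
        if self.spent(now) + amount > self.window_cap:
            return False
        self._spend.append((now, amount))
        return True


budget = RollingBudget(Decimal("250"), Decimal("1000"), window_seconds=2 * 60 * 60)

# Four $250 actions fill the $1,000 window exactly.
for t in (0, 60, 120, 180):
    print(t, budget.try_spend(Decimal("250"), now=t))

print(240, budget.try_spend(Decimal("100"), now=240))    # window is full
print(7200, budget.try_spend(Decimal("100"), now=7200))  # the first spend has expired
```

`Decimal` is used instead of `float` so that repeated additions of dollar amounts do not accumulate rounding error.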

Layer 3
📊

OCSF Audit Log

Every allow and deny emits an Open Cybersecurity Schema Framework event: tamper-evident, structured, verifiable by any SIEM. The log below is the live feed from the running sandbox right now.
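One common way to make an append-only log tamper-evident is hash chaining, sketched below in Python. This is an assumption about technique, not a description of how Custodian or OCSF implements it, and the event fields shown are hypothetical placeholders rather than the OCSF schema.

```python
import hashlib
import json

GENESIS = "0" * 64  # hash value used before the first record


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON so the same event always hashes the same way.
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(log: list[dict], event: dict) -> dict:
    """Append an event whose hash covers the previous record's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    record = {"event": event, "prev_hash": prev_hash, "hash": _digest(prev_hash, event)}
    log.append(record)
    return record


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; any edited or reordered record breaks the chain."""
    prev_hash = GENESIS
    for record in log:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["event"]):
            return False
        prev_hash = record["hash"]
    return True


log: list[dict] = []
append_event(log, {"action": "refund", "amount_usd": 180, "decision": "ALLOW"})
append_event(log, {"action": "refund", "amount_usd": 800, "decision": "DENY"})
print("intact:", verify_chain(log))

log[0]["event"]["amount_usd"] = 18  # simulate someone editing history
print("after tampering:", verify_chain(log))
```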

Live OCSF Kernel Log from running NemoClaw sandbox · auto-refreshes
Watch the live console → See the lie-catch demo →
Not a mockup

Everything here is real, and live right now

A real Nous Hermes agent, in a real kernel sandbox, making real Stripe API calls in test mode — protecting ArgoBox, a production AI infrastructure platform. The enforcement is production-grade. The Stripe transactions are test-mode so no real money moves during the demo. These numbers are pulled live from the running system as you read this.

Autonomous budget remaining: LIVE
Cases governed by kernel: LIVE
Stripe test-mode PaymentIntents created: LIVE
Planted lies caught (0 false positives): 3/3
Passing tests (run them yourself): 1,163
  • Real kernel sandbox — least-privilege egress enforced via Landlock, verified in raw OCSF allow/deny logs.
  • Real money rail — Stripe test-mode PaymentIntents you can open on Stripe's own dashboard.
  • Real human approval — escalations send a genuine Twilio Verify SMS code.
  • Rail-agnostic — the same kernel governs refunds, payables, and NVIDIA NIM job provisioning.
  • 102 tools registered — 41 active out of the box (monitoring, security, web, storage), 61 more with your own API keys. Every call kernel-checked regardless.
Open the live console →
Try it live

Type any refund excuse. Watch Nemotron + the kernel process it.

The AI reads it. The verifier checks every factual claim against the real order record. When the facts don't hold, the kernel overrides the AI — even if the AI said APPROVE.

Sandbox: ord_6006 · $80 · delivered · no defect · 19 days old
Full lie-catch walkthrough with all 6 corpus cases →
Plain English

How does the kernel actually stop the AI?

Think of it like a new employee at a company. They can fill out a purchase order and decide it makes sense, but they can't sign their own check. Check signing is a separate system, run by people with authority the employee doesn't have.

Custodian does the same thing for AI. The agent (Nemotron) can decide a payment makes sense. But the actual movement of money goes through a second system, the kernel, which checks the amount, the session budget, and whether the agent has been tricked. The agent never holds both keys at once.

// What happens when the AI requests $180
agent → kernel: "refund $180, order #4821"
kernel: check per_action_cap... $180 ≤ $250 ✓
kernel: check session_spent... $340 + $180 ≤ $1,000 ✓
kernel: check kill_switch... not set ✓
kernel: verify order exists... ✓
kernel: AUTONOMOUS → refund issued via Stripe
// What happens when the AI requests $800
agent → kernel: "refund $800, order #4822"
kernel: check per_action_cap... $800 > $250 ✗
kernel: ESCALATE → SMS to operator
// The agent waits. It cannot proceed.
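The two traces above can be reproduced with a short Python sketch. It is a simplified model, not the shipped kernel: `decide_sketch`, `Session`, and `KNOWN_ORDERS` are hypothetical names, and the choice to return DENY for a set kill switch or an unknown order, and ESCALATE for an exhausted session budget, is an assumption.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum


class Outcome(Enum):
    AUTONOMOUS = "AUTONOMOUS"
    ESCALATE = "ESCALATE"
    DENY = "DENY"


@dataclass
class Session:
    spent: Decimal
    kill_switch: bool = False


PER_ACTION_CAP = Decimal("250")
SESSION_CAP = Decimal("1000")
KNOWN_ORDERS = {"4821", "4822"}  # stand-in for a real order lookup


def decide_sketch(amount: Decimal, order_id: str, session: Session) -> Outcome:
    """Apply the checks from the trace; only AUTONOMOUS updates the budget."""
    # Hard stops first, so a halted session never produces an escalation.
    if session.kill_switch:
        return Outcome.DENY
    if order_id not in KNOWN_ORDERS:
        return Outcome.DENY
    if amount > PER_ACTION_CAP:
        return Outcome.ESCALATE
    if session.spent + amount > SESSION_CAP:
        return Outcome.ESCALATE
    session.spent += amount
    return Outcome.AUTONOMOUS


session = Session(spent=Decimal("340"))
print(decide_sketch(Decimal("180"), "4821", session).value)  # within both limits
print(decide_sketch(Decimal("800"), "4822", session).value)  # over the per-action cap
print("session spent:", session.spent)
```

The sketch evaluates the kill switch before the caps, which differs from the order shown in the trace. For the two requests traced above the outcome is the same either way.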
Why kernel-level? Because a control that lives in the same software stack as the agent can, in principle, be talked around or bypassed. The kernel enforces egress at the OS: the agent's process literally cannot open a socket to a payment endpoint the OS hasn't allowed. A prompt can't override that. A clever argument can't override that. The model's own output can't override that.
102 Tools · 41 active out of the box

The kernel for every tool — not just payments

Every tool call — whether it sends an SMS, submits an NVIDIA NIM inference job, reads a GitHub PR, or posts a Slack message — passes through the same Custodian kernel before executing. One governance layer. Every tool.

41 tools work immediately with no credentials — monitoring, security, web, file, and local storage tools. 61 additional tools activate when you add your own API keys (Stripe, NVIDIA, GitHub, AWS, and more).

L2 · Autonomous
NVIDIA NIM
Submit inference jobs to NVIDIA's hosted API. NIM costs are tracked against the session cap like any other spend.
L3 · Escalates
Stripe Extended
Subscriptions, invoice sending, payouts. Every call kernel-gated — L3 tools always require human approval via SMS.
L1 · Free
Communication
Email, SMS (Twilio), Slack, Discord, webhooks. Logged to the OCSF audit trail like every other tool.
L0 · Read-only
GitHub + Docker + Web
Issue creation, PR listing, container logs, web search, HTTP calls — all kernel-registered, all auditable.
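The "one governance layer, every tool" idea can be illustrated with a Python decorator that routes every call through a single checkpoint. This is an in-process illustration only, so it does not provide the kernel-level guarantee described on this page. The `governed` decorator, the `AUDIT` list, `HUMAN_APPROVED`, and the two tool functions are hypothetical.

```python
from functools import wraps
from typing import Callable

AUDIT: list[tuple[str, str, str]] = []  # (tool name, band, decision)
HUMAN_APPROVED: set[str] = set()        # tool names a human has signed off on


class ApprovalRequired(Exception):
    """Raised when an L3 tool is called without human sign-off."""


def governed(band: str) -> Callable:
    """Wrap a tool so every call is checked and logged before it runs."""
    def decorator(fn: Callable) -> Callable:
        @wraps(fn)
        def wrapper(*args, **kwargs):
            if band == "L3" and fn.__name__ not in HUMAN_APPROVED:
                AUDIT.append((fn.__name__, band, "ESCALATE"))
                raise ApprovalRequired(f"{fn.__name__} needs human approval")
            AUDIT.append((fn.__name__, band, "ALLOW"))
            return fn(*args, **kwargs)
        return wrapper
    return decorator


@governed("L0")
def list_pull_requests(repo: str) -> str:
    return f"listing PRs for {repo}"


@governed("L3")
def send_invoice(customer_id: str, amount_usd: float) -> str:
    return f"invoice for ${amount_usd:.2f} sent to {customer_id}"


print(list_pull_requests("KeyArgo/custodian-kernel"))
try:
    send_invoice("cus_123", 40.0)
except ApprovalRequired as exc:
    print("escalated:", exc)
print(AUDIT)
```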
Browse all 102 tools registered →
Open Source · MIT Licensed

Add the kernel to your agent in 60 seconds

One pip install. Drop decide() in front of any consequential action. The rest is policy.

INSTALL
pip install custodian-kernel
OR FROM SOURCE
git clone https://github.com/KeyArgo/custodian-kernel
cd custodian-kernel && pip install -e .
PyPI ↗ GitHub ↗
USAGE — 3 LINES
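from custodian import decide

# session_state, policy_config, run() and ask_human() come from your own code
result = decide(
    request="refund $200 to customer",
    state=session_state,
    policy=policy_config,
)
if result.band in ("L0", "L1", "L2"): run()  # read-only, free, or autonomous within caps
elif result.band == "L3": ask_human()        # over the cap: human approval required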
1,163 tests · 0 failures · 102 bundled skills
Run python3 verify_kit.py to prove it yourself — 90 seconds, no credentials
Or try: pip install custodian-kernel && custodian demo-verify
Don't take this on faith

Verify it yourself in 90 seconds

Everything runs from pip: no cloning private repos, no credentials, no staging environment. Install the kernel, follow four steps, and check every security claim against the live system.

Step 1 — Install
pip install custodian-kernel
PyPI package. No secrets. Works on any Python 3.10+ machine.
Step 2 — Live claim check
custodian demo-verify
4 live cases. CONTRADICTED claims caught in real time against the running system.
Step 3 — Test suite
pip install "custodian-kernel[dev]"
pytest tests/
1,163 tests. 0 failures. Includes self-approval regression proof.
Step 4 — Read the source
git clone https://github.com/KeyArgo/custodian-kernel
Every line public. No secrets committed. Real commit history.

This is the only hackathon entry where a judge can verify every security claim from a single pip install.

Competitive landscape

Honest comparison

Payman, Skyfire, Rain, Ramp, and Catena are real fintech companies with card issuance, stablecoin rails, and compliance teams. They serve businesses that need payment infrastructure. Custodian serves a different problem: businesses that want to deploy AI agents that touch money but need to prove those agents can't go rogue. The bottom four rows are why we exist.

Capability | Payman · Skyfire · Rain · Ramp · Catena | Custodian
Spend caps · approval · audit trail | ✓ table stakes | ✓ table stakes
Real card issuance & payment rails | ✓ (Ramp, Rain) | ✕ not our lane
Stablecoin / crypto rails | ✓ (Skyfire, Rain) | ✕ not our lane
SOC 2 / KYC compliance | ✓ (Payman, Catena) | ✕ early stage
Catches the agent lying (facts vs ground truth) | ✕ none | ✓ only us
Enforcement below the agent (kernel, not API policy) | ✕ none | ✓ only us
Self-hosted · non-custodial · rail-agnostic | ✕ they hold the funds | ✓ only us
Model-agnostic enforcement (swap Gemini, GPT, or a local DGX model; kernel safety properties don't change) | ✕ coupled to their stack | ✓ LLMClient Protocol

Our differentiator isn't payment infrastructure — it's enforcement architecture. The kernel sits underneath whatever rails and whatever model you use. Plug in Stripe, a bank API, or a stablecoin; swap Nemotron for any other model — the enforcement model doesn't change.
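Model-agnostic enforcement can be sketched with a structural interface in Python. The comparison table names an LLMClient Protocol, but its actual methods are not shown on this page, so the `propose` method, the stub model classes, and the `enforce` function below are assumptions made for illustration.

```python
from typing import Protocol


class LLMClient(Protocol):
    """Hypothetical shape: any model that can turn a message into a proposal."""

    def propose(self, message: str) -> dict: ...


class StubModelA:
    def propose(self, message: str) -> dict:
        return {"disposition": "APPROVE", "amount_usd": 80.0}


class StubModelB:
    def propose(self, message: str) -> dict:
        # A different model that has been talked into a large payout.
        return {"disposition": "APPROVE", "amount_usd": 800.0}


PER_ACTION_CAP = 250.0  # same figure as the per-action cap quoted on this page


def enforce(client: LLMClient, message: str) -> str:
    """The same deterministic rule runs whichever model produced the proposal."""
    proposal = client.propose(message)
    if proposal["disposition"] != "APPROVE":
        return "DENY"
    if proposal["amount_usd"] > PER_ACTION_CAP:
        return "ESCALATE"
    return "AUTONOMOUS"


for model in (StubModelA(), StubModelB()):
    print(type(model).__name__, "->", enforce(model, "please refund my order"))
```

Because `LLMClient` is a `typing.Protocol`, the stub classes need no shared base class. Any object with a matching `propose` method can be swapped in without touching `enforce`.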

Hardware Architecture · Powered by NVIDIA

The agent thinks in the cloud. The law runs on NVIDIA hardware.

Custodian physically separates inference from enforcement. The inference model thinks. The enforcement kernel decides. Those two processes run on separate machines — the agent has no network path to the enforcement layer. The Custodian deployment here runs its enforcement kernel on a DGX Spark GB10 as proof-of-concept; the kernel itself is hardware-agnostic and ships as pip install custodian-kernel.

Inference Layer
Nemotron Super 120B
Reasoning model · 1M context
Fast enough to run at agent speed
Smart enough to catch lies · NVIDIA cloud API
● Model makes recommendations
Agent calls here
Tailscale
one-way
On-Device
Enforcement Layer
Custodian Kernel · DGX Spark GB10
NemoClaw sandbox · OS-level enforcement
Enforcement decisions on NVIDIA hardware
Grace Blackwell · 128 GB unified memory
● Agent cannot reach this layer
● Every decision runs here
Kernel enforces · Spark executes
🔄
Enforcement redundancy — DGX Spark GB10 is the primary trust anchor. If the local network is unreachable, enforcement fails over to a secondary node automatically — same kernel, same policy, same audit trail. The enforcement layer never goes dark.
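The failover behavior described above might be sketched in Python as follows. The node names, the `NodeUnreachable` exception, and the decision functions are hypothetical. Returning DENY when no node answers (failing closed) is an assumption of this sketch, not something the page states.

```python
from typing import Callable, Sequence


class NodeUnreachable(Exception):
    """Raised by a node's decision function when the node cannot be contacted."""


DecideFn = Callable[[dict], str]


def decide_with_failover(nodes: Sequence[tuple[str, DecideFn]], request: dict) -> tuple[str, str]:
    """Ask each enforcement node in priority order; fail closed if none answer."""
    for name, decide_fn in nodes:
        try:
            return name, decide_fn(request)
        except NodeUnreachable:
            continue  # try the next node with the same request
    return "none", "DENY"  # assumption: no enforcement node means no money moves


def primary_offline(request: dict) -> str:
    raise NodeUnreachable("primary node is not reachable")


def secondary(request: dict) -> str:
    # Same rule on every node: a $250 per-action cap.
    return "AUTONOMOUS" if request["amount_usd"] <= 250 else "ESCALATE"


nodes = [("primary", primary_offline), ("secondary", secondary)]
print(decide_with_failover(nodes, {"amount_usd": 180}))
print(decide_with_failover([("primary", primary_offline)], {"amount_usd": 180}))
```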
128 GB · DGX Spark unified memory
1M · Nemotron context window
0 · network paths from agent to kernel
2-layer · enforcement redundancy
See it run

90 seconds: the agent gets lied to, and the kernel wins

Watch a real agent recommend approving a fraudulent refund — and watch the deterministic kernel override it, with real Stripe IDs and an append-only audit trail.

Demo video drops before submission
In the meantime, the live console is the real thing — every number on this page is pulled from it right now.
Launch the live console instead →
Powered by NVIDIA
DGX Spark GB10
Grace Blackwell · 128 GB unified memory
Primary enforcement trust anchor
NemoClaw
NVIDIA OpenShell kernel sandbox
Linux Landlock LSM · OPA enforcement
Nemotron Super 120B
120B · 1M context · agentic reasoning
Fast enough for real-time decisions
Three layers of NVIDIA technology. One enforcement boundary the agent cannot cross.
STAY IN THE LOOP

Get updates as we ship

Early access, new enforcement packs, and architecture notes — no spam.

Hand your agent a wallet. Keep the keys.

Money is just the first module. The same kernel governs any consequential action an AI agent can take — provisioning, payroll, data egress, infrastructure.