Most businesses won't deploy AI near money because there's no way to verify it actually followed the rules. Custodian is the enforcement kernel that changes that: define limits in code, the OS enforces them, and the agent cannot lie or self-approve. Every action produces a SHA-256-fingerprinted receipt; tamper with it and the kernel knows. The kernel is deployed on a separate physical host and enforces in real time. Ships as a pip package.
$ python3 verify_kit.py
AI is finally capable enough to handle refunds, spend, and purchasing decisions — Nemotron Super 120B makes that reasoning possible. Custodian is the deployment layer that makes it production-safe: kernel-level enforcement that gives you full control without constraining what the model can do. Set your limits. The kernel enforces them. Ship.
Your AI reviews every refund request. Custodian's claim verifier checks the customer's story against your actual order data — if they're lying, it's CONTRADICTED before the refund executes. You only see the genuine edge cases.
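Here is a minimal sketch of what a claim verifier does. It assumes the model has already turned the customer's message into boolean claims, and it assumes a hypothetical `Order` record with `delivered_on` and `return_window_days` fields. This is not Custodian's code. It only illustrates that the check is a plain comparison against stored data, with no model in the loop.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Verdict(Enum):
    SUPPORTED = "SUPPORTED"
    CONTRADICTED = "CONTRADICTED"
    UNVERIFIABLE = "UNVERIFIABLE"


@dataclass(frozen=True)
class Order:
    """Hypothetical order record pulled from your own database."""
    order_id: str
    delivered_on: date | None  # None means delivery was never confirmed
    return_window_days: int


def verify_claims(claims: dict[str, bool], order: Order, today: date) -> dict[str, Verdict]:
    """Compare each model-extracted claim with facts derived from the order record."""
    delivered = order.delivered_on is not None
    ground_truth = {
        "delivered": delivered,
        "in_return_window": (
            delivered
            and today - order.delivered_on <= timedelta(days=order.return_window_days)
        ),
    }
    verdicts: dict[str, Verdict] = {}
    for name, asserted in claims.items():
        if name not in ground_truth:
            # No ledger field can settle this claim, so it is not trusted either way.
            verdicts[name] = Verdict.UNVERIFIABLE
        elif ground_truth[name] == asserted:
            verdicts[name] = Verdict.SUPPORTED
        else:
            verdicts[name] = Verdict.CONTRADICTED
    return verdicts


def blocks_refund(verdicts: dict[str, Verdict]) -> bool:
    """A single contradicted claim is enough to stop the refund."""
    return Verdict.CONTRADICTED in verdicts.values()


# The customer says the parcel never arrived; the order record says otherwise.
order = Order("A-1001", delivered_on=date(2024, 5, 1), return_window_days=30)
verdicts = verify_claims({"delivered": False, "defective": True}, order, today=date(2024, 5, 10))
print({name: v.value for name, v in verdicts.items()})
print("refund blocked:", blocks_refund(verdicts))
```

Claims the ledger can't settle come back UNVERIFIABLE rather than SUPPORTED. In this example that applies to `defective`. An unprovable story therefore never counts as evidence for a payout.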
Set a $2 per-action cap and $50 session budget. Your agent runs inference jobs, makes API calls, spins up resources — fully autonomously. The moment it hits the limit, it escalates to you via SMS and stops. No surprise bills.
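Here is a minimal sketch of the $2 per-action cap and $50 session budget. It assumes a hypothetical `SessionBudget` class whose `notify` callback stands in for the SMS escalation. Amounts are `Decimal` so that repeated small charges don't accumulate binary floating-point error.

```python
from __future__ import annotations

from dataclasses import dataclass
from decimal import Decimal
from typing import Callable


@dataclass
class SessionBudget:
    """Hypothetical per-action cap plus session budget, mirroring the $2 / $50 example."""
    notify: Callable[[str], None]  # stand-in for the SMS escalation channel
    per_action_cap: Decimal = Decimal("2.00")
    session_cap: Decimal = Decimal("50.00")
    spent: Decimal = Decimal("0.00")
    halted: bool = False

    def authorize(self, amount: Decimal) -> str:
        """Return AUTONOMOUS, ESCALATE, or HALTED for one proposed spend."""
        if self.halted:
            return "HALTED"  # in this sketch, nothing else runs once the session stops
        if amount > self.per_action_cap:
            self.notify(f"Approval needed: ${amount} exceeds the ${self.per_action_cap} per-action cap")
            return "ESCALATE"
        if self.spent + amount > self.session_cap:
            self.halted = True
            self.notify(f"Session budget of ${self.session_cap} reached; agent stopped")
            return "HALTED"
        self.spent += amount
        return "AUTONOMOUS"


messages: list[str] = []
budget = SessionBudget(notify=messages.append)
decisions = [budget.authorize(Decimal("1.50")) for _ in range(34)]
print(decisions.count("AUTONOMOUS"), decisions[-1], budget.spent)
print(messages)
```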
The self-approval problem is why finance won't sign off on AI purchasing. Custodian makes self-approval structurally impossible — the agent submits, the kernel decides, humans approve anything above the cap. The agent never holds both sides of the decision.
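Here is one way to make self-approval structurally impossible, sketched under several assumptions. The agent can only construct a `Proposal`. Anything over the cap needs an HMAC tag computed with a key that the kernel and the human approval service hold but the agent never does. HMAC stands in for the real human signature the page describes (Twilio Verify SMS), and Custodian's actual mechanism may differ. Every name below is hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    """What the agent is allowed to produce: a request, never an approval."""
    proposal_id: str
    amount_cents: int
    description: str


def _approval_message(p: Proposal) -> bytes:
    # Bind the approval to both the ID and the amount.
    return f"{p.proposal_id}:{p.amount_cents}".encode("utf-8")


def human_approve(approver_key: bytes, p: Proposal) -> bytes:
    """Runs in the approval service, after a human confirms (e.g. with an SMS code)."""
    return hmac.new(approver_key, _approval_message(p), hashlib.sha256).digest()


class Kernel:
    def __init__(self, cap_cents: int, approver_key: bytes) -> None:
        self._cap_cents = cap_cents
        self._approver_key = approver_key  # shared with the approval service, never the agent

    def decide(self, p: Proposal, approval_tag: bytes | None = None) -> str:
        if p.amount_cents <= self._cap_cents:
            return "EXECUTE"
        if approval_tag is None:
            return "ESCALATE"
        expected = hmac.new(self._approver_key, _approval_message(p), hashlib.sha256).digest()
        return "EXECUTE" if hmac.compare_digest(expected, approval_tag) else "DENY"


key = secrets.token_bytes(32)
kernel = Kernel(cap_cents=200, approver_key=key)
purchase = Proposal("po-17", 12_000, "GPU hours")
forged = hmac.new(b"agent-guess", _approval_message(purchase), hashlib.sha256).digest()
print(kernel.decide(purchase))                                # no approval yet
print(kernel.decide(purchase, forged))                        # agent tries to approve itself
print(kernel.decide(purchase, human_approve(key, purchase)))  # real human approval
```

The tag is bound to the amount as well as the ID. An approval for one purchase therefore can't be replayed to authorize a bigger one.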
# policy.yaml (kernel-enforced)
default_band: L2        # autonomous to $2/action
per_action_cap: 2.00
session_cap: 50.00      # SMS you at $50, then stops
bands:
  L3:
    approval_backend: twilio_verify
curl -fsSL https://getcustodian.xyz/install.sh | bash
python3 verify_kit.py: proves the security guarantee in 90 seconds, no credentials needed.
Spend caps and approval flows are commodities now. The hard problem is that AI output is probabilistic by nature, and probabilistic recommendations should never directly authorize money movement. The kernel is deterministic. It's not a limitation on the model; it's how responsible deployment at this level works.
Other platforms: the control lives in their custodial cloud. The agent reaches money by calling their SDK, and safety rests on the assumption that it will use the approved path. They cap the dollar amount but never check whether what the agent claims is even true.
Custodian: the control lives in Landlock + kernel egress policy. The agent literally cannot open a socket to a payment endpoint the OS hasn't allowed. A deterministic verifier checks every fact the agent asserts against ground truth, so it can't lie its way to a payout. Non-custodial, rail-agnostic, self-hosted.
The agent reads the messy real world and makes a recommendation. Then three deterministic, zero-AI layers get the final say — and any one of them can stop the money.
We chose Nemotron for exactly this — reading messy, unstructured customer messages and extracting structured claims: was it delivered? in the return window? defective? It assigns confidence and proposes a disposition it has zero power to act on. Everything after that is deterministic code.
best-in-class NLU · sandboxed from final execution
Every factual claim the agent made is resolved against ground truth. A claim the data refutes is flagged CONTRADICTED before anything downstream trusts it.
deterministic · zero-AI
Bands and caps decide AUTONOMOUS / ESCALATE / DENY. Anything over the cap requires a real human signature (Twilio Verify SMS). The agent never holds both keys.
enforced at OS level
A customer's claim may be fabricated. The money still can't move the wrong way.
When a customer invents a story to get a refund and Nemotron recommends approval, the verifier checks every extracted claim against your actual order data, and contradicted claims are blocked before anything executes. No competitor can demonstrate this, because their approach is "agent asks, check the limit," not "verify every claim the agent extracted before it moves money."
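The layering can be sketched as a single deterministic function. This is a hypothetical illustration, not Custodian's authority engine. It covers only the verifier and band checks, because the third layer (the OS sandbox) lives in the kernel rather than in Python. The rule that a non-APPROVE recommendation escalates is also an assumption of the sketch.

```python
from enum import Enum


class Disposition(Enum):
    AUTONOMOUS = "AUTONOMOUS"
    ESCALATE = "ESCALATE"
    DENY = "DENY"


def final_disposition(
    model_recommendation: str,
    claim_verdicts: list[str],
    amount_cents: int,
    cap_cents: int,
) -> Disposition:
    """Deterministic decision; the model's output is one input, never the decider."""
    # Layer 1 (verifier): a contradicted claim stops the money, whatever the model said.
    if "CONTRADICTED" in claim_verdicts:
        return Disposition.DENY
    # Assumption of this sketch: anything short of APPROVE goes to a human.
    if model_recommendation != "APPROVE":
        return Disposition.ESCALATE
    # Layer 2 (authority band): amounts above the cap need a human signature.
    if amount_cents > cap_cents:
        return Disposition.ESCALATE
    return Disposition.AUTONOMOUS


# The model recommends APPROVE, but one extracted claim is contradicted by the data.
print(final_disposition("APPROVE", ["SUPPORTED", "CONTRADICTED"], 20_000, 25_000))
```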
NemoClaw is NVIDIA's OpenShell kernel sandbox — a Landlock LSM + OPA enforcement layer baked into the container boundary. Custodian's authority engine runs deterministically inside that sandbox. The agent literally cannot open a socket to a payment endpoint the kernel hasn't whitelisted — regardless of what the model decides.
Landlock is a Linux Security Module that enforces least-privilege file and network access at the syscall boundary. Even a compromised model cannot open a socket to an un-whitelisted endpoint; the kernel rejects the attempt before any connection is made.
Open Policy Agent evaluates every action request against the authority band in real time. Per-action caps ($250), rolling session windows ($1,000/2 hr), and escalation thresholds are enforced as Rego rules, not application code that can be patched around.
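The per-action cap and rolling session window can be expressed as ordinary logic. The sketch below is a Python rendition for illustration only, not the Rego rules the page refers to. The `RollingWindowPolicy` class and its ALLOW/ESCALATE strings are hypothetical. The numbers mirror the $250 per-action cap and $1,000-per-2-hours window quoted above.

```python
from __future__ import annotations

from collections import deque
from decimal import Decimal


class RollingWindowPolicy:
    """Per-action cap plus a sliding-window spend limit. Illustrative Python, not Rego."""

    def __init__(self, per_action_cap: Decimal, window_cap: Decimal, window_seconds: float) -> None:
        self.per_action_cap = per_action_cap
        self.window_cap = window_cap
        self.window_seconds = window_seconds
        self._spends: deque[tuple[float, Decimal]] = deque()  # (timestamp, amount), oldest first

    def _spent_in_window(self, now: float) -> Decimal:
        # Forget spends older than the window before summing what is left.
        while self._spends and now - self._spends[0][0] >= self.window_seconds:
            self._spends.popleft()
        return sum((amount for _, amount in self._spends), Decimal("0"))

    def evaluate(self, amount: Decimal, now: float) -> str:
        if amount > self.per_action_cap:
            return "ESCALATE"
        if self._spent_in_window(now) + amount > self.window_cap:
            return "ESCALATE"
        self._spends.append((now, amount))  # only allowed spends count toward the window
        return "ALLOW"


# $250 per action, $1,000 per rolling 2 hours; `now` is seconds since session start.
policy = RollingWindowPolicy(Decimal("250"), Decimal("1000"), window_seconds=2 * 60 * 60)
early = [policy.evaluate(Decimal("240"), now=t) for t in (0, 600, 1200, 1800)]
fifth = policy.evaluate(Decimal("240"), now=2400)
after_expiry = policy.evaluate(Decimal("240"), now=7200)
print(early, fifth, after_expiry)
```

A sliding window avoids the weakness of a fixed session reset. With a fixed reset, a burst of spending split across the reset boundary could reach twice the limit within a short span.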
Every allow and deny emits an Open Cybersecurity Schema Framework (OCSF) event: tamper-evident, structured, verifiable by any SIEM.
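Tamper evidence of this kind is commonly built as a hash chain. Each receipt's SHA-256 digest covers its own content plus the previous receipt's digest, so editing any past entry breaks every link after it. The sketch below assumes that design. Its field names are simplified and are not actual OCSF attributes, and `ReceiptLog` is a hypothetical name.

```python
import hashlib
import json


def _digest(event: dict, prev_hash: str) -> str:
    # Canonical JSON so the same event always hashes to the same value.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class ReceiptLog:
    """Append-only log where each receipt's hash covers the previous receipt's hash."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, event: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        digest = _digest(event, prev)
        self.entries.append({"event": dict(event), "prev": prev, "hash": digest})
        return digest

    def verify(self) -> bool:
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or _digest(entry["event"], prev) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


log = ReceiptLog()
log.append({"action": "refund", "amount_cents": 20_000, "decision": "DENY", "reason": "CONTRADICTED"})
log.append({"action": "api_call", "amount_cents": 150, "decision": "ALLOW"})
intact = log.verify()
log.entries[0]["event"]["decision"] = "ALLOW"  # someone edits a past receipt
tampered = log.verify()
print(intact, tampered)
```

A chain only proves tampering if the latest hash is also kept somewhere the attacker can't rewrite, for example on the separate kernel host. Without that, someone who controls the whole log could recompute every digest.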
A real Nous Hermes agent, in a real kernel sandbox, making real Stripe API calls in test mode, protecting ArgoBox, a production AI infrastructure platform. The enforcement is production-grade. The Stripe transactions are test-mode, so no real money moves during the demo.
The AI reads the refund request. The verifier checks every factual claim against the real ledger. When the facts don't hold, the kernel overrides the AI, even if the AI said APPROVE.
Think of it like a new employee at a company. They can fill out a purchase order and decide it makes sense — but they can't sign their own check. The signed check is a separate system, run by people with authority the employee doesn't have.
Custodian does the same thing for AI. The agent (Nemotron) can decide a payment makes sense. But the actual move of money goes through a second system — the kernel — that checks the amount, the session budget, and whether the agent has been tricked. The agent never holds both keys at once.
Every tool call — whether it sends an SMS, submits an NVIDIA NIM inference job, reads a GitHub PR, or posts a Slack message — passes through the same Custodian kernel before executing. One governance layer. Every tool.
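A single governance layer for every tool can be sketched as a decorator. The decorator routes each call through one policy function before the tool body runs. The names `make_gateway`, `policy_check`, and `ActionBlocked` are hypothetical, and `policy_check` is not Custodian's `decide()`. The two tools are placeholders that don't actually call Slack or NVIDIA NIM.

```python
from __future__ import annotations

import functools
from typing import Any, Callable


class ActionBlocked(Exception):
    """Raised when the policy does not return ALLOW for a tool call."""


def make_gateway(policy_check: Callable[[str, dict[str, Any]], str]):
    """Build a decorator that routes every tool call through one policy function."""

    def governed(tool_name: str):
        def wrap(tool: Callable[..., Any]) -> Callable[..., Any]:
            @functools.wraps(tool)
            def guarded(**params: Any) -> Any:
                verdict = policy_check(tool_name, params)
                if verdict != "ALLOW":
                    raise ActionBlocked(f"{tool_name}: {verdict}")
                return tool(**params)  # only reached after the policy said ALLOW
            return guarded
        return wrap

    return governed


def policy_check(tool_name: str, params: dict[str, Any]) -> str:
    # Hypothetical rule: anything estimated above $2 must go to a human.
    return "ESCALATE" if params.get("est_cost_cents", 0) > 200 else "ALLOW"


governed = make_gateway(policy_check)


@governed("post_slack_message")
def post_slack_message(channel: str, text: str) -> str:
    return f"posted to {channel}"  # placeholder for a real Slack call


@governed("submit_inference_job")
def submit_inference_job(model: str, est_cost_cents: int) -> str:
    return f"queued {model}"  # placeholder for a real NIM submission


print(post_slack_message(channel="#ops", text="deploy done"))
```

Wrapped tools accept keyword arguments only. This guarantees the policy sees every parameter by name.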
41 tools work immediately with no credentials — monitoring, security, web, file, and local storage tools. 61 additional tools activate when you add your own API keys (Stripe, NVIDIA, GitHub, AWS, and more).
One pip install. Drop decide() in front of any consequential action. The rest is policy.
pip install custodian-kernel
git clone https://github.com/KeyArgo/custodian-kernel
cd custodian-kernel
pip install -e ".[dev]"
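from custodian import decide

# session_state and policy_config are supplied by your application.
result = decide(
    request="refund $200 to customer",
    state=session_state,
    policy=policy_config,
)
if result.band == "L0":
    run()        # your code that performs the action
elif result.band == "L2":
    ask_human()  # your escalation handler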
python3 verify_kit.py to prove it yourself — 90 seconds, no credentials
pip install custodian-kernel && custodian demo verify
Everything runs from pip — no cloning private repos, no credentials, no staging environment. Install the kernel, run four commands, and every security claim checks out against the live system.
pip install custodian-kernel
custodian demo verify
pip install "custodian-kernel[dev]"
pytest tests/
git clone https://github.com/KeyArgo/custodian-kernel
This is the only hackathon entry where a judge can verify every security claim from a single pip install.
Payman, Skyfire, Rain, Ramp, Catena: real fintech companies with card issuance, stablecoin rails, and compliance teams. They serve businesses that need payment infrastructure. Custodian serves a different problem: businesses that want to deploy AI agents that touch money but need to prove those agents can't go rogue. The last four rows are why we exist.
| Capability | Payman · Skyfire · Rain · Ramp · Catena | Custodian |
|---|---|---|
| Spend caps · approval · audit trail | ✓ table stakes | ✓ |
| Real card issuance & payment rails | ✓ (Ramp, Rain) | ✕ not our lane |
| Stablecoin / crypto rails | ✓ (Skyfire, Rain) | ✕ not our lane |
| SOC 2 / KYC compliance | ✓ (Payman, Catena) | ✕ early stage |
| Catches the agent lying — facts vs ground truth | ✕ none | ✓ only us |
| Enforcement below the agent — kernel, not API policy | ✕ none | ✓ only us |
| Self-hosted · non-custodial · rail-agnostic | ✕ they hold the funds | ✓ only us |
| Model-agnostic enforcement — swap Gemini, GPT, or a local DGX model; kernel safety properties don't change | ✕ coupled to their stack | ✓ LLMClient Protocol |
Our differentiator isn't payment infrastructure — it's enforcement architecture. The kernel sits underneath whatever rails and whatever model you use. Plug in Stripe, a bank API, or a stablecoin; swap Nemotron for any other model — the enforcement model doesn't change.
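The table's last row credits an `LLMClient` Protocol. Its actual methods aren't shown on this page, so the single `recommend` method below is hypothetical. The sketch uses Python's `typing.Protocol` to show two unrelated model classes satisfying the same interface. The cap check stays identical whichever one is plugged in.

```python
from __future__ import annotations

from typing import Protocol


class LLMClient(Protocol):
    """Hypothetical interface: any model that can turn a message into a recommendation."""

    def recommend(self, message: str) -> str: ...


class HostedModelStub:
    """Stand-in for a hosted API model, e.g. Gemini, GPT, or Nemotron."""

    def recommend(self, message: str) -> str:
        return "APPROVE"


class LocalModelStub:
    """Stand-in for a local model with different judgment."""

    def recommend(self, message: str) -> str:
        return "DENY" if "urgent" in message.lower() else "APPROVE"


def kernel_decide(client: LLMClient, message: str, amount_cents: int, cap_cents: int = 200) -> str:
    """The cap check is identical whichever model is plugged in."""
    recommendation = client.recommend(message)  # advisory input only
    if amount_cents > cap_cents:
        return "ESCALATE"
    return "EXECUTE" if recommendation == "APPROVE" else "ESCALATE"


for client in (HostedModelStub(), LocalModelStub()):
    print(type(client).__name__, kernel_decide(client, "Urgent refund please", 5_000))
```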
The core pattern: inference and enforcement run on different machines. The AI model recommends. The kernel decides. Those two processes have no network path between them — the agent cannot reach the enforcement layer, talk its way past it, or instruct it directly. This separation is what makes the guarantee structural rather than assumed.
The kernel runs on whatever you have. A $20/mo cloud VM. Your existing server. A laptop. The same pip install custodian-kernel that runs on a Raspberry Pi also runs on our DGX Spark GB10 — which is exactly how we know it handles anything you throw at it.
Watch a real agent recommend approving a fraudulent refund — and watch the deterministic kernel override it, with real Stripe IDs and an append-only audit trail.
Money is just the first module. The same kernel governs any consequential action an AI agent can take — provisioning, payroll, data egress, infrastructure.