Most businesses won't deploy AI near money because there's no way to verify it actually followed the rules. Custodian is the enforcement kernel that changes that — set limits in a YAML file, the OS enforces them, the agent cannot exceed them, lie about them, or approve its own escalation. This is proven, not promised. A Nous Hermes agent on NVIDIA Nemotron Super 120B runs inside a NemoClaw kernel sandbox on a DGX Spark GB10 — and the same enforcement ships as a pip package you can run anywhere.
You've been holding back on AI because you can't trust it with refunds, spend, or purchasing decisions. Custodian is the missing piece — a kernel-level enforcement layer that makes deploying AI near money actually viable. Set your rules. The kernel enforces them. Ship.
Your AI reviews every refund request. Custodian's claim verifier checks the customer's story against your actual order data — if they're lying, it's CONTRADICTED before the refund executes. You only see the genuine edge cases.
Set a $2 per-action cap and $50 session budget. Your agent runs inference jobs, makes API calls, spins up resources — fully autonomously. The moment it hits the limit, it escalates to you via SMS and stops. No surprise bills.
The self-approval problem is why finance won't sign off on AI purchasing. Custodian makes self-approval structurally impossible — the agent submits, the kernel decides, humans approve anything above the cap. The agent never holds both sides of the decision.
    # policy.yaml: kernel-enforced
    default_band: L2        # autonomous to $2/action
    per_action_cap: 2.00
    session_cap: 50.00      # SMS you at $50, then stops
    bands:
      L3:
        approval_backend: twilio_verify
curl -fsSL https://getcustodian.xyz/install.sh | bash
python3 verify_kit.py: proves the security guarantee in 90 seconds
Spend caps and approval flows are commodities now. The hard problem isn't limiting a number. It's that the agent can be wrong, or can lie, and that it shouldn't be trusted to route money through an approved path in the first place.
The control lives in their custodial cloud. The agent reaches money by calling their SDK, and safety rests on the assumption it'll use the approved path. They cap the dollar amount — but never check whether what the agent claims is even true.
The control lives in Landlock + kernel egress policy. The agent literally cannot open a socket to a payment endpoint the OS hasn't allowed. A deterministic verifier checks every fact the agent asserts against ground truth, so it can't lie its way to a payout. Non-custodial, rail-agnostic, self-hosted.
The agent reads the messy real world and makes a recommendation. Then three deterministic, zero-AI layers get the final say — and any one of them can stop the money.
Nemotron reads messy, unstructured customer messages and extracts structured claims — was it delivered? in the return window? defective? — assigns confidence, and proposes a disposition it has zero power to act on. Everything after that is deterministic code.
can be wrong · can lie · doesn't matter
Every factual claim the agent made is resolved against ground truth. A claim the data refutes is flagged CONTRADICTED before anything downstream trusts it.
deterministic · zero-AI
Bands and caps decide AUTONOMOUS / ESCALATE / DENY. Over the cap requires a real human signature (Twilio Verify SMS). The agent never holds both keys.
enforced at OS level
The agent can lie. Money still can't move wrong. When a customer invents a story to get a refund and the AI recommends approve, the verifier catches that the claim is contradicted by the ledger and the kernel overrides the AI. No competitor can demonstrate this because their model is "agent asks, check the limit", not "agent asks, check if the agent is lying."
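The band-and-cap layer described above can be pictured with a minimal Python sketch. It is not Custodian's implementation: the `Policy` fields mirror the `per_action_cap` and `session_cap` keys from the policy file, but `decide_band`, the `Outcome` enum, and the rule that non-positive amounts are denied are assumptions made for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum


class Outcome(Enum):
    AUTONOMOUS = "AUTONOMOUS"
    ESCALATE = "ESCALATE"
    DENY = "DENY"


@dataclass(frozen=True)
class Policy:
    per_action_cap: Decimal  # e.g. 2.00
    session_cap: Decimal     # e.g. 50.00


def decide_band(amount: Decimal, spent_this_session: Decimal, policy: Policy) -> Outcome:
    """Pure function: identical inputs always yield the identical outcome."""
    if amount <= 0:
        # Assumption: malformed (non-positive) amounts are rejected outright.
        return Outcome.DENY
    if amount > policy.per_action_cap:
        # Over the per-action cap: a human has to sign off.
        return Outcome.ESCALATE
    if spent_this_session + amount > policy.session_cap:
        # This action would push the session past its budget.
        return Outcome.ESCALATE
    return Outcome.AUTONOMOUS
```

Because the function takes no model output as input, nothing the agent says can change the result; only the amount and the running session total matter.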
NemoClaw is NVIDIA's OpenShell kernel sandbox — a Landlock LSM + OPA enforcement layer baked into the container boundary. Custodian's authority engine runs deterministically inside that sandbox. The agent literally cannot open a socket to a payment endpoint the kernel hasn't whitelisted — regardless of what the model decides.
Linux Security Module enforcing least-privilege file and network access at the syscall boundary. Even a compromised model cannot open a socket to an un-whitelisted endpoint. The kernel rejects it before user-space sees it.
Open Policy Agent evaluates every action request against the authority band in real time. Per-action caps ($250), rolling session windows ($1,000/2 hr), and escalation thresholds are enforced as Rego rules, not application code that can be patched around.
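As a rough illustration of a per-action cap combined with a rolling session window, here is a Python sketch using the $250 and $1,000 / 2 hr figures quoted above. The real rules are described as Rego; this is not Rego and not Custodian's code. The class name, field names, and the use of integer cents are assumptions.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RollingWindowCap:
    per_action_cap_cents: int = 25_000     # $250 per action
    window_cap_cents: int = 100_000        # $1,000 per window
    window_seconds: float = 2 * 60 * 60    # 2 hours
    _events: deque = field(default_factory=deque)  # (timestamp, amount_cents)

    def _spent(self, now: float) -> int:
        # Drop events that have aged out of the window, then total the rest.
        while self._events and self._events[0][0] <= now - self.window_seconds:
            self._events.popleft()
        return sum(amount for _, amount in self._events)

    def allow(self, amount_cents: int, now: float) -> bool:
        if amount_cents > self.per_action_cap_cents:
            return False
        if self._spent(now) + amount_cents > self.window_cap_cents:
            return False
        # Only allowed actions count against the window.
        self._events.append((now, amount_cents))
        return True
```

Passing `now` in explicitly keeps the check deterministic and easy to replay against a recorded audit trail.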
Every allow and deny emits an Open Cybersecurity Schema Framework event: tamper-evident, structured, verifiable by any SIEM. The log below is the live feed from the running sandbox right now.
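The text does not say how tamper evidence is achieved. One common technique is a hash chain, where each record commits to the one before it. The sketch below shows that general idea only; the record fields are hypothetical and are not the OCSF schema, and nothing here is taken from Custodian's source.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first record


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same body always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_event(log: list, decision: str, detail: dict) -> dict:
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"decision": decision, "detail": detail, "prev_hash": prev_hash}
    record = {**body, "hash": _digest(body)}
    log.append(record)
    return record


def verify_chain(log: list) -> bool:
    prev_hash = GENESIS
    for record in log:
        body = {key: record[key] for key in ("decision", "detail", "prev_hash")}
        if record["prev_hash"] != prev_hash or record["hash"] != _digest(body):
            return False
        prev_hash = record["hash"]
    return True
```

Editing any earlier record changes its hash, which breaks every later link, so a verifier that walks the chain detects the change.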
A real Nous Hermes agent, in a real kernel sandbox, making real Stripe API calls in test mode — protecting ArgoBox, a production AI infrastructure platform. The enforcement is production-grade. The Stripe transactions are test-mode so no real money moves during the demo. These numbers are pulled live from the running system as you read this.
The AI reads it. The verifier checks every factual claim against the real order record. When the facts don't hold, the kernel overrides the AI — even if the AI said APPROVE.
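A minimal sketch of that verify-then-override flow, assuming claims arrive as simple field/value pairs and the order record is a dictionary. Only the CONTRADICTED label comes from the text above; the other verdict names, the `Claim` type, and the choice to deny on contradiction and escalate on missing data are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    SUPPORTED = "SUPPORTED"
    CONTRADICTED = "CONTRADICTED"
    UNVERIFIABLE = "UNVERIFIABLE"  # hypothetical: no ground truth available


@dataclass(frozen=True)
class Claim:
    name: str         # e.g. "delivered"
    asserted: object  # what the agent says is true


def verify_claim(claim: Claim, order_record: dict) -> Verdict:
    if claim.name not in order_record:
        return Verdict.UNVERIFIABLE
    if order_record[claim.name] == claim.asserted:
        return Verdict.SUPPORTED
    return Verdict.CONTRADICTED


def final_disposition(agent_recommendation: str, claims: list, order_record: dict) -> str:
    verdicts = [verify_claim(claim, order_record) for claim in claims]
    if Verdict.CONTRADICTED in verdicts:
        return "DENY"      # the record refutes the story, whatever the agent said
    if Verdict.UNVERIFIABLE in verdicts:
        return "ESCALATE"  # assumption: unknowns go to a human
    return agent_recommendation
```

The agent's recommendation is only returned when every claim checks out, so an APPROVE built on a false claim never reaches the payment step.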
Think of it like a new employee at a company. They can fill out a purchase order and decide it makes sense — but they can't sign their own check. The signed check is a separate system, run by people with authority the employee doesn't have.
Custodian does the same thing for AI. The agent (Nemotron) can decide a payment makes sense. But the actual move of money goes through a second system — the kernel — that checks the amount, the session budget, and whether the agent has been tricked. The agent never holds both keys at once.
Every tool call — whether it sends an SMS, submits an NVIDIA NIM inference job, reads a GitHub PR, or posts a Slack message — passes through the same Custodian kernel before executing. One governance layer. Every tool.
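The "one governance layer, every tool" idea can be sketched as a single decorator that every tool passes through. This is an application-level illustration of the chokepoint pattern only; it does not reproduce the kernel enforcement described elsewhere on this page, and `governed`, `demo_gate`, and both example tools are hypothetical names.

```python
import functools
from typing import Callable


class ActionBlocked(Exception):
    """Raised when the gate refuses to run a tool."""


def governed(gate: Callable[[str, dict], bool]):
    """Wrap a tool so the gate sees its name and keyword arguments first."""
    def decorator(tool):
        @functools.wraps(tool)
        def wrapper(**kwargs):  # keyword-only so the gate sees named arguments
            if not gate(tool.__name__, kwargs):
                raise ActionBlocked(tool.__name__)
            return tool(**kwargs)
        return wrapper
    return decorator


def demo_gate(tool_name: str, args: dict) -> bool:
    # Hypothetical rule: block any call that would spend more than 200 cents.
    return args.get("amount_cents", 0) <= 200


@governed(demo_gate)
def send_sms(to: str, body: str) -> str:
    return f"sms to {to}: {body}"


@governed(demo_gate)
def issue_refund(order_id: str, amount_cents: int) -> str:
    return f"refunded {amount_cents} on {order_id}"
```

Calling `issue_refund(order_id="A1", amount_cents=500)` raises `ActionBlocked`, while `send_sms` and a 150-cent refund run normally through the same gate.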
41 tools work immediately with no credentials — monitoring, security, web, file, and local storage tools. 61 additional tools activate when you add your own API keys (Stripe, NVIDIA, GitHub, AWS, and more).
One pip install. Drop decide() in front of any consequential action. The rest is policy.
pip install custodian-kernel
git clone https://github.com/KeyArgo/custodian-kernel
cd custodian-kernel && pip install -e .
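    from custodian import decide

    # session_state and policy_config are supplied by your application.
    result = decide(
        request="refund $200 to customer",
        state=session_state,
        policy=policy_config,
    )

    # run() and ask_human() stand in for your own handlers.
    if result.band == "L0":
        run()
    elif result.band == "L2":
        ask_human()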
Run python3 verify_kit.py to prove it yourself: 90 seconds, no credentials.
pip install custodian-kernel && custodian demo-verify
Everything runs from pip — no cloning private repos, no credentials, no staging environment. Install the kernel, run four commands, and every security claim checks out against the live system.
pip install custodian-kernel
custodian demo-verify
pip install "custodian-kernel[dev]"
pytest tests/
git clone https://github.com/KeyArgo/custodian-kernel
This is the only hackathon entry where a judge can verify every security claim from a single pip install.
Payman, Skyfire, Rain, Ramp, and Catena are real fintech companies with card issuance, stablecoin rails, and compliance teams. They serve businesses that need payment infrastructure. Custodian serves a different problem: businesses that want to deploy AI agents that touch money but need to prove those agents can't go rogue. The bottom four rows are why we exist.
| Capability | Payman · Skyfire · Rain · Ramp · Catena | Custodian |
|---|---|---|
| Spend caps · approval · audit trail | ✓ table stakes | ✓ |
| Real card issuance & payment rails | ✓ (Ramp, Rain) | ✕ not our lane |
| Stablecoin / crypto rails | ✓ (Skyfire, Rain) | ✕ not our lane |
| SOC 2 / KYC compliance | ✓ (Payman, Catena) | ✕ early stage |
| Catches the agent lying — facts vs ground truth | ✕ none | ✓ only us |
| Enforcement below the agent — kernel, not API policy | ✕ none | ✓ only us |
| Self-hosted · non-custodial · rail-agnostic | ✕ they hold the funds | ✓ only us |
| Model-agnostic enforcement — swap Gemini, GPT, or a local DGX model; kernel safety properties don't change | ✕ coupled to their stack | ✓ LLMClient Protocol |
Our differentiator isn't payment infrastructure — it's enforcement architecture. The kernel sits underneath whatever rails and whatever model you use. Plug in Stripe, a bank API, or a stablecoin; swap Nemotron for any other model — the enforcement model doesn't change.
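To show what model-agnostic enforcement might look like in code, here is a sketch built on Python's `typing.Protocol`. The name `LLMClient` appears in the table above, but its actual methods are not documented here, so the `complete` method, `CannedModel`, `enforce`, and `handle` are all assumptions.

```python
from typing import Protocol


class LLMClient(Protocol):
    # Assumed method name; the real protocol may differ.
    def complete(self, prompt: str) -> str: ...


class CannedModel:
    """Stand-in for any model backend; satisfies LLMClient structurally."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    def complete(self, prompt: str) -> str:
        return self.reply


def enforce(amount_cents: int, cap_cents: int = 200) -> str:
    # Enforcement depends only on the amount, never on the model's output.
    return "AUTONOMOUS" if amount_cents <= cap_cents else "ESCALATE"


def handle(client: LLMClient, message: str, amount_cents: int) -> tuple:
    recommendation = client.complete(message)
    return recommendation, enforce(amount_cents)
```

Swapping `CannedModel("APPROVE")` for `CannedModel("DENY")` changes the recommendation but not the enforcement result for the same amount, which is the property the comparison table is pointing at.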
Custodian physically separates inference from enforcement. The inference model thinks. The enforcement kernel decides. Those two processes run on separate machines — the agent has no network path to the enforcement layer. The Custodian deployment here runs its enforcement kernel on a DGX Spark GB10 as proof-of-concept; the kernel itself is hardware-agnostic and ships as pip install custodian-kernel.
Watch a real agent recommend approving a fraudulent refund — and watch the deterministic kernel override it, with real Stripe IDs and an append-only audit trail.
Money is just the first module. The same kernel governs any consequential action an AI agent can take — provisioning, payroll, data egress, infrastructure.