Now selecting design partners · Founder-built

The triage queue and kill switch for production AI agents.

Catches what per-action checks miss. Pauses the gray-area calls for a human. Cites every decision — so you can ship more agents into regulated workflows.

Sample session · session-7e3a4b9c · Action 2 of 2

Paused for review

Pay $1,840.00 to Northstar Office Supply · Inv. INV-DEMO-001-DUP

Why paused · TEAM-2 / v1 · Duplicate Invoice

Pause when an action closely matches a prior action in the same session — same vendor, same amount, same payment method, different invoice id. Operator decides: legitimate re-send, or accidental duplicate?

Evidence · matched against prior action a-3e9c1f

Signal	Value	Confidence
prior_invoice_similarity_max	0.96	1.00
vendor	match	1.00
amount	match	1.00
payment_method	match (ACH)	1.00
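
The matching behind this card can be approximated in a few lines. This is a minimal sketch, not Novarch's implementation: the `Action` record, its field names, the `find_duplicate` helper, and the prior invoice id are all hypothetical, and exact equality stands in for whatever similarity scoring the product uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    """Hypothetical shape of a payment action; field names are assumptions."""
    action_id: str
    vendor: str
    amount_cents: int
    payment_method: str
    invoice_id: str


def find_duplicate(candidate: Action, prior: list) -> Optional[Action]:
    """Return the first prior action with the same vendor, amount, and
    payment method but a different invoice id, or None if there is none."""
    for p in prior:
        if (
            p.vendor == candidate.vendor
            and p.amount_cents == candidate.amount_cents
            and p.payment_method == candidate.payment_method
            and p.invoice_id != candidate.invoice_id
        ):
            return p
    return None


# The prior invoice id "INV-DEMO-001" is a hypothetical value for illustration.
first = Action("a-3e9c1f", "Northstar Office Supply", 184_000, "ACH", "INV-DEMO-001")
second = Action("a-2", "Northstar Office Supply", 184_000, "ACH", "INV-DEMO-001-DUP")

match = find_duplicate(second, [first])
decision = "pause" if match else "pass"
```

A match leads to a pause rather than a block because the rule leaves the call to the operator: a legitimate re-send and an accidental duplicate look identical at this level.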

Static sample.

Founders: Built by Sid Vemuri and Sandra Ho. Read more ›
Design partners: Selecting partners running agents in production. Apply

Why this exists

Three patterns every per-action rule misses.

Examples below use an accounts-payable agent because the dollar amounts make the stakes obvious. The same patterns show up in medical-order, support-reply, and code-commit agents.

Hidden policy

Your rules. The agent doesn’t know them.

Most company policy lives outside the agent’s training data. For an AP agent, take an invoice from a vendor in your master DB, with an exact domain match, for $4,200 (well under any approval threshold), arriving in a routine email thread. Every per-action check returns clean. But your Finance team requires an EXC-NNNN exception ticket for any payment outside the standard cycle. This payment is off-cycle, and there isn’t a ticket in the notes. The agent has no way to know that rule exists.

Novarch enforces the rule your team wrote in plain English (block any off-cycle payment without a valid EXC-NNNN exception ticket) before the action commits. Decision: blocked. Citation: TEAM-4, the missing-ticket signal, one-paragraph rationale. The finding here is the policy gap itself, not a subtle combination of signals.
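
The missing-ticket signal can be pictured as a simple text check. A minimal sketch, assuming "EXC-NNNN" means the literal prefix `EXC-` followed by exactly four digits; the function name and the `is_off_cycle` flag are hypothetical, and a real check would presumably also confirm the ticket exists and is still valid.

```python
import re

# Assumption: EXC-NNNN is "EXC-" plus exactly four digits.
EXC_TICKET = re.compile(r"\bEXC-\d{4}\b")


def check_off_cycle_payment(is_off_cycle: bool, notes: str) -> str:
    """Return "block" for an off-cycle payment whose notes contain no
    exception ticket reference; otherwise return "pass".

    Only the ticket format is checked here, not whether the ticket is real."""
    if is_off_cycle and not EXC_TICKET.search(notes):
        return "block"
    return "pass"


print(check_off_cycle_payment(True, "Routine thread, please pay by Friday."))
print(check_off_cycle_payment(True, "Approved under EXC-0042 by Finance."))
```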

Per-action checks pass

Clean to every rule. Wrong in context.

An action can pass every per-action rule and still be wrong in context. For an AP agent: an invoice from a vendor in your master DB, whose bank routing changed four days ago, for $24,800 (just under your $25K threshold), from a domain one character off the real one. Every individual check still passes.

Novarch’s session judge applies a rule your team wrote in plain English — block payments where vendor bank changed within 14 days AND amount is within 15% of the approval threshold AND source-domain similarity is suspicious — to the signals and the agent’s own reasoning. Decision: blocked. Citation: TEAM-1, three signal IDs, one-paragraph rationale.
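
The three conjuncts of this rule can be written out as a deterministic check. This is only an approximation for illustration: on this page the rule is applied by an LLM judge, not by code like this. The `Signals` record and both cutoffs marked below are assumptions; in particular, the source does not define "suspicious" or say whether "within 15%" includes amounts above the threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    vendor_bank_change_days_ago: int
    amount_to_threshold_ratio: float
    domain_similarity_to_canonical: float


# Assumed cutoff: a domain counts as suspicious when it is very close to,
# but not identical to, the canonical vendor domain.
SUSPICIOUS_MIN = 0.85


def bec_pattern_matches(s: Signals) -> bool:
    """True only when all three conjuncts of the rule hold."""
    bank_changed_recently = s.vendor_bank_change_days_ago <= 14
    # Assumption: "within 15%" is read as 85% to just under 100% of threshold.
    near_threshold = 0.85 <= s.amount_to_threshold_ratio < 1.0
    lookalike_domain = SUSPICIOUS_MIN <= s.domain_similarity_to_canonical < 1.0
    return bank_changed_recently and near_threshold and lookalike_domain


demo = Signals(
    vendor_bank_change_days_ago=4,
    amount_to_threshold_ratio=0.992,
    domain_similarity_to_canonical=0.94,
)
print("block" if bec_pattern_matches(demo) else "pass")
```

Each conjunct alone is unremarkable, which is why no per-action check fires; only the conjunction matches the rule.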

Defensible record

“The LLM said so” is not an audit trail.

Free-form LLM rationales drift between runs. They cite no specific evidence. They can’t be replayed. Your CRO has to defend agent decisions to a regulator with a document, not a chat log.

Every Novarch decision cites the rule that fired, the signals it weighed, and the rationale — pinned to an exact model version and prompt template, replayable on demand. The audit document is rendered from database rows, not written by an LLM. That’s what makes it credible.
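
Rendering from rows rather than generating prose can be sketched as plain string formatting. The row layout, key names, and `render_audit` function below are hypothetical; the values are taken from the sample decision on this page, except `prompt_template`, which is a placeholder.

```python
def render_audit(row: dict, signals: list) -> str:
    """Format one decision row and its signal rows as plain text.

    Pure string formatting: the same rows always yield the same document."""
    lines = [
        f"Decision: {row['decision']}",
        f"Session: {row['session_id']} | Action: {row['action_id']}",
        f"Rule: {row['rule_id']} / v{row['rule_version']}",
        f"Model: {row['model']}",
        f"Prompt template: {row['prompt_template']}",
        "Signals:",
    ]
    for s in signals:
        lines.append(
            f"  {s['name']} = {s['value']} (confidence {s['confidence']:.2f})"
        )
    return "\n".join(lines)


row = {
    "decision": "blocked",
    "session_id": "session-7e3a4b9c",
    "action_id": "action-2f1e",
    "rule_id": "TEAM-1",
    "rule_version": 1,
    "model": "claude-haiku-4-5-20251001",
    "prompt_template": "judge-template-v1",  # hypothetical placeholder value
}
signals = [
    {"name": "vendor_bank_change_days_ago", "value": 4, "confidence": 0.98},
    {"name": "amount_to_threshold_ratio", "value": 0.992, "confidence": 1.00},
    {"name": "domain_similarity_to_canonical", "value": 0.94, "confidence": 0.91},
]

document = render_audit(row, signals)
print(document)
```

Because nothing in the rendering step is sampled, rendering the same rows twice produces byte-identical output, which is the property a free-form rationale lacks.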

How it works

Signals, then a judge, then a human.

No magic. Each stage is open, inspectable, and replayable.

01 · Per action

Signals.

Deterministic measurements pulled before the judge runs: vendor bank-change recency, amount-to-threshold ratio, source-domain similarity to canonical, prior-action similarity within the session, the agent’s own captured reasoning. Computable. Replayable. No LLM.
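
Three of these signals are simple enough to sketch directly. The function names are hypothetical, the dates and domains are made-up examples, and `difflib.SequenceMatcher` is only a stand-in: the page does not say which similarity metric is used.

```python
from datetime import date
from difflib import SequenceMatcher


def bank_change_days_ago(changed_on: date, today: date) -> int:
    """Whole days since the vendor's bank details last changed."""
    return (today - changed_on).days


def amount_to_threshold_ratio(amount: float, threshold: float) -> float:
    """Payment amount as a fraction of the approval threshold."""
    return round(amount / threshold, 3)


def domain_similarity(source: str, canonical: str) -> float:
    """Similarity in [0, 1]; 1.0 means the domains are identical.

    Assumption: difflib's ratio stands in for the real metric."""
    return SequenceMatcher(None, source.lower(), canonical.lower()).ratio()


# Hypothetical inputs: a bank change four days before the payment date,
# the demo amount and threshold, and a domain with one character swapped.
days = bank_change_days_ago(date(2025, 1, 6), date(2025, 1, 10))
ratio = amount_to_threshold_ratio(24_800, 25_000)
similarity = domain_similarity("helixlogistlcs.example", "helixlogistics.example")
print(days, ratio, round(similarity, 2))
```

None of these functions depends on a model or on hidden state, so rerunning them on the same inputs reproduces the same values.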

What the judge can cite.

02 · Per session

Judge.

One LLM call per action. Applies your team’s plain-English rules to the signals plus the agent’s reasoning plus prior actions in the session. Pinned model, temperature zero, structured output with rule and signals cited on every decision — pass, pause, or block.
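
One way to picture "structured output with rule and signals cited" is a strict parser on the judge's response. This sketch covers only validation; it makes no model call, and the JSON field names, the `JudgeDecision` record, and `parse_judge_output` are hypothetical.

```python
import json
from dataclasses import dataclass

ALLOWED_DECISIONS = {"pass", "pause", "block"}


@dataclass(frozen=True)
class JudgeDecision:
    decision: str
    rule_id: str
    rule_version: int
    signals_cited: tuple
    rationale: str


def parse_judge_output(raw: str, known_signals: set) -> JudgeDecision:
    """Parse the judge's JSON and reject anything off-schema.

    A decision may only cite signals that were actually computed."""
    data = json.loads(raw)
    if data["decision"] not in ALLOWED_DECISIONS:
        raise ValueError(f"unknown decision: {data['decision']!r}")
    unknown = set(data["signals_cited"]) - known_signals
    if unknown:
        raise ValueError(f"cited signals were never computed: {sorted(unknown)}")
    return JudgeDecision(
        decision=data["decision"],
        rule_id=data["rule_id"],
        rule_version=int(data["rule_version"]),
        signals_cited=tuple(data["signals_cited"]),
        rationale=data["rationale"],
    )


known = {
    "vendor_bank_change_days_ago",
    "amount_to_threshold_ratio",
    "domain_similarity_to_canonical",
}
raw = json.dumps({
    "decision": "block",
    "rule_id": "TEAM-1",
    "rule_version": 1,
    "signals_cited": sorted(known),
    "rationale": "All three conjuncts of the rule are satisfied.",
})
parsed = parse_judge_output(raw, known)
print(parsed.decision, parsed.rule_id)
```

Rejecting citations to signals that were never computed keeps every decision tied to measurements that can be replayed.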

Decides in one call.

03 · Per pause

Operator.

When the judge pauses an action, the operator sees one card. The rule that fired. The cited signals. The agent’s reasoning. The vendor’s history. Approve, deny, or skip. The audit row records who decided, when, and against what version of the rule.

Decides what shouldn’t be automated.

Adding a rule can only tighten enforcement: when several rules apply, the most restrictive one wins. Rules are versioned plain English; every decision cites the rule_id, the rule version, the signals weighed, the judge’s model SHA, and the prompt template that produced it. Replayable on demand.
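
The "most-restrictive rule wins" property can be sketched as taking a maximum over an ordered set of outcomes. The ordering pass < pause < block and the `combine` helper are assumptions made for illustration.

```python
# Assumed ordering of outcomes, from least to most restrictive.
SEVERITY = {"pass": 0, "pause": 1, "block": 2}


def combine(decisions: list) -> str:
    """Return the most restrictive outcome; with no rules at all, pass."""
    return max(decisions, key=SEVERITY.__getitem__, default="pass")


before = combine(["pass", "pause"])
after = combine(["pass", "pause", "block"])  # one more rule added
print(before, after)
```

Because the result is a maximum, adding another rule's outcome to the list can raise the combined decision or leave it unchanged, but never lower it.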

Watch this

Each individual check passes. The session judge kills it anyway.

A business-email-compromise (BEC) pattern from the demo.

14:32:01 ● Receive invoice · Helix Logistics · $24,800.00 · passes
14:32:02 ● Match PO · #PO-22041 · passes
14:32:03 ● Verify vendor in master DB · Helix Logistics · passes
14:32:04 ◯ vendor_bank_change_days_ago = 4 · signal
14:32:05 ◯ amount_to_threshold_ratio = 0.992 · signal
14:32:06 ◯ domain_similarity_to_canonical = 0.94 · signal

Team rule fires · TEAM-1 / v1 · BEC Pattern

Block payments where vendor bank changed within 14 days AND amount is within 15% of the approval threshold AND source-domain similarity to the canonical vendor domain is suspicious. All three conjuncts required.

14:32:08 ■ Session judge: BLOCKED, payment never executes · blocked

The audit row this produced is shown below.

The artifact your CRO defends

This is what your CRO hands a regulator.

The judge’s pinned model and prompt-template version are stamped on every decision. Replayable on demand.

Decision · committed 14:32:08 UTC

Payment blocked.

session-7e3a4b9c · action-2f1e · model claude-haiku-4-5-20251001

Rule cited · TEAM-1 / v1 · BEC Pattern

Block payments where vendor bank changed within 14 days AND amount is within 15% of the approval threshold AND source-domain similarity to the canonical vendor domain is suspicious. All three conjuncts required.

Signals weighed · 3

Signal	Value	Confidence
vendor_bank_change_days_ago	4	0.98
amount_to_threshold_ratio	0.992	1.00
domain_similarity_to_canonical	0.94	0.91

Judge rationale

The combination of a recent bank-routing change (4 days ago), a payment amount within 0.8% of the $25,000 approval threshold, and a source-domain similarity of 0.94 against the canonical vendor domain indicates a high-likelihood business-email-compromise pattern. None of the per-action checks failed; the conjunction of these three signals satisfies the cited rule’s test.

Who’s behind it

Founders

Sid Vemuri — product manager on the Microsoft Fabric Consumer AI experience; previously PM lead for Power BI Copilot evaluations. MS in Machine Learning, Georgia Tech.

Sandra Ho — applied AI engineer on Microsoft’s Security and AI Research team. Builds eval and observability for Microsoft Security Copilot; co-author of CTI-REALM, an open-source benchmark for AI in detection engineering. Carnegie Mellon University.

Design partners

We’re working with a small number of design partners running production AI agents in regulated workflows: accounts payable, claims, support reply, code commit. The program is free in exchange for weekly product feedback and the right to write up the integration. Apply if you’ve got an agent in production and need defensible decision records before you scale further.