Novarch

The team

Two engineers building runtime governance, from the inside out.

We’ve spent the last two years building the evaluation and observability infrastructure that decides whether production Microsoft Copilot products are safe to ship. Novarch is what happens when you turn that discipline — pinned models, cited rules, structured evidence, operator‑in‑the‑loop on borderline cases — into a runtime layer your customers can put in front of their own agents.

Portrait of Sid Vemuri

Co-founder · Product

Sid Vemuri

Product manager on the Microsoft Fabric Consumer AI experience.

Previously the PM lead for evaluation on Microsoft Power BI Copilot. The work: defining the metrics, building the test frameworks, and figuring out whether the quality signals the team relied on actually reflected what users experienced. Those evaluations informed architecture decisions and shipped measurable quality improvements into a product millions of analysts use.

Background: MS in Machine Learning, Georgia Institute of Technology. Lead author on a CogSci 2024 paper studying how AI models capture human-like concepts.

Now
Microsoft · Fabric Consumer AI
Prior
Microsoft · Power BI Copilot Evals (PM lead)
Education
Georgia Tech · MS, Machine Learning
Research
Lead author, CogSci 2024
LinkedIn

Portrait of Sandra Ho

Co-founder · Engineering

Sandra Ho

Applied AI engineer on Microsoft’s Security and AI Research team.

Builds the observability and evaluation harnesses for Microsoft Security Copilot, and runs evaluations against Security Copilot and frontier models across a wide range of security tasks. The practical question her work answers: under realistic adversary conditions, how does a given model actually behave inside a detection-engineering workflow?

Co-author of CTI-REALM, an open-source benchmark that evaluates AI on realistic attack emulations, real telemetry, and the full detection-engineering workflow. Microsoft’s EVP of Security, Igor Tsyganskiy, cited the benchmark publicly when announcing Microsoft’s Project Glasswing collaboration with Anthropic.

Now
Microsoft · Security and AI Research
Ships
Eval & observability for Security Copilot + frontier-model security evals
Open source
CTI-REALM · co-author
Education
Carnegie Mellon University
LinkedIn

Why this team

Eval discipline first. Then a product around it.

Most of the load-bearing decisions in Novarch came from the same place: spending our day jobs inside Microsoft asking whether a given AI system is actually good enough to ship into a workflow that matters. One LLM call per action. A pinned model SHA. Structured output that cites a rule and supporting signals on every decision. An audit document rendered from database rows rather than written by a model. An operator who decides borderline cases instead of trusting a free-form rationale.
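Concretely, a decision record in that spirit might look something like this. A minimal sketch only; every name and field here is illustrative, not Novarch's actual schema:

```python
from dataclasses import dataclass

# Illustrative shape of one governance decision. All field names and
# values are hypothetical -- a sketch of the pattern, not Novarch's schema.
@dataclass(frozen=True)
class GovernanceDecision:
    action_id: str      # the agent action being judged
    model_sha: str      # pinned model identifier that produced the verdict
    rule_id: str        # the specific rule cited for this decision
    signals: tuple      # structured evidence backing the verdict
    verdict: str        # "allow", "block", or "escalate"

def route(decision: GovernanceDecision) -> str:
    """Borderline cases go to a human operator rather than being
    resolved by a free-form model rationale."""
    if decision.verdict == "escalate":
        return "operator_queue"
    return decision.verdict

d = GovernanceDecision(
    action_id="act-001",
    model_sha="a1b2c3d",
    rule_id="R-12",
    signals=("tool_call:payments.refund", "amount>threshold"),
    verdict="escalate",
)
print(route(d))  # → operator_queue
```

Because every row carries its rule and signals, the audit document can be rendered directly from stored records like these, with no model in the rendering path.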

Runtime governance is downstream of that question. Novarch is the version of the answer your customers can run on agents you didn’t train.