Runtime safety for AI agents.

AI agents reason in natural language. The consequences are real money, real emails, real production code. Chimera is the deterministic layer that sits between an agent and its tools, so "mostly safe" becomes "provably safe."
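To make "deterministic layer" concrete, here is a minimal sketch of what a policy gate between an agent and its tools can look like. All names here (PolicyEngine, Decision, wire_transfer, the rule signature) are our illustration, not Chimera's actual API:

    from dataclasses import dataclass
    from typing import Any, Callable

    @dataclass(frozen=True)
    class Decision:
        allowed: bool
        reason: str

    # A rule inspects one tool call and either decides or abstains (None).
    Rule = Callable[[str, dict[str, Any]], Decision | None]

    class PolicyEngine:
        """Evaluates every tool call against explicit, deterministic rules."""

        def __init__(self, rules: list[Rule]):
            self.rules = rules

        def check(self, tool: str, args: dict[str, Any]) -> Decision:
            for rule in self.rules:
                decision = rule(tool, args)
                if decision is not None:
                    return decision  # first rule that decides wins
            return Decision(False, "no rule matched: deny by default")

    # Example rule: hard-cap wire transfers regardless of what the model wants.
    def cap_transfers(tool: str, args: dict[str, Any]) -> Decision | None:
        if tool != "wire_transfer":
            return None
        if args.get("amount_usd", 0) > 10_000:
            return Decision(False, "amount exceeds hard limit")
        return Decision(True, "within limit")

    engine = PolicyEngine([cap_transfers])
    print(engine.check("wire_transfer", {"amount_usd": 50_000}))
    # Decision(allowed=False, reason='amount exceeds hard limit')

The point of the sketch: the gate is plain code with a deny-by-default fallback, so the same input always produces the same decision, whatever the model upstream says.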

The problem

Today's agent safety is probabilistic.

Modern AI agents call tools, move money, send emails, write code. The safety stack of RLHF, instruction hierarchies, and prompt filters catches most attacks. "Most" isn't enough when an agent is approving wire transfers or filing compliance reports.

Our research benchmarked GPT-4o, Claude Sonnet 4, and Gemini 2.0 Flash against 73 adversarial scenarios across 12 categories. Frontier models scored between 75% and 86% attack resistance. Six attacks bypassed every model on every run. Same input, three different providers, identical failures. The vulnerabilities are structural, not provider-specific.

Adversarial scenarios: 73
Attack categories: 12
API calls in study: 1,062
Universal bypasses: 6
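
For readers who want those figures pinned down: attack resistance is the share of runs on which a model blocks the attack, and a universal bypass is a scenario that succeeds against every model on every run. A hypothetical sketch of that tally, assuming per-run results are recorded as (scenario, model, succeeded) tuples; the schema is illustrative, not the study's actual data format:

    from collections import defaultdict

    # Each benchmark run: (scenario_id, model_name, attack_succeeded).
    # Values are illustrative; the study logged 1,062 such calls.
    runs = [
        ("S01", "gpt-4o", False),
        ("S01", "claude-sonnet-4", True),
        ("S02", "gemini-2.0-flash", False),
    ]

    blocked = defaultdict(int)
    total = defaultdict(int)
    always_succeeded = defaultdict(lambda: True)  # scenario -> bypassed every run?
    models_hit = defaultdict(set)                 # scenario -> models it bypassed

    for scenario, model, succeeded in runs:
        total[model] += 1
        if not succeeded:
            blocked[model] += 1
        always_succeeded[scenario] &= succeeded
        if succeeded:
            models_hit[scenario].add(model)

    # Attack resistance: fraction of runs on which the model blocked the attack.
    for model in total:
        print(f"{model}: {blocked[model] / total[model]:.0%} attack resistance")

    # Universal bypasses: scenarios that succeeded on every run, every model.
    universal = [s for s in always_succeeded
                 if always_succeeded[s] and len(models_hit[s]) == len(total)]
    print(f"{len(universal)} universal bypasses")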
What we build

Three pieces, one system.

Why now

The compliance window is open.

EU AI Act enforcement begins in 2026. AI agents are moving from labs into production. Companies that figure out runtime safety first will own the category, the way Sentry owned errors, Vercel owned deploys, and Snyk owned dependencies.

We're building from a research foundation, not a vibes-based one: peer-reviewable benchmarks, formally verifiable policies, signed audit trails. The kind of infrastructure that holds up under regulatory scrutiny five years from now.
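
As one example of what "signed audit trails" can mean in practice, here is a hypothetical sketch of a tamper-evident audit record using an HMAC over the decision contents. The schema and key handling are illustrative assumptions, not Chimera's actual format:

    import hashlib
    import hmac
    import json
    import time

    # In practice this would be a managed key (KMS/HSM), not a constant.
    SECRET_KEY = b"illustrative-key-only"

    def signed_audit_record(tool: str, args: dict, allowed: bool, reason: str) -> dict:
        """Log one policy decision with an HMAC so the trail is tamper-evident."""
        record = {
            "ts": time.time(),
            "tool": tool,
            "args": args,
            "allowed": allowed,
            "reason": reason,
        }
        payload = json.dumps(record, sort_keys=True).encode()
        record["sig"] = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
        return record

    def verify(record: dict) -> bool:
        """Recompute the HMAC over everything except the signature."""
        body = {k: v for k, v in record.items() if k != "sig"}
        payload = json.dumps(body, sort_keys=True).encode()
        expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, record["sig"])

    entry = signed_audit_record(
        "wire_transfer", {"amount_usd": 50_000}, False, "amount exceeds hard limit"
    )
    assert verify(entry)

Because each record carries a signature over its own contents, an auditor can detect any after-the-fact edit to the trail without trusting the system that produced it.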

Who built this

From neuro-symbolic research to runtime infrastructure.

Project Chimera began as a research effort in neuro-symbolic causal AI: an attempt to give learned models the kind of structured, verifiable reasoning that pure deep learning couldn't guarantee. Working on it, we kept hitting the same wall. The interesting failures weren't in the model. They were in what the model decided to do at runtime.

That insight reshaped the project. The same deterministic principles we used for reasoning became the basis for a runtime policy layer. Today Chimera is the safety infrastructure between AI agents and the consequences of their decisions.

We're early. We're shipping fast. If you're building agents in production and runtime safety is on your mind, we want to hear from you.