# The AI Model Is Rented. TandemTrace's Ontology Isn't.

**Category:** Architecture · **Author:** TandemTrace research · **Date:** 2026-06-07 · **Reading time:** 12 min · **Tags:** ontology, enrichment, correlation, moat, architecture

> Why every serious "AI SOC" product lives or dies on the layer nobody demos.

Every AI security vendor shows you the same thing: a chat box over a SIEM. You type a question, an LLM answers, the room nods. It looks like the product.

It isn't. The chat box is the last 5% of the work, and it's the easy 5%. The product — the part that took years, the part competitors can't clone in a weekend — is the **ontology layer** that sits between your raw telemetry and the model. It's the part nobody puts on a slide, because "we translate billions of noisy events into a few security-meaningful entities the AI can actually reason about" doesn't demo as well as a blinking cursor.

This post is about that layer in TandemTrace: what it is, why an AI SOC can't function without it, where it's genuinely hard — and what it looks like when one alert moves through it end to end.

> **~2 min** to run a full alert investigation, end to end · **~$0.10** model cost per full investigation.

## TL;DR

- The chat box is the easy 5%. The **ontology layer** underneath it — extract, enrich, correlate, score — is the product.
- Point an LLM at a raw SIEM and it drowns in volume, pattern-matches noise into confident false findings, and misses the one fact that decides the verdict — because that fact **isn't in the logs**.
- Five layers, four of them ahead of the model: **normalize → enrich → correlate → score → reason**. The LLM only ever sees the refined tip.
- The model is rented and commoditized. The ontology — pre-built and tuned for SOC telemetry — is the **moat**.

## What an ontology layer actually is

An ontology is a shared model of the things in your world and how they connect. In security operations, those "things" are hosts, identities, IP addresses, processes, alerts, and the behaviors that link them. The ontology layer is the machinery that turns `4625 from 10.2.3.4` into:

> *This is a failed logon against a **domain controller**, from a **service account** with a known lockout history, from an IP that is a **known authorized scanner**, and it's the 4,000th identical event this hour when the 30-day baseline is zero.*

The raw event is four fields. The ontology output is a *situation* — and that situation is what a reasoning engine, human or AI, actually needs.

TandemTrace structures this as a pipeline of layers, each one killing a specific failure mode before the LLM ever runs. Each stage narrows the firehose — and the LLM only ever sees the tip:

```
THE PIPELINE FUNNEL — billions in, one verdict out

1  Normalize    ████████████████████  ~1B raw events / day, every vendor dialect
2  Enrich       ███████████████       + roles · baselines · MITRE · reputation
3  Correlate    █████████             ~ dozens of correlated behaviors
4  Score        █████                 benign suppressed · severity-ranked
5  AI Reasoning ██                    1 cited verdict

   in: noise  ───────────────────────────────►  out: signal an analyst can act on
```

Layer by layer, what each one does and the failure mode it removes:

| # | Layer | What it does | What it kills |
|---|-------|--------------|---------------|
| 1 | **Extract / Normalize** | SIEM-agnostic adapters (Elasticsearch, CrowdStrike, Sentinel, Splunk, Coralogix) translate every vendor's dialect into a common entity model | "Every source speaks a different language" |
| 2 | **Enrich** | Asset roles, service accounts, authorized tools, network topology, IP reputation, MITRE mapping, per-entity historical baselines | "The AI doesn't know what it's looking at" |
| 3 | **Cluster / Correlate** | Temporal windows, entity-overlap grouping, TTP-chain correlation collapse event storms into a handful of behaviors | Volume + noise |
| 4 | **Score / Prioritize** | Severity scoring, suppression of known-benign patterns, per-customer false-positive floors | "AI treats everything as equally suspicious" |
| 5 | **AI Reasoning** | Claude investigates, hypothesizes, verifies, and explains — over clean, contextualized input, with citations back to source events | Hallucination |

Those five are the layers we can put in a blog post. They are not the layers that win. Beneath them, TandemTrace runs a stack of proprietary enrichers and scorers — the part that took years, and the part a competitor can't clone from a diagram:

```
// Layers 6 – N · proprietary · NDA only
  6   [ ████████████████████████████ ]   Locked
  7   [ ███████████████████████████████ ] Locked
  8   [ ██████████████████████████ ]      Locked
  …   and several more we don't name in public
```

The five public layers turn raw logs into a *defensible* verdict. The proprietary ones turn a defensible verdict into a *great* one: the behavioral models, the suppression learning, and the per-environment tuning that are the actual moat. So they don't go in a blog post. (See them under NDA: hello@tandemtrace.ai.)

The order matters. The LLM is the *last* step, not the first. Everything before it exists to make its answer correct.
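To make the Normalize layer concrete, here is a minimal Python sketch of two adapters that emit the same entity shape. Everything in it is an assumption for illustration: the `AuthEvent` class, the adapter names, and the input field names (one nested layout loosely modeled on Elastic-style documents, one flat layout loosely modeled on table-style Windows event rows). It is not TandemTrace's actual entity model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthEvent:
    """Hypothetical common entity model for one authentication event."""
    actor: str   # identity that attempted the logon
    target: str  # host that received the attempt
    source: str  # source IP address
    action: str  # normalized action name


# Windows Security event IDs mapped to normalized actions.
ACTIONS = {"4624": "auth.success", "4625": "auth.failure"}


def from_nested(doc: dict) -> AuthEvent:
    """Adapter for a nested document (field names are assumed)."""
    return AuthEvent(
        actor=doc["user"]["name"].lower(),
        target=doc["host"]["name"].lower(),
        source=doc["source"]["ip"],
        action=ACTIONS[str(doc["event"]["code"])],
    )


def from_flat(row: dict) -> AuthEvent:
    """Adapter for a flat row (field names are assumed)."""
    return AuthEvent(
        actor=row["TargetUserName"].lower(),
        target=row["Computer"].lower(),
        source=row["IpAddress"],
        action=ACTIONS[str(row["EventID"])],
    )


nested_doc = {
    "event": {"code": "4625"},
    "user": {"name": "svc-backup"},
    "host": {"name": "DC01"},
    "source": {"ip": "10.2.3.4"},
}
flat_row = {
    "EventID": 4625,
    "TargetUserName": "SVC-BACKUP",
    "Computer": "dc01",
    "IpAddress": "10.2.3.4",
}

# Two dialects, one shape: downstream layers never see the difference.
normalized = from_nested(nested_doc)
print(normalized)
print(normalized == from_flat(flat_row))
```

The design point is that every later layer is written once against `AuthEvent`, so adding a SIEM means adding an adapter, not touching enrichment or scoring.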

## One alert, end to end

Abstract layers are easy to nod along to and hard to trust. So here is the same `4625` from the top of this post, walked through all five layers — the way it actually moves through the pipeline:

- **IN — Raw event.** A Windows `4625` failed logon: account `svc-backup`, host `dc01`, source `10.2.3.4`. One of roughly a billion events the SIEM ingested today.
- **1 — Normalize.** Mapped to the common entity model: actor = identity `svc-backup`, target = host `dc01`, action = auth.failure. Same shape whether it arrived from Splunk, Sentinel, or CrowdStrike.
- **2 — Enrich.** `dc01` resolves to role: domain controller (tier 0). `svc-backup` resolves to a service account with prior lockout history. `10.2.3.4` is on the authorized-scanner allowlist. The 30-day baseline for this triple: zero.
- **3 — Correlate.** 4,000 identical failures this hour, all the same actor/target/source — collapsed into one behavior, not 4,000 alerts. No follow-on successful logon, no lateral movement in the window.
- **4 — Score.** Authorized scanner + service account + no successful auth + no follow-on = a known-benign pattern. The per-customer false-positive floor applies; severity drops below paging threshold.
- **5 — Reason.** Claude writes the verdict over the enriched, correlated input — *not* the raw logs — and every claim cites the events above.

**Verdict:** Benign — an authorized scanner hammering a service account with a stale credential. Recommend rotating the credential to stop the lockouts. **4,000 raw events → one explained, cited verdict. No analyst woken at 3 a.m.**

Hand those same 4,000 raw events to a bare LLM and the best case is it summarizes them; the worst case is it spots "4,000 failed logons against a domain controller" and pages someone for a brute-force attack that was never happening. The difference isn't the model. It's everything that ran before it.
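The walkthrough above can be sketched in a few lines of Python. This is a toy version under loud assumptions: the lookup tables, the `correlate` and `score` names, and the single benign rule are hypothetical stand-ins for the enrich, correlate, and score steps, not TandemTrace's actual logic.

```python
from collections import Counter

# Hypothetical per-customer context; the post says this lives in configuration.
ASSET_ROLES = {"dc01": "domain_controller"}
SERVICE_ACCOUNTS = {"svc-backup"}
AUTHORIZED_SCANNERS = {"10.2.3.4"}


def correlate(events):
    """Collapse identical (actor, target, source, action) events into counted behaviors."""
    return Counter(events)


def score(behavior, count, follow_on_success):
    """Attach context to one behavior and apply a single toy benign rule."""
    actor, target, source, action = behavior
    context = {
        "target_role": ASSET_ROLES.get(target, "unknown"),
        "service_account": actor in SERVICE_ACCOUNTS,
        "authorized_scanner": source in AUTHORIZED_SCANNERS,
        "count": count,
    }
    # Toy rule: failed auth + authorized scanner + service account + no follow-on success.
    benign = (
        action == "auth.failure"
        and context["authorized_scanner"]
        and context["service_account"]
        and not follow_on_success
    )
    context["severity"] = "suppressed" if benign else "page"
    return context


BEHAVIOR = ("svc-backup", "dc01", "10.2.3.4", "auth.failure")
events = [BEHAVIOR] * 4000  # the event storm from the walkthrough

behaviors = correlate(events)
for behavior, count in behaviors.items():
    print(behavior, score(behavior, count, follow_on_success=False))
```

Change the source IP to one that is not on the allowlist, or set `follow_on_success=True`, and the same 4,000 events score as `page`. The verdict turns on context that is held outside the log lines themselves.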

## Why it's not optional for an AI SOC

The naive pitch — "just point an LLM at your SIEM" — fails in four ways, and each one is fatal in a security context.

**Volume.** A mid-size SIEM produces millions to billions of events a day. No context window holds that. Truncate or sample, and you throw away the one outlier event that mattered — and in security, the signal *is* the outlier you can't afford to drop.

**Noise.** 99.9%+ of telemetry is benign. An LLM handed raw logs spends its reasoning budget on irrelevance, then pattern-matches noise into a confident, plausible-sounding false finding. In a SOC, a hallucinated incident burns analyst trust as fast as a missed one.

**Missing context.** The fact that decides the verdict — *this host is a DC, this account is a service account, this source is your authorized pentest, this is the first time in 30 days* — **is not in the logs at all.** It lives in the ontology. Without it, the model invents a narrative. Plausible-but-wrong is the default output of an LLM reasoning over context-free data.

**Cost and latency.** Even if the raw data fit, reasoning over it token-by-token is slow and expensive — for a worse answer. Tokens spent re-deriving "is this IP internal?" are tokens not spent on the actual investigation.

And the obvious objections don't hold:

| The shortcut | Why it doesn't save you |
|--------------|-------------------------|
| "Just use a **bigger context window**." | A 10-million-token window full of benign events is a 10-million-token distraction. It solves volume and nothing else — the noise and missing-context problems are untouched. |
| "Just **RAG** the logs." | Similarity search retrieves documents that *look like* the question. Investigation needs entities related **by behavior** — temporal chains, identity pivots, TTP progressions. Correlation is a graph problem, not a retrieval problem. |
| "Just **point the LLM at the SIEM**." | It drowns in volume, burns its budget on noise, and invents a narrative for the context that was never in the logs. Confident, fluent, wrong. |
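To illustrate why correlation is a graph problem, here is a minimal sketch of entity-overlap grouping as connected components, using a union-find structure. The alert shape (`id`, `entities`) and the function name are assumptions for illustration, and the sketch leaves out the temporal windows and TTP chains a real correlator would also need.

```python
def group_by_entity_overlap(alerts):
    """Group alerts that share at least one entity, directly or transitively."""
    parent = list(range(len(alerts)))

    def find(i):
        # Walk to the root, shortening the path as we go (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # entity -> index of the first alert that mentioned it
    for i, alert in enumerate(alerts):
        for entity in alert["entities"]:
            if entity in first_seen:
                # Merge this alert's component with the earlier alert's component.
                parent[find(i)] = find(first_seen[entity])
            else:
                first_seen[entity] = i

    groups = {}
    for i, alert in enumerate(alerts):
        groups.setdefault(find(i), []).append(alert["id"])
    return sorted(groups.values())


alerts = [
    {"id": "a1", "entities": ["user:svc-backup", "host:dc01"]},
    {"id": "a2", "entities": ["host:dc01", "ip:10.2.3.4"]},
    {"id": "a3", "entities": ["user:t.nguyen", "host:ws-114"]},
    {"id": "a4", "entities": ["ip:10.2.3.4", "host:fs02"]},
]

print(group_by_entity_overlap(alerts))
# → [['a1', 'a2', 'a4'], ['a3']]
```

Alerts `a1` and `a4` share no entity directly and would not look similar to a text retriever. They still land in one behavior because `a2` links them through `dc01` and `10.2.3.4`.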

This is the same lesson Palantir spent two decades proving for general enterprise data: raw data is useless to a reasoning engine until it's mapped into an ontology. Their AIP thesis, as we read it, is that the model is a commodity and the ontology is the moat. We applied the identical principle to security operations, except the ontology ships pre-built and tuned for SOC telemetry, deployable in days rather than as a forward-deployed engineering engagement.

## Why it works

- **Grounded answers.** Because the ontology preserves links back to source events, every AI conclusion can be cited and verified. Grounding isn't a prompt trick — it's an architectural property.
- **Hallucination is constrained by construction.** The model reasons over vetted, correlated facts, not a firehose. It can't confidently attribute traffic to a threat actor that doesn't exist if the actor was never in the curated input.
- **Cheaper and faster inference.** A clean, enriched summary is a fraction of the tokens of raw logs — and produces a *better* answer.
- **Vendor-agnostic.** One hunting rule, one entity model, across every supported SIEM dialect. Swap the SIEM underneath; the reasoning layer doesn't notice.
- **Customer-specific without being hardcoded.** Asset roles, baselines, authorized tools, and tuning all live in configuration and resolve at runtime. The same pipeline understands every environment it's deployed into.
- **Composable autonomy.** Once entities and relationships are first-class, agents can hunt, correlate across days, and escalate with consistent semantics — instead of re-parsing strings on every pass.

And it's not just cleaner — it's measurably cheaper and faster than reasoning over raw logs:

- **99.9%+** of telemetry filtered as benign before the model ever sees it.
- **~2 min** to run a full alert investigation, end to end.
- **~$0.10** model cost per full investigation.

That last number is worth sitting with, because the rest of the industry is an order of magnitude north of it. One public estimate puts a single Claude-based AI-SOC investigation at **$1–$3**; call it ~$2. TandemTrace runs a *full* investigation for about **$0.10**, at comparable latency. That's not a discount; it's the ontology paying for itself: the model reasons over a refined situation, not a firehose of raw tokens.

| Cost per full investigation | |
|---|---|
| Typical AI SOC | ~$2.00 ($1–$3 range) |
| **TandemTrace** | **~$0.10** (~2 min) |

*Industry figure: RunReveal, ["The cost of an AI SOC investigation"](https://blog.runreveal.com/ai-soc-investigation-cost-token-pricing/) (2026) — $1–$3 per investigation on Claude Sonnet/Opus, ~90s in their worked example. The two run at comparable speed — the gap is cost, and cost follows directly from how few tokens a refined situation takes to reason over.*

At 100 alerts a day that's the difference between roughly **$200** and **$10** — every day, on the work an autonomous SOC does most.
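The arithmetic behind those daily figures, as a short sketch. The per-investigation costs are the ones quoted in this post (with ~$2 as the midpoint of the cited $1–$3 range); the variable names are illustrative, and amounts are held in integer cents to avoid floating-point rounding.

```python
ALERTS_PER_DAY = 100

# Per-investigation model cost in US cents, taken from the figures in this post.
TYPICAL_CENTS = 200      # ~$2.00, midpoint of the cited $1-$3 range
TANDEMTRACE_CENTS = 10   # ~$0.10

typical_daily = ALERTS_PER_DAY * TYPICAL_CENTS
tandemtrace_daily = ALERTS_PER_DAY * TANDEMTRACE_CENTS

print(f"Typical AI SOC: ${typical_daily / 100:.2f} per day")
print(f"TandemTrace:    ${tandemtrace_daily / 100:.2f} per day")
print(f"Ratio:          {typical_daily // tandemtrace_daily}x")
```

At the edges of the cited range, the typical figure runs from $100 to $300 a day for the same 100 alerts.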

## Where it's hard

None of this is free, and the costs shape the architecture. The honest version — the part most vendor blogs skip:

- **Identity resolution.** `t.nguyen`, `t.nguyen@acme.example`, and `ACME\t.nguyen` are one person, spelled three ways by three different sources. Get it wrong and your "cross-source correlation" is silently joining on luck. Deterministic normalization covers the easy cases; the hard ones need an authoritative identity bridge that may not even exist in a given deployment. Identity quietly eats the most engineering.
- **Garbage in, honest out.** The ontology can't manufacture context the source never captured. If the SIEM aggregates credential sprays by ASN instead of by IP, IP-keyed enrichment has nothing to key on — so it returns INCONCLUSIVE rather than a confident guess. We choose to fail honestly.
- **It's never done.** SIEM schemas, identity providers, and attacker tradecraft all change underneath you. The ontology is the highest-maintenance part of an AI SOC, and every layer is a place a verdict can flip. So nothing that moves a live verdict ships without a flag that defaults to off and a shadow-mode harness that measures the change first. That discipline is a cost. It's also exactly why it's a moat.
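The identity-resolution point can be made concrete with a minimal sketch of the deterministic part. The `canonical_identity` function and the `domain_aliases` map (short domain name to DNS domain) are hypothetical, and real deployments have many more spellings than the three handled here.

```python
def canonical_identity(raw: str, domain_aliases: dict) -> str:
    """Reduce common account spellings to 'user@domain' when the domain is known."""
    value = raw.strip().lower()
    if "\\" in value:
        # DOMAIN\user form: translate the short domain name if we know it.
        short_name, user = value.split("\\", 1)
        domain = domain_aliases.get(short_name)
    elif "@" in value:
        # user@domain form: already carries its domain.
        user, domain = value.rsplit("@", 1)
    else:
        # Bare username: there is no domain evidence to work with.
        user, domain = value, None
    return f"{user}@{domain}" if domain else user


aliases = {"acme": "acme.example"}  # assumed mapping for this example

for raw in ("t.nguyen", "t.nguyen@acme.example", r"ACME\t.nguyen"):
    print(f"{raw!r:28} -> {canonical_identity(raw, aliases)}")
```

Two of the three spellings collapse to `t.nguyen@acme.example`. The bare `t.nguyen` stays unresolved, because nothing in the string says which domain it belongs to. That leftover is the hard case: joining it to the other two requires an authoritative source, not string handling.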

## The verdict

An AI SOC without an ontology layer is a chat box that confidently makes things up. An AI SOC *with* one is slower to build, harder to maintain, and full of tradeoffs you have to engineer around — and it's also the only version that produces answers a SOC analyst can trust enough to act on.

The model is rented. Anyone can call Claude. The understanding you feed it — the extraction, enrichment, correlation, and scoring that turn ore into refined metal before the model ever runs — is the part that's yours. In an AI SOC, the ontology layer isn't preprocessing for the product — **it is the product.**

---

*See it run on your own alerts — [request a demo](https://tandemtrace.ai/#demo). Disagree? Send the counter-argument: hello@tandemtrace.ai.*