The AI model is rented. TandemTrace's ontology isn't.

Why every serious "AI SOC" product lives or dies on the layer nobody demos.

Every AI security vendor shows you the same thing: a chat box over a SIEM. You type a question, an LLM answers, the room nods. It looks like the product.

It isn't. The chat box is the last 5% of the work, and it's the easy 5%. The product — the part that took years, the part competitors can't clone in a weekend — is the ontology layer that sits between your raw telemetry and the model. It's the part nobody puts on a slide, because "we translate billions of noisy events into a few security-meaningful entities the AI can actually reason about" doesn't demo as well as a blinking cursor.

This post is about that layer in TandemTrace: what it is, why an AI SOC can't function without it, where it's genuinely hard — and what it looks like when one alert moves through it end to end.

~2 min to run a full alert investigation, end to end

~$0.10 model cost per full investigation

// TL;DR

The chat box is the easy 5%. The ontology layer underneath it — extract, enrich, correlate, score — is the product.
Point an LLM at a raw SIEM and it drowns in volume, pattern-matches noise into confident false findings, and misses the one fact that decides the verdict — because that fact isn't in the logs.
Five layers, and the model is only the last one: normalize → enrich → correlate → score → reason. The LLM only ever sees the refined tip.
The model is rented and commoditized. The ontology — pre-built and tuned for SOC telemetry — is the moat.

What an ontology layer actually is

An ontology is a shared model of the things in your world and how they connect. In security operations, those "things" are hosts, identities, IP addresses, processes, alerts, and the behaviors that link them. The ontology layer is the machinery that turns "4625 from 10.2.3.4" (a Windows failed-logon event ID and a source address) into:

// The transform · 4 fields → a situation

Raw event

event_id: 4625
src_ip: 10.2.3.4
account: svc-backup
host: dc01

→

Ontology output

Failed logon against a domain controller, from a service account with a known lockout history, from a known authorized scanner — and it's the 4,000th identical event this hour when the 30-day baseline is zero.

The raw event is four fields. The ontology output is a situation — and that situation is what a reasoning engine, human or AI, actually needs.
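As a rough illustration, the transform can be sketched in Python. This is a minimal sketch, not TandemTrace's implementation: the context tables (`ASSET_ROLES`, `SERVICE_ACCOUNTS`, `AUTHORIZED_SCANNERS`) and the function `to_situation` are hypothetical stand-ins for asset inventory, identity data, and customer configuration. The hourly count and 30-day baseline are passed in as if a correlation layer had already computed them.

```python
from dataclasses import dataclass

# Hypothetical context tables. In a real deployment these would be resolved
# at runtime from asset inventory, the identity provider, and customer config.
ASSET_ROLES = {"dc01": "domain controller (tier 0)"}
SERVICE_ACCOUNTS = {"svc-backup": {"prior_lockouts": True}}
AUTHORIZED_SCANNERS = {"10.2.3.4"}

# Windows Security event IDs: 4624 = successful logon, 4625 = failed logon.
ACTIONS = {4624: "auth.success", 4625: "auth.failure"}


@dataclass(frozen=True)
class RawEvent:
    event_id: int
    src_ip: str
    account: str
    host: str


def to_situation(ev: RawEvent, count_this_hour: int, baseline_30d: int) -> dict:
    """Join four raw fields with context into the facts a reasoner needs."""
    svc = SERVICE_ACCOUNTS.get(ev.account)
    return {
        "action": ACTIONS.get(ev.event_id, "unknown"),
        "target": ev.host,
        "target_role": ASSET_ROLES.get(ev.host, "unknown"),
        "actor": ev.account,
        "is_service_account": svc is not None,
        "prior_lockouts": bool(svc and svc["prior_lockouts"]),
        "source": ev.src_ip,
        "source_is_authorized_scanner": ev.src_ip in AUTHORIZED_SCANNERS,
        "count_this_hour": count_this_hour,
        "baseline_30d": baseline_30d,
    }


situation = to_situation(
    RawEvent(event_id=4625, src_ip="10.2.3.4", account="svc-backup", host="dc01"),
    count_this_hour=4000,
    baseline_30d=0,
)
for key, value in situation.items():
    print(f"{key}: {value}")
```

The design point is that every fact in the output comes from a lookup, not from the model; the LLM never has to guess that dc01 is a domain controller.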

TandemTrace structures this as a pipeline of layers, each one killing a specific failure mode before the LLM ever runs. Each stage narrows the firehose — and the LLM only ever sees the tip:

// The pipeline funnel · billions in → one verdict out

1 · Normalize
~1B raw events / day, every vendor dialect

2 · Enrich
+ roles · baselines · MITRE · reputation

3 · Correlate
dozens of correlated behaviors

4 · Score
benign suppressed · severity-ranked

5 · AI Reasoning
1 cited verdict

In: noise · Out: signal an analyst can act on

Layer by layer, what each one does and the failure mode it removes:

#	Layer	What it does	What it kills
1	Extract / Normalize	SIEM-agnostic adapters (including Elasticsearch, CrowdStrike, Sentinel, Splunk, and Coralogix) translate every vendor's dialect into a common entity model	"Every source speaks a different language"
2	Enrich	Asset roles, service accounts, authorized tools, network topology, IP reputation, MITRE mapping, per-entity historical baselines	"The AI doesn't know what it's looking at"
3	Cluster / Correlate	Temporal windows, entity-overlap grouping, TTP-chain correlation collapse event storms into a handful of behaviors	Volume + noise
4	Score / Prioritize	Severity scoring, suppression of known-benign patterns, per-customer false-positive floors	"AI treats everything as equally suspicious"
5	AI Reasoning	Claude investigates, hypothesizes, verifies, and explains — over clean, contextualized input, with citations back to source events	Hallucination
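Layer 1 is the easiest to picture in code: one adapter per source, all emitting the same entity shape. A minimal sketch, assuming two hypothetical source dialects (`dialect_a`, `dialect_b`). The field names are invented for illustration and are not the actual Elasticsearch, CrowdStrike, Sentinel, Splunk, or Coralogix schemas.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Windows Security event IDs mapped to normalized actions.
WINDOWS_ACTIONS = {4624: "auth.success", 4625: "auth.failure"}


@dataclass(frozen=True)
class EntityEvent:
    """Common entity model: every adapter must emit this shape."""
    actor: str   # identity
    target: str  # host
    source: str  # source IP
    action: str  # normalized action, e.g. "auth.failure"


def from_dialect_a(rec: dict) -> EntityEvent:
    # Hypothetical flat dialect with numeric event codes.
    return EntityEvent(
        actor=rec["user"].lower(),
        target=rec["dest"].lower(),
        source=rec["src"],
        action=WINDOWS_ACTIONS.get(int(rec["EventCode"]), "unknown"),
    )


def from_dialect_b(rec: dict) -> EntityEvent:
    # Hypothetical nested dialect with string IDs and upper-case names.
    return EntityEvent(
        actor=rec["identity"].lower(),
        target=rec["device"]["name"].lower(),
        source=rec["network"]["ip"],
        action=WINDOWS_ACTIONS.get(int(rec["evt"]["id"]), "unknown"),
    )


ADAPTERS: Dict[str, Callable[[dict], EntityEvent]] = {
    "dialect_a": from_dialect_a,
    "dialect_b": from_dialect_b,
}


def normalize(source: str, rec: dict) -> EntityEvent:
    """Route a raw record to its source's adapter."""
    return ADAPTERS[source](rec)


a = normalize("dialect_a", {"EventCode": 4625, "user": "svc-backup",
                            "dest": "dc01", "src": "10.2.3.4"})
b = normalize("dialect_b", {"evt": {"id": "4625"}, "identity": "SVC-BACKUP",
                            "device": {"name": "DC01"}, "network": {"ip": "10.2.3.4"}})
print(a == b)  # True: both dialects describe the same failed logon
```

Everything downstream (enrichment, correlation, scoring) is written once against `EntityEvent`; adding a source means adding an adapter, not touching the later layers.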

Those five are the layers we can put in a blog post. They are not the layers that win. Beneath them, TandemTrace runs a stack of proprietary enrichers and scorers — the part that took years, and the part a competitor can't clone from a diagram:

// Layers 6–N · proprietary · NDA only

6 · Cross-environment behavioral correlation engine · locked
7 · Adaptive false-positive learning from analyst feedback · locked
8 · Per-entity risk-trajectory and drift modeling · locked
… and several more we don't name in public · locked

The five public layers turn raw logs into a defensible verdict. These turn a defensible verdict into a great one — the behavioral models, the suppression learning, and the per-environment tuning that are the actual moat. So they don't go in a blog post. See them under NDA →

The order matters. The LLM is the last step, not the first. Everything before it exists to make its answer correct.
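That ordering can be sketched as a tiny pipeline runner. Everything here is hypothetical and deliberately simplified: the stage functions are toy versions of normalize, enrich, correlate, and score, and a lambda stands in for the LLM call. The sketch shows only the shape of the idea: deterministic stages narrow the input, and the reasoner runs last on whatever survives.

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]
Stage = Callable[[List[Event]], List[Event]]

AUTHORIZED_SOURCES = {"10.2.3.4"}  # hypothetical customer allowlist


def normalize(evs: List[Event]) -> List[Event]:
    # Toy normalization: canonical lower-case account names.
    return [{**e, "account": str(e["account"]).lower()} for e in evs]


def enrich(evs: List[Event]) -> List[Event]:
    # Toy enrichment: tag events from allowlisted sources.
    return [{**e, "authorized_source": e["src_ip"] in AUTHORIZED_SOURCES} for e in evs]


def correlate(evs: List[Event]) -> List[Event]:
    # Collapse identical (account, host, src_ip) events into one behavior with a count.
    groups: Dict[tuple, Event] = {}
    for e in evs:
        key = (e["account"], e["host"], e["src_ip"])
        if key not in groups:
            groups[key] = {**e, "count": 0}
        groups[key]["count"] += 1
    return list(groups.values())


def score(evs: List[Event]) -> List[Event]:
    # Toy scoring: suppress known-benign behaviors (a real scorer would also rank).
    return [e for e in evs if not e["authorized_source"]]


def run_pipeline(events: List[Event], stages: List[Tuple[str, Stage]],
                 reason: Callable[[List[Event]], str]):
    """Run cheap deterministic stages in order; call the expensive reasoner last."""
    sizes = {"in": len(events)}
    for name, stage in stages:
        events = stage(events)
        sizes[name] = len(events)
    verdict = reason(events) if events else "nothing to escalate"
    return verdict, sizes


raw: List[Event] = [{"account": "SVC-BACKUP", "host": "dc01", "src_ip": "10.2.3.4"}] * 4000
raw += [{"account": "t.nguyen", "host": "dc01", "src_ip": "203.0.113.7"}]

stages = [("normalize", normalize), ("enrich", enrich),
          ("correlate", correlate), ("score", score)]
# Stand-in for the LLM call: it only ever receives what survived scoring.
verdict, sizes = run_pipeline(raw, stages, lambda evs: f"{len(evs)} behavior(s) to investigate")
print(sizes)
print(verdict)
```

With 4,001 raw events in, the stand-in reasoner receives a single behavior; the scanner's 4,000 failures were collapsed and then suppressed before it ran.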

One alert, end to end

Abstract layers are easy to nod along to and hard to trust. So here is the same 4625 from the top of this post, walked through all five layers — the way it actually moves through the pipeline:

// IN · Raw event

A Windows 4625 failed logon: account svc-backup, host dc01, source 10.2.3.4. One of roughly a billion events the SIEM ingested today.

// 1 · Normalize

Mapped to the common entity model: actor = identity svc-backup, target = host dc01, action = auth.failure. Same shape whether it arrived from Splunk, Sentinel, or CrowdStrike.

// 2 · Enrich

dc01 resolves to role: domain controller (tier 0). svc-backup resolves to a service account with prior lockout history. 10.2.3.4 is on the authorized-scanner allowlist. The 30-day baseline for this triple: zero.

// 3 · Correlate

4,000 identical failures this hour, all the same actor/target/source — collapsed into one behavior, not 4,000 alerts. No follow-on successful logon, no lateral movement in the window.

// 4 · Score

Authorized scanner + service account + no successful auth + no follow-on = a known-benign pattern. The per-customer false-positive floor applies; severity drops below paging threshold.

// 5 · Reason

Claude writes the verdict over the enriched, correlated input — not the raw logs — and every claim cites the events above.

Verdict: Benign. An authorized scanner is hammering a service account with a stale credential. Recommend updating the credential stored on the scanner to stop the lockouts. 4,000 raw events → one explained, cited verdict. No analyst woken at 3 a.m.

Hand those same 4,000 raw events to a bare LLM and the best case is it summarizes them; the worst case is it spots "4,000 failed logons against a domain controller" and pages someone for a brute-force attack that was never happening. The difference isn't the model. It's everything that ran before it.
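The Correlate and Score steps of that walkthrough can be approximated in a few dozen lines. A minimal sketch under stated assumptions: the severity numbers (`PAGING_THRESHOLD`, `TIER0_BASE_SEVERITY`, `KNOWN_BENIGN_SEVERITY`) are hypothetical, the target is assumed to be tier 0 as in the example, and the lateral-movement check is omitted for brevity. It is not TandemTrace's scoring model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical 0-100 severity scale and paging threshold.
PAGING_THRESHOLD = 70
TIER0_BASE_SEVERITY = 85   # assumes a tier-0 target, as in the walkthrough
KNOWN_BENIGN_SEVERITY = 20


@dataclass(frozen=True)
class Event:
    ts: int      # seconds since the start of the correlation window
    actor: str
    target: str
    source: str
    action: str  # e.g. "auth.failure", "auth.success"


def correlate(events: List[Event], window_s: int = 3600) -> List[Dict]:
    """Collapse identical failures inside one window into behaviors with counts."""
    in_window = [e for e in events if 0 <= e.ts < window_s]
    counts = Counter((e.actor, e.target, e.source) for e in in_window
                     if e.action == "auth.failure")
    behaviors = []
    for (actor, target, source), n in counts.items():
        # Did the same actor ever succeed in this window?
        follow_on = any(e.action == "auth.success" and e.actor == actor
                        for e in in_window)
        behaviors.append({"actor": actor, "target": target, "source": source,
                          "count": n, "follow_on_success": follow_on})
    return behaviors


def score(behavior: Dict, ctx: Dict) -> Dict:
    """Apply the known-benign rule, then compare severity to the paging threshold."""
    known_benign = (ctx["authorized_scanner"] and ctx["service_account"]
                    and not behavior["follow_on_success"])
    severity = KNOWN_BENIGN_SEVERITY if known_benign else TIER0_BASE_SEVERITY
    return {"known_benign": known_benign, "severity": severity,
            "pages": severity >= PAGING_THRESHOLD}


events = [Event(ts=i % 3600, actor="svc-backup", target="dc01",
                source="10.2.3.4", action="auth.failure") for i in range(4000)]
behaviors = correlate(events)
ctx = {"authorized_scanner": True, "service_account": True}
print(len(behaviors), behaviors[0]["count"], score(behaviors[0], ctx))

# The same behavior without the allowlist context looks like brute force.
print(score(behaviors[0], {**ctx, "authorized_scanner": False}))
```

Running the same behavior through `score` with the scanner allowlist switched off pushes it over the paging threshold. That is the bare-LLM failure mode in miniature: the events are identical, and only the context differs.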

Why it's not optional for an AI SOC

The naive pitch — "just point an LLM at your SIEM" — fails in four ways, and each one is fatal in a security context.

Volume. A mid-size SIEM produces millions to billions of events a day. No context window holds that. Truncate or sample, and you throw away the one outlier event that mattered — and in security, the signal is the outlier you can't afford to drop.

Noise. 99.9%+ of telemetry is benign. An LLM handed raw logs spends its reasoning budget on irrelevance, then pattern-matches noise into a confident, plausible-sounding false finding. In a SOC, a hallucinated incident burns analyst trust as fast as a missed one.

Missing context. The fact that decides the verdict — this host is a DC, this account is a service account, this source is your authorized pentest, this is the first time in 30 days — is not in the logs at all. It lives in the ontology. Without it, the model invents a narrative. Plausible-but-wrong is the default output of an LLM reasoning over context-free data.

Cost and latency. Even if the raw data fit, reasoning over it token-by-token is slow and expensive — for a worse answer. Tokens spent re-deriving "is this IP internal?" are tokens not spent on the actual investigation.
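"Is this IP internal?" is a good example of a question that should never cost a token. A minimal sketch using Python's standard `ipaddress` module, assuming a hypothetical per-customer list of internal ranges (`CUSTOMER_INTERNAL`):

```python
import ipaddress
from typing import List

# Hypothetical customer topology: the ranges this customer declares internal.
# "Internal" is a per-environment fact, so it lives in configuration, not in a prompt.
CUSTOMER_INTERNAL: List[ipaddress.IPv4Network] = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_internal(ip: str) -> bool:
    """Deterministic lookup: no tokens, no model call, same answer every time."""
    addr = ipaddress.ip_address(ip)
    # Membership across IP versions (v6 address, v4 network) is simply False.
    return any(addr in net for net in CUSTOMER_INTERNAL)


for ip in ["10.2.3.4", "8.8.8.8", "2001:db8::1"]:
    print(ip, is_internal(ip))
```

The ranges come from configuration rather than from the module's built-in `is_private` flag, because what counts as internal differs from one environment to the next.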

And the obvious objections don't hold:

The shortcut	Why it doesn't save you
"Just use a bigger context window."	A 10-million-token window full of benign events is a 10-million-token distraction. It solves volume and nothing else; the noise and missing-context problems are untouched.
"Just RAG the logs."	Similarity search retrieves documents that look like the question. Investigation needs entities related by behavior: temporal chains, identity pivots, TTP progressions. Correlation is a graph problem, not a retrieval problem.
"Just point the LLM at the SIEM."	It drowns in volume, burns its budget on noise, and invents a narrative for the context that was never in the logs. Confident, fluent, wrong.
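To make "graph problem" concrete, here is an identity pivot written as a breadth-first search over an entity graph. A minimal sketch, assuming hypothetical entity IDs and edges (two entities share an edge if they appeared in the same event). It illustrates the technique and is not TandemTrace's correlation engine.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected entity graph: an edge means two entities appeared in the same event."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def pivot(graph: Dict[str, Set[str]], start: str, max_hops: int) -> Dict[str, int]:
    """Breadth-first pivot: every entity within max_hops of start, with its distance."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # don't expand past the hop limit
        for nb in graph.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


# Hypothetical chain: external IP -> user -> workstation -> service account -> DC.
edges = [
    ("ip:203.0.113.7", "user:t.nguyen"),
    ("user:t.nguyen", "host:ws-42"),
    ("host:ws-42", "user:svc-backup"),
    ("user:svc-backup", "host:dc01"),
]
graph = build_graph(edges)
print(pivot(graph, "ip:203.0.113.7", max_hops=4))
```

The domain controller sits four hops from the external IP and shares no text with it. A similarity search seeded with the IP has nothing pointing at dc01; the graph walk reaches it because the behavior chain connects them.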

This is the same lesson Palantir spent two decades proving for general enterprise data: raw data is useless to a reasoning engine until it's mapped into an ontology. Their entire AIP thesis is "the model is a commodity; the ontology is the moat." We applied the identical principle to security operations — except the ontology ships pre-built and tuned for SOC telemetry, deployable in days rather than as a forward-deployed engineering engagement.

Why it works

Grounded answers. Because the ontology preserves links back to source events, every AI conclusion can be cited and verified. Grounding isn't a prompt trick — it's an architectural property.
Hallucination is constrained by construction. The model reasons over vetted, correlated facts, not a firehose. It can't confidently attribute traffic to a threat actor that doesn't exist if the actor was never in the curated input.
Cheaper and faster inference. A clean, enriched summary is a fraction of the tokens of raw logs — and produces a better answer.
Vendor-agnostic. One hunting rule, one entity model, across seven SIEM dialects. Swap the SIEM underneath; the reasoning layer doesn't notice.
Customer-specific without being hardcoded. Asset roles, baselines, authorized tools, and tuning all live in configuration and resolve at runtime. The same pipeline understands every environment it's deployed into.
Composable autonomy. Once entities and relationships are first-class, agents can hunt, correlate across days, and escalate with consistent semantics — instead of re-parsing strings on every pass.
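The grounding property can be enforced mechanically rather than requested in a prompt. A minimal sketch, assuming each claim in a verdict carries the IDs of the source events it relies on (`Claim`, `ungrounded`, and the event IDs are hypothetical). Any claim with no citation, or citing an event that was not in the curated input, is rejected before the verdict ships.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Claim:
    text: str
    cites: List[str] = field(default_factory=list)  # IDs of source events


def ungrounded(claims: List[Claim], source_ids: Set[str]) -> List[Claim]:
    """Claims with no citation, or citing an event the model was never given."""
    return [c for c in claims
            if not c.cites or any(cid not in source_ids for cid in c.cites)]


source_ids = {"evt-001", "evt-002"}  # hypothetical IDs of events in the curated input
claims = [
    Claim("4,000 failed logons against dc01 in one hour", ["evt-001"]),
    Claim("Source 10.2.3.4 is an authorized scanner", ["evt-002"]),
    Claim("Activity matches a known threat actor", ["evt-999"]),
    Claim("The account was compromised last week"),
]
rejected = ungrounded(claims, source_ids)
print([c.text for c in rejected])
```

Because the check runs on structure (claim → event IDs), it catches an invented threat actor the same way it catches a typo in a citation: neither points at anything the model was actually given.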

And it's not just cleaner. It's measurably cheaper and faster than reasoning over raw logs, because almost none of the firehose ever reaches the model:

99.9%+ of telemetry filtered as benign before the model ever sees it

That filtering is what makes the ~$0.10-per-investigation figure possible, and the number is worth sitting with, because the rest of the industry is an order of magnitude north of it. Public estimates put a single Claude-based AI-SOC investigation at $1–$3; call it ~$2. TandemTrace runs a full investigation for about $0.10, at comparable latency (about 2 minutes end to end). That's not a discount; it's the ontology paying for itself. The model reasons over a refined situation, not a firehose of raw tokens.

// Cost per full investigation · ≈ 20× cheaper

Typical AI SOC: ~$2.00 ($1–$3 range)

TandemTrace: ~$0.10 (sub-2-minute)

Industry figure: RunReveal, "The cost of an AI SOC investigation" (2026) — $1–$3 per investigation on Claude Sonnet/Opus, ~90s in their worked example. TandemTrace figure: full alert investigation, sub-2-minute median. The two run at comparable speed — the gap is cost, and cost follows directly from how few tokens a refined situation takes to reason over.

At 100 alerts a day that's the difference between roughly $200 and $10 — every day, on the work an autonomous SOC does most.
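The arithmetic behind those figures, as a quick check. The ~$2 and ~$0.10 values are the post's own figures; holding them in integer cents simply avoids floating-point rounding.

```python
# Figures from the text above, held in integer cents to avoid float rounding.
TYPICAL_CENTS = 200      # ~$2.00, midpoint of the $1–$3 public estimate
TANDEMTRACE_CENTS = 10   # ~$0.10 per full investigation
ALERTS_PER_DAY = 100

ratio = TYPICAL_CENTS // TANDEMTRACE_CENTS
typical_daily = TYPICAL_CENTS * ALERTS_PER_DAY
tandemtrace_daily = TANDEMTRACE_CENTS * ALERTS_PER_DAY

print(f"{ratio}x cheaper per investigation")
print(f"typical: ${typical_daily / 100:.2f}/day, TandemTrace: ${tandemtrace_daily / 100:.2f}/day")
print(f"difference: ${(typical_daily - tandemtrace_daily) / 100:.2f}/day")
```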

Where it's hard

None of this is free, and the costs shape the architecture. The honest version — the part most vendor blogs skip:

Identity resolution. t.nguyen, their email address, and ACME\t.nguyen are one person to three different sources. Get it wrong and your "cross-source correlation" is silently joining on luck. Deterministic normalization covers the easy cases; the hard ones need an authoritative identity bridge that may not even exist in a given deployment. Identity quietly eats the most engineering.
Garbage in, honest out. The ontology can't manufacture context the source never captured. If the SIEM aggregates credential sprays by ASN instead of by IP, IP-keyed enrichment has nothing to key on — so it returns INCONCLUSIVE rather than a confident guess. We choose to fail honestly.
It's never done. SIEM schemas, identity providers, and attacker tradecraft all change underneath you. The ontology is the highest-maintenance part of an AI SOC — and every layer is a place a verdict can flip, so nothing that moves a live verdict ships without a flag defaulting off and a shadow-mode harness that measures the change first. That discipline is a cost. It's also exactly why it's a moat.
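The first two points can be sketched together. A minimal sketch, assuming a hypothetical authoritative identity bridge (`IDENTITY_BRIDGE`, with an invented `acme.example` domain) and a hypothetical IP reputation feed (`IP_REPUTATION`). Deterministic normalization only trims and lowercases, the bridge does the actual joining, and IP-keyed enrichment returns `INCONCLUSIVE` when the event carries no IP to key on.

```python
from enum import Enum
from typing import Dict, Optional


class Verdict(Enum):
    MALICIOUS = "malicious"
    BENIGN = "benign"
    INCONCLUSIVE = "inconclusive"


# Hypothetical authoritative identity bridge (e.g., exported from the IdP).
# It maps whole aliases, so it never guesses that two domains mean one person.
IDENTITY_BRIDGE: Dict[str, str] = {
    "t.nguyen": "person:0042",
    "t.nguyen@acme.example": "person:0042",
    "acme\\t.nguyen": "person:0042",
}

# Hypothetical IP reputation feed.
IP_REPUTATION: Dict[str, Verdict] = {"198.51.100.9": Verdict.MALICIOUS}


def resolve_identity(raw: str) -> Optional[str]:
    """Deterministic normalization (trim, lowercase), then an exact bridge lookup."""
    return IDENTITY_BRIDGE.get(raw.strip().lower())


def ip_enrichment(event: Dict) -> Verdict:
    """IP-keyed enrichment: with no IP to key on, say so instead of guessing."""
    ip = event.get("src_ip")
    if ip is None:
        return Verdict.INCONCLUSIVE
    # An IP the feed has never seen is also inconclusive for this enricher.
    return IP_REPUTATION.get(ip, Verdict.INCONCLUSIVE)


print(resolve_identity("ACME\\T.Nguyen"), resolve_identity(" t.nguyen "))
print(resolve_identity("tnguyen"))                   # unresolved: None, not a fuzzy match
print(ip_enrichment({"asn": 64500, "count": 1200}))  # aggregated by ASN: no IP to key on
```

Returning `None` for an unknown alias, instead of stripping domains and matching on the bare username, is the design choice that keeps correlation from joining on luck.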

The verdict

An AI SOC without an ontology layer is a chat box that confidently makes things up. An AI SOC with one is slower to build, harder to maintain, and full of tradeoffs you have to engineer around — and it's also the only version that produces answers a SOC analyst can trust enough to act on.

The model is rented. Anyone can call Claude. The understanding you feed it — the extraction, enrichment, correlation, and scoring that turn ore into refined metal before the model ever runs — is the part that's yours. In an AI SOC, the ontology layer isn't preprocessing for the product — it is the product.

See it run on your own alerts.

TandemTrace deploys with the ontology pre-built and tuned for SOC telemetry — live in days, not a forward-deployed engagement. Point it at your noisiest detector and watch the event storm become a handful of cited verdicts.


Disagree? Send the counter-argument: [email protected].