Why every serious "AI SOC" product lives or dies on the layer nobody demos.
Every AI security vendor shows you the same thing: a chat box over a SIEM. You type a question, an LLM answers, the room nods. It looks like the product.
It isn't. The chat box is the last 5% of the work, and it's the easy 5%. The product — the part that took years, the part competitors can't clone in a weekend — is the ontology layer that sits between your raw telemetry and the model. It's the part nobody puts on a slide, because "we translate billions of noisy events into a few security-meaningful entities the AI can actually reason about" doesn't demo as well as a blinking cursor.
This post is about that layer in TandemTrace: what it is, why an AI SOC can't function without it, where it's genuinely hard — and what it looks like when one alert moves through it end to end.
// TL;DR
- The chat box is the easy 5%. The ontology layer underneath it — extract, enrich, correlate, score — is the product.
- Point an LLM at a raw SIEM and it drowns in volume, pattern-matches noise into confident false findings, and misses the one fact that decides the verdict — because that fact isn't in the logs.
- Four layers run before the model: normalize → enrich → correlate → score. The fifth, reasoning, is the model itself, and it only ever sees the refined tip.
- The model is rented and commoditized. The ontology — pre-built and tuned for SOC telemetry — is the moat.
What an ontology layer actually is
An ontology is a shared model of the things in your world and how they connect. In security
operations, those "things" are hosts, identities, IP addresses, processes, alerts, and the
behaviors that link them. The ontology layer is the machinery that turns
4625 from 10.2.3.4 into:
src_ip: 10.2.3.4
account: svc-backup
host: dc01
The raw event is four fields. The ontology output is a situation — and that situation is what a reasoning engine, human or AI, actually needs.
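As a minimal sketch of that step from raw event to situation, the snippet below attaches context that the log line itself never carried. The lookup tables, the `Situation` class, and the `enrich` function are illustrative assumptions, not TandemTrace's actual data model; the values are the ones used for this example later in the post.

```python
from dataclasses import dataclass, field

# Hypothetical context tables. In a real deployment these would be resolved
# from asset inventory, identity, and allowlist configuration.
ASSET_ROLES = {"dc01": "domain controller (tier 0)"}
ACCOUNT_TYPES = {"svc-backup": "service account"}
AUTHORIZED_SCANNERS = {"10.2.3.4"}


@dataclass
class Situation:
    event_id: int
    src_ip: str
    account: str
    host: str
    context: dict = field(default_factory=dict)


def enrich(raw: dict) -> Situation:
    """Attach context that is not present in the raw event itself."""
    s = Situation(raw["event_id"], raw["src_ip"], raw["account"], raw["host"])
    s.context["host_role"] = ASSET_ROLES.get(s.host, "unknown")
    s.context["account_type"] = ACCOUNT_TYPES.get(s.account, "unknown")
    s.context["src_is_authorized_scanner"] = s.src_ip in AUTHORIZED_SCANNERS
    return s


raw_event = {"event_id": 4625, "src_ip": "10.2.3.4",
             "account": "svc-backup", "host": "dc01"}
situation = enrich(raw_event)
```

Unknown entities fall back to "unknown" rather than a guess, so a gap in context stays visible to whatever reasons over the result.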
TandemTrace structures this as a pipeline of layers, each one killing a specific failure mode before the LLM ever runs. Each stage narrows the firehose, and the LLM only ever sees the tip.
Layer by layer, what each one does and the failure mode it removes:
| # | Layer | What it does | What it kills |
|---|---|---|---|
| 1 | Extract / Normalize | SIEM-agnostic adapters (Elasticsearch, CrowdStrike, Sentinel, Splunk, Coralogix) translate every vendor's dialect into a common entity model | "Every source speaks a different language" |
| 2 | Enrich | Asset roles, service accounts, authorized tools, network topology, IP reputation, MITRE mapping, per-entity historical baselines | "The AI doesn't know what it's looking at" |
| 3 | Cluster / Correlate | Temporal windows, entity-overlap grouping, TTP-chain correlation collapse event storms into a handful of behaviors | Volume + noise |
| 4 | Score / Prioritize | Severity scoring, suppression of known-benign patterns, per-customer false-positive floors | "AI treats everything as equally suspicious" |
| 5 | AI Reasoning | Claude investigates, hypothesizes, verifies, and explains — over clean, contextualized input, with citations back to source events | Hallucination |
Those five are the layers we can put in a blog post. They are not the layers that win. Beneath them, TandemTrace runs a stack of proprietary enrichers and scorers — the part that took years, and the part a competitor can't clone from a diagram:
- 6. Cross-environment behavioral correlation engine (locked)
- 7. Adaptive false-positive learning from analyst feedback (locked)
- 8. Per-entity risk-trajectory and drift modeling (locked)
- …and several more we don't name in public (locked)
The five public layers turn raw logs into a defensible verdict. These turn a defensible verdict into a great one — the behavioral models, the suppression learning, and the per-environment tuning that are the actual moat. So they don't go in a blog post. See them under NDA →
The order matters. The LLM is the last step, not the first. Everything before it exists to make its answer correct.
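To make one of the public layers concrete, here is a rough sketch of the correlate step (layer 3): group events that share the same entities and collapse each burst into a single behavior record. It is an illustration under simplifying assumptions, not the production engine. "Entity overlap" is reduced to an exact match on a hypothetical `(src_ip, account, host)` key, and the five-minute window is an arbitrary default.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def correlate(events, window=timedelta(minutes=5)):
    """Collapse events into clusters: same entity key, and each event no more
    than `window` after the previous event in that cluster."""
    by_entity = defaultdict(list)
    for ev in sorted(events, key=lambda e: e["ts"]):
        by_entity[(ev["src_ip"], ev["account"], ev["host"])].append(ev)

    clusters = []
    for entities, evs in by_entity.items():
        start = 0
        for i in range(1, len(evs) + 1):
            # Close the cluster at the end of the list, or when the gap to
            # the next event exceeds the window.
            if i == len(evs) or evs[i]["ts"] - evs[i - 1]["ts"] > window:
                clusters.append({
                    "entities": entities,
                    "count": i - start,
                    "first_seen": evs[start]["ts"],
                    "last_seen": evs[i - 1]["ts"],
                })
                start = i
    return clusters


# Synthetic data: a storm of 4,000 failed logons plus one unrelated event.
t0 = datetime(2024, 1, 1, 9, 0, 0)
storm = [{"ts": t0 + timedelta(seconds=i), "src_ip": "10.2.3.4",
          "account": "svc-backup", "host": "dc01"} for i in range(4000)]
other = [{"ts": t0 + timedelta(hours=2), "src_ip": "10.9.9.9",
          "account": "j.doe", "host": "ws17"}]
clusters = correlate(storm + other)
```

With this input, 4,001 events become two cluster records, which is the kind of reduction that lets a later stage reason about behaviors instead of rows.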
One alert, end to end
Abstract layers are easy to nod along to and hard to trust. So here is the same
4625 from the top of this post, walked through all five layers — the way it
actually moves through the pipeline:
- Raw event. 4625 failed logon: account svc-backup, host dc01, source 10.2.3.4. One of roughly a billion events the SIEM ingested today.
- Normalize. … svc-backup, target = host dc01, action = auth.failure. Same shape whether it arrived from Splunk, Sentinel, or CrowdStrike.
- Enrich. dc01 resolves to role: domain controller (tier 0). svc-backup resolves to a service account with prior lockout history. 10.2.3.4 is on the authorized-scanner allowlist. The 30-day baseline for this triple: zero.

Hand those same 4,000 raw events to a bare LLM and the best case is it summarizes them; the worst case is it spots "4,000 failed logons against a domain controller" and pages someone for a brute-force attack that was never happening. The difference isn't the model. It's everything that ran before it.
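A scoring step for a cluster like this one might look roughly like the sketch below. Every weight, threshold, field name, and disposition string is a hypothetical placeholder chosen for illustration; the post does not publish TandemTrace's scoring rules. The point is only the shape: severity is computed from enriched facts, and known-benign patterns are suppressed before anything reaches the model.

```python
def score(cluster: dict, fp_floor: int = 30):
    """Return (severity, disposition) for one correlated cluster.

    All weights and the false-positive floor are illustrative values.
    """
    severity = 0
    if cluster["host_role"].startswith("domain controller"):
        severity += 50  # tier-0 asset
    if cluster["count"] >= 100:
        severity += 30  # high-volume burst
    if cluster["baseline_30d"] == 0:
        severity += 10  # never seen for this triple in the baseline window

    # Known-benign patterns are checked before anything is escalated.
    if cluster["src_is_authorized_scanner"]:
        return severity, "suppressed: authorized scanner"
    if severity < fp_floor:
        return severity, "suppressed: below false-positive floor"
    return severity, "escalate to AI reasoning"


storm_cluster = {
    "host_role": "domain controller (tier 0)",
    "count": 4000,
    "baseline_30d": 0,
    "src_is_authorized_scanner": True,
}
result = score(storm_cluster)
unknown_source = score({**storm_cluster, "src_is_authorized_scanner": False})
```

The same cluster from a source that is not on the allowlist keeps its severity and is escalated, which is the distinction a bare LLM has no way to draw from the raw events alone.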
Why it's not optional for an AI SOC
The naive pitch — "just point an LLM at your SIEM" — fails in four ways, and each one is fatal in a security context.
Volume. A mid-size SIEM produces millions to billions of events a day. No context window holds that. Truncate or sample, and you throw away the one outlier event that mattered — and in security, the signal is the outlier you can't afford to drop.
Noise. 99.9%+ of telemetry is benign. An LLM handed raw logs spends its reasoning budget on irrelevance, then pattern-matches noise into a confident, plausible-sounding false finding. In a SOC, a hallucinated incident burns analyst trust as fast as a missed one.
Missing context. The fact that decides the verdict — this host is a DC, this account is a service account, this source is your authorized pentest, this is the first time in 30 days — is not in the logs at all. It lives in the ontology. Without it, the model invents a narrative. Plausible-but-wrong is the default output of an LLM reasoning over context-free data.
Cost and latency. Even if the raw data fit, reasoning over it token-by-token is slow and expensive — for a worse answer. Tokens spent re-deriving "is this IP internal?" are tokens not spent on the actual investigation.
This is the same lesson Palantir spent two decades proving for general enterprise data: raw data is useless to a reasoning engine until it's mapped into an ontology. Their entire AIP thesis is "the model is a commodity; the ontology is the moat." We applied the identical principle to security operations — except the ontology ships pre-built and tuned for SOC telemetry, deployable in days rather than as a forward-deployed engineering engagement.
Why it works
- Grounded answers. Because the ontology preserves links back to source events, every AI conclusion can be cited and verified. Grounding isn't a prompt trick — it's an architectural property.
- Hallucination is constrained by construction. The model reasons over vetted, correlated facts, not a firehose. It can't confidently attribute traffic to a threat actor that doesn't exist if the actor was never in the curated input.
- Cheaper and faster inference. A clean, enriched summary is a fraction of the tokens of raw logs — and produces a better answer.
- Vendor-agnostic. One hunting rule, one entity model, across seven SIEM dialects. Swap the SIEM underneath; the reasoning layer doesn't notice.
- Customer-specific without being hardcoded. Asset roles, baselines, authorized tools, and tuning all live in configuration and resolve at runtime. The same pipeline understands every environment it's deployed into.
- Composable autonomy. Once entities and relationships are first-class, agents can hunt, correlate across days, and escalate with consistent semantics — instead of re-parsing strings on every pass.
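The "grounded answers" property in the list above can be checked mechanically. The sketch below assumes a hypothetical verdict format in which each claim carries a `source_event_ids` list, and flags any claim that cites nothing or cites an event that was never in the curated input. The field names and event IDs are invented for the example.

```python
def uncited_claims(verdict: dict, curated_event_ids: set) -> list:
    """Return the text of claims whose citations are missing or point
    outside the curated input the model was given."""
    bad = []
    for claim in verdict["claims"]:
        cites = claim.get("source_event_ids", [])
        if not cites or not set(cites) <= curated_event_ids:
            bad.append(claim["text"])
    return bad


curated = {"evt-1", "evt-2"}
verdict = {
    "claims": [
        {"text": "Failed logons originate from an allowlisted scanner",
         "source_event_ids": ["evt-1", "evt-2"]},
        {"text": "Traffic is attributed to a known threat actor",
         "source_event_ids": ["evt-9"]},
        {"text": "The account was compromised"},
    ]
}
flagged = uncited_claims(verdict, curated)
```

A check like this only works because the layers before the model preserve links back to source events; there is nothing to validate against if the model was handed free text.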
And it's not just cleaner — it's measurably cheaper and faster than reasoning over raw logs:
[Metrics panel, values not preserved: … as benign before the model ever sees it · alert investigation, end to end · per full investigation]
That last number is worth sitting with, because the rest of the industry is an order of magnitude north of it. Public estimates put a single Claude-based AI-SOC investigation at $1–$3 — call it ~$2. TandemTrace runs a full investigation for about $0.10, at comparable latency. That's not a discount; it's the ontology paying for itself — the model reasons over a refined situation, not a firehose of raw tokens.
At 100 alerts a day that's the difference between roughly $200 and $10 — every day, on the work an autonomous SOC does most.
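The arithmetic behind those figures, using the ~$2 midpoint and the ~$0.10 figure quoted above. Amounts are kept in integer cents to avoid floating-point noise; the 365-day projection is an added illustration that assumes a constant 100 alerts a day.

```python
ALERTS_PER_DAY = 100
RAW_CENTS_PER_INVESTIGATION = 200       # ~$2, midpoint of the $1 to $3 estimate
ONTOLOGY_CENTS_PER_INVESTIGATION = 10   # ~$0.10

daily_raw_usd = ALERTS_PER_DAY * RAW_CENTS_PER_INVESTIGATION / 100
daily_ontology_usd = ALERTS_PER_DAY * ONTOLOGY_CENTS_PER_INVESTIGATION / 100
cost_ratio = RAW_CENTS_PER_INVESTIGATION // ONTOLOGY_CENTS_PER_INVESTIGATION

# Assumes the same alert volume every day of the year.
yearly_savings_usd = 365 * (daily_raw_usd - daily_ontology_usd)

print(daily_raw_usd, daily_ontology_usd, cost_ratio, yearly_savings_usd)
```

That is roughly $200 versus $10 a day, a 20x gap, and about $69,350 a year at that volume.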
Where it's hard
None of this is free, and the costs shape the architecture. The honest version — the part most vendor blogs skip:
- Identity resolution. t.nguyen, [email protected], and ACME\t.nguyen are one person to three different sources. Get it wrong and your "cross-source correlation" is silently joining on luck. Deterministic normalization covers the easy cases; the hard ones need an authoritative identity bridge that may not even exist in a given deployment. Identity quietly eats the most engineering.
- Garbage in, honest out. The ontology can't manufacture context the source never captured. If the SIEM aggregates credential sprays by ASN instead of by IP, IP-keyed enrichment has nothing to key on, so it returns INCONCLUSIVE rather than a confident guess. We choose to fail honestly.
- It's never done. SIEM schemas, identity providers, and attacker tradecraft all change underneath you. The ontology is the highest-maintenance part of an AI SOC — and every layer is a place a verdict can flip, so nothing that moves a live verdict ships without a flag defaulting off and a shadow-mode harness that measures the change first. That discipline is a cost. It's also exactly why it's a moat.
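For the identity-resolution item above, the "easy cases" that deterministic normalization covers can be sketched in a few lines. The function and the example.com addresses are hypothetical; the last example shows where this approach stops working and an authoritative identity bridge would be needed.

```python
def normalize_identity(raw: str) -> str:
    """Reduce common account formats to a bare lowercase username.

    Handles DOMAIN\\user and user@domain forms. Purely syntactic: it
    cannot tell that two different usernames belong to the same person.
    """
    s = raw.strip().lower()
    if "\\" in s:   # DOMAIN\user
        s = s.split("\\", 1)[1]
    if "@" in s:    # user@domain (UPN or email)
        s = s.split("@", 1)[0]
    return s


# Easy case: three spellings collapse to one key.
easy = ["t.nguyen", "T.Nguyen@example.com", "ACME\\t.nguyen"]
easy_keys = {normalize_identity(n) for n in easy}

# Hard case: the mail alias differs from the logon name, so syntax alone
# yields two keys for what may be one person.
hard = ["t.nguyen", "thao.nguyen@example.com"]
hard_keys = {normalize_identity(n) for n in hard}
```

Here `easy_keys` contains a single entry while `hard_keys` contains two, and only an external mapping between aliases could merge them.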
The verdict
An AI SOC without an ontology layer is a chat box that confidently makes things up. An AI SOC with one is slower to build, harder to maintain, and full of tradeoffs you have to engineer around — and it's also the only version that produces answers a SOC analyst can trust enough to act on.
The model is rented. Anyone can call Claude. The understanding you feed it — the extraction, enrichment, correlation, and scoring that turn ore into refined metal before the model ever runs — is the part that's yours. In an AI SOC, the ontology layer isn't preprocessing for the product — it is the product.
See it run on your own alerts.
TandemTrace deploys with the ontology pre-built and tuned for SOC telemetry — live in days, not a forward-deployed engagement. Point it at your noisiest detector and watch the event storm become a handful of cited verdicts.
Disagree? Send the counter-argument: [email protected].