# The handoff packet: a proposed schema for AI→human escalations

**Category:** Spec · **Author:** TandemTrace research · **Date:** 2026-05-17 · **Reading time:** 7 min · **Tags:** spec, handoff, escalation, schema, autonomy

> *Hidden draft. Not yet published.*

Every autonomous SOC vendor has a triage agent. Almost none of them have agreed on what the agent should *hand the human* when it decides to escalate. The product UIs all look different, the JSON looks different, the field names look different, and the analyst on the receiving end pays the cost in lost context every time they switch tools.

This is the thing that should be standard. We're proposing a schema for it.

## Why the packet matters more than the UI

Most of the agentic-SOC pitches focus on the agent's decision. The harder problem is what the agent says when it can't decide. An escalation that arrives as a paragraph of summary plus a link to the original alert is, structurally, a worse experience than the original alert was — the human now has to read the agent's reasoning *and* re-derive the underlying evidence.

A correctly-shaped escalation is the opposite. It gives the human the verdict first, the confidence calibration second, the evidence graph third, and the agent's failed pivots fourth — so that the first 30 seconds of human attention go to the conclusion, the next 30 go to verifying it, and the next 60 go to extending the investigation rather than restarting it.

The packet is the contract.

## Required fields

```json
{
  "packet_version": "0.1",
  "case_id": "tt-2026-05-17-00481",
  "verdict": {
    "decision": "escalate",
    "label": "suspicious",
    "confidence": 0.62,
    "confidence_floor": 0.85,
    "rationale": "Identity baseline mismatch + atypical geo + sensitive scope; insufficient evidence to close.",
    "reversibility": "soft"
  },
  "evidence": {
    "primary_signals": [...],
    "supporting_signals": [...],
    "ruled_out": [...]
  },
  "pivots": {
    "attempted": [...],
    "succeeded": [...],
    "failed": [...]
  },
  "recommended_next_action": {
    "action": "review_with_user",
    "owner_role": "tier_2_analyst",
    "expected_duration_seconds": 180
  },
  "audit": {
    "agent_version": "trace-7.4.1",
    "model": "claude-opus-4-7",
    "tool_calls": 11,
    "total_runtime_ms": 4380,
    "cost_usd": 0.043
  }
}
```

Five blocks, six if you count the wrapper. Everything below is a description of *why each block exists*, not how it should be rendered.

## `verdict`

The single field with the most leverage. `decision` is one of `close`, `suspend`, or `escalate` — and yes, three is enough. `suspend` is the case where the agent has a verdict it isn't allowed to act on (because the policy threshold isn't met) but it would be a mistake to bury it in the closed-cases archive.

`confidence` is the calibrated probability the agent assigns to its own decision. `confidence_floor` is the policy threshold for the action. If `confidence < confidence_floor`, the agent must escalate; this is non-negotiable and should be enforced at the schema level, not the prompt level.
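
One way to enforce the floor outside the prompt is a validation step that rejects any packet whose decision contradicts its own numbers. The Python below is a minimal sketch, assuming the packet has already been parsed into a dict. The function name `check_verdict` is hypothetical and not part of the proposal, and the check follows the rule exactly as stated above.

```python
def check_verdict(verdict: dict) -> list[str]:
    """Return a list of violations for a parsed `verdict` block (empty if valid)."""
    errors = []

    decision = verdict.get("decision")
    if decision not in {"close", "suspend", "escalate"}:
        errors.append(f"unknown decision: {decision!r}")

    confidence = verdict.get("confidence")
    floor = verdict.get("confidence_floor")
    for name, value in (("confidence", confidence), ("confidence_floor", floor)):
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0.0 <= value <= 1.0:
            errors.append(f"{name} must be a number in [0, 1]")

    # Only compare the two values once both are known to be well-formed.
    if errors:
        return errors

    if confidence < floor and decision != "escalate":
        errors.append("confidence below confidence_floor requires decision 'escalate'")
    return errors
```

Run against the example packet's verdict (`0.62` against a floor of `0.85`, decision `escalate`), this returns an empty list. The same numbers with `"decision": "close"` return one violation.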

`reversibility` is `hard`, `soft`, or `none`. A `hard` action, meaning one that is hard to reverse (e.g., disabling a user, isolating an endpoint), requires a much higher confidence floor than a `soft` one (e.g., closing an alert as benign). The packet must carry this so the receiver can audit it.
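
A receiver-side audit of that relationship could look like the sketch below. It assumes the receiver keeps its own table of minimum floors per reversibility class. The function name, the table, and the numbers in it are placeholders for illustration, not values from the proposal, and the `none` class is left out because its floor is a policy decision.

```python
def floor_meets_policy(verdict: dict, min_floor_by_reversibility: dict[str, float]) -> bool:
    """True if the packet's confidence_floor is at least the receiver's minimum
    for the stated reversibility class.

    Raises KeyError if the class has no entry in the receiver's table.
    """
    required = min_floor_by_reversibility[verdict["reversibility"]]
    return verdict["confidence_floor"] >= required


# Placeholder policy for illustration only; real values are set by the receiver.
example_policy = {"hard": 0.95, "soft": 0.85}
```

With this placeholder table, the example packet (`soft`, floor `0.85`) passes, while a `hard` action carrying the same floor does not.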

## `evidence`

Three buckets, not one. `primary_signals` are the artifacts the verdict was built on. `supporting_signals` are corroborating artifacts the agent pulled but the verdict doesn't depend on. `ruled_out` is the most valuable block and the one nobody publishes — the hypotheses the agent considered and discarded, with the reason.

A human reviewing an escalation needs to know *what the agent already eliminated*, otherwise they'll redo the same work. This is the single biggest source of wasted analyst time in human-AI SOC handoffs today. If the packet doesn't carry `ruled_out`, the agent's investigation is effectively invisible.

## `pivots`

The agent's trace through the data. `attempted` is the chronological list of queries run, integrations called, and lookups performed. `succeeded` is the subset that returned non-empty answers; `failed` is the subset that timed out, returned errors, or returned empty results.

Failed pivots are diagnostic gold. An agent that escalated because *the EDR API was rate-limited* is a fundamentally different escalation than one that escalated because *the EDR returned data that didn't support a verdict.* The first is an infrastructure problem; the second is a real ambiguity. The packet has to distinguish them.
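
A receiver can only make that distinction if each failed pivot says why it failed. The sketch below assumes every entry in `failed` carries a `reason` string drawn from a small vocabulary. Both the `pivot` and `reason` field names and the vocabulary are assumptions for illustration, since the proposal does not fix the shape of a pivot entry.

```python
# Assumed vocabulary for infrastructure-level failures.
INFRA_REASONS = {"timeout", "error", "rate_limited"}


def split_failed_pivots(failed: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split failed pivots into (infrastructure failures, everything else).

    "Everything else" covers pivots that ran but came back empty, which is
    closer to a real gap in the evidence than to a tooling problem.
    """
    infra = [p for p in failed if p.get("reason") in INFRA_REASONS]
    other = [p for p in failed if p.get("reason") not in INFRA_REASONS]
    return infra, other
```

If the first list is non-empty, a routing layer might retry those pivots before a human spends time on the case; that policy is a design choice outside the schema.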

## `recommended_next_action`

Optional, even though it appears in the example above, but high-value. The agent has more state than the human; it should commit to a specific next step rather than hand off an open-ended "please review." `owner_role` is a role, not a person: the routing layer assigns the actual analyst.

`expected_duration_seconds` is the agent's estimate of how long the human work will take, which is the input to capacity planning. Without it, SOC managers are guessing at how many escalations per hour the human queue can absorb.
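
The capacity arithmetic is simple enough to sketch. The function below is a rough estimate, assuming analysts work the queue continuously and that the agent's estimates are accurate on average; the function name is hypothetical.

```python
def escalations_per_hour(expected_durations_s: list[float], analysts: int) -> float:
    """Estimate how many escalations per hour a human queue can absorb,
    from the agent's per-packet expected_duration_seconds values."""
    if not expected_durations_s or analysts <= 0:
        return 0.0
    mean_s = sum(expected_durations_s) / len(expected_durations_s)
    if mean_s <= 0:
        raise ValueError("expected durations must be positive")
    # Each analyst contributes 3600 working seconds per hour.
    return analysts * 3600 / mean_s
```

At the example packet's estimate of 180 seconds, one analyst absorbs 20 escalations per hour under these assumptions.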

## `audit`

The fields a regulator, a customer's security team, or a future incident reviewer will want. `agent_version` and `model` are required for reproducibility. `tool_calls`, `total_runtime_ms`, and `cost_usd` are how a CISO budgets the autonomous layer; without them, the agent is a black box with a flat invoice.
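
As a rough illustration of the budgeting use, the sketch below rolls up the `audit` blocks from a batch of packets. It assumes each block carries the three numeric fields from the example packet; the function name and the output keys are hypothetical.

```python
def summarize_audit(audits: list[dict]) -> dict:
    """Aggregate tool_calls, total_runtime_ms, and cost_usd across audit blocks."""
    n = len(audits)
    if n == 0:
        return {"cases": 0, "total_cost_usd": 0.0,
                "mean_tool_calls": 0.0, "mean_runtime_ms": 0.0}
    return {
        "cases": n,
        "total_cost_usd": sum(a["cost_usd"] for a in audits),
        "mean_tool_calls": sum(a["tool_calls"] for a in audits) / n,
        "mean_runtime_ms": sum(a["total_runtime_ms"] for a in audits) / n,
    }
```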

## What we're not including

We deliberately omitted three things people will ask for.

**No free-text narrative summary.** The summary is a rendering of the packet, not a field in it. If you put narrative text in the schema, every downstream tool gets a different summary and the packet stops being a contract.

**No screenshots or rendered HTML.** Same reason. The packet is data; rendering belongs in the tool that receives it.

**No "agent's reasoning chain."** Chain-of-thought is a property of the model, not the case. If it's useful to ship, it goes under `audit` as a debug field, not as evidence.

## How to use this

If you're building an AI SOC product, you can adopt this schema directly. The fields are deliberately small in number and high in leverage: five top-level blocks plus the wrapper, none of them controversial, all of them recoverable from a well-structured agent. If you implement it and find a block we missed, tell us.

If you're a SOC lead evaluating vendors, ask each one for an example escalation packet. If it doesn't have `ruled_out` under `evidence` and `failed` under `pivots`, the agent's investigation isn't visible to your team. You will pay for that in re-derivation cost on every escalation, forever.
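
That vendor check can be automated against a sample packet. The sketch below assumes the sample uses this proposal's field names and has been parsed into a dict; the function name is hypothetical. An empty list counts as present, since "nothing was ruled out" is still information.

```python
def investigation_visible(packet: dict) -> dict[str, bool]:
    """Report whether a packet carries the two blocks that expose the agent's work."""
    evidence = packet.get("evidence")
    pivots = packet.get("pivots")
    # Treat a missing or malformed block as an empty one.
    evidence = evidence if isinstance(evidence, dict) else {}
    pivots = pivots if isinstance(pivots, dict) else {}
    return {
        "evidence.ruled_out": isinstance(evidence.get("ruled_out"), list),
        "pivots.failed": isinstance(pivots.get("failed"), list),
    }
```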

A schema is not a product. But a missing schema is a tax — and the SOC has paid that tax for a decade.

---

*Comments and counter-proposals welcome: hello@tandemtrace.ai. We'll version the schema in public.*
