"Autonomous" is doing a lot of work in security marketing right now. The word has come to mean two different things, and the more popular meaning is the dangerous one.
Automatic vs. autonomous
// AUTOMATIC
Always takes the same action given the same trigger. No state, no introspection, no decision about whether to act. SOAR playbooks are automatic. So is most signature-based blocking.
// AUTONOMOUS
Makes the decision, including the decision to escalate. Knows when its confidence is below the threshold required to act. Carries the evidence forward when it doesn't.
Automatic is fine when the action is cheap to reverse and the trigger is precise. It breaks when either assumption fails. An autonomous system has to handle the cases automatic can't: the ones where the right call depends on whether you trust your own answer.
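The distinction is easier to see in code. Below is a minimal sketch, with an invented alert shape, threshold, and function names; none of them come from a real product:

```typescript
// Hypothetical alert shape, invented for illustration.
interface Alert {
  signature: string;
  confidence: number; // the triage model's confidence in its own verdict, 0..1
}

// AUTOMATIC: same trigger, same action, no self-assessment.
function automaticBlock(alert: Alert): string {
  return alert.signature === "known-bad-hash" ? "block" : "ignore";
}

// AUTONOMOUS: the decision includes the decision about whether to act at all.
const ACT_THRESHOLD = 0.9; // illustrative value, not a recommendation

function autonomousTriage(alert: Alert): string {
  return alert.confidence >= ACT_THRESHOLD
    ? "act"       // confident enough to own the outcome
    : "escalate"; // defer, carrying the evidence forward
}
```

The automatic function will fire identically forever. The autonomous one's second branch is really a statement about itself.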
The confidence-boundary problem
The hard design question is where to draw the line.
Set the boundary too cautiously and you're paying for SOAR with an extra UI. Every uncertain case becomes a human task; the autonomous layer adds friction instead of capacity.
Set it too aggressively and the system acts on cases it shouldn't. Production outages. Blocked legitimate users. Pages for nothing. The cost of an over-confident autonomous decision is higher than the cost of an over-confident automatic one, because the autonomous one is harder to explain after the fact, harder to undo, and harder to trust again afterwards.
This is mostly a calibration problem. The model can be excellent; if the threshold is wrong, the system still misbehaves. A merely good model with a well-chosen boundary is more useful than a perfect model that doesn't know when to stop.
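One way to make the calibration concrete: if you can attach even rough costs to the two failure modes, an over-confident action and an unnecessary human review, choosing the boundary becomes a one-dimensional sweep over labeled history. A toy sketch, with made-up costs:

```typescript
// Toy costs, invented for illustration; a real deployment would estimate these.
const COST_WRONG_ACTION = 50; // over-confident autonomous action: outage, blocked user
const COST_HUMAN_REVIEW = 1;  // an uncertain case routed to an analyst

// outcomes: [model confidence, verdict was actually correct] pairs,
// taken from held-out, labeled triage history.
function expectedCost(threshold: number, outcomes: [number, boolean][]): number {
  let total = 0;
  for (const [confidence, correct] of outcomes) {
    if (confidence >= threshold) {
      if (!correct) total += COST_WRONG_ACTION; // acted, and shouldn't have
    } else {
      total += COST_HUMAN_REVIEW; // deferred to a human
    }
  }
  return total;
}

// Sweep candidate thresholds and keep the cheapest one.
function bestThreshold(outcomes: [number, boolean][]): number {
  let best = 0.5;
  for (let t = 0.5; t < 1.0; t += 0.01) {
    if (expectedCost(t, outcomes) < expectedCost(best, outcomes)) best = t;
  }
  return best;
}
```

Note that the model never changes in this sweep. Only the boundary moves, which is the point.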
What we chose
TandemTrace's autonomous triage resolves each case into one of three tiers, not two:
Close with full evidence
The system has read the alert, asked its pivots, reached a verdict, and is confident enough to act. Verdict is logged. Containment, if any, is executed.
Suspend with evidence
A verdict was reached, but without enough confidence to act on it. Containment is prepared, not executed. The case waits with the evidence chain attached for a human to confirm or reject.
Escalate
No verdict was reached. The case goes to the analyst with the autonomous reasoning trail attached, so the analyst doesn't restart from zero.
The default thresholds in each deployment, and therefore the ratio between the three tiers, are anchored to the customer's risk tolerance rather than ours. A bank with regulatory exposure runs a more conservative threshold than a media company, and both deployments are still autonomous.
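In code, the tiering collapses to one function over the verdict and its confidence. A minimal sketch, assuming a single per-deployment threshold; the profiles and numbers are invented for illustration, not TandemTrace's actual configuration:

```typescript
type Tier = "close_with_full_evidence" | "suspend_with_evidence" | "escalate";

// Hypothetical per-deployment knob; the names and numbers are invented.
interface RiskProfile {
  actThreshold: number; // minimum confidence required to act autonomously
}

const BANK: RiskProfile = { actThreshold: 0.97 }; // regulatory exposure: conservative
const MEDIA: RiskProfile = { actThreshold: 0.9 }; // looser tolerance, same machinery

// verdictConfidence is null when no verdict was reached at all.
function resolve(verdictConfidence: number | null, profile: RiskProfile): Tier {
  if (verdictConfidence === null) {
    return "escalate"; // no verdict: hand the reasoning trail to an analyst
  }
  if (verdictConfidence >= profile.actThreshold) {
    return "close_with_full_evidence"; // log the verdict, execute containment if any
  }
  return "suspend_with_evidence"; // verdict exists; prepare containment, don't execute
}
```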
The metric that matters
A real autonomous system distinguishes itself by the quality of its handoffs more than by the volume of its closures. Anyone can build a system that closes 90% of alerts. The question is what happens to the 10% it can't close, and how cleanly the system gives those cases back to a human.
I've seen autonomous stacks that close 95% of alerts. The 5% they hand back are the ones that tell you whether the rest were right.
In our deployments the autonomous closure rate is around 78%. The other 22% lands in the analyst queue. Of that 22%, more than 95% arrive with a complete reasoning trail: the pivots the system asked, the evidence it found, the verdict it landed on, and the reason it chose not to act on its own.
The analyst's job isn't to redo the triage. It's to validate the decision and to make the call the autonomous layer couldn't.
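For concreteness, here is one shape such a handoff could take. This is a sketch of the idea, not the product's data model; every name in it is assumed:

```typescript
// One possible shape for the handoff, mirroring the four items above.
interface ReasoningTrail {
  pivots: string[];        // the questions the system asked
  evidence: string[];      // what each pivot returned
  verdict: string | null;  // the conclusion, if one was reached
  deferralReason: string;  // why it chose not to act on its own
}

interface Handoff {
  alertId: string;
  trail: ReasoningTrail;
  preparedContainment?: string; // staged in the suspend tier, never executed
}

// The analyst validates rather than re-triages: one confirm-or-reject step.
function analystReview(handoff: Handoff, confirm: boolean): string {
  if (!confirm) return "reject: reopen for manual triage";
  if (handoff.preparedContainment) {
    return `confirm: execute ${handoff.preparedContainment}`;
  }
  return "confirm: close with verdict logged";
}
```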
If you want a deeper look at the design choices, or to compare notes on how to draw the boundary in your own environment, get in touch.