How network security moved from state machines to learned behavior
Before transformers entered cybersecurity, some of the most effective intrusion detection systems were already behavioral. Long before the term “foundation model” existed, researchers recognized that signatures and static indicators were insufficient; what mattered was how systems behaved over time.
One of the most influential approaches from that era modeled network traffic as chains of behavioral states, using probabilistic techniques such as Markov models. These systems deserve credit: they were principled, explainable, and, within the constraints of their time, remarkably effective.
But they also reveal, very clearly, where the ceiling was.
State-based IDS: behavior as symbols
State-based behavioral IDS systems begin with a crucial assumption:
Network behavior can be discretized into a small number of meaningful states.
Raw NetFlow records are first compressed into a handful of hand-engineered features, typically things like flow size, duration, and periodicity. Each flow is then mapped to a discrete state, producing a symbolic sequence such as:
a → a → b → c → a → …
Once behavior has been reduced to a sequence of symbols, classical probabilistic tools become viable. Most notably, Markov chains are used to model the probability of transitioning from one state to the next.
Mathematically, the system assumes:
\[ P(S_t \mid S_{t-1}, S_{t-2}, \ldots, S_1) = P(S_t \mid S_{t-1}) \]
Meaning: the future depends only on the present state.
This is the defining property of Markovian behavior.
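To make the pipeline concrete, here is a minimal sketch in Python. The thresholds, state labels, and training sequence are illustrative placeholders, not taken from any particular IDS:

```python
from collections import defaultdict
import math

# Hypothetical discretization: thresholds and state labels are illustrative.
def discretize_flow(size_bytes, duration_s):
    if size_bytes < 500 and duration_s < 1.0:
        return "a"   # small, short flow
    if size_bytes < 500:
        return "b"   # small, long-lived flow
    return "c"       # large flow

def fit_transition_probs(states):
    """Estimate first-order transition probabilities P(S_t | S_{t-1})."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, curr in zip(states, states[1:]):
        counts[prev][curr] += 1
    probs = {}
    for prev, nexts in counts.items():
        total = sum(nexts.values())
        probs[prev] = {s: n / total for s, n in nexts.items()}
    return probs

def log_likelihood(states, probs, floor=1e-6):
    """Score a sequence; unseen transitions get a small floor probability."""
    score = 0.0
    for prev, curr in zip(states, states[1:]):
        score += math.log(probs.get(prev, {}).get(curr, floor))
    return score

# Train on "normal" traffic, then flag low-likelihood sequences as anomalous.
normal = ["a", "a", "b", "c", "a", "a", "b", "c", "a"]
model = fit_transition_probs(normal)
print(log_likelihood(["a", "c", "c", "b"], model))  # very low score => anomaly
```

Everything the model knows is contained in that small table of transition probabilities, which is exactly why it is both cheap and interpretable.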
Why Markov models worked and why they stopped scaling
Markov models were a smart choice at the time:
- They are computationally cheap
- They are interpretable
- They work well when behavior is highly regular (as early botnet C2 traffic often was)
- They allow long sequences to be summarized statistically
But they also impose a hard ceiling.
Once behavior is reduced to:
- a fixed set of states
- coarse thresholds
- first-order transitions
the model can no longer reason about:
- long-range dependencies
- multi-phase attacks
- delayed causality
- behavior that does not neatly repeat
Everything beyond the immediately previous state is either lost or approximated away.
In practice, this meant that defenders had to predefine what kinds of behavior mattered and accept that anything outside those definitions would be invisible.
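This blindness is easy to demonstrate. In the sketch below (the transition probabilities are made up for illustration), a regular period-3 beacon and a burst followed by alternation contain exactly the same transition counts, so any first-order Markov model must assign them identical scores, even though their long-range structure is completely different:

```python
from collections import Counter
import math

def bigrams(seq):
    return Counter(zip(seq, seq[1:]))

# A regular period-3 beacon vs. a burst followed by alternation.
# Both contain exactly the same transitions: aa x3, ab x3, ba x2.
beacon = list("aabaabaab")
burst  = list("aaaababab")
assert bigrams(beacon) == bigrams(burst)

# A first-order log-likelihood is a function of bigram counts alone,
# so identical counts force identical scores.
def log_likelihood(seq, P):
    return sum(math.log(P[(p, c)]) for p, c in zip(seq, seq[1:]))

P = {("a", "a"): 0.6, ("a", "b"): 0.4, ("b", "a"): 1.0}
print(math.isclose(log_likelihood(beacon, P),
                   log_likelihood(burst, P)))  # True
```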
Transformers and LogLM: behavior without Markov assumptions
Transformer-based models, including DeepTempo’s LogLM, remove this assumption entirely.
Instead of asking:
“What state is this flow in?”
LogLM asks:
“What does this sequence of events mean in context?”
There are no predefined states.
There is no fixed alphabet.
There is no assumption that behavior is Markovian.
A transformer models behavior with learned relevance, not with fixed memory.
Any event can attend to any other, whether it occurred seconds ago or hours ago. Periodicity, silence, bursts, role changes, and phase shifts are all learned implicitly as part of the representation.
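The mechanism behind this is self-attention. Here is a minimal single-head sketch in numpy; the event embeddings and weight matrices are random stand-ins for learned parameters, not LogLM’s actual internals:

```python
import numpy as np

rng = np.random.default_rng(0)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention: every event can attend to every other
    event, no matter how far apart they are in the sequence."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # pairwise relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over all positions
    return weights @ V                              # context-mixed representations

# Hypothetical setup: 6 log events, each embedded as an 8-dim vector.
X = rng.normal(size=(6, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (6, 8): each event now reflects the whole sequence
```

Note what is absent: there is no transition table and no window size. Which past events matter is decided by the learned weights, per event, at inference time.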
From state transitions to behavioral meaning
This shift mirrors what happened in natural language processing:
- State-based IDS ≈ hand-crafted grammars and n-grams
- LogLM ≈ learned embeddings and attention
Where older systems reasoned over state transitions, LogLM reasons over behavioral similarity:
- Does this sequence look like known command-and-control behavior?
- Is this relationship drifting toward known malicious campaigns?
- Which prior behaviors, even distant ones, are relevant now?
These questions simply cannot be answered under a Markov assumption.
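One way to picture reasoning over behavioral similarity is nearest-neighbor search in a learned embedding space. The sketch below is purely illustrative: `embed_sequence` and the campaign library are hypothetical stand-ins, not LogLM’s API.

```python
import numpy as np

rng = np.random.default_rng(1)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical: in a real system this would be the model's encoder;
# here it is a random stand-in so the sketch runs end to end.
def embed_sequence(events):
    return rng.normal(size=64)

# Stand-ins for embeddings of previously labeled behaviors.
library = {
    "c2_beaconing":  rng.normal(size=64),
    "data_staging":  rng.normal(size=64),
    "normal_backup": rng.normal(size=64),
}

z = embed_sequence(["dns_query", "tls_connect", "small_upload", "sleep"])
for name, emb in sorted(library.items(),
                        key=lambda kv: cosine(z, kv[1]), reverse=True):
    print(f"{name}: {cosine(z, emb):+.3f}")
```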
The key difference, stated plainly
Markov IDS systems assume behavior is memoryless beyond the last step.
LogLM learns how much memory behavior actually requires.
Sometimes behavior is close to Markovian, and the model learns that.
Often, especially with modern attackers, it is not, and the model adapts.
That flexibility is the real breakthrough.
A respectful conclusion
State-based and Markov IDS systems were not wrong. They represented the best possible abstraction given the data, compute, and learning theory available at the time.
LogLM and transformer-based behavioral models don’t reject that lineage; they complete it.
They remove the need to define behavior in advance, and instead let behavior reveal itself.
That’s the difference between modeling states and learning behavior.