When should each approach be adopted?
As 2025 drew to a close, we saw a number of predictions about where AI is headed. Many of these projected that something called a “context layer” would be crucial for enterprises to successfully apply general purpose agents to specific use cases. A few projections instead argue for the rise of vertical foundation models, which are purpose-built for a domain.
So which is it? Welp - it depends. In this blog I’ll attempt to give practitioners and investors a guide to choosing between the two patterns.
If you are interested in the arguments for each side of the discussion, here are a few explanations of why the context layer is crucial:
- Ed Sim at Boldstart Ventures
- Aaron Levie of Box fame
- Foundation Capital’s write up on context graphs
- And there are many more….
Ashu Garg of Foundation Capital both stands behind his context graph thesis AND suggests that in some cases vertical foundation models will win. His blog is worth a read, and it inspired this more focused discussion of when each pattern will win.
Stanford University’s Institute for Human-Centered AI published a set of predictions from their researchers. Amongst the nuggets in this thoughtful blog is the following sentence: “self-supervised learning from somewhat smaller datasets has shown promise in radiology, pathology, ophthalmology, dermatology, oncology, cardiology, and many other areas of biomedicine.” Definitely worth a read.
ARM’s excellent blog - always worth keeping an eye on what the makers of compute see on the horizon - touches both on the importance of context AND on the rise of vertical models. Incidentally they also touch upon the use of purpose built models near the edge which sounds a lot like a telco deployment we have going right now, but I digress….
So which is it?
Where context engineering reaches its limits
Aaron Levie identifies several open challenges for enterprise context engineering: choosing between narrow and general agents, getting data into agent-ready systems, accessing the right data for each task, and balancing deterministic and non-deterministic behavior. These are real problems.
Cybersecurity adds a layer of difficulty that makes even sophisticated context architectures insufficient.
Imagine the context engineering challenge. An attack might represent something like 0.0001% of total events. To provide enough context for a general-purpose model to distinguish signal from noise, you might need an ongoing feed of hundreds of billions of raw events, sifting the haystack again, and again, and again, second by second.
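To make the scale concrete, here is a back-of-envelope sketch in Python. The event rate, tokens-per-event estimate, and window size are illustrative assumptions, not measurements from any particular environment:

```python
# Back-of-envelope arithmetic; all constants are illustrative assumptions.
EVENTS_PER_HOUR = 100_000_000_000   # "hundreds of billions" of raw events per hour
TOKENS_PER_EVENT = 50               # a modest estimate for one parsed log line
CONTEXT_WINDOW = 1_000_000          # a generous 1M-token context window

tokens_per_hour = EVENTS_PER_HOUR * TOKENS_PER_EVENT
windows_needed = tokens_per_hour // CONTEXT_WINDOW

print(f"tokens generated per hour: {tokens_per_hour:.2e}")
print(f"full context windows needed per hour: {windows_needed:,}")
# => 5.00e+12 tokens per hour, i.e. 5,000,000 full 1M-token windows, every hour
```

Even with a generous one-million-token window, you would need to refill it millions of times an hour just to see the raw stream once.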
And - here’s the kicker - even if you figured out how to cram the entire haystack again and again into the context window, it still wouldn’t work. Your needle would remain unseen.
The reason is that today’s foundation models powering agentic systems lack the compressed understanding needed.
Generalization uses compressed understanding
Models that generalize to novel situations have compressed enormous amounts of training data into internal representations that capture the underlying structure of the problem space.
A general-purpose LLM has learned linguistic structure, factual knowledge, and reasoning patterns from trillions of tokens of human text. That's why it can write coherent prose about topics it's never explicitly seen. It has internalized how language works.
Cybersecurity requires the same principle applied to a different domain: the model must internalize how IT environments work, how users behave, and how attacks unfold. This cannot happen through context injection for the same reason you cannot make a model fluent in a new language by putting a dictionary in its prompt.
This is not a criticism of context engineering. It's a recognition that context engineering and world modeling solve different problems, and some problems require the latter.
Lessons from production systems at scale
Let’s take a quick excursion into the real world.
Netflix's recommendation system doesn't work by loading your viewing history into a context window and asking a general-purpose model what you should watch next. It operates using learned embeddings: compressed representations of both content and user preferences, trained on hundreds of billions of interactions. These embeddings encode deep structural knowledge: the relationship between genres, the evolution of taste over time, the subtle signals that distinguish genuine interest from passive viewing. This understanding cannot be injected as context. It must be learned, through pretraining on massive observational data.
Stripe's fraud detection system similarly relies on models pretrained on transaction patterns, merchant behaviors, and fraud signals across their entire network. When evaluating a transaction in milliseconds, Stripe isn't constructing an elaborate prompt with context about payment history and merchant risk factors. The model has already internalized what normal transaction flows look like, what deviations matter, and how fraud patterns evolve across geography, time, and payment methods.
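To illustrate the pattern both systems share, here is a minimal sketch of embedding-based scoring. The names, dimensions, and random vectors are hypothetical stand-ins; neither Netflix's nor Stripe's actual systems are public:

```python
import numpy as np

EMBED_DIM = 256
rng = np.random.default_rng(0)

# Stand-ins for embeddings learned offline from billions of interactions.
user_embedding = rng.normal(size=EMBED_DIM)             # one user's learned taste
item_embeddings = rng.normal(size=(10_000, EMBED_DIM))  # the full catalog

def recommend(user_vec: np.ndarray, items: np.ndarray, k: int = 5) -> np.ndarray:
    """Score every item against the user in one matrix multiply and
    return the indices of the top-k matches. No prompt is assembled;
    all the 'context' is already baked into the vectors."""
    scores = items @ user_vec
    return np.argsort(scores)[-k:][::-1]

print(recommend(user_embedding, item_embeddings))
```

The point of the sketch: the expensive understanding was compressed into the vectors offline, so inference is a single matrix multiply rather than prompt construction.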
Both systems operate in domains whose characteristics closely resemble cybersecurity, and those characteristics may be useful criteria for practitioners deciding which approach fits which problem. Cybersecurity, payments fraud, and Netflix-style recommendations share at least the following traits, each of which pushes toward embedded context or focused world models (see the sketch after this list):
- extreme class imbalance - fraud and attacks are exceedingly rare
- adversarial dynamics - bad actors actively evade detection
- crucial non-local correlations - useful patterns span users, systems, and time
- real-time requirements - milliseconds or seconds to allow a transaction, make a recommendation, or respond to a potential attack with immediate isolation
- truly massive scale - billions of events that need continuous evaluation; hundreds of billions of events an hour are not uncommon amongst our design partners
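As a rough aid, the criteria above can be encoded as a simple checklist. This helper is hypothetical and deliberately crude; treat it as a mnemonic, not a formula:

```python
# A hypothetical decision aid encoding the criteria above; the threshold
# is deliberately crude - treat it as a checklist, not a formula.

CRITERIA = [
    "extreme class imbalance",
    "adversarial dynamics",
    "crucial non-local correlations",
    "real-time requirements",
    "truly massive scale",
]

def suggest_architecture(flags: dict[str, bool]) -> str:
    """Return a rough recommendation given which criteria apply."""
    hits = sum(flags.get(c, False) for c in CRITERIA)
    if hits >= 3:
        return "lean toward an embedded-context / world model"
    if hits >= 1:
        return "consider a hybrid; prototype both approaches"
    return "a context layer over a general model is likely sufficient"

# Example: a fraud-detection workload matching every criterion
print(suggest_architecture({c: True for c in CRITERIA}))
```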
In each case, the architecture that works compresses domain structure into the model's weights. The model must bring its own context; it cannot rely upon a context layer to continuously reassess context.
A LogLM is an embedded context or mini world model
This is why at DeepTempo we built LogLM as a world model, not a context layer.
LogLM is trained on massive volumes of log data across diverse environments, learning to predict what should happen next given everything that has already occurred. Through this process, it compresses weeks or months of activity into dense internal representations that already reflect:
- typical baseline behavior for users, systems, and workflows
- rare but benign anomalies versus rare and concerning ones
- cross-system dependencies and cascading effects
- temporal patterns that span hours, days, or weeks
By the time our decision layer is evaluating a potential incident, the world is already understood. The model reasons from learned structure across many environments; the model brings its own context.
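For intuition, here is a toy sketch of the self-supervised objective: learn what normally follows what, then flag events the baseline finds improbable. A real LogLM is a deep sequence model trained across many environments; this bigram counter only demonstrates the shape of the idea, and every name in it is illustrative:

```python
from collections import Counter, defaultdict

def train(sequences: list[list[str]]) -> dict[str, Counter]:
    """Count next-event frequencies conditioned on the previous event."""
    model: dict[str, Counter] = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            model[prev][nxt] += 1
    return model

def surprise(model: dict[str, Counter], prev: str, event: str) -> float:
    """1 - P(event | prev); high values mark events the learned
    baseline never (or rarely) produced after this predecessor."""
    counts = model.get(prev)
    if not counts:
        return 1.0
    return 1.0 - counts[event] / sum(counts.values())

baseline = train([["login", "read", "write", "logout"]] * 1000)
print(surprise(baseline, "login", "read"))           # ~0.0: entirely expected
print(surprise(baseline, "login", "mass_download"))  # 1.0: never observed
```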
When context layers are necessary but insufficient
The context layer architecture is powerful for many enterprise use cases. Customer support benefits from recent tickets, product documentation, and account history. Sales intelligence improves with CRM data, communication history, and deal stage. Business analytics gains from dashboards, reports, and trend data.
These applications benefit from context layers because the relevant context is local in time and scope, the reasoning required is primarily synthesis and summarization, and the foundation model already has sufficient world knowledge to interpret the facts.
Cybersecurity fails all three criteria. Relevant context is non-local, spread across systems, users, and time. The reasoning required is pattern recognition in high-dimensional, adversarial space. And no general-purpose foundation model arrives already knowing the baseline behavior of your particular IT environment.
The same is true for recommendations, fraud detection, and other domains where decisions must be made continuously, at scale, in highly dynamic environments.
Horses for courses
Hopefully this blog has helped illustrate what different architectures can and cannot do.
Context layers excel at augmenting reasoning with facts. They make foundation reasoning models more useful by giving them access to proprietary data and recent events. For many enterprise applications, this is the correct architecture.
World models, or embedded-context models, excel at domains where understanding cannot be retrieved, only learned. They make AI systems capable of reasoning in complex, dynamic environments where the structure itself must be discovered from data.
For cybersecurity, this isn't a minor implementation detail. It's the difference between a system that can only answer questions about yesterday’s threats and one that can detect novel attacks as they unfold.
The context window is not the world. Sometimes, the model must already contain the world within it.
DeepTempo's LogLM is purpose-built to understand IT environments at scale. If you're interested in how world models enable real-time threat detection and superior isolation and investigation, please register for a demo or for a free assessment of your environment.