Foundation models are reshaping how security teams detect threats, triage alerts, and write code. They are also being used in ways that do not work, and avoided in ways that deserve a second look. This page is a practical reference.

What is a foundation model?

A foundation model is a large neural network trained on a broad dataset that can be adapted to many downstream tasks without retraining from scratch. The term was coined by Stanford's HAI in 2021 to describe models like GPT-3, BERT, and CLIP.

Three properties define a foundation model: pretraining at scale on a large dataset using self-supervised objectives, transferable representations across many tasks, and adaptability via prompting, fine-tuning, or classifier heads.

LLMs are foundation models for natural language; vision foundation models play the same role for images. LogLMs are foundation models for operational telemetry: the same architectural family, applied to a different domain.

The four ways foundation models are used in security

Analyst copilots

LLMs help analysts triage alerts, summarize incidents, write detection logic, and query data lakes in natural language. Microsoft Security Copilot, Splunk AI Assistant, and a half-dozen vendor-specific equivalents sit in this category. Where this works: scaling junior analysts, accelerating investigation, reducing alert fatigue. Where this struggles: hallucination on technical specifics, slow performance on bulk data, and, most importantly, a copilot does not actually detect anything new on its own.

Code analysis

Foundation models for code read source and binaries to find vulnerabilities, suggest fixes, or write exploits. Where this works: static analysis, code review, exploit-pattern recognition. Where this struggles: production scale, runtime context, novel vulnerability classes.

Phishing classification

Specialized text classifiers detect phishing, BEC, and adversarial content inside email security gateways. Where this works: high-volume email triage, zero-day phishing campaigns where linguistic structure betrays the attempt. Where this struggles: targeted spear phishing, multi-channel attacks, localization gaps.

Threat detection from telemetry

This is the newest and highest-leverage use case: foundation models trained on operational telemetry to detect attacks directly. Instead of helping a human read logs, the model reads logs and produces detections. DeepTempo's LogLM is the first production-scale example of this category.

Where general-purpose LLMs fail in security

There is a recurring temptation: "We already have GPT or Claude; can we point it at our logs?" You can. It usually does not work, for several specific reasons.

Logs are out-of-distribution. General-purpose LLMs are trained on human text. Logs are structured, repetitive, full of identifiers, and look like noise to a text-trained model.

Cost at scale. Running petabyte-scale log volumes through a frontier LLM is economically infeasible. Cost-per-detection lands one to three orders of magnitude too high.
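The cost gap is easy to see with a back-of-envelope calculation. All numbers below are illustrative assumptions (event volume, tokens per event, and per-token rates are invented for the sketch, not vendor pricing):

```python
# Back-of-envelope inference cost for a log stream.
# All constants are hypothetical assumptions for illustration.

def daily_inference_cost(events_per_day: int,
                         tokens_per_event: int,
                         usd_per_million_tokens: float) -> float:
    """Estimate daily token-processing cost for a log stream."""
    total_tokens = events_per_day * tokens_per_event
    return total_tokens / 1_000_000 * usd_per_million_tokens

EVENTS_PER_DAY = 1_000_000_000   # assumed mid-size enterprise log volume
TOKENS_PER_EVENT = 100           # assumed average tokens per log record

# Assumed rates: a frontier LLM vs. a small vertical model.
frontier = daily_inference_cost(EVENTS_PER_DAY, TOKENS_PER_EVENT, 5.00)
vertical = daily_inference_cost(EVENTS_PER_DAY, TOKENS_PER_EVENT, 0.05)

print(f"frontier LLM:   ${frontier:,.0f}/day")   # $500,000/day
print(f"vertical model: ${vertical:,.0f}/day")   # $5,000/day
```

Even with generous assumptions, the two rates differ by two orders of magnitude per day, which is why per-event frontier inference rarely survives a budget review.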

Latency. Frontier LLMs measure latency in seconds. Production detection needs sub-second response across millions of events per minute.

Inconsistency. LLMs are non-deterministic. Detection systems need a stable, classifiable signal such as embeddings, not free-text guesses.

Lack of domain grounding. General-purpose models do not know what NetFlow records mean in context. They can guess. They will sometimes guess wrong with high confidence.

Vertical foundation models

A vertical foundation model is purpose-built for one domain. Vertical models trade breadth for depth. They do not know everything humans have written, but they know their domain better than any generalist can.

Examples across industries: BioGPT for biomedical literature, CodeLlama for code, BloombergGPT for finance, Prithvi for geospatial, LogLM for security telemetry.

Vertical foundation models tend to be significantly smaller, faster at inference, cheaper to run, more accurate within their domain, and easier to ground because domain experts can validate behavior against ground truth.

Why a foundation model for logs makes sense

Operational telemetry has the same three properties that make natural language a good fit for foundation models: sequence and context, sparse meaningful signal, and a domain-specific vocabulary that benefits from purpose-built tokenization.
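The purpose-built tokenization point can be sketched concretely. The scheme below is hypothetical (it is not DeepTempo's actual tokenizer): it treats each structured field of a flow-like record as one token and buckets high-cardinality identifiers, instead of splitting the line into generic subword units the way a text tokenizer would:

```python
# Minimal sketch of domain-aware log tokenization.
# Hypothetical scheme for illustration, not a real product's tokenizer.

def tokenize_flow(record: str) -> list[str]:
    """Split a NetFlow-like key=value record into field-level tokens."""
    tokens = []
    for field in record.split():
        key, _, value = field.partition("=")
        # Bucket high-cardinality identifiers so the vocabulary stays small.
        if key in ("src", "dst"):
            value = "INTERNAL_IP" if value.startswith("10.") else "EXTERNAL_IP"
        tokens.append(f"{key}:{value}")
    return tokens

print(tokenize_flow("src=10.0.0.5 dst=203.0.113.9 dport=443 proto=tcp bytes=1420"))
# ['src:INTERNAL_IP', 'dst:EXTERNAL_IP', 'dport:443', 'proto:tcp', 'bytes:1420']
```

A generic text tokenizer would shred `10.0.0.5` into meaningless fragments; a field-aware vocabulary keeps the unit of meaning intact, which is the property the paragraph above describes.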

The output of a LogLM is not generated text. It is embeddings. Lightweight classifiers on top of those embeddings map activity to MITRE ATT&CK techniques and produce detections in real time.
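The embeddings-plus-classifier pattern looks roughly like this. Everything here is a toy stand-in (invented three-dimensional vectors, a nearest-centroid classifier); a production pipeline would use learned classifier heads over much higher-dimensional embeddings:

```python
# Toy sketch: map a log-sequence embedding to an ATT&CK technique label
# via cosine similarity to per-technique centroids. Vectors and centroids
# are invented for illustration.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical centroids of embedded log sequences, keyed by technique.
CENTROIDS = {
    "T1071 Application Layer Protocol": [0.9, 0.1, 0.0],
    "T1048 Exfiltration Over Alternative Protocol": [0.1, 0.9, 0.2],
    "benign": [0.0, 0.1, 0.9],
}

def classify(embedding: list[float]) -> str:
    """Return the technique label whose centroid is closest in angle."""
    return max(CENTROIDS, key=lambda label: cosine(embedding, CENTROIDS[label]))

print(classify([0.2, 0.8, 0.1]))  # closest to the exfiltration centroid
```

The point of the pattern is that the heavy model runs once to produce a stable vector, and the cheap classifier on top is deterministic, auditable, and fast enough for real-time detection.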

How DeepTempo applies this

DeepTempo built its detection product around a vertical foundation model rather than around a frontier LLM because the architecture matches the problem. The engineering investments behind LogLM (training corpus curation, custom tokenization, classifier head training against MITRE ATT&CK, evaluation discipline, adaptation pipeline) compound over time, which is what makes a vertical model a defensible product rather than a research artifact.