A LogLM is a Log Language Model, a foundation model trained on operational telemetry (logs, flow records, identity events, cloud activity) instead of natural language. It learns the behavioral grammar of an environment the way a large language model learns the grammar of human text. DeepTempo's LogLM is the core technology behind its prediction and detection layer. This page is the canonical reference.

The short answer

Foundation models learn rich, transferable representations of a domain by training on large volumes of data from that domain. ChatGPT learns from human text. A vision foundation model learns from images. A LogLM learns from logs.

Because it learns the structure and meaning of operational telemetry, not just statistical norms, a LogLM can recognize patterns of attacker behavior across activity it has never seen before. That makes it qualitatively different from rules, signatures, baselines, or anomaly detectors.

Why language models work for logs

Logs share three properties with natural language that make them well-suited to language-model architectures.

Sequence. Logs occur in order, and order carries meaning. A login followed by a privilege escalation followed by an outbound connection is a story; the same three events in a different sequence may be benign.

Context. Each log line means different things depending on what surrounds it. An HTTPS connection on port 443 from a workstation to a SaaS endpoint is mundane; the same connection from a domain controller after a credential dump is not.

Sparseness of meaningful events. Most logs are uneventful. Like the meaningful sentences in a book, the meaningful sequences in a telemetry stream are rare and depend on context to be recognized.

Transformer architectures are very good at learning sequence and context from sparse signal. Applying them to logs is the core insight behind LogLM.

How a LogLM differs from a general-purpose LLM

A general-purpose LLM is trained on human text. Its vocabulary is words and tokens from natural language. Its output is generated text. Its optimal use is reasoning, generation, and summarization. It is large, expensive, and limited in accuracy on security telemetry because logs are out-of-distribution for it.

A LogLM is trained on operational telemetry. Its vocabulary preserves the structure of fields, IPs, ports, and identifiers. Its output is embeddings consumed by classifiers and mapped to MITRE ATT&CK. Its optimal use is detection and classification in security telemetry. It is significantly smaller, faster, cheaper, and more accurate within the domain it was built for.

Retrofitting a general-purpose LLM to security telemetry is like asking a generalist who has read every book to debug a network. They can guess. They do not speak the language.

What it took to build LogLM

Building a foundation model for operational telemetry was not a quick exercise. The work spanned several specific engineering investments worth describing.

Training corpus curation. Most of the engineering effort went into collecting, cleaning, and aligning a substantial corpus of operational telemetry across cloud, data center, and OT environments. Generic log datasets do not carry the diversity needed to learn the difference between legitimate administration and attacker behavior. The corpus had to include both, with enough volume in each category to learn distinguishing structure.

Domain-specific tokenization. Natural-language tokenizers shred IP addresses, GUIDs, command-line arguments, and file paths into nonsense subwords. We built a tokenization scheme that preserves the structure of operational telemetry so the model can learn meaningful tokens rather than fragments of identifiers.
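The contrast can be sketched with a toy field-aware tokenizer. The patterns and token names below are illustrative, not DeepTempo's actual scheme; the point is that IPs and GUIDs survive as whole units instead of being shredded into subwords:

```python
import re

IP = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")
GUID = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")

def tokenize(line: str) -> list[str]:
    """Split on whitespace and common field separators, then classify
    each piece so structured identifiers stay intact."""
    out = []
    for piece in re.findall(r"[^\s=:]+|[=:]", line):
        if IP.fullmatch(piece):
            out.append(f"<IP:{piece}>")      # whole address, one token
        elif GUID.fullmatch(piece):
            out.append("<GUID>")             # abstracted, not fragmented
        else:
            out.append(piece)
    return out

print(tokenize("ACCEPT tcp 10.0.4.12:443 session=9f1c2d3e-aaaa-bbbb-cccc-0123456789ab"))
```

A natural-language tokenizer would split the same line into dozens of meaningless subwords; a field-aware scheme gives the model tokens that correspond to the concepts in the telemetry.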

Self-supervised pretraining at scale. The base LogLM was trained with self-supervised objectives appropriate for sequence data. This step alone consumed significant compute and required careful ablation work to land on training objectives that produced stable embeddings for downstream classification.

Classifier head training against MITRE ATT&CK. On top of the foundation model, we trained classifiers on labeled examples of activity mapped to MITRE techniques. This labeling work is ongoing and is where domain expertise compounds. Each round of labels improves the classifier and surfaces edge cases that inform the next round of training.
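The shape of this arrangement, a frozen foundation model producing embeddings and a lightweight head mapping them to technique labels, can be illustrated with a deliberately simple nearest-centroid classifier. The real head is a trained model, not centroids; this just makes the embedding-to-ATT&CK mapping concrete:

```python
import math

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

class NearestCentroid:
    """Toy classifier head: assign a LogLM embedding to the MITRE
    technique whose labeled-example centroid it is closest to."""
    def fit(self, embeddings, labels):
        by_label = {}
        for e, lbl in zip(embeddings, labels):
            by_label.setdefault(lbl, []).append(e)
        self.centroids = {lbl: centroid(vs) for lbl, vs in by_label.items()}
        return self

    def predict(self, embedding):
        return max(self.centroids, key=lambda lbl: cosine(embedding, self.centroids[lbl]))
```

Each labeling round adds points to the labeled regions, which is why the description above says domain expertise compounds: better labels move the decision boundaries, not just the scores.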

Continuous evaluation against held-out data. A false-positive rate above 5 percent is not viable in a SOC. Getting under that threshold required dozens of evaluation rounds against held-out telemetry, with explicit attention to high-noise categories like routine administration, automated CI/CD activity, and security tooling itself.
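The metric itself is simple to state; the category names below are examples, not DeepTempo's actual evaluation taxonomy. Breaking false-positive rate out per benign category is what makes the high-noise sources visible:

```python
from collections import defaultdict

def false_positive_rate(predictions, labels):
    """Of the truly benign events, what fraction did the model flag?
    predictions/labels are booleans (True = malicious)."""
    benign_flags = [p for p, y in zip(predictions, labels) if not y]
    return sum(benign_flags) / len(benign_flags) if benign_flags else 0.0

def fpr_by_category(rows):
    """rows: (predicted_malicious, actually_malicious, category).
    Per-category FPR exposes high-noise sources like admin or CI/CD."""
    flagged, total = defaultdict(int), defaultdict(int)
    for pred, truth, cat in rows:
        if not truth:
            total[cat] += 1
            flagged[cat] += int(pred)
    return {cat: flagged[cat] / total[cat] for cat in total}
```

A model can sit under an aggregate threshold while one category, say security scanners probing the network, quietly generates most of the noise; the per-category view is what surfaces that.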

None of this is a wrap-around-an-LLM exercise. A vertical foundation model takes domain data, domain expertise, and substantial engineering work to build.

How LogLM detects an attack in practice

Consider C2 beaconing. A traditional NDR looks for known beacon signatures or unusual periodicity. An attacker who beacons every 47 seconds, jitters the interval, and uses HTTPS to a CDN evades both checks.
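The evasion is easy to demonstrate. The sketch below uses a naive periodicity check, flag a flow if its inter-arrival times are nearly constant; the threshold and jitter figures are illustrative, and this is not any vendor's actual detector:

```python
import random
import statistics

def looks_periodic(intervals, cv_threshold=0.1):
    """Naive beacon check: flag if inter-arrival times are nearly
    constant (coefficient of variation below the threshold)."""
    mean = statistics.mean(intervals)
    return statistics.pstdev(intervals) / mean < cv_threshold

rng = random.Random(42)
regular = [47.0] * 50                                        # textbook beacon
jittered = [47.0 + rng.uniform(-15, 15) for _ in range(50)]  # same C2, jittered

print(looks_periodic(regular))   # True  -> caught by the naive check
print(looks_periodic(jittered))  # False -> evades it
```

Adding a modest jitter window pushes the timing statistics into the range of ordinary traffic, which is exactly why a check on any single property of the beacon is fragile.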

A LogLM does not look at any single property of the beacon. It produces an embedding of the flow in the context of every other flow that workstation has produced over the last week, including DNS lookups, certificate fingerprints, byte-pattern asymmetry, and timing relative to user activity. Even when no individual property is abnormal, the combination sits in a region of embedding space the classifier has learned corresponds to T1071.001. DeepTempo surfaces it as a Command and Control detection.

This is intent-level detection. The system is not asking whether a packet is bad. It is asking what an actor is trying to accomplish across an entire pattern of activity.

How DeepTempo applies this

DeepTempo's prediction and detection layer is the LogLM in production. The platform deploys against operational telemetry your environment already produces, runs the LogLM and its classifiers in real time, and outputs MITRE-mapped detections that flow into your existing SOC workflow. The accuracy figures published on the Platform page (99 percent on common TTPs, 85 percent zero-shot improving to 94 percent after adaptation, sub-five-percent false positives, sub-second latency) reflect the engineering tracks described above.