General Foundation Models and Their General Problems

In my previous blog, I explored how the emergence of foundation models is revolutionizing vertical SaaS. These models enhance the intelligence of systems and transform user interactions. Industries are increasingly integrating domain-specific foundation models into their workflows. For instance, Stripe has developed an AI foundation model tailored for payments, Netflix has built one for recommendations, and DeepTempo has built Tempo LogLM for cybersecurity. Moreover, user interfaces are evolving towards language-based interactions, making software feel more like a collaborative partner than a mere tool.

As a builder of Tempo LogLM, a vertical foundation model for cybersecurity, I face this question in almost all of my external interactions: Why invest in training a domain-specific model when one could fine-tune a general-purpose model with domain-specific data? My fellow builders at Netflix and Stripe have likely faced similar questions. I am writing this blog to address this specific question.

Understanding General-purpose Foundation Models

We have all interacted with ChatGPT, Perplexity, and Grok. These are leading examples of applications directly powered by general-purpose foundation models. Models like GPT-4, LLaMA, and Claude are trained on vast and diverse datasets spanning a wide range of topics. This large-scale training enables them to perform a variety of tasks, from drafting emails and generating images to writing code. State-of-the-art models like OpenAI’s o1, DeepSeek-R1, and Gemini 2.5 Pro can now reason, thanks to advanced reinforcement learning techniques. However, this versatility comes with limitations, especially when applied to specialized domains.

Limitations of General-purpose Models in Specialized Domains

Every domain comes with its own edge cases, and the same term can mean different things as the domain changes. A popular example is “MI”: to most people it might mean Michigan, but to someone in healthcare it means myocardial infarction, commonly known as a heart attack.

  1. Hallucinations
    Most general-purpose models are generative. Because they generate text by predicting the most likely next token, they can produce confident-sounding but incorrect or entirely fabricated information. This is especially problematic in high-stakes domains like healthcare, finance, and cybersecurity, where a single wrong decision can lead to severe damage, including loss of life.
  2. High Computational Costs
    The general understanding of these models stems from two factors: a very large number of parameters (now in the trillions) and a huge volume of training data (petabytes). Training these models costs hundreds of millions of dollars, and inference also requires significant computational resources. Efficient inference is key to cost control, but even with optimizations like quantization and parallelism, serving these models at scale remains expensive and challenging. One of our POCs is testing our model’s ability to detect attacks in over 10 billion records per hour; a generic LLM would cost around $10 million an hour just for input tokens, with no guarantee it would be as accurate as our Tempo LogLM.
  3. Lack of Domain Expertise (Swiss Knife vs Chef’s Knife)
    This brings up the distinction between generalists and specialists. General-purpose models are generalists: they make great consultants, but you always need someone who deeply understands the problem you are solving, i.e., a specialist. In critical fields like medicine, finance, and cybersecurity, this lack of deep domain expertise can lead to plausible-sounding but factually incorrect or even dangerous outcomes.
  4. Limited Explainability
    The decision-making process of large language models is largely opaque, making it difficult to trace how a specific answer was generated or to verify that the reasoning was correct. This lack of transparency undermines trust and is a serious obstacle in a highly regulated domain like cybersecurity. From my experience working with users of our Tempo LogLM, I’ve noticed they are especially excited when we mention that it enriches alerts with MITRE ATT&CK and other context.
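
The $10 million per hour figure above is easy to sanity-check with back-of-the-envelope arithmetic. The sketch below assumes roughly 500 input tokens per log record and $2.00 per million input tokens; both are illustrative assumptions in the ballpark of current frontier-model API pricing, not measured values:

```python
# Back-of-the-envelope cost of scanning logs with a generic LLM API.
# Assumptions (illustrative, not measured): ~500 input tokens per log
# record and $2.00 per million input tokens.
records_per_hour = 10_000_000_000   # 10 billion records per hour
tokens_per_record = 500
usd_per_million_tokens = 2.00

total_tokens = records_per_hour * tokens_per_record
cost_per_hour = total_tokens / 1_000_000 * usd_per_million_tokens
print(f"${cost_per_hour:,.0f} per hour")  # prints "$10,000,000 per hour"
```

Even halving either assumption still leaves a multi-million-dollar hourly bill for input tokens alone, which is why model footprint dominates the economics at this scale.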

The Case for Vertical Foundation Models

A model for financial transactions doesn’t need an understanding of movie recommendations, and a model for movie recommendations doesn’t need an understanding of financial transactions. I know this runs counter to the widely discussed idea of Artificial General Intelligence, but it makes sense for businesses to build a foundation model curated specifically for their domain. That’s why Netflix built a foundation model for recommendations, Stripe built one for payments, and we built Tempo LogLM for cybersecurity.

While building our Tempo LogLM, I have had a first-person view of the general-purpose vs. domain-specific datasets and challenges. I see the industry joining the movement toward vertical foundation intelligence, because it makes sense for the business and addresses what customers are looking for, including:

  1. High accuracy: tailored training captures the intricacies of domain-specific language and context.
  2. Cost-effectiveness: a much smaller footprint means less computational power and lower operational costs.
  3. Greater trust: models are still black boxes, but it is easier to connect outputs to inputs and to surface domain-specific insights.
  4. Fast adaptability: these models are easy to fine-tune and adapt quickly to your environment. Even without fine-tuning, we are seeing impressive zero-shot learning results, with accuracy in the 90s on similar domains.

As we navigate the rapidly evolving landscape of AI, it’s clear that general-purpose models like GPT-4 have brought remarkable capabilities. I often use them as brainstorming partners or pair programmers. Agents are enhancing the capabilities of these models. However, when it comes to deploying AI for specific tasks — like recommending movies or processing payments — the question arises: do we need a trillion-parameter model for that? The answer, increasingly, is no.

This sentiment is echoed by innovators at companies like Netflix, Stripe, and Bloomberg. I am seeing a clear shift towards vertical foundation models — AI systems tailored to specific domains. It is easier said than done. In my next blog, I’ll delve into a case study exploring how one of these companies built its domain-specific foundation model.

Follow along for more!
