
An LLM without Context is a disability

Kenn Kibadi
12/15/2025·5 min read

If you’ve been following the rise of Large Language Models (LLMs), you might have noticed how “powerful” these models appear at answering almost any question. Impressive as that looks and sounds, there are some truths to uncover: many enterprises and businesses are finding these models of little use in their specific context.

1. Genesis of “Attention”

The technology behind “ChatGPT” (mentioned to give non-technical readers a familiar reference) was introduced in the 2017 research paper Attention Is All You Need (Vaswani et al., 2017), where the authors proposed the Transformer, an architecture built entirely around “attention.”

This mechanism allows the model to look at all the words in a sentence at once and dynamically decide which ones matter most for the meaning it is trying to capture. The result is an architecture capable of tracking context across long-form content (primarily text, in its original setting) — something previous architectures struggled with (Bahdanau et al., 2014; Cho et al., 2014).
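
To make “attention” concrete, here is a toy NumPy sketch of the scaled dot-product attention described in the paper. It is a minimal illustration of the core computation, not the full multi-head layer used in real LLMs.

```python
# A toy sketch of scaled dot-product attention, the core computation of the
# Transformer (Vaswani et al., 2017). Minimal illustration only.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Every query scores every key at once; the softmax weights decide
    which positions 'matter most' for each token."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
    return weights @ V                              # context-weighted mix

# Toy example: a "sentence" of 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V
print(out.shape)  # (4, 8): each token now blends context from all others
```

Because every token attends to every other token in a single step, long-range context is captured directly instead of being squeezed through a sequential bottleneck, which is what earlier recurrent architectures struggled with.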

This breakthrough laid the foundation for today’s LLMs, as it scaled remarkably well with data and compute. The Transformer’s attention layers make it possible for models like GPT, Llama, Claude, Gemini, and others to learn rich patterns of language, reasoning, and even multi-modal information (Brown et al., 2020). More importantly, the architecture aligned perfectly with the needs of industry: faster training, higher accuracy, and the flexibility to adapt to countless business use cases. In many ways, the paper didn’t just introduce a new model; it set the stage for the entire AI product ecosystem we know today (Bommasani et al., 2021).

2. Exaggeratedly Optimistic Q&A Algorithms

In simpler, practical terms: since those (large) models were originally trained on vast amounts of data covering almost everything found on the internet, you might have noticed how optimistic they are, through chatbots like ChatGPT, when asked certain (if not most) questions. They always look like “they know everything” and always try to justify their arguments with examples, wordings, and illustrations.

Why?

Because that's how they are wired and what they are supposed to do, given the system instructions provided in the backend of the app/chatbot.
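
As an illustration of what those “system instructions” look like in practice, here is a sketch using the OpenAI Python SDK. The model name and prompt wording are placeholders of my own, not any vendor’s real backend configuration.

```python
# Illustrative sketch of how an app's backend wires "system instructions"
# into an LLM call, via the OpenAI Python SDK. Model name and prompt text
# are placeholder assumptions, not a real vendor configuration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder: any chat-capable model
    messages=[
        # The hidden instruction that nudges the model toward confident,
        # fully justified answers, whether or not it actually "knows":
        {"role": "system",
         "content": "You are a helpful expert. Always give a complete, "
                    "confident answer, with examples and justifications."},
        {"role": "user",
         "content": "What were our company's Q3 support-ticket trends?"},
    ],
)
print(response.choices[0].message.content)  # confident prose either way
```

Note that the user’s question concerns private enterprise data the model has never seen; the system instruction still pushes it to answer confidently, which is exactly the behavior described next.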

The goal was to build a large virtual brain, fed with vast amounts of information, to simulate an “Artificial General Intelligence” (AGI, sigh 😮‍💨) capable of answering whatever problem a user presents.

But here is the problem: although a simulated general intelligence can coherently(-ish) generate output from its training data, as we see with all these chatbots, when it is hit with a question that requires more specific “unseen” data, it will still pretend to know, and therefore sometimes ends up lying or faking the response. Hence, the limits.

3. Limits and Uselessness

LLMs are technically limited, not necessarily in terms of the quantity of data, but in terms of the specificity of data. Ironically, we thought that feeding AI large amounts of data would make specific small problems easy to solve, but that’s not the case…

  • It is built from large data, but can’t solve some data-specific problems

  • It has a huge brain, but misses the context of a complex, deep, data-specific problem

This is a common problem we have noticed in enterprise applications. Since enterprise data is typically specific, private, hidden from the public, and accessible only to the company itself, the AI companies that train these models never see it; hence, the disability factor.

An LLM is a recorder that generates coherent outputs to look “smart and useful”; it is, at best, a crude simulation of how the human brain works. Hence, theoretically, it can generate but not “create.”

4. Data-specific LLM Transfer Learning

In short, an LLM is not spiritually dead as a system; it is a potential master brain that can, at least, learn, adapt, and help in a domain-specific scenario. That’s where the concept of transfer learning comes in.

In Machine Learning generally, transfer learning means reusing a pre-trained model (trained on large, generic data) as the starting point for a different but related, more specific task or domain. In this light, it can be very beneficial for a business to use a generic LLM as the foundation of its AI solutions, because it comes with an intellectual capacity able to adapt, learn, and progressively deliver enterprise-focused solutions:

LLM_Enterprise_Adaptation = Generic_LLM + (Domain_Specific_Data × Adaptation) + Continuous_Update
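
To make the formula above concrete, here is a minimal sketch of one common form of the Adaptation term: supervised fine-tuning with the Hugging Face transformers and datasets libraries. The base model ("gpt2"), the file name ("domain_corpus.txt"), and the hyperparameters are illustrative assumptions, not a prescribed enterprise setup.

```python
# Minimal sketch of the formula above: Generic_LLM + Domain_Specific_Data
# via supervised fine-tuning. "gpt2", "domain_corpus.txt", and all
# hyperparameters are illustrative assumptions only.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

base = "gpt2"  # stands in for any generic pre-trained LLM
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 ships without a pad token
model = AutoModelForCausalLM.from_pretrained(base)

# Domain_Specific_Data: a private, enterprise-only text corpus.
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

# Adaptation: continue training the generic model on the specific data.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="adapted-llm",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

# Continuous_Update: re-run this loop as new enterprise data accumulates.
model.save_pretrained("adapted-llm")
```

In practice, teams often make this adaptation step cheaper with parameter-efficient techniques such as LoRA, but the overall recipe — generic model in, domain data applied, adapted model out, repeated over time — stays the same.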

5. Final Thoughts

Reading this article, you might think I’m being too pessimistic about LLMs, but let me assure you that I might be one of the most excited users of this technology (from a user-experience standpoint). The truth is, as an engineer and a daily user of these solutions, I’m just laying the theoretical foundations while we all work with, and on, these technologies. Watch for the following articles, as I will dive deeper into “Transfer Learning” when time allows.

References

  • Vaswani, A., et al. (2017). Attention Is All You Need.

  • Bahdanau, D., Cho, K., Bengio, Y. (2014). Neural Machine Translation by Jointly Learning to Align and Translate.

  • Cho, K., et al. (2014). Learning Phrase Representations Using RNN Encoder–Decoder for Statistical Machine Translation.

  • Brown, T. B., et al. (2020). Language Models are Few-Shot Learners (GPT-3).

  • Bommasani, R., et al. (2021). On the Opportunities and Risks of Foundation Models.

Kenn Kibadi

Applied AI Engineer • Founder of WhyItMatters.AI | Philonote.com

