Reasoning Models
Reasoning models were the main actors of late 2024 and early 2025: OpenAI's o1, o1-mini and o3, Google's Gemini 2.0, Anthropic's Claude 3.7 Sonnet, and DeepSeek's R1, to name a few. Rather than building ever-larger base models, as on the path from GPT-1 to GPT-3.5, AI companies are increasingly focusing on better reasoning capability through reinforcement learning on Chain-of-Thought (CoT) datasets. CoT is not a new concept; it has been widely used in AI application development as a prompt-engineering technique since the early days of LLMs.
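As a minimal sketch of what CoT prompting looks like in practice (the question and wording below are illustrative, not from any particular benchmark): the same question is framed two ways, and only the CoT version asks the model to show its intermediate steps before answering.

```python
# Minimal sketch: Chain-of-Thought (CoT) as a prompt-engineering technique.
# A plain prompt asks for the answer directly; a CoT prompt additionally asks
# the model to lay out intermediate reasoning before the final answer.

QUESTION = "A shop sells pens at 3 for $2. How much do 12 pens cost?"

# Direct prompt: answer only.
plain_prompt = f"Q: {QUESTION}\nA:"

# CoT prompt: same question, but the instructions elicit step-by-step reasoning.
cot_prompt = (
    f"Q: {QUESTION}\n"
    "Think step by step, showing each intermediate calculation, "
    "then give the final answer on its own line prefixed with 'Answer:'.\n"
    "A:"
)

print(cot_prompt)
```

Either string would be sent as the user message to any chat-completion API; the model and endpoint are unchanged, only the instructions differ. Reasoning models effectively bake this behaviour into the model itself, so the explicit "think step by step" scaffolding is no longer the caller's job.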
For AI companies, this is a reasonable route to better AI applications. Adoption has been hard, and not everyone is an expert in prompting and fine-tuning. Equipped with better reasoning capabilities, LLMs are more useful out of the box for the general public and downstream applications.
However, packaging too many “capabilities” upstream has its consequences. CoT reasoning at the service level and integrated tool use (e.g., web search) make the model appear slower, as completing a list of sequential tasks inevitably takes longer. It also consumes more tokens (OpenAI hides the reasoning steps from users but still charges for the tokens those steps consume), resulting in a more expensive, slower service for everyone.
Although DeepSeek R1 demonstrated that CoT can be part of training datasets, I believe (from the information released) that many other closed reasoning models use extra processing layers. The extra reasoning power is not built into the text-predicting mechanism of vanilla LLMs.
Some argue that reasoning is the last step of human intelligence, and that this may be the start of LLMs building so-called world views. It is a philosophical argument, and I do not dispute it.
However, I cannot help but feel that this new direction is partly because AI companies are (understandably) under pressure to continuously release “better” models. They are drifting away from the difficult task of creating fundamentally more advanced foundation models. Sam Altman is right that AI adoption might not be transformative until AGI is achieved.
Besides, apart from complex tasks such as scientific research, fact-checking and maths, not many tasks require tremendous reasoning capability out of the box, and none can justify the eye-watering costs. What most firms need is a way to monetise their proprietary data. If a firm does not have a robust data pipeline and data strategy, it is probably not ready to truly harness the power of reasoning models to create value for itself or its customers.
Even when high reasoning ability is crucial to a use case, other techniques, such as prompt engineering or an agentic workflow, can achieve the same result with more tailored design and better performance at far lower cost. Besides, given current LLMs' inherent inability to ground their understanding of the real world in facts, reliable reasoning is a challenging endeavour. Instead of adding complex layers on top of existing LLMs, creating new architectures that can give future AI models world views might be far better for the future of AI.