Reasoning Models
Reasoning models were the main actors of late 2024 and early 2025: OpenAI's o1, o1-mini and o3, Google's Gemini 2.0, Anthropic's Claude 3.7 Sonnet, and DeepSeek's R1, to name a few. Rather than building ever-larger base models, as on the path from GPT-1 to GPT-3.5, AI companies are increasingly focusing on better reasoning capability through reinforcement learning on Chain-of-Thought (CoT) datasets. CoT is not a new concept; it has been widely used in AI application development as a prompt-engineering technique since the early days of LLMs.
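As a minimal sketch of what CoT prompting looks like in practice (the question and wording below are illustrative, not from any particular benchmark): the same question is framed two ways, and only the CoT version asks the model to show its intermediate steps before answering.

```python
# Minimal sketch: Chain-of-Thought (CoT) as a prompt-engineering technique.
# A plain prompt asks for the answer directly; a CoT prompt additionally asks
# the model to lay out intermediate reasoning before the final answer.

QUESTION = "A shop sells pens at 3 for $2. How much do 12 pens cost?"

# Direct prompt: answer only.
plain_prompt = f"Q: {QUESTION}\nA:"

# CoT prompt: same question, but the instructions elicit step-by-step reasoning.
cot_prompt = (
    f"Q: {QUESTION}\n"
    "Think step by step, showing each intermediate calculation, "
    "then give the final answer on its own line prefixed with 'Answer:'.\n"
    "A:"
)

print(cot_prompt)
```

Either string would be sent as the user message to any chat-completion API; the model and endpoint are unchanged, only the instructions differ. Reasoning models effectively bake this behaviour into the model itself, so the explicit "think step by step" scaffolding is no longer the caller's job.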
For AI companies, this is a reasonable route to better AI applications. Adoption has been hard, and not everyone is an expert in prompting and fine-tuning. Equipped with better reasoning capabilities, LLMs are more useful out of the box for the general public and downstream applications.
However, packaging too many “capabilities” upstream has its consequences. CoT reasoning at the service level and integrated tool use (e.g., web search) make the model appear slower, as completing a list of sequential tasks inevitably takes longer. It also consumes more tokens (OpenAI hides the reasoning steps from users but still charges for the tokens those steps consume), resulting in a more expensive, slower service for everyone.
Although DeepSeek R1 demonstrated that CoT can be part of training datasets, I believe (from the information released) that many other closed reasoning models use extra processing layers. The extra reasoning power is not built into the text-predicting mechanism of vanilla LLMs.
Some argue that reasoning is the last step of human intelligence, and that this may be the start of LLMs building so-called world views. It is a philosophical argument, and I do not dispute it.
However, I cannot help but feel that this new direction is partly because AI companies are (understandably) under pressure to continuously release “better” models. They are drifting away from the difficult task of creating fundamentally more advanced foundation models. Sam Altman is right that AI adoption might not be transformative until AGI is achieved.
Besides, apart from complex tasks such as scientific research, fact-checking and maths, not many tasks require tremendous reasoning capability out of the box, and none can justify the eye-watering costs. What most firms need is a way to monetise their proprietary data. If a firm does not have a robust data pipeline and data strategy, it is probably not ready to truly harness the power of reasoning models to create value for itself or its customers.
Even when high reasoning ability is crucial to a use case, other techniques, such as prompt engineering or an agentic workflow, can achieve the same result with more tailored design and better performance at far lower cost. Besides, given current LLMs' inherent inability to ground their understanding of the real world in facts, reliable reasoning is a challenging endeavour. Instead of adding complex layers on top of existing LLMs, creating new architectures that can give future AI models world views might be far better for the future of AI.