OpenAI is now showing more details of the reasoning process of o3-mini, its latest reasoning model. The change was announced on OpenAI's X account and comes as the AI lab faces increased pressure from DeepSeek-R1, a rival open model that fully displays its reasoning tokens.
![](https://venturebeat.com/wp-content/uploads/2025/02/image_ce3f8f.png?w=477)
Models like o3 and R1 undergo a lengthy "chain of thought" (CoT) process in which they generate extra tokens to break down the problem, reason about and test different answers, and reach a final solution. Previously, OpenAI's reasoning models hid their chain of thought and only produced a high-level overview of the reasoning steps. This made it difficult for users and developers to understand the model's reasoning logic and to adjust their instructions and prompts to steer it in the right direction.
OpenAI considered chain of thought a competitive advantage and hid it to prevent rivals from copying it to train their own models. But with R1 and other open models displaying their full reasoning trace, the lack of transparency has become a disadvantage for OpenAI.
The new version of o3-mini shows a more detailed version of the CoT. Although we still don't see the raw tokens, it provides much more clarity into the reasoning process.
![](https://venturebeat.com/wp-content/uploads/2025/02/image_264b8a.png?w=418)
Why it matters for applications
In our previous experiments with o1 and R1, we found that o1 was slightly better at solving data analysis and reasoning problems. However, one of the key limitations was that there was no way to figure out why the model made mistakes, and it often made mistakes when faced with messy real-world data obtained from the web. R1's chain of thought, on the other hand, enabled us to troubleshoot the problems and adjust our prompts to improve its reasoning.
For example, in one of our experiments, both models failed to provide the correct answer. But thanks to R1's detailed chain of thought, we were able to determine that the problem was not with the model itself but with the retrieval stage that gathered information from the web. In other experiments, R1's chain of thought gave us hints when it failed to parse the information we provided it, whereas o1 only gave us a very rough overview of how it was formulating its response.
We tested the new o3-mini model on a variant of a previous experiment we ran with o1. We provided the model with a text file containing the prices of various stocks from January 2024 through January 2025. The file was noisy and unformatted, a mix of plain text and HTML elements. We then asked the model to calculate the value of a portfolio that invested $140 in the Magnificent 7 stocks on the first day of each month from January 2024 to January 2025, distributed evenly across all the stocks (we used the term "Mag 7" in the prompt to make it a bit more challenging).
o3-mini's CoT was genuinely helpful this time. First, the model reasoned about what the Mag 7 was, filtered the data to keep only the relevant stocks (to make the problem challenging, we had added a few non-Mag 7 stocks to the data), calculated the monthly amount to invest in each stock, and made the final calculations to provide the correct answer (the portfolio would be worth around $2,200 at the latest time registered in the data we provided to the model).
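For readers who want to check the arithmetic themselves, here is a minimal Python sketch of the calculation we asked the model to perform: split $140 evenly across the Mag 7 each month, accumulate the shares bought, and value the holdings at the latest prices. The ticker list and price figures below are illustrative placeholders, not the actual noisy data file from our experiment.

```python
# Sketch of the portfolio calculation; tickers and prices are placeholders.
MAG7 = ["AAPL", "MSFT", "GOOGL", "AMZN", "NVDA", "META", "TSLA"]

# Hypothetical first-of-month prices: {month: {ticker: price}}
prices = {
    "2024-01": {"AAPL": 185.6, "MSFT": 376.0, "GOOGL": 138.2, "AMZN": 151.5,
                "NVDA": 48.2, "META": 346.3, "TSLA": 248.4},
    # ... one entry per month through "2025-01" ...
}

MONTHLY_INVESTMENT = 140.0                      # $140 on the first day of each month
per_stock = MONTHLY_INVESTMENT / len(MAG7)      # split evenly across the Mag 7

# Accumulate the number of shares bought each month
shares = {ticker: 0.0 for ticker in MAG7}
for month, month_prices in prices.items():
    for ticker in MAG7:
        shares[ticker] += per_stock / month_prices[ticker]

# Value the portfolio at the latest month's prices in the data
latest = prices[max(prices)]
portfolio_value = sum(shares[t] * latest[t] for t in MAG7)
print(f"Portfolio value: ${portfolio_value:,.2f}")
```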
![](https://venturebeat.com/wp-content/uploads/2025/02/image_133321.png?w=800)
It will take a lot more testing to see the limits of the new chain of thought, since OpenAI is still hiding many details. But based on our vibe checks, the new format seems much more useful.
What it means for OpenAI
When DeepSeek-R1 was released, it had three clear advantages over OpenAI's reasoning models: it was open, cheap and transparent.
Since then, OpenAI has managed to narrow the gap. While o1 costs $60 per million output tokens, o3-mini costs just $4.40, while outperforming o1 on many reasoning benchmarks. R1 costs around $7 to $8 per million tokens on U.S. providers. (DeepSeek offers R1 at $2.19 per million tokens on its own servers, but many organizations will be unable to use it because it is hosted in China.)
With the new change to the CoT output, OpenAI has managed to somewhat work around the transparency problem.
It remains to be seen what OpenAI will do about open-sourcing its models. Since its release, R1 has already been adapted, forked and hosted by many different labs and companies, potentially making it the preferred reasoning model for enterprises. OpenAI CEO Sam Altman recently admitted that he was "on the wrong side of history" in the open source debate. We'll have to see how this realization shows up in OpenAI's future releases.