OpenAI saved its biggest announcement for the last day of its 12-day "shipmas" event.
On Friday, the company unveiled o3, the successor to the o1 "reasoning" model it released earlier in the year. More precisely, o3 is a model family -- as was the case with o1. There's o3 proper and o3-mini, a smaller, distilled model fine-tuned for particular tasks.
Why call the new model o3, not o2? Well, trademarks may be to blame. According to The Information, OpenAI skipped o2 to avoid a potential conflict with British telecom provider O2. Strange world we live in, isn't it?
Neither o3 nor o3-mini is widely available yet, but safety researchers can sign up for a preview starting later today. The o3 family may not be generally available for some time -- at least, if OpenAI CEO Sam Altman sticks to his word. In a recent interview, Altman said that, before OpenAI releases new reasoning models, he'd prefer a federal testing framework to guide the monitoring and mitigation of the risks such models pose.
And there are risks. AI safety testers have found that o1's reasoning abilities make it try to deceive human users at a higher rate than conventional, "non-reasoning" models -- or, for that matter, leading AI models from Meta, Anthropic, and Google. It's possible that o3 attempts to deceive at an even higher rate than its predecessor; we'll find out once OpenAI's red-teaming partners release their test results.
Unlike most AI models, reasoning models such as o3 effectively fact-check themselves, which helps them avoid some of the pitfalls that normally trip up models.
This fact-checking process incurs some latency. o3, like o1 before it, takes a little longer -- usually seconds to minutes longer -- to arrive at solutions compared to a typical non-reasoning model. The upside? It tends to be more reliable in domains such as physics and mathematics.
o3 was trained to "think" before responding via what OpenAI calls a "private chain of thought." The model can reason through a task and plan ahead, performing a series of actions over an extended period that help it figure out a solution.
In practice, given a prompt, o3 pauses before responding, considering a number of related prompts and "explaining" its reasoning along the way. After a while, the model summarizes what it considers to be the most accurate response.
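To be clear, o3's chain of thought is private -- it runs inside the model and isn't something developers can inspect or script. But the general idea can be approximated with explicit prompting against a conventional chat model. The sketch below is illustrative only: the model name is a stand-in (o3 wasn't publicly available at the time of writing), and the prompt is our own, not OpenAI's internal mechanism.

```python
# Illustrative approximation of chain-of-thought reasoning via prompting.
# This is NOT o3's private chain of thought, which is internal to the model;
# it just mimics the "reason step by step, then summarize" pattern.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",  # stand-in model; o3 was not publicly available
    messages=[
        {
            "role": "system",
            "content": (
                "Reason through the problem step by step, checking each "
                "step for errors, then summarize the most accurate answer."
            ),
        },
        {
            "role": "user",
            "content": "A train travels 120 km in 90 minutes. "
                       "What is its average speed in km/h?",
        },
    ],
)
print(response.choices[0].message.content)
```

The difference with a true reasoning model is that this deliberation happens on the model's side, hidden from the user, with only the final summarized answer returned.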
New for o3 is the ability to "adjust" the reasoning time. The models can be set to low, medium, or high thinking time, trading extra latency and compute for better results on harder problems.
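OpenAI hasn't published developer documentation for o3 yet, so how this setting will surface in the API is unknown. If it works like other model parameters in the existing chat completions interface, usage might look something like the sketch below -- the "o3-mini" model identifier and the "reasoning_effort" parameter are assumptions, not confirmed details.

```python
# Hypothetical sketch: OpenAI had not published an o3 API at the time of
# writing. The model name "o3-mini" and the "reasoning_effort" parameter
# are assumptions modeled on the existing chat completions interface.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="o3-mini",          # assumed identifier
    reasoning_effort="high",  # assumed setting: "low" | "medium" | "high"
    messages=[
        {"role": "user", "content": "Prove that the square root of 2 is irrational."}
    ],
)
print(response.choices[0].message.content)
```

Whatever shape the final API takes, the premise is the same: the higher the setting, the longer the model deliberates before answering.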
One big question leading up to today was: might OpenAI claim that its newest models are approaching AGI? AGI, short for "artificial general intelligence," refers, broadly speaking, to AI that can perform any task a human can. OpenAI has its own definition: "highly autonomous systems that outperform humans at most economically valuable work."