OpenAI’s latest experimental model is a math whiz, performing so well on one of the world’s most difficult math exams that it has the AI community buzzing.
“I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition — the International Math Olympiad (IMO),” Alexander Wei, a member of OpenAI’s technical staff, said on X.
The International Math Olympiad is a global competition that began in 1959 in Romania and is now considered one of the hardest in the world. It spans two days; on each day, participants sit a four-and-a-half-hour exam with three problems. Famous medalists include Grigori Perelman, who proved the Poincaré conjecture, and Terence Tao, a recipient of the Fields Medal, the highest honor in mathematics.
In June, Tao predicted on Lex Fridman’s podcast that AI would not score high on the IMO. He suggested researchers shoot a bit lower. “There are smaller competitions. There are competitions where the answer is a number rather than a long-form proof,” he said.
Yet OpenAI’s latest model solved five out of six of the problems correctly, working under the same testing conditions as humans, Wei said.
Wei’s colleague, Noam Brown, said the model displayed a new level of endurance during the exam.
“IMO problems demand a new level of sustained creative thinking compared to past benchmarks,” he said. “This model thinks for a long time.”
Wei said the model is an upgrade in general intelligence. The model’s performance is “breaking new ground in general-purpose reinforcement learning,” he said. DeepMind’s AlphaGeometry, by contrast, is specifically designed just to do math.
“This is an LLM doing math and not a specific formal math system; it is part of our main push towards general intelligence,” OpenAI CEO Sam Altman said on X.
“When we first started openai, this was a dream but not one that felt very realistic to us; it is a significant marker of how far AI has come over the past decade,” Altman wrote, referring to the model’s performance at the IMO.
Altman added that a model with a “gold level of capability” will not be available to the public for “many months.”
The achievement is an example of how fast the technology is developing. Just last year, “AI labs were using grade school math” to evaluate models, Brown said. And tech billionaire Peter Thiel said last year it would take at least another three years before AI could solve US Math Olympiad problems.
Still, there are always skeptics.
Gary Marcus, a well-known critic of AI hype, called the model’s performance “genuinely impressive” on X. But he also posed several questions: how the model was trained, the scope of its “general intelligence,” its usefulness to the general public, and the cost per problem. Marcus also noted that the IMO has not independently verified these results.