A new benchmark pitting AI against previously unseen maths problems shows systems still fall short of top human expertise.
Last week, OpenAI shocked the mathematical community by revealing that one of its internal artificial intelligence (AI) ...
Top artificial intelligence systems now ace many textbook-style math questions, yet they still fall apart on genuinely new problems. The gap between polished performance on familiar benchmarks and ...
Chinese AI lab DeepSeek recently released AI models that match or exceed some of Silicon Valley's top ...
A Google DeepMind researcher and OpenAI’s former CTO are posing questions about the validity of OpenAI’s claim about its gold-medal score. OpenAI’s latest model has achieved a gold-level score at the ...
DeepSeek made waves in early 2025, launching one of the world's first free-to-access thinking models. Now, the Chinese firm has just released DeepSeekMath-V2 with the objective of achieving ...
Microsoft has introduced a new set of small language models called Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning, which are described as "marking a new era for efficient AI." These ...
Mathematician Will Sawin discusses his experience reviewing and refining a mathematical proof devised by OpenAI's internal model, and what that could mean for mathematics. Will ...
Over the weekend, Neel Somani, a software engineer, former quant researcher, and startup founder, was testing the math skills of OpenAI’s new model when he made an unexpected discovery. After ...