OpenAI o1 is a new large language model trained with reinforcement learning to perform complex reasoning. o1 thinks before it answers—it can produce a long internal chain of thought before responding ...
A marriage of formal methods and LLMs seeks to harness the strengths of both.
Microsoft has introduced a new set of small language models called Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning, which are described as "marking a new era for efficient AI." These ...
DeepSeek released DeepSeek-Math-V2, an AI model specialized for mathematical reasoning, on November 27, 2025. DeepSeek-Math-V2 focuses on theorem proving and self-verification capabilities, and ...
While DeepSeek's powerful new AI model, R1, has been grabbing headlines, the Chinese AI lab also quietly released a lighter, more efficient version of it. This smaller model, called ...
HANGZHOU -- Chinese AI firm DeepSeek has launched DeepSeekMath-V2, a groundbreaking mathematical reasoning model that sets new performance benchmarks and pushes the frontiers of AI-powered ...
You're currently following this author! Want to unfollow? Unsubscribe via the link in your email. Follow Alistair Barr Every time Alistair publishes a story, you’ll get an alert straight to your inbox ...
This study introduces MathEval, a comprehensive benchmarking framework designed to systematically evaluate the mathematical reasoning capabilities of large language models (LLMs). Addressing key ...
Identifying vulnerabilities is good for public safety, industry, and the scientists making these models.
Some results have been hidden because they may be inaccessible to you
Show inaccessible results