Math Modeling and Reasoning Examples

OpenAI o1 Model Sets New Math and Complex Reasoning Records

OpenAI o1 is a new large language model trained with reinforcement learning to perform complex reasoning. o1 thinks before it answers—it can produce a long internal chain of thought before responding ...

Communications of the ACM

Formal Reasoning Meets LLMs: Toward AI for Mathematics and Verification

A marriage of formal methods and LLMs seeks to harness the strengths of both.

Hosted on MSN

Microsoft's Phi-4-Reasoning Models Bring AI Math and Logic Skills to Smaller Devices

Microsoft has introduced a new set of small language models called Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning, which are described as "marking a new era for efficient AI." These ...

GIGAZINE

DeepSeek releases AI model 'DeepSeek-Math-V2' specialized for mathematical reasoning, achieving a gold medal-level accuracy rate at the International Mathematical Olympiad

DeepSeek released DeepSeek-Math-V2, an AI model specialized for mathematical reasoning, on November 27, 2025. DeepSeek-Math-V2 focuses on theorem proving and self-verification capabilities, and ...

Hosted on MSN

DeepSeek debuts lighter R1 AI model with better math reasoning

While DeepSeek's powerful new AI model, R1, has been grabbing headlines, the Chinese AI lab also quietly released a lighter, more efficient version of it. This smaller model, called ...

中国日报网

DeepSeek AI mathematical reasoning model pioneering self-verifying reasoning

HANGZHOU -- Chinese AI firm DeepSeek has launched DeepSeekMath-V2, a groundbreaking mathematical reasoning model that sets new performance benchmarks and pushes the frontiers of AI-powered ...

Business Insider

This DeepSeek demo shows how good the Chinese AI model is at math and reasoning

You're currently following this author! Want to unfollow? Unsubscribe via the link in your email. Follow Alistair Barr Every time Alistair publishes a story, you’ll get an alert straight to your inbox ...

EurekAlert!

MathEval: a comprehensive benchmark for evaluating large language models on mathematical reasoning capabilities

This study introduces MathEval, a comprehensive benchmarking framework designed to systematically evaluate the mathematical reasoning capabilities of large language models (LLMs). Addressing key ...

2don MSN

Scientists Found AI’s Fatal Flaw—The Most Advanced Models Are Failing Basic Logic Tests

Identifying vulnerabilities is good for public safety, industry, and the scientists making these models.

Some results have been hidden because they may be inaccessible to you

Show inaccessible results