More than 900 students at UC San Diego needed catch-up math classes in the fall of 2025, compared with 32 five years earlier.
Abstract: Large language models (LLMs) can achieve superior results through iterative refinement based on internal or external signals, compared to the unstable outputs from a single pass. However, ...
A marriage of formal methods and LLMs seeks to harness the strengths of both.
We propose MathForge, a two-dual framework that improves mathematical reasoning by targeting harder questions from both perspectives. It comprises a Difficulty-Aware Group Policy Optimization (DGPO) ...
This repo contains the resources for the paper "From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning." In this work, we take mathematical reasoning as a ...