OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification Paper • 2512.10756 • Published 25 days ago • 34
ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning Paper • 2512.05111 • Published Dec 4, 2025 • 47
Article OpenEvolve: An Open Source Implementation of Google DeepMind's AlphaEvolve May 20, 2025 • 55
EvoSyn: Generalizable Evolutionary Data Synthesis for Verifiable Learning Paper • 2510.17928 • Published Oct 20, 2025 • 2
Confidence as a Reward: Transforming LLMs into Reward Models Paper • 2510.13501 • Published Oct 15, 2025 • 1
DevBench: A Comprehensive Benchmark for Software Development Paper • 2403.08604 • Published Mar 13, 2024 • 2
SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution Paper • 2501.05040 • Published Jan 9, 2025 • 15
Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation Paper • 2502.06563 • Published Feb 10, 2025