Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making Paper • 2602.06570 • Published 13 days ago • 59
RE-TRAC: REcursive TRAjectory Compression for Deep Search Agents Paper • 2602.02486 • Published 17 days ago • 18
ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking Paper • 2601.06487 • Published Jan 10 • 52
Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem Paper • 2512.24873 • Published Dec 31, 2025 • 105
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models Paper • 2512.24618 • Published Dec 31, 2025 • 151
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling Paper • 2511.11793 • Published Nov 14, 2025 • 187
Scaling LLM Multi-turn RL with End-to-end Summarization-based Context Management Paper • 2510.06727 • Published Oct 8, 2025 • 5
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds Paper • 2511.08892 • Published Nov 12, 2025 • 209
Reinforcement Learning Foundations for Deep Research Systems: A Survey Paper • 2509.06733 • Published Sep 8, 2025 • 32
DeepAgent: A General Reasoning Agent with Scalable Toolsets Paper • 2510.21618 • Published Oct 24, 2025 • 101
DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search Paper • 2510.12801 • Published Oct 14, 2025 • 13
Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels Paper • 2510.06499 • Published Oct 7, 2025 • 33
WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents Paper • 2509.06501 • Published Sep 8, 2025 • 82
Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL Paper • 2508.07976 • Published Aug 11, 2025 • 52
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search Paper • 2509.25454 • Published Sep 29, 2025 • 146