On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral
SEGAgentRL/LLDS-A-GRPO-Qwen2.5-7B-Base
Reinforcement Learning • 8B
SEGAgentRL/LLDS-A-GRPO-Qwen2.5-7B-Ins
Reinforcement Learning • 8B
SEGAgentRL/LLDS-A-GRPO-Qwen2.5-3B-Base-MA
Reinforcement Learning • 3B
SEGAgentRL/LLDS-R-GSPO-Qwen2.5-3B-Ins
Reinforcement Learning • 3B