arxiv:2512.16649

JustRL: Scaling a 1.5B LLM with a Simple RL Recipe

Published on Dec 18 · Submitted by Bingxiang He on Dec 19
Authors:

Abstract

JustRL achieves state-of-the-art performance among 1.5B reasoning models with minimal complexity: single-stage training with fixed hyperparameters that matches or exceeds more sophisticated approaches while using less compute and training more stably.

AI-generated summary

Recent advances in reinforcement learning for large language models have converged on increasing complexity: multi-stage training pipelines, dynamic hyperparameter schedules, and curriculum learning strategies. This raises a fundamental question: Is this complexity necessary? We present JustRL, a minimal approach using single-stage training with fixed hyperparameters that achieves state-of-the-art performance on two 1.5B reasoning models (54.9% and 64.3% average accuracy across nine mathematical benchmarks) while using 2× less compute than sophisticated approaches. The same hyperparameters transfer across both models without tuning, and training exhibits smooth, monotonic improvement over 4,000+ steps without the collapses or plateaus that typically motivate interventions. Critically, ablations reveal that adding "standard tricks" like explicit length penalties and robust verifiers may degrade performance by collapsing exploration. These results suggest that the field may be adding complexity to solve problems that disappear with a stable, scaled-up baseline. We release our models and code to establish a simple, validated baseline for the community.
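
The abstract does not spell out the underlying algorithm, so the sketch below is only illustrative: it assumes a GRPO-style setup (group-relative advantages computed from a binary verifier reward) with hyperparameters held fixed for the whole single-stage run. The names and values (`verifier_reward`, `GROUP_SIZE`, `LEARNING_RATE`, etc.) are hypothetical and not taken from the paper or its released code.

```python
# Illustrative sketch only: the paper's exact recipe is not specified in this
# abstract. Assumes a GRPO-style setup (group-relative advantages over sampled
# completions, binary verifier reward) with fixed hyperparameters for a single
# training stage. All names and values below are placeholders.
import statistics

# Fixed hyperparameters for the entire run (no schedules, no stages) --
# placeholder values, not the paper's settings.
LEARNING_RATE = 1e-6
GROUP_SIZE = 8          # completions sampled per prompt
TOTAL_STEPS = 4000      # single stage, run unchanged from start to finish

def verifier_reward(answer: str, reference: str) -> float:
    """Binary reward from an exact-match verifier (hypothetical)."""
    return 1.0 if answer.strip() == reference.strip() else 0.0

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: normalize each reward against its own group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in rewards]

# Toy example: one prompt, GROUP_SIZE sampled answers, one of them correct.
answers = ["42"] + ["17"] * (GROUP_SIZE - 1)
rewards = [verifier_reward(a, "42") for a in answers]
advantages = group_relative_advantages(rewards)
print(advantages)  # correct sample gets a positive advantage, the rest negative
```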

Community

Paper submitter

✨What if the simplest RL recipe is all you need?

Introducing JustRL: new SOTA among 1.5B reasoning models with 2× less compute.

Stable improvement over 4,000+ steps. No multi-stage pipelines. No dynamic schedules. Just simple RL at scale.

arXiv lens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/justrl-scaling-a-1-5b-llm-with-a-simple-rl-recipe-5543-f15a4aa2

  • Key Findings
  • Executive Summary
  • Detailed Breakdown
  • Practical Applications

Models citing this paper 2

Datasets citing this paper 0

Spaces citing this paper 0

Collections including this paper 2