- Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation (arXiv:2510.22115, published Oct 25, 2025)
- Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning (arXiv:2510.19338, published Oct 22, 2025)
- WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training (arXiv:2507.17634, published Jul 23, 2025)