paper chasing
updated
Language Models are Few-Shot Learners
Paper
• 2005.14165
• Published
• 19
Evaluating Large Language Models Trained on Code
Paper
• 2107.03374
• Published
• 8
Training language models to follow instructions with human feedback
Paper
• 2203.02155
• Published
• 24
GPT-4 Technical Report
Paper
• 2303.08774
• Published
• 7
GPT-4o System Card
Paper
• 2410.21276
• Published
• 87
OpenAI o1 System Card
Paper
• 2412.16720
• Published
• 37
gpt-oss-120b & gpt-oss-20b Model Card
Paper
• 2508.10925
• Published
• 14
Gemma 2: Improving Open Language Models at a Practical Size
Paper
• 2408.00118
• Published
• 78
Gemma 3 Technical Report
Paper
• 2503.19786
• Published
• 55
Gemini: A Family of Highly Capable Multimodal Models
Paper
• 2312.11805
• Published
• 49
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Paper
• 2403.05530
• Published
• 65
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Paper
• 2507.06261
• Published
• 67
LLaMA: Open and Efficient Foundation Language Models
Paper
• 2302.13971
• Published
• 20
Llama 2: Open Foundation and Fine-Tuned Chat Models
Paper
• 2307.09288
• Published
• 250
The Llama 3 Herd of Models
Paper
• 2407.21783
• Published
• 117
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper
• 2401.02954
• Published
• 53
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper
• 2401.06066
• Published
• 59
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper
• 2501.12948
• Published
• 441
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Paper
• 2405.04434
• Published
• 25
DeepSeek-V3 Technical Report
Paper
• 2412.19437
• Published
• 76
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
Paper
• 2512.02556
• Published
• 256
Qwen3 Technical Report
Paper
• 2505.09388
• Published
• 334
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper
• 2508.06471
• Published
• 206
Training Compute-Optimal Large Language Models
Paper
• 2203.15556
• Published
• 11
Emergent Abilities of Large Language Models
Paper
• 2206.07682
• Published
• 3
Muon is Scalable for LLM Training
Paper
• 2502.16982
• Published
• 11