paper chasing - a stevhliu Collection

updated 22 days ago

Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 19
Evaluating Large Language Models Trained on Code

Paper • 2107.03374 • Published Jul 7, 2021 • 8
Training language models to follow instructions with human feedback

Paper • 2203.02155 • Published Mar 4, 2022 • 24
GPT-4 Technical Report

Paper • 2303.08774 • Published Mar 15, 2023 • 7
GPT-4o System Card

Paper • 2410.21276 • Published Oct 25, 2024 • 87
OpenAI o1 System Card

Paper • 2412.16720 • Published Dec 21, 2024 • 37
gpt-oss-120b & gpt-oss-20b Model Card

Paper • 2508.10925 • Published Aug 8, 2025 • 14
Gemma 2: Improving Open Language Models at a Practical Size

Paper • 2408.00118 • Published Jul 31, 2024 • 78
Gemma 3 Technical Report

Paper • 2503.19786 • Published Mar 25, 2025 • 55
Gemini: A Family of Highly Capable Multimodal Models

Paper • 2312.11805 • Published Dec 19, 2023 • 49
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Paper • 2403.05530 • Published Mar 8, 2024 • 65
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Paper • 2507.06261 • Published Jul 7, 2025 • 67
LLaMA: Open and Efficient Foundation Language Models

Paper • 2302.13971 • Published Feb 27, 2023 • 20
Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 250
The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31, 2024 • 117
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Paper • 2401.02954 • Published Jan 5, 2024 • 53
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Paper • 2401.06066 • Published Jan 11, 2024 • 59
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22, 2025 • 441
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7, 2024 • 25
DeepSeek-V3 Technical Report

Paper • 2412.19437 • Published Dec 27, 2024 • 76
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

Paper • 2512.02556 • Published Dec 2, 2025 • 256
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14, 2025 • 334
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Paper • 2508.06471 • Published Aug 8, 2025 • 206
Training Compute-Optimal Large Language Models

Paper • 2203.15556 • Published Mar 29, 2022 • 11
Emergent Abilities of Large Language Models

Paper • 2206.07682 • Published Jun 15, 2022 • 3
Muon is Scalable for LLM Training

Paper • 2502.16982 • Published Feb 24, 2025 • 11