small models
- Approximating Two-Layer Feedforward Networks for Efficient Transformers (arXiv:2310.10837, 11 upvotes)
- BitNet: Scaling 1-bit Transformers for Large Language Models (arXiv:2310.11453, 106 upvotes)
- QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models (arXiv:2310.16795, 27 upvotes)
- LLM-FP4: 4-Bit Floating-Point Quantized Transformers (arXiv:2310.16836, 14 upvotes)
- FP8-LM: Training FP8 Large Language Models (arXiv:2310.18313, 33 upvotes)
- Atom: Low-bit Quantization for Efficient and Accurate LLM Serving (arXiv:2310.19102, 11 upvotes)
- Ziya2: Data-centric Learning is All LLMs Need (arXiv:2311.03301, 20 upvotes)
- Mini-GPTs: Efficient Large Language Models through Contextual Pruning (arXiv:2312.12682, 9 upvotes)
- SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling (arXiv:2312.15166, 61 upvotes)
- TinyLlama: An Open-Source Small Language Model (arXiv:2401.02385, 95 upvotes)
- SliceGPT: Compress Large Language Models by Deleting Rows and Columns (arXiv:2401.15024, 73 upvotes)
- Specialized Language Models with Cheap Inference from Limited Domain Data (arXiv:2402.01093, 47 upvotes)
- Rethinking Optimization and Architecture for Tiny Language Models (arXiv:2402.02791, 13 upvotes)
- Scaling Laws for Downstream Task Performance of Large Language Models (arXiv:2402.04177, 20 upvotes)
- HARE: HumAn pRiors, a key to small language model Efficiency (arXiv:2406.11410, 39 upvotes)