FineWeb: decanting the web for the finest text data at scale • Featured • 1.28k • Generate high-quality text data for LLMs using FineWeb
The Ultra-Scale Playbook • 3.67k • The ultimate guide to training LLM on large GPU Clusters
The Instruction Gap: LLMs get lost in Following Instruction • Paper • 2601.03269 • Published Dec 19, 2025 • 8
The Smol Training Playbook • Featured • 2.95k • The secrets to building world-class LLMs
You don't really have to clone the repo. The FastAPI code is only there for demonstration, and you can write your app however you like. The main takeaway is the Dockerfile.
Article • How to generate text: using different decoding methods for language generation with Transformers • Mar 1, 2020 • 285
Komodo: A Linguistic Expedition into Indonesia's Regional Languages • Paper • 2403.09362 • Published Mar 14, 2024 • 11
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding • Paper • 2506.16035 • Published Jun 19, 2025 • 89