How We Use Claude Code Skills to Run 1,000+ ML Experiments a Day Article • 17 days ago • 46
Flash Sparse Attention: An Alternative Efficient Implementation of Native Sparse Attention Kernel Paper • 2508.18224 • Published Aug 25 • 1
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation Paper • 2511.09611 • Published Nov 12 • 68
moonshotai/Kimi-Linear-48B-A3B-Instruct Text Generation • 49B • Updated 10 days ago • 157k • 509