TiDAR: Think in Diffusion, Talk in Autoregression Paper • 2511.08923 • Published Nov 12, 2025 • 121
Article You could have designed state of the art positional encoding Nov 25, 2024 • 430
Whisper Collection OpenAI Whisper speech recognition models in MLX format • 48 items • Updated Oct 1, 2024 • 62
What matters when building vision-language models? Paper • 2405.02246 • Published May 3, 2024 • 103
Idefics2 🐶 Collection Idefics2-8B is a foundation vision-language model. In this collection, you will find the models, datasets and demo related to its creation. • 11 items • Updated May 6, 2024 • 92
Zero-Shot Detection and Segmentation Collection Demos of projects focused on zero-shot detection and segmentation. • 4 items • Updated Feb 7, 2024 • 3