microsoft/Phi-4-multimodal-instruct • Automatic Speech Recognition • Updated Dec 10, 2025 • 346k • 1.57k
google/pix2struct-widget-captioning-large • Visual Question Answering • 1B • Updated Apr 10, 2024 • 34 • 20