Visual instruction datasets for visual language models
Collection
A collection of multimodal (image + text) instruction fine-tuning datasets tailored for visual language models such as LLaVA, Fuyu, and IDEFICS.
5 items