Tokenize Anything via Prompting
Paper: arXiv:2312.09128
Ting Pan1,2*, Lulu Tang2*, Xinlong Wang2¶, Shiguang Shan1
We present Tokenize Anything via Prompting, a unified and promptable model capable of simultaneously segmenting, recognizing, and captioning arbitrary regions, with flexible visual prompts (point, box and sketch). The model is trained with exhaustive segmentation masks sourced from SA-1B, coupled with semantic priors from a pre-trained EVA-CLIP with 5 billion parameters.
See the GitHub page.
| Model | Description | Schedule | MD5 | Weights |
|---|---|---|---|---|
| tap_vit_h | ViT-H TAP v1.1 model | (100% SA-1B, 180k), (VG, 50ep) | 4bdfb9 | 🤗 HF link |
| tap_vit_l | ViT-L TAP v1.1 model | (100% SA-1B, 180k), (VG, 50ep) | c1d41f | 🤗 HF link |
| tap_vit_b | ViT-B TAP v1.1 model | (100% SA-1B, 180k), (VG, 50ep) | 707f80 | 🤗 HF link |

| Model | Description | Schedule | MD5 | Weights |
|---|---|---|---|---|
| tap_vit_l | ViT-L TAP v1.0 model | (50% SA-1B, 90k), (VG, 25ep) | 03f8ec | 🤗 HF link |
| tap_vit_b | ViT-B TAP v1.0 model | (50% SA-1B, 90k), (VG, 25ep) | b45cbf | 🤗 HF link |
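
The MD5 column above lists short hexadecimal strings, which can be used to check a downloaded weights file. The following is a minimal sketch, assuming each listed value is the first six characters of the file's MD5 digest (the tables do not state this explicitly). The helper names `md5_hexdigest` and `matches_md5_prefix` and the local file name in the usage comment are hypothetical and not part of the project.

```python
import hashlib


def md5_hexdigest(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large checkpoints do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5_prefix(path, expected_prefix):
    """Check whether a file's MD5 digest starts with the listed prefix.

    Assumption: the value in the MD5 column is a prefix of the full digest.
    """
    return md5_hexdigest(path).startswith(expected_prefix.lower())


# Usage (hypothetical local file name; substitute the file you downloaded):
# matches_md5_prefix("tap_vit_l.pkl", "c1d41f")
```

If the check fails, the download may be incomplete or may belong to a different release than the row it is compared against, since the v1.0 and v1.1 tables reuse the same model names.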
Note: You can generate these weights following the Concept Guide.
| Concept | Description | Weights |
|---|---|---|
| Merged-2560 | Merged concepts | 🤗 HF link |
| LVIS-1203 | LVIS concepts | 🤗 HF link |
| COCO-80 | COCO concepts | 🤗 HF link |
@article{pan2023tap,
title={Tokenize Anything via Prompting},
author={Pan, Ting and Tang, Lulu and Wang, Xinlong and Shan, Shiguang},
journal={arXiv preprint arXiv:2312.09128},
year={2023}
}
We thank the repositories: SAM, EVA, LLaMA, FlashAttention, Gradio, Detectron2 and CodeWithGPU.