Add comprehensive model card for MAGREF with metadata and usage

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +141 -0
README.md ADDED

---
license: mit
pipeline_tag: image-to-video
library_name: diffusers
---

# MAGREF: Masked Guidance for Any-Reference Video Generation with Subject Disentanglement

<div align="center">

<a href="https://huggingface.co/papers/2505.23742"><img src="https://img.shields.io/badge/Arxiv-2505.23742-b31b1b.svg?logo=arXiv"></a> &ensp;
<a href="https://magref-video.github.io/magref.github.io/"><img src="https://img.shields.io/static/v1?label=Project&message=Page&color=blue&logo=github-pages"></a> &ensp;
<a href="https://github.com/MAGREF-Video/MAGREF"><img src="https://img.shields.io/static/v1?label=GitHub&message=Code&color=blue&logo=github"></a> &ensp;
<a href="https://huggingface.co/MAGREF-Video/MAGREF/tree/main"><img src="https://img.shields.io/static/v1?label=%F0%9F%A4%96%20HuggingFace&message=Models&color=green"></a> &ensp;

</div>

<br>

**MAGREF** is a unified and effective framework for **any-reference video generation**, tackling challenges such as identity inconsistency, entanglement among multiple reference subjects, and copy-paste artifacts. The approach incorporates masked guidance and a subject disentanglement mechanism, enabling flexible video synthesis conditioned on diverse reference images and textual prompts.

## Paper Abstract

We tackle the task of any-reference video generation, which aims to synthesize videos conditioned on arbitrary types and combinations of reference subjects, together with textual prompts. This task faces persistent challenges, including identity inconsistency, entanglement among multiple reference subjects, and copy-paste artifacts. To address these issues, we introduce MAGREF, a unified and effective framework for any-reference video generation. Our approach incorporates masked guidance and a subject disentanglement mechanism, enabling flexible synthesis conditioned on diverse reference images and textual prompts. Specifically, masked guidance employs a region-aware masking mechanism combined with pixel-wise channel concatenation to preserve appearance features of multiple subjects along the channel dimension. This design preserves identity consistency and maintains the capabilities of the pre-trained backbone, without requiring any architectural changes. To mitigate subject confusion, we introduce a subject disentanglement mechanism which injects the semantic values of each subject derived from the text condition into its corresponding visual region. Additionally, we establish a four-stage data pipeline to construct diverse training pairs, effectively alleviating copy-paste artifacts. Extensive experiments on a comprehensive benchmark demonstrate that MAGREF consistently outperforms existing state-of-the-art approaches, paving the way for scalable, controllable, and high-fidelity any-reference video synthesis.

## Teaser

![teaser](https://raw.githubusercontent.com/MAGREF-Video/MAGREF/main/assets/teaser.png)

## 🔥 News
* `[2025.10.10]` 🔥 Our [Research Paper](https://arxiv.org/abs/2505.23742) on MAGREF is now available, and the [Project Page](https://magref-video.github.io/) is live.
* `[2025.06.20]` 🙏 Thanks to **Kijai** for developing the [**ComfyUI nodes**](https://github.com/kijai/ComfyUI-WanVideoWrapper) for MAGREF and an FP8-quantized Hugging Face model! Feel free to try them out and add MAGREF to your workflow.
* `[2025.06.18]` 🔥 In progress: we are actively collecting and processing more diverse datasets and scaling up training with increased computational resources to further improve resolution, temporal consistency, and generation quality. Stay tuned!
* `[2025.06.16]` 🔥 MAGREF is here! The inference code and [checkpoint](https://huggingface.co/MAGREF-Video/MAGREF/tree/main) have been released.

## 🎥 Demo

https://github.com/user-attachments/assets/ea8f7195-4ffc-4866-b210-f66bac993b7a

## 📑 Todo List
- [x] Inference code of MAGREF-480P
- [x] Checkpoint of MAGREF-480P
- [ ] Checkpoint of MAGREF-14B Pro
- [ ] Training code of MAGREF

## ✨ Community Works
### ComfyUI
Thanks to Kijai for developing the ComfyUI nodes for MAGREF:
[https://github.com/kijai/ComfyUI-WanVideoWrapper](https://github.com/kijai/ComfyUI-WanVideoWrapper)

FP8-quantized Hugging Face model: [https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Wan2_1-Wan-I2V-MAGREF-14B_fp8_e4m3fn.safetensors](https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Wan2_1-Wan-I2V-MAGREF-14B_fp8_e4m3fn.safetensors)
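
To pull that file from the command line, here is a minimal sketch; the target directory below is an assumption for a typical ComfyUI layout, so adjust it to your install:

```bash
# Hypothetical target folder -- point --local-dir at your own ComfyUI models directory
huggingface-cli download Kijai/WanVideo_comfy \
  Wan2_1-Wan-I2V-MAGREF-14B_fp8_e4m3fn.safetensors \
  --local-dir ./ComfyUI/models/diffusion_models
```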

### Guideline
Video guide by Benji: [https://www.youtube.com/watch?v=rwnh2Nnqje4](https://www.youtube.com/watch?v=rwnh2Nnqje4)

## ⚙️ Requirements and Installation
We recommend the following setup.

### Environment

```bash
# 0. Clone the repo
git clone https://github.com/MAGREF-Video/MAGREF.git
cd MAGREF

# 1. Create conda environment
conda create -n magref python=3.11.2
conda activate magref

# 2. Install PyTorch and other dependencies (pick the wheel matching your CUDA version)
# CUDA 12.1
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
# CUDA 12.4
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124

# 3. Install pip dependencies
pip install -r requirements.txt

# 4. (Optional) Install xfuser for multi-GPU inference
pip install "xfuser>=0.4.1"
```
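
Before moving on, it may be worth confirming that the install actually sees your GPU; this one-liner is a quick sanity check, not part of the official setup:

```bash
# Should print 2.5.1 (plus a CUDA suffix) and True on a working GPU setup
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```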

### Download MAGREF Checkpoint

```bash
# If you are in mainland China, run this first: export HF_ENDPOINT=https://hf-mirror.com
# pip install -U "huggingface_hub[cli]"
huggingface-cli download MAGREF-Video/MAGREF --local-dir ./ckpts/magref
```
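
A quick way to confirm the download finished before running inference (the exact file list varies with the release, so this only checks that the directory is populated):

```bash
# Show total size and the first few files of the downloaded checkpoint
du -sh ./ckpts/magref && ls ./ckpts/magref | head
```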

## 🤗 Quick Start
- Single-GPU inference
<br>Tested on a single NVIDIA H100 GPU.
The inference consumes around **70 GB** of VRAM, so an 80 GB GPU is recommended.
```bash
# way 1
bash infer_single_gpu.sh

# way 2
python generate.py \
    --ckpt_dir ./ckpts/magref \
    --save_dir ./samples \
    --prompt_path ./assets/single_id.txt
```

- Multi-GPU inference
```bash
# way 1
bash infer_multi_gpu.sh

# way 2
torchrun --nproc_per_node=8 generate.py \
    --dit_fsdp --t5_fsdp --ulysses_size 8 \
    --ckpt_dir ./ckpts/magref \
    --save_dir ./samples \
    --prompt_path ./assets/multi_id.txt
```
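
With fewer than 8 GPUs, a plausible variant is to match `--nproc_per_node` and `--ulysses_size` to your GPU count; this mirrors the 8-GPU command above and is an assumption rather than an officially documented configuration:

```bash
# Hypothetical 4-GPU run, keeping the parallelism degree equal to the GPU count
torchrun --nproc_per_node=4 generate.py \
    --dit_fsdp --t5_fsdp --ulysses_size 4 \
    --ckpt_dir ./ckpts/magref \
    --save_dir ./samples \
    --prompt_path ./assets/multi_id.txt
```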

> 💡 Note:
> * For the best generation results, describe the visual content of the reference image as accurately as possible when writing the text prompt.
> * If the generated video is unsatisfactory, the most straightforward fix is to change the `--base_seed` and adjust the description in the prompt, as in the sketch below.
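
A minimal seed-sweep sketch based on that note; `--base_seed` is taken from the tip above, while the seed values and per-seed output directories are arbitrary choices:

```bash
# Re-roll a few seeds for the same prompt and keep the best result
for seed in 0 42 1234; do
  python generate.py \
      --ckpt_dir ./ckpts/magref \
      --save_dir ./samples/seed_${seed} \
      --prompt_path ./assets/single_id.txt \
      --base_seed ${seed}
done
```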

## 👍 Acknowledgement

* This project wouldn't be possible without the following open-source repositories: [Wan2.1](https://github.com/Wan-Video/Wan2.1), [VACE](https://github.com/ali-vilab/VACE), [Phantom](https://github.com/Phantom-video/Phantom), [SkyReels-A2](https://github.com/SkyworkAI/SkyReels-A2), [HunyuanCustom](https://github.com/Tencent-Hunyuan/HunyuanCustom), [ConsisID](https://github.com/PKU-YuanGroup/ConsisID), [Concat-ID](https://github.com/ML-GSAI/Concat-ID)

## 📧 Ethics Concerns
The images used in these demos are sourced from public domains or generated by models, and are intended solely to showcase the capabilities of this research. If you have any concerns, please contact us at [email protected], and we will promptly remove them.

## ✍️ Citation

If you find our paper and code useful in your research, please consider giving a star :star: and a citation :pencil:.

### BibTeX
```bibtex
@article{deng2025magref,
  title={MAGREF: Masked Guidance for Any-Reference Video Generation},
  author={Deng, Yufan and Guo, Xun and Yin, Yuanyang and Fang, Jacob Zhiyuan and Yang, Yiding and Wang, Yizhi and Yuan, Shenghai and Wang, Angtian and Liu, Bo and Huang, Haibin and others},
  journal={arXiv preprint arXiv:2505.23742},
  year={2025}
}
```