SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization
Paper: arXiv:2405.11582 (https://arxiv.org/abs/2405.11582)
An unofficial reproduction of the PRepBN-Llama-350M checkpoint for SLAB.

Code: https://github.com/xinghaochen/SLAB/tree/main/llama

To evaluate a checkpoint, run:

python evaluation.py --ckpt <checkpoint-path>
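For reference, the PRepBN scheme named in the paper title replaces LayerNorm with a re-parameterizable BatchNorm by blending the two during training. Below is a minimal PyTorch sketch of that idea; the class names, the per-step counter, and the linear decay schedule are illustrative assumptions, not the repository's actual implementation.

import torch
import torch.nn as nn


class RepBN(nn.Module):
    """BatchNorm plus a learnable scaled identity shortcut.

    At inference the shortcut can be folded into the BN affine parameters,
    so the module re-parameterizes into a plain BatchNorm."""

    def __init__(self, dim: int):
        super().__init__()
        self.bn = nn.BatchNorm1d(dim)
        self.eta = nn.Parameter(torch.ones(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); BatchNorm1d normalizes over dim 1.
        x = x.transpose(1, 2)
        x = self.bn(x) + self.eta * x
        return x.transpose(1, 2)


class PRepBN(nn.Module):
    """Progressive re-parameterized BatchNorm (sketch):
    y = lam * LN(x) + (1 - lam) * RepBN(x), with lam decaying from 1 to 0
    over training so that only the RepBN branch remains at the end."""

    def __init__(self, dim: int, total_steps: int):
        super().__init__()
        self.ln = nn.LayerNorm(dim)
        self.repbn = RepBN(dim)
        self.total_steps = total_steps
        self.register_buffer("step", torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Assumed linear decay schedule for the blending weight.
            lam = max(0.0, 1.0 - self.step.item() / self.total_steps)
            self.step += 1
        else:
            lam = 0.0  # after training, only the re-parameterizable BN branch is used
        return lam * self.ln(x) + (1.0 - lam) * self.repbn(x)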
BibTeX:
@inproceedings{guo2024slab,
title={SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization},
author={Guo, Jialong and Chen, Xinghao and Tang, Yehui and Wang, Yunhe},
booktitle={International Conference on Machine Learning},
year={2024}
}