Towards Scalable Pre-training of Visual Tokenizers for Generation
Submitted by Jingfeng Yao · MiniMax