Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching • Paper • 2409.01141 • Published Sep 2, 2024