Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation • Paper • 2403.01479 • Published Mar 3, 2024