Generate speech from text using a reference voice
Ultra-fast image-to-video generation in 30-60 seconds with SVD-XT (Stable Video Diffusion XT)