Try out Step-Audio-EditX
Convert WAV audio to token string
Generate reconstructed image from uploaded image