Audio-generation — Topic

📑 arXiv 2d ago

AST: Adaptive, Seamless, and Training-Free Precise Speech Editing

AST is a training-free speech editing framework that uses pre-trained autoregressive TTS models with Latent Recomposition to precisely edit speech segments while preserving speaker identity and acoustic context. It eliminates the trade-off between editing quality and consistency by selectively stitching preserved and synthesized segments, without task-specific training.

Multimodal Audio-generation Inference
