𝓵𝓮𝔃𝓲𝓼𝓱𝓮𝓷 (@lezishen) posted in "Meituan releases the LongCat-AudioDiT audio generation model: speaker similarity score raised to 0.818, now open source"

Paper: [2603.29339v1] LongCat-AudioDiT: High-Fidelity Diffusion Text-to-Speech in the Waveform Latent Space
GitHub: meituan-longcat/LongCat-AudioDiT
Hugging Face: meituan-longcat/LongCat-AudioDiT-1B