GPT-SoVITS, a tool for few-shot voice conversion and text-to-speech synthesis, is experiencing significant user-reported issues with language-specific performance and memory management, despite active development efforts. The project, designed to support multiple languages, continues to evolve with community involvement.
Recent issues highlight persistent challenges in model performance, particularly in inference accuracy across different languages. Notable problems include audio generation errors like "swallowing words" and mispronunciations. Users are also encountering memory management issues, such as CUDA out-of-memory errors during batch inference.
Recent development work includes SORT_KEYS functionality and major updates to the Gradio and inference web UI.
Timespan | Opened | Closed | Comments | Labeled | Milestones |
---|---|---|---|---|---|
7 Days | 40 | 17 | 90 | 37 | 1 |
30 Days | 137 | 74 | 356 | 115 | 1 |
90 Days | 302 | 156 | 839 | 238 | 1 |
All Time | 1109 | 584 | - | - | - |
Like all software activity quantification, these numbers are imperfect but sometimes useful. Comments, Labels, and Milestones refer to those issues opened in the timespan in question.
Developer | Branches | PRs | Commits | Files | Changes |
---|---|---|---|---|---|
RVC-Boss | 1 | 0/0/0 | 71 | 40 | 47884 |
蓝梦实 | 1 | 6/6/1 | 6 | 22 | 3096 |
ChasonJiang | 2 | 4/4/0 | 4 | 12 | 2985 |
XXXXRT666 | 1 | 9/9/2 | 9 | 44 | 2492 |
KamioRinn | 1 | 10/11/1 | 11 | 21 | 1623 |
Lion-Wu | 1 | 1/1/0 | 1 | 5 | 465 |
红血球AE3803 | 1 | 1/1/0 | 1 | 1 | 44 |
AkitoLiu | 1 | 1/1/0 | 1 | 2 | 25 |
licycle | 1 | 1/1/0 | 1 | 1 | 4 |
Strive-for-excellence | 1 | 1/1/0 | 1 | 1 | 4 |
LonelyTaker | 1 | 2/2/0 | 2 | 2 | 3 |
Yuan-Man | 1 | 1/1/1 | 1 | 1 | 2 |
xiaofeicn | 1 | 0/1/0 | 1 | 1 | 2 |
刘悦 (v3ucn) | 0 | 1/0/0 | 0 | 0 | 0 |
LC (lc6464) | 0 | 0/0/1 | 0 | 0 | 0 |
CyberWon | 0 | 1/0/0 | 0 | 0 | 0 |
Ikko Eltociear Ashimine (eltociear) | 0 | 0/0/1 | 0 | 0 | 0 |
shadow01a | 0 | 0/0/1 | 0 | 0 | 0 |
HaTiWinter | 0 | 0/0/1 | 0 | 0 | 0 |
Maic Gerace (AssassinQuin) | 0 | 1/0/0 | 0 | 0 | 0 |
Ziyao Wang (wangziyao318) | 0 | 1/0/1 | 0 | 0 | 0 |
Ming (Ming-Zhou0201) | 0 | 4/0/1 | 0 | 0 | 0 |
james-bond-007 | 0 | 0/0/1 | 0 | 0 | 0 |
PRs: pull requests created by that developer, counted as opened/merged/closed-unmerged during the period.
The GitHub repository for GPT-SoVITS has seen significant activity, with 525 open and 584 closed issues (1,109 in total). Recent issues point to ongoing challenges with model performance, particularly inference accuracy across languages. Notably, there are multiple reports of audio generation errors, including "swallowed" words and unexpected outputs when certain reference audio files are used.
A recurring theme among the issues is the difficulty users face when trying to achieve consistent results across different languages and audio inputs. Many users have reported problems with specific phrases or words being mispronounced or omitted entirely during synthesis, which raises concerns about the robustness of the underlying models.
Here are some of the most recently created and updated issues:
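Issue #1518: Error when running inference on Japanese text
Issue #1517: Inference speed of the Fast_infercence_ branch
Issue #1516: infer_panel_batch_infer runs out of GPU memory
Issue #1515: Can data for continued pretraining or fine-tuning omit speaker information?
Issue #1514: Japanese inference hangs while extracting BERT features from text
Issue #1513: The opencc package requires a newer glibc (GLIBC_2.29)
Issue #1512: Pronunciation error: 瑟 (se4) is read as (sai4)
Issue #1508: 🎉 The much-requested [SSML tags] now have initial support!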
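Issue #1516 and the batch-inference out-of-memory reports describe a familiar failure mode. One common mitigation, sketched here under stated assumptions rather than taken from the repository, is to retry with a smaller batch after an out-of-memory error. The helper name infer_adaptive and its parameters are hypothetical; the sketch catches Python's MemoryError so it runs without a GPU, whereas PyTorch code would catch torch.cuda.OutOfMemoryError instead.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def infer_adaptive(
    items: Sequence[T],
    infer: Callable[[Sequence[T]], List[R]],
    batch_size: int = 8,
) -> List[R]:
    """Run infer() over items in batches, halving the batch size after an OOM error.

    Hypothetical helper for illustration; not part of GPT-SoVITS.
    """
    results: List[R] = []
    i = 0
    while i < len(items):
        batch = items[i : i + batch_size]
        try:
            results.extend(infer(batch))
        except MemoryError:  # PyTorch code would catch torch.cuda.OutOfMemoryError
            if batch_size == 1:
                raise  # even a single item does not fit; give up
            batch_size = max(1, batch_size // 2)
            continue  # retry the same items with a smaller batch
        i += len(batch)
    return results


if __name__ == "__main__":
    def fake_infer(batch):
        # Simulated model that "runs out of memory" on batches larger than 2.
        if len(batch) > 2:
            raise MemoryError
        return [x * 10 for x in batch]

    print(infer_adaptive(list(range(5)), fake_infer))  # [0, 10, 20, 30, 40]
```

Halving rather than decrementing keeps the number of failed attempts logarithmic in the starting batch size.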
Overall, while there are many active discussions about improvements and bug fixes, the project appears to be evolving rapidly with community involvement driving enhancements.
The repository RVC-Boss/GPT-SoVITS has a total of 46 open pull requests, with a variety of enhancements, bug fixes, and feature additions aimed at improving the text-to-speech (TTS) synthesis capabilities of the software. The recent activity indicates a focus on optimizing performance, enhancing multi-language support, and refining user interface elements.
PR #1479: Fix voices bleeding into each other when the API handles concurrent requests
Created 6 days ago. This PR addresses an issue with concurrent requests causing voice blending problems. It introduces asynchronous handling in the TTS API functions.
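Voice blending between concurrent requests typically arises when request-specific state, such as the chosen reference voice, lives in a shared global that another request can overwrite while the first is still waiting. The sketch below illustrates that race and one way to avoid it; it does not reproduce the PR's code, and the names synthesize_shared, synthesize_isolated and _current_voice are hypothetical, with asyncio.sleep standing in for model inference.

```python
import asyncio

# Hypothetical global holding the "current" reference voice (the buggy pattern).
_current_voice = None


async def synthesize_shared(voice: str, text: str) -> str:
    global _current_voice
    _current_voice = voice            # request-specific state written to a global
    await asyncio.sleep(0.01)         # stand-in for inference; yields to other requests
    return f"{text}@{_current_voice}" # may now read another request's voice


async def synthesize_isolated(voice: str, text: str) -> str:
    # The voice travels with the request, so concurrent calls cannot interfere.
    await asyncio.sleep(0.01)
    return f"{text}@{voice}"


async def run(fn):
    # Two overlapping requests with different voices.
    return await asyncio.gather(fn("alice", "hi"), fn("bob", "yo"))


if __name__ == "__main__":
    print(asyncio.run(run(synthesize_shared)))    # ['hi@bob', 'yo@bob']
    print(asyncio.run(run(synthesize_isolated)))  # ['hi@alice', 'yo@bob']
```

Passing per-request data explicitly, or guarding shared model state with a lock, removes the race; the sketch does not claim to match the approach taken in the PR.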
PR #1429: fast_inference: support v2 model
Created 13 days ago. This PR adds support for version 2 models in the inference process, ensuring compatibility with updated model architectures.
PR #1121: Add a one-click deployment link for Alibaba Cloud Function Compute
Created 91 days ago. This PR provides a template for deploying the application to Alibaba Cloud Function Compute, streamlining deployment and usage.
PR #1420: fix: gitgnore
Created 14 days ago. A minor fix to the .gitignore file to include additional patterns for ignored files.
PR #1380: fix webui.py: Python environment .pth issue
Created 20 days ago. This PR updates webui.py to avoid writing paths directly into site-packages, preventing potential conflicts in shared environments.
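Writing a .pth file into site-packages changes module resolution for every program that uses that interpreter. A more contained alternative, shown here as a hypothetical sketch rather than the PR's actual change, is to pass the project directory to child processes through the PYTHONPATH environment variable. The helper name child_env is illustrative.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional


def child_env(project_dir: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and prepend project_dir to PYTHONPATH.

    Hypothetical helper: only processes started with this environment see the
    extra path; site-packages itself is never modified.
    """
    env = dict(os.environ if base is None else base)
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = project_dir + os.pathsep + existing if existing else project_dir
    return env


if __name__ == "__main__":
    # Launch a child interpreter that can import modules from the current directory.
    env = child_env(os.getcwd())
    subprocess.run([sys.executable, "-c", "import sys; print(sys.path[:3])"], env=env, check=True)
```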
PR #1379: fix api.py: in mixed Chinese-English mode, "1.fly" was read as "one.fly"; it is now read as "一.fly"
Created 20 days ago. Fixes mispronounced numbers in mixed-language input: in mixed Chinese-English mode, digits are now read in Chinese rather than English.
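The fix concerns how digits are verbalized when Chinese and English are mixed. A minimal sketch, assuming the rule is simply "read ASCII digits in Chinese in mixed zh-en mode", could look like this; the function name digits_to_zh and the digit-by-digit reading are illustrative simplifications, not the logic in api.py.

```python
import re

# Digit-by-digit Chinese readings. Real text normalizers also handle
# multi-digit numbers, decimals and units; this is deliberately minimal.
ZH_DIGITS = dict(zip("0123456789", "零一二三四五六七八九"))


def digits_to_zh(text: str) -> str:
    """Replace ASCII digits with Chinese numerals so the Chinese frontend reads them.

    Hypothetical helper for illustration; [0-9] is used instead of \\d so that
    non-ASCII digits are left untouched.
    """
    return re.sub(r"[0-9]", lambda m: ZH_DIGITS[m.group()], text)


print(digits_to_zh("1.fly"))  # 一.fly
```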
PR #1350: fix TTS.py
Created 26 days ago. Adjusts maximum length calculations in TTS processing to ensure accurate handling of input lengths.
PR #1334: Update install.sh
Created 31 days ago. Modifies the installation script to include make command dependencies for a smoother installation.
PR #1306: Better API; update the emotion plugin
Created 39 days ago. Updates the emotional management plugin and improves API functionality for better integration with GPT-SoVITS.
PR #1509: [i18n optimization] add SORT_KEYS
Closed recently. Introduces a sorting option for internationalization (i18n) keys to facilitate batch processing of missing values.
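Sorting locale keys makes untranslated entries easy to locate and fill in bulk. The sketch below is hypothetical: it assumes a locale file is a flat JSON object mapping source strings to translations, and that an entry counts as missing when its value is empty or identical to its key. The function sort_locale and the missing_first flag are illustrative names; json.dumps with sort_keys=True is the standard-library option for sorted output.

```python
import json


def sort_locale(entries: dict, missing_first: bool = True) -> str:
    """Serialize a locale mapping with sorted keys, optionally listing missing entries first.

    Assumption (hypothetical): an entry is "missing" if its value is empty or equals its key.
    """
    def is_missing(key: str) -> bool:
        return not entries[key] or entries[key] == key

    if missing_first:
        # False sorts before True, so missing entries come first, each group alphabetical.
        ordered = sorted(entries, key=lambda k: (not is_missing(k), k))
        # dicts preserve insertion order, so json.dumps keeps this ordering.
        return json.dumps({k: entries[k] for k in ordered}, ensure_ascii=False, indent=2)
    return json.dumps(entries, ensure_ascii=False, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(sort_locale({"保存": "Save", "加载": "", "退出": "退出"}))
```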
PR #1504: Merge_yi fix
Closed recently. Fixes issues related to merging specific phonetic representations in Chinese text processing.
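For context, 一 (yi) in a reduplicated verb such as 听一听 is read in the neutral tone, so Chinese text frontends often merge such patterns into a single token before tone-sandhi rules run. The following is a simplified, hypothetical version of such a merging step; the function name merge_yi is illustrative, and the exact behavior of Merge_yi in GPT-SoVITS, and what PR #1504 changes, are not described here.

```python
from typing import List


def merge_yi(words: List[str]) -> List[str]:
    """Merge the token pattern X, 一, X into one token "X一X" (e.g. 听, 一, 听 -> 听一听).

    Hypothetical, simplified segmentation post-processing step.
    """
    merged: List[str] = []
    i = 0
    while i < len(words):
        if i + 2 < len(words) and words[i + 1] == "一" and words[i] == words[i + 2]:
            merged.append(words[i] + "一" + words[i + 2])
            i += 3  # consume all three tokens
        else:
            merged.append(words[i])
            i += 1
    return merged


print(merge_yi(["听", "一", "听", "音乐"]))  # ['听一听', '音乐']
```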
PR #1454: Fix hyphen
Closed recently. Addresses issues with out-of-vocabulary (OOV) words that contain hyphens in English text processing.
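One simple way to handle an out-of-vocabulary hyphenated word is to split it at the hyphens and look up each part. The sketch assumes a toy pronunciation dictionary (LEXICON, with CMUdict-style ARPAbet entries) and returns None for parts it cannot resolve, where a real frontend would fall back to a grapheme-to-phoneme model; the names are hypothetical and this is not the code from PR #1454.

```python
from typing import Dict, List, Optional

# Toy ARPAbet lexicon (hypothetical entries for illustration only).
LEXICON: Dict[str, List[str]] = {
    "well": ["W", "EH1", "L"],
    "known": ["N", "OW1", "N"],
}


def lookup(word: str) -> Optional[List[str]]:
    """Return phonemes for word, splitting OOV hyphenated words into their parts."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word]
    if "-" in word:
        phones: List[str] = []
        for part in filter(None, word.split("-")):  # skip empty parts, e.g. in "a--b"
            part_phones = lookup(part)
            if part_phones is None:
                return None  # a real system would call a G2P model here
            phones.extend(part_phones)
        return phones
    return None


print(lookup("Well-Known"))  # ['W', 'EH1', 'L', 'N', 'OW1', 'N']
```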
The current landscape of pull requests within the RVC-Boss/GPT-SoVITS repository reveals several key themes and trends:
Recent pull requests emphasize performance optimization, particularly asynchronous processing and batch inference. For instance, PR #1479 introduces asynchronous handling in API calls to prevent voice blending during concurrent requests, a common issue in TTS services that can significantly degrade the user experience. Similarly, PR #1429's support for version 2 models indicates an ongoing effort to improve synthesis quality and model compatibility.
Another prominent theme is the enhancement of multi-language capabilities within the system. The repository's commitment to supporting various languages is evident from multiple pull requests aimed at fixing pronunciation issues (e.g., PR #1379) and improving emotional expression through updated APIs (e.g., PR #1306). This aligns with the project's goal of providing robust TTS solutions across different linguistic contexts, catering especially to users who require nuanced language processing.
Several pull requests focus on refining the user interface and user experience aspects of the software. For example, PR #1121 simplifies deployment processes via Alibaba Cloud, making it easier for users to access and utilize the tool without extensive technical knowledge. Additionally, enhancements in web UI layout and functionality are noted in various PRs aimed at improving accessibility and usability for end-users.
The volume of open pull requests suggests active community engagement and continuous contribution to the project. The discussions in comments reflect a collaborative environment where contributors share insights and address issues collectively; this is crucial for maintaining momentum in open-source projects.
Despite these positive trends, some older pull requests remain unresolved or unmerged despite their potential value (e.g., PRs related to i18n improvements). This could indicate bottlenecks in the review process or prioritization challenges within the development team. Additionally, some pull requests address minor fixes or documentation updates that are less critical but still contribute to the overall health of the project.
In conclusion, while there is substantial progress reflected in recent pull requests focusing on performance optimization and multi-language support, attention should also be given to resolving older contributions that could enhance functionality further. The active involvement from contributors showcases a vibrant community ready to tackle challenges as they arise while pushing for innovative advancements in TTS technology through GPT-SoVITS.
Active contributors during the period:
RVC-Boss
SapphireLab
ChasonJiang
KamioRinn
Strive-for-excellence
Yuan-ManX
XXXXRT666
Lion-Wu
Akito-UzukiP
xiaofeicn
licycle
Erythrocyte3803
LonelyTaker