Whisper.cpp, a C++ implementation of OpenAI's Whisper ASR model, is actively developed to enhance speech-to-text capabilities across platforms. The project is driven by a community focused on optimizing performance and expanding compatibility, particularly with GPU acceleration.
Recent issues and pull requests (PRs) reveal a focus on performance improvements and compatibility fixes. Key issues include #2356, reporting infinite loops in multilingual audio processing, and #2355, highlighting a regression in Vulkan support. These indicate ongoing challenges in maintaining stable GPU support. PRs like #2360 introduce accessibility improvements, while others focus on documentation enhancements (#2358) and Go bindings development (#2350).
[...] ggml and whisper.cpp.
Developer | Branches | PRs | Commits | Files | Changes | |
---|---|---|---|---|---|---|
hipudding | 1 | 0/0/0 | 1 | 18 | 10830 | |
Georgi Gerganov | 1 | 3/3/0 | 17 | 31 | 9610 | |
slaren | 1 | 0/0/0 | 5 | 35 | 2286 | |
Dibakar Gope | 1 | 0/0/0 | 1 | 2 | 2232 | |
Johannes Gäßler | 1 | 0/0/0 | 3 | 11 | 1470 | |
0cc4m | 1 | 0/0/0 | 2 | 1 | 1162 | |
R0CKSTAR | 1 | 0/0/0 | 2 | 9 | 1028 | |
Mengqing Cao | 1 | 1/1/0 | 1 | 3 | 149 | |
Meng, Hengyu | 1 | 0/0/0 | 1 | 5 | 134 | |
Chen Xi | 1 | 0/0/0 | 1 | 2 | 103 | |
zhentaoyu | 1 | 0/0/0 | 1 | 5 | 98 | |
Salvatore Mesoraca | 1 | 0/0/0 | 1 | 1 | 76 | |
Molly Sophia | 1 | 0/0/0 | 1 | 5 | 50 | |
jdomke | 1 | 0/0/0 | 1 | 5 | 46 | |
l3utterfly | 1 | 0/0/0 | 1 | 1 | 46 | |
Conrad Kramer | 1 | 0/0/0 | 1 | 2 | 43 | |
Joe Todd | 1 | 0/0/0 | 2 | 1 | 35 | |
Mahesh Madhav | 1 | 0/0/0 | 1 | 1 | 32 | |
Ivan Filipov | 1 | 0/0/0 | 1 | 1 | 24 | |
Sigbjørn Skjæret | 1 | 0/0/0 | 1 | 3 | 21 | |
Ouadie EL FAROUKI | 1 | 0/0/0 | 2 | 2 | 21 | |
matteo | 1 | 0/0/0 | 1 | 1 | 15 | |
CarterLi999 | 1 | 0/0/0 | 1 | 1 | 12 | |
wangshuai09 | 1 | 0/0/0 | 1 | 2 | 7 | |
DavidKorczynski | 1 | 0/0/0 | 1 | 1 | 6 | |
Borislav Stanimirov | 1 | 0/0/0 | 1 | 2 | 5 | |
Clint Herron | 1 | 0/0/0 | 1 | 1 | 5 | |
Tony Wasserka | 1 | 0/0/0 | 1 | 1 | 4 | |
luoyu-intel | 1 | 0/0/0 | 1 | 1 | 3 | |
Justine Tunney | 1 | 1/0/0 | 1 | 1 | 2 | |
Alex O'Connell | 1 | 0/0/0 | 1 | 1 | 2 | |
Daniel Bevenius | 1 | 0/0/0 | 1 | 1 | 2 | |
Mark Zhuang | 1 | 0/0/0 | 1 | 1 | 2 | |
Jeroen Mostert | 1 | 0/0/0 | 1 | 1 | 2 | |
Daven Sanassy | 1 | 1/1/0 | 1 | 1 | 1 | |
None (10Jib) | 0 | 1/0/0 | 0 | 0 | 0 | |
György Balikó (gyorgy1) | 0 | 1/0/0 | 0 | 0 | 0 | |
None (hsinhoyeh) | 0 | 1/0/0 | 0 | 0 | 0 | |
Eric Curtin (ericcurtin) | 0 | 1/0/0 | 0 | 0 | 0 | |
Tim Miller (drasticactions) | 0 | 1/0/0 | 0 | 0 | 0 |
PRs: created by that dev and opened/merged/closed-unmerged during the period
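The PRs column packs three counts into one cell. A minimal sketch of how such a cell could be split and totalled is shown below; the `PRActivity` type and `parse_pr_activity` helper are hypothetical names introduced for illustration, and the sample rows are copied from the table above.

```python
from typing import NamedTuple


class PRActivity(NamedTuple):
    """Counts from one PRs cell: opened/merged/closed-unmerged (hypothetical type)."""
    opened: int
    merged: int
    closed_unmerged: int


def parse_pr_activity(cell: str) -> PRActivity:
    """Split a cell such as '3/3/0' into its three integer counts."""
    parts = cell.strip().split("/")
    if len(parts) != 3:
        raise ValueError(f"expected 'opened/merged/closed', got {cell!r}")
    opened, merged, closed_unmerged = (int(p) for p in parts)
    return PRActivity(opened, merged, closed_unmerged)


# A few rows copied from the table above (developer -> PRs cell).
rows = {
    "Georgi Gerganov": "3/3/0",
    "Mengqing Cao": "1/1/0",
    "Justine Tunney": "1/0/0",
    "slaren": "0/0/0",
}

parsed = {name: parse_pr_activity(cell) for name, cell in rows.items()}
total_opened = sum(a.opened for a in parsed.values())
total_merged = sum(a.merged for a in parsed.values())
print(total_opened, total_merged)  # prints "5 4" for these four rows
```

Keeping the three counts as named fields avoids mixing up merged and closed-unmerged PRs when aggregating across developers.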
Timespan | Opened | Closed | Comments | Labeled | Milestones |
---|---|---|---|---|---|
7 Days | 12 | 3 | 9 | 12 | 1 |
30 Days | 32 | 7 | 30 | 32 | 1 |
90 Days | 107 | 34 | 165 | 105 | 1 |
All Time | 1264 | 635 | - | - | - |
Like all software activity quantification, these numbers are imperfect but sometimes useful. Comments, Labels, and Milestones refer to those issues opened in the timespan in question.
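As a rough illustration of how the issue table can be read, the sketch below computes the closed-to-opened ratio for each timespan and the net number of open issues over all time. It assumes the Opened and Closed columns can be compared directly within a row, which the source does not state explicitly; the `close_ratio` helper is a hypothetical name.

```python
# (opened, closed) pairs copied from the issue activity table above.
issue_activity = {
    "7 Days": (12, 3),
    "30 Days": (32, 7),
    "90 Days": (107, 34),
    "All Time": (1264, 635),
}


def close_ratio(opened: int, closed: int) -> float:
    """Closed issues as a fraction of opened issues (hypothetical helper)."""
    return closed / opened


for span, (opened, closed) in issue_activity.items():
    print(f"{span}: {closed}/{opened} closed ({close_ratio(opened, closed):.1%})")

# All-time opened minus all-time closed gives the issues still open.
all_opened, all_closed = issue_activity["All Time"]
net_open = all_opened - all_closed
print(f"Open issues: {net_open}")
```

Under that assumption, the all-time difference of 1264 opened and 635 closed issues matches the 629 open issues reported for the repository.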
The ggerganov/whisper.cpp repository currently has 629 open issues, indicating a high level of ongoing activity and user engagement. Recent issues highlight various challenges users face, including performance regressions, compilation problems, and specific feature requests. Notably, there are recurring themes around GPU support, model performance inconsistencies, and the need for better handling of non-English languages.
Several issues report recurring anomalies, such as hallucinated text in transcriptions, particularly with certain models or configurations. Users also report audio-processing problems, including infinite loops during transcription and incorrect timestamps. The number of issues related to CUDA and OpenCL suggests compatibility or performance concerns in the GPU backends that still need to be addressed.
Here are some of the most recently created and updated issues:
Issue #2362: Put OpenVINO and OpenBLAS together gives better performance
Issue #2361: Release v1.7.0 ??
Issue #2359: How to use a .safetensors file in this library?
Issue #2356: Transcribing audio files to text goes into infinite loop for audios with multiple languages
Issue #2355: [Regression] No longer compiles with Vulkan
Issue #2310: Whisper.cpp consumes unusually large amounts of system memory when transcribing very long wave files
Issue #2304: Improvement video chat
[...] .safetensors).
This analysis reflects the current state of user engagement and highlights areas where the project may require further development or stabilization efforts.
The dataset covers pull requests (PRs) from the ggerganov/whisper.cpp repository, which implements OpenAI's Whisper automatic speech recognition model in C++. It comprises 63 open PRs and numerous closed ones, spanning enhancements, bug fixes, and feature additions.
PR #2360: Use colorblind friendly TTY color scheme
PR #2358: Fix broken links in README.md
PR #2350: feat(go binding): add beamsize/entropythold/maxcontext to context interface
PR #2346: Set MSVC to use UTF-8 on source files
PR #2339: fix go bindings
PR #2330: fix go bindings
PR #2279: Incorrect timestamps
PR #2272: Fix MKL build issue by correctly finding and linking MKL libraries
PR #2291: Implementing Encoder Begin Callback for golang binding
PR #2264: build : fix typo in CMakeLists.txt
PR #2254: kommand proj
PR #2184: Add support for quantization and custom audio context size to OpenVino
PR #2127: whisper grammar: experimental implementation with boost::spirit
PR #2095: Fixed incorrect docker example in readme
PR #2075: Up OpenBLAS and cuda-toolkit versions build.yml
The analysis of the pull requests reveals several key themes and trends within the development process of whisper.cpp.
Accessibility Improvements: PR #2360 introduces a colorblind-friendly terminal color scheme. This reflects an awareness of user diversity and the need for inclusive design practices within software development.
Documentation Enhancements: There is a consistent effort to improve documentation as seen in PRs like #2358 (fixing broken links) and #2095 (updating Docker examples). This is crucial for user onboarding and ensuring that developers can effectively utilize the library without encountering obstacles due to outdated or incorrect information.
Go Bindings Development: A notable number of PRs (e.g., #2350, #2339, and #2330) are dedicated to enhancing Go bindings. This indicates an expanding user base that utilizes Go for integrating Whisper functionalities into their applications, suggesting that cross-language compatibility is becoming increasingly important for the project’s growth.
Bug Fixes and Performance Optimizations: Many PRs are focused on fixing bugs (e.g., timestamp issues in PR #2279) and optimizing performance (e.g., quantization support in PR #2184). This ongoing maintenance is vital for ensuring reliability and efficiency as more users adopt the library for real-time applications.
Community Engagement: The discussions surrounding several PRs indicate active community engagement where contributors seek feedback from maintainers or other developers (e.g., discussions on Go CI in PR #2350). This collaborative environment fosters innovation and helps maintain high-quality contributions.
The pull requests within whisper.cpp reflect a vibrant development community focused on enhancing accessibility, improving documentation, optimizing performance, and expanding language support through Go bindings. However, the high number of open PRs suggests that there may be challenges in effectively managing contributions and prioritizing tasks within the project’s roadmap. Addressing these challenges will be essential as the project continues to grow and evolve in response to user needs and technological advancements.
Georgi Gerganov (ggerganov)
[...] ggml library. [...] whisper.cpp file, including handling empty mel spectrograms and using Vulkan as a GPU backend.
Salvatore Mesoraca (smeso)
[...] ggml_sub.
Slaren (slaren)
[...] ggml-backend.
Mengqing Cao (MengqingCao)
[...] whisper.cpp.
Hipudding
Ouadie El Farouki (OuadiElfarouki)
Johannes Gäßler (JohannesGaessler)
Molly Sophia (MollySophia)
Justine Tunney (jart)
Jdomke
CarterLi999
Others (including contributors like R0CKSTAR, airMeng, iboB, conradev, etc.) made smaller contributions focusing on various aspects of the project including bug fixes, feature enhancements, and optimizations.
[...] whisper.cpp project, particularly focusing on GPU acceleration through CUDA and Vulkan backends.
The development team is engaged in a robust cycle of feature enhancement, bug fixing, and performance optimization within the whisper.cpp project. The collaborative nature of their work suggests a strong commitment to maintaining high-quality standards while adapting to evolving requirements in speech recognition technology.