whisper.cpp, a high-performance C/C++ implementation of OpenAI's Whisper automatic speech recognition (ASR) model, is under active development, with current work focused on GPU support and multilingual transcription accuracy. The project, maintained by Georgi Gerganov and other contributors, aims to provide a lightweight, dependency-free ASR solution across many platforms.
Recent issues and pull requests indicate a strong focus on addressing GPU backend problems, particularly with Vulkan and CUDA. This suggests an emphasis on optimizing performance and ensuring compatibility across diverse hardware. The team is also tackling compilation challenges on different platforms and enhancing language support.
Timespan | Opened | Closed | Comments | Labeled | Milestones |
---|---|---|---|---|---|
7 Days | 4 | 2 | 0 | 4 | 1 |
30 Days | 34 | 16 | 21 | 34 | 1 |
90 Days | 100 | 36 | 131 | 99 | 1 |
All Time | 1296 | 651 | - | - | - |
Like all software activity quantification, these numbers are imperfect but sometimes useful. Comments, Labels, and Milestones refer to those issues opened in the timespan in question.
Developer | Branches | PRs | Commits | Files | Changes |
---|---|---|---|---|---|
Georgi Gerganov | 1 | 3/3/0 | 14 | 64 | 150892 |
zhentaoyu | 1 | 0/0/0 | 1 | 10 | 582 |
Johannes Gäßler | 1 | 0/0/0 | 4 | 16 | 536 |
compilade | 1 | 0/0/0 | 1 | 2 | 286 |
Radoslav Gerganov | 1 | 0/0/0 | 3 | 6 | 190 |
luoyu-intel | 1 | 0/0/0 | 1 | 4 | 177 |
slaren | 1 | 0/0/0 | 2 | 3 | 124 |
Justine Tunney | 1 | 0/1/0 | 1 | 1 | 42 |
hsinhoyeh | 1 | 0/1/0 | 1 | 6 | 39 |
Mengqing Cao | 1 | 1/1/0 | 1 | 1 | 34 |
Salvatore Mesoraca | 1 | 0/0/0 | 1 | 1 | 26 |
Binozo | 1 | 2/1/0 | 1 | 2 | 11 |
Tim Miller | 1 | 0/1/0 | 1 | 1 | 5 |
Meng, Hengyu | 1 | 0/0/0 | 1 | 2 | 4 |
stormofice | 1 | 1/1/0 | 1 | 1 | 4 |
Brad Murray | 1 | 1/1/0 | 1 | 1 | 4 |
Toliver | 1 | 1/1/0 | 1 | 1 | 3 |
UsernamesLame | 1 | 1/1/0 | 1 | 2 | 3 |
Philippe Normand | 1 | 1/1/0 | 1 | 1 | 2 |
Peng | 1 | 1/1/0 | 1 | 1 | 2 |
Ivo von Putzer Reibegg | 1 | 1/1/0 | 1 | 1 | 2 |
Eric Curtin | 1 | 0/1/0 | 1 | 1 | 2 |
Akarshan Biswas | 1 | 0/0/0 | 1 | 1 | 2 |
None (shivghai) | 0 | 1/0/0 | 0 | 0 | 0 |
byoungdale (byoungdale) | 0 | 1/0/0 | 0 | 0 | 0 |
Paweł Budzianowski (budzianowski) | 0 | 1/0/1 | 0 | 0 | 0 |
None (thewh1teagle) | 0 | 1/0/0 | 0 | 0 | 0 |
Dave Lewis (fromdavelewis) | 0 | 1/0/0 | 0 | 0 | 0 |
None (definitelyuncertain) | 0 | 1/0/1 | 0 | 0 | 0 |
PRs: pull requests created by that developer, counted as opened/merged/closed-unmerged during the period.
The GitHub repository for whisper.cpp has seen significant activity and currently has 645 open issues. Recent issues mix user-reported bugs, feature requests, and discussions about performance optimization. Recurring themes include GPU support, compilation problems across platforms, and better handling of specific languages and audio formats.
Several issues report crashes or unexpected behavior with particular models or configurations. For instance, some users report performance regressions in newer versions compared with older ones, particularly with the CUDA and CoreML backends. Others have raised concerns about transcription accuracy in languages other than English, suggesting that multilingual support needs improvement.
Issue #2420: Is there any way to get rid of [Blank Audio] in transcript?
Issue #2418: No assigned threads when manually compiling on MSVC
Issue #2415: Vulkan backend crashes with --processors > 1
Issue #2413: Support for LiteRT android on device AI with GPU acceleration
Issue #2412: won't compile on osx 12.5 M1
Issue #2411: Fallback from Vulkan to CPU (Edited 7 days ago)
Issue #2361: Release v1.7.0 ?? (Edited 6 days ago)
Issue #2402: First load time in Nvidia Jetson AGX Xavier and Orin is more than 10 minutes (Edited 14 days ago)
Issue #2400: The recognition results with Vulkan are so bad (Edited 15 days ago)
Issue #2399: Failed to compile it with Vulkan (Edited 15 days ago)
This analysis reflects a vibrant community actively engaging with the whisper.cpp project, while highlighting areas that may require further attention from maintainers to improve user experience and software reliability.
The repository ggerganov/whisper.cpp currently has 64 open pull requests (PRs) aimed at improving the functionality, performance, and usability of this Whisper implementation. They cover a wide range of topics, including Go bindings, server improvements, and GPU support.
PR #2417: Adds temperature options to the Go bindings, letting users adjust the sampling temperature, which can help reduce hallucinated output.
PR #2406: Server update to erase previous stdout text for multi-row outputs, enhancing usability during transcription.
PR #2330: Fix for Go bindings addressing missing ggml issues, indicating ongoing efforts to stabilize language bindings.
PR #1261: Runtime selection of extended x86 instruction sets, aimed at making prebuilt binaries portable without sacrificing performance.
PR #2384: Updates the talk example to use the latest GPT-2 implementation from ggml.
PR #2376: Fixes Makefile issues in the Go bindings, part of ongoing maintenance of the language bindings.
PR #2369: Addition of CI tests for ensuring code reliability across platforms, which is critical for maintaining software quality.
PR #2339: Another fix for Go bindings that builds on previous efforts, showing a pattern of iterative improvements.
PR #2291: Implements an encoder-begin callback for the Go bindings, extending the callbacks available during processing.
PR #2279: Fixes incorrect timestamps in transcriptions, addressing user-reported issues and improving output accuracy.
PR #2419: Merged change to use OS-generated temporary file names for ffmpeg-converted files, so concurrent conversions no longer collide on a shared file name.
PR #2416: Merged fix for build failures in the Go bindings when CUDA is enabled.
PR #2401: Sync with ggml updates, demonstrating active maintenance and integration with upstream changes.
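PR #2419 replaces fixed temporary file names with OS-generated ones. The sketch below illustrates that general technique in C using POSIX `mkstemp()`; it is not the code from the PR, and the `/tmp/whisper-convert-` prefix and the helper names `make_temp_audio_file` and `release_temp_audio_file` are hypothetical. Because `mkstemp()` both picks a unique name and creates the file atomically, two conversions running at the same time cannot end up writing to the same path.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical template; mkstemp() requires the name to end in "XXXXXX". */
static const char TEMP_TEMPLATE[] = "/tmp/whisper-convert-XXXXXX";

/* Creates a uniquely named temporary file, e.g. as the output target of
 * an ffmpeg conversion. On success, writes the generated path into
 * path_out and returns an open file descriptor; returns -1 on failure. */
int make_temp_audio_file(char *path_out, size_t path_len) {
    assert(path_out != NULL); /* caller contract */
    if (path_len < sizeof TEMP_TEMPLATE) {
        return -1; /* buffer too small for the template plus NUL */
    }
    memcpy(path_out, TEMP_TEMPLATE, sizeof TEMP_TEMPLATE);
    /* mkstemp() rewrites the trailing XXXXXX in place and creates the file
     * as if with O_CREAT | O_EXCL, so concurrent callers never share a name. */
    return mkstemp(path_out);
}

/* Closes the descriptor and deletes the file once the caller is done. */
void release_temp_audio_file(int fd, const char *path) {
    close(fd);
    unlink(path);
}
```

Since the generated name has no `.wav` suffix, a caller handing it to ffmpeg would likely need to state the output format explicitly (for example with `-f wav`) rather than rely on extension-based detection.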
The current state of open pull requests in the whisper.cpp repository reflects an active development environment focused on functionality and performance. A significant number of these PRs improve the Go bindings (#2417, #2376, #2330), suggesting growing interest in using Whisper from languages other than C/C++. This trend aligns with the project's goal of staying lightweight and dependency-free while offering robust APIs across multiple platforms.
PRs such as the server output fix (#2406) and dynamic instruction set selection (#1261) target usability and performance across diverse hardware configurations. This matters because users deploy Whisper on architectures ranging from high-end GPUs to mobile devices. Dynamic instruction set selection is particularly noteworthy: the software detects the available CPU features at runtime and chooses the fastest supported code path, so users get good performance without managing build configurations manually.
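The runtime-dispatch idea can be sketched in C: one binary contains both a portable code path and an AVX2/FMA path, and the program picks one after querying the CPU. This illustrates the general technique only and is not the implementation in #1261; it assumes GCC or Clang on x86 (`__builtin_cpu_supports` and the `target` function attribute are compiler extensions), and the names `dot_scalar`, `dot_avx2`, `select_dot`, and `dot_fn` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef float (*dot_fn)(const float *, const float *, size_t);

/* Portable baseline: a plain scalar loop that runs on any CPU. */
static float dot_scalar(const float *a, const float *b, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL));
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>

/* Only this function is compiled for AVX2+FMA; the rest of the binary
 * keeps the baseline instruction set, so it still loads on older CPUs. */
__attribute__((target("avx2,fma")))
static float dot_avx2(const float *a, const float *b, size_t n) {
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= n; i += 8) { /* 8 floats per 256-bit register */
        acc = _mm256_fmadd_ps(_mm256_loadu_ps(a + i), _mm256_loadu_ps(b + i), acc);
    }
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float sum = 0.0f;
    for (int k = 0; k < 8; k++) {
        sum += lanes[k]; /* horizontal sum of the vector accumulator */
    }
    for (; i < n; i++) {
        sum += a[i] * b[i]; /* scalar tail for leftover elements */
    }
    return sum;
}
#endif

/* Queries the CPU once and returns the fastest supported implementation. */
dot_fn select_dot(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma")) {
        return dot_avx2;
    }
#endif
    return dot_scalar;
}
```

A caller would store the result of `select_dot()` in a function pointer once at startup and call through it afterwards; the scalar fallback keeps the same binary usable on CPUs without AVX2.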
Moreover, the introduction of CI tests (#2369) showcases a commitment to maintaining high code quality and reliability. This is essential as the project scales and more contributors join in. Continuous integration practices will help catch regressions early and ensure that new features do not introduce instability into the existing codebase.
However, there are notable concerns about the age of some open PRs. For instance, #1261 has been open for nearly a year without being merged. This could indicate bottlenecks in the review process or prioritization challenges within the development team. Addressing these delays is important; otherwise, contributors may become frustrated or disengage over time.
Another area worth noting is the lack of recent merge activity compared to the volume of open PRs. While it's common for active projects to have many open contributions awaiting review, a balanced approach that ensures timely feedback and merges can help maintain momentum within the community.
In conclusion, while whisper.cpp is experiencing healthy growth through numerous contributions that expand its capabilities and improve user experience, the review process needs attention so that contributors feel valued through timely engagement with their submissions.
Toliver (teejae)
Binozo
Mengqing Cao (MengqingCao)
Philippe Normand (philn)
Georgi Gerganov (ggerganov)
Johannes Gäßler (JohannesGaessler)
Salvatore Mesoraca (smeso)
Tim Miller (drasticactions)
UsernamesLame
hsinhoyeh
slaren
qnixsynapse
luoyu-intel
compilade
airMeng
zhentaoyu
Radoslav Gerganov (rgerganov)
Others: Various contributors made minor updates or fixes across different areas of the project.
The development team is actively engaged in enhancing the whisper.cpp project with a clear focus on performance optimization and cross-platform compatibility. The collaborative nature of contributions suggests a well-coordinated effort towards achieving project goals while addressing user needs effectively.