OSS Report: ggerganov/whisper.cpp

Aug. 18, 2024, 1:30 p.m. UTC. This report was generated by Dispatch AI.

Whisper.cpp Development Faces Performance and Compatibility Challenges Amidst Active Community Engagement

Whisper.cpp, a C++ implementation of OpenAI's Whisper ASR model, is actively developed to enhance speech-to-text capabilities across platforms. The project is driven by a community focused on optimizing performance and expanding compatibility, particularly with GPU acceleration.

Recent Activity

Recent issues and pull requests (PRs) reveal a focus on performance improvements and compatibility fixes. Key issues include #2356, reporting infinite loops in multilingual audio processing, and #2355, highlighting a regression in Vulkan support. These indicate ongoing challenges in maintaining stable GPU support. PRs like #2360 introduce accessibility improvements, while others focus on documentation enhancements (#2358) and Go bindings development (#2350).

Development Team Activity

Georgi Gerganov (ggerganov): 17 commits; major syncs and fixes in ggml and whisper.cpp.
slaren: 5 commits; async copy fixes and CUDA optimizations.
Johannes Gäßler: 3 commits; CUDA implementation fixes.
hipudding: 1 commit; significant CANN backend additions.
Others: Contributions span bug fixes, feature enhancements, and optimizations.

Of Note

High Open Issue Count: With 629 open issues, the project sees significant user engagement but also faces potential maintenance challenges.
GPU Regressions: Recurring GPU support issues, including the Vulkan build regression in #2355, suggest instability in the CUDA and Vulkan backends.
Community Contributions: Active community involvement is evident, yet the high number of open PRs (63) suggests possible bottlenecks in managing contributions.
Go Bindings Focus: Multiple PRs enhance Go bindings, indicating growing cross-language compatibility needs.
Accessibility Enhancements: PR #2360, which proposes a colorblind-friendly terminal color scheme, reflects attention to inclusive design practices.

Quantified Reports

Quantify commits

Quantified Commit Activity Over 30 Days

Developer	Branches	PRs	Commits	Files	Changes
hipudding	1	0/0/0	1	18	10830
Georgi Gerganov	1	3/3/0	17	31	9610
slaren	1	0/0/0	5	35	2286
Dibakar Gope	1	0/0/0	1	2	2232
Johannes Gäßler	1	0/0/0	3	11	1470
0cc4m	1	0/0/0	2	1	1162
R0CKSTAR	1	0/0/0	2	9	1028
Mengqing Cao	1	1/1/0	1	3	149
Meng, Hengyu	1	0/0/0	1	5	134
Chen Xi	1	0/0/0	1	2	103
zhentaoyu	1	0/0/0	1	5	98
Salvatore Mesoraca	1	0/0/0	1	1	76
Molly Sophia	1	0/0/0	1	5	50
jdomke	1	0/0/0	1	5	46
l3utterfly	1	0/0/0	1	1	46
Conrad Kramer	1	0/0/0	1	2	43
Joe Todd	1	0/0/0	2	1	35
Mahesh Madhav	1	0/0/0	1	1	32
Ivan Filipov	1	0/0/0	1	1	24
Sigbjørn Skjæret	1	0/0/0	1	3	21
Ouadie EL FAROUKI	1	0/0/0	2	2	21
matteo	1	0/0/0	1	1	15
CarterLi999	1	0/0/0	1	1	12
wangshuai09	1	0/0/0	1	2	7
DavidKorczynski	1	0/0/0	1	1	6
Borislav Stanimirov	1	0/0/0	1	2	5
Clint Herron	1	0/0/0	1	1	5
Tony Wasserka	1	0/0/0	1	1	4
luoyu-intel	1	0/0/0	1	1	3
Justine Tunney	1	1/0/0	1	1	2
Alex O'Connell	1	0/0/0	1	1	2
Daniel Bevenius	1	0/0/0	1	1	2
Mark Zhuang	1	0/0/0	1	1	2
Jeroen Mostert	1	0/0/0	1	1	2
Daven Sanassy	1	1/1/0	1	1	1
10Jib	0	1/0/0	0	0	0
György Balikó (gyorgy1)	0	1/0/0	0	0	0
hsinhoyeh	0	1/0/0	0	0	0
Eric Curtin (ericcurtin)	0	1/0/0	0	0	0
Tim Miller (drasticactions)	0	1/0/0	0	0	0

PRs: pull requests created by that developer, shown as opened/merged/closed-unmerged during the period.
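As a rough illustration of how concentrated the 30-day activity is, the Changes column of the table above can be summed and the share of the largest contributors computed. This is a minimal sketch that uses only the figures in the table; the names `changes` and `top_share` are illustrative, and it assumes the Changes column counts changed lines per developer.

```python
# Changes column from the 30-day commit table, in table order
# (hipudding first, Daven Sanassy last). Rows with 0 commits are omitted.
changes = [
    10830, 9610, 2286, 2232, 1470, 1162, 1028, 149, 134, 103, 98, 76,
    50, 46, 46, 43, 35, 32, 24, 21, 21, 15, 12, 7, 6, 5, 5, 4, 3,
    2, 2, 2, 2, 2, 1,
]


def top_share(values, n):
    """Return the fraction of the total contributed by the n largest values."""
    ranked = sorted(values, reverse=True)
    return sum(ranked[:n]) / sum(values)


total = sum(changes)
print(f"total changes: {total}")
print(f"top 2 share:   {top_share(changes, 2):.1%}")
print(f"top 5 share:   {top_share(changes, 5):.1%}")
```

By this count, the two largest entries (hipudding's single CANN backend commit and Georgi Gerganov's 17 commits) account for a little over two thirds of all changes in the period, so the long list of contributors should be read alongside how unevenly the volume is distributed.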

Quantify Issues

Recent GitHub Issues Activity

Timespan	Opened	Closed	Comments	Labeled	Milestones
7 Days	12	3	9	12	1
30 Days	32	7	30	32	1
90 Days	107	34	165	105	1
All Time	1264	635	-	-	-

Like all software activity quantification, these numbers are imperfect but sometimes useful. Comments, Labels, and Milestones refer to those issues opened in the timespan in question.
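The backlog figures quoted elsewhere in this report follow from the table above by simple arithmetic. The sketch below is illustrative only: it assumes that Opened and Closed both count events within each window, which the table does not state explicitly, and the variable names are made up for the example.

```python
# (timespan, opened, closed) rows copied from the issue activity table.
windows = [("7 Days", 12, 3), ("30 Days", 32, 7), ("90 Days", 107, 34)]
all_time_opened, all_time_closed = 1264, 635

# Open issues are everything ever opened minus everything ever closed.
open_now = all_time_opened - all_time_closed

summary = {}
for label, opened, closed in windows:
    net_growth = opened - closed    # how much the backlog grew in the window
    close_ratio = closed / opened   # closed issues per opened issue
    summary[label] = (net_growth, close_ratio)
    print(f"{label:>8}: backlog +{net_growth}, close ratio {close_ratio:.0%}")

print(f"open issues now: {open_now}")
print(f"all-time close ratio: {all_time_closed / all_time_opened:.0%}")
```

Under that assumption, the all-time difference reproduces the 629 open issues cited in this report, and in every recent window fewer issues were closed than opened, which is consistent with the maintenance concerns raised above.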

Detailed Reports

Report On: Fetch issues

Recent Activity Analysis

The ggerganov/whisper.cpp repository currently has 629 open issues, indicating a high level of ongoing activity and user engagement. Recent issues highlight various challenges users face, including performance regressions, compilation problems, and specific feature requests. Notably, there are recurring themes around GPU support, model performance inconsistencies, and the need for better handling of non-English languages.

Several issues describe anomalous behavior, such as frequent hallucinations in transcriptions, particularly with certain models or configurations. Users also report audio processing problems, including infinite loops during transcription and incorrect timestamps. Multiple issues related to CUDA and OpenCL point to compatibility or performance concerns that need addressing.

Issue Details

Here are some of the most recently created and updated issues:

Issue #2362: Put OpenVINO and OpenBLAS together gives better performance
- Priority: Normal
- Status: Open
- Created: 0 days ago
- This issue discusses combining OpenVINO and OpenBLAS for improved performance on specific setups.
Issue #2361: Release v1.7.0 ??
- Priority: Normal
- Status: Open
- Created: 1 day ago
- A discussion about the timeline and features for the upcoming release v1.7.0.
Issue #2359: How to use a .safetensors file in this library?
- Priority: Normal
- Status: Open
- Created: 1 day ago
- Users inquire about compatibility with newer file formats in the ecosystem.
Issue #2356: Transcribing audio files to text goes into infinite loop for audios with multiple languages
- Priority: High
- Status: Open
- Created: 2 days ago
- Reports an infinite loop issue when processing multilingual audio files, which is critical for usability.
Issue #2355: [Regression] No longer compiles with Vulkan
- Priority: High
- Status: Open
- Created: 2 days ago
- Indicates a regression in Vulkan support since a recent commit, affecting users relying on this backend.
Issue #2310: Whisper.cpp consumes unusually large amounts of system memory when transcribing very long wave files
- Priority: High
- Status: Open
- Created: 29 days ago
- Highlights memory consumption issues during long audio transcriptions, which could lead to crashes.
Issue #2304: Improvement video chat
- Priority: Normal
- Status: Open
- Created: 34 days ago
- Suggests improvements for video chat functionality using whisper.cpp.

Important Themes

Performance Issues: Many recent issues focus on performance regressions, particularly with GPU acceleration and memory usage during transcription.
Compatibility Concerns: Users frequently report compilation problems related to CUDA and Vulkan support, indicating potential instability in these areas.
Feature Requests: There is a clear demand for additional features such as better handling of non-English languages and improved integration with modern file formats (e.g., .safetensors).
User Experience: Issues like infinite loops during transcription and incorrect timestamp handling suggest that user experience could be significantly improved.

This analysis reflects the current state of user engagement and highlights areas where the project may require further development or stabilization efforts.

Report On: Fetch pull requests

Overview

The ggerganov/whisper.cpp repository, which implements OpenAI's Whisper automatic speech recognition model in C++, has 63 open PRs and numerous closed ones, covering a variety of enhancements, bug fixes, and feature additions.

Summary of Pull Requests

PR #2360: Use colorblind friendly TTY color scheme
- State: Open
- Created: 1 day ago
- Updates the terminal color scheme to be more accessible for colorblind users while maintaining readability across different backgrounds.
PR #2358: Fix broken links in README.md
- State: Open
- Created: 1 day ago
- Corrects broken links in the documentation, ensuring users can access relevant resources.
PR #2350: feat(go binding): add beamsize/entropythold/maxcontext to context interface
- State: Open
- Created: 6 days ago
- Introduces additional parameters for Go bindings, enhancing the flexibility of the context interface.
PR #2346: Set MSVC to use UTF-8 on source files
- State: Open
- Created: 6 days ago
- Ensures that MSVC compiles source files as UTF-8, preventing encoding issues across different systems.
PR #2339: fix go bindings
- State: Open
- Created: 12 days ago
- Addresses issues with Go bindings, including updates to links and flags for non-Apple systems.
PR #2330: fix go bindings
- State: Open
- Created: 18 days ago
- Further fixes to Go bindings, resolving missing include paths and improving build instructions.
PR #2291: Implementing Encoder Begin Callback for golang binding
- State: Open
- Created: 41 days ago
- Adds a callback function to the Go binding for better control over processing.
PR #2279: Incorrect timestamps
- State: Open
- Created: 46 days ago
- Fixes timestamp generation logic in the transcription output, enhancing accuracy.
PR #2272: Fix MKL build issue by correctly finding and linking MKL libraries
- State: Open
- Created: 48 days ago
- Resolves issues with building against Intel's MKL libraries, improving compatibility.
PR #2264: build : fix typo in CMakeLists.txt
- State: Open
- Created: 52 days ago
- Corrects a minor typo in the build configuration file.
PR #2254: kommand proj
- State: Open
- Created: 57 days ago
- Introduces a new project with various improvements and features related to command handling.
PR #2184: Add support for quantization and custom audio context size to OpenVino
- State: Open
- Created: 85 days ago
- Enhances performance options when using OpenVino by adding quantization support.
PR #2127: whisper grammar: experimental implementation with boost::spirit
- State: Open
- Created: 103 days ago
- An experimental parser implementation aimed at improving grammar recognition accuracy.
PR #2095: Fixed incorrect docker example in readme
- State: Open
- Created: 114 days ago
- Updates Docker examples in the documentation to correct image references.
PR #2075: Up OpenBLAS and cuda-toolkit versions build.yml
- State: Open
- Created: 123 days ago
- Upgrades dependencies related to BLAS and CUDA for improved performance.

Analysis of Pull Requests

The analysis of the pull requests reveals several key themes and trends within the development process of whisper.cpp.

Common Themes:

Accessibility Improvements: PR #2360 introduces a colorblind-friendly terminal color scheme, reflecting an awareness of user diversity and of the need for inclusive design practices in software development.
Documentation Enhancements: There is a consistent effort to improve documentation as seen in PRs like #2358 (fixing broken links) and #2095 (updating Docker examples). This is crucial for user onboarding and ensuring that developers can effectively utilize the library without encountering obstacles due to outdated or incorrect information.
Go Bindings Development: A notable number of PRs (e.g., #2350, #2339, and #2330) are dedicated to enhancing Go bindings. This indicates an expanding user base that utilizes Go for integrating Whisper functionalities into their applications, suggesting that cross-language compatibility is becoming increasingly important for the project’s growth.
Bug Fixes and Performance Optimizations: Many PRs are focused on fixing bugs (e.g., timestamp issues in PR #2279) and optimizing performance (e.g., quantization support in PR #2184). This ongoing maintenance is vital for ensuring reliability and efficiency as more users adopt the library for real-time applications.
Community Engagement: The discussions surrounding several PRs indicate active community engagement where contributors seek feedback from maintainers or other developers (e.g., discussions on Go CI in PR #2350). This collaborative environment fosters innovation and helps maintain high-quality contributions.

Notable Anomalies:

The repository has a significant number of open pull requests (63), which may suggest challenges in managing contributions or prioritizing features versus maintenance tasks.
Some PRs remain open for extended periods without merging or clear resolutions (e.g., PRs related to experimental features), indicating potential hesitance around adopting new approaches or technologies that may not align with the project's core objectives.
The presence of multiple PRs aimed at fixing similar issues (e.g., multiple fixes for Go bindings) could indicate a lack of clarity or documentation regarding existing implementations, leading to redundant efforts by contributors.

Conclusion:

The pull requests within whisper.cpp reflect a vibrant development community focused on enhancing accessibility, improving documentation, optimizing performance, and expanding language support through Go bindings. However, the high number of open PRs suggests that there may be challenges in effectively managing contributions and prioritizing tasks within the project’s roadmap. Addressing these challenges will be essential as the project continues to grow and evolve in response to user needs and technological advancements.

Report On: Fetch commits

Repo Commits Analysis

Development Team and Recent Activity

Team Members and Recent Contributions

Georgi Gerganov (ggerganov)
- Recent Activity: 17 commits with 9610 changes across 31 files.
- Key Contributions:
- Syncing and fixing various components related to the ggml library.
- Addressed issues in the whisper.cpp file, including handling empty mel spectrograms and using Vulkan as a GPU backend.
- Collaborated on multiple PRs, including those related to CI workflows and build configurations.
Salvatore Mesoraca (smeso)
- Recent Activity: 1 commit with 76 changes across 1 file.
- Key Contributions: Implemented support for forward pass broadcasting in ggml_sub.
Slaren (slaren)
- Recent Activity: 5 commits with 2286 changes across 35 files.
- Key Contributions:
- Fixed issues related to async copy from CPU in the ggml-backend.
- Collaborated on various CUDA-related improvements and optimizations.
Mengqing Cao (MengqingCao)
- Recent Activity: 1 commit with 149 changes across 3 files.
- Key Contributions: Added support for Ascend NPU in whisper.cpp.
hipudding
- Recent Activity: 1 commit with 10830 changes across 18 files.
- Key Contributions: Major additions related to the CANN backend.
Ouadie El Farouki (OuadiElfarouki)
- Recent Activity: 2 commits with 21 changes across 2 files.
- Key Contributions: Updated SYCL device filtering.
Johannes Gäßler (JohannesGaessler)
- Recent Activity: 3 commits with 1470 changes across 11 files.
- Key Contributions: Made several fixes and optimizations in CUDA implementations.
Molly Sophia (MollySophia)
- Recent Activity: 1 commit with 50 changes across 5 files.
- Key Contributions: Added epsilon as a parameter for group normalization.
Justine Tunney (jart)
- Recent Activity: 1 commit with 2 changes across 1 file.
- Key Contributions: Fixed overflows in the ELU function.
jdomke
- Recent Activity: 1 commit with 46 changes across 5 files.
- Key Contributions: Improved runtime SVE configuration reading.
CarterLi999
- Recent Activity: 1 commit with 12 changes across 1 file.
- Key Contributions: Fixed inactive elements handling for RISC-V vector types.
Others (including R0CKSTAR, airMeng, iboB, and conradev) made smaller contributions, among them bug fixes, feature enhancements, and optimizations.

Patterns and Themes

The team has been actively working on enhancing the performance and compatibility of the whisper.cpp project, particularly focusing on GPU acceleration through CUDA and Vulkan backends.
There is a notable emphasis on collaboration among team members, as seen in co-authored commits and shared contributions to significant features like the Ascend NPU support and improvements to the ggml library.
Recent activities indicate ongoing efforts to address bugs and optimize existing functionalities, reflecting a responsive development approach to user feedback and technical challenges.

Conclusions

The development team is engaged in a robust cycle of feature enhancement, bug fixing, and performance optimization within the whisper.cpp project. The collaborative nature of their work suggests a strong commitment to maintaining high-quality standards while adapting to evolving requirements in speech recognition technology.