The Dispatch

OSS Report: ggerganov/whisper.cpp


Whisper.cpp Development Team Focuses on GPU and Multilingual Support Amidst High Community Engagement

Whisper.cpp, a high-performance implementation of OpenAI's Whisper ASR model, is actively being developed to enhance GPU support and improve multilingual transcription accuracy. The project, maintained by Georgi Gerganov and others, aims to provide a lightweight, dependency-free solution for ASR across various platforms.

Recent Activity

Recent issues and pull requests indicate a strong focus on addressing GPU backend problems, particularly with Vulkan and CUDA. This suggests an emphasis on optimizing performance and ensuring compatibility across diverse hardware. The team is also tackling compilation challenges on different platforms and enhancing language support.

Development Team Activity (Reverse Chronological Order)

  1. Toliver (teejae): Updated the server to use OS-generated temp file names for converted files.
  2. Binozo: Fixed CUDA build for Go bindings; README updates.
  3. Mengqing Cao: Added Ascend NPU instructions to README.
  4. Philippe Normand: Fixed libdir value in pkgconfig for CMake.
  5. Georgi Gerganov: Multiple fixes/enhancements across components; 14 commits.
  6. Johannes Gäßler: Fixed undefined behavior in ggml; CUDA features added.
  7. Salvatore Mesoraca: Fixed tensor operation issues in ggml.
  8. Tim Miller: Minor CMake configuration fixes.
  9. UsernamesLame: README updates; fixed example links.
  10. hsinhoyeh: Added Go binding features for context interface.
  11. slaren: Bug fixes/improvements across components.
  12. qnixsynapse: Minor CMake updates.
  13. luoyu-intel: SYCL and CUDA support updates.
  14. compilade: Major ggml tensor operations updates.
  15. airMeng: Minor updates across files.
  16. zhentaoyu: Extensive SYCL implementation updates.
  17. Radoslav Gerganov: Minor RPC handling bug fixes.

Quantified Reports

Quantify Issues



Recent GitHub Issues Activity

Timespan  Opened  Closed  Comments  Labeled  Milestones
7 Days         4       2         0        4           1
30 Days       34      16        21       34           1
90 Days      100      36       131       99           1
All Time    1296     651         -        -           -

Like all software activity quantification, these numbers are imperfect but sometimes useful. Comments, Labels, and Milestones refer to those issues opened in the timespan in question.
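As a rough way to read the table, the closed-to-opened ratio for each timespan can be computed directly from its figures. The sketch below is illustrative only: the dictionary and function names are assumptions, and the ratio compares issues closed in a window with issues opened in it, which are not necessarily the same issues.

```python
# Figures copied from the "Recent GitHub Issues Activity" table.
# The names used here are illustrative, not part of any reporting tool.
ISSUE_COUNTS = {
    "7 Days": (4, 2),
    "30 Days": (34, 16),
    "90 Days": (100, 36),
    "All Time": (1296, 651),
}


def close_ratio(opened: int, closed: int) -> float:
    """Return issues closed per issue opened in the same window."""
    if opened == 0:
        raise ValueError("no issues opened in this window")
    return closed / opened


def open_backlog(opened: int, closed: int) -> int:
    """Return how many of the counted issues were not closed."""
    return opened - closed


if __name__ == "__main__":
    for span, (opened, closed) in ISSUE_COUNTS.items():
        print(f"{span:<9} ratio={close_ratio(opened, closed):.2f} "
              f"backlog={open_backlog(opened, closed)}")
```

The all-time difference, 1296 - 651 = 645, agrees with the open-issue count quoted in the issues report.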

Quantify Commits



Quantified Commit Activity Over 30 Days

Developer                           Branches  PRs    Commits  Files  Changes
Georgi Gerganov                            1  3/3/0       14     64   150892
zhentaoyu                                  1  0/0/0        1     10      582
Johannes Gäßler                            1  0/0/0        4     16      536
compilade                                  1  0/0/0        1      2      286
Radoslav Gerganov                          1  0/0/0        3      6      190
luoyu-intel                                1  0/0/0        1      4      177
slaren                                     1  0/0/0        2      3      124
Justine Tunney                             1  0/1/0        1      1       42
hsinhoyeh                                  1  0/1/0        1      6       39
Mengqing Cao                               1  1/1/0        1      1       34
Salvatore Mesoraca                         1  0/0/0        1      1       26
Binozo                                     1  2/1/0        1      2       11
Tim Miller                                 1  0/1/0        1      1        5
Meng, Hengyu                               1  0/0/0        1      2        4
stormofice                                 1  1/1/0        1      1        4
Brad Murray                                1  1/1/0        1      1        4
Toliver                                    1  1/1/0        1      1        3
UsernamesLame                              1  1/1/0        1      2        3
Philippe Normand                           1  1/1/0        1      1        2
Peng                                       1  1/1/0        1      1        2
Ivo von Putzer Reibegg                     1  1/1/0        1      1        2
Eric Curtin                                1  0/1/0        1      1        2
Akarshan Biswas                            1  0/0/0        1      1        2
None (shivghai)                            0  1/0/0        0      0        0
byoungdale (byoungdale)                    0  1/0/0        0      0        0
Paweł Budzianowski (budzianowski)          0  1/0/1        0      0        0
None (thewh1teagle)                        0  1/0/0        0      0        0
Dave Lewis (fromdavelewis)                 0  1/0/0        0      0        0
None (definitelyuncertain)                 0  1/0/1        0      0        0

PRs: created by that dev and opened/merged/closed-unmerged during the period
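For readers who want to process the PRs column programmatically, a small parser for the opened/merged/closed-unmerged triple might look like the following. It is a sketch under the assumption that every cell has exactly three slash-separated integers; `PRCounts` and `parse_pr_counts` are hypothetical names introduced here.

```python
from typing import NamedTuple


class PRCounts(NamedTuple):
    """Counts from one PRs cell (hypothetical helper type)."""
    opened: int
    merged: int
    closed_unmerged: int


def parse_pr_counts(cell: str) -> PRCounts:
    """Parse a PRs cell such as "3/3/0" into named counts."""
    parts = cell.strip().split("/")
    if len(parts) != 3:
        raise ValueError(
            f"expected opened/merged/closed-unmerged, got {cell!r}"
        )
    opened, merged, closed_unmerged = (int(p) for p in parts)
    return PRCounts(opened, merged, closed_unmerged)


if __name__ == "__main__":
    counts = parse_pr_counts("2/1/0")
    print(counts.opened, counts.merged, counts.closed_unmerged)
```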

Detailed Reports

Report On: Fetch issues



Recent Activity Analysis

The GitHub repository for whisper.cpp has seen significant activity, with 645 open issues currently. Recent contributions indicate a mix of user-reported bugs, feature requests, and discussions about performance optimizations. Notably, there are recurring themes around GPU support, compilation issues across different platforms, and the need for better handling of specific languages and audio formats.

Several issues report crashes or unexpected behavior with particular models or configurations. For instance, some users report performance regressions in newer versions relative to older ones, particularly with the CUDA and CoreML backends. Others have raised concerns about transcription accuracy in languages other than English, suggesting that multilingual support needs improvement.

Issue Details

Most Recently Created Issues

  1. Issue #2420: Is there any way to get rid of [Blank Audio] in transcript?

    • Priority: Normal
    • Status: Open
    • Created: 0 days ago
    • Updated: N/A
  2. Issue #2418: No assigned threads when manually compiling on MSVC

    • Priority: Normal
    • Status: Open
    • Created: 1 day ago
    • Updated: N/A
  3. Issue #2415: Vulkan backend crashes with --processors > 1

    • Priority: High
    • Status: Open
    • Created: 6 days ago
    • Updated: N/A
  4. Issue #2413: Support for LiteRT android on device AI with GPU acceleration

    • Priority: Normal
    • Status: Open
    • Created: 7 days ago
    • Updated: N/A
  5. Issue #2412: won't compile on osx 12.5 M1

    • Priority: High
    • Status: Open
    • Created: 8 days ago
    • Updated: N/A

Most Recently Updated Issues

  1. Issue #2411: Fallback from Vulkan to CPU (Edited 7 days ago)

    • Discusses the instability of Vulkan on Windows/Linux and suggests the need for improved error handling.
  2. Issue #2361: Release v1.7.0 ?? (Edited 6 days ago)

    • Users express concerns over the timeline for new releases and features.
  3. Issue #2402: First load time in Nvidia Jetson AGX Xavier and Orin is more than 10 minutes (Edited 14 days ago)

    • Highlights performance issues on specific hardware configurations.
  4. Issue #2400: The recognition results with Vulkan are so bad (Edited 15 days ago)

    • Users report poor transcription quality when using Vulkan backend.
  5. Issue #2399: Failed to compile it with Vulkan (Edited 15 days ago)

    • Compilation issues related to Vulkan backend are discussed.
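The fallback behavior requested in Issue #2411 follows a common pattern: try the preferred backend first and drop to the next one if initialization fails. The sketch below shows that general pattern only. It is not whisper.cpp code, and `BackendInitError`, `init_with_fallback`, and the two stand-in factories are hypothetical names invented for illustration.

```python
from typing import Callable, Dict, List, Tuple


class BackendInitError(RuntimeError):
    """Raised by a backend factory that cannot start (hypothetical)."""


def init_with_fallback(
    factories: List[Tuple[str, Callable[[], object]]]
) -> Tuple[str, object]:
    """Try each backend in order and return the first that initializes."""
    errors: Dict[str, str] = {}
    for name, factory in factories:
        try:
            return name, factory()
        except BackendInitError as exc:
            errors[name] = str(exc)  # remember why this backend was skipped
    raise BackendInitError(f"no usable backend: {errors}")


def broken_vulkan() -> object:
    """Stand-in for a GPU backend whose setup fails."""
    raise BackendInitError("device lost")  # simulated failure


def cpu_backend() -> object:
    """Stand-in for a CPU backend that always starts."""
    return "cpu-context"  # placeholder for a real context object


if __name__ == "__main__":
    name, ctx = init_with_fallback(
        [("vulkan", broken_vulkan), ("cpu", cpu_backend)]
    )
    print(name, ctx)
```

Collecting the per-backend error messages keeps the reason for each skipped backend available for logging, which is the kind of error handling the issue asks for.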

Themes and Commonalities

  • GPU Support Issues: A significant number of recent issues revolve around problems with GPU backends, particularly Vulkan and CUDA, indicating potential instability or performance regressions.
  • Compilation Problems: Users frequently report difficulties compiling on various platforms (Windows, macOS), especially when specific flags or dependencies are involved.
  • Language Handling: There are ongoing discussions about the accuracy of transcriptions in non-English languages, suggesting a need for enhancements in multilingual processing capabilities.
  • Performance Concerns: Several users have noted discrepancies in performance between different versions of the software, prompting discussions about optimization strategies.

This analysis reflects a vibrant community actively engaging with the whisper.cpp project while highlighting areas that may require further attention from maintainers to improve user experience and software reliability.

Report On: Fetch pull requests



Overview

The repository ggerganov/whisper.cpp currently has 64 open pull requests (PRs), with a variety of contributions aimed at enhancing the functionality, performance, and usability of the Whisper automatic speech recognition model. The PRs cover a wide range of topics, including Go bindings, server improvements, and GPU support.

Summary of Pull Requests

Open Pull Requests

  • PR #2417: Adds temperature options to the Go bindings, letting users tune the decoding temperature with the aim of reducing hallucinated output.

  • PR #2406: Server update to erase previous stdout text for multi-row outputs, enhancing usability during transcription.

  • PR #2330: Fix for Go bindings addressing missing ggml issues, indicating ongoing efforts to stabilize language bindings.

  • PR #1261: Dynamic selection of extended instruction sets for x86 architecture, which aims to improve binary distribution without compromising performance.

  • PR #2384: Updates the talk example to align with the latest GPT-2 implementation from ggml, showcasing adaptability to newer models.

  • PR #2376: Fixing Go binding makefile issues, reflecting ongoing maintenance and improvement of language bindings.

  • PR #2369: Addition of CI tests for ensuring code reliability across platforms, which is critical for maintaining software quality.

  • PR #2339: Another fix for Go bindings that builds on previous efforts, showing a pattern of iterative improvements.

  • PR #2291: Implementation of an Encoder Begin Callback for Go bindings, enhancing the callback capabilities in the context processing.

  • PR #2279: Fixes incorrect timestamps in transcriptions, addressing user-reported issues and improving output accuracy.
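The temperature setting mentioned for PR #2417 can be illustrated with a generic temperature-scaled softmax. This is a minimal sketch of the general technique, not the Go bindings' API or the whisper.cpp decoder; the function name and the sample logits are assumptions made for illustration.

```python
import math
from typing import List


def softmax_with_temperature(
    logits: List[float], temperature: float
) -> List[float]:
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]  # made-up token scores
    for t in (0.5, 1.0, 2.0):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
```

A lower temperature concentrates probability on the highest-scoring token, making output more deterministic, while a higher temperature flattens the distribution and makes sampling more varied.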

Closed Pull Requests

  • PR #2419: Merged change to use OS-generated temp file names for ffmpeg converted files, improving concurrent processing capabilities.

  • PR #2416: Merged fix for Go CUDA bindings building issues, indicating successful resolution of a critical build problem.

  • PR #2401: Sync with ggml updates, demonstrating active maintenance and integration with upstream changes.
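The idea behind PR #2419, letting the operating system choose unique temp file names so concurrent requests do not overwrite each other's converted audio, can be sketched as follows. The server itself is written in C++ and its actual change is not reproduced here; the helper name, the `converted-` prefix, and the `.wav` suffix are assumptions for illustration.

```python
import os
import tempfile


def make_converted_audio_path(suffix: str = ".wav") -> str:
    """Create an empty, uniquely named temp file and return its path."""
    fd, path = tempfile.mkstemp(prefix="converted-", suffix=suffix)
    os.close(fd)  # a converter process would write to the path itself
    return path


if __name__ == "__main__":
    first = make_converted_audio_path()
    second = make_converted_audio_path()
    print(first != second)  # each call yields its own file
    for path in (first, second):
        os.remove(path)  # clean up the demo files
```

Because `tempfile.mkstemp` creates the file atomically, two requests handled at the same time cannot be handed the same path, unlike a fixed or hand-built file name.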

Analysis of Pull Requests

The open pull requests show development focused on both functionality and performance. Many of them concern the Go bindings (#2417, #2376, #2339, #2330, #2291), which suggests growing interest in using Whisper from languages other than C and C++. This fits the project's goal of staying lightweight and dependency-free while remaining usable across platforms.

The presence of PRs aimed at server improvements (#2406) and dynamic instruction set selection (#1261) indicates a focus on optimizing performance for diverse hardware configurations. This is crucial as users may deploy Whisper on various architectures ranging from high-end GPUs to mobile devices. The dynamic selection feature is particularly noteworthy as it allows the software to adapt its execution based on available hardware capabilities, thereby maximizing efficiency without requiring users to manage complex configurations manually.
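The dynamic selection described for PR #1261 is a form of runtime dispatch: one binary ships several kernel variants and picks one based on what the CPU reports. The sketch below illustrates the dispatch logic only. The feature set is passed in as plain strings rather than detected, and every kernel name is hypothetical; real code would query the CPU (for example via CPUID) and call differently compiled routines.

```python
from typing import Callable, List, Sequence, Set, Tuple

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def dot_generic(a: Sequence[float], b: Sequence[float]) -> float:
    """Baseline kernel that needs no special CPU features."""
    return sum(x * y for x, y in zip(a, b))


def dot_avx2(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in for an AVX2 build; same result, separate code path."""
    return sum(x * y for x, y in zip(a, b))


def dot_avx512(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in for an AVX-512 build; same result, separate code path."""
    return sum(x * y for x, y in zip(a, b))


# Ordered from most to least specialized; the names are hypothetical.
KERNELS: List[Tuple[str, Kernel]] = [
    ("avx512f", dot_avx512),
    ("avx2", dot_avx2),
]


def select_kernel(cpu_features: Set[str]) -> Kernel:
    """Pick the first kernel whose required feature the CPU reports."""
    for feature, kernel in KERNELS:
        if feature in cpu_features:
            return kernel
    return dot_generic  # always-safe fallback


if __name__ == "__main__":
    kernel = select_kernel({"sse4_2", "avx2"})
    print(kernel.__name__, kernel([1.0, 2.0], [3.0, 4.0]))
```

Selecting once at startup and reusing the chosen function keeps the feature check out of the hot loop.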

Moreover, the introduction of CI tests (#2369) showcases a commitment to maintaining high code quality and reliability. This is essential as the project scales and more contributors join in. Continuous integration practices will help catch regressions early and ensure that new features do not introduce instability into the existing codebase.

However, there are notable concerns regarding the age of some open PRs. For instance, PRs like #1261 have been open for nearly a year without merging. This could indicate potential bottlenecks in the review process or prioritization challenges within the development team. Addressing these delays is vital; otherwise, it could lead to contributor frustration or disengagement over time.

Another area worth noting is the lack of recent merge activity compared to the volume of open PRs. While it's common for active projects to have many open contributions awaiting review, a balanced approach that ensures timely feedback and merges can help maintain momentum within the community.

In conclusion, while whisper.cpp is experiencing healthy growth through numerous contributions aimed at expanding its capabilities and improving user experience, attention must be given to streamlining the review process and ensuring that contributors feel valued through timely engagement with their submissions.

Report On: Fetch commits



Repo Commits Analysis

Development Team and Recent Activity

Team Members and Recent Contributions

  1. Toliver (teejae)

    • Recent Activity: Updated server code to use OS-generated temp file names for converted files.
    • Files Changed: 1 file, 3 lines modified (+2, -1).
  2. Binozo

    • Recent Activity: Fixed CUDA build for Go bindings and updated README with build instructions.
    • Collaboration: The commit's co-author credit also names Binozo.
    • Files Changed: 2 files, 11 lines modified (+11, -0).
  3. Mengqing Cao (MengqingCao)

    • Recent Activity: Added Ascend NPU instructions to the README.
    • Files Changed: 1 file, 34 lines added (+34).
  4. Philippe Normand (philn)

    • Recent Activity: Fixed libdir value in pkgconfig file for CMake.
    • Files Changed: 1 file, 2 lines modified (+1, -1).
  5. Georgi Gerganov (ggerganov)

    • Recent Activity: Extensive contributions including:
      • Multiple fixes and enhancements across various components (CUDA, Vulkan, ggml).
      • Reverted previous changes related to MSVC settings.
      • Updated multiple files related to synchronization and backend support.
      • Total of 14 commits in the last 30 days with significant changes across numerous files.
    • Files Changed: 64 files, 150,892 lines modified.
  6. Johannes Gäßler (JohannesGaessler)

    • Recent Activity: Contributed to fixing undefined behavior in ggml and added new features related to CUDA operations.
    • Files Changed: 16 files, 536 lines modified.
  7. Salvatore Mesoraca (smeso)

    • Recent Activity: Worked on fixing issues related to tensor operations in ggml.
    • Files Changed: 1 file, 26 lines modified (+5, -21).
  8. Tim Miller (drasticactions)

    • Recent Activity: Minor fixes in CMake configuration.
    • Files Changed: 1 file, 5 lines modified.
  9. UsernamesLame

    • Recent Activity: Updated README and fixed broken links in examples.
    • Files Changed: 2 files, 3 lines modified (+2, -1).
  10. hsinhoyeh

    • Recent Activity: Added features to Go bindings including parameters for context interface.
    • Files Changed: 6 files, 39 lines modified.
  11. slaren

    • Recent Activity: Contributed multiple bug fixes and improvements across various components.
    • Files Changed: 3 files, 124 lines modified.
  12. qnixsynapse

    • Recent Activity: Minor updates in CMake configurations.
    • Files Changed: 1 file, 2 lines modified.
  13. luoyu-intel

    • Recent Activity: Significant updates related to SYCL and CUDA support.
    • Files Changed: 4 files, 177 lines modified.
  14. compilade

    • Recent Activity: Major updates related to ggml tensor operations.
    • Files Changed: 2 files, 286 lines modified.
  15. airMeng

    • Recent Activity: Minor updates across various files.
    • Files Changed: 2 files, 4 lines modified.
  16. zhentaoyu

    • Recent Activity: Extensive updates to SYCL implementations and optimizations.
    • Files Changed: 10 files, 582 lines modified.
  17. Radoslav Gerganov (rgerganov)

    • Recent Activity: Minor bug fixes in RPC handling.
    • Files Changed: 6 files, 190 lines modified.
  18. Others: Various contributors made minor updates or fixes across different areas of the project.

Patterns and Themes

  • The majority of recent activity is concentrated around bug fixes and enhancements related to CUDA and Vulkan support, indicating a focus on improving performance and compatibility across different hardware platforms.
  • Georgi Gerganov remains the most active contributor with a substantial number of commits reflecting ongoing development efforts in multiple areas of the project.
  • Collaboration is evident with several co-authored commits suggesting a team-oriented approach towards problem-solving and feature development.
  • The team is actively addressing issues related to documentation and user guidance through README updates alongside code changes.

Conclusion

The development team is actively engaged in enhancing the whisper.cpp project with a clear focus on performance optimization and cross-platform compatibility. The collaborative nature of contributions suggests a well-coordinated effort towards achieving project goals while addressing user needs effectively.