The project in question is `ggerganov/whisper.cpp`, a C/C++ port of OpenAI's Whisper model. It provides high-performance inference for automatic speech recognition (ASR) with support for a variety of hardware architectures and platforms. It is a plain C/C++ implementation without dependencies, optimized for Apple Silicon, with AVX intrinsics support for x86 architectures, among other optimizations. The project is MIT-licensed and has a wide range of applications, including running on mobile devices, in web browsers through WebAssembly, and even on the Raspberry Pi.
The project is actively maintained, and its substantial number of stars (29,901) and forks (2,924) indicates strong community interest. The repository has a large number of open issues (577), which suggests either an active user base reporting problems or a backlog of issues awaiting the development team's attention.
The development team shows a pattern of active contributions with several members making significant updates to the codebase. There is a mix of minor and major commits, indicating ongoing maintenance as well as the addition of new features or optimizations. The presence of merged pull requests suggests that the team is responsive to contributions from the community. The activity across different branches indicates that work is being done in parallel on various aspects of the project.
From the recent activity, it can be concluded that, given the high number of open issues, the team would benefit from prioritizing issue resolution to ensure stability and user satisfaction. Additionally, considering the diverse platforms supported by `whisper.cpp`, continuous integration testing across these platforms could help catch compatibility issues early in the development process.
PR #2054: Attempts to add hardware-acceleration support in the data structures, in preparation for submitting Qualcomm's QNN backend, and is linked to an issue in a different repository. The PR was created and closed very quickly, which might indicate either a mistake or a very rapid iteration that needs careful review.
PR #2049: Fixes a small bug in the `whisper.nvim` script. It seems like a straightforward fix, but it was closed quickly without being merged, which could mean the issue was resolved differently or the contribution was not accepted.
PR #2048: Adds a parameter to disable timestamps in the `addon.node` example. Closed quickly without merge, potentially indicating that the change was either not needed or incorporated differently.
PR #2045: Fixes embedding of the Metal library by including an external header file inline. This is important for macOS users who rely on Metal for GPU acceleration. The PR was closed without merge, which might need further investigation to ensure the issue has been resolved.
PR #2044: Adds DTW support to `server.cpp`. Closed quickly without merge, suggesting it might have been superseded by another solution or not required.
PR #2043: Adds AVX512 detection to Makefile and CMakeLists.txt, which is important for performance optimization on CPUs that support AVX512 instructions. Closed without merge, possibly due to being incomplete or needing further refinement.
PR #2005: Added a boolean to `whisper_context_params` to fold language-model tokens to lowercase. This could be significant for grammar-matching predictability. It received several review comments and commits before being closed.
PR #2000: Added additional parameters to `addon.node`, improving control over output information. It included changes to tests and source files and was closed recently.
PR #1990: Updated Swift package for Linux configuration, addressing platform-specific framework requirements. Closed recently and could impact Swift developers using whisper.cpp on Linux systems.
PR #1978: Added links to OpenVINO models, which can be significant for users utilizing OpenVINO for model optimization and inference acceleration. Closed recently after discussion about where to host pretrained model repositories.
PR #1973: Fixed an issue with CUDA installations where `libcuda.so` is present only in stubs folders. This could be important for users setting up whisper.cpp with CUDA support. Closed recently after edits.
PR #1969: Wrote documentation on dependencies needed for building on Fedora Linux. This is helpful for users on Fedora systems looking to compile whisper.cpp.
PR #1952: Returned `-1` from a specific function to avoid confusion/misunderstanding. While this seems minor, it is important for clarity of code behavior; it was closed recently.
PR #1924: Provided a demo for Android capable of streaming audio in real-time. This is significant as it showcases the potential use of whisper.cpp in real-time applications on mobile devices. Closed recently after several edits and discussions.
PR #1913: Enabled node addon support for multiple input files and callback function support for providing progress. This enhances the functionality of the addon.node example and was closed recently after discussions and fixes.
PR #1854: Attempted to fix tokenization issues by using BPE Tokenizer from llama.cpp. This is particularly notable as tokenization directly affects transcription accuracy. Closed recently after extensive discussion and testing.
PR #1841: Implemented a dynamic CUDA driver loader and static linking against the CUDA runtime, potentially allowing binaries with CUDA support to run on systems without CUDA-capable GPUs. However, this change was reverted in favor of a more general solution.
PR #1833: Introduced gRPC bidirectional streams for real-time service calls from various client environments. Although not merged into master, it could live as its own repository due to the dependencies it introduces.
PR #1823: Proposed streaming raw audio from stdin, which would allow piping audio from sources like `ffmpeg` directly into whisper.cpp without knowing the stream's length beforehand. The PR aimed to reduce duplication with existing examples but wasn't merged; instead, efforts were directed toward refactoring shared streaming-consumption code.
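The proposed workflow can be sketched as a shell pipeline. The `ffmpeg` flags are standard (raw signed 16-bit PCM, mono, 16 kHz, the format Whisper expects); the `-` input argument on the whisper.cpp side is the behavior this PR proposed, not a merged feature:

```sh
# Decode any audio source to raw 16 kHz mono s16le PCM on stdout,
# then pipe it into the transcriber. "./main -" reading from stdin
# is the hypothetical interface proposed by PR #1823.
ffmpeg -i input.mp3 -f s16le -ac 1 -ar 16000 - | ./main -
```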
PR #1791: Generalized install locations based on architecture (e.g., `/usr/lib64` for x86_64 on Fedora), making installation paths more flexible across different systems.
PR #1768: Added functionality to get the encoder output, exposing `ggml_tensor` outside whisper.cpp's package, which may not align with the goal of minimizing external dependencies.
PR #1679: Implements command-style grammar in the main executable and allows a response file to be passed as a single parameter, useful for working around command-line length limits or simplifying character escaping across platforms.
The open pull requests show active development and attempts at improving hardware acceleration support, bug fixes, feature enhancements (like timestamp disabling), and performance optimizations (AVX512 detection). Some PRs were closed very quickly without being merged, indicating either rapid iteration or alternative solutions being preferred.
Recently closed PRs demonstrate attention to detail (like lowercase folding for predictability), platform-specific improvements (Fedora build dependencies), real-time streaming capabilities (Android demo), and enhancements to existing examples (multiple input file support). These changes reflect ongoing efforts to make whisper.cpp more robust, user-friendly, and versatile across different use cases and platforms.
Significant closed PRs reveal attempts at broadening compatibility through dynamic CUDA driver loading and at introducing modern communication protocols such as gRPC for service calls. Efforts were also made to improve the codebase by adopting the BPE tokenizer from llama.cpp to address tokenization issues and by exploring new ways of handling audio input (stdin streaming).
Overall, these pull requests highlight a community-driven approach to enhancing whisper.cpp's functionality while maintaining its simplicity and efficiency ethos.
Developer | Branches | PRs | Commits | Files | Changes
---|---|---|---|---|---
Neo Zhang Jianyu | 1 | 0/0/0 | 1 | 1 | 1072
Georgi Gerganov | 1 | 0/0/0 | 10 | 23 | 1067
slaren | 1 | 0/0/0 | 1 | 6 | 945
Carolinabanana | 1 | 0/0/0 | 1 | 12 | 622
Meng, Hengyu | 1 | 0/0/0 | 1 | 2 | 157
ulatekh | 1 | 2/4/0 | 4 | 9 | 108
slashlib | 1 | 1/1/0 | 1 | 2 | 27
Brad Murray | 1 | 1/1/0 | 1 | 1 | 25
Ouadie EL FAROUKI | 1 | 0/0/0 | 1 | 1 | 21
Rotem Dan | 1 | 1/1/0 | 1 | 1 | 9
Slava Primenko | 1 | 1/1/0 | 1 | 1 | 4
Abhilash Majumder | 1 | 0/0/0 | 1 | 1 | 3
Daniel Bevenius | 1 | 0/0/0 | 1 | 1 | 2
Didzis Gosko (didzis) | 0 | 2/0/0 | 0 | 0 | 0
zhouwg (zhouwg) | 0 | 1/0/0 | 0 | 0 | 0
Pedro Probst (pprobst) | 0 | 1/0/0 | 0 | 0 | 0
Kendrick Taylor (sixcircuit) | 0 | 1/0/0 | 0 | 0 | 0
Emmanuel Schmidbauer (eschmidbauer) | 0 | 1/0/1 | 0 | 0 | 0
The PRs column counts pull requests created by that developer, shown as opened/merged/closed-unmerged during the period.
To compile `whisper.cpp` on Windows with CUDA support, you'll need to adjust the build flags and paths to match the Windows environment. Here's a general guide that should help you set up the compilation process:
CUDA Toolkit: Ensure that you have the NVIDIA CUDA Toolkit installed on your Windows machine. You can download it from the official NVIDIA website.
Environment Variables: Make sure that the `CUDA_PATH` environment variable is set correctly, pointing to your CUDA Toolkit installation directory (e.g., `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2`).
Compiler: Use a compiler that supports CUDA, such as Visual Studio with the NVIDIA Nsight Visual Studio Edition, or `nvcc` from the command line.
Build Flags: Modify the build flags to match the Windows environment. For example, remove `-lculibos` and `-lrt`, as they are not available on Windows. You may also need to adjust library paths from `/usr/lib/wsl/lib` to the appropriate Windows directories.
Linker Flags: The linker flags will need to be updated for Windows. Typically, you link against `.lib` files rather than `.so` files on Windows.
Here's an example of how you might modify your build flags for Windows:
```makefile
NVCC = nvcc
PKG_CFLAGS += -DGGML_USE_CUBLAS -I"$(CUDA_PATH)\include"
PKG_CPPFLAGS += -DGGML_USE_CUBLAS -I"$(CUDA_PATH)\include"
PKG_LIBS += -lcuda -lcublas -lcudart -lcublasLt -L"$(CUDA_PATH)\lib\x64"
NVCCFLAGS = --forward-unknown-to-host-compiler -arch=$(CUDA_ARCH_FLAG)
OBJECTS_CUDA = whisper_cpp/ggml-cuda.o
```
Note that `-lpthread` and `-ldl` are also not applicable on Windows, so they should be removed as well.
Also verify that `nvcc` can find all necessary headers and libraries. If you're using CMake instead of a Makefile, you'll need to adjust your `CMakeLists.txt` accordingly.
Keep in mind that compiling CUDA applications on Windows can be quite different from Linux due to differences in file paths, available libraries, and toolchain configurations. If you encounter specific errors during compilation, please provide those error messages for more targeted assistance.
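As a sketch of the CMake route, the commands below show a typical out-of-source build. The CUDA option name varies by whisper.cpp version (`WHISPER_CUBLAS` in older releases, `WHISPER_CUDA` in newer ones), so check the `CMakeLists.txt` of your checkout before relying on it:

```shell
:: Run from a "x64 Native Tools" Developer Command Prompt,
:: with the CUDA Toolkit installed and CUDA_PATH set.
cmake -B build -DWHISPER_CUBLAS=ON
cmake --build build --config Release
```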
The `whisper.cpp` Project

The `whisper.cpp` project is a C/C++ implementation of OpenAI's Whisper model, designed for high-performance inference for automatic speech recognition (ASR). The repository includes core implementation files, header files, example usage, and support for various hardware accelerations and platforms.
Core Implementation Files (`whisper.cpp`, `whisper.h`):

- `whisper.cpp`: Contains the main logic for the Whisper model operations. This file likely includes the implementation of the model's inference engine, handling input preprocessing, neural network computations, and output post-processing.
- `whisper.h`: The header file for `whisper.cpp`, declaring the functions, classes, and variables used across the implementation. It ensures modularity and reusability by exposing the necessary interfaces to other parts of the project.

Tensor Operations (`ggml.c`, `ggml.h`):

- `ggml.c`: Implements the core tensor operations essential for neural network computations. These operations are optimized for performance on different platforms using specific hardware features like SIMD instructions and GPU acceleration.
- `ggml.h`: Header file for `ggml.c`, defining the tensor data structures and operations. It acts as the interface for the neural network computations in `whisper.cpp`.

Example Implementations:

- `examples/main/main.cpp`: Demonstrates how to use the Whisper model to transcribe audio files. This file is crucial for new users to understand how to integrate and utilize the model in applications.
- `examples/stream/stream.cpp`: Shows real-time audio transcription capabilities, highlighting the model's performance in streaming scenarios. This is particularly useful for applications requiring live audio processing.

Modularity: The separation of core functionality (`whisper.*`), tensor operations (`ggml.*`), and examples into distinct files suggests a modular design. This structure aids maintainability and scalability, as each component can be developed and debugged independently.
Optimization: The implementation details mention optimizations for various architectures (Apple Silicon, x86 with AVX, NVIDIA GPUs), indicating a focus on high performance across different hardware platforms.
Cross-Platform Support: The project supports a wide range of platforms (macOS, iOS, Android, Windows, Linux), making it versatile for integration into various applications.
Documentation and Examples: The presence of detailed example files and extensive documentation in the README helps users understand how to use the library effectively.
Error Handling: From the provided snippets and descriptions, it is not clear how robust the error handling mechanisms are within the codebase. Proper error handling is crucial for production-level code, especially in scenarios involving hardware acceleration and real-time processing.
Testing: There is no explicit mention of a testing framework or unit tests in the provided repository description. Implementing a comprehensive testing suite would be beneficial to ensure reliability and stability as the project evolves.
Code Comments: While not visible in the snippets provided, ensuring that the codebase is well-commented is vital for maintainability, especially when complex optimizations or hardware-specific code are involved.
The `whisper.cpp` project exhibits a well-structured codebase with clear modularity and optimization for performance. It provides extensive platform support and documentation to aid users in integrating the Whisper model into their applications. However, areas such as error handling, testing, and detailed code comments (if lacking) could be potential points of improvement to enhance code quality further.