The `llama.cpp` project, developed by ggerganov, is a C/C++ library for large language model (LLM) inference, emphasizing portability and efficiency across diverse hardware. The project is in a state of rapid development, with a strong community presence. It supports a wide array of models and hardware optimizations, making it versatile for various deployment scenarios.
- Eric Curtin (ericcurtin): Worked on `llama-run`.
- Georgi Gerganov (ggerganov)
- Xuan-Son Nguyen (ngxson)
- William Tambellini (WilliamTambellini)
- Issixx (issixx)
- Daniel Bevenius (danbev): Enabled the `--no-warmup` option for embeddings to improve load speed.
- Molly Sophia (MollySophia)
- Uvos (IMbackK)
- Nuno (rare-magma)
- Johannes Gäßler (JohannesGaessler)
Timespan | Opened | Closed | Comments | Labeled | Milestones |
---|---|---|---|---|---|
7 Days | 49 | 31 | 141 | 2 | 1 |
30 Days | 165 | 114 | 556 | 9 | 1 |
90 Days | 380 | 199 | 1593 | 14 | 1 |
All Time | 4389 | 4107 | - | - | - |
Like all software activity quantification, these numbers are imperfect but sometimes useful. Comments, Labels, and Milestones refer to those issues opened in the timespan in question.
Developer | Branches | PRs | Commits | Files | Changes |
---|---|---|---|---|---|
Georgi Gerganov | 4 | 14/13/1 | 43 | 30 | 8761 |
Olivier Chafik | 1 | 2/2/0 | 3 | 24 | 3915 |
Xuan-Son Nguyen | 2 | 10/8/0 | 9 | 18 | 2308 |
Johannes Gäßler | 1 | 4/5/0 | 5 | 21 | 2269 |
Eric Curtin | 1 | 10/9/1 | 9 | 7 | 2044 |
Jeff Bolz | 1 | 9/11/0 | 11 | 13 | 852 |
Eve | 1 | 0/1/0 | 1 | 6 | 840 |
bandoti | 1 | 1/1/0 | 1 | 11 | 517 |
Nicolò Scipione | 1 | 1/1/0 | 1 | 3 | 240 |
Akarshan Biswas | 1 | 1/2/0 | 2 | 7 | 231 |
uvos | 1 | 5/4/0 | 4 | 7 | 207 |
stduhpf | 1 | 1/1/0 | 1 | 3 | 192 |
Diego Devesa | 3 | 6/4/0 | 8 | 13 | 186 |
Daniel Bevenius | 1 | 2/2/0 | 2 | 3 | 167 |
Radoslav Gerganov | 1 | 2/2/0 | 2 | 11 | 129 |
tc-mb | 1 | 1/1/0 | 1 | 6 | 115 |
LostRuins Concedo | 2 | 1/1/0 | 3 | 3 | 103 |
amd-dwang | 1 | 1/1/0 | 1 | 1 | 94 |
Junil Kim | 1 | 0/0/0 | 1 | 5 | 92 |
Haus1 | 1 | 1/1/0 | 1 | 2 | 87 |
fj-y-saito | 1 | 0/1/0 | 1 | 1 | 83 |
RunningLeon | 1 | 0/1/0 | 1 | 1 | 60 |
Kyle Bruene | 1 | 1/1/0 | 1 | 1 | 44 |
jiahao su | 1 | 0/0/0 | 1 | 1 | 34 |
Nikita Sarychev | 1 | 0/1/0 | 1 | 1 | 22 |
Nuno | 1 | 3/3/0 | 3 | 2 | 21 |
Michael Engel | 1 | 1/1/0 | 1 | 1 | 18 |
Jafar Uruç | 1 | 1/1/0 | 1 | 3 | 16 |
Frank Mai | 1 | 1/1/0 | 1 | 2 | 14 |
codezjx | 1 | 1/1/0 | 1 | 2 | 11 |
lexasub | 1 | 1/1/0 | 1 | 2 | 11 |
issixx | 1 | 0/0/0 | 1 | 1 | 10 |
Molly Sophia | 1 | 2/1/0 | 1 | 1 | 10 |
Ihar Hrachyshka | 1 | 1/1/0 | 1 | 1 | 8 |
Jiří Podivín | 1 | 1/1/0 | 1 | 1 | 5 |
peidaqi | 1 | 1/1/0 | 1 | 1 | 4 |
William Tambellini | 1 | 0/0/0 | 1 | 1 | 4 |
Bernhard M. Wiedemann | 1 | 1/1/0 | 1 | 1 | 3 |
Michael Podvitskiy | 1 | 2/1/0 | 1 | 1 | 2 |
David Renshaw | 1 | 1/1/0 | 1 | 1 | 2 |
Emreerdog | 1 | 1/1/0 | 1 | 1 | 2 |
someone13574 | 1 | 1/1/0 | 1 | 1 | 2 |
Christopher Nielsen | 1 | 1/1/0 | 1 | 1 | 1 |
musoles | 1 | 0/1/0 | 1 | 1 | 1 |
Nigel Bosch (pnb) | 0 | 1/0/0 | 0 | 0 | 0 |
Charles Xu (chaxu01) | 0 | 1/0/0 | 0 | 0 | 0 |
Tei Home (teihome) | 0 | 1/0/0 | 0 | 0 | 0 |
Steve Grubb (stevegrubb) | 0 | 1/0/1 | 0 | 0 | 0 |
Herman Semenoff (GermanAizek) | 0 | 3/0/0 | 0 | 0 | 0 |
Jordan Nanos (JordanNanos) | 0 | 1/0/1 | 0 | 0 | 0 |
Gabe Goodhart (gabe-l-hart) | 0 | 1/0/1 | 0 | 0 | 0 |
Dhruv Anand (Dhruvanand24) | 0 | 1/0/0 | 0 | 0 | 0 |
None (savesanketsw) | 0 | 1/0/0 | 0 | 0 | 0 |
None (fairydreaming) | 0 | 1/0/0 | 0 | 0 | 0 |
Rémy Oudompheng (remyoudompheng) | 0 | 1/0/0 | 0 | 0 | 0 |
Aleksei Nikiforov (AlekseiNikiforovIBM) | 0 | 1/0/0 | 0 | 0 | 0 |
PRs: opened/merged/closed-unmerged counts for pull requests created by that developer during the period.
Risk | Level (1-5) | Rationale |
---|---|---|
Delivery | 3 | The project shows active development with numerous feature requests and bug fixes. However, the backlog of 282 open issues and the high volume of changes across multiple branches pose potential delivery risks if not managed effectively. The presence of draft features in pull requests also indicates ongoing development that may affect delivery timelines. |
Velocity | 3 | The project has a high velocity with significant contributions from multiple developers. However, the imbalance between opened and closed issues, along with the high number of open pull requests, suggests potential risks to maintaining this pace. Effective management is required to prevent integration issues that could slow down progress. |
Dependency | 4 | The project's extensive support for various models and hardware platforms introduces dependency risks. The reliance on external models and libraries requires careful integration and maintenance to avoid compatibility issues. Frequent updates and new feature requests further exacerbate these risks. |
Team | 3 | The team demonstrates strong collaboration and responsiveness to community feedback. However, the high volume of open issues and pull requests suggests potential strain on the team, which could lead to burnout or communication challenges if not managed carefully. |
Code Quality | 3 | Efforts to maintain code quality are evident through refactoring and documentation updates. However, minor issues like unnecessary whitespace changes and draft features indicate areas for improvement in code review processes. The rapid pace of development necessitates vigilant oversight to maintain high code quality standards. |
Technical Debt | 3 | Technical debt is being addressed incrementally through refactoring and bug fixes. However, the complexity introduced by supporting multiple architectures and platforms poses ongoing challenges. Continuous efforts are needed to manage technical debt effectively as new features are integrated. |
Test Coverage | 3 | While some pull requests include CI tests, the presence of draft features and unresolved bugs suggests potential gaps in test coverage. Comprehensive testing is crucial to ensure reliability as new features are developed and integrated. |
Error Handling | 2 | Recent commits show improvements in error handling, such as checks for missing parameters and fixes for segmentation faults. These enhancements indicate a proactive approach to managing errors, reducing risks related to error handling. |
The `llama.cpp` project has seen significant recent activity, with a focus on expanding model support, enhancing performance, and addressing various bugs. The project continues to evolve rapidly, reflecting its active development and community engagement.
- **Model Support and Enhancement Requests:** A recurring theme is the request for support of new models like Qwen2-VL (#9246), Pixtral by Mistral (#9440), and others. This indicates a strong demand for the project to keep up with emerging models in the AI landscape.
- **Performance Optimization:** Several issues highlight performance concerns, such as slow inference times (#11114) and degradation with specific configurations (#10435). These suggest ongoing efforts to optimize the library for different hardware setups.
- **Backend and Compilation Challenges:** Issues related to specific backends (e.g., Vulkan, ROCm) and compilation problems on various platforms (e.g., Windows, ARM64) are prevalent. This underscores the complexity of maintaining cross-platform compatibility.
- **Bug Fixes and Anomalies:** There are numerous bug reports ranging from segmentation faults (#9949) to unexpected behavior in model outputs (#10681). These highlight the challenges of ensuring stability across diverse use cases.
- **Community Contributions:** The project benefits from a vibrant community contributing enhancements, such as new sampling methods (#11057) and chat template support (#11056). This reflects strong user engagement and collaboration.
Recent notable issues include:

- #11490: Feature request for Deepseek Janus-Pro-7B & Janus-1.3B support.
- #11488: Failed attempt to quantize a model under Android termux proot.
- #11469: Misc. bug regarding missing amd64 CPU Docker images.
- #11447: Compile bug related to CUDA with Visual Studio on Windows.
The `llama.cpp` project is actively addressing a wide array of issues, from model support requests to backend-specific bugs. The community's involvement is crucial in driving improvements and ensuring the project's adaptability to new challenges in the AI domain.
Notable pull requests in the `llama.cpp` project:

- #11489 (server : add /apply-template endpoint for additional use cases of Minja functionality): Adds an `/apply-template` endpoint to the server, allowing users to apply chat templates to messages without running inference. This PR also includes a CI test. (See the client sketch after this list.)
- #11484 (server : update auto gen files comments [no ci]): Updates the auto-generated file comments in `server.cpp`, removing references to `deps.sh`, which was removed in a previous commit.
- #11477 (Readme Update: Added IRIS under UI section): Adds IRIS, a UI for `llama.cpp`, to the README under the UI section, potentially increasing its visibility and adoption.
- #11453 (ggml : x2 speed for WASM by optimizing SIMD).
- #11452 (llama: Add support for RWKV v7 architecture): Brings RWKV v7 support to `llama.cpp`, potentially attracting users interested in RWKV models.
- #11480 (Parse https://ollama.com/library/ syntax): Enables `llama-run` to parse URLs from the Ollama library, improving usability for users searching for models via the web UI.
- #11475 (embedding : enable --no-warmup option): Adds a `--no-warmup` option for embeddings, allowing users to disable warmup runs.
- #11473 (llamacpp-server: Fixed wrong function name in llamacpp server unit test).
- #11471 (Hip: Supress transformation warning in softmax.cu).
- Additional PRs: #11466 & #11465 & #11457 & #11449 & #11448 & #11445 & #11441 & #11438 & #11437 & #11434 & #11427 & #11424 & #11423 & #11422 & #11420 & #11419 & #11418 & #11409 & #11407 & #11399 & #11396 & #11392 & #11386 & #11381 & #11380 & #11377 & #11375 & #11373 & #11372 & #11369 & #11368 & #11366 & #11364 & #11362 & #11356 & #11355
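The endpoint added by #11489 can be exercised from any HTTP client; the following is a minimal sketch in C++ using libcurl. The host/port, the exact JSON body shape, and the response handling are assumptions for illustration, not details taken from the PR.

```cpp
// Minimal sketch of calling the /apply-template server endpoint with libcurl.
// Assumes a llama-server instance on localhost:8080; body/response shapes are
// illustrative assumptions.
#include <curl/curl.h>
#include <iostream>
#include <string>

static size_t write_cb(char * data, size_t size, size_t nmemb, void * userp) {
    static_cast<std::string *>(userp)->append(data, size * nmemb);
    return size * nmemb;
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL * curl = curl_easy_init();
    if (!curl) { return 1; }

    // Chat messages to be rendered through the model's chat template.
    const std::string body = R"({"messages":[{"role":"user","content":"Hello!"}]})";
    std::string response;

    curl_slist * headers = curl_slist_append(nullptr, "Content-Type: application/json");
    curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:8080/apply-template");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body.c_str());
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);

    CURLcode res = curl_easy_perform(curl);
    if (res == CURLE_OK) {
        std::cout << response << std::endl;  // expected: JSON with the rendered prompt
    }

    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return res == CURLE_OK ? 0 : 1;
}
```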
Overall, the `llama.cpp` project demonstrates a dynamic development environment with continuous enhancements and active community engagement. The open PRs indicate ongoing efforts to expand model support and optimize performance, while closed PRs reflect successful resolutions of issues and implementation of new features.
`examples/run/run.cpp`

- **Includes and Preprocessor Directives:** The file begins with platform-specific includes and conditional compilation directives, ensuring compatibility across different operating systems. This is a good practice for maintaining cross-platform support.
- **Namespace and Libraries:** The use of standard libraries like `<iostream>`, `<vector>`, and `<string>` is appropriate for handling I/O operations, collections, and string manipulations. The inclusion of third-party libraries like `curl` and `json.hpp` indicates the file's functionality related to network operations and JSON parsing.
- **Signal Handling:** The file includes a signal handler for SIGINT, which is a good practice for gracefully terminating the program on user interruption.
- **Utility Functions:** Functions like `fmt` and `printe` are well defined for formatted output, enhancing code readability and maintainability.
- **Class Design:** The `Opt` class encapsulates command-line argument parsing logic. It uses private member variables to store default values and public methods to initialize parameters. This design promotes encapsulation and separation of concerns (a combined sketch of this class and the SIGINT handler follows this list).
- **Error Handling:** Error messages are printed using the `printe` function, which is consistent throughout the file. However, the use of exit codes could be standardized across different error scenarios for clarity.
- **HTTP Client Implementation:** The `HttpClient` class demonstrates a robust implementation for handling HTTP requests using libcurl. It includes methods for setting headers, handling progress updates, and managing file locks, which are crucial for reliable network communication.
- **Code Organization:** The file is organized into sections with clear responsibilities, such as argument parsing, HTTP client setup, and model initialization. This modular approach aids in understanding and maintaining the code.
- **Documentation:** Inline comments are sparse but present in critical sections. Additional comments explaining complex logic or assumptions would improve code comprehensibility.
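To ground the signal-handling and `Opt`-style parsing points, here is a minimal, self-contained sketch. The names `Opt` and `printe` mirror the file's naming, but the bodies, flags, and defaults are simplified assumptions rather than the actual implementation.

```cpp
// Sketch of the SIGINT handler and Opt-style argument parsing patterns
// described above; names mirror run.cpp, bodies are simplified assumptions.
#include <csignal>
#include <cstdarg>
#include <cstdio>
#include <string>

// printe: printf-style helper writing to stderr (assumed signature).
static int printe(const char * fmt, ...) {
    va_list args;
    va_start(args, fmt);
    const int ret = vfprintf(stderr, fmt, args);
    va_end(args);
    return ret;
}

static volatile sig_atomic_t g_interrupted = 0;

// Graceful termination on Ctrl-C: only set a flag, as signal handlers require.
static void sigint_handler(int) { g_interrupted = 1; }

class Opt {
  public:
    std::string model;
    int         context_size = 2048;  // assumed default

    int init(int argc, const char ** argv) {
        for (int i = 1; i < argc; ++i) {
            const std::string arg = argv[i];
            if (arg == "-c" || arg == "--context-size") {
                if (++i >= argc) {
                    printe("error: missing value for %s\n", arg.c_str());
                    return 1;  // consistent non-zero code on parse failure
                }
                context_size = std::stoi(argv[i]);
            } else {
                model = arg;  // positional argument: the model to run
            }
        }
        return 0;
    }
};

int main(int argc, const char ** argv) {
    std::signal(SIGINT, sigint_handler);
    Opt opt;
    if (opt.init(argc, argv) != 0) {
        return 1;
    }
    // ... run the model until finished or g_interrupted is set ...
    return 0;
}
```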
`scripts/sync-ggml.last`

- **Content:** A single line containing a commit hash.
- **Purpose:** Likely serves as a marker or reference point for syncing with a specific state of the ggml library.
`ggml/src/ggml.c`

- **Preprocessor Directives:** The file includes several preprocessor directives for platform-specific configurations, ensuring compatibility across different environments.
- **Functionality:** This file handles core functionality related to memory management, logging, and mathematical operations within the ggml library.
- **Error Handling:** The use of custom abort functions (`ggml_abort`) with backtrace capabilities is a robust approach for debugging critical errors.
- **Logging System:** A structured logging system is implemented, allowing for customizable log levels and user-defined callbacks. This enhances the flexibility of logging outputs.
- **Memory Management:** Functions like `ggml_aligned_malloc` demonstrate careful consideration of memory-alignment requirements, which is crucial for performance optimization on modern architectures (see the sketch after this list).
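A minimal sketch of the cross-platform aligned-allocation pattern that `ggml_aligned_malloc` represents; the wrapper below is an illustrative assumption, not the library's actual code, and the 64-byte alignment constant is likewise assumed.

```cpp
// Sketch of cross-platform aligned allocation in the spirit of
// ggml_aligned_malloc; simplified, with an assumed 64-byte alignment.
#include <cstdio>
#include <cstdlib>

constexpr size_t TENSOR_ALIGNMENT = 64;  // assumed: cache-line/SIMD friendly

static void * aligned_malloc(size_t size) {
#if defined(_WIN32)
    void * ptr = _aligned_malloc(size, TENSOR_ALIGNMENT);
#else
    void * ptr = nullptr;
    if (posix_memalign(&ptr, TENSOR_ALIGNMENT, size) != 0) {
        ptr = nullptr;
    }
#endif
    if (ptr == nullptr) {
        fprintf(stderr, "failed to allocate %zu bytes\n", size);
    }
    return ptr;
}

static void aligned_free(void * ptr) {
#if defined(_WIN32)
    _aligned_free(ptr);  // Windows requires the matching free function
#else
    free(ptr);
#endif
}

int main() {
    float * data = static_cast<float *>(aligned_malloc(1024 * sizeof(float)));
    // ... use data with SIMD loads that assume 64-byte alignment ...
    aligned_free(data);
    return data != nullptr ? 0 : 1;
}
```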
`ggml/src/ggml-cpu/ggml-cpu.c`

- **Platform-Specific Code:** Similar to other files in this project, it includes platform-specific optimizations, particularly for SIMD operations on various architectures (e.g., ARM NEON, AVX).
- **SIMD Operations:** The file defines macros for SIMD operations that abstract architecture-specific intrinsics, promoting portability while leveraging hardware capabilities (see the sketch after this list).
- **Atomic Operations:** Implements atomic operations using platform-specific APIs (e.g., Windows Interlocked functions), ensuring thread safety in concurrent environments.
- **Threading Support:** Includes threading constructs compatible with both Windows and POSIX systems, enhancing cross-platform multithreading support.
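The SIMD-macro pattern can be made concrete with a short sketch; the macro names below are illustrative inventions, not the macros ggml actually defines.

```cpp
// Illustrative sketch of macros that hide architecture-specific SIMD
// intrinsics behind one name; ggml's real macros differ in detail.
#include <cstddef>

#if defined(__ARM_NEON)
    #include <arm_neon.h>
    #define SIMD_F32_STEP 4
    #define SIMD_F32_LOAD(p)     vld1q_f32(p)
    #define SIMD_F32_ADD(a, b)   vaddq_f32(a, b)
    #define SIMD_F32_STORE(p, v) vst1q_f32(p, v)
    typedef float32x4_t simd_f32_t;
#elif defined(__AVX__)
    #include <immintrin.h>
    #define SIMD_F32_STEP 8
    #define SIMD_F32_LOAD(p)     _mm256_loadu_ps(p)
    #define SIMD_F32_ADD(a, b)   _mm256_add_ps(a, b)
    #define SIMD_F32_STORE(p, v) _mm256_storeu_ps(p, v)
    typedef __m256 simd_f32_t;
#endif

// One portable loop body serves every architecture the macros cover.
void vec_add_f32(size_t n, float * dst, const float * a, const float * b) {
    size_t i = 0;
#if defined(SIMD_F32_STEP)
    for (; i + SIMD_F32_STEP <= n; i += SIMD_F32_STEP) {
        simd_f32_t va = SIMD_F32_LOAD(a + i);
        simd_f32_t vb = SIMD_F32_LOAD(b + i);
        SIMD_F32_STORE(dst + i, SIMD_F32_ADD(va, vb));
    }
#endif
    for (; i < n; ++i) {  // scalar tail (and fallback when no SIMD path exists)
        dst[i] = a[i] + b[i];
    }
}
```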
`common/arg.cpp`

- **Command-Line Parsing:** Implements comprehensive command-line argument parsing with support for environment-variable overrides. This flexibility enhances user experience by allowing configuration through multiple channels (see the sketch after this list).
- **Modular Design:** Uses classes (`common_arg`) to encapsulate argument properties and behaviors, promoting reusability and maintainability.
- **Error Reporting:** Provides detailed error messages during argument parsing failures, aiding users in correcting input errors promptly.
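A hedged sketch of the environment-override pattern: `common_arg` is the file's class name, but the fields, the `LLAMA_ARG_THREADS` variable, and the precedence logic here are simplified assumptions.

```cpp
// Sketch of CLI parsing with environment-variable overrides in the spirit
// of common_arg; fields and precedence are simplified assumptions.
#include <cstdlib>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

struct common_arg {
    std::vector<std::string> flags;   // e.g. {"-t", "--threads"}
    std::string env;                  // e.g. "LLAMA_ARG_THREADS" (assumed)
    std::function<void(const std::string &)> handler;
};

int main(int argc, char ** argv) {
    int n_threads = 4;  // assumed default

    std::vector<common_arg> args = {
        { {"-t", "--threads"}, "LLAMA_ARG_THREADS",
          [&](const std::string & v) { n_threads = std::stoi(v); } },
    };

    // Environment variables are applied first, so explicit CLI flags win.
    for (const auto & arg : args) {
        if (const char * v = std::getenv(arg.env.c_str())) {
            arg.handler(v);
        }
    }
    // Each flag is assumed to take one value in the following position.
    for (int i = 1; i + 1 < argc; ++i) {
        for (const auto & arg : args) {
            for (const auto & flag : arg.flags) {
                if (flag == argv[i]) {
                    arg.handler(argv[i + 1]);
                }
            }
        }
    }

    std::cout << "threads: " << n_threads << "\n";
    return 0;
}
```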
`src/llama.cpp`

- **Model Loading Logic:** Contains functions related to loading model parameters from files, indicating its role in the model initialization process within the llama library.
- **Error Handling Strategy:** Utilizes try-catch blocks extensively around critical operations like model loading, providing resilience against runtime exceptions (see the sketch after this list).
- **Modular Architecture Support:** Supports various model architectures through conditional logic, demonstrating flexibility in handling different model types within a single framework.
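The try-catch loading pattern might look like the following sketch; `load_model_internal` and `model_handle` are hypothetical stand-ins, not the library's API.

```cpp
// Sketch of the try-catch guard around model loading described above;
// load_model_internal and model_handle are hypothetical stand-ins.
#include <cstdio>
#include <stdexcept>
#include <string>

struct model_handle { std::string arch; };

// Hypothetical loader: throws on malformed files or unknown architectures.
static model_handle load_model_internal(const std::string & path) {
    if (path.empty()) {
        throw std::runtime_error("no model path provided");
    }
    // ... read file metadata, dispatch on architecture, load tensors ...
    return model_handle{"llama"};
}

static model_handle * load_model(const std::string & path) {
    try {
        return new model_handle(load_model_internal(path));
    } catch (const std::exception & err) {
        // Convert exceptions into a null result plus a log line, so a bad
        // model file cannot crash the caller.
        fprintf(stderr, "error loading model '%s': %s\n", path.c_str(), err.what());
        return nullptr;
    }
}

int main() {
    model_handle * model = load_model("models/example.gguf");
    if (model == nullptr) {
        return 1;
    }
    delete model;
    return 0;
}
```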
- Eric Curtin (ericcurtin)
- Georgi Gerganov (ggerganov): Worked on the `gg/llama-kv-cache` and `gg/build-pack-lib-include` branches.
- William Tambellini (WilliamTambellini)
- Issixx (issixx)
- Daniel Bevenius (danbev): Enabled the `--no-warmup` option for llama-embeddings and made minor fixes to improve model load speed.
- Molly Sophia (MollySophia)
- Emreerdog
- Peidaqi
- Xuan-Son Nguyen (ngxson)
- Uvos (IMbackK)
- Nuno (rare-magma)
- Akarshan Biswas (qnixsynapse)
- Michael Engel (engelmi)
- Johannes Gäßler (JohannesGaessler)
- Diego Devesa (slaren)

Together, these developers advanced the `llama.cpp` project across various aspects such as performance optimization, bug fixes, feature additions, and documentation updates.