The MiniCPM-V project, managed by OpenBMB, focuses on developing multimodal LLMs (MLLMs) for vision-language understanding. These models integrate image and text inputs to generate high-quality text outputs. The standout model in the series is MiniCPM-Llama3-V 2.5, which has 8 billion parameters and outperforms proprietary competitors like GPT-4V-1106 and Gemini Pro on various benchmarks. The project emphasizes performance and efficiency, supports over 30 languages, and targets deployment on diverse platforms, including end-side devices.
Developer | Branches | PRs | Commits | Files | Changes
---|---|---|---|---|---
Haoyu Li | 1 | 1/1/0 | 1 | 49 | 5610
Tianyu Yu | 1 | 0/0/0 | 72 | 10 | 2048
qianyu chen | 1 | 9/6/3 | 6 | 8 | 550
Cui Junbo | 1 | 0/0/0 | 14 | 9 | 248
JamePeng | 1 | 0/1/0 | 1 | 2 | 207
Boke Syo | 1 | 1/1/0 | 1 | 1 | 159
Hongji Zhu | 1 | 0/0/0 | 16 | 7 | 68
Chao Jia | 1 | 0/0/0 | 3 | 3 | 30
zhangao | 1 | 0/0/0 | 3 | 1 | 28
YuzaChongyi | 1 | 1/1/0 | 3 | 5 | 23
imarochkin | 1 | 0/0/0 | 1 | 3 | 9
tc-mb | 1 | 0/0/0 | 2 | 2 | 4
ByeongkiJeong | 1 | 2/2/0 | 2 | 1 | 4
EC2 Default User | 1 | 0/0/0 | 1 | 1 | 3
BU Fanchen 卜凡辰 | 1 | 1/1/0 | 1 | 1 | 2
王鹤男 (whn09) | 0 | 1/1/0 | 0 | 0 | 0
None (wanesoft) | 0 | 1/1/0 | 0 | 0 | 0
PRs: created by that dev and opened/merged/closed-unmerged during the period
The MiniCPM-V project, managed by the OpenBMB organization, is a series of end-side multimodal LLMs (MLLMs) designed for vision-language understanding. These models take image and text inputs and produce high-quality text outputs. Since its inception in February 2024, the project has released several versions of the model, with a focus on strong performance and efficient deployment. The most notable models in this series are MiniCPM-Llama3-V 2.5 and MiniCPM-V 2.0. MiniCPM-Llama3-V 2.5, in particular, has 8B parameters and surpasses many proprietary models, such as GPT-4V-1106 and Gemini Pro, in performance. It supports over 30 languages and can be deployed efficiently on end-side devices thanks to quantization and CPU/NPU optimizations.
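As a rough illustration of the image-plus-text interface described above, the sketch below assumes the Hugging Face checkpoint `openbmb/MiniCPM-Llama3-V-2_5` and the chat-style call shown in the project README; exact argument names may differ between releases, so treat this as a non-authoritative example rather than the canonical usage.

```python
# Minimal sketch of image + text inference, assuming the openbmb/MiniCPM-Llama3-V-2_5
# checkpoint and the chat-style interface described in the project README.
# Argument names (msgs, sampling, temperature) follow the README and may vary by release.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-Llama3-V-2_5",
    trust_remote_code=True,
    torch_dtype=torch.float16,
).to("cuda").eval()
tokenizer = AutoTokenizer.from_pretrained(
    "openbmb/MiniCPM-Llama3-V-2_5", trust_remote_code=True
)

image = Image.open("example.jpg").convert("RGB")   # any local image
msgs = [{"role": "user", "content": "Describe this image."}]

# The model combines the image and the conversation into a single text response.
answer = model.chat(image=image, msgs=msgs, tokenizer=tokenizer,
                    sampling=True, temperature=0.7)
print(answer)
```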
Much of the recent work is concentrated in the `finetune` directory, with additional contributions landing in the `docs` directory. Other members like YuzaChongyi, JamePeng, lihytotoro, EC2 Default User, waxnkw, imarochkin, bokveizen, tc-mb, whn09, and wanesoft have contributed through specific updates to scripts or documentation, often focusing on fine-tuning processes or enhancing the project's accessibility through improved documentation.
The development team is highly active, with a clear focus on both enhancing the project's core functionality (such as fine-tuning scripts) and maintaining robust documentation across multiple languages. There is a strong emphasis on ensuring the software runs efficiently across various platforms, including end-side devices, which is critical for deployment scenarios. Collaboration patterns suggest a mix of independent work and periodic integrative efforts, where team members review and merge each other's contributions to ensure consistency and quality across the project's outputs.
Recent activity on the MiniCPM-V GitHub repository indicates a high level of engagement, with 24 open issues, many of them created within the last few days. The issues range from questions about specific functionalities and bug reports to enhancement requests and discussions on various aspects of the project.
A recurring theme in these issues is the challenge users face with finetuning models (#220, #216), indicating potential areas for improvement in documentation or functionality for easier model customization. Additionally, there is significant interest in community support and development (#217), highlighting the project's communal value.
Among the freshest is a report that `bitsandbytes` 8-bit quantization requires Accelerate, last updated within the past day. These issues reflect active engagement from both maintainers and community members in addressing recent concerns and queries.
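For context on that report, the snippet below sketches how 8-bit loading is typically requested through `transformers` with `bitsandbytes`, and why `accelerate` must be installed alongside it. The model ID and arguments are illustrative assumptions, not details taken from the issue itself.

```python
# Illustrative sketch (not from the issue): loading a checkpoint in 8-bit via
# transformers + bitsandbytes. Both `bitsandbytes` and `accelerate` must be
# installed, otherwise transformers raises an error along the lines of
# "Using `load_in_8bit=True` requires Accelerate".
#   pip install bitsandbytes accelerate
from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig

model_id = "openbmb/MiniCPM-Llama3-V-2_5"   # assumed checkpoint for illustration

quant_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto",               # accelerate handles device placement
    quantization_config=quant_config,
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
```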
- `finetune/finetune.py`: Sets up the `dataset` and `trainer`. The use of `dataclasses` for argument management is appropriate; `safe_save_model_for_hf_trainer` and `make_supervised_data_module` show good modularization of code. However, the script could benefit from more inline comments explaining complex sections. (A sketch of the dataclass pattern follows this list.)
- `finetune/readme.md`: Uses collapsible sections (the `<details>` tag) to organize content neatly.
- `docs/inference_on_multiple_gpus.md`: Covers inference across multiple GPUs with the `accelerate` library. Includes code snippets that are directly applicable. (A sketch appears after the closing summary below.)
- `web_demo_streamlit-2_5.py`
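To make the praised pattern concrete, here is a minimal, hypothetical sketch of dataclass-based argument management in the style the `finetune/finetune.py` notes refer to; the field names are illustrative and do not come from the actual script.

```python
# Hypothetical sketch of dataclass-based argument management, in the spirit of the
# pattern the review praises. Field names are illustrative, not from finetune.py.
# Run e.g.: python args_demo.py --output_dir out
from dataclasses import dataclass, field
from typing import Optional

import transformers


@dataclass
class ModelArguments:
    model_name_or_path: str = field(default="openbmb/MiniCPM-Llama3-V-2_5")


@dataclass
class DataArguments:
    data_path: Optional[str] = field(
        default=None, metadata={"help": "Path to the training data."}
    )


if __name__ == "__main__":
    # HfArgumentParser turns the dataclasses into a command-line interface,
    # so training runs are configured declaratively rather than with ad-hoc flags.
    parser = transformers.HfArgumentParser(
        (ModelArguments, DataArguments, transformers.TrainingArguments)
    )
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
    print(model_args, data_args.data_path, training_args.output_dir)
```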
The reviewed files are well-crafted, with their specific functionalities clearly implemented. However, areas such as error handling and code documentation could be improved to enhance robustness and maintainability. The use of modern Python features like data classes for configuration management is commendable.
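As a companion to the `docs/inference_on_multiple_gpus.md` notes above, the following is a minimal sketch of splitting a large checkpoint across GPUs with `accelerate`; the per-GPU memory budget and checkpoint ID are assumptions, not values taken from the document.

```python
# Minimal sketch of multi-GPU dispatch with accelerate (not the document's exact
# snippet). The memory budget per GPU and the checkpoint ID are assumed values.
import torch
from accelerate import dispatch_model, infer_auto_device_map
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-Llama3-V-2_5",
    trust_remote_code=True,
    torch_dtype=torch.float16,
)

# Compute a layer-to-device mapping under an assumed per-GPU memory budget,
# then move each shard of the model onto its assigned device.
device_map = infer_auto_device_map(model, max_memory={0: "10GiB", 1: "10GiB"})
model = dispatch_model(model, device_map=device_map)
```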