The MiniCPM-V project by OpenBMB focuses on developing multimodal large language models (MLLMs) optimized for mobile deployment. The latest iteration, MiniCPM-Llama3-V 2.5, competes with GPT-4V in performance and supports over 30 languages. The project's active maintenance and frequent updates suggest a positive trajectory, evidenced by its significant community engagement (2,198 stars and 152 forks on GitHub).
Key Points:
Recent development activities show a concerted effort by the team to refine the project's documentation and enhance its functional capabilities. The team members, including Hongji Zhu, YuzaChongyi, and others, have been actively updating README files, fine-tuning scripts, and demo applications.
Collaboration Patterns:
The recent issues and PRs suggest a focus on refining the model's performance and usability; details appear in the sections below.
The MiniCPM-V project is marked by robust development efforts aimed at enhancing multimodal language model performance and accessibility. While the team demonstrates effective collaboration and responsiveness to issues, challenges such as performance discrepancies and dependency conflicts need addressing to maintain the project’s trajectory towards becoming a reliable tool for mobile-based AI solutions.
| Developer | Branches | PRs | Commits | Files | Changes |
|---|---|---|---|---|---|
| Tianyu Yu | 1 | 0/0/0 | 2 | 27 | 2901 |
| YuzaChongyi | 1 | 3/3/0 | 3 | 6 | 40 |
| qianyu chen | 1 | 0/1/0 | 1 | 9 | 37 |
| Hongji Zhu | 1 | 0/0/0 | 5 | 3 | 35 |
| Cui Junbo | 1 | 0/0/0 | 2 | 2 | 14 |
| uuhc | 1 | 1/1/0 | 1 | 1 | 3 |
| Yuan Yao | 1 | 0/0/0 | 1 | 1 | 2 |
PRs: created by that dev and opened/merged/closed-unmerged during the period
Notable Elements:
The development team, including Hongji Zhu, YuzaChongyi, uuhc, Yuan Yao, Cui Junbo, Tianyu Yu, and qianyu chen, has been actively updating the project's documentation and refining its features. Recent commits primarily focus on README updates and fine-tuning scripts, indicating ongoing efforts to improve user guidance and model performance. The project stands out for its multilingual capabilities and focus on mobile optimization, and the team has been responsive to community feedback, as evidenced by its quick handling of recent issues and updates.
Recent Commits:
The commit log below covers the last few days of activity, which centers on documentation updates, fine-tuning fixes, and demo improvements.
1 day ago – Merge pull request #108 from uuhc/uuhc – Files: requirements.txt (+3, -0)
1 day ago – Update README_en.md – Files: README_en.md (+2, -2)
1 day ago – Update README.md – Files: README.md (+2, -2)
2 days ago – fix web_demo_2.5 for int4 – Files: web_demo_2.5.py (+8, -3)
2 days ago – Update README_en.md – Files: README_en.md (+4, -4)
2 days ago – Update README.md – Files: README.md (+4, -4)
1 day ago – Merge pull request #112 from YuzaChongyi/main – Files: finetune/readme.md (+3, -0)
1 day ago – update finetune readme – Files: finetune/readme.md (+3, -0)
1 day ago – Merge pull request #111 from YuzaChongyi/main – Files: finetune/finetune_ds.sh (+1, -0), finetune/trainer.py (+1, -2)
1 day ago – fix finetune – Files: finetune/finetune_ds.sh (+1, -0), finetune/trainer.py (+1, -2)
2 days ago – Merge pull request #92 from YuzaChongyi/main – Files: README.md (+8, -8), README_en.md (+8, -9), assets/airplane.jpeg (added)
2 days ago – update demo – Files: README.md (+8,-8), README_en.md (+8,-9), assets/airplane.jpeg (added) – Details: Updated demo-related content.
2 days ago – update table – Files: README.md (+2,-2), README_en.md (+2,-2) – Details: Updated tables in READMEs.
2 days ago – update readme – Files: README.md (+2,-2), README_en.md (+1,-1) – Details: Minor updates to READMEs.
2 days ago – update readme – Files: README.md (+1,-1) – Details: Minor update to main README.
2 days ago – Update to MiniCPM-Llama3-V 2.5 – Files: Multiple files updated and added including images and markdown files. – Details: Major update introducing MiniCPM-Llama3-V 2.5.
The recent activities indicate a highly collaborative environment with multiple team members frequently updating documentation and refining features related to model deployment and fine-tuning capabilities:
- Frequent updates to the READMEs (README.md, README_en.md) suggest an emphasis on clear communication and user guidance.
- Changes to the fine-tuning assets (finetune/readme.md, finetune/finetune_ds.sh, finetune/trainer.py) highlight ongoing efforts to enhance model training processes.
- Fixes to the web demo (web_demo_2.5.py) indicate active work on making the models more accessible for testing and demonstration purposes.

Overall, the team appears well-coordinated, with a clear focus on both improving model performance and ensuring ease of use for end users through comprehensive documentation and support for various deployment scenarios.
Recent GitHub issue activity for the OpenBMB/MiniCPM-V project has been notably high, with a significant number of issues created and updated within the last few days.
Several issues exhibit notable anomalies or complications. For instance, #121 highlights a significant performance discrepancy between two model versions, which could indicate underlying inefficiencies or bugs in the int4 model. Issue #120 reports conflicting dependencies in the requirements.txt file, which could hinder new users from setting up the project. Additionally, issue #116 reports a critical runtime error due to shape mismatches in tensors, which could be a blocker for users trying to run the model.
Themes among the issues include performance concerns (e.g., #121), dependency conflicts (e.g., #120), and runtime errors (e.g., #116). There are also multiple requests for documentation and technical reports (e.g., #122), indicating a demand for more comprehensive project documentation.
Issue #122: tech report or docs
Issue #121: int4 vs. bfloat16 inference time problem (urgent)
Issue #120: package versions have conflicting dependencies.
Issue #119: Is there a demo script in the style of the OpenAI API?
Issue #118: Can the model be trained for a specific scenario and then converted to ONNX?
Issue #115: An M1 Mac with 16 GB cannot run the model; how do I uninstall it?
Issue #112: update finetune readme
Issue #111: fix finetune
Issue #108: feat: update requirements
Issue #106: Support ollama deployment, thanks
Files changed in recent PRs:
- web_demo_streamlit-2_5.py (+98 lines)
- web_demo_streamlit.py (+99 lines)
- ds_config_zero2.json (+52 lines)
- finetune_ds.sh (+48 lines)
- finetune_minicpmv.py (+477 lines)
- minicpmv/model/configuration_minicpm.py (+216 lines)
- minicpmv/model/modeling_minicpm.py (+1454 lines)
- minicpmv/model/modeling_minicpmv.py (+422 lines)
- minicpmv/model/resampler.py (+164 lines)
- finetune/readme.md (+3 lines)
- finetune/finetune_ds.sh (+1 line)
- finetune/trainer.py (+1 line, -2 lines)
- requirements.txt (+3 lines)
- README.md (+8 lines, -8 lines)
- README_en.md (+8 lines, -9 lines)
- assets/airplane.jpeg (added)

Several recent PRs (#112, #111, #108) were created and closed within a day. This indicates a responsive maintenance process for minor fixes and updates.
PRs like #81 introduced substantial additions to the fine-tuning scripts using Huggingface Trainer. These changes are crucial for users looking to customize their models.
PRs such as #71 and #49 focused on improving documentation by adding descriptions and correcting typos. These enhancements are essential for maintaining clear communication with users.
The project shows active development with frequent updates and quick resolutions for minor issues. However, some open PRs like #61 and #36 have been pending for over a month, suggesting they may need more attention or resources for review. The recent focus on fine-tuning capabilities and documentation improvements reflects ongoing efforts to enhance usability and functionality.
chat.py:
- Imports torch, transformers, and PIL.
- init_omni_lmm: Initializes the OmniLMM model, tokenizer, and image processor; handles device setup and model loading.
- expand_question_into_multimodal: Converts questions into a multimodal format by embedding image tokens.
- wrap_question_for_omni_lmm: Prepares the input data for the model by tokenizing and formatting the conversation.
- OmniLMM12B, OmniLMM3B, MiniCPMV2_5: Wrapper classes for different model versions, encapsulating methods to handle chat interactions.
- OmniLMMChat: Main class that selects the appropriate model version based on the provided path.
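As an illustration of how these wrappers are meant to be used, here is a minimal sketch of a chat call; the model identifier and the exact input format (a base64-encoded image plus a JSON-serialized message list) are assumptions based on the descriptions above, not a verbatim copy of the repository's example:

```python
import base64
import json

from chat import OmniLMMChat  # selects the right wrapper class from the model path

# Illustrative model identifier; substitute a local checkpoint path if needed.
chat_model = OmniLMMChat('openbmb/MiniCPM-Llama3-V-2_5')

# Encode an image and pose a question as a chat-style message list.
with open('assets/airplane.jpeg', 'rb') as f:
    image_b64 = base64.b64encode(f.read()).decode('utf-8')

msgs = [{'role': 'user', 'content': 'What aircraft is shown in this picture?'}]
inputs = {'image': image_b64, 'question': json.dumps(msgs)}

answer = chat_model.chat(inputs)
print(answer)
```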
finetune/finetune.py:
- Imports torch, transformers, and custom modules.
- ModelArguments, DataArguments, TrainingArguments: Define configurations for model paths, data paths, and training parameters.
- rank0_print: Utility function to print messages only from rank 0 in distributed settings.
- make_supervised_data_module: Prepares datasets and collators for supervised fine-tuning.
- get_parameter_number: Calculates the number of trainable and total parameters in the model.
- train: Main function that handles the training logic, including parsing arguments, setting up the model, tokenizer, and data modules, and initiating training with a custom trainer (CPMTrainer).
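The argument handling described above follows the familiar Hugging Face dataclass pattern. The sketch below shows that pattern in simplified form; the field names, defaults, and the omitted model/data setup are placeholders, not the exact contents of finetune/finetune.py:

```python
from dataclasses import dataclass, field
from typing import Optional

from transformers import HfArgumentParser, TrainingArguments


@dataclass
class ModelArguments:
    # Path or hub id of the base checkpoint (illustrative default).
    model_name_or_path: Optional[str] = field(default="openbmb/MiniCPM-Llama3-V-2_5")


@dataclass
class DataArguments:
    # JSON file containing the supervised fine-tuning conversations.
    data_path: Optional[str] = field(default=None)


def get_parameter_number(model):
    """Count total and trainable parameters, as the report describes."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return {"total": total, "trainable": trainable}


def train():
    # Parse the three argument groups from the command line (or a launcher script).
    parser = HfArgumentParser((ModelArguments, DataArguments, TrainingArguments))
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()

    # ... load the tokenizer and model here, build the data module with
    # make_supervised_data_module, then run a Trainer subclass (CPMTrainer in the repo).
    print(model_args, data_args, training_args.output_dir)


if __name__ == "__main__":
    train()
```

finetune_ds.sh presumably wraps a distributed launch of this script using the ZeRO-2 settings in ds_config_zero2.json, though the exact command line is not shown in this report.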
web_demo_2.5.py:
- create_component: Utility to create Gradio components dynamically based on provided parameters.
- chat, upload_img, respond, regenerate_button_clicked: Functions that handle interactions within the Gradio interface (e.g., uploading images, responding to user queries).
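To make the component-factory pattern concrete, here is a minimal Gradio sketch; the function bodies and component names are illustrative stand-ins, not the actual code in web_demo_2.5.py:

```python
import gradio as gr


def create_component(kind, **kwargs):
    # Simplified dynamic factory: map a config "kind" onto a Gradio component class.
    registry = {"slider": gr.Slider, "textbox": gr.Textbox, "image": gr.Image}
    return registry[kind](**kwargs)


def respond(message, history):
    # Placeholder for the model call; the real demo forwards the uploaded image
    # and conversation history to MiniCPM-Llama3-V 2.5.
    history = history + [(message, "(model response goes here)")]
    return "", history


with gr.Blocks() as demo:
    image = create_component("image", label="Upload an image", type="pil")
    chatbot = gr.Chatbot()
    msg = create_component("textbox", label="Your question")
    msg.submit(respond, inputs=[msg, chatbot], outputs=[msg, chatbot])

if __name__ == "__main__":
    demo.launch()
```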
requirements.txt:
- Lists core dependencies (torch, transformers) as well as specialized ones (gradio, timm).
finetune/dataset.py:
- SupervisedDataset: Custom dataset class for supervised fine-tuning; implements the required __len__ and __getitem__ methods.
- Utility functions (data_collator, conversation_to_ids, etc.) preprocess data, convert conversations into token IDs, slice images, and so on.
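A schematic of the dataset-plus-collator pattern described above; the data layout and preprocessing are deliberately simplified (no image slicing or image-token insertion), so treat the field names as assumptions:

```python
from typing import Dict, List

import torch
from torch.utils.data import Dataset


class SupervisedDataset(Dataset):
    """Wraps a list of conversation records for supervised fine-tuning."""

    def __init__(self, raw_data: List[dict], tokenizer, max_length: int = 2048):
        self.raw_data = raw_data
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self) -> int:
        return len(self.raw_data)

    def __getitem__(self, idx: int) -> Dict[str, torch.Tensor]:
        # The repo's conversation_to_ids also inserts image tokens and slices
        # images; this sketch only tokenizes the text turns.
        text = "\n".join(turn["content"] for turn in self.raw_data[idx]["conversations"])
        ids = self.tokenizer(
            text, truncation=True, max_length=self.max_length, return_tensors="pt"
        )["input_ids"].squeeze(0)
        return {"input_ids": ids, "labels": ids.clone()}


def data_collator(batch: List[Dict[str, torch.Tensor]], pad_token_id: int = 0):
    """Pad variable-length examples into a batch; label padding uses -100 (ignored by the loss)."""
    input_ids = torch.nn.utils.rnn.pad_sequence(
        [b["input_ids"] for b in batch], batch_first=True, padding_value=pad_token_id
    )
    labels = torch.nn.utils.rnn.pad_sequence(
        [b["labels"] for b in batch], batch_first=True, padding_value=-100
    )
    return {"input_ids": input_ids, "labels": labels}
```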
omnilmm/model/omnilmm.py:
- Defines the model configuration class (OmniLMMConfig), vision modules (create_vision_module), the main model class (OmniLMMModel), and the causal LM class (OmniLMMForCausalLM).
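For orientation, the skeleton below shows how a configuration class and causal LM wrapper of this kind typically fit together in the transformers framework; everything beyond the class names listed above (fields, defaults, the placeholder submodules) is assumed for illustration only:

```python
import torch.nn as nn
from transformers import PretrainedConfig, PreTrainedModel


class OmniLMMConfig(PretrainedConfig):
    """Configuration skeleton; the real class defines many more fields."""
    model_type = "omnilmm"

    def __init__(self, hidden_size: int = 4096, vision_tower: str = "", **kwargs):
        self.hidden_size = hidden_size
        self.vision_tower = vision_tower  # illustrative field name
        super().__init__(**kwargs)


class OmniLMMForCausalLM(PreTrainedModel):
    """Skeleton of the causal LM wrapper: a vision module feeding a language model."""
    config_class = OmniLMMConfig

    def __init__(self, config: OmniLMMConfig):
        super().__init__(config)
        self.vision_module = nn.Identity()   # stands in for create_vision_module(...)
        self.language_model = nn.Identity()  # stands in for the underlying decoder LLM

    def forward(self, input_ids=None, images=None, **kwargs):
        # The real model projects image features into the LLM embedding space
        # (see resampler.py) before running the causal decoder.
        raise NotImplementedError("Sketch only; see omnilmm/model/omnilmm.py")
```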
The source code is well-structured, with a clear separation of concerns across files. Each file serves a distinct purpose, ranging from model interaction (chat.py), fine-tuning (finetune/finetune.py), web demo setup (web_demo_2.5.py), dependency management (requirements.txt), and dataset handling (finetune/dataset.py) to core model architecture (omnilmm/model/omnilmm.py). The code quality is high, with adequate error handling, clear documentation, and proper use of modern Python features such as data classes.