
GitHub Repo Analysis: OpenBMB/MiniCPM-V


Executive Summary

The MiniCPM-V project, managed by OpenBMB, develops multimodal LLMs (MLLMs) for vision-language understanding: models that take image and text inputs and generate high-quality text outputs. The flagship MiniCPM-Llama3-V 2.5 model has 8 billion parameters and outperforms proprietary competitors such as GPT-4V-1106 and Gemini Pro on a range of benchmarks. The project emphasizes performance and efficiency, supports over 30 languages, and targets deployment on diverse platforms, including end-side devices.

Recent Activity

Team Members and Contributions

Recent Commits

  1. qianyu chen - Updated q_lora code for memory optimization - 1 day ago
  2. Chao Jia - Updated README.md files - 2 days ago
  3. Tianyu Yu - Added new assets for WeChat integration - 3 days ago
  4. Cui Junbo - Merged PR for multi-GPU inference update - 4 days ago
  5. Hongji Zhu - Added system compatibility warnings for Mac users - 5 days ago

Risks

Of Note

Quantified Commit Activity Over 14 Days

| Developer | Branches | PRs | Commits | Files | Changes |
|---|---|---|---|---|---|
| Haoyu Li | 1 | 1/1/0 | 1 | 49 | 5610 |
| Tianyu Yu | 1 | 0/0/0 | 72 | 10 | 2048 |
| qianyu chen | 1 | 9/6/3 | 6 | 8 | 550 |
| Cui Junbo | 1 | 0/0/0 | 14 | 9 | 248 |
| JamePeng | 1 | 0/1/0 | 1 | 2 | 207 |
| Boke Syo | 1 | 1/1/0 | 1 | 1 | 159 |
| Hongji Zhu | 1 | 0/0/0 | 16 | 7 | 68 |
| Chao Jia | 1 | 0/0/0 | 3 | 3 | 30 |
| zhangao | 1 | 0/0/0 | 3 | 1 | 28 |
| YuzaChongyi | 1 | 1/1/0 | 3 | 5 | 23 |
| imarochkin | 1 | 0/0/0 | 1 | 3 | 9 |
| tc-mb | 1 | 0/0/0 | 2 | 2 | 4 |
| ByeongkiJeong | 1 | 2/2/0 | 2 | 1 | 4 |
| EC2 Default User | 1 | 0/0/0 | 1 | 1 | 3 |
| BU Fanchen 卜凡辰 | 1 | 1/1/0 | 1 | 1 | 2 |
| 王鹤男 (whn09) | 0 | 1/1/0 | 0 | 0 | 0 |
| wanesoft | 0 | 1/1/0 | 0 | 0 | 0 |

PRs: counts of pull requests created by that developer that were opened/merged/closed-unmerged during the period


Detailed Reports

Report On: Fetch commits



Project Overview

The MiniCPM-V project, managed by the OpenBMB organization, is a series of end-side multimodal LLMs (MLLMs) designed for vision-language understanding. The models take image and text inputs and produce high-quality text outputs. Since its inception in February 2024, the project has released several versions of the model, with a focus on strong performance and efficient deployment. The most notable models in the series are MiniCPM-Llama3-V 2.5 and MiniCPM-V 2.0. MiniCPM-Llama3-V 2.5, with 8B parameters, surpasses proprietary models such as GPT-4V-1106 and Gemini Pro on many benchmarks. It supports over 30 languages and, thanks to quantization and CPU/NPU optimizations, can be deployed efficiently on end-side devices.
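
As a hedged illustration of the image-plus-text interface described above, the sketch below follows the pattern of the repository's README quick-start; exact argument names (e.g., sampling, temperature) may differ between model revisions.

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

# Load model and tokenizer; trust_remote_code is needed because the model
# class ships with the checkpoint rather than with transformers itself.
model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-Llama3-V-2_5", trust_remote_code=True,
    torch_dtype=torch.float16,
).to("cuda").eval()
tokenizer = AutoTokenizer.from_pretrained(
    "openbmb/MiniCPM-Llama3-V-2_5", trust_remote_code=True
)

image = Image.open("example.jpg").convert("RGB")
msgs = [{"role": "user", "content": "Describe this image."}]

# chat() is the checkpoint's custom entry point: one image, a message list,
# and the tokenizer in; generated text out.
answer = model.chat(image=image, msgs=msgs, tokenizer=tokenizer,
                    sampling=True, temperature=0.7)
print(answer)
```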

Team Members and Recent Activities

qianyu chen (qyc-98)

  • Recent Commits:
    • Updated the q_lora code to reduce memory cost with ZeRO-3 and offloading.
    • Updated the ZeRO-3 code and out-of-memory (OOM) FAQs.
    • Updated the LoRA finetuning code.
    • Updated model_minicpmv.py for compatibility with the latest release.
  • Files Worked On: Various scripts and README files within the finetune directory.
  • Collaborations: Collaborated on pull requests updating the finetuning scripts; a sketch of the ZeRO-3 offloading setup these commits concern follows below.
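
The following is a minimal sketch, not the repository's actual finetune script: it shows how LoRA finetuning is typically wired to DeepSpeed ZeRO-3 with CPU offloading using Hugging Face peft and transformers. The rank, alpha, and target_modules values are illustrative assumptions.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModel, Trainer, TrainingArguments

model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-Llama3-V-2_5", trust_remote_code=True
)

# LoRA trains small low-rank adapter matrices instead of all 8B parameters.
lora = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # illustrative
)
model = get_peft_model(model, lora)

# ZeRO-3 shards parameters, gradients, and optimizer state across GPUs;
# offloading spills them to CPU RAM, trading throughput for memory headroom.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu"},
        "offload_optimizer": {"device": "cpu"},
    },
}

args = TrainingArguments(
    output_dir="output",
    per_device_train_batch_size=1,
    gradient_checkpointing=True,
    bf16=True,
    deepspeed=ds_config,
)
# Trainer(model=model, args=args, train_dataset=...) then runs as usual.
```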

Chao Jia (jctime)

  • Recent Commits:
    • Updated README.md, README_zh.md, and README_en.md.
  • Files Worked On: README files in multiple languages.
  • Collaborations: Focused solely on documentation updates; no direct collaborations noted.

Tianyu Yu (yiranyyu)

  • Recent Commits:
    • Multiple updates to README files across different languages.
    • Added new assets and documentation related to WeChat integration.
  • Files Worked On: Primarily README files and some asset additions.
  • Collaborations: Frequent solo commits indicate active, ongoing ownership of the project's documentation.

Cui Junbo (Cuiunbo)

  • Recent Commits:
    • Merged pull requests related to updating inference on multiple GPUs.
    • Updated news sections in README files.
  • Files Worked On: Documentation related to new features and functionalities.
  • Collaborations: Engaged in merging pull requests from other contributors, indicating a role in overseeing contributions.

Hongji Zhu (iceflame89)

  • Recent Commits:
    • Added warnings when inferring with mps and bf16 on Mac.
    • Updated various README files across different languages.
  • Files Worked On: Primarily involved in updating documentation and adding specific warnings for Mac users.
  • Collaborations: Appears to work independently on system-compatibility updates; a sketch of this kind of warning follows below.
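
Below is a hedged sketch of the kind of compatibility check described above; the exact condition and message in the repository's code may differ, and warn_if_unsupported is a hypothetical helper name.

```python
import warnings

import torch

def warn_if_unsupported(device: str, dtype: torch.dtype) -> None:
    """Warn when inference is attempted with bf16 on Apple's mps backend."""
    if device == "mps" and dtype == torch.bfloat16:
        warnings.warn(
            "bfloat16 is not reliably supported on the mps backend; "
            "consider float16 or float32 instead."
        )

warn_if_unsupported("mps", torch.bfloat16)  # emits the warning on Mac setups
```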

ByeongkiJeong

  • Recent Commits:
    • Updated documentation for inference on multiple GPUs by fixing typos.
  • Files Worked On: Documentation within the docs directory.
  • Collaborations: Contributions seem focused on minor but crucial corrections.

Boke Syo (bokesyo)

  • Recent Commits:
    • Created a document for multiple GPU inference.
  • Files Worked On: New documentation under the docs directory.
  • Collaborations: Initiated documentation that was later merged by another team member, suggesting collaboration.

Additional Contributors

Other contributors, including YuzaChongyi, JamePeng, lihytotoro, EC2 Default User, waxnkw, imarochkin, bokveizen, tc-mb, whn09, and wanesoft, have made targeted updates to scripts or documentation, most often around the fine-tuning workflow or the project's overall accessibility.

Patterns and Conclusions

The development team is highly active, with a clear focus on both enhancing the project's core functionality (such as the fine-tuning scripts) and maintaining robust documentation in multiple languages. There is a strong emphasis on running efficiently across platforms, including end-side devices, which is critical for deployment scenarios. Collaboration patterns suggest a mix of independent work and periodic integration, with team members reviewing and merging each other's contributions to keep the project's outputs consistent and high quality.

Report On: Fetch issues



Recent Activity Analysis

Recent activity on the MiniCPM-V GitHub repository indicates a high level of engagement, with 24 open issues, many created within the last few days. The issues range from questions about specific functionality and bug reports to enhancement requests and discussions of various aspects of the project.

Notable Issues

  • Issue #222 and #221 both address specific use cases and functionalities related to model testing and training without images, indicating a focus on enhancing the model's versatility and usability in different scenarios.
  • Issue #220 highlights a critical problem where the model stops responding after finetuning, suggesting potential issues in the finetuning process or model configuration that could significantly impact user experience.
  • Issue #217 discusses the creation of a user group for better discussion and feedback, indicating an active community engagement.
  • Issue #216 deals with a finetuned model failing to perform as expected, which could point to issues in either the training data or the model's learning capability.
  • Issues #214, #213, and others raise technical questions about specific functionalities and configurations, suggesting a user base that is deeply engaged with understanding and utilizing the project's capabilities fully.

Common Themes

A recurring theme in these issues is the challenge users face with finetuning models (#220, #216), indicating potential areas for improvement in documentation or functionality for easier model customization. Additionally, there is significant interest in community support and development (#217), highlighting the project's communal value.

Issue Details

Most Recently Created Issues

  • #222: Lora微调后如何进行测试呢 (How to run testing after LoRA finetuning?) - Created 0 days ago
  • #221: 如何训练单模态数据,没有图片的 (How to train on single-modality data without images?) - Created 0 days ago
  • #220: [BUG] after funetine, model inference is None / empty - Created 0 days ago
  • #219: 可以指定模型位置么 (Can the model location be specified?) - Created 0 days ago
  • #217: 建一个群,方便用户们讨论和反馈 (Set up a group so users can discuss and give feedback) - Created 1 day ago

Most Recently Updated Issues

  • #216: 模型微调,获取不了检测的能力。 (After finetuning, the model does not acquire detection capability.) - Last updated 0 days ago
  • #215: 我的M3芯片本地运行MiniCPM-Llama3-V-2_5-int4得到了报错Using bitsandbytes 8-bit quantization requires Accelerate (Running MiniCPM-Llama3-V-2_5-int4 locally on an M3 chip raises the error "Using bitsandbytes 8-bit quantization requires Accelerate") - Last updated 0 days ago
  • #214: 关于VLM计数推理幻觉的询问 (Question about counting hallucinations in VLM inference) - Last updated 1 day ago
  • #213: Questions about finetuning - Last updated 1 day ago
  • #212: lora微调grad_norm为nan,loss为0[BUG] ([BUG] grad_norm is NaN and loss is 0 during LoRA finetuning) - Last updated 1 day ago

These issues reflect active engagement from both maintainers and community members in addressing recent concerns and queries.

Report On: Fetch Files For Assessment



Analysis of Source Code Files

File: finetune/finetune.py (https://github.com/OpenBMB/MiniCPM-V/blob/main/finetune/finetune.py)

Structure and Quality:

  • Imports and Dependencies: The script imports the libraries needed for deep-learning training, including PyTorch, Transformers, and custom modules such as dataset and trainer. The use of dataclasses for argument management is appropriate; a sketch of this pattern follows this list.
  • Data Classes: Defined for managing model arguments, data arguments, training configuration, and LoRA (Low-Rank Adaptation) settings. This is a clean way to handle configuration.
  • Main Functionality:
    • The script defines a training function that sets up the model, tokenizer, data handling, and trainer. It supports distributed training using DeepSpeed.
    • LoRA-specific configuration is handled appropriately, allowing selective fine-tuning of model parameters, which is crucial for large models.
    • Functions like safe_save_model_for_hf_trainer and make_supervised_data_module show good modularization, although the script would benefit from more inline comments explaining complex sections.
  • Error Handling: There is minimal explicit error handling; adding it would make the script more robust against common failures such as missing files or invalid configuration.
  • Performance Optimizations: Uses techniques like gradient checkpointing and parameter offloading (ZeRO-3) to manage memory efficiently during training.
  • Code Quality: The code is generally well structured, but some functions are quite long and could be refactored into smaller units.
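
As a hedged illustration of the dataclass pattern called out above, the sketch below follows the style of transformers' HfArgumentParser; the field names are hypothetical and are not copied from finetune.py.

```python
from dataclasses import dataclass, field
from transformers import HfArgumentParser, TrainingArguments

@dataclass
class ModelArguments:
    model_name_or_path: str = field(default="openbmb/MiniCPM-Llama3-V-2_5")

@dataclass
class LoraArguments:
    use_lora: bool = field(default=False)
    lora_r: int = field(default=64)
    lora_alpha: int = field(default=16)

# Each dataclass becomes a typed, self-documenting group of CLI flags, e.g.:
#   python finetune_sketch.py --output_dir out --use_lora --lora_r 32
parser = HfArgumentParser((ModelArguments, TrainingArguments, LoraArguments))
model_args, training_args, lora_args = parser.parse_args_into_dataclasses()
```
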
File: finetune/readme.md (https://github.com/OpenBMB/MiniCPM-V/blob/main/finetune/readme.md)

Structure and Quality:

  • Content: Comprehensive documentation on how to fine-tune the MiniCPM models, covering data preparation, full-parameter fine-tuning, LoRA fine-tuning, memory usage statistics, FAQs, and troubleshooting tips.
  • Clarity and Detail: Well written, with clear instructions and detailed explanations; it makes effective use of collapsible <details> sections to organize content neatly.
  • Relevance: All information is relevant for users intending to fine-tune the MiniCPM models, and it addresses the common scenarios and issues they are likely to face.

File: docs/inference_on_multiple_gpus.md (https://github.com/OpenBMB/MiniCPM-V/blob/main/docs/inference_on_multiple_gpus.md)

Structure and Quality:

  • Content: Describes how to run inference across multiple GPUs when a model does not fit into the memory of a single GPU.
  • Technical Depth: A practical guide to distributing model layers across GPUs with the accelerate library, including directly applicable code snippets; see the sketch following this list.
  • Clarity: Clear and well organized; technical terms and steps are explained thoroughly, making the guide accessible even to users unfamiliar with multi-GPU setups.
  • Utility: Highly useful whenever model inference requires more memory than a single GPU provides.
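
The following is a hedged sketch of the general pattern the document describes. The repository's guide builds an explicit per-layer device_map with accelerate utilities; device_map="auto" below is the simpler generic variant of the same idea.

```python
import torch
from transformers import AutoModel

# With device_map="auto", accelerate splits the model's layers across all
# visible GPUs (spilling to CPU if needed) according to available memory.
model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-Llama3-V-2_5",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",  # requires the accelerate package
).eval()
```
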
File: web_demo_streamlit-2_5.py (https://github.com/OpenBMB/MiniCPM-V/blob/main/web_demo_streamlit-2_5.py)

Structure and Quality:

  • Functionality: Implements a web demo for interacting with the MiniCPM-Llama3-V 2.5 model using Streamlit, supporting both text and image inputs.
  • User Interface: Uses Streamlit widgets effectively to build an interactive interface, including chat history and model parameters adjustable through sliders.
  • Code Quality: The code is straightforward and uses Streamlit's caching to keep load times down, but it lacks the error handling (e.g., for invalid image files) that a production deployment would need.
  • Documentation: Inline comments are minimal; adding more would improve readability and maintainability.

Summary

The assessed files are well crafted, with their functionality clearly implemented. Error handling and code documentation are the main areas where improvements would enhance robustness and maintainability. The use of modern Python features such as data classes for configuration management is commendable.