The Dispatch

OSS Report: OpenBMB/MiniCPM


MiniCPM Enhances Edge AI with Efficient LLMs, Despite Deployment Challenges

MiniCPM is an open-source project by OpenBMB that develops small-scale large language models (LLMs) for edge devices. Its flagship 2.7B-parameter model competes with much larger LLMs while enabling real-time inference on mobile phones.

Recent development has focused on expanding deployment options, enhancing fine-tuning capabilities, and improving quantization techniques. The project has seen consistent activity, with notable additions including QLoRA training, AutoAWQ quantization support, and integration with frameworks like LangChain and LLaMA Factory.

Recent Activity

Issues and PRs indicate a strong focus on deployment and performance optimization. Users have reported challenges deploying MiniCPM on various platforms, particularly mobile devices (e.g., #149, #104). The team has responded by adding support for frameworks like Ollama, FastLLM, and PowerInfer (PRs #145, #79, #166).

Recent development team activity:

  1. LDLINGLINGLING:

    • Added QLoRA training method (#176)
    • Implemented AutoAWQ quantization support (#157)
    • Added PowerInfer deployment example for MiniCPM-S-1B model (#166)
    • Fixed MLX implementation bugs (#162)
    • Added LLaMA Factory fine-tuning examples (#161)
  2. cyx2000:

    • Added MiniCPMV to Hugging Face demo (#111)
    • Implemented fine-tuning model settings for bf16 and fp16 (#106)
  3. zh-zheng:

    • Updated README and documentation (various commits)
    • Added support for MiniCPM 2.0 (mentioned in issues)
  4. ywfang:

    • Updated MLX-related documentation and requirements
    • Fixed evaluation result reporting
    • Added support for MiniCPM-MoE-8x2B model
  5. SwordFaith:

    • Added 128k context length evaluation
    • Fixed supervised fine-tuning dataset issues
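
The QLoRA method added in #176 pairs a frozen, low-bit-quantized base weight with small trainable low-rank adapters: the effective weight is W_q + (alpha/r) · A·B, and only A and B receive gradients. A dependency-free toy sketch of the forward path (illustrative only; a real implementation would use libraries such as PEFT and bitsandbytes rather than hand-rolled matrices):

```python
# Toy LoRA/QLoRA forward pass: y = x @ W_q + (alpha/r) * (x @ A) @ B.
# W_q stands in for the frozen (dequantized) 4-bit base weight; only the
# low-rank adapters A and B would be trained.

def matmul(a, b):
    """Naive matrix product for small illustrative matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_forward(x, W_q, A, B, alpha=16.0, r=2):
    base = matmul(x, W_q)                    # frozen base path
    update = matmul(matmul(x, A), B)         # low-rank trainable path
    s = alpha / r                            # standard LoRA scaling factor
    return [[base[0][j] + s * update[0][j] for j in range(len(base[0]))]]

x = [[1.0, 2.0, 3.0, 4.0]]                   # 1 x d_in activations
W = [[0.1] * 3 for _ in range(4)]            # d_in x d_out frozen base
A = [[0.01, 0.02] for _ in range(4)]         # d_in x r adapter
B = [[0.0] * 3 for _ in range(2)]            # r x d_out adapter, zero-init
y = lora_forward(x, W, A, B)
print(y)  # equals x @ W at initialization because B starts at zero
```

Initializing B to zero is the standard trick: training starts exactly from the base model's behavior, and the adapter's contribution grows from there.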

Of Note

  1. The project maintains competitive performance with much larger models while enabling edge deployment, demonstrating the potential for efficient LLMs.

  2. Recent development has heavily focused on quantization and deployment optimization, suggesting a strong commitment to edge AI applications.

  3. The addition of multi-modal (MiniCPM-V) and mixture-of-experts (MiniCPM-MoE-8x2B) variants indicates exploration of advanced model architectures within the small-scale LLM paradigm.

  4. While the project shows active development, the centralized review process (most PRs merged by LDLINGLINGLING) may impact long-term community engagement.

  5. Recurring deployment issues across platforms suggest a need for more robust cross-platform support and documentation.

Quantified Reports

Quantify Issues



Recent GitHub Issues Activity

Timespan   Opened   Closed   Comments   Labeled   Milestones
7 Days          4        0          3         0            1
30 Days        12        9         14         3            1
90 Days        39       47         61        13            1
All Time      148      131          -         -            -

Like all software activity quantification, these numbers are imperfect but sometimes useful. Comments, Labels, and Milestones refer to those issues opened in the timespan in question.

Quantify Commits



Quantified Commit Activity Over 30 Days

Developer        Branches   PRs     Commits   Files   Changes
LDLINGLINGLING          1   4/4/0         6       4        62

PRs: created by that dev and opened/merged/closed-unmerged during the period

Detailed Reports

Report On: Fetch issues



Here is a brief analysis of the recent GitHub issue activity for the MiniCPM project:

Recent Activity Analysis:

The project has seen steady issue activity over the past few months, with 17 open issues currently. Recent issues focus on topics like model deployment, inference errors, and requests for additional features or clarifications.

Some notable issues and themes:

  1. Deployment challenges: Several users have reported issues deploying MiniCPM on different platforms, especially mobile/edge devices. For example:

    • Issue #188 reports errors when trying to run multi-modal inference on MiniCPM-V.
    • Issue #149 describes problems deploying on Android, with the app crashing after initialization.
    • Issue #104 notes errors when trying to convert weights for deployment.

    These suggest ongoing work may be needed to improve cross-platform deployment stability.

  2. Inference and performance: A few issues relate to unexpected inference results or performance:

    • Issue #163 reports unexpectedly long inference times on a GPU.
    • Issue #143 describes poor code generation capabilities compared to reported benchmark results.

  3. Feature requests: There are several requests for additional features or model variants:

    • Issue #91 asks about releasing base model weights (pre-fine-tuning).
    • Issue #129 requests support for running the multi-modal version on llama.cpp.
    • Issue #60 suggests adding support for the fastllm inference framework.

  4. Documentation and examples: Some issues request more detailed documentation or examples:

    • Issue #66 asks for examples of loading fine-tuned LoRA weights.
    • Issue #4 requests a more detailed requirements.txt file.

Overall, the issues suggest active community engagement with the project, with users attempting deployments across various platforms. The maintainers appear responsive, often providing workarounds or explanations. However, some recurring deployment and inference challenges may warrant further investigation or documentation improvements.

Issue Details:

Most recently created:

  • #188: "[Bad Case]: Multi-modal MiniCPM-V inference error" (open, created 0 days ago)
  • #187: "[Bad Case]: Multi-modal MiniCPM-V 2.0 transformers inference error" (open, created 2 days ago)

Most recently updated:

  • #188 (open, updated 0 days ago)
  • #187 (open, updated 0 days ago)

These recent issues both relate to inference errors with the multi-modal MiniCPM-V model, suggesting this may be an area requiring attention from the development team.

Report On: Fetch pull requests



Overview

This report analyzes 31 closed pull requests for the OpenBMB/MiniCPM repository, which contains an open-source large language model designed for edge devices.

Summary of Pull Requests

#183: Added tutorial entry points for MiniCPM in README files (16 days ago)

#180: Added xtuner open source community link (18 days ago)

#177: Added LLaMA-Factory navigation to homepage (18 days ago)

#176: Added QLoRA training method (21 days ago)

#172: Added Langchain demo for multi-file RAG on GPUs with <6GB VRAM (31 days ago)

#170, #169: Added quick navigation, quantization, and LLaMA-Factory content to README (32 days ago)

#166: Added PowerInfer deployment example for MiniCPM-S-1B model (39 days ago)

#162: Fixed two bugs in MLX code (50 days ago)

#161: Added LLaMA-Factory fine-tuning examples (50 days ago)

#157: Added AutoAWQ support for MiniCPM (52 days ago)

#156: Fixed user token issues for different model sizes (53 days ago)

#145: Added Ollama support for MiniCPM-1B (53 days ago)

#122: Added OpenAI API support (56 days ago)

#111: Added MiniCPMV support in Hugging Face demo (43 days ago)

#110: Added MLX inference for Mac (127 days ago)

#106: Added bf16 and fp16 settings for fine-tuning (114 days ago)

#79: Added FastLLM support (168 days ago)

Earlier PRs (>170 days ago) mainly involved documentation updates, bug fixes, and minor feature additions.
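
Among these, #106's bf16/fp16 switch is worth unpacking: fp16 spends 10 bits on mantissa but only 5 on exponent, while bf16 keeps fp32's 8-bit exponent with a 7-bit mantissa, trading precision for fp32-like range (which is why bf16 is generally safer for training). A small Python illustration of bf16's truncation behavior (not project code):

```python
import struct

def to_bf16_bits(x: float) -> float:
    """Emulate fp32 -> bfloat16 -> fp32: keep the top 16 bits of the
    IEEE-754 float32 pattern, with round-to-nearest-even (NaN handling
    omitted for brevity).
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    rounded = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    (y,) = struct.unpack("<f", struct.pack("<I", rounded))
    return y

print(to_bf16_bits(3.14159))  # 3.140625: only ~3 significant digits survive
print(to_bf16_bits(1e38))     # still finite; fp16 would overflow past 65504
```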

Analysis of Pull Requests

The pull requests for the MiniCPM project demonstrate a consistent focus on improving accessibility, performance, and deployment options for the model. Several key themes emerge from this analysis:

  1. Expanding Deployment Options: Many PRs focused on adding support for various deployment frameworks and platforms. This includes Ollama (#145), FastLLM (#79), MLX for Mac (#110), and PowerInfer (#166). These additions significantly broaden the model's usability across different environments and hardware.

  2. Enhancing Fine-tuning Capabilities: PRs #176 and #161 introduced new fine-tuning methods like QLoRA and LLaMA-Factory integration. This demonstrates a commitment to improving the model's adaptability for specific tasks and domains.

  3. Quantization and Efficiency: Several PRs (#157, #169) addressed quantization techniques like AutoAWQ and bitsandbytes (bnb). This aligns with MiniCPM's goal of efficient deployment on edge devices.

  4. Documentation and Accessibility: Many PRs (#183, #180, #177) focused on improving documentation, adding tutorials, and enhancing navigation in the README files. This indicates a strong emphasis on making the project more accessible to users and contributors.

  5. Bug Fixes and Optimizations: PRs like #162 and #156 addressed specific bugs and optimized the code for different model sizes, showing ongoing maintenance and refinement of the codebase.

  6. Expanding Ecosystem Integration: The addition of OpenAI API support (#122) and Langchain demo (#172) shows efforts to integrate MiniCPM with popular AI development ecosystems.

  7. Multi-modal and Specialized Versions: PR #111 added support for MiniCPMV, indicating development of specialized versions of the model.
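
The quantization theme (AutoAWQ in #157, bnb in #169) reduces, at its core, to group-wise low-bit storage: weights are split into small groups, each kept as 4-bit integers plus one floating-point scale. A toy round-trip sketch (illustrative only; real AWQ additionally rescales salient channels using activation statistics before quantizing):

```python
def quantize_group(ws, bits=4):
    """Symmetric per-group quantization: ints in [-8, 7] plus one scale."""
    qmax = 2 ** (bits - 1) - 1                 # 7 for 4-bit
    scale = max(abs(w) for w in ws) / qmax
    scale = scale or 1.0                       # avoid /0 for an all-zero group
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in ws]
    return q, scale

def dequantize_group(q, scale):
    return [v * scale for v in q]

weights = [0.12, -0.7, 0.03, 0.55, -0.2, 0.41, 0.9, -0.33]
q, s = quantize_group(weights)
restored = dequantize_group(q, s)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q)        # eight small integers instead of eight floats
print(max_err)  # reconstruction error is bounded by scale / 2
```

Smaller groups mean tighter scales (less error) but more per-group overhead, which is the knob that libraries like AutoAWQ expose as group size.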

The frequency and nature of these pull requests suggest an active development cycle with contributions from multiple community members. The project appears to be evolving rapidly, with a clear focus on making the model more versatile, efficient, and accessible to a wide range of users and deployment scenarios.

However, it's worth noting that most PRs are being merged by a single user (LDLINGLINGLING), which might indicate a centralized review process. Encouraging more diverse reviewer participation could potentially benefit the project's long-term sustainability and community engagement.

Overall, the pull requests reflect MiniCPM's positioning as a competitive, efficient large language model suitable for edge devices, with ongoing efforts to expand its capabilities and ease of use across various platforms and use cases.
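
The Langchain demo (#172) is also a reminder that the core of a RAG pipeline is a small retrieval step: embed document chunks, embed the query, and rank by cosine similarity before handing the top hits to the model. A framework-free sketch with made-up 3-d "embeddings" standing in for a real embedding model's output:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def top_k(query_vec, chunks, k=2):
    """chunks: list of (text, embedding); return the k most similar texts."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy vectors standing in for a real embedding model's output.
docs = [
    ("MiniCPM runs on mobile phones", [0.9, 0.1, 0.0]),
    ("QLoRA fine-tuning guide",       [0.1, 0.9, 0.1]),
    ("AWQ quantization notes",        [0.0, 0.2, 0.9]),
]
hits = top_k([0.8, 0.2, 0.1], docs, k=1)
print(hits)  # the mobile-deployment chunk ranks first
```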

Report On: Fetch commits



Development Team and Recent Activity

Team Members and Recent Activity

  1. LDLINGLINGLING:

    • Most active contributor in the past 30 days
    • Added tutorials and navigation links for MiniCPM
    • Implemented QLoRA training method
    • Added support for langchain with local MiniCPM
    • Implemented BNB quantization
    • Added PowerInfer deployment example for MiniCPM-S-1B model
    • Fixed bugs in MLX implementation
    • Added LLaMA Factory fine-tuning examples
    • Implemented AutoAWQ quantization support
  2. cyx2000:

    • Added MiniCPMV to Hugging Face demo
    • Implemented fine-tuning model settings for bf16 and fp16
  3. Zhi Zheng (zh-zheng):

    • Updated README and documentation
    • Added support for MiniCPM 2.0
  4. ywfang:

    • Updated MLX-related documentation and requirements
    • Fixed evaluation result reporting
    • Added support for MiniCPM-MoE-8x2B model
  5. Xiang Long (SwordFaith):

    • Added 128k context length evaluation
    • Fixed supervised fine-tuning dataset issues
  6. zRzRzRzRzRzRzR:

    • Implemented OpenAI API support
    • Added MLX inference support
  7. DingDing (ShengdingHu):

    • Updated README and documentation
    • Added llama.cpp support

Patterns and Themes

  1. Active development: The repository shows consistent activity over the past 30 days, with frequent updates and improvements.

  2. Focus on efficiency and deployment: Recent work has centered on quantization, edge deployment, and support for efficient inference frameworks like MLX and PowerInfer.

  3. Expanding model variants: The team has been adding support for new MiniCPM variants, including MoE and 128k context length versions.

  4. Improving documentation: There's a consistent effort to keep documentation up-to-date in both English and Chinese.

  5. Integration with popular frameworks: Recent work has focused on integrating MiniCPM with frameworks like LangChain, LLaMA Factory, and OpenAI API.

  6. Performance optimization: The team is actively working on quantization techniques (QLoRA, bitsandbytes, AutoAWQ) to improve model efficiency.

  7. Community engagement: The addition of tutorials, examples, and improved navigation suggests an effort to make the project more accessible to users and potential contributors.