LLaVA-NeXT, an open-source project aimed at advancing multimodal AI models for image, video, and text understanding, is currently grappling with significant technical challenges that threaten its reliability and usability.
The repository has been active with 123 open issues, highlighting critical concerns such as reproducibility failures (#178) and size mismatch errors in model files (#177). These issues suggest inconsistencies in model documentation or configuration that could undermine user trust. Additionally, the community's interest in forming real-time support groups (#179) indicates a growing user base seeking structured guidance.
Li Bo (Luodian)
`LLaVA_OneVision_Tutorials.ipynb` for tutorial accuracy.
ChunyuanLI
Kaichen Zhang (kcz358)
Yuanhan Zhang (ZhangYuanhan-AI)
Renrui Zhang (ZrrSkywalker)
Aryeh Hillman (abhillman)
Timespan | Opened | Closed | Comments | Labeled | Milestones |
---|---|---|---|---|---|
7 Days | 21 | 7 | 13 | 21 | 1 |
30 Days | 63 | 14 | 99 | 62 | 1 |
90 Days | 130 | 31 | 268 | 128 | 1 |
All Time | 161 | 38 | - | - | - |
Like all software activity quantification, these numbers are imperfect but sometimes useful. Comments, Labels, and Milestones refer to those issues opened in the timespan in question.
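The relationship between these counts can be checked with a little arithmetic. The sketch below copies the Opened and Closed columns from the table and computes, for each timespan, the net backlog growth and the share of opened issues that were closed. The function name `summarize` is illustrative, and the ratio assumes that the Closed column counts closures during the period, which may include issues opened earlier.

```python
# Issue counts copied from the table above: (opened, closed) per timespan.
ISSUE_COUNTS = {
    "7 Days": (21, 7),
    "30 Days": (63, 14),
    "90 Days": (130, 31),
    "All Time": (161, 38),
}


def summarize(counts):
    """Return {timespan: (net_backlog_growth, close_ratio)}.

    close_ratio is closed / opened, rounded to two decimals. It is only a
    rough indicator, since closed issues may have been opened earlier.
    """
    summary = {}
    for span, (opened, closed) in counts.items():
        net = opened - closed
        ratio = closed / opened if opened else 0.0
        summary[span] = (net, round(ratio, 2))
    return summary


if __name__ == "__main__":
    for span, (net, ratio) in summarize(ISSUE_COUNTS).items():
        print(f"{span:>8}: net +{net} open, {ratio:.0%} of opened issues closed")
```

The All Time row yields a net of 123, which matches the open-issue count quoted in this report.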
Developer | Branches | PRs | Commits | Files | Changes |
---|---|---|---|---|---|
Li Bo | 1 | 2/2/0 | 32 | 84 | 7655 |
Yuanhan Zhang | 2 | 0/0/0 | 9 | 11 | 460 |
Kaichen Zhang - NTU | 1 | 1/1/0 | 7 | 6 | 124 |
ChunyuanLI | 2 | 3/2/0 | 3 | 1 | 29 |
Aryeh Hillman (abhillman) | 1 | 1/1/0 | 1 | 1 | 3 |
Renrui Zhang | 1 | 0/0/0 | 1 | 1 | 2 |
Xiaodong Wang (Wang-Xiaodong1899) | 0 | 1/0/0 | 0 | 0 | 0 |
PRs: created by that dev and opened/merged/closed-unmerged during the period
The LLaVA-NeXT GitHub repository has seen significant activity recently, with 123 open issues and a notable influx of new discussions surrounding model performance and usage. Several issues highlight critical problems, such as failures to reproduce results from the original paper and size mismatch errors in model files. Themes of community engagement are evident, particularly in requests for clarification on model configurations and training data.
A few issues stand out due to their implications for the project's reliability and usability. For instance, the frequent reports of size mismatches and installation errors suggest potential inconsistencies in the model files or documentation. Additionally, the emergence of community-driven discussions about creating chat groups for real-time support indicates a growing user base that may require more structured guidance.
#179: Community Chatting Group
#178: Failure to reproduce the paper results
#177: Size Mismatch Issue in ‘mm_projector.bin’ for ‘llava-onevision-qwen2-0.5b-ov’ Model
#175: Resources required for training with 72b language models
#174: Cauldron dataset
#172: Llava-Next-OV FPS
#171: The effectiveness of Stage-1.5
#170: Update of the cli.py script
#169: Batch inference for LLaVA One Vision
#168: UReader data mismatch with images
The recent activity on LLaVA-NeXT's GitHub repository indicates a vibrant community grappling with various technical challenges related to model performance and configuration. The issues raised reflect both critical bugs that could hinder usability and broader inquiries into the project's architecture and data handling practices. Addressing these concerns promptly will be essential for maintaining user trust and project momentum.
The LLaVA-NeXT project currently has 9 open pull requests and 10 closed pull requests, reflecting ongoing development and community engagement in enhancing its multimodal capabilities. The pull requests cover a range of updates, bug fixes, and documentation improvements.
PR #160: Update README.md
Created 6 days ago by Chunyuan LI. This PR adds 12 lines to the README file, improving documentation clarity.
PR #136: fix prepare_inputs_labels_for_multimodal in llava_arch
Created 11 days ago by Xiaodong Wang. This PR addresses a bug that caused repeated additions of `image_feature` when `image_idx` was not in `video_idx_in_batch`.
PR #84: Fix prepare inputs labels for multimodal
Created 56 days ago by Khai Mai. This PR adds assertions to ensure the number of images matches the number of image tokens and fixes a bug related to handling zero images in input.
PR #75: Update LLaVA-NeXT.md - Typo (missing letter)
Created 58 days ago by Tayyib Chohan. A minor typo fix in documentation.
PR #73: Make some ad-hoc changes to use the interleave model
Created 58 days ago by Haruki Sakajo. This PR includes changes to utilize the interleave model, with minor code adjustments.
PR #65: Features update
Created 64 days ago by Socialstranger. This PR introduces several new features, including new HTML files.
PR #40: Samples
Created 89 days ago by Alastair D'Silva. This PR adds sample scripts and a Gradio UI, along with several bug fixes.
PR #34: fix: Conversation.copy()
Created 93 days ago by Yuki Imajuku. This PR improves the `copy()` function but is noted as potentially redundant.
PR #23: Fixed Prompt formatting in conversation.py
Created 98 days ago by KT313. This PR corrects prompt formatting issues that led to duplicate tokens.
PR #180: Update LLaVA OneVision model to lmms-lab/llava-onevision-qwen2-7b-ov
Merged recently, this PR updates the OneVision model documentation.
PR #163: Update README.md
Merged recently, this PR includes minor updates to the README file.
PR #161: Update README.md
Merged recently, this PR also contains updates to the README file with more substantial changes than PR #163.
PR #152: Provide the correct video processing logic with decord
Merged recently, this PR enhances video processing logic in tutorials.
PR #134: fix imports of missing and deprecated qwen-moe
Merged recently, this PR resolves import issues related to deprecated models.
PR #112: Remove Redundant `sentencepiece` Dep
Merged recently, this PR removes unnecessary dependencies from the project configuration.
PR #67: add multi-image inference
Merged recently, this significant update introduces multi-image inference capabilities.
PR #29 & PR #27: fix class not found error
Both were closed without merging due to being deemed unnecessary after discussion among contributors.
PR #1: Update README.md
Closed as trivial; the contributor was flagged as seeking attention through minor edits across various repositories.
The current state of pull requests for the LLaVA-NeXT project indicates an active development environment with a strong focus on both functionality and documentation. The open pull requests largely revolve around fixing bugs related to multimodal input handling (e.g., PRs #136 and #84), which suggests an ongoing effort to refine the model's capability to process diverse data types effectively. The emphasis on ensuring that inputs are correctly matched with their corresponding features highlights a commitment to robust functionality within the multimodal framework.
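The kind of check described for PR #84 (the number of images must match the number of image tokens, including the zero-image case) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the pull request: the function names are hypothetical, inputs are plain Python lists rather than tensors, and the placeholder id `-200` is an assumed sentinel value.

```python
# Assumed sentinel id marking an image placeholder in the token sequence.
# The real value used by the project is not stated in this report.
IMAGE_TOKEN_INDEX = -200


def count_image_tokens(input_ids, image_token_index=IMAGE_TOKEN_INDEX):
    """Count image placeholder tokens in one tokenized sample."""
    return sum(1 for tok in input_ids if tok == image_token_index)


def check_images_match_tokens(batch_input_ids, batch_images):
    """Raise ValueError if any sample's image count differs from its
    number of image placeholder tokens. Samples with zero images and
    zero placeholders are valid (text-only input).
    """
    if len(batch_input_ids) != len(batch_images):
        raise ValueError("batch sizes differ between input_ids and images")
    for i, (ids, images) in enumerate(zip(batch_input_ids, batch_images)):
        n_tokens = count_image_tokens(ids)
        n_images = len(images)
        if n_tokens != n_images:
            raise ValueError(
                f"sample {i}: {n_images} image(s) but {n_tokens} image token(s)"
            )


if __name__ == "__main__":
    # One sample with a single image, one text-only sample: both pass.
    check_images_match_tokens([[1, -200, 2], [3, 4]], [["a.png"], []])
    print("ok")
```

Failing early with a per-sample message makes a mismatch visible at the input boundary instead of surfacing later as a shape error inside the model.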
Documentation improvements are also prevalent, as seen in multiple recent pull requests (#160, #163, and others) aimed at enhancing clarity and usability for end-users. Such efforts are crucial for community engagement, especially given that LLaVA-NeXT is an open-source project that relies on contributions from users who may not be deeply familiar with its architecture or intended use cases.
Notably, there is a mix of minor corrections (like typos) alongside substantial feature additions (e.g., multi-image inference in PR #67). This blend indicates a healthy balance between maintaining existing code quality and pushing forward with innovative features that could enhance user experience and application versatility.
However, there are some anomalies worth noting. For instance, two pull requests (#29 and #27) were closed without merging after discussion indicated they were unnecessary or redundant. This suggests potential communication gaps or misunderstandings within the contributor community regarding which changes are essential and which are trivial or already addressed elsewhere. Additionally, the presence of a contributor who has been flagged for making trivial edits across multiple repositories (PR #1) raises concerns about genuine contributions versus opportunistic behavior aimed at gaining visibility within popular projects.
The four oldest open pull requests (#65, #40, #34, and #23) have been waiting between 64 and 98 days, which may indicate a review bottleneck or resource constraints within the development team. Because the other five open pull requests were all created within the last two months, maintainers may benefit from prioritizing reviews and merges more systematically to keep momentum going and encourage further contributions from the community.
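The ages listed for the open pull requests in this report can be tallied to see which ones have been waiting longest. The sketch below is illustrative: the 60-day threshold and the name `stale_prs` are assumptions, not a policy of the project.

```python
# Ages in days of the nine open pull requests, as listed in this report.
OPEN_PR_AGE_DAYS = {
    160: 6, 136: 11, 84: 56, 75: 58, 73: 58,
    65: 64, 40: 89, 34: 93, 23: 98,
}


def stale_prs(ages, threshold_days=60):
    """Return PR numbers older than threshold_days, oldest first."""
    stale = [pr for pr, age in ages.items() if age > threshold_days]
    return sorted(stale, key=lambda pr: ages[pr], reverse=True)


if __name__ == "__main__":
    print(stale_prs(OPEN_PR_AGE_DAYS))
```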
In conclusion, while LLaVA-NeXT demonstrates robust activity in terms of feature development and community engagement through pull requests, attention should be paid to ensuring effective communication among contributors and maintaining a clear focus on meaningful enhancements rather than trivial edits. Addressing these aspects will be vital for sustaining growth and innovation within this promising multimodal AI project.
Li Bo (Luodian)
`LLaVA_OneVision_Tutorials.ipynb` file, including fixing tutorial errors and updating video processing logic.
ChunyuanLI
Kaichen Zhang (kcz358)
`LLaVA_OneVision.md` documentation.
Yuanhan Zhang (ZhangYuanhan-AI)
`README.md` files across various branches.
Renrui Zhang (ZrrSkywalker)
Aryeh Hillman (abhillman)
The development team is actively engaged in both feature development and maintenance tasks, with a clear emphasis on improving documentation and collaborative efforts. The focus on multimodal capabilities suggests a commitment to advancing the project's objectives within the AI research community.