LLaVA-NeXT, a framework for integrating language and vision capabilities, continues to refine its multimodal functionalities with a focus on video processing and user documentation improvements.
Recent issues and pull requests indicate a strong emphasis on resolving model performance discrepancies (#254) and improving documentation clarity (#256). The development team is actively addressing technical errors related to model loading (#248) and missing dependencies (#249), reflecting ongoing efforts to streamline the user experience.
Li Bo (Luodian)
ChunyuanLI
Tianyi Xiong (tyxiong23)
Yuanhan Zhang (ZhangYuanhan-AI)
Kaichen Zhang (kcz358)
Nguyen-Quang-Trung (ngquangtrung57)
Raushan Turganbay (zucchini-nlp)
The team is actively collaborating on documentation and video processing enhancements, indicating a cohesive strategy towards improving multimodal capabilities.
Timespan | Opened | Closed | Comments | Labeled | Milestones |
---|---|---|---|---|---|
7 Days | 17 | 4 | 8 | 17 | 1 |
30 Days | 67 | 17 | 99 | 66 | 1 |
90 Days | 168 | 42 | 349 | 165 | 1 |
All Time | 228 | 56 | - | - | - |
Like all software activity quantification, these numbers are imperfect but sometimes useful. Comments, Labels, and Milestones refer to those issues opened in the timespan in question.
Developer | Branches | PRs | Commits | Files | Changes
---|---|---|---|---|---
Yuanhan Zhang | 2 | 2/2/0 | 5 | 9 | 746
Li Bo | 1 | 1/1/0 | 12 | 10 | 537
Tianyi Xiong | 1 | 4/3/1 | 19 | 8 | 359
ChunyuanLI | 1 | 0/0/0 | 6 | 2 | 26
Kaichen Zhang - NTU | 1 | 1/1/0 | 2 | 2 | 16
Nguyen-Quang-Trung | 1 | 1/1/0 | 1 | 1 | 10
Raushan Turganbay | 1 | 1/1/0 | 1 | 1 | 3
None (NarekN7) | 0 | 1/0/0 | 0 | 0 | 0
None (litianjian) | 0 | 1/0/0 | 0 | 0 | 0
None (TayyibChohan) | 0 | 0/1/0 | 0 | 0 | 0
Xiaodong Wang (Wang-Xiaodong1899) | 0 | 0/0/1 | 0 | 0 | 0
PRs: PRs created by that developer, counted as opened/merged/closed-unmerged during the period
The LLaVA-NeXT project currently has 172 open issues, with recent activity indicating a steady stream of inquiries and bug reports. Notably, several issues revolve around model performance discrepancies and configuration challenges, reflecting the complexity of integrating multimodal capabilities.
Common themes include confusion regarding model parameters, particularly in relation to different versions (e.g., 0.5B vs. 7B models), and requests for clarification on training data and evaluation metrics. There is also a significant focus on resolving technical errors related to model loading and inference.
Issue #257: Does LLaVA-NeXT support 336x336 image inputs, like LLaVA-1.5?
Issue #256: What is the purpose of the three sh files in script/interleave since we can evaluate using lmms-eval?
Issue #255: Video/Image Processing (padding, channel order)
Issue #254: Model performs well when using flash_attention_2 or SDPA, but outputs "!!!!" when using the original attention.
Issue #253: How to merge LoRA fine-tuned model with base model?
Issue #249: dpo_ov7b.sh imports data_processing which is missing.
Issue #248: Running the eval example script for Llava-next-video reports an error.
Issue #247: 3 PyTorch allocator cache flushes since last step.
Issue #245 & #244 & #243 & #242 & #240 & #239 & #238 & #234 & #233 & #232 & #231 & #230 & #229 & #227 & #226 & #224 & #223 & #221 & #220 & #219 & #218 & #217 & #216 & #215 & #214 & #213 & #212 & #211 & #210... (Multiple issues related to community discussions, feature requests, and minor bugs.)
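Issue #253 asks how to fold LoRA fine-tuned weights back into the base model. The arithmetic behind such a merge is simple: the scaled low-rank product is added to the base weight, after which the adapter matrices are no longer needed at inference time. Below is a minimal numpy sketch of that computation; the shapes, the scaling factor `alpha / r`, and the matrix values are illustrative, not taken from the repository.

```python
import numpy as np

# Base weight and a rank-r LoRA update (illustrative shapes).
d_out, d_in, r, alpha = 8, 8, 2, 16
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in))   # LoRA "down" projection
B = np.zeros((d_out, r))             # LoRA "up" projection (zero-initialized)
B[:, 0] = 1.0                        # stand-in for trained values

# Merging folds the scaled low-rank product into the base weight,
# so inference no longer routes through the adapter matrices.
scaling = alpha / r
W_merged = W + scaling * (B @ A)

# The merged weight reproduces base output plus the adapter path output.
x = rng.standard_normal(d_in)
assert np.allclose(W_merged @ x, W @ x + scaling * (B @ (A @ x)))
```

In practice this is what adapter libraries do per target layer when merging; the sketch only shows why the merged model behaves identically to base-plus-adapter.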
Several issues highlight critical areas of concern, notably attention-backend output discrepancies (#254), missing dependencies in training scripts (#249), and errors in the evaluation example scripts (#248).
The ongoing activity within the LLaVA-NeXT repository reflects a vibrant community engaged in troubleshooting and enhancing the multimodal capabilities of the framework. The concentration of issues around model performance and configuration suggests areas for improvement in documentation and user guidance, which could facilitate smoother user experiences moving forward.
The LLaVA-NeXT project has a series of active and closed pull requests that reflect ongoing development and maintenance efforts. The open pull requests focus on enhancing functionality, fixing bugs, and improving documentation, while the closed pull requests indicate a history of active contributions and iterative improvements.
PR #252: Redesigning prompt
PR #250: Fix typos
PR #160: Update README.md
PR #84: Fix prepare inputs labels for multimodal
PR #73: Make some ad-hoc changes to use the interleave model
PR #65: Features update
PR #40: Samples
PR #34: Fixed the `Conversation.copy()` function in conversation handling
PR #23: Fixed prompt formatting in conversation.py
PR #241, #237, #236, #235: Documentation updates
PR #228: Revert "Fix: videos in LLaVa-OV"
PR #205, #198, #195, #183: Video processing updates
PR #180: Update LLaVA OneVision model to lmms-lab/llava-onevision-qwen2-7b-ov
The open pull requests indicate a strong focus on enhancing functionality and fixing bugs within the LLaVA-NeXT framework. The presence of PRs like #252 and #84 suggests ongoing efforts to improve model inference capabilities and handle multimodal inputs more effectively. PRs addressing typos (#250) and documentation updates (#160) reflect an emphasis on maintaining code quality and providing clear guidance to users.
Closed pull requests reveal a history of active development with a mix of feature enhancements (#205, #198) and maintenance tasks (#241, #237). The reversion of changes in PR #228 highlights a responsive approach to development where adjustments are made based on feedback or issues encountered post-deployment. The updates related to video processing (#195, #183) suggest an ongoing effort to refine this aspect of the framework, which is crucial given its multimodal capabilities.
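The video-processing concerns raised in issue #255 (padding and channel order) come down to a few standard array transforms: reversing the channel axis when a decoder returns BGR, moving channels first for the vision tower, and padding frames to a square. The numpy sketch below illustrates these transforms; the shapes, fill value, and the BGR assumption are illustrative and do not reproduce the repository's actual preprocessing.

```python
import numpy as np

# A dummy clip: (frames, height, width, channels), channels-last,
# as decoders typically return it (OpenCV yields BGR, others RGB).
clip = np.arange(2 * 4 * 4 * 3, dtype=np.uint8).reshape(2, 4, 4, 3)

# Channel-order fix: BGR -> RGB is a reversal along the channel axis.
clip_rgb = clip[..., ::-1]

# Layout fix: most vision towers expect channels-first (frames, C, H, W).
clip_chw = clip_rgb.transpose(0, 3, 1, 2)

# Pad-to-square: pad height and width symmetrically with a fill value,
# in the spirit of LLaVA-style pad-to-square preprocessing.
pad = ((0, 0), (1, 1), (1, 1), (0, 0))  # pad H and W by 1 on each side
clip_padded = np.pad(clip, pad, constant_values=0)

assert clip_chw.shape == (2, 3, 4, 4)
assert clip_padded.shape == (2, 6, 6, 3)
```

Getting these steps in the wrong order, or skipping the channel reversal for a BGR decoder, is a common source of silently degraded model outputs, which is why such issues recur in multimodal repositories.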
Overall, the pull request activity demonstrates a vibrant development process with a focus on continuous improvement, user experience enhancement, and robust community engagement through transparent collaboration.
Overall, the development team is engaged in a productive cycle of enhancing both the functionality and usability of the LLaVA-NeXT project.