The Dispatch

OSS Report: LLaVA-VL/LLaVA-NeXT


LLaVA-NeXT Development Focuses on Enhancing Video Processing and Documentation

LLaVA-NeXT, a framework for integrating language and vision capabilities, continues to refine its multimodal functionalities with a focus on video processing and user documentation improvements.

Recent Activity

Recent issues point to four problem areas:

- model output that differs across attention implementations (#254);
- unclear documentation for the evaluation scripts (#256);
- an error when running the Llava-next-video eval example script (#248);
- a missing data_processing module imported by a DPO training script (#249).

All four issues remain open, so these are where user-facing friction currently concentrates.

Development Team and Recent Activity

  1. Li Bo (Luodian)

    • Updated README.md (3 days ago).
    • Merged PR #205 for video inference logic (5 days ago).
  2. ChunyuanLI

    • Updated release dates in README.md (7 days ago).
  3. Tianyi Xiong (tyxiong23)

    • Added DPO training scripts to LLaVA_OneVision_Chat.md (7 days ago).
  4. Yuanhan Zhang (ZhangYuanhan-AI)

    • Refactored video loading function (3 days ago).
  5. Kaichen Zhang (kcz358)

    • Merged PRs related to video processing (19 days ago).
  6. Nguyen-Quang-Trung (ngquangtrung57)

    • Contributed safe tokenizer loading for llama_3 (23 days ago).
  7. Raushan Turganbay (zucchini-nlp)

    • Updated demo files in the tutorial notebook (23 days ago).

Recent work concentrates on two areas. The first is documentation for LLaVA-OneVision-Chat. The second is video processing, covering both inference logic and video loading.

Of Note

Quantified Reports

Quantify Issues



Recent GitHub Issues Activity

Timespan  Opened  Closed  Comments  Labeled  Milestones
7 Days    17      4       8         17       1
30 Days   67      17      99        66       1
90 Days   168     42      349       165      1
All Time  228     56      -         -        -

Like all software activity quantification, these numbers are imperfect but sometimes useful. Comments, Labels, and Milestones refer to those issues opened in the timespan in question.

Quantify Commits



Quantified Commit Activity Over 30 Days

Developer                          Branches  PRs    Commits  Files  Changes
Yuanhan Zhang                      2         2/2/0  5        9      746
Li Bo                              1         1/1/0  12       10     537
Tianyi Xiong                       1         4/3/1  19       8      359
ChunyuanLI                         1         0/0/0  6        2      26
Kaichen Zhang - NTU                1         1/1/0  2        2      16
Nguyen-Quang-Trung                 1         1/1/0  1        1      10
Raushan Turganbay                  1         1/1/0  1        1      3
NarekN7                            0         1/0/0  0        0      0
litianjian                         0         1/0/0  0        0      0
TayyibChohan                       0         0/1/0  0        0      0
Xiaodong Wang (Wang-Xiaodong1899)  0         0/0/1  0        0      0

PRs: pull requests created by that developer, counted as opened/merged/closed-unmerged during the period.

Detailed Reports

Report On: Fetch issues



Recent Activity Analysis

The LLaVA-NeXT project currently has 172 open issues, with recent activity indicating a steady stream of inquiries and bug reports. Notably, several issues revolve around model performance discrepancies and configuration challenges, reflecting the complexity of integrating multimodal capabilities.

Common themes include confusion regarding model parameters, particularly in relation to different versions (e.g., 0.5B vs. 7B models), and requests for clarification on training data and evaluation metrics. There is also a significant focus on resolving technical errors related to model loading and inference.

Issue Details

Recent Issues

  1. Issue #257: Does LLaVA-NeXT support 336x336 image inputs, like LLaVA-1.5?

    • Priority: Low
    • Status: Open
    • Created: 0 days ago
    • Updated: N/A
  2. Issue #256: What is the purpose of the three sh files in script/interleave since we can evaluate using lmms-eval?

    • Priority: Medium
    • Status: Open
    • Created: 1 day ago
    • Updated: N/A
  3. Issue #255: Video/Image Processing (padding, channel order)

    • Priority: Medium
    • Status: Open
    • Created: 2 days ago
    • Updated: N/A
  4. Issue #254: Model performs well when using flash_attention_2 or SDPA, but outputs "!!!!" when using the original attention.

    • Priority: High
    • Status: Open
    • Created: 2 days ago
    • Updated: 1 day ago
  5. Issue #253: How to merge LoRA fine-tuned model with base model?

    • Priority: Medium
    • Status: Open
    • Created: 3 days ago
    • Updated: N/A
  6. Issue #249: dpo_ov7b.sh imports data_processing which is missing.

    • Priority: High
    • Status: Open
    • Created: 5 days ago
    • Updated: N/A
  7. Issue #248: Running the eval example script for Llava-next-video reports an error.

    • Priority: High
    • Status: Open
    • Created: 5 days ago
    • Updated: N/A
  8. Issue #247: 3 PyTorch allocator cache flushes since last step.

    • Priority: Low
    • Status: Open
    • Created: 5 days ago
    • Updated: N/A
  9. Issues #245, #244, #243, #242, #240, #239, #238, #234, #233, #232, #231, #230, #229, #227, #226, #224, #223, #221, #220, #219, #218, #217, #216, #215, #214, #213, #212, #211, #210...: multiple issues covering community discussions, feature requests, and minor bugs.

Analysis of Notable Issues

Several issues highlight critical areas of concern:

  • The output discrepancy across attention implementations (Issue #254) suggests an underlying bug or configuration mismatch. The model works with flash_attention_2 or SDPA but emits "!!!!" with the original attention.
  • Questions about the purpose of specific scripts (e.g., Issue #256) indicate a need for clearer documentation of the project's structure and usage.
  • Reports of missing code (e.g., Issue #249, where dpo_ov7b.sh imports a data_processing module that is absent) point to gaps in the setup instructions or packaging.

Conclusion

The LLaVA-NeXT repository shows an active user community that reports problems and asks for clarification on the framework's multimodal features. Most issues concern model behavior and configuration. Clearer documentation and setup guidance would likely resolve a share of these reports and make the framework easier to adopt.

Report On: Fetch pull requests



Overview

The LLaVA-NeXT project has a series of active and closed pull requests that reflect ongoing development and maintenance efforts. The open pull requests focus on enhancing functionality, fixing bugs, and improving documentation, while the closed pull requests indicate a history of active contributions and iterative improvements.

Summary of Pull Requests

Open Pull Requests

  • PR #252: Redesigning prompt

    • Focuses on adding inference scripts for LLaVA models.
    • Introduces new notebooks for model inference.
  • PR #250: Fix typos

    • A minor fix addressing typographical errors in the codebase.
  • PR #160: Update README.md

    • Updates README.md; the specific changes are not summarized here.
  • PR #84: Fix prepare inputs labels for multimodal

    • Addresses input preparation for multimodal tasks, ensuring correct handling of cases with no images.
  • PR #73: Make some ad-hoc changes to use the interleave model

    • Implements changes to integrate the interleave model into the existing framework.
  • PR #65: Features update

    • Introduces or updates features, but the PR gives few details about what changed.
  • PR #40: Samples

    • Adds sample scripts and a Gradio UI for better demonstration and usability.
  • PR #34: fix: Conversation.copy()

    • Fixes the Conversation.copy() method used in conversation handling.
  • PR #23: Fixed Prompt formatting in conversation.py

    • Fixes prompt formatting issues to avoid duplicate tokens.

Closed Pull Requests

  • PR #241, #237, #236, #235: Documentation updates

    • These PRs focus on updating documentation related to LLaVA-OneVision-Chat, including training scripts and contributor lists.
  • PR #228: Revert "Fix: videos in LLaVa-OV"

    • Reverts a previous change related to video handling in tutorials.
  • PR #205, #198, #195, #183: Video processing updates

    • These PRs involve updates to video processing logic, including inference logic and tokenizer loading safety.
  • PR #180: Update LLaVA OneVision model to lmms-lab/llava-onevision-qwen2-7b-ov

    • Switches the LLaVA-OneVision checkpoint used in the project to lmms-lab/llava-onevision-qwen2-7b-ov.

Analysis of Pull Requests

The open pull requests indicate a strong focus on enhancing functionality and fixing bugs within the LLaVA-NeXT framework. The presence of PRs like #252 and #84 suggests ongoing efforts to improve model inference capabilities and handle multimodal inputs more effectively. PRs addressing typos (#250) and documentation updates (#160) reflect an emphasis on maintaining code quality and providing clear guidance to users.

Closed pull requests mix feature work (#205, #198) with maintenance (#241, #237). PR #228 reverts an earlier change to video handling in the tutorials, which shows that merged changes are rolled back when they cause problems. The video processing updates (#195, #183) continue the refinement of video support, a core part of the framework's multimodal scope.

Overall, pull request activity shows continuous iteration on functionality, bug fixes, and documentation.

Report On: Fetch commits



Repo Commits Analysis

Development Team and Recent Activity

Team Members

  1. Li Bo (Luodian)

    • Recent Activity:
    • Updated README.md (3 days ago).
    • Merged PR #205 to update video inference logic (5 days ago).
    • Contributed to multiple updates in documentation and training scripts related to LLaVA-OneVision-Chat.
    • Collaborated with ChunyuanLI and Tianyi Xiong on documentation updates.
  2. ChunyuanLI

    • Recent Activity:
    • Updated release dates in README.md and contributed to the documentation for LLaVA-OneVision-Chat (7 days ago).
    • Collaborated with Li Bo on various documentation updates.
  3. Tianyi Xiong (tyxiong23)

    • Recent Activity:
    • Made numerous updates to the LLaVA_OneVision_Chat.md, including adding DPO training scripts (7 days ago).
    • Collaborated with Li Bo and ChunyuanLI on documentation improvements.
  4. Yuanhan Zhang (ZhangYuanhan-AI)

    • Recent Activity:
    • Refactored video loading function and added new training scripts for video processing (3 days ago).
    • Worked on the video inference logic. Over the past 30 days, touched 9 files with 746 changes, the most changes of any contributor.
  5. Kaichen Zhang (kcz358)

    • Recent Activity:
    • Merged PRs related to video processing and updated tutorials (19 days ago).
  6. Nguyen-Quang-Trung (ngquangtrung57)

    • Recent Activity:
    • Contributed safe tokenizer loading for llama_3 (23 days ago).
  7. Raushan Turganbay (zucchini-nlp)

    • Recent Activity:
    • Updated demo files in the tutorial notebook (23 days ago).

Summary of Activities

  • The team has been actively updating documentation, particularly for the LLaVA-OneVision-Chat feature, which suggests a focus on improving user experience and clarity.
  • Significant contributions have been made towards enhancing video processing capabilities, indicating an ongoing effort to refine multimodal functionalities.
  • Collaboration is evident between team members, especially among Li Bo, ChunyuanLI, and Tianyi Xiong, who frequently work together on documentation and feature enhancements.
  • The recent activities show a strong emphasis on merging pull requests that improve both functionality and documentation, reflecting a cohesive development strategy.

Patterns and Conclusions

  • Collaboration: There is a clear pattern of collaboration among team members, particularly in documentation efforts and feature development.
  • Focus Areas: The recent commits highlight a concentrated effort on improving video processing capabilities and enhancing user interaction through better documentation.
  • Active Development: Seven developers made 46 commits over the past 30 days, with improvements spread across several components of the project.
  • Documentation Emphasis: A notable amount of activity is dedicated to updating documentation, which is crucial for user engagement and understanding of new features.

Overall, the development team is engaged in a productive cycle of enhancing both the functionality and usability of the LLaVA-NeXT project.