OSS Report: THUDM/CogVideo

Aug. 29, 2024, 7:30 p.m. UTC. This report was generated by Dispatch AI.

CogVideo Project Faces Challenges with Resource Management and Model Performance

CogVideo, an open-source text-to-video generation project, is grappling with user-reported memory allocation issues and demands for improved model performance, highlighting the need for better resource optimization and feature enhancements.

CogVideo and its successor, CogVideoX, are designed to synthesize high-quality videos from text descriptions using advanced transformer architectures. The project is maintained by a diverse team of contributors focused on enhancing functionality and accessibility.

Recent Activity

Recent issues and pull requests (PRs) reflect a focus on optimizing performance and addressing user concerns. Notable open PRs include #210, which aims to prevent memory allocation errors in gradio_web_demo.py, and #142, which fixes bugs related to model directory paths. These efforts indicate a trajectory towards improving usability for users with limited hardware resources.

Development Team Activity

zR (zRzRzRzRzRzRzR): Central figure with 63 commits, focusing on GPU memory updates, README enhancements, and bug fixes.
ArtificialZeng: Addressed file name errors in README files with 2 commits.
bertjiazheng: Worked on video loading and frame padding bugs with 3 commits.
tunglinwood: Contributed to README localization with 2 commits.
huangshiyu13: Updated paper links in README with 4 commits.
HaiyiMei: Fixed a bug in cli_vae_demo.py with 1 commit.
cly2625: Fixed out-of-index bugs in data_video.py with 1 commit.
tengjiayan20: Updated multi-GPU finetuning scripts with 3 commits.
eltociear: Added Japanese README updates with 1 commit.
yzy-thu: Made minor updates with 1 commit.
wenyihong: Updated README with 1 commit.
yvrjsharma: Contributed to localization efforts with 1 commit.
learningpro: Updated requirements.txt and fixed model path issues with 2 commits.

Of Note

Memory Allocation Issues: Open PR #210 addresses critical memory allocation errors, reflecting ongoing challenges in resource management for users with limited VRAM.
Localization Efforts: Significant updates to README files in multiple languages indicate an emphasis on making the project accessible globally.
Model Performance Concerns: User-reported issues highlight dissatisfaction with current model performance, particularly when comparing SAT sampling results to Diffusers.
Community Engagement: The active participation of contributors and users suggests a vibrant community committed to improving the project's capabilities.
Feature Requests: There is a notable demand for new features such as image-to-video generation, indicating potential areas for future development.

Quantified Reports

Quantify Issues

Recent GitHub Issues Activity

Timespan	Opened	Closed	Comments	Labeled	Milestones
7 Days	33	29	83	29	1
30 Days	120	100	342	114	1
90 Days	120	100	342	114	1
1 Year	122	100	344	116	1
All Time	158	131	-	-	-

Like all software activity quantification, these numbers are imperfect but sometimes useful. Comments, Labels, and Milestones refer to those issues opened in the timespan in question.
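As a quick cross-check of the table, the net change in open issues and the closure ratio can be derived from the Opened and Closed columns. The following is a minimal sketch. The dictionary layout and the function names `net_open` and `closure_ratio` are illustrative and are not part of any reporting tool.

```python
# Opened/closed counts copied from the issues table above.
ISSUE_ACTIVITY = {
    "7 Days": (33, 29),
    "30 Days": (120, 100),
    "90 Days": (120, 100),
    "1 Year": (122, 100),
    "All Time": (158, 131),
}


def net_open(opened: int, closed: int) -> int:
    """Net growth in open issues over a timespan (opened minus closed)."""
    return opened - closed


def closure_ratio(opened: int, closed: int) -> float:
    """Closed issues divided by opened issues; 0.0 when nothing was opened."""
    return closed / opened if opened else 0.0


for span, (opened, closed) in ISSUE_ACTIVITY.items():
    print(
        f"{span:>8}: net_open={net_open(opened, closed):3d} "
        f"closure_ratio={closure_ratio(opened, closed):.0%}"
    )
```

For the "All Time" row the net figure is 158 - 131 = 27, which equals the number of issues currently open. For shorter timespans the figure is only a net change, since issues closed in a window were not necessarily opened in that same window.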

Quantify commits

Quantified Commit Activity Over 30 Days

Developer	Branches	PRs	Commits	Files	Changes
zR	1	15/15/0	63	172	35880
tunglinwood	1	1/1/0	2	3	647
Ikko Eltociear Ashimine	1	1/1/1	1	5	418
Jia Zheng	1	2/2/0	3	1	175
Shiyu Huang	1	0/0/0	4	4	81
yangzy_thu	1	0/0/0	1	5	79
Jiayan Teng	1	0/0/0	3	9	66
Yuvraj Sharma	1	1/1/0	1	4	14
Dr. Artificial曾小健	1	7/2/5	2	2	8
cly2625	1	2/1/1	1	1	7
learningpro	1	1/1/0	2	2	5
Wenyi Hong	1	0/0/0	1	1	2
Haiyi	1	1/1/0	1	1	1
dudulu (icowan)	0	1/0/1	0	0	0
Rodrigo Antônio de Araújo (rodjjo)	0	2/0/1	0	0	0
Yuan-Man (Yuan-ManX)	0	2/0/2	0	0	0
Arturo Guerrero (arturogro)	0	1/0/1	0	0	0
None (glide-the)	0	3/0/3	0	0	0
None (jackiealex)	0	0/1/0	0	0	0
Faych Chen (neverbiasu)	0	1/0/1	0	0	0
Geeve George (GeeveGeorge)	0	1/0/1	0	0	0
Fate_nihility (CodeLyokoscj)	0	2/0/1	0	0	0
None (CharlesCNorton)	0	1/0/1	0	0	0
Kartikey Porwal (kartikeyporwal)	0	1/0/1	0	0	0

PRs: created by that developer and opened/merged/closed-unmerged during the period.
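The three-part PRs column can be read mechanically. The sketch below splits a cell such as `15/15/0` into its opened, merged, and closed-unmerged counts. `PRCounts` and `parse_pr_cell` are hypothetical names introduced for illustration only.

```python
from typing import NamedTuple


class PRCounts(NamedTuple):
    opened: int
    merged: int
    closed_unmerged: int


def parse_pr_cell(cell: str) -> PRCounts:
    """Parse an 'opened/merged/closed-unmerged' cell from the commit table."""
    parts = cell.strip().split("/")
    if len(parts) != 3:
        raise ValueError(f"expected three '/'-separated counts, got {cell!r}")
    return PRCounts(*(int(p) for p in parts))


# Two rows taken from the table above.
for developer, cell in [("zR", "15/15/0"), ("Dr. Artificial曾小健", "7/2/5")]:
    counts = parse_pr_cell(cell)
    print(developer, counts)
```

Read this way, zR opened 15 PRs and had 15 merged in the period, while Dr. Artificial曾小健 opened 7, had 2 merged, and had 5 closed without merging.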

Detailed Reports

Report On: Fetch issues

Recent Activity Analysis

The GitHub repository for the CogVideo project currently has 27 open issues (158 opened and 131 closed over its lifetime). Of those, 120 were opened in the past 30 days and 33 in the past 7 days. Recent issues cover technical problems and feature requests, indicating an engaged user base actively seeking support and improvements.

Several issues reflect common themes, particularly around model performance and resource management. Notably, many users report encountering CUDA out-of-memory errors, suggesting that the current resource requirements may be too high for some setups. Additionally, there are numerous requests for enhancements related to model capabilities, such as image-to-video generation and improved inference efficiency.

Issue Details

Most Recently Created Issues

Issue #209: OSError: t5-v1_1-xxl is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
- Priority: High
- Status: Open
- Created: 0 days ago
- Updated: N/A
Issue #207: Use case demonstration on single 3090ti
- Priority: Medium
- Status: Open
- Created: 0 days ago
- Updated: N/A
Issue #205: RuntimeError: GET was unable to find an engine to execute this computation
- Priority: High
- Status: Open
- Created: 0 days ago
- Updated: N/A
Issue #204: 5B model: Tried to allocate 56.50 GiB
- Priority: Critical
- Status: Open
- Created: 0 days ago
- Updated: N/A
Issue #203: Fail to download SAT 5B model
- Priority: Medium
- Status: Open
- Created: 0 days ago
- Updated: N/A

Most Recently Updated Issues

Issue #202: GPU usage can not always be 100%
- Priority: Medium
- Status: Open
- Created: 1 day ago
- Updated: Recently edited
Issue #201: SAT sampling results are worse than Diffusers sampling results
- Priority: Medium
- Status: Open
- Created: 1 day ago
- Updated: Recently edited
Issue #199: Finetune时GPU利用率波动很大 (GPU utilization fluctuates significantly during fine-tuning)
- Priority: Medium
- Status: Open
- Created: 1 day ago
- Updated: Recently edited
Issue #194: Work plan and enhancement / 工作计划和用户诉求 (Work plan and user requests)
- Priority: Low (enhancement)
- Status: Open
- Created: 1 day ago
- Updated: Recently edited
Issue #193: 为啥生成的视频人物都是西方人面孔？ (Why do the generated videos only feature Western faces?)
- Priority: Low
- Status: Open
- Created: 1 day ago
- Updated: Recently edited

Analysis of Themes and Commonalities

The recent issues indicate several recurring themes:

Resource Management: Many users are facing memory-related errors, particularly CUDA out-of-memory errors when using larger models such as CogVideoX-5B (issue #204, for example, reports an attempted allocation of 56.50 GiB). This suggests that the documentation may need clearer guidance on hardware requirements and on optimizations for users with limited resources.
Model Performance: Issues regarding the performance of different models (e.g., SAT vs. Diffusers) highlight user concerns about output quality and efficiency. There is a clear demand for improvements in how models handle various input types and resolutions.
Feature Requests: Users are actively requesting new features such as image-to-video generation capabilities and enhancements to existing functionalities like prompt optimization and video length adjustments.
Community Engagement: The variety of issues raised reflects an engaged community eager to contribute feedback and seek assistance, which is crucial for the iterative development of the project.

Overall, the current state of open issues illustrates both the potential of CogVideo as a tool for text-to-video generation and the challenges users face in leveraging its capabilities effectively.
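To put the resource-management theme in rough numbers, the sketch below estimates the memory needed just to hold model weights at a given precision and compares a requested allocation against a card's capacity. It rests on stated assumptions: the parameter count of 5 billion is inferred only from the "5B" in the model name, the 24 GiB capacity corresponds to a card such as the RTX 3090 Ti mentioned in issue #207, and the helper names are hypothetical. Weights are only a lower bound, since activations, the text encoder, and the VAE add to the total.

```python
GIB = 1024 ** 3

# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_gib(num_params: float, dtype: str) -> float:
    """Memory in GiB required to store the weights alone."""
    return num_params * BYTES_PER_PARAM[dtype] / GIB


def fits(requested_gib: float, card_gib: float) -> bool:
    """Whether a single allocation could fit on a card of the given size."""
    return requested_gib <= card_gib


ASSUMED_PARAMS = 5e9    # assumption: "5B" read as 5 billion parameters
ASSUMED_CARD_GIB = 24   # assumption: a 24 GiB consumer GPU

for dtype in ("fp32", "bf16", "int8"):
    print(f"{dtype}: {weight_gib(ASSUMED_PARAMS, dtype):.2f} GiB for weights")

# Allocation size quoted in the title of issue #204.
print("56.50 GiB fits on the assumed card:", fits(56.50, ASSUMED_CARD_GIB))
```

Under these assumptions, bf16 weights need a little over 9 GiB, so an attempted 56.50 GiB allocation points to intermediate tensors rather than the weights themselves as the source of the failure.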

Report On: Fetch pull requests

Overview

The pull requests (PRs) for the THUDM/CogVideo repository show a mix of ongoing development, bug fixes, and feature enhancements aimed at improving the functionality and performance of the text-to-video generation models. There are currently two open PRs and 48 closed PRs, indicating active maintenance and community engagement.

Summary of Pull Requests

Open Pull Requests

PR #210: Remove to device to avoid memory allocation errors
Created by Rodrigo Antônio de Araújo, this PR removes the .to(device) call in gradio_web_demo.py so that memory settings are applied first, which prevents memory allocation errors. The change matters most for users with limited VRAM.
PR #142: Bug fixes
Submitted by Fate_nihility, this PR addresses multiple bugs related to model directory paths in various README files and ensures that output directories exist. This is essential for user experience and correct functionality.

Closed Pull Requests

PR #208: Add an option to run gradio web demo with very low vram
Closed without merging due to redundancy with PR #206. It proposed a feature for low VRAM usage but was deemed unnecessary after reviewing existing contributions.
PR #206: GPU memory update
Merged successfully, this PR introduced optimized code to reduce video memory usage significantly, which aligns with the project's goal of making it accessible for users with lower-end hardware.
PR #196 & PR #195: fix wrong file name
Both merged PRs corrected file naming issues in Japanese and Chinese README files, highlighting ongoing efforts to maintain multilingual support.
PR #188: fix
This merged PR made minor corrections across multiple README files, indicating a focus on documentation quality.
PR #181: loading video online
Merged successfully, this PR addressed specific issues related to loading videos online, enhancing user experience.
PR #179: 5B model release
Merged successfully, this PR updated the rife_model.py file as part of the larger release strategy for the new model.
PR #178: Cog video x dev
This merged PR included updates to the release draft readme and user guide, indicating preparation for a new version launch.
PR #173: CogVideoX-5B config
Merged successfully, this PR added new configuration options for the latest model, showcasing ongoing development in model capabilities.
PR #165: Add upscale model integration EIFE integration and batch processing for video frames
Not merged due to extensive changes required; it proposed significant enhancements but faced challenges in integration.

Analysis of Pull Requests

The pull requests submitted to the THUDM/CogVideo repository reflect a robust engagement from contributors focused on both improving functionality and addressing user concerns. The two open pull requests (#210 and #142) indicate active work on optimizing performance and fixing bugs that could hinder usability. Notably, PR #210's focus on memory allocation is particularly relevant given the project's emphasis on making high-quality video generation accessible even on lower-end hardware configurations.

The 48 closed pull requests demonstrate a proactive approach to maintenance. Many of them are minor fixes or documentation updates (e.g., PRs #196, #195, and #188), which help ensure that users have accurate information when using the software. The merging of several bug fixes indicates an ongoing commitment to improving user experience by resolving issues swiftly.

The presence of closed but not merged pull requests (e.g., PRs #165 and #208) suggests that while contributors are eager to enhance features such as VRAM optimization and batch processing capabilities, there may be challenges in integrating these changes into the existing codebase. The feedback from maintainers like zR highlights a collaborative environment where contributors are encouraged to refine their submissions based on project needs.

Moreover, recent updates focusing on GPU memory optimizations (e.g., PR #206) align well with user feedback regarding performance improvements. This responsiveness is critical in maintaining community trust and engagement. The project’s ability to adapt quickly to user needs—such as reducing VRAM requirements—demonstrates a clear understanding of its target audience's constraints.

In summary, the analysis reveals a vibrant development environment characterized by active contributions aimed at enhancing both functionality and usability. However, challenges remain regarding feature integration that may require additional collaboration between contributors and maintainers. Continued focus on documentation quality and responsiveness to user feedback will be essential as the project evolves further.

Report On: Fetch commits

Repo Commits Analysis

Development Team and Recent Activity

Team Members and Activities

zR (zRzRzRzRzRzRzR)
- Recent Activity:
- Merged multiple pull requests including GPU memory updates, loading video online, and optimizations for memory usage.
- Made significant updates to README files in multiple languages (English, Japanese, Chinese) to reflect recent changes and improvements.
- Worked on fixing bugs related to file names and frame padding.
- Contributed to the addition of new features like preloading captions and enhancements in the inference scripts.
- Total of 63 commits with 35,880 changes across 172 files in the last 30 days.
ArtificialZeng (Dr. Artificial曾小健)
- Recent Activity:
- Focused on fixing wrong file names in README files.
- Contributed a total of 2 commits with 8 changes across 2 files.
bertjiazheng
- Recent Activity:
- Worked on loading video online and fixing frame padding bugs.
- Total of 3 commits with 175 changes across 1 file.
tunglinwood
- Recent Activity:
- Updated README files, contributing to localization efforts.
- Total of 2 commits with 647 changes across 3 files.
huangshiyu13
- Recent Activity:
- Engaged in updating paper links and citations in the README.
- Total of 4 commits with 81 changes across 4 files.
HaiyiMei
- Recent Activity:
- Fixed a bug in cli_vae_demo.py.
- Total of 1 commit with 1 change across 1 file.
cly2625
- Recent Activity:
- Fixed out-of-index bugs in data_video.py.
- Total of 1 commit with 7 changes across 1 file.
tengjiayan20
- Recent Activity:
- Contributed to updates related to multi-GPU finetuning scripts.
- Total of 3 commits with 66 changes across 9 files.
eltociear
- Recent Activity:
- Added a Japanese README and made updates to other language versions.
- Total of 1 commit with 418 changes across 5 files.
yzy-thu
- Recent Activity:
- Minor updates contributing to the project.
- Total of 1 commit with 79 changes across 5 files.
wenyihong
- Recent Activity:
- Made minor updates to the README.
- Total of 1 commit with 2 changes across 1 file.
yvrjsharma
- Recent Activity:
- Updated README and contributed to localization efforts.
- Total of 1 commit with 14 changes across 4 files.
learningpro
- Recent Activity:
- Updated requirements.txt and fixed issues related to model paths.
- Total of 2 commits with 5 changes across 2 files.

Patterns and Themes

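The majority of recent activity is driven by zR, who accounts for 63 of the 85 commits (about 74%) and 35,880 of the 37,383 changes (about 96%) in the 30-day window. zR is heavily involved in both feature development and bug fixes, indicating a central role in the project.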
There is a strong emphasis on documentation updates, particularly for localization (Japanese and Chinese), which reflects an effort to make the project accessible to a broader audience.
Collaboration is evident through multiple merged pull requests from various contributors, suggesting an active community engagement around the project.
Bug fixes are a recurring theme, highlighting ongoing maintenance efforts alongside feature enhancements.
The project shows a clear focus on optimizing performance, particularly regarding memory usage for GPU inference, which aligns with the project's goals for efficiency in video generation tasks.

Conclusions

The development team is actively engaged in enhancing the CogVideo and CogVideoX projects through continuous contributions that include both new features and critical bug fixes. The collaborative environment fosters community involvement while maintaining a focus on improving user accessibility and performance optimization.