CogVideo, an open-source text-to-video generation project, is grappling with user-reported memory allocation issues and demands for improved model performance, highlighting the need for better resource optimization and feature enhancements.
CogVideo and its successor, CogVideoX, are designed to synthesize high-quality videos from text descriptions using advanced transformer architectures. The project is maintained by a diverse team of contributors focused on enhancing functionality and accessibility.
Recent issues and pull requests (PRs) reflect a focus on optimizing performance and addressing user concerns. Notable open PRs include #210, which aims to prevent memory allocation errors in gradio_web_demo.py, and #142, which fixes bugs related to model directory paths. These efforts indicate a trajectory towards improving usability for users with limited hardware resources.
cli_vae_demo.py with 1 commit.
data_video.py with 1 commit.
Memory Allocation Issues: Open PR #210 addresses critical memory allocation errors, reflecting ongoing challenges in resource management for users with limited VRAM.
Localization Efforts: Significant updates to README files in multiple languages indicate an emphasis on making the project accessible globally.
Model Performance Concerns: User-reported issues highlight dissatisfaction with current model performance, particularly when comparing SAT sampling results to Diffusers.
Community Engagement: The active participation of contributors and users suggests a vibrant community committed to improving the project's capabilities.
Feature Requests: There is a notable demand for new features such as image-to-video generation, indicating potential areas for future development.
Timespan | Opened | Closed | Comments | Labeled | Milestones |
---|---|---|---|---|---|
7 Days | 33 | 29 | 83 | 29 | 1 |
30 Days | 120 | 100 | 342 | 114 | 1 |
90 Days | 120 | 100 | 342 | 114 | 1 |
1 Year | 122 | 100 | 344 | 116 | 1 |
All Time | 158 | 131 | - | - | - |
Like all software activity quantification, these numbers are imperfect but sometimes useful. Comments, Labels, and Milestones refer to those issues opened in the timespan in question.
Developer | Branches | PRs | Commits | Files | Changes |
---|---|---|---|---|---|
zR | 1 | 15/15/0 | 63 | 172 | 35880 |
tunglinwood | 1 | 1/1/0 | 2 | 3 | 647 |
Ikko Eltociear Ashimine | 1 | 1/1/1 | 1 | 5 | 418 |
Jia Zheng | 1 | 2/2/0 | 3 | 1 | 175 |
Shiyu Huang | 1 | 0/0/0 | 4 | 4 | 81 |
yangzy_thu | 1 | 0/0/0 | 1 | 5 | 79 |
Jiayan Teng | 1 | 0/0/0 | 3 | 9 | 66 |
Yuvraj Sharma | 1 | 1/1/0 | 1 | 4 | 14 |
Dr. Artificial曾小健 | 1 | 7/2/5 | 2 | 2 | 8 |
cly2625 | 1 | 2/1/1 | 1 | 1 | 7 |
learningpro | 1 | 1/1/0 | 2 | 2 | 5 |
Wenyi Hong | 1 | 0/0/0 | 1 | 1 | 2 |
Haiyi | 1 | 1/1/0 | 1 | 1 | 1 |
dudulu (icowan) | 0 | 1/0/1 | 0 | 0 | 0 |
Rodrigo Antônio de Araújo (rodjjo) | 0 | 2/0/1 | 0 | 0 | 0 |
Yuan-Man (Yuan-ManX) | 0 | 2/0/2 | 0 | 0 | 0 |
Arturo Guerrero (arturogro) | 0 | 1/0/1 | 0 | 0 | 0 |
None (glide-the) | 0 | 3/0/3 | 0 | 0 | 0 |
None (jackiealex) | 0 | 0/1/0 | 0 | 0 | 0 |
Faych Chen (neverbiasu) | 0 | 1/0/1 | 0 | 0 | 0 |
Geeve George (GeeveGeorge) | 0 | 1/0/1 | 0 | 0 | 0 |
Fate_nihility (CodeLyokoscj) | 0 | 2/0/1 | 0 | 0 | 0 |
None (CharlesCNorton) | 0 | 1/0/1 | 0 | 0 | 0 |
Kartikey Porwal (kartikeyporwal) | 0 | 1/0/1 | 0 | 0 | 0 |
PRs: created by that dev and opened/merged/closed-unmerged during the period
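For readers who want to process the PRs column programmatically, the opened/merged/closed-unmerged notation can be split into named fields. The following is a minimal sketch; the names `PRCounts` and `parse_pr_counts` are hypothetical and not part of any tool used to produce this report.

```python
from typing import NamedTuple


class PRCounts(NamedTuple):
    """Counts from one PRs cell, in the order the report defines them."""
    opened: int
    merged: int
    closed_unmerged: int


def parse_pr_counts(cell: str) -> PRCounts:
    """Parse a PRs cell such as '7/2/5' into named counts."""
    parts = cell.strip().split("/")
    if len(parts) != 3:
        raise ValueError(f"expected 'opened/merged/closed-unmerged', got {cell!r}")
    opened, merged, closed_unmerged = (int(p) for p in parts)
    return PRCounts(opened, merged, closed_unmerged)


counts = parse_pr_counts("7/2/5")
print(counts.opened, counts.merged, counts.closed_unmerged)  # 7 2 5
```

Because the three counts describe events within the period, they need not add up: a PR opened earlier can be merged in the period, which is how a cell such as 0/1/0 arises.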
The GitHub repository for the CogVideo project has seen a notable uptick in activity, with 27 open issues currently logged. Recent issues highlight various user experiences, including technical challenges and feature requests, indicating an engaged user base actively seeking support and improvements.
Several issues reflect common themes, particularly around model performance and resource management. Notably, many users report encountering CUDA out-of-memory errors, suggesting that the current resource requirements may be too high for some setups. Additionally, there are numerous requests for enhancements related to model capabilities, such as image-to-video generation and improved inference efficiency.
Issue #209: OSError: t5-v1_1-xxl is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
Issue #207: Use case demonstration on single 3090ti
Issue #205: RuntimeError: GET was unable to find an engine to execute this computation
Issue #204: 5B model: Tried to allocate 56.50 GiB
Issue #203: Fail to download SAT 5B model
Issue #202: GPU usage can not always be 100%
Issue #201: SAT sampling results are worse than Diffusers sampling results
Issue #199: Finetune时GPU利用率波动很大 (GPU utilization fluctuates significantly during fine-tuning)
Issue #194: Work plan and enhancement / 工作计划和用户诉求 (Work plan and user requests)
Issue #193: 为啥生成的视频人物都是西方人面孔? (Why do the generated videos only feature Western faces?)
The recent issues indicate several recurring themes:
Resource Management: Many users are facing memory-related errors, particularly with CUDA out-of-memory issues when using larger models like the CogVideoX-5B. This suggests that the current documentation may need to provide clearer guidelines on hardware requirements or optimizations for users with limited resources.
Model Performance: Issues regarding the performance of different models (e.g., SAT vs. Diffusers) highlight user concerns about output quality and efficiency. There is a clear demand for improvements in how models handle various input types and resolutions.
Feature Requests: Users are actively requesting new features such as image-to-video generation capabilities and enhancements to existing functionalities like prompt optimization and video length adjustments.
Community Engagement: The variety of issues raised reflects an engaged community eager to contribute feedback and seek assistance, which is crucial for the iterative development of the project.
Overall, the current state of open issues illustrates both the potential of CogVideo as a tool for text-to-video generation and the challenges users face in leveraging its capabilities effectively.
The analysis of the pull requests (PRs) for the THUDM/CogVideo repository reveals a mix of ongoing development efforts, bug fixes, and feature enhancements aimed at improving the functionality and performance of the text-to-video generation models. Currently, there are two open PRs and a significant number of closed PRs, indicating active maintenance and community engagement.
PR #210: Remove to device to avoid memory allocation errors
Created by Rodrigo Antônio de Araújo, this PR aims to remove the .to(device) method call in the gradio_web_demo.py file to prevent memory allocation errors by applying memory settings first. This change is significant for users with limited VRAM.
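The ordering concern behind PR #210 can be sketched as follows. This is an illustration under stated assumptions, not the PR's actual diff: it assumes a Diffusers-style pipeline that exposes `enable_model_cpu_offload()` and a `vae` with `enable_slicing()` and `enable_tiling()`. The helper names `apply_low_vram_settings` and `load_pipeline`, and the default model identifier, are hypothetical choices for the example.

```python
def apply_low_vram_settings(pipe):
    """Enable memory-saving options on a Diffusers-style pipeline.

    Hypothetical helper: assumes `pipe` exposes enable_model_cpu_offload()
    and that pipe.vae exposes enable_slicing() and enable_tiling().
    pipe.to("cuda") is deliberately NOT called: moving every sub-model to
    the GPU up front is what can exhaust VRAM before offloading takes effect.
    """
    pipe.enable_model_cpu_offload()  # sub-models move to the GPU only while in use
    pipe.vae.enable_slicing()        # VAE processes the input in slices
    pipe.vae.enable_tiling()         # VAE processes the input in tiles
    return pipe


def load_pipeline(model_path="THUDM/CogVideoX-2b"):
    """Load the pipeline and apply the settings (needs torch and diffusers).

    The model identifier is an assumption for illustration; substitute the
    checkpoint actually in use.
    """
    import torch
    from diffusers import CogVideoXPipeline

    pipe = CogVideoXPipeline.from_pretrained(model_path, torch_dtype=torch.float16)
    return apply_low_vram_settings(pipe)
```

Keeping the imports inside `load_pipeline` lets the settings helper be reused or tested without a GPU or the model weights being present.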
PR #142: Bug fixes
Submitted by Fate_nihility, this PR addresses multiple bugs related to model directory paths in various README files and ensures that output directories exist. This is essential for user experience and correct functionality.
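As an illustration of the "ensure that output directories exist" part, a script can create the parent directory of an output file before writing to it. This is a generic sketch rather than the code in PR #142; the function name `ensure_output_path` and the example path are hypothetical.

```python
import tempfile
from pathlib import Path


def ensure_output_path(output_path: str) -> Path:
    """Create the parent directory of output_path if it is missing."""
    path = Path(output_path)
    # parents=True creates intermediate directories; exist_ok=True makes
    # the call a no-op when the directory is already there.
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


# Demonstrated inside a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    video_path = ensure_output_path(f"{tmp}/output/demo/video.mp4")
    print(video_path.parent.is_dir())  # True
```

Calling the helper once before saving avoids a FileNotFoundError when the directory has not been created yet.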
PR #208: Add an option to run gradio web demo with very low vram
Closed without merging due to redundancy with PR #206. It proposed a feature for low VRAM usage but was deemed unnecessary after reviewing existing contributions.
PR #206: GPU memory update
Merged successfully, this PR introduced optimized code to reduce video memory usage significantly, which aligns with the project's goal of making it accessible for users with lower-end hardware.
PR #196 & PR #195: fix wrong file name
Both merged PRs corrected file naming issues in Japanese and Chinese README files, highlighting ongoing efforts to maintain multilingual support.
PR #188: fix
This merged PR made minor corrections across multiple README files, indicating a focus on documentation quality.
PR #181: loading video online
Merged successfully, this PR addressed specific issues related to loading videos online, enhancing user experience.
PR #179: 5B model release
Merged successfully, this PR updated the rife_model.py file as part of the larger release strategy for the new model.
PR #178: Cog video x dev
This merged PR included updates to the release draft readme and user guide, indicating preparation for a new version launch.
PR #173: CogVideoX-5B config
Merged successfully, this PR added new configuration options for the latest model, showcasing ongoing development in model capabilities.
PR #165: Add upscale model integration EIFE integration and batch processing for video frames
Not merged due to extensive changes required; it proposed significant enhancements but faced challenges in integration.
The pull requests submitted to the THUDM/CogVideo repository reflect a robust engagement from contributors focused on both improving functionality and addressing user concerns. The two open pull requests (#210 and #142) indicate active work on optimizing performance and fixing bugs that could hinder usability. Notably, PR #210's focus on memory allocation is particularly relevant given the project's emphasis on making high-quality video generation accessible even on lower-end hardware configurations.
A significant portion of closed pull requests (48 total) demonstrates a proactive approach to maintenance. Many of these closed PRs involve minor fixes or documentation updates (e.g., PRs #196, #195, and #188), which are crucial for ensuring that users have accurate information when using the software. The merging of several bug fixes indicates an ongoing commitment to improving user experience by resolving issues swiftly.
The presence of closed but not merged pull requests (e.g., PRs #165 and #208) suggests that while contributors are eager to enhance features such as VRAM optimization and batch processing capabilities, there may be challenges in integrating these changes into the existing codebase. The feedback from maintainers like zR highlights a collaborative environment where contributors are encouraged to refine their submissions based on project needs.
Moreover, recent updates focusing on GPU memory optimizations (e.g., PR #206) align well with user feedback regarding performance improvements. This responsiveness is critical in maintaining community trust and engagement. The project’s ability to adapt quickly to user needs—such as reducing VRAM requirements—demonstrates a clear understanding of its target audience's constraints.
In summary, the analysis reveals a vibrant development environment characterized by active contributions aimed at enhancing both functionality and usability. However, challenges remain regarding feature integration that may require additional collaboration between contributors and maintainers. Continued focus on documentation quality and responsiveness to user feedback will be essential as the project evolves further.
zR (zRzRzRzRzRzRzR)
ArtificialZeng (Dr. Artificial曾小健)
bertjiazheng
tunglinwood
huangshiyu13
HaiyiMei: cli_vae_demo.py
cly2625: data_video.py
tengjiayan20
eltociear
yzy-thu
wenyihong
yvrjsharma
learningpro
The development team is actively engaged in enhancing the CogVideo and CogVideoX projects through continuous contributions that include both new features and critical bug fixes. The collaborative environment fosters community involvement while maintaining a focus on improving user accessibility and performance optimization.