OSS Report: Stability-AI/generative-models

Aug. 17, 2024, 3:30 a.m. UTC
This report was generated by Dispatch AI

Stability AI Improves SV4D Performance, Adds Demo Amid Ongoing Development

Stability AI's generative-models repository houses state-of-the-art AI models for tasks like text-to-image and image-to-video generation. The project has recently focused on significant improvements to its Stable Video 4D (SV4D) model.

In the past month, the development team has made substantial progress on SV4D, cutting its memory consumption from 40G to 20G and its processing time from 500s to 200s. They've also added a Gradio demo for easier testing and demonstration of SV4D capabilities, indicating a push towards user-friendly interfaces for their advanced models.
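
As a quick check of the figures above, the sketch below derives the relative savings from the reported before/after numbers. The helper names are illustrative and not part of the repository; only the four input values come from the report, and "G" is assumed to mean gigabytes.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a quantity shrank, e.g. 0.5 means 50% smaller."""
    return (before - after) / before


def speedup(before_seconds: float, after_seconds: float) -> float:
    """How many times faster the new run is than the old one."""
    return before_seconds / after_seconds


# Figures as reported for SV4D (memory assumed to be in GB, runtime in seconds).
memory_before, memory_after = 40.0, 20.0
time_before, time_after = 500.0, 200.0

memory_saving = relative_reduction(memory_before, memory_after)
time_saving = relative_reduction(time_before, time_after)
throughput_gain = speedup(time_before, time_after)

print(f"memory reduced by {memory_saving:.0%}")
print(f"runtime reduced by {time_saving:.0%}")
print(f"speedup: {throughput_gain:.1f}x")
```

Note that a 60% shorter runtime corresponds to a 2.5x speedup, which is not the same thing as a 60% increase in speed.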

Recent Activity

Recent pull requests and issues cluster around SV4D improvements, documentation updates, and ongoing optimization efforts. PR #394, merged 15 days ago, encapsulates many of these changes, including the memory and speed optimizations for SV4D and the addition of the Gradio demo.

The development team's recent activities, in reverse chronological order:

chunhanyao-stable:
- Merged PR #394 for SV4D improvements
- Updated SV4D README and scripts
ymxie97:
- Collaborated on SV4D improvements in PR #394
- Worked on memory consumption reduction and speed improvements
- Added Gradio demo for SV4D
- Made README updates and script adjustments
Vikram Voleti (voletiv):
- Merged several PRs related to SV4D and SV3D
- Added SV4D code in PR #384, including new modules, configs, and sampling scripts
- Fixed documentation links
Andreas Blattmann:
- Updated SVD license
Tim Dockhorn (timudk):
- Removed deprecated scale_schedule_config

Of Note

The improvement in SV4D performance (a 50% memory reduction and a 60% reduction in processing time, equivalent to a 2.5x speedup) represents a significant gain in model efficiency.
The addition of a Gradio demo for SV4D suggests a focus on making advanced models more accessible to a wider audience.
There's an ongoing effort to improve documentation and fix minor issues, indicating a commitment to user experience and code quality.
The repository maintains a modular, config-driven approach to model development, allowing for flexible combination of submodules.
Despite active development on newer models like SV4D, there's continued maintenance and updates for existing models such as SVD and SDXL, showing a commitment to the entire model ecosystem.
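
The modular, config-driven approach noted above can be sketched generically as follows. This is an illustration under assumptions, not the repository's code: the `target`/`params` keys and the `instantiate` helper are hypothetical names chosen for the example, and a standard-library class stands in for a model submodule.

```python
import importlib
from typing import Any, Dict


def instantiate(config: Dict[str, Any]) -> Any:
    """Build an object from a config dict holding a dotted class path and kwargs."""
    module_path, class_name = config["target"].rsplit(".", 1)
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(**config.get("params", {}))


# Swapping a submodule means editing the config, not the calling code.
config = {
    "target": "datetime.timedelta",  # stand-in for a model component
    "params": {"minutes": 90},
}
component = instantiate(config)
print(type(component).__name__, component.total_seconds())
```

The appeal of this design is that alternative components can be combined by changing configuration files alone, as long as each class accepts the parameters listed for it.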

Quantified Reports

Quantify commits

Quantified Commit Activity Over 30 Days

Developer	Branches	PRs	Commits	Files	Changes
Vikram Voleti	2	0/0/0	4	18	3224
ymxie97	1	1/1/0	7	38	928
chunhanyao-stable	1	2/1/1	2	7	202
Vikram Voleti	0	3/3/0	0	0	0
Devansh Bisla (devansh20la)	0	0/0/1	0	0	0

PRs: created by that developer and opened/merged/closed-unmerged during the period.
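
For readers who want to process the table above programmatically, the following sketch parses one tab-separated row and splits the PRs column into its three counts. The `CommitActivity` type and `parse_row` function are hypothetical names for this example; the column order follows the table header.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CommitActivity:
    developer: str
    branches: int
    prs: Tuple[int, int, int]  # (opened, merged, closed-unmerged)
    commits: int
    files: int
    changes: int


def parse_row(line: str) -> CommitActivity:
    """Parse one tab-separated row of the commit activity table."""
    developer, branches, prs, commits, files, changes = line.split("\t")
    opened, merged, closed_unmerged = (int(part) for part in prs.split("/"))
    return CommitActivity(
        developer=developer,
        branches=int(branches),
        prs=(opened, merged, closed_unmerged),
        commits=int(commits),
        files=int(files),
        changes=int(changes),
    )


row = parse_row("chunhanyao-stable\t1\t2/1/1\t2\t7\t202")
print(row.developer, row.prs)
```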

Quantify Issues

Recent GitHub Issues Activity

Timespan	Opened	Closed	Comments	Labeled	Milestones
7 Days	2	0	0	2	1
30 Days	11	7	20	11	1
90 Days	30	9	25	30	1
1 Year	237	38	483	237	1
All Time	292	52	-	-	-

Like all software activity quantification, these numbers are imperfect but sometimes useful. Comments, Labels, and Milestones refer to those issues opened in the timespan in question.
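
A rough way to read the table above is to compare closed and opened counts per timespan. The sketch below does this with the reported numbers. The ratio is only indicative, since issues closed in a window were not necessarily opened in it, and the variable names are illustrative.

```python
# (opened, closed) per timespan, copied from the table above.
issue_counts = {
    "7 Days": (2, 0),
    "30 Days": (11, 7),
    "90 Days": (30, 9),
    "1 Year": (237, 38),
    "All Time": (292, 52),
}

close_ratio = {
    span: closed / opened for span, (opened, closed) in issue_counts.items()
}

for span, ratio in close_ratio.items():
    print(f"{span}: {ratio:.0%} as many issues closed as opened")

# Issues opened but not closed over the repository's lifetime.
opened_all, closed_all = issue_counts["All Time"]
open_backlog = opened_all - closed_all
print(f"open backlog: {open_backlog}")
```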

Detailed Reports

Report On: Fetch issues

Recent Activity Analysis

Recent GitHub issue activity shows ongoing interest in and development of Stability AI's generative models, particularly around video generation capabilities like Stable Video Diffusion (SVD) and SV3D/SV4D. There is frequent discussion about running these models, troubleshooting errors, and requests for additional features or clarifications.

Some notable themes and issues include:

Requests for training code and instructions, especially for SVD and SDXL-Turbo models
Questions about model architectures, parameters, and technical details
Compatibility issues with different hardware setups, especially around VRAM requirements
Interest in expanding capabilities like multi-view generation and 3D reconstruction
Clarifications needed on licensing and open source plans for some models

Many issues relate to setup and environment problems, indicating the complexity of running these large models. There's also significant interest in fine-tuning and customizing models for specific use cases.

Issue Details

Recently created/updated issues:

#397 (open): Fixing issue in diffusion.py file (created 5 days ago)
#396 (open): Enabling Multi-GPU for SVD (created 5 days ago)
#393 (open): Out of memory error with SV4D on 40GB A100 (created 16 days ago, updated 7 days ago)
#395 (closed): Question about encoder weights for SV3D/SV4D (created 8 days ago, closed 8 days ago)
#389 (closed): Running SV4D on DAVIS dataset (created 22 days ago, closed 15 days ago)

These recent issues highlight ongoing work on model improvements, efforts to optimize resource usage, and interest in applying the models to various datasets and tasks.

Report On: Fetch pull requests

Overview

The dataset contains information on 43 open pull requests for the Stability-AI/generative-models repository, covering various improvements, fixes, and new features for their generative AI models.

Summary of Pull Requests

#394 (15 days ago): Merged changes to reduce memory consumption and speed up SV4D, added Gradio demo.

#391 (17 days ago): Updated README and sampling script for SV4D, not merged.

#386 (24 days ago): Fixed SV3D link in README.

#385 (24 days ago): Fixed links in README for SV4D.

#384 (24 days ago): Added SV4D code, including new scripts, configs, and modules.

#378 (38 days ago): Added documentation for Python path issue in streamlit demos.

#364 (71 days ago): Fixed array broadcasting error in video sampling script.

#331 (145 days ago): Fixed unassigned-variable bug in video sampling script for JPEG images.

#327 (148 days ago): Fixed video writing issue using OpenCV instead of imageio.

#324 (148 days ago): Fixed typo in attention.py.

#321 (149 days ago): Fixed SVD image input bug and suggested using torchvision for video saving.

#319 (149 days ago): Added imageio-ffmpeg and pyav modules to requirements to fix corrupted video issue.

#310 (151 days ago): Added Gradio updates for SV3D.

#284 (179 days ago): Removed Triton requirement when running on Windows.

#278 (198 days ago): Added an unrelated image file.

#276 (201 days ago): Fixed grid initialization in img2img example.

#253 (242 days ago): Added simple import test to build script.

#252 (243 days ago): Added Hugging Face Gradio demo link to README.

#245 (248 days ago): Fixed IdentityFirstStage class to include encoder and decoder attributes.

#244 (249 days ago): Fixed undefined 'grid' variable in do_img2img() function.

#225 (256 days ago): Replaced string concatenations with f-strings in main.py.

#206 (261 days ago): Updated CI workflow to remove non-existent requirements file.

#195 (263 days ago): Proposed adoption of pip-compile for managing requirements.

#193 (263 days ago): Simplified requirements file management.

#183 (267 days ago): Removed more Torch version comparisons after dropping support for PyTorch < 2.0.

#165 (268 days ago): Adjusted video sampling demo to handle cases where width and height are less than 256.

#151 (269 days ago): Replaced remaining print() calls with logging calls in the library.

#150 (269 days ago): Replaced deprecated Logger.warn with Logger.warning.

#147 (269 days ago): Removed duplicate get_interactive_image function.

#146 (269 days ago): Proposed removal of star imports for better static analysis.

#104 (379 days ago): Added dev container configs for CPU and GPU use cases.

#103 (379 days ago): Added simple import test to build script, revealing issues with pt13.

#102 (379 days ago): Refactored helpers and Streamlit demo, adding new features and improving defaults.

#90 (385 days ago): Added visualization capabilities to display output images during testing.

#89 (385 days ago): Modified watermark detection code to use ThreadPoolExecutor for parallelization.

#81 (387 days ago): Added minimal demo (no details provided).

#79 (387 days ago): Updated minimal txt2img example using new APIs.

#78 (387 days ago): Proposed fallback to vanilla attention if xformers is not available.

#76 (387 days ago): Deduplicated sampling/demo code and refactored helpers.

#60 (389 days ago): Added simple, minimal txt2img command line example tool.

#54 (395 days ago): Proposed using ast.literal_eval() instead of eval() for safety in get_string_from_tuple.

#52 (395 days ago): Configured Ruff and Black for linting and formatting.

#50 (395 days ago): Proposed late-importing scipy to allow more minimal inference requirements.

#49 (395 days ago): Un-hardcoded "cuda" as default device name, allowing configuration via environment variable.
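
To illustrate the safety concern behind #54, the sketch below contrasts `ast.literal_eval()` with `eval()` on a tuple string. The `parse_tuple` helper is a hypothetical stand-in and not the repository's `get_string_from_tuple`; it only shows the general technique.

```python
import ast


def parse_tuple(text: str) -> tuple:
    """Parse a string such as "('a', 'b')" into a tuple without executing code."""
    value = ast.literal_eval(text)  # accepts Python literals only
    if not isinstance(value, tuple):
        raise ValueError(f"expected a tuple literal, got {type(value).__name__}")
    return value


print(parse_tuple("('txt', 'img')"))

# An expression containing a function call is rejected instead of being run,
# whereas eval() would execute it.
try:
    parse_tuple("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print("non-literal input rejected:", rejected)
```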

Analysis of Pull Requests

The pull requests for the Stability-AI/generative-models repository reveal several key themes and ongoing development efforts:

Performance Improvements: There's a significant focus on optimizing the models, particularly SV4D, for better memory consumption and speed (#394). This indicates that the team is working on making the models more efficient and accessible to users with varying hardware capabilities.
New Features and Models: The addition of SV4D code (#384) and related updates (#391, #385, #386) shows that Stability AI is actively expanding its model offerings. The inclusion of Gradio demos (#310, #252) also suggests an effort to make the models more accessible and interactive for users.
Bug Fixes: Many PRs address various bugs, ranging from simple typos (#324) to more complex issues like array broadcasting errors (#364) and image input bugs (#321, #331). This ongoing maintenance is crucial for the reliability and usability of the codebase.
Cross-platform Compatibility: Efforts to improve Windows support (#284) and device flexibility (#49) show a commitment to making the models accessible across different platforms and hardware setups.
Code Quality and Maintenance: There's a clear trend towards improving code quality through linting (#52), refactoring (#76, #102), and modernizing Python practices (#225). The move towards using logging instead of print statements (#151) also indicates a more professional approach to code structure.
Documentation and Usability: Several PRs focus on improving documentation (#378) and adding examples (#60), which is essential for the project's adoption and user understanding.
CI/CD and Development Environment: Updates to CI workflows (#206) and the addition of dev container configs (#104) show efforts to improve the development and testing process.
Dependency Management: There's ongoing discussion and work on improving how dependencies are managed (#195, #193), which is crucial for a project of this scale and complexity.
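
Two of the portability themes above, configurable device names (#49) and falling back when an optional package is missing (#78), can be sketched without any GPU libraries. The environment variable name `EXAMPLE_DEVICE` and both helper functions are assumptions made for this example, not names taken from the pull requests.

```python
import importlib.util
import os
from typing import Mapping


def default_device(env: Mapping[str, str] = os.environ) -> str:
    """Return the device name from the environment, falling back to "cuda"."""
    return env.get("EXAMPLE_DEVICE", "cuda")  # hypothetical variable name


def attention_backend() -> str:
    """Prefer xformers when it is installed, otherwise use plain attention."""
    if importlib.util.find_spec("xformers") is not None:
        return "xformers"
    return "vanilla"


print(default_device({}))                         # no override set
print(default_device({"EXAMPLE_DEVICE": "cpu"}))  # override via environment
print(attention_backend())
```

Checking for the package with `importlib.util.find_spec` avoids importing a heavy optional dependency just to learn whether it exists.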

One area for improvement is PR handling: many older PRs, some more than a year old, remain open without clear resolution or recent activity, which suggests a bottleneck in review and merging. A more rigorous triage process could help keep the codebase clean and up to date.

Overall, the repository is under active development, with a good balance between adding new features, improving existing functionality, and maintaining code quality.

Report On: Fetch commits

Development Team and Recent Activity

Team Members and Activity

chunhanyao-stable:
- Merged PR #394 for SV4D changes, including memory optimization and Gradio demo addition
- Updated SV4D README and scripts
ymxie97:
- Worked on SV4D improvements:
  - Reduced memory consumption (40G to 20G) and improved speed (500s to 200s)
  - Added Gradio demo
  - Fixed README and made minor adjustments to scripts
- Collaborated with chunhanyao-stable on SV4D changes
Vikram Voleti (voletiv):
- Merged several PRs related to SV4D and SV3D
- Added SV4D code, including new modules, configs, and sampling scripts
- Made fixes to links and documentation
Andreas Blattmann:
- Updated SVD license
Tim Dockhorn (timudk):
- Removed deprecated scale_schedule_config

Patterns and Themes

Focus on SV4D (Stable Video 4D):
- Major development effort on improving and optimizing SV4D
- Addition of Gradio demo for easier testing and demonstration
- Significant memory and speed optimizations
Collaboration:
- Close collaboration between ymxie97 and chunhanyao-stable on SV4D improvements
- Vikram Voleti overseeing and merging PRs from team members
Documentation and User Experience:
- Frequent updates to README and documentation
- Addition of demo applications (e.g., Gradio) for easier model testing
Ongoing Optimization:
- Continuous efforts to reduce memory consumption and improve processing speed
Modular Development:
- Addition of new modules and configs for SV4D, indicating a modular approach to model development
Quality Assurance:
- Regular merging of PRs after review
- Fixing of links, typos, and minor issues in documentation

The development team has been primarily focused on improving and releasing the SV4D (Stable Video 4D) model, with significant efforts in optimization, user interface development, and documentation updates. The work appears to be collaborative, with different team members contributing to various aspects of the model's development and release preparation.