Stability AI's generative-models repository houses state-of-the-art models for tasks such as text-to-image and image-to-video generation. Recent work has centered on significant improvements to its Stable Video 4D (SV4D) model.
In the past month, the development team has made substantial progress on SV4D, halving its memory consumption from 40 GB to 20 GB and cutting processing time from 500 s to 200 s. The team also added a Gradio demo for testing and demonstrating SV4D's capabilities, indicating a push toward user-friendly interfaces for its advanced models.
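The relative improvements implied by these figures can be worked out directly. Halving memory is a 50% reduction. Cutting runtime from 500 s to 200 s is a 60% reduction in time, which corresponds to a 2.5× speedup. The small sketch below makes that distinction explicit; the helper names are illustrative only and are not part of the repository.

```python
def relative_change(before: float, after: float) -> float:
    """Fractional reduction from `before` to `after` (0.5 means halved)."""
    return (before - after) / before


def speedup(before_s: float, after_s: float) -> float:
    """How many times faster a job becomes when runtime drops from before_s to after_s."""
    return before_s / after_s


# Figures quoted for SV4D: memory 40 -> 20 (GB), runtime 500 s -> 200 s.
memory_reduction = relative_change(40, 20)   # fraction of memory saved
time_reduction = relative_change(500, 200)   # fraction of runtime saved
speed_factor = speedup(500, 200)             # throughput multiplier

print(f"memory reduction: {memory_reduction:.0%}")
print(f"runtime reduction: {time_reduction:.0%}")
print(f"speedup: {speed_factor:.1f}x")
```

A 60% cut in runtime therefore means the model runs 2.5 times as fast. This is a 150% increase in throughput, not a 60% one.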
Recent pull requests and issues cluster around SV4D improvements, documentation updates, and ongoing optimization efforts. PR #394, merged 15 days ago, encapsulates many of these changes, including the memory and speed optimizations for SV4D and the addition of the Gradio demo.
Team members with recent activity, in reverse chronological order: chunhanyao-stable, ymxie97, Vikram Voleti (voletiv), Andreas Blattmann, and Tim Dockhorn (timudk).
The improvement in SV4D performance (a 50% reduction in memory use and a 60% reduction in processing time, i.e. roughly a 2.5× speedup) represents a significant leap in model efficiency.
The addition of a Gradio demo for SV4D suggests a focus on making advanced models more accessible to a wider audience.
There's an ongoing effort to improve documentation and fix minor issues, indicating a commitment to user experience and code quality.
The repository maintains a modular, config-driven approach to model development, allowing for flexible combination of submodules.
Despite active development on newer models like SV4D, there's continued maintenance and updates for existing models such as SVD and SDXL, showing a commitment to the entire model ecosystem.
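Config-driven setups of this kind commonly follow a target/params convention, as in latent-diffusion-style YAML configs. Each submodule is named by its import path and built at runtime, so swapping a component means editing the config rather than the code. The sketch below shows that general pattern using only the standard library. It assumes that convention and is an illustration, not the repository's implementation.

```python
import importlib
from typing import Any, Mapping


def get_obj_from_str(path: str) -> Any:
    """Resolve a dotted path such as 'collections.OrderedDict' to the object it names."""
    module_name, attr_name = path.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


def instantiate_from_config(config: Mapping[str, Any]) -> Any:
    """Build an object from a {'target': dotted.path, 'params': {...}} mapping."""
    if "target" not in config:
        raise KeyError("config must contain a 'target' key")
    cls = get_obj_from_str(config["target"])
    # Missing 'params' means "construct with defaults".
    return cls(**config.get("params", {}))


# Swapping the submodule only requires editing this mapping, not the code.
config = {
    "target": "fractions.Fraction",
    "params": {"numerator": 3, "denominator": 4},
}
obj = instantiate_from_config(config)
print(obj)  # 3/4
```

In a real model config, `target` would point at an encoder, denoiser, or sampler class and `params` would hold its constructor arguments. Nested configs can be instantiated recursively in the same way.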
Developer | Branches | PRs | Commits | Files | Changes |
---|---|---|---|---|---|
Vikram Voleti | 2 | 0/0/0 | 4 | 18 | 3224 |
ymxie97 | 1 | 1/1/0 | 7 | 38 | 928 |
chunhanyao-stable | 1 | 2/1/1 | 2 | 7 | 202 |
Vikram Voleti | 0 | 3/3/0 | 0 | 0 | 0 |
Devansh Bisla (devansh20la) | 0 | 0/0/1 | 0 | 0 | 0 |
PRs: pull requests created by that developer, shown as opened/merged/closed-unmerged during the period.
Timespan | Opened | Closed | Comments | Labeled | Milestones |
---|---|---|---|---|---|
7 Days | 2 | 0 | 0 | 2 | 1 |
30 Days | 11 | 7 | 20 | 11 | 1 |
90 Days | 30 | 9 | 25 | 30 | 1 |
1 Year | 237 | 38 | 483 | 237 | 1 |
All Time | 292 | 52 | - | - | - |
Like all software activity quantification, these numbers are imperfect but sometimes useful. Comments, Labels, and Milestones refer to those issues opened in the timespan in question.
Here is a brief analysis of the GitHub issues for the Stability AI generative-models repository:
Recent GitHub issue activity shows ongoing interest in and development of Stability AI's generative models, particularly around video generation capabilities like Stable Video Diffusion (SVD) and SV3D/SV4D. There is frequent discussion about running these models, troubleshooting errors, and requests for additional features or clarifications.
Notable themes: many issues relate to setup and environment problems, reflecting the complexity of running these large models, and there is significant interest in fine-tuning and customizing the models for specific use cases.
Recently created and updated issues highlight ongoing work on model improvements, efforts to optimize resource usage, and interest in applying the models to various datasets and tasks.
The dataset contains information on 43 open pull requests for the Stability-AI/generative-models repository, covering various improvements, fixes, and new features for their generative AI models.
#394 (15 days ago): Merged changes to reduce memory consumption and speed up SV4D, added Gradio demo.
#391 (17 days ago): Updated README and sampling script for SV4D, not merged.
#386 (24 days ago): Fixed SV3D link in README.
#385 (24 days ago): Fixed links in README for SV4D.
#384 (24 days ago): Added SV4D code, including new scripts, configs, and modules.
#378 (38 days ago): Added documentation for Python path issue in streamlit demos.
#364 (71 days ago): Fixed array broadcasting error in video sampling script.
#331 (145 days ago): Fixed an unassigned-variable bug in the video sampling script for JPEG images.
#327 (148 days ago): Fixed video writing issue using OpenCV instead of imageio.
#324 (148 days ago): Fixed typo in attention.py.
#321 (149 days ago): Fixed SVD image input bug and suggested using torchvision for video saving.
#319 (149 days ago): Added imageio-ffmpeg and pyav modules to requirements to fix corrupted video issue.
#310 (151 days ago): Added Gradio updates for SV3D.
#284 (179 days ago): Removed Triton requirement when running on Windows.
#278 (198 days ago): Added an unrelated image file.
#276 (201 days ago): Fixed grid initialization in img2img example.
#253 (242 days ago): Added simple import test to build script.
#252 (243 days ago): Added Hugging Face Gradio demo link to README.
#245 (248 days ago): Fixed IdentityFirstStage class to include encoder and decoder attributes.
#244 (249 days ago): Fixed undefined 'grid' variable in do_img2img() function.
#225 (256 days ago): Replaced string concatenations with f-strings in main.py.
#206 (261 days ago): Updated CI workflow to remove non-existent requirements file.
#195 (263 days ago): Proposed adoption of pip-compile for managing requirements.
#193 (263 days ago): Simplified requirements file management.
#183 (267 days ago): Removed more Torch version comparisons after dropping support for PyTorch < 2.0.
#165 (268 days ago): Adjusted video sampling demo to handle cases where width and height are less than 256.
#151 (269 days ago): Replaced remaining print() calls with logging calls in the library.
#150 (269 days ago): Replaced deprecated Logger.warn with Logger.warning.
#147 (269 days ago): Removed duplicate get_interactive_image function.
#146 (269 days ago): Proposed removal of star imports for better static analysis.
#104 (379 days ago): Added dev container configs for CPU and GPU use cases.
#103 (379 days ago): Added simple import test to build script, revealing issues with pt13.
#102 (379 days ago): Refactored helpers and Streamlit demo, adding new features and improving defaults.
#90 (385 days ago): Added visualization capabilities to display output images during testing.
#89 (385 days ago): Modified watermark detection code to use ThreadPoolExecutor for parallelization.
#81 (387 days ago): Added minimal demo (no details provided).
#79 (387 days ago): Updated minimal txt2img example using new APIs.
#78 (387 days ago): Proposed fallback to vanilla attention if xformers is not available.
#76 (387 days ago): Deduplicated sampling/demo code and refactored helpers.
#60 (389 days ago): Added simple, minimal txt2img command line example tool.
#54 (395 days ago): Proposed using ast.literal_eval() instead of eval() for safety in get_string_from_tuple.
#52 (395 days ago): Configured Ruff and Black for linting and formatting.
#50 (395 days ago): Proposed late-importing scipy to allow more minimal inference requirements.
#49 (395 days ago): Un-hardcoded "cuda" as default device name, allowing configuration via environment variable.
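PR #54 proposes replacing `eval()` with `ast.literal_eval()`. The difference matters because `literal_eval` only accepts literal syntax (tuples, lists, numbers, strings, and so on), so a crafted string cannot execute code. The sketch below illustrates the technique with a hypothetical helper, `parse_literal`; it is not the repository's `get_string_from_tuple`, whose exact behavior is not described here.

```python
import ast
from typing import Any


def parse_literal(text: str) -> Any:
    """Safely parse a Python literal (tuple, list, number, string, ...) from text.

    Hypothetical helper for illustration. ast.literal_eval only accepts literal
    syntax, so function calls or attribute access raise ValueError instead of
    being executed, unlike eval().
    """
    return ast.literal_eval(text)


print(parse_literal("('a', 'b')"))   # ('a', 'b')
print(parse_literal("(512, 512)"))   # (512, 512)

try:
    parse_literal("__import__('os').getcwd()")
except ValueError as exc:
    print("rejected:", type(exc).__name__)
```

With `eval()`, the last input would have imported `os` and run a function. `literal_eval` refuses it outright.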
The pull requests for the Stability-AI/generative-models repository reveal several key themes and ongoing development efforts:
Performance Improvements: There's a significant focus on optimizing the models, particularly SV4D, for better memory consumption and speed (#394). This indicates that the team is working on making the models more efficient and accessible to users with varying hardware capabilities.
New Features and Models: The addition of SV4D code (#384) and related updates (#391, #385, #386) shows that Stability AI is actively expanding its model offerings. The inclusion of Gradio demos (#310, #252) also suggests an effort to make the models more accessible and interactive for users.
Bug Fixes: Many PRs address various bugs, ranging from simple typos (#324) to more complex issues like array broadcasting errors (#364) and image input bugs (#321, #331). This ongoing maintenance is crucial for the reliability and usability of the codebase.
Cross-platform Compatibility: Efforts to improve Windows support (#284) and device flexibility (#49) show a commitment to making the models accessible across different platforms and hardware setups.
Code Quality and Maintenance: There's a clear trend towards improving code quality through linting (#52), refactoring (#76, #102), and modernizing Python practices (#225). Replacing print statements with logging calls (#151) also makes library output easier to configure and filter.
Documentation and Usability: Several PRs focus on improving documentation (#378) and adding examples (#60), which is essential for the project's adoption and user understanding.
CI/CD and Development Environment: Updates to CI workflows (#206) and the addition of dev container configs (#104) show efforts to improve the development and testing process.
Dependency Management: There's ongoing discussion and work on improving how dependencies are managed (#195, #193), which is crucial for a project of this scale and complexity.
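Two of the changes above fit together naturally: making the device name configurable through an environment variable (#49), and routing messages through `logging` instead of `print()`, using `Logger.warning` rather than the deprecated `Logger.warn` (#151, #150). The following is a minimal sketch under stated assumptions. The variable name `SGM_DEVICE` and the helper `resolve_device` are hypothetical. The function returns a device string rather than importing PyTorch so that it runs anywhere.

```python
import logging
import os

logger = logging.getLogger(__name__)

# Hypothetical variable name chosen for this sketch; the actual PR may use another.
DEVICE_ENV_VAR = "SGM_DEVICE"
DEFAULT_DEVICE = "cuda"
KNOWN_DEVICE_TYPES = {"cpu", "cuda", "mps"}


def resolve_device(default: str = DEFAULT_DEVICE) -> str:
    """Return the device name from the environment, falling back to `default`."""
    device = os.environ.get(DEVICE_ENV_VAR, "").strip()
    if not device:
        logger.info("%s not set; using default device %r", DEVICE_ENV_VAR, default)
        return default
    # Accept indexed names such as "cuda:1" by checking only the type prefix.
    if device.split(":")[0] not in KNOWN_DEVICE_TYPES:
        # Logger.warning replaces the deprecated Logger.warn alias.
        logger.warning("Unrecognized device %r in %s; passing it through", device, DEVICE_ENV_VAR)
    return device


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    os.environ[DEVICE_ENV_VAR] = "cpu"
    print(resolve_device())  # cpu
```

Keeping the default at `"cuda"` preserves existing behavior, while CPU-only or Apple-silicon users can override it without code changes.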
One notable area of potential improvement is the handling of PRs. Many older PRs remain open without clear resolution or recent activity. Implementing a more rigorous PR review and merge process could help maintain a cleaner, more up-to-date codebase.
The repository is clearly under active development, with a good balance between adding new features, improving existing functionality, and maintaining code quality. However, the large number of open PRs suggests a bottleneck in the review and merge process, which could slow the project's progress.
Key themes in this work: focus on SV4D (Stable Video 4D), collaboration, documentation and user experience, ongoing optimization, modular development, and quality assurance.
The development team has been primarily focused on improving and releasing the SV4D (Stable Video 4D) model, with significant efforts in optimization, user interface development, and documentation updates. The work appears to be collaborative, with different team members contributing to various aspects of the model's development and release preparation.