STORM is a Python-based system developed by Stanford OVAL for automated knowledge curation using LLMs, producing Wikipedia-like articles. Co-STORM extends this with human-AI collaboration. The project is active and evolving, with recent enhancements in retrieval methods and presentations at major conferences.
Issues:
Timespan | Opened | Closed | Comments | Labeled | Milestones |
---|---|---|---|---|---|
7 Days | 5 | 1 | 0 | 5 | 1 |
30 Days | 14 | 7 | 10 | 14 | 1 |
90 Days | 58 | 59 | 100 | 54 | 1 |
All Time | 133 | 102 | - | - | - |
Like all software activity quantification, these numbers are imperfect but sometimes useful. Comments, Labels, and Milestones refer to those issues opened in the timespan in question.
Developer | Branches | PRs | Commits | Files | Changes |
---|---|---|---|---|---|
patrick@cryptolock.ai | 1 | 0/0/0 | 2 | 2 | 65 |
Yijia Shao | 1 | 0/0/0 | 2 | 1 | 6 |
Eminem | 1 | 2/1/1 | 2 | 1 | 6 |
Jaigouk Kim (jaigouk) | 0 | 0/0/1 | 0 | 0 | 0 |
dFusion Dev (dfusion-dev) | 0 | 0/1/0 | 0 | 0 | 0 |
PRs: created by that dev and opened/merged/closed-unmerged during the period
Risk | Level (1-5) | Rationale |
---|---|---|
Delivery | 3 | The project faces moderate delivery risks due to a backlog of unresolved issues and prolonged open pull requests. For instance, PR #192 has been open for 28 days, and PR #155 for 59 days, indicating potential bottlenecks in the review process. Additionally, unresolved mobile UI issues (#241) could impact user experience if not addressed promptly. |
Velocity | 3 | The project's velocity is moderate, with limited commit activity and prolonged open pull requests. The recent commit activity shows minor updates and bug fixes, primarily by a few developers like Yijia Shao and Eminem. The lack of diverse branch activity and minimal pull request closures suggest potential bottlenecks in development processes. |
Dependency | 2 | Dependency risks are relatively low due to the modular architecture that supports various language models and retrieval modules. However, integration challenges with APIs (#171) highlight areas needing attention. The introduction of multiple retrievers in PR #155 could mitigate some dependency risks. |
Team | 3 | Team risks are moderate, with limited direct contributions from some team members and potential communication challenges indicated by prolonged open pull requests. The lack of comments on recent issues suggests limited discussion or collaboration, which could lead to misunderstandings or misaligned priorities. |
Code Quality | 3 | Code quality is maintained through regular formatting updates and minor refactoring efforts. However, minor issues like typos (#242) and unused parameters in PR #155 suggest areas for improvement. The focus on small bug fixes rather than substantial refactoring might indicate accumulating technical debt. |
Technical Debt | 3 | Technical debt is moderate, with ongoing maintenance efforts but limited substantial refactoring or architectural changes. The accumulation of unresolved issues over time indicates potential underlying problems that need addressing to prevent future complications. |
Test Coverage | 3 | Test coverage appears moderate, with some validation efforts mentioned in pull requests like PR #192. However, the lack of detailed testing practices in the README and limited coverage on new features suggest potential gaps in automated testing. |
Error Handling | 2 | Error handling is relatively robust, as seen in efforts to resolve specific bugs like NoneType errors in PR #192. However, ongoing integration challenges with external libraries (#231) highlight areas where error handling could be improved. |
Recent GitHub issue activity for the STORM project shows a mix of bug reports, feature requests, and questions. Notably, there are several issues related to UI/UX problems on mobile devices (#241) and integration challenges with various APIs and models (#171, #133). A recurring theme is the need for better documentation and support for contributors (#93), as well as requests for multilingual support (#170, #169).
Missing Critical Information: Some issues highlight missing or unclear documentation, such as contribution guidelines (#93) and API integration details (#120).
Unaddressed Urgent Issues: Mobile UI issues (#241) remain unresolved, affecting user experience.
Common Themes: Many issues focus on integration with different language models and retrieval systems, indicating a strong community interest in expanding STORM's capabilities.
Human-AI Collaboration: Several issues discuss the collaborative features of Co-STORM, reflecting its significance in the project's evolution.
#242: [BUG] Minor Typo in engine.py
Line 261 – Incorrect Parameter Annotation
#241: [BUG] Frontend Mobile UI issues
#170: [Umbrella Issue] Multilingual Support
#239: [Question] How to change article style
These issues illustrate ongoing efforts to enhance STORM's functionality and accessibility, with particular attention to multilingual capabilities and user interface improvements.
… article_generation.py and storm_dataclass.py. It includes minor code adjustments and adds several demo files.
PR #218: Fixed a broken README link. A minor but necessary update for documentation accuracy.
PR #213: Refactored entity extraction logic, improving code efficiency.
PR #185: Integrated Co-STORM features, marking significant progress towards collaborative knowledge curation.
Long-standing Open PRs: Several open PRs have been pending for extended periods, suggesting possible bottlenecks in the review process or prioritization issues.
Closed Without Merge: Some PRs were closed without merging, often due to redirection or scope adjustments. This can lead to duplicated efforts if not managed carefully.
Active Maintenance and Updates: The project shows active maintenance with regular bug fixes and feature enhancements, reflecting a dynamic development environment.
Community Engagement: With numerous contributors and varied enhancements, the project benefits from an engaged community but may face challenges in managing diverse contributions effectively.
Documentation and Communication: Ensuring clear communication about pending updates (e.g., major changes affecting README) can help manage contributor expectations and streamline integration processes.
Overall, the project demonstrates robust development activity with ongoing improvements and community contributions, though it could benefit from more streamlined processes for handling long-standing pull requests.
knowledge_storm/rm.py
… requests for HTTP requests and os for environment variables. It uses backoff for retry logic, which is a good practice for handling transient errors.
… YouRM, BingSearch, etc.) are defined to handle different retrieval mechanisms. Each class extends dspy.Retrieve, indicating a consistent design pattern.
… forward are well-documented, explaining their purpose and return values clearly.
… is_valid_source) is efficient.
examples/storm_examples/run_storm_wiki_gpt.py
knowledge_storm/collaborative_storm/engine.py
… CollaborativeStormLMConfigs and CoStormRunner.
… @dataclass) for configuration management, which simplifies initialization and enhances readability.
knowledge_storm/interface.py
… InformationTable, Retriever, and modules related to article generation. This promotes extensibility.
… @abstractmethod) enforces implementation in derived classes, ensuring consistency across different modules.
requirements.txt
… dspy_ai==2.4.9), ensuring compatibility.
The codebase demonstrates a high level of organization and adherence to software engineering best practices. It uses modular design patterns extensively, allowing easy customization and extension. Documentation within the code is thorough, aiding in understanding complex interactions between components. However, given the size of some files, consider refactoring into smaller modules to enhance maintainability further.
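The patterns this review points to (an abstract base class enforced with @abstractmethod, a @dataclass for configuration, and retries with exponential backoff) can be illustrated with a minimal standard-library sketch. Everything named here is hypothetical: RetrieverConfig, BaseRetriever, FlakyRetriever, and retry_with_backoff are not STORM classes, and the retry helper is a hand-written stand-in for the idea, not the backoff library's API.

```python
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RetrieverConfig:
    """Hypothetical settings object; field names are illustrative only."""
    top_k: int = 3
    max_tries: int = 4
    base_delay: float = 0.5  # seconds to wait before the first retry


class BaseRetriever(ABC):
    """Hypothetical stand-in for a shared retriever base class."""

    def __init__(self, config: RetrieverConfig):
        self.config = config

    @abstractmethod
    def forward(self, query: str) -> List[str]:
        """Return up to config.top_k result strings for the query."""


def retry_with_backoff(func: Callable[[], List[str]], max_tries: int,
                       base_delay: float,
                       sleep: Callable[[float], None] = time.sleep):
    """Call func until it succeeds, doubling the delay after each failure."""
    for attempt in range(max_tries):
        try:
            return func()
        except ConnectionError:
            if attempt == max_tries - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


class FlakyRetriever(BaseRetriever):
    """Toy retriever whose backend fails a fixed number of times."""

    def __init__(self, config: RetrieverConfig, failures_before_success: int = 2):
        super().__init__(config)
        self._remaining_failures = failures_before_success

    def _search(self, query: str) -> List[str]:
        # Simulate a transient network error on the first few calls.
        if self._remaining_failures > 0:
            self._remaining_failures -= 1
            raise ConnectionError("simulated transient error")
        return [f"{query} result {i}" for i in range(10)]

    def forward(self, query: str) -> List[str]:
        results = retry_with_backoff(
            lambda: self._search(query),
            self.config.max_tries,
            self.config.base_delay,
            sleep=lambda _seconds: None,  # skip real waiting in this demo
        )
        return results[: self.config.top_k]
```

Instantiating BaseRetriever directly raises TypeError because forward is abstract, which is how the abstract-method pattern forces each concrete retriever to supply its own implementation. Passing the sleep function as a parameter is a design choice made for this sketch so the retry schedule can be checked without real delays.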
Yijia Shao (shaoyijia)
… main and dev-chinese branches.
Eminem (zhoucheng89)
Patrick (patrick@cryptolock.ai)
Adam Montgomery (montasaurus)
Hagen Hübel (itinance)
Bug Fixes and Maintenance: Recent activities have focused on fixing bugs, such as the SerperRM issue, and maintaining documentation by correcting typos and broken links.
Feature Enhancements: There is ongoing work to enhance features, such as adding support for AzureAISearch and refining existing modules like SerperRM.
Collaboration: Yijia Shao appears to be a central figure in collaboration, working closely with other developers like zhoucheng89.
Branch Activity: The main branch is the primary focus for recent commits, with some activity in feature-specific branches like dev-chinese.
The development team is actively engaged in both bug fixes and feature enhancements. Collaboration is evident among team members, particularly around resolving critical bugs. The focus seems to be on ensuring stability while incrementally adding new features.