Haystack, an end-to-end framework for building applications powered by large language models, has recently seen a notable increase in development activity, particularly around enhancing asynchronous capabilities and improving documentation. The project is spearheaded by deepset-ai and is widely used for tasks such as retrieval-augmented generation and semantic search.
Recent efforts have concentrated on resolving compatibility issues, enhancing user experience through better documentation, and introducing asynchronous execution in components to improve performance. The development team has been actively addressing bugs, implementing new features, and refining existing functionalities to ensure the framework remains robust and user-friendly.
Recent issues and pull requests highlight a concerted effort to address compatibility concerns and enhance functionality. Notable issues include #8284, regarding dependency conflicts with ChromaDB, and #8280, a bug in which component.set_input_types() accepts inputs that do not exist on the component. These issues point to ongoing challenges with dependency management and component reliability.
The development team has been active in both feature development and maintenance tasks:
Work on sentence_window_retriever.py with linting fixes, plus contributions to the DocumentBuilder branch.
Development of the min_top_k feature, alongside extensive refactoring.
Work on ChatPromptBuilder.
Overall, Haystack's recent activity reflects a dynamic development phase focused on performance improvements, feature expansion, and community-driven enhancements, positioning it well for continued growth and adaptation in AI-driven applications.
Timespan | Opened | Closed | Comments | Labeled | Milestones |
---|---|---|---|---|---|
7 Days | 23 | 11 | 28 | 6 | 1 |
30 Days | 103 | 87 | 93 | 47 | 3 |
90 Days | 241 | 184 | 317 | 83 | 6 |
1 Year | 326 | 200 | 482 | 89 | 7 |
All Time | 3479 | 3345 | - | - | - |
Like all software activity quantification, these numbers are imperfect but sometimes useful. Comments, Labels, and Milestones refer to those issues opened in the timespan in question.
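As a quick worked check on the table, subtracting Closed from Opened gives the net change in the issue backlog for each window. This assumes Closed counts issues closed during the same window, which the table does not define explicitly; the All Time row (3479 - 3345 = 134) does agree with the 134 open issues cited later in this report.

```python
# Issue counts copied from the table above: (opened, closed) per timespan.
# Assumption: "Closed" counts issues closed within the same window.
counts = {
    "7 Days": (23, 11),
    "30 Days": (103, 87),
    "90 Days": (241, 184),
    "1 Year": (326, 200),
    "All Time": (3479, 3345),
}

# Net growth of the backlog: issues opened minus issues closed.
net = {span: opened - closed for span, (opened, closed) in counts.items()}

if __name__ == "__main__":
    for span, delta in net.items():
        print(f"{span:>8}: {delta:+d}")
```

Every window shows more issues opened than closed, so the backlog grew in each period under this reading of the columns.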
Developer | Branches | PRs | Commits | Files | Changes |
---|---|---|---|---|---|
David S. Batista | 3 | 4/2/1 | 28 | 18 | 1370 |
Silvano Cerza | 5 | 5/4/0 | 16 | 21 | 1350 |
Madeesh Kannan | 3 | 2/2/0 | 8 | 15 | 1086 |
Stefano Fiorucci | 3 | 9/8/0 | 10 | 29 | 856 |
Vladimir Blagojevic | 3 | 5/3/2 | 14 | 14 | 830 |
Daria Fokina | 1 | 17/17/0 | 17 | 20 | 812 |
Agnieszka Marzec | 1 | 17/18/0 | 18 | 20 | 694 |
Sebastian Husch Lee | 1 | 2/2/0 | 2 | 12 | 382 |
Amna Mubashar | 3 | 4/3/1 | 4 | 9 | 244 |
Nicola Procopio | 1 | 1/2/0 | 2 | 8 | 165 |
Mo Sriha (medsriha) | 1 | 1/0/0 | 4 | 3 | 131 |
Tim Wellbrock | 1 | 1/1/0 | 1 | 3 | 126 |
Corentin Meyer | 1 | 0/2/0 | 2 | 6 | 88 |
Jon Strutz | 1 | 1/1/0 | 1 | 4 | 57 |
Marie-Luise Klaus | 1 | 2/2/0 | 2 | 6 | 50 |
Tobias Wochinger | 1 | 1/1/0 | 1 | 2 | 6 |
dependabot[bot] | 1 | 2/2/0 | 2 | 3 | 6 |
Souf G | 1 | 1/1/0 | 1 | 1 | 2 |
Haystack Bot | 1 | 1/1/0 | 1 | 1 | 2 |
Ulises M (lbux) | 0 | 1/0/0 | 0 | 0 | 0 |
None (jlonge4) | 0 | 1/0/1 | 0 | 0 | 0 |
None (jpatra72) | 0 | 1/0/0 | 0 | 0 | 0 |
Ikko Eltociear Ashimine (eltociear) | 0 | 1/0/0 | 0 | 0 | 0 |
Carlos Fernández (CarlosFerLo) | 0 | 1/0/1 | 0 | 0 | 0 |
keval dekivadiya (kevaldekivadiya2415) | 0 | 1/0/1 | 0 | 0 | 0 |
PRs: pull requests created by that developer, given as opened/merged/closed-unmerged during the period.
The GitHub repository for the Haystack project has seen a notable uptick in activity, with 134 open issues currently being tracked. Recent discussions indicate a focus on improving compatibility and functionality, particularly regarding integration with various models and document stores. A significant number of issues revolve around bugs, documentation improvements, and feature requests, reflecting an active engagement from the community.
Several issues highlight critical concerns, such as compatibility problems between different versions of dependencies (e.g., sentence-transformers), which could lead to runtime errors. Additionally, there are multiple discussions about enhancing the documentation to clarify usage and implementation details for various components, indicating that users may be struggling with the current state of the documentation.
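Issue #8284 involves farm-haystack and haystack-ai coexisting in one environment, so a first diagnostic step is to see which distributions are actually installed. The following is a minimal sketch using the standard library's importlib.metadata; it is not a Haystack tool, the helper names are hypothetical, and it assumes that having both packages installed is a conflict because both provide the top-level haystack module.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {distribution name: installed version, or None if absent}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


def has_conflicting_haystacks(versions):
    """Flag environments where both the 1.x and 2.x distributions are present.

    Assumption: farm-haystack (1.x) and haystack-ai (2.x) both install the
    top-level `haystack` package, so keeping both makes imports ambiguous.
    """
    return versions.get("farm-haystack") is not None and versions.get("haystack-ai") is not None


if __name__ == "__main__":
    found = installed_versions(["farm-haystack", "haystack-ai", "chromadb"])
    print(found)
    if has_conflicting_haystacks(found):
        print("Both farm-haystack and haystack-ai are installed; keep only one.")
```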
Issue #8284: ChromaDB stuck (a dependency conflict involving farm-haystack and haystack-ai, leading to confusion about which version to use)
Issue #8281: 🧪 Tools: experiment with different use cases
Issue #8280: component.set_input_types() allows non-existing inputs
Issue #8279: SentenceWindowRetriever: option to return docs instead of merged text
Issue #8276: Option to enable structured outputs with OpenAI Generators
Issue #8255: OpenAIGenerator uses the chat_completions endpoint, causing errors
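Issue #8279 asks for SentenceWindowRetriever to optionally return the surrounding documents rather than their merged text. This is a simplified, hypothetical sketch of that choice in plain Python: the Chunk type, the sentence_window function, and its return_documents flag are illustrative names rather than Haystack's API, and chunks are joined with a space instead of handling overlapping text as a real retriever would.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    split_id: int  # position of the chunk within its source document
    content: str


def sentence_window(chunks, hit_split_id, window_size=1, return_documents=False):
    """Return the neighbours of a retrieved chunk, either merged or as a list.

    Hypothetical sketch: joins chunks with a space and ignores any overlap
    between adjacent chunks.
    """
    lo, hi = hit_split_id - window_size, hit_split_id + window_size
    window = sorted((c for c in chunks if lo <= c.split_id <= hi), key=lambda c: c.split_id)
    if return_documents:
        return window  # the option requested in the issue: keep chunks separate
    return " ".join(c.content for c in window)


if __name__ == "__main__":
    chunks = [Chunk(i, s) for i, s in enumerate(["A.", "B.", "C.", "D."])]
    print(sentence_window(chunks, 2))
    print(sentence_window(chunks, 2, return_documents=True))
```

With the four chunks above and a hit at split_id 2, the merged form is "B. C. D.", while return_documents=True yields the three Chunk objects with their boundaries and metadata intact.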
Medium Priority Issues (P2):
Issue #8280: a bug that lets set_input_types() declare inputs that do not exist on the component.
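To make the #8280 bug concrete, the missing check is that every declared input name matches a parameter of the component's run method. This sketch does that with inspect.signature; validate_declared_inputs and the Greeter class are hypothetical, and treating a run method with **kwargs as accepting any name is an assumption of the sketch, not documented Haystack behavior.

```python
import inspect


def validate_declared_inputs(run_fn, declared):
    """Raise ValueError if `declared` names inputs that `run_fn` cannot accept.

    Hypothetical helper illustrating the check issue #8280 calls for;
    it is not Haystack's implementation.
    """
    params = inspect.signature(run_fn).parameters  # bound method: `self` excluded
    # A run method with **kwargs can, by construction, accept any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return None
    unknown = sorted(set(declared) - set(params))
    if unknown:
        raise ValueError(f"run() has no parameter(s) named: {', '.join(unknown)}")
    return None


class Greeter:
    def run(self, name: str):
        return {"greeting": f"Hello, {name}"}


if __name__ == "__main__":
    validate_declared_inputs(Greeter().run, {"name": str})  # accepted
    try:
        validate_declared_inputs(Greeter().run, {"name": str, "nmae": str})
    except ValueError as err:
        print(err)  # run() has no parameter(s) named: nmae
```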
Documentation Issues: Several issues (e.g., #8262, #8261) focus on improving documentation clarity regarding component usage and expected parameters.
This analysis highlights both the challenges faced by users and the proactive steps being taken by contributors to enhance the framework's usability and functionality.
The analysis covers the latest pull requests (PRs) from the deepset-ai/haystack repository, focusing on their state, proposed changes, and significance within the context of the project. The repository currently has 15 open PRs and a history of closed PRs that demonstrate ongoing development and community engagement.
PR #8285: chore: update pipeline.py
PR #8283: initial import
PR #8279: feat: Extend core component machinery to support an async run method
PR #8256: fix: 1.x - nltk upgrade, use nltk.download('punkt_tab')
PR #8244: feat: Expose default_headers and add kwargs for Azure Client
PR #8233: feat: Add current date in UTC to PromptBuilder
PR #8193: feat: Adds support for zero-shot document classification (#7669)
PR #8176: feat: Add unsafe init arg in ConditionalRouter and OutputAdapter
PR #8079: feat: Added JSONToDocument component in converter components
Various PRs focused on cleaning up docstrings and improving documentation across different components (e.g., PRs #8229 and #8219). These changes help keep the codebase clear and usable as it evolves.
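PR #8176 adds an unsafe init argument to ConditionalRouter and OutputAdapter, both of which produce outputs from Jinja2 templates. A common safe default for turning rendered template text into Python values is ast.literal_eval, which accepts only literals and rejects arbitrary expressions. The sketch shows that pattern with a hypothetical helper; whether these components use exactly this mechanism is an assumption here, not something this report states.

```python
import ast


def parse_rendered_output(rendered: str):
    """Turn a template's rendered string into a Python value, safely.

    Hypothetical helper. ast.literal_eval accepts only literals (strings,
    numbers, tuples, lists, dicts, sets, booleans, None), so function calls
    and other expressions are rejected; in that case the raw string is kept.
    """
    try:
        return ast.literal_eval(rendered)
    except (ValueError, SyntaxError):
        return rendered


if __name__ == "__main__":
    print(parse_rendered_output("[1, 2, 3]"))  # a real list
    print(parse_rendered_output("__import__('os').getcwd()"))  # left as a string, never executed
```

The trade-off an opt-in flag like unsafe addresses is that literal-only parsing cannot produce richer Python objects; allowing them means trusting the template author.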
The recent activity in the deepset-ai/haystack repository highlights several key themes:
Enhancements to Asynchronous Capabilities: The introduction of asynchronous methods (as seen in PR #8279) is a significant step towards improving performance, especially as applications scale and require non-blocking operations.
Dependency Management and Upgrades: Several PRs focus on upgrading dependencies (e.g., PR #8256). This is critical for maintaining security and compatibility with external libraries, particularly NLP tools like NLTK, whose upgrade in PR #8256 required switching to nltk.download('punkt_tab').
Feature Additions: New features such as zero-shot classification (PR #8193) and enhancements to existing components (like the Azure Client) reflect a commitment to expanding the functionality of Haystack, making it more versatile for users across different domains.
Documentation Improvements: A notable number of PRs are dedicated to cleaning up docstrings and enhancing documentation (e.g., PRs #8227, #8219). This is essential for fostering community contributions and ensuring that new users can effectively utilize the framework without extensive onboarding.
Community Engagement: The repository shows active discussions among contributors regarding best practices, such as handling deprecated methods (e.g., PR #8146) and ensuring backward compatibility (e.g., PR #8176). This collaborative spirit is vital for sustaining an open-source project.
Backwards Compatibility vs New Features: There is a balancing act between introducing new features and maintaining backward compatibility, as seen with the deprecation of certain methods while providing alternatives (e.g., PR #8206).
Testing Focus: Many recent PRs include unit tests or mention testing as part of their changes, indicating a strong emphasis on quality assurance within the development process.
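The performance benefit attributed to PR #8279's async run method comes from overlapping waits: while one component awaits I/O, others can proceed. This sketch uses only asyncio, with a hypothetical SlowComponent; its run_async name illustrates the idea and is not presented as Haystack's actual API, and asyncio.sleep stands in for a network call.

```python
import asyncio
import time


class SlowComponent:
    """Hypothetical component exposing both a blocking and an async entry point."""

    def __init__(self, name, delay):
        self.name, self.delay = name, delay

    def run(self, query):
        time.sleep(self.delay)  # blocks the calling thread for the whole wait
        return {"result": f"{self.name}:{query}"}

    async def run_async(self, query):
        await asyncio.sleep(self.delay)  # yields to the event loop while waiting
        return {"result": f"{self.name}:{query}"}


async def run_concurrently(components, query):
    # Independent components overlap their waits instead of queueing.
    return await asyncio.gather(*(c.run_async(query) for c in components))


if __name__ == "__main__":
    comps = [SlowComponent("a", 0.2), SlowComponent("b", 0.2)]
    start = time.perf_counter()
    results = asyncio.run(run_concurrently(comps, "q"))
    print(results, f"{time.perf_counter() - start:.2f}s")  # roughly 0.2s, not 0.4s
```

Calling run on each component in turn would take the sum of the delays; gathering the async versions takes roughly the longest single delay.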
In summary, the pull requests reflect a robust development cycle characterized by feature enhancement, dependency management, community collaboration, and a strong focus on documentation and testing practices. This positions Haystack well for future growth and adaptation in an evolving landscape of AI applications.
David S. Batista: work on sentence_window_retriever.py with linting fixes and added release notes; multiple commits on the DocumentBuilder branch focusing on tests, refactoring, and code cleanup.
Souf G (gsouf)
Stefano Fiorucci (anakin87): made DOCXToDocument JSON serializable.
Jon Strutz (jonstrutz11)
Sebastian Husch Lee (sjrl): added the min_top_k feature to the TopPSampler.
Daria Fokina (dfokina)
Madeesh Kannan (shadeMe)
Agnieszka Marzec (agnieszka-m)
Vladimir Blagojevic (vblagoje)
Silvano Cerza (silvanocerza)
Marie-Luise Klaus (faymarie): work on ChatPromptBuilder.
Corentin Meyer (lambda-science)
Tim Wellbrock (twellck)
Haystack Bot: activity on DOCXToDocument and sentence_window_retriever.
Overall, the development team is actively engaged in maintaining a high level of productivity, ensuring that Haystack remains robust and adaptable to user needs while fostering community contributions.
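For readers unfamiliar with the TopPSampler's min_top_k addition, this is one common formulation of top-p (nucleus) selection with a minimum result count: softmax the scores, keep the highest-probability items until their cumulative probability reaches top_p, then pad to min_top_k if too few were kept. The function name and exact cut-off rule are assumptions for illustration, not TopPSampler's source.

```python
import math


def top_p_select(scores, top_p=0.9, min_top_k=None):
    """Return indices of the top items whose softmax mass reaches top_p.

    Hypothetical sketch of nucleus (top-p) selection with a floor of
    min_top_k items; not TopPSampler's actual code.
    """
    if not scores:
        return []
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Highest probability first; sorting is stable, so ties keep input order.
    order = sorted(range(len(scores)), key=lambda i: probs[i], reverse=True)

    selected, cumulative = [], 0.0
    for i in order:
        selected.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Guarantee at least min_top_k results even when one item dominates.
    if min_top_k is not None and len(selected) < min_top_k:
        selected = order[:min_top_k]
    return selected


if __name__ == "__main__":
    print(top_p_select([10.0, 1.0, 0.5, 0.2], top_p=0.9))
    print(top_p_select([10.0, 1.0, 0.5, 0.2], top_p=0.9, min_top_k=3))
```

With scores [10.0, 1.0, 0.5, 0.2], the first item alone carries over 99% of the probability mass, so plain top-p returns only [0]; setting min_top_k=3 returns [0, 1, 2], which is the kind of floor a minimum count is meant to provide.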