The "Unstract" project by Zipstack is an open-source, no-code platform for intelligent document processing (IDP 2.0), leveraging large language models (LLMs) to create APIs and ETL pipelines. It supports integration with major LLM providers and vector databases, offering tools like Prompt Studio and Workflow Studio for automating document processes. The project is actively developed with a significant community presence, as evidenced by its GitHub activity.
Hari John Kuriakose
Gayathri
Chandrasekharan M
Praveen Kumar
Tahier Hussain
Timespan | Opened | Closed | Comments | Labeled | Milestones |
---|---|---|---|---|---|
7 Days | 0 | 0 | 0 | 0 | 0 |
30 Days | 3 | 1 | 8 | 1 | 1 |
90 Days | 10 | 8 | 21 | 2 | 1 |
All Time | 34 | 21 | - | - | - |
Like all software activity quantification, these numbers are imperfect but sometimes useful. Comments, Labels, and Milestones refer to those issues opened in the timespan in question.
Developer | Branches | PRs | Commits | Files | Changes |
---|---|---|---|---|---|
Hari John Kuriakose | 2 | 0/0/0 | 2 | 4 | 4796 |
harini-venkataraman | 3 | 2/2/1 | 6 | 23 | 3726 |
pre-commit-ci[bot] | 4 | 0/0/0 | 6 | 11 | 3553 |
Praveen Kumar | 3 | 2/1/1 | 8 | 19 | 1853 |
Chandrasekharan M | 5 | 8/5/0 | 20 | 60 | 1522 |
Gayathri | 4 | 3/2/0 | 16 | 48 | 660 |
Tahier Hussain | 3 | 5/4/1 | 8 | 7 | 167 |
Athul | 1 | 1/1/0 | 1 | 2 | 74 |
Hari John Kuriakose (hari-kuriakose) | 1 | 1/0/0 | 1 | 1 | 39 |
vishnuszipstack | 1 | 0/0/0 | 3 | 5 | 16 |
Ritwik G | 2 | 0/0/0 | 3 | 5 | 13 |
jagadeeswaran-zipstack | 1 | 1/1/0 | 1 | 1 | 2 |
Deepak K | 0 | 0/0/0 | 0 | 0 | 0 |
PRs: created by that dev and opened/merged/closed-unmerged during the period
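The PRs column packs three counts into one cell. As a minimal sketch of how such a cell could be read programmatically, the snippet below splits it into named fields. The names `PRCounts` and `parse_pr_counts` are hypothetical and are not part of the Unstract codebase or of the tool that produced this report.

```python
from typing import NamedTuple


class PRCounts(NamedTuple):
    """Counts from one PRs cell, in the order opened/merged/closed-unmerged."""
    opened: int
    merged: int
    closed_unmerged: int


def parse_pr_counts(cell: str) -> PRCounts:
    """Parse a cell such as '8/5/0' into its three integer counts."""
    parts = cell.strip().split("/")
    if len(parts) != 3:
        raise ValueError(f"expected 'opened/merged/closed-unmerged', got {cell!r}")
    opened, merged, closed_unmerged = (int(p) for p in parts)
    return PRCounts(opened, merged, closed_unmerged)


# Example using a cell from the table above (Chandrasekharan M).
counts = parse_pr_counts("8/5/0")
print(counts.opened, counts.merged, counts.closed_unmerged)
```

Because the counts cover activity during the period, a pull request merged in the period may have been opened earlier, so the sketch deliberately avoids deriving a merge rate from a single cell.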
Risk | Level (1-5) | Rationale |
---|---|---|
Delivery | 3 | The project faces moderate delivery risks due to several unresolved issues and dependencies. Issues such as #1110 and #239 highlight challenges in cross-platform compatibility, which could delay delivery if not addressed. The backlog of unresolved issues and the complexity of certain pull requests, like the unified notifications system (#1089), further contribute to this risk. Additionally, the dependency on external services like Redis and Azure introduces potential points of failure that could impact delivery timelines. |
Velocity | 3 | The project's velocity is moderate, with a healthy level of commit activity but some bottlenecks in integration. Developers like Chandrasekharan M show high engagement, but the gap between opened and merged pull requests (8 opened against 5 merged in the period, for example) suggests potential delays in code review or integration processes. The varied levels of contribution among developers and reliance on automated tools like pre-commit-ci[bot] also pose risks to maintaining consistent velocity. |
Dependency | 4 | The project has significant dependency risks due to its reliance on multiple external services and libraries. The use of Redis for message storage, Azure for cloud services, and various SDK updates introduce potential points of failure. The frequent version bumps in dependencies without comprehensive testing notes exacerbate this risk, as any issues with these dependencies could lead to failures in the system. |
Team | 3 | The team faces moderate risks related to communication and issue management. The low level of comments on issues and minimal labeling suggest potential communication challenges. However, the active maintenance and resolution of certain pull requests indicate effective collaboration in some areas. The presence of unresolved issues and bottlenecks in code review processes could lead to team burnout or conflict if not managed properly. |
Code Quality | 3 | Code quality is at moderate risk due to the complexity of recent pull requests and the lack of detailed testing documentation. While there are efforts to improve code quality through refactoring and optimizations, the absence of comprehensive testing details increases the likelihood of undetected bugs. Issues like document indexing (#1043) and workflow execution errors (#595) further highlight underlying stability problems that need addressing. |
Technical Debt | 4 | Technical debt is a significant risk due to recurring issues and complex migrations without automation. The backlog of unresolved issues and the need for schema migrations in several pull requests indicate accumulating technical debt. The reliance on manual processes for version upgrades and data migration (#1093) further contributes to this risk, potentially delaying delivery timelines if not addressed efficiently. |
Test Coverage | 4 | Test coverage poses a significant risk as many pull requests lack detailed testing notes or documentation. This gap increases the likelihood of undetected bugs affecting delivery timelines and code stability. The reliance on automated tools for maintaining code quality without comprehensive test coverage exacerbates this risk, as seen with recent SDK updates lacking thorough testing. |
Error Handling | 3 | Error handling is at moderate risk due to insufficient documentation and reliance on try-catch blocks for dynamic imports. While there are efforts to manage errors through hooks like useExceptionHandler, the lack of explicit error handling frameworks or libraries raises concerns about the effectiveness of error management strategies. Unresolved issues related to error handling (#1044) further highlight areas needing improvement. |
Recent GitHub issue activity for the Zipstack/unstract project shows a mix of bug reports, feature requests, and user inquiries. Notably, there are several issues related to installation and setup challenges, particularly on Windows and macOS environments (#1110, #239). There are also recurring themes around version upgrades and data migration (#1093), as well as issues with specific functionalities like document indexing and workflow execution (#1043, #595). The project seems to be actively maintained, with responses from developers addressing user concerns and providing updates or workarounds.
The ongoing activity suggests a responsive development team actively engaging with user-reported issues. However, certain recurring themes such as setup difficulties and integration challenges indicate areas where further improvements could enhance user experience.
`TextChoices`, improving consistency.
#1128: Version bump for unstract-sdk 0.57.0rc4
#1127: [FIX] Pass the right env value to the tool container while spawning
#1126: refactor: Dockerfile optimization to build faster
#1116: hotfix: UN-2107 MIME type validation for large files
Pending CLA Status:
Code Cleanup:
Version Management:
Testing and Quality Checks:
Documentation:
Overall, the project appears to be actively maintained with a focus on continuous improvement and optimization, as evidenced by recent PR activities.
backend/pdm.lock
`default` and `dev` groups indicates a separation between runtime and development dependencies.
backend/pyproject.toml
(`>=3.9,<3.11.1`) is specific, likely due to compatibility issues with newer versions.
backend/sample.env
`DJANGO_SECRET_KEY` are included as placeholders; care should be taken to ensure these are not hard-coded in production environments.
frontend/src/components/navigations/side-nav-bar/SideNavBar.jsx
`useMemo` optimizes performance by avoiding unnecessary recalculations.
frontend/src/components/navigations/top-nav-bar/TopNavBar.jsx
prompt-service/src/unstract/prompt_service/helper.py
(`plugins`) could be reconsidered for thread safety if this service is multi-threaded or multi-process.
prompt-service/src/unstract/prompt_service/main.py
tools/classifier/src/config/properties.json
tools/text_extractor/src/config/properties.json
unstract/tool-registry/tool_registry_config/public_tools.json
Overall, the codebase demonstrates strong adherence to modern software engineering practices with clear separation of concerns across components. Dependency management through PDM ensures reproducibility while environment configurations are handled securely through `.env` files. Both frontend and backend components show thoughtful design considerations around extensibility and maintainability.
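The note on `backend/sample.env` warns against carrying placeholder secrets into production. One way to guard against that is a startup check along the following lines. This is a sketch under stated assumptions, not the project's actual settings code: the function name `load_secret_key` and the set of placeholder strings are hypothetical, and only the `DJANGO_SECRET_KEY` variable name comes from the report.

```python
import os
from typing import Mapping

# Hypothetical placeholder values; the real contents of sample.env are not
# shown in this report, so adjust this set to match the actual file.
PLACEHOLDER_VALUES = {"", "changeme", "your-secret-key"}


def load_secret_key(environ: Mapping[str, str] = os.environ) -> str:
    """Return DJANGO_SECRET_KEY, refusing missing or placeholder values."""
    value = environ.get("DJANGO_SECRET_KEY", "")
    if value.strip().lower() in PLACEHOLDER_VALUES:
        raise RuntimeError(
            "DJANGO_SECRET_KEY is missing or still set to a placeholder; "
            "set a real secret in the deployment environment."
        )
    return value


# Example with an explicit mapping, so it runs without touching the real environment.
key = load_secret_key({"DJANGO_SECRET_KEY": "a-long-random-string"})
print(len(key))
```

Failing at startup rather than falling back to a default keeps a misconfigured deployment from silently running with a known secret.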
Praveen Kumar (pk-zipstack)
`pdm.lock` files.
Tahier Hussain (tahierhussain)
Gayathri (gaya3-zipstack)
Chandrasekharan M (chandrasekharan-zipstack)
Harini Venkataraman (harini-venkataraman)
Athul (athul-rs)
Jagadeeswaran (jagadeeswaran-zipstack)
Hari John Kuriakose (hari-kuriakose)
Ritwik G (ritwik-g)
Vishnu S (vishnuszipstack)
Pre-commit-ci[bot]
Version Management: There is a strong focus on updating SDK versions and managing dependencies across multiple services, indicating ongoing maintenance and improvement efforts.
Frontend Enhancements: Several team members are actively working on improving the user interface and experience, suggesting a priority on usability and user engagement.
Backend Optimization: Efforts are being made to optimize backend processes, including Dockerfile optimizations for better caching and execution model enhancements for improved performance tracking.
Collaboration: Many commits are co-authored or involve merging from other branches, highlighting collaborative efforts within the team to integrate changes effectively.
Testing and Integration: There is an emphasis on testing, as seen from the setup of E2E tests and fixing unit test failures, ensuring reliability of the platform.
Overall, the development team is actively engaged in maintaining the platform's robustness while enhancing its features for better user experience and performance.