The Dispatch

OSS Report: Zipstack/unstract


Unstract Faces Persistent Configuration Challenges Amidst Active Development

Unstract, a no-code platform for automating document processing using Large Language Models (LLMs), continues to face configuration and workflow execution challenges despite active development efforts. The project, backed by Zipstack, aims to simplify complex document handling through user-friendly APIs and ETL pipelines.

Recent activity highlights ongoing issues with environment configuration and integration with external services, as evidenced by open bugs in API workflow execution (#595) and workflow step execution (#551), and by a recently fixed login bug (#469). These challenges suggest gaps in the platform's deployment process or documentation that affect user experience.

Recent Activity

Recent issues and pull requests (PRs) indicate a focus on usability and integration challenges. Notable open issues include #595, which reports persistent errors in API workflows caused by missing files, and #551, which reports that workflow steps cannot be executed. Together these issues point to stability concerns within the platform.

Development Team and Recent Activity

  1. Tahier Hussain

    • Recent Commits: 14 commits
    • Contributions: Optimized prompt output API calls; collaborated on various fixes.
  2. Chandrasekharan M

    • Recent Commits: 33 commits
    • Contributions: Improved error handling; engaged in code refactoring.
  3. Harini Venkataraman

    • Recent Commits: 16 commits
    • Contributions: Enhanced database connections; improved prompt studio features.
  4. Deepak K

    • Recent Commits: 13 commits
    • Contributions: Bumped SDK versions; enhanced prompt output handling.
  5. Muhammad Ali (ali)

    • Recent Commits: 14 commits
    • Contributions: Improved database connection logic; fixed workflow bugs.
  6. Rahul Johny

    • Recent Commits: 8 commits
    • Contributions: Enhanced authentication controller; improved error handling.
  7. Ritwik G

    • Recent Commits: 1 commit
    • Contributions: Minor environment settings adjustments.
  8. Jagadeeswaran Zipstack (jagadeeswaran-zipstack)

    • Recent Commits: 10 commits
    • Contributions: Enhanced UI components for document parsing.
  9. Vishnus Zipstack (vishnuszipstack)

    • Recent Commits: 12 commits
    • Contributions: Added MRQ functionality; updated frontend routing.
  10. Kirtiman Mishra (kirtimanmishrazipstack)

    • Recent Commits: 36 commits
    • Contributions: Focused on automation and bug fixes across services.
  11. Gayathri (gaya3-zipstack)

    • Recent Commits: 6 commits
    • Contributions: Improved token counting and logging.
  12. Athul (athul-rs)

    • Recent Commits: 4 commits
    • Contributions: Documentation updates and feature flag adjustments.

Of Note

Quantified Reports

Quantify Issues



Recent GitHub Issues Activity

Timespan   Opened   Closed   Comments   Labeled   Milestones
7 Days     0        0        0          0         0
30 Days    3        3        18         0         1
90 Days    11       9        54         0         1
All Time   14       10       -          -         -

Like all software activity quantification, these numbers are imperfect but sometimes useful. Comments, Labels, and Milestones refer to those issues opened in the timespan in question.
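
The All Time row can be cross-checked against the four open issues cited in the issues report below: issues opened minus issues closed. A minimal sketch using the figures from the table; the variable and function names are illustrative only.

```python
# Figures copied from the "All Time" row of the issues table.
# The names below are illustrative, not part of any tool.
ALL_TIME_OPENED = 14
ALL_TIME_CLOSED = 10


def still_open(opened: int, closed: int) -> int:
    """Issues that were opened but have not been closed."""
    return opened - closed


all_time_open = still_open(ALL_TIME_OPENED, ALL_TIME_CLOSED)
print(all_time_open)  # 4
```

The same subtraction is not applied to the shorter windows here, since the table does not say whether Closed counts issues closed in the window or issues opened in the window that were later closed.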

Quantify commits



Quantified Commit Activity Over 30 Days

Developer                              Branches  PRs      Commits  Files  Changes
Deepak K                               1         12/13/0  13       52     7453
ali                                    6         10/7/1   14       85     5020
Chandrasekharan M                      7         25/22/0  33       159    4706
vishnuszipstack                        9         9/3/0    12       43     2265
harini-venkataraman                    3         13/12/0  16       37     1824
Tahier Hussain                         4         14/13/0  14       35     1762
Kirtiman Mishra                        7         6/2/1    36       27     1319
github-actions[bot]                    1         0/0/0    4        3      1110
Hari John Kuriakose (hari-kuriakose)   1         1/0/0    4        28     859
jagadeeswaran-zipstack                 5         4/1/0    10       23     588
Rahul Johny                            4         7/4/1    8        17     484
Gayathri                               3         1/1/0    6        20     408
pre-commit-ci[bot]                     4         0/0/0    5        12     106
Ritwik G                               1         1/1/0    1        5      59
Athul                                  2         2/1/1    4        6      50
Jaseem Jas                             1         0/0/0    2        5      14
Pixee OSS Assistant (pixeeai)          0         1/0/0    0        0      0

PRs: created by that dev and opened/merged/closed-unmerged during the period
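
The PRs column packs three counts into one slash-separated string. The sketch below shows one way to split it for analysis, using rows copied from the table; `PRCounts` and `parse_pr_field` are hypothetical names introduced here, not part of any reporting tool.

```python
from typing import NamedTuple


class PRCounts(NamedTuple):
    opened: int
    merged: int
    closed_unmerged: int


def parse_pr_field(field: str) -> PRCounts:
    """Split an 'opened/merged/closed-unmerged' string into integers."""
    opened, merged, closed_unmerged = (int(part) for part in field.split("/"))
    return PRCounts(opened, merged, closed_unmerged)


# A few rows copied from the commit activity table.
rows = {
    "Deepak K": "12/13/0",
    "ali": "10/7/1",
    "Chandrasekharan M": "25/22/0",
}

for developer, field in rows.items():
    counts = parse_pr_field(field)
    print(developer, counts.opened, counts.merged, counts.closed_unmerged)
```

Note that merged can exceed opened (Deepak K: 12 opened, 13 merged), presumably because a PR opened before the 30-day window can still be merged inside it.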

Detailed Reports

Report On: Fetch issues



Recent Activity Analysis

The GitHub repository for Unstract has recently experienced a moderate level of activity, with 4 open issues currently under discussion. Notably, several issues highlight ongoing challenges users face with the platform, particularly around configuration and workflow execution. A recurring theme is the difficulty in setting up the application correctly, especially concerning environment configurations and integration with external services.

Several issues show significant interaction between users and maintainers, indicating a collaborative effort to troubleshoot problems. However, some critical problems remain open, such as the API workflow errors in #595, and the login bug in #469 was fixed only recently. This suggests a gap in the robustness of the platform's deployment process or documentation.

Issue Details

Most Recently Created Issues

  1. Issue #595: fix: [ISSUE] "Error fetching data and indexing: [Errno 2] No such file or directory: '/data/INFILE.pdf'" in API workflow execution

    • Priority: Bug
    • Status: Open
    • Created: 26 days ago
    • Updated: 22 days ago
  2. Issue #551: fix: Unable to perform step execution in a workflow

    • Priority: Bug
    • Status: Open
    • Created: 43 days ago
  3. Issue #493: support for Lancedb as vectordb

    • Priority: Enhancement
    • Status: Open
    • Created: 60 days ago
  4. Issue #414: About access unstract web from other computer

    • Priority: Good first issue
    • Status: Open
    • Created: 86 days ago
    • Updated: 2 days ago

Most Recently Updated Issues

  1. Issue #595 (last updated 22 days ago): Users are encountering persistent errors when executing API workflows due to missing files, indicating potential issues with the documentation or setup instructions.

  2. Issue #414 (last updated 2 days ago): Ongoing discussions about accessing the web interface from different hosts suggest that configuration guidance may be lacking.

  3. Issue #551 (last updated recently): Users report difficulties executing steps within workflows, which could hinder user experience significantly.

  4. Issue #469 (closed recently): Highlighted a critical bug related to login functionality that was resolved but reflects underlying issues with user onboarding.
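
The message quoted in issue #595 has the shape of Python's standard `FileNotFoundError` (errno 2, `ENOENT`), which is raised when code opens a path that does not exist. The sketch below reproduces that class of error against a temporary directory and shows a defensive check before reading. It is not Unstract's code: `read_input_file` is a hypothetical helper, and the cause of the missing file in #595 is not established here.

```python
import errno
import tempfile
from pathlib import Path


def read_input_file(path: Path) -> bytes:
    """Hypothetical helper: read a staged input file or fail with a clear error."""
    if not path.is_file():
        # Raise the same exception type open() would, with the standard errno.
        raise FileNotFoundError(errno.ENOENT, "input file was not staged", str(path))
    return path.read_bytes()


with tempfile.TemporaryDirectory() as data_dir:
    missing = Path(data_dir) / "INFILE.pdf"  # never created, so it is missing
    try:
        read_input_file(missing)
    except FileNotFoundError as exc:
        caught_errno = exc.errno
        print(f"errno={exc.errno}: {exc.filename}")
```

Checking for the file up front lets the caller report which input was expected and where, rather than surfacing a bare errno from deep inside a workflow step.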

Summary of Key Issues

The current open issues are a mix of bugs and enhancement requests, primarily focused on usability and integration challenges. Two bugs remain unresolved, one in API workflow execution (#595) and one in workflow step execution (#551), and the login bug in #469 was closed only recently; together these raise concerns about the platform's stability and user experience. The community's engagement in providing solutions and workarounds demonstrates active participation but also highlights the need for clearer documentation and more robust testing before releases.

Report On: Fetch pull requests



Overview

The analysis of the pull requests (PRs) for the Zipstack/unstract repository reveals a significant focus on enhancing functionality, optimizing performance, and addressing bugs within the platform. The current state shows 39 open PRs, with a mix of feature enhancements, bug fixes, and architectural changes aimed at improving user experience and system reliability.

Summary of Pull Requests

  1. PR #693: FIX: Optimized the Prompt Output API Calls

    • State: Open
    • Created: 0 days ago
    • Significance: Introduces a more efficient API call structure to reduce redundant calls when fetching prompt outputs, potentially breaking existing features due to architectural changes.
  2. PR #689: workflow manager workflow v2 delta changes

    • State: Open
    • Created: 1 day ago
    • Significance: Implements delta changes for multi-tenancy in the workflow manager without breaking existing features.
  3. PR #688: workflow manager endpoint v2 delta changes

    • State: Open
    • Created: 1 day ago
    • Significance: Similar to #689, this PR addresses endpoint changes for multi-tenancy but has failed quality checks due to high code duplication.
  4. PR #687: delta changes for api v2 multitenancy

    • State: Open
    • Created: 1 day ago
    • Significance: Aims to enhance API functionality for multi-tenancy with no expected breakage.
  5. PR #686: delta changes for pipeline v2 multitenancy

    • State: Open
    • Created: 1 day ago
    • Significance: Focuses on pipeline adjustments for multi-tenancy, ensuring no existing features are broken.
  6. PR #684: Added README file along with the API Deployment code

    • State: Open
    • Created: 2 days ago
    • Significance: Enhances documentation without impacting existing features.
  7. PR #680: multitenancy v2 delta changes for pr 535

    • State: Open
    • Created: 2 days ago
    • Significance: Addresses previously identified issues in PR #535 with no expected breakage.
  8. PR #678: fix/test-azure-gcs-wf-etl

    • State: Open
    • Created: 3 days ago
    • Significance: Implements error handling for Azure file systems without breaking existing functionality.
  9. PR #676: index-in-oss-without-plugin

    • State: Open
    • Created: 3 days ago
    • Significance: Addresses indexing issues but has failed quality checks due to high duplication rates.
  10. PR #675: prompt-studio helper changes for v2 porting

    • State: Open
    • Created: 3 days ago
    • Significance: Ports previous changes to version 2, with potential impacts on existing features.
  11. Additional PRs continue in a similar vein, focusing on improvements, fixes, and optimizations across various aspects of the platform.

Analysis of Pull Requests

The current set of open pull requests reflects a strong emphasis on improving the architecture and performance of the Unstract platform. Notably, many PRs are related to enhancing multi-tenancy capabilities, which is crucial for scaling the application to support multiple users or organizations effectively. This shift indicates a strategic direction towards accommodating a broader user base while maintaining performance and reliability.

Common Themes

  • Multi-Tenancy Enhancements: A significant number of PRs (e.g., #689, #688, #687) focus on implementing or refining multi-tenancy features. This is indicative of an ongoing effort to make Unstract more robust and scalable.
  • Performance Optimizations: Several PRs aim at optimizing API calls (e.g., #693) and reducing redundancy in database interactions. These enhancements are critical as they directly impact user experience by reducing latency and improving responsiveness.
  • Documentation Improvements: The addition of README files and enhancements to existing documentation (e.g., PR #684) suggest an awareness of the importance of clear communication regarding usage and functionality, especially as new features are introduced.
  • Error Handling and Logging Improvements: Multiple PRs address error handling (e.g., #678, #690), which is essential for maintaining system integrity and providing users with meaningful feedback during failures.

Anomalies

  • Some PRs have encountered quality gate failures due to high duplication rates (e.g., PRs #688 and #676). This could indicate a need for better code review practices or refactoring efforts before merging.
  • The presence of multiple open PRs from the same contributor within a short timeframe may lead to merge conflicts if not managed carefully.

Recent Merge Activity

Alongside the large number of open pull requests, many PRs were merged or closed during the period, which indicates ongoing development. These merges should be checked to ensure they do not introduce regressions or conflict with work still in progress in open PRs.

Conclusion

The current landscape of pull requests in the Zipstack/unstract repository showcases an active development environment focused on enhancing functionality through multi-tenancy improvements, performance optimizations, and better error handling mechanisms. Continued attention to quality control measures will be vital in ensuring that these enhancements do not inadvertently disrupt existing functionalities as the project evolves.

Report On: Fetch commits



Repo Commits Analysis

Development Team and Recent Activity

Team Members and Their Recent Activities

  1. Tahier Hussain

    • Recent Commits: 14 commits
    • Key Contributions:
    • Fixed issues in displaying cost for prompt runs.
    • Implemented optimizations for prompt output API calls.
    • Collaborated with multiple team members on various fixes and enhancements.
    • Collaborations: Worked closely with Rahul Johny and others on multiple features.
  2. Chandrasekharan M

    • Recent Commits: 33 commits
    • Key Contributions:
    • Addressed database connection issues and improved error handling.
    • Enhanced logging for workflow execution.
    • Engaged in significant refactoring and optimization of existing code.
    • Collaborations: Frequently co-authored with Harini Venkataraman and others.
  3. Harini Venkataraman

    • Recent Commits: 16 commits
    • Key Contributions:
    • Focused on fixing database connections and enhancing subquestion retrieval.
    • Worked on improvements related to the prompt studio.
    • Collaborations: Collaborated with Deepak K and others.
  4. Deepak K

    • Recent Commits: 13 commits
    • Key Contributions:
    • Bumped SDK versions and made various minor fixes across the platform.
    • Worked on enhancements for prompt output handling.
    • Collaborations: Co-authored several commits with other team members.
  5. Muhammad Ali (ali)

    • Recent Commits: 14 commits
    • Key Contributions:
    • Improved connection retry logic for database services.
    • Engaged in refactoring and fixing bugs related to workflows.
    • Collaborations: Worked alongside Chandrasekharan M and others.
  6. Rahul Johny

    • Recent Commits: 8 commits
    • Key Contributions:
    • Focused on enhancing the authentication controller and addressing various bugs.
    • Contributed to improvements in error handling across services.
    • Collaborations: Frequently worked with Tahier Hussain.
  7. Ritwik G

    • Recent Commits: 1 commit
    • Key Contributions: Minor contributions related to environment settings.
  8. Jagadeeswaran Zipstack (jagadeeswaran-zipstack)

    • Recent Commits: 10 commits
    • Key Contributions:
    • Enhanced UI components within the platform, particularly around document parsing and prompt cards.
    • Collaborations: Worked with multiple team members on UI-related tasks.
  9. Vishnus Zipstack (vishnuszipstack)

    • Recent Commits: 12 commits
    • Key Contributions:
    • Added new features related to MRQ (Manual Review Queue) functionality.
    • Contributed to routing updates in the frontend.
    • Collaborations: Collaborated with Deepak K and others.
  10. Kirtiman Mishra (kirtimanmishrazipstack)

    • Recent Commits: 36 commits
    • Key Contributions:
    • Focused on PDM lock automation, error handling, and various bug fixes across services.
    • Collaborations: Worked closely with multiple team members on automation tasks.
  11. Gayathri (gaya3-zipstack)

    • Recent Commits: 6 commits
    • Key Contributions:
    • Addressed issues related to token counting and logging improvements.
    • Collaborations: Collaborated with Chandrasekharan M on various tasks.
  12. Athul (athul-rs)

    • Recent Commits: 4 commits
    • Key Contributions:
    • Minor contributions focusing on documentation and feature flags.
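
The connection retry logic for database services mentioned above is not described in detail in this report. As a general illustration of the technique only, the sketch below retries a failing operation with exponential backoff; `retry_with_backoff` and `flaky_connect` are hypothetical names, and nothing here is taken from the Unstract codebase.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.01,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError,),
) -> T:
    """Call operation(); on a retryable error wait base_delay * 2**n, then retry."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


# Stand-in for a database connect call that fails twice, then succeeds.
calls = {"count": 0}


def flaky_connect() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("database not ready")
    return "connected"


result = retry_with_backoff(flaky_connect)
print(result, calls["count"])
```

Doubling the delay between attempts gives a slow-starting service time to come up without hammering it, and re-raising on the final attempt keeps the original error visible to the caller.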

Patterns, Themes, and Conclusions

  • The development team is actively engaged in fixing bugs, optimizing existing features, and implementing new functionalities, particularly around database connections, prompt handling, and user interface enhancements.
  • Collaboration is a significant theme, as many commits are co-authored, indicating a strong culture of teamwork within the group.
  • The focus on improving logging, error handling, and overall user experience suggests a commitment to maintaining a high-quality codebase while enhancing usability for end-users.
  • The recent activities reflect an ongoing effort to address technical debt through refactoring efforts while also pushing new features that align with user needs and project goals.

Overall, the team's recent activities demonstrate a proactive approach to software development, emphasizing quality, collaboration, and continuous improvement.