The Dispatch

OSS Report: Dataherald/dataherald


Development Stagnation as Dataherald Project Awaits New Contributions

Dataherald, an open-source natural language-to-SQL engine, aims to democratize data access by allowing non-technical users to query SQL databases using natural language. Despite its promising utility for enterprise-level question answering, recent development activity has been minimal, with the last significant contributions occurring over a month ago.

Recent Activity

Recent issues and pull requests (PRs) point to ongoing challenges with database connectivity and error handling. The open issue #518 and the already-resolved #505 both concern database connections, while PR #513 proposes a fix for a specific bug in the API. Most recent PRs, such as #521 and #517, are automated Dependabot dependency updates for security and compatibility, and they remain open. With no human-authored commits in the last 30 days, the project needs renewed attention and contributions.

Development Team Activities

  1. dependabot[bot]

    • Automated dependency updates across various branches.
    • Last activity: 26 days ago.
  2. Ashvin (ashvin-a)

    • Added documentation for environment variables.
    • Last activity: 39 days ago.
  3. Amir A. Zohrenejad (aazo11)

    • Updated .env.example file and fixed formatting issues.
    • Last activity: 72 days ago.
  4. Daniel Martin (daniel309)

    • Fixed sorting table relevance scores.
    • Last activity: 56 days ago.
  5. tecz

    • Fixed a typo in the example .env file.
    • Last activity: 57 days ago.
  6. Dennis Paul (dnnspaul)

    • Made S3 parameters more dynamic.
    • Last activity: 74 days ago.
  7. Ikko Eltociear Ashimine (eltociear)

    • Fixed typos in agent_prompts.py and README.md.
    • Last activity: 95 days ago.
  8. Dishen (DishenWang2023)

    • Worked on Stripe integration fixes and other enhancements.
    • Last activity: 132 days ago.
  9. Ryan Watts (rwatts3)

    • Worked on user authentication improvements.
    • Last activity: 89 days ago.
  10. Juan Valacco (valakJS)

    • Contributed to Docker setup and deployment workflows.
    • Last activity: 136 days ago.
  11. Mohammadreza Pourreza (MohammadrezaPourreza)

    • Addressed Azure OpenAI issues.
    • Last activity: 206 days ago.
  12. Juan Carlos José Camacho (jcjc712)

    • Added SQL Server support in Admin console.
    • Last activity: 201 days ago.
  13. Ainesh Pandey (dh-datateam-ainesh)

    • Added test scripts to monorepo.
    • Last activity: 181 days ago.


Quantified Reports

Quantify commits



Quantified Commit Activity Over 30 Days

Developer         Branches  PRs    Commits  Files  Changes
dependabot[bot]   1         1/0/1  1        1      2

PRs: pull requests created by that developer, shown as opened/merged/closed-unmerged during the period.
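A minimal sketch of how the opened/merged/closed-unmerged triple could be tallied from raw pull request records. The PullRequest fields and the window rule are assumptions loosely modeled on GitHub's created_at, merged_at and closed_at timestamps; the report does not document its exact counting rules, and the sample records are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional, Tuple


@dataclass
class PullRequest:
    # Hypothetical record shape; field names mirror GitHub's PR timestamps.
    author: str
    created_at: datetime
    merged_at: Optional[datetime] = None
    closed_at: Optional[datetime] = None


def pr_triple(prs: Iterable[PullRequest], author: str, now: datetime,
              days: int = 30) -> Tuple[int, int, int]:
    """Return (opened, merged, closed_unmerged) for one author's PRs in the window."""
    start = now - timedelta(days=days)

    def in_window(ts: Optional[datetime]) -> bool:
        return ts is not None and start <= ts <= now

    mine = [p for p in prs if p.author == author]
    opened = sum(in_window(p.created_at) for p in mine)
    merged = sum(in_window(p.merged_at) for p in mine)
    # Closed without merging: closed_at falls in the window and merged_at is unset.
    closed_unmerged = sum(in_window(p.closed_at) and p.merged_at is None for p in mine)
    return opened, merged, closed_unmerged


# Made-up records: one PR opened recently and still open, and an older PR
# closed unmerged inside the window (for example, after being superseded).
now = datetime(2024, 8, 1, tzinfo=timezone.utc)
prs = [
    PullRequest("dependabot[bot]", created_at=now - timedelta(days=26)),
    PullRequest("dependabot[bot]", created_at=now - timedelta(days=60),
                closed_at=now - timedelta(days=20)),
]
print(pr_triple(prs, "dependabot[bot]", now))  # (1, 0, 1)
```

The older PR counts as closed-unmerged but not as opened, because only its closing falls inside the 30-day window.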

Quantify Issues



Recent GitHub Issues Activity

Timespan   Opened  Closed  Comments  Labeled  Milestones
7 Days     0       0       0         0        0
30 Days    0       0       0         0        0
90 Days    3       2       3         3        1
All Time   41      38      -         -        -

Like all software activity quantification, these numbers are imperfect but sometimes useful. Comments, Labels, and Milestones refer to those issues opened in the timespan in question.
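The note above matters when reading the table: a comment on an old issue does not count toward a recent row. A minimal sketch of that scoping rule follows, assuming hypothetical issue records with creation and close timestamps. The rule that Closed counts issues closed inside the window, whenever they were opened, is an assumption: it is consistent with the two closures 81 and 86 days ago appearing in the 90-day row, but the report does not state it. The sample issues are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class Issue:
    # Hypothetical record shape for illustration only.
    created_at: datetime
    closed_at: Optional[datetime] = None
    comments: int = 0
    labels: Tuple[str, ...] = ()
    milestone: Optional[str] = None


def issue_row(issues: Iterable[Issue], now: datetime, days: int) -> Dict[str, int]:
    start = now - timedelta(days=days)

    def in_window(ts: Optional[datetime]) -> bool:
        return ts is not None and start <= ts <= now

    issues = list(issues)
    opened_here = [i for i in issues if in_window(i.created_at)]
    return {
        "opened": len(opened_here),
        # Assumption: closures are counted by close date, whenever the issue was opened.
        "closed": sum(in_window(i.closed_at) for i in issues),
        # Per the report's note, these columns only consider issues opened in the window.
        "comments": sum(i.comments for i in opened_here),
        "labeled": sum(bool(i.labels) for i in opened_here),
        "milestones": sum(i.milestone is not None for i in opened_here),
    }


# Made-up issues: a labeled issue opened 53 days ago and still open, and an
# older issue (opened 120 days ago) that was closed 81 days ago.
now = datetime(2024, 8, 1, tzinfo=timezone.utc)
issues = [
    Issue(now - timedelta(days=53), comments=2, labels=("bug",)),
    Issue(now - timedelta(days=120), closed_at=now - timedelta(days=81), comments=5),
]
print(issue_row(issues, now, 30))  # every count is 0
print(issue_row(issues, now, 90))  # opened=1, closed=1, comments=2; the older issue's 5 comments are excluded
```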

Detailed Reports

Report On: Fetch issues



Recent Activity Analysis

Issue activity on the Dataherald project has been sparse: no issues were opened or closed in the last 30 days, and three were opened in the last 90 days. The existing issues center on bug fixes, feature requests, and user support. Several concern database connectivity and error handling, which indicates ongoing challenges in these areas, and users have contributed to discussions and proposed solutions.

Anomalies and Themes

  • Database Connectivity Issues: Several issues (#518, #505, #407) describe problems with database connections, particularly when connection URIs contain special characters or when certain database versions are not supported. This recurring theme suggests a need for more robust error handling or better documentation.

  • Error Handling and Documentation: Issues like #471 and #443 point to gaps in error handling and documentation. Users have reported unclear error messages and missing documentation for certain features, indicating a need for better guidance and more informative error reporting.

  • Feature Requests for LLM Integration: There is significant interest in integrating custom and open-source LLMs (#439, #259), reflecting a broader trend towards customizable AI solutions. This aligns with the project's goal of supporting diverse AI models for SQL generation.

  • Community Contributions: The project has benefited from community involvement, as seen in discussions around feature enhancements (#449) and support for additional databases (#231). This kind of engagement is crucial for the project's evolution and its responsiveness to user needs.

Issue Details

Recently Created Issues

  • #518: High priority; Open; Created 53 days ago; Last updated 52 days ago. Issue with special characters in PostgreSQL connection URI causing errors.
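The issue data does not show the exact failing URI in #518, but a common cause of this class of error is a password containing characters such as @, :, / or # that URI parsers treat as delimiters. A minimal sketch of the usual workaround, percent-encoding the credentials, is below. It assumes the engine hands the URI to a parser that decodes percent-escapes (SQLAlchemy's URL parsing does this for the password); the postgresql+psycopg2 driver prefix, host, and credentials are placeholders, and whether this resolves #518 is not confirmed.

```python
from urllib.parse import quote, unquote, urlsplit


def build_postgres_uri(user: str, password: str, host: str, port: int, database: str) -> str:
    """Build a PostgreSQL URI with percent-encoded credentials.

    The "postgresql+psycopg2" driver prefix is an assumption (SQLAlchemy style).
    """
    # safe="" makes quote() escape every reserved character, including
    # '@', ':', '/' and '#', so none can be mistaken for a URI delimiter.
    return (
        f"postgresql+psycopg2://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


password = "p@ss:w/rd#1"  # placeholder credential

# Unencoded, the authority part ends at the first '/' inside the password,
# so the parser reads "ss" as the host name.
naive = f"postgresql+psycopg2://analyst:{password}@db.example.com:5432/sales"
print(urlsplit(naive).hostname)  # ss

uri = build_postgres_uri("analyst", password, "db.example.com", 5432, "sales")
print(uri)  # postgresql+psycopg2://analyst:p%40ss%3Aw%2Frd%231@db.example.com:5432/sales
parts = urlsplit(uri)
print(parts.hostname, unquote(parts.password))  # db.example.com p@ss:w/rd#1
```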

Recently Updated Issues

  • #518: High priority; Open; Last updated 52 days ago. Continues to be a critical issue affecting database connectivity.

Notable Closed Issues

  • #505: Closed 81 days ago. Resolved issue where data sources could not be added via the UI.
  • #493: Closed 86 days ago. Addressed missing Apache 2.0 LICENSE file referenced in README.md.
  • #475: Closed 116 days ago. Fixed error when using the Create SQL generation API.

These issues reflect efforts to improve database connectivity, enhance error handling, and expand LLM integration. The resolved issues show that the project has been maintained, but the notable closures listed above date from 81 to 116 days ago.

Report On: Fetch pull requests



Overview

The provided data includes a list of open and closed pull requests (PRs) for the Dataherald project, an open-source natural language-to-SQL engine. The PRs cover various updates, bug fixes, and feature enhancements across multiple components of the project.

Summary of Pull Requests

Open Pull Requests

  1. #521: Bumps langchain-community from 0.0.25 to 0.2.9 in the services/engine. The jump spans two minor-version lines (0.1.x and 0.2.x); because the package is pre-1.0, it may include breaking changes along with new features and bug fixes.

  2. #520: Updates next from 13.4.10 to 14.1.1 in the services/admin-console. This is a major-version upgrade of the Next.js framework, so it may require migration work.

  3. #517: Upgrades braces from 3.0.2 to 3.0.3 in the services/slackbot, addressing a vulnerability issue.

  4. #514: Updates ws from 7.5.9 to 7.5.10 in the services/slackbot, which includes a bug fix for a crash issue.

  5. #513: Fixes a broken PUT request for updating database connections in the API by correcting an attribute error.

  6. #501: Upgrades express from 4.18.2 to 4.19.2 in the services/slackbot, including security improvements and bug fixes.

  7. #500: Updates follow-redirects from 1.15.2 to 1.15.6 in the services/slackbot, enhancing security by dropping proxy authorization across hosts.

  8. #499: Bumps requests from 2.31.0 to 2.32.2 in the services/enterprise, addressing security vulnerabilities and improving performance.

  9. #498: Updates langchain from 0.0.230 to 0.1.0 in the services/enterprise, introducing new features and enhancements.

  10. #490: Upgrades pymysql from 1.1.0 to 1.1.1 in the services/engine, fixing a vulnerability related to SQL injection.
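The open PRs mix patch-level fixes with much larger jumps, and the version numbers alone give a first hint of review effort. A minimal sketch, assuming each package follows Semantic Versioning (not all of them strictly do), classifies each bump listed above. The rule that a minor bump of a 0.y.z package is potentially breaking is a heuristic in the spirit of SemVer, which allows anything to change before 1.0.0; it is not a property of these packages.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a plain 'X.Y.Z' string; pre-release and build suffixes are not handled."""
    parts = [int(p) for p in version.split(".")[:3]]
    parts += [0] * (3 - len(parts))  # treat "1.2" as "1.2.0"
    return parts[0], parts[1], parts[2]


def classify_bump(old: str, new: str) -> Tuple[str, bool]:
    """Return (bump kind, potentially breaking) using a SemVer-style heuristic."""
    o, n = parse_version(old), parse_version(new)
    if n[0] != o[0]:
        kind = "major"
    elif n[1] != o[1]:
        kind = "minor"
    else:
        kind = "patch"
    # Heuristic: flag major bumps, plus minor bumps of 0.y.z packages.
    breaking = kind == "major" or (o[0] == 0 and kind == "minor")
    return kind, breaking


# Version pairs taken from the open PRs listed above.
bumps = {
    "#521 langchain-community": ("0.0.25", "0.2.9"),
    "#520 next": ("13.4.10", "14.1.1"),
    "#517 braces": ("3.0.2", "3.0.3"),
    "#514 ws": ("7.5.9", "7.5.10"),
    "#501 express": ("4.18.2", "4.19.2"),
    "#500 follow-redirects": ("1.15.2", "1.15.6"),
    "#499 requests": ("2.31.0", "2.32.2"),
    "#498 langchain": ("0.0.230", "0.1.0"),
    "#490 pymysql": ("1.1.0", "1.1.1"),
}
for name, (old, new) in bumps.items():
    kind, breaking = classify_bump(old, new)
    note = "review for breaking changes" if breaking else "nominally compatible"
    print(f"{name}: {old} -> {new} ({kind}, {note})")
```

By version number alone, this flags #521, #520, and #498 as the upgrades most likely to need code changes; the numbers say nothing about whether Dataherald's own code is actually affected.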

Closed Pull Requests

  1. #519: Added documentation for environment variables of the engine.

  2. #516: Introduced a new environment variable for specifying the embedding model in the engine.

  3. #515: Fixed sorting of table relevance scores in output.

  4. #511: Corrected a typo in the enterprise service's example .env file.

  5. #510: (Not merged) Attempted to bump langchain-community version but was superseded by #521.

  6. #509: (Not merged) Work-in-progress on adding a new semantic layer agent.

  7. #508: Updated .env.example.

  8. #507: Fixed regression in s3.py.

  9. #506: Made S3 parameters more dynamic for compatibility with alternatives like MinIO.

  10. #502: Fixed disabled functions for organization creation with Stripe integration.
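#506 makes the S3 settings configurable so that S3-compatible stores such as MinIO can stand in for AWS S3. The PR's actual setting names are not given in this report, so the sketch below uses hypothetical environment variables (S3_ENDPOINT_URL, S3_ACCESS_KEY_ID, S3_SECRET_ACCESS_KEY, S3_REGION) and assumes a boto3-based client; boto3.client("s3", ...) does accept endpoint_url, region_name, aws_access_key_id, and aws_secret_access_key. The credentials shown are placeholders.

```python
import os
from typing import Dict, Mapping


def s3_client_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build keyword arguments for boto3.client("s3", **kwargs).

    The environment variable names are hypothetical, not taken from #506.
    """
    kwargs = {
        "aws_access_key_id": env.get("S3_ACCESS_KEY_ID"),
        "aws_secret_access_key": env.get("S3_SECRET_ACCESS_KEY"),
        "region_name": env.get("S3_REGION"),
        # endpoint_url is what points the client at MinIO or another
        # S3-compatible store instead of AWS.
        "endpoint_url": env.get("S3_ENDPOINT_URL"),
    }
    # Omit unset values so boto3 falls back to its default credential, region,
    # and endpoint resolution (plain AWS S3).
    return {key: value for key, value in kwargs.items() if value}


minio_env = {
    "S3_ENDPOINT_URL": "http://localhost:9000",
    "S3_ACCESS_KEY_ID": "minioadmin",
    "S3_SECRET_ACCESS_KEY": "minioadmin",
}
print(s3_client_kwargs(minio_env))
# With boto3 installed: client = boto3.client("s3", **s3_client_kwargs(minio_env))
```

Leaving the endpoint unset keeps the default AWS behavior, which is one way a change like #506 could add MinIO support without affecting existing S3 deployments.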

Analysis of Pull Requests

The pull requests reflect ongoing efforts to maintain and enhance the Dataherald project, focusing on dependency updates, bug fixes, and feature additions across its various components.

Themes and Commonalities

A significant portion of the PRs are automated Dependabot updates that move dependencies to newer versions, mainly to address security vulnerabilities or pick up bug fixes (e.g., #521, #520, #517). All of these remain open. Merging them matters for software integrity and for compatibility with the other libraries and frameworks the project uses.

Another recurring theme is improving functionality, such as making the embedding model configurable (#516) or fixing the sorting of table relevance scores (#515). These changes refine the user experience and expand the project's capabilities.

Anomalies and Disputes

Two PRs were closed without being merged: #510 was superseded by the newer langchain-community bump in #521, and #509, a work-in-progress semantic layer agent, was closed while it still needed further development. Closing superseded automated PRs is routine housekeeping rather than evidence of new development.

Feature Development

The introduction of new environment variables (#516) and documentation updates (#519) highlights efforts to improve configurability and user guidance, making it easier for users to deploy and customize Dataherald according to their needs.

Security Focus

Several PRs address security concerns by updating vulnerable dependencies (#517, #501, #490). Because these PRs are still open, however, the fixes have not yet been merged, which matters for enterprise deployments where data integrity is paramount.

Overall, these pull requests show a project with solid groundwork in functionality, security, and documentation, but with a growing backlog of open dependency updates and little recent activity from human contributors.

Report On: Fetch commits



Development Team and Recent Activity

Team Members and Activities

  1. Ashvin (ashvin-a)

    • Worked on adding documentation for environment variables of the engine.
    • Added a new environment variable for embedding model in the engine.
    • Last activity was 39 days ago.
  2. Amir A. Zohrenejad (aazo11)

    • Updated .env.example file, fixed formatting issues, and updated formatter to use the official black formatter.
    • Fixed regression in s3.py.
    • Last activity was 72 days ago.
  3. Daniel Martin (daniel309)

    • Fixed sorting table relevance scores.
    • Last activity was 56 days ago.
  4. tecz

    • Fixed a typo in the example .env file.
    • Last activity was 57 days ago.
  5. Dennis Paul (dnnspaul)

    • Worked on making S3 parameters more dynamic.
    • Last activity was 74 days ago.
  6. Ikko Eltociear Ashimine (eltociear)

    • Fixed a typo in agent_prompts.py.
    • Fixed an authentication-related typo in README.md.
    • Last activity was 95 days ago.
  7. Dishen (DishenWang2023)

    • Worked on multiple tasks, including fixing disabled Stripe functions, adding an option to disable Stripe, updating the enterprise environment variable example, and improving the env.example files.
    • Added an organization dump file migration script and restored the OpenAI key feature.
    • Involved in various other bug fixes and feature enhancements over several months.
    • Last activity was 132 days ago.
  8. Ryan Watts (rwatts3)

    • Worked on identifying users by the sub (subject) claim in the auth service.
    • Last activity was 89 days ago.
  9. Juan Valacco (valakJS)

    • Involved in numerous activities including Docker setup, multi-schema support, enhancing front-end docs, and fixing various deployment issues.
    • Active across many areas of the project with significant contributions to infrastructure and deployment workflows.
    • Last activity was 136 days ago.
  10. Mohammadreza Pourreza (MohammadrezaPourreza)

    • Worked on fixing Azure OpenAI issues and updating the engine with the newest changes.
    • Last activity was 206 days ago.
  11. Juan Carlos José Camacho (jcjc712)

    • Added support for SQL Server in Admin console, fixed Redshift dialect, and made various other backend improvements.
    • Last activity was 201 days ago.
  12. Ainesh Pandey (dh-datateam-ainesh)

    • Added test scripts and support files to monorepo.
    • Last activity was 181 days ago.
  13. dependabot[bot]

    • Automated dependency updates across various branches including updates to langchain-community, next, braces, and ws.
    • Last activity was 26 days ago.

Patterns, Themes, and Conclusions

  • Over the past several months, the team has worked on both feature development and bug fixes across multiple areas of the project, balancing new features with maintenance, although no human-authored commits landed in the last 30 days.
  • There is a strong emphasis on infrastructure improvements, particularly around Docker setups, environment variables, and deployment workflows, suggesting an ongoing effort to streamline development processes and improve deployment efficiency.
  • Collaboration is evident through co-authored commits, indicating teamwork across different features and bug fixes.
  • Dependency management is automated through Dependabot, which keeps proposing updates to external libraries and frameworks; many of these update PRs, however, remain open and unmerged.
  • The work ranges from documentation updates to complex feature implementations such as multi-schema support.