The Dispatch

OSS Report: infiniflow/ragflow


RAGFlow Faces Critical Bugs in Core Functionalities Amidst Active Development

RAGFlow, an open-source Retrieval-Augmented Generation engine, aims to enhance document understanding using Large Language Models. It faces critical issues in embedding models and document parsing, affecting core functionalities.

Recent Activity

Recent issues highlight significant challenges with embedding models (#2527, #2506) and parsing errors (#2519, #2513). These point to potential instability in key features. In addition, a user interface inconsistency (#2514), where the UI appears in Chinese although the initial language is English, suggests problems with language settings.

Development Team and Recent Contributions

  1. Kevin Hu (KevinHuSh)

    • Commits: 52
    • Refactored xinference and retrieval of multi-turn conversation.
  2. Alvin Cage (AlvinCage)

    • Commits: 1
    • Updated README_zh.md.
  3. Chenbing (muzilib)

    • Commits: 2
    • Fixed errors in user settings and API key generation.
  4. Yungongzi (yungongzi)

    • Commits: 1
    • Fixed API key generation error for VolcEngine.
  5. Fachuan Bai (baifachuan)

    • Commits: 4
    • Worked on new features including storage icon display.
  6. Feiue (liuhua)

    • Commits: 12
    • Contributed to session management and document SDK updates.
  7. JobSmithManipulation

    • Commits: 8
    • Focused on performance improvements related to document SDK.
  8. Dada Hsueh (dadahsueh)

    • Commits: 1
    • Fixed a bug related to superuser password encoding.
  9. Michał Kiełtyka (Defozo)

    • Commits: 1
    • Minor documentation updates.
  10. Writinwaters

    • Commits: 9
    • Engaged in documentation updates and minor fixes.
  11. Guoyuhao2330 (lidp)

    • Commits: 19
    • Contributed to new features for data retrieval.
  12. Hangters (黄腾)

    • Commits: 18
    • Added support for various cloud services and fixed bugs.

The team is actively addressing bugs and enhancing features. Among the contributors profiled above, Kevin Hu has the most commits (52).

Quantified Reports

Quantify Issues



Recent GitHub Issues Activity

Timespan  Opened  Closed  Comments  Labeled  Milestones
7 Days    39      8       44        3        1
14 Days   96      48      140       6        1
30 Days   189     90      326       13       1
All Time  1153    696     -         -        -

Like all software activity quantification, these numbers are imperfect but sometimes useful. Comments, Labels, and Milestones refer to those issues opened in the timespan in question.
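As a rough illustration of how these figures can be read, the sketch below recomputes the net change in open issues and the closed-to-opened ratio for each row. The counts are copied from the table above; the names `ISSUE_ACTIVITY` and `summarize` are hypothetical and exist only for this example. The ratio is a coarse indicator, since issues closed in a timespan were not necessarily opened in it.

```python
# Issue counts copied from the table above: (opened, closed) per timespan.
ISSUE_ACTIVITY = {
    "7 Days": (39, 8),
    "14 Days": (96, 48),
    "30 Days": (189, 90),
    "All Time": (1153, 696),
}


def summarize(activity):
    """Return {timespan: (net_open, close_ratio)} for each row.

    net_open is opened minus closed; close_ratio is closed divided by opened.
    """
    summary = {}
    for span, (opened, closed) in activity.items():
        net_open = opened - closed
        close_ratio = closed / opened if opened else 0.0
        summary[span] = (net_open, close_ratio)
    return summary


if __name__ == "__main__":
    for span, (net_open, ratio) in summarize(ISSUE_ACTIVITY).items():
        print(f"{span:>8}: net open {net_open:+d}, closed/opened {ratio:.1%}")
```

On these numbers the all-time row leaves 457 issues open, one fewer than the 458 open issues cited in the issues report, a gap that may simply reflect when each snapshot was taken. The 7-day ratio (about 20.5%) is well below the 14-day (50%) and 30-day (about 47.6%) ratios.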

Quantify Commits



Quantified Commit Activity Over 30 Days

Developer                     Branches  PRs      Commits  Files  Changes
balibabu                      1         47/47/0  54       82     7952
黄腾                          1         14/13/1  18       38     2557
JobSmithManipulation          1         11/8/3   8        16     2348
Valdanito (Valdanitooooo)     1         4/4/0    4        19     2318
liuhua                        1         13/11/2  12       25     2269
Kevin Hu                      1         46/45/1  52       46     965
lidp                          1         16/15/1  19       24     854
Fachuan Bai                   1         4/4/0    4        26     615
writinwaters                  1         9/9/0    9        15     325
Wang Baoling                  1         2/2/0    2        2      110
Zhichang Yu                   1         2/1/0    1        7      92
Toro                          1         2/2/0    2        2      12
dependabot[bot]               1         11/4/7   4        2      12
yangbo.zhou                   1         1/1/0    1        1      6
Dada Hsueh                    1         1/1/0    1        1      6
Michał Kiełtyka               1         1/1/0    1        3      5
LIU HAO                       1         1/1/0    1        1      5
dearjane                      1         0/0/0    1        1      4
zhuhao                        1         0/0/0    1        1      4
wwwlll                        1         2/1/1    1        1      4
_Chenbing                     1         2/2/0    2        2      3
yungongzi                     1         1/1/0    1        1      3
Andrey                        1         0/0/0    1        1      3
Wang                          1         1/1/0    1        1      3
Vitaliy Groshev               1         1/1/0    1        1      2
AlvinCage                     1         1/1/0    1        1      2
Jia Chen                      1         1/1/0    1        1      2
Yuhao Tsui (cyhasuka)         0         1/0/1    0        0      0
移山搬砖派 (AbbottKilig)      0         2/0/2    0        0      0
None (yixiang1120)            0         2/0/2    0        0      0
narendra (narendra-bluebash)  0         1/0/1    0        0      0

PRs: created by that dev and opened/merged/closed-unmerged during the period
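The commit table can be ranked in more than one way, and the ordering depends on the column chosen. The following minimal sketch uses a subset of rows copied from the table above to rank developers by commits and to compute changes per commit. `DevRow`, `rank_by`, and `changes_per_commit` are hypothetical names introduced for this example, and the PRs column is split into its opened and merged parts.

```python
from typing import NamedTuple


class DevRow(NamedTuple):
    name: str
    prs_opened: int
    prs_merged: int
    commits: int
    files: int
    changes: int


# A subset of rows copied from the 30-day commit table above.
ROWS = [
    DevRow("balibabu", 47, 47, 54, 82, 7952),
    DevRow("黄腾", 14, 13, 18, 38, 2557),
    DevRow("JobSmithManipulation", 11, 8, 8, 16, 2348),
    DevRow("liuhua", 13, 11, 12, 25, 2269),
    DevRow("Kevin Hu", 46, 45, 52, 46, 965),
    DevRow("lidp", 16, 15, 19, 24, 854),
]


def rank_by(rows, field):
    """Sort rows by the named numeric column, largest first."""
    return sorted(rows, key=lambda r: getattr(r, field), reverse=True)


def changes_per_commit(row):
    """Average number of changes per commit (0.0 if there are no commits)."""
    return row.changes / row.commits if row.commits else 0.0


if __name__ == "__main__":
    for row in rank_by(ROWS, "commits"):
        print(f"{row.name}: {row.commits} commits, "
              f"{changes_per_commit(row):.1f} changes/commit")
```

Ranked by commits, balibabu (54) and Kevin Hu (52) lead this subset, but their changes per commit differ widely (roughly 147 versus 19), which is one reason commit counts alone are a weak proxy for contribution size.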

Detailed Reports

Report On: Fetch issues



Recent Activity Analysis

The GitHub repository for RAGFlow has seen significant recent activity, with a total of 458 open issues. The latest issues span a variety of topics, including bugs, feature requests, and questions about functionality. Notably, there are recurring themes around embedding models, parsing errors, and user interface challenges.

Several issues highlight critical bugs that affect the core functionalities of the application, such as embedding errors and problems with document parsing. Additionally, there is a noticeable concern regarding the integration of various models and APIs, suggesting that users are facing difficulties in leveraging the full capabilities of RAGFlow.

Issue Details

Recent Issues

  1. Issue #2528: [Question]: Access interface of @login_required, always get unauthorized error 401

    • Priority: Low
    • Status: Open
    • Created: 0 days ago
  2. Issue #2527: [Bug]: I can get the result by search, but I can't get the answer by chatting with the same knowledge base

    • Priority: High
    • Status: Open
    • Created: 0 days ago
  3. Issue #2523: [Feature Request]: Integrates jina-embeddings-v3-a-frontier-multilingual-embedding-model

    • Priority: Medium
    • Status: Open
    • Created: 2 days ago
  4. Issue #2522: [Question]: How to make overlapping chunking?

    • Priority: Low
    • Status: Open
    • Created: 2 days ago
  5. Issue #2519: [Question]: Error at parsing files uploaded in demo

    • Priority: Medium
    • Status: Open
    • Created: 3 days ago
  6. Issue #2518: [Question]: Is there an update plan for the open-source deepdoc model on Hugging Face?

    • Priority: Low
    • Status: Open
    • Created: 3 days ago
  7. Issue #2516: [Feature Request]: Configurable for excel, html table or row based text

    • Priority: Medium
    • Status: Open
    • Created: 3 days ago
  8. Issue #2514: [Bug]: Initial language is English, but the UI is in Chinese

    • Priority: High
    • Status: Open
    • Created: 3 days ago
  9. Issue #2513: [Question]: Error at parsing files uploaded in demo

    • Priority: Medium
    • Status: Open
    • Created: 3 days ago
  10. Issue #2506: [Question]: Qwen2-72B-Instruct-GPTQ-Int4 of Xinference not listed in System model settings

    • Priority: Medium
    • Status: Open
    • Created: 4 days ago

Analysis of Themes and Complications

Common Themes:

  • Many issues revolve around embedding models and their integration (e.g., Issues #2527 and #2506), indicating potential instability or lack of clarity in how to utilize these features effectively.
  • Parsing errors are frequently reported (e.g., Issues #2519 and #2513, both titled "Error at parsing files uploaded in demo"), suggesting that users are encountering significant challenges when attempting to process documents.
  • User interface inconsistencies (e.g., Issue #2514) indicate that language settings may not be functioning as intended, which could hinder usability for non-English speakers.

Notable Anomalies:

  • The presence of unresolved critical bugs (like those affecting parsing and embedding) could impact user trust and satisfaction.
  • The high volume of open issues (458) may overwhelm contributors and maintainers, potentially leading to delays in addressing urgent problems.

Overall, while RAGFlow shows promise with its rich feature set, the current state of unresolved issues suggests a need for focused efforts on stability and user experience improvements.

Report On: Fetch pull requests



Overview

The pull requests (PRs) reviewed here comprise one open PR and six recently closed PRs. They cover a Dockerfile rework, fixes to the xinference integration, multi-turn conversation retrieval, configurable Excel parsing, and an SDK refactor, which together indicate active maintenance and ongoing feature work.

Summary of Pull Requests

Open Pull Requests

  • PR #2525: Rework Dockerfile.scratch
    • Significance: A major refactor of the Dockerfile to optimize image size and dependency management.
    • Notable Changes: Introduction of multi-stage Docker builds, removal of conda, replacement of pip with poetry, addition of missing dependencies, and fixing package version conflicts.

Closed Pull Requests

  • PR #2521: refine xinference

    • Significance: Refinement of the xinference module to address issue #1588.
    • Notable Changes: Adjustments in model initialization to ensure correct base URL formatting.
  • PR #2520: refine retrieval of multi-turn conversation

    • Significance: Enhancement of multi-turn conversation retrieval mechanisms to resolve issues #2362 and #2484.
    • Notable Changes: Modifications in dialog service to improve conversation handling.
  • PR #2517: make excel parsing configurable

    • Significance: Introduction of configurability in Excel parsing as per issue #2516.
    • Notable Changes: Conditional parsing logic based on configuration settings.
  • PR #2515: refactor(API): Split SDK class to optimize code structure

    • Significance: Refactoring of the API SDK class for better code organization and readability.
    • Notable Changes: Splitting of SDK functionalities into more granular classes and improving parameter validation messages.
  • PR #2511: rm key set in xinference

    • Significance: Bug fix related to issue #2492.
    • Notable Changes: Removal of hardcoded key setting in xinference initialization.
  • PR #2510: fix self deployed llm lost

    • Significance: Resolution of issue #2506 regarding self-deployed LLM visibility.
    • Notable Changes: Adjustments in LLM listing logic to include self-deployed models correctly.
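To make the idea of configuration-driven parsing (as in the Excel change listed above) more concrete, here is a minimal sketch of rendering spreadsheet rows either as a single HTML table or as row-based text, which are the two forms named in issue #2516. It is not RAGFlow's implementation: `render_rows` and the `as_html` flag are hypothetical, and the rows are plain Python lists rather than data read from a real workbook.

```python
from html import escape


def render_rows(header, rows, as_html=False):
    """Render rows as one HTML table or as one text chunk per row.

    `as_html` stands in for a hypothetical configuration setting.
    Returns a list of strings (chunks).
    """
    if as_html:
        head = "".join(f"<th>{escape(str(h))}</th>" for h in header)
        body = "".join(
            "<tr>" + "".join(f"<td>{escape(str(c))}</td>" for c in row) + "</tr>"
            for row in rows
        )
        # The whole sheet becomes a single chunk that preserves table layout.
        return [f"<table><tr>{head}</tr>{body}</table>"]
    # Row-based text: each row becomes its own "column: value" chunk.
    return ["; ".join(f"{h}: {c}" for h, c in zip(header, row)) for row in rows]


if __name__ == "__main__":
    header = ["name", "qty"]
    rows = [["bolt", 4], ["nut", 10]]
    print(render_rows(header, rows))
    print(render_rows(header, rows, as_html=True))
```

The trade-off in a design like this is that the HTML form keeps column structure in one larger chunk, while the row-based form yields smaller chunks that each carry their own column labels.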

Analysis of Pull Requests

The PRs reflect a robust development process characterized by:

  1. Active Maintenance and Feature Development: The frequency and variety of PRs indicate ongoing efforts to enhance RAGFlow's capabilities. Recent PRs focus on refining existing features, optimizing performance, and introducing new functionalities like configurable Excel parsing and improved multi-turn conversation handling.

  2. Community Engagement: Contributions from various developers suggest a vibrant community involvement. The quick turnaround from PR creation to closure/merging indicates an efficient review process, likely facilitated by active maintainers who are responsive to community contributions.

  3. Focus on Quality and Optimization: Several PRs aim at refactoring code for better structure, readability, and performance. This is evident from PRs like #2515, which splits SDK functionalities for clarity, and PRs addressing specific bugs or optimization opportunities (#2511, #2510).

  4. Adaptation to User Needs: The introduction of configurable options (e.g., Excel parsing) shows responsiveness to user feedback or requirements. This adaptability is crucial for maintaining relevance and usability in diverse application scenarios.

  5. Technical Challenges and Solutions: The presence of bug fixes (#2511, #2510) alongside feature enhancements highlights ongoing technical challenges that the development team is actively addressing. This is a normal part of software evolution but requires diligent effort to ensure stability alongside growth.

In conclusion, RAGFlow's development activity as reflected in these PRs demonstrates a healthy project lifecycle with active contributions aimed at enhancing functionality, optimizing performance, and ensuring quality through rigorous maintenance efforts.

Report On: Fetch commits



Repo Commits Analysis

Development Team and Recent Activity

Team Members and Recent Contributions

  1. Kevin Hu (KevinHuSh)

    • Commits: 52
    • Changes: 965 across 46 files
    • Recent Activities:
    • Refactored xinference and retrieval of multi-turn conversation.
    • Implemented several bug fixes and new features, including configurable Excel parsing and TTS support.
    • Collaborated with multiple team members on various PRs.
  2. Alvin Cage (AlvinCage)

    • Commits: 1
    • Changes: 2 across 1 file
    • Recent Activities:
    • Updated README_zh.md in collaboration with Kevin Hu.
  3. Chenbing (muzilib)

    • Commits: 2
    • Changes: 3 across 2 files
    • Recent Activities:
    • Fixed errors in user settings and API key generation.
  4. Yungongzi (yungongzi)

    • Commits: 1
    • Changes: 3 across 1 file
    • Recent Activities:
    • Fixed API key generation error for VolcEngine.
  5. Fachuan Bai (baifachuan)

    • Commits: 4
    • Changes: 615 across 26 files
    • Recent Activities:
    • Worked on new features including storage icon display and variable renaming for storage.
  6. Feiue (liuhua)

    • Commits: 12
    • Changes: 2269 across 25 files
    • Recent Activities:
    • Contributed to multiple new features and bug fixes, including session management and document SDK updates.
  7. JobSmithManipulation

    • Commits: 8
    • Changes: 2348 across 16 files
    • Recent Activities:
    • Focused on performance improvements and bug fixes related to document SDK.
  8. Dada Hsueh (dadahsueh)

    • Commits: 1
    • Changes: 6 across 1 file
    • Recent Activities:
    • Fixed a bug related to superuser password encoding.
  9. Michał Kiełtyka (Defozo)

    • Commits: 1
    • Changes: 5 across 3 files
    • Recent Activities:
    • Minor updates related to documentation.
  10. Writinwaters

    • Commits: 9
    • Changes: 325 across 15 files
    • Recent Activities:
    • Engaged in documentation updates and minor fixes.
  11. Guoyuhao2330 (lidp)

    • Commits: 19
    • Changes: 854 across 24 files
    • Recent Activities:
    • Contributed significantly to new features, including components for data retrieval.
  12. Hangters (黄腾)

    • Commits: 18
    • Changes: 2557 across 38 files
    • Recent Activities:
    • Focused on adding support for various cloud services and fixing bugs.
Other contributors include Chunshan-Theta, fashioncj, LiuHao-1443, yangboz, dearjane, netandreus, hwzhuhao, and Valdanitooooo, with varying contributions primarily focused on bug fixes and feature enhancements.

Patterns and Themes

  • The majority of recent activities focus on bug fixes, feature enhancements, and refactoring efforts aimed at improving code structure and performance.
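  • Among the contributors profiled above, Kevin Hu has the most commits (52); in the 30-day commit table only balibabu has more (54 commits, 7952 changes). This suggests a leading role in development.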
  • Collaboration is evident among team members, particularly in PRs where multiple authors are noted.
  • A significant emphasis is placed on improving the API structure and documentation, suggesting ongoing efforts to enhance usability for developers.
  • The introduction of new features such as TTS support and improved document handling reflects a commitment to expanding the project's capabilities.
  • The presence of numerous open issues indicates active engagement from the community but may also highlight challenges in development or resource allocation.

Overall, the development team is actively engaged in enhancing the RAGFlow project through collaborative efforts focused on both fixing existing issues and implementing new features.