The Dispatch

NVIDIA Generative AI Examples Faces Dependency Challenges Amid Active Development

The NVIDIA Generative AI Examples repository has encountered notable dependency management issues, as reflected in recent user-reported errors and ongoing pull requests. This project serves as a resource for developers integrating NVIDIA's software into generative AI systems, offering workflows for RAG pipelines and model fine-tuning.

Recent Activity

Recent issues highlight recurring dependency conflicts and installation errors. Notable issues include #197 (TesseractNotFoundError) and #196 (document ingestion failure due to missing Poppler), indicating setup challenges. The development team is actively addressing these through updates to documentation and requirements files.

Development Team and Recent Activity

  1. LynseyFabel

    • Restructured and rewrote README.md (4 days ago).
  2. Nikhil Kulkarni (nv-nikkulkarni)

    • Updated Gradio version to 4.43.0 (4 days ago).
    • Documentation updates (10 days ago).
  3. Swastika Dutta (sduttanv)

    • Added NeMo Retriever Text Reranking support (5 days ago).
  4. Andreas Ennemoser (chiefenne)

    • Fixed a typo in README.md (10 days ago).
  5. Jay Rodge (jayrodge)

    • Updated README for branding guidelines (12 days ago).
    • Added Multimodal RAG example (16 days ago).
  6. Katherine Huang (katherineh123)

    • Extensive updates for digital human security analyst project (+82,654 lines).
  7. Daniel Glogowski (dglogo)

    • Moved NeMo evaluator notebook, updated microservices (21-23 days ago).
  8. Nick Reamaroon (nreamaroon)

    • Updated microservice versions for RAG pipelines (21 days ago).
  9. Marc (phrocker)

    • Updated README.md (23 days ago).
  10. Zenodia Charpy (Zenodia)

    • Refactored LangChain agent notebook for llama-3.1 (24 days ago).
  11. Rohan Rao (rohrao)

    • Updated 5-minute RAG examples (25 days ago).
  12. Shubhadeep Das (shubhadeepd)

    • Contributed to v0.8.0 release with significant changes (+29,947 lines).

Of Note

This analysis underscores the need for improved dependency management and timely resolution of community-reported issues to enhance user experience and maintain engagement.

Quantified Reports

Quantify Issues



Recent GitHub Issues Activity

Timespan   Opened   Closed   Comments   Labeled   Milestones
7 Days     4        1        2          3         1
30 Days    8        7        7          5         1
90 Days    9        9        8          6         1
All Time   43       21       -          -         -

Like all software activity quantification, these numbers are imperfect but sometimes useful. Comments, Labels, and Milestones refer to those issues opened in the timespan in question.
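As a rough cross-check of the table, the sketch below subtracts closed from opened issues for each timespan. It assumes that "Closed" counts issues closed within the same window, which the table does not state explicitly; the dictionary and function names are illustrative only.

```python
# Issue counts copied from the table above: (opened, closed) per timespan.
ISSUE_COUNTS = {
    "7 Days": (4, 1),
    "30 Days": (8, 7),
    "90 Days": (9, 9),
    "All Time": (43, 21),
}


def net_open(opened: int, closed: int) -> int:
    """Issues opened minus issues closed in the same window."""
    return opened - closed


def summarize(counts: dict) -> dict:
    """Map each timespan to its net change in open issues."""
    return {span: net_open(o, c) for span, (o, c) in counts.items()}


if __name__ == "__main__":
    for span, net in summarize(ISSUE_COUNTS).items():
        print(f"{span}: net {net:+d}")
```

The All Time row gives 43 - 21 = 22, which matches the 22 open issues cited in the issues report.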

Quantify commits



Quantified Commit Activity Over 30 Days

Developer                             Branches  PRs    Commits  Files  Changes
katherineh123                         1         1/1/0  15       327    82654
Shubhadeep Das                        1         1/1/0  1        300    29947
Sumit Bhattacharya                    1         0/0/0  1        104    28112
Jay Rodge                             1         3/3/0  4        11     721
Zenodia Charpy                        1         2/1/1  1        1      489
LynseyFabel                           2         1/1/0  2        1      294
Rohan Rao                             1         1/1/0  4        3      136
Swastika Dutta                        1         1/1/0  1        2      68
Daniel Glogowski                      2         2/1/0  3        21     11
Nick Reamaroon                        1         1/1/0  1        1      6
Nikhil Kulkarni                       1         3/3/0  3        3      6
Marc                                  1         1/1/0  1        1      2
Andreas Ennemoser                     1         1/1/0  1        1      2
THANAKON HAUNAONG (bang78945)         0         0/0/1  0        0      0
Harold Cobo (ItsaFugazi)              0         1/0/1  0        0      0
Mitesh Patel (patelmiteshn)           0         1/0/0  0        0      0
Mike McKiernan (mikemckiernan)        0         0/1/0  0        0      0
dnandakumar-nv                        0         1/0/0  0        0      0
meiranp-nvidia                        0         0/1/0  0        0      0
dependabot[bot]                       0         6/0/6  0        0      0
Gnanaprakash R (gnanaprakash-ravi)    0         1/0/0  0        0      0
Chris Alexiuk (chrisalexiuk-nvidia)   0         1/0/0  0        0      0

PRs: created by that dev and opened/merged/closed-unmerged during the period
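To make the PRs column easier to read, here is a minimal sketch that splits one cell into its three counts (opened/merged/closed-unmerged). The `PRCounts` type and `parse_pr_cell` helper are hypothetical names introduced for illustration, and the sample cells are copied from the table above.

```python
from typing import NamedTuple


class PRCounts(NamedTuple):
    opened: int
    merged: int
    closed_unmerged: int


def parse_pr_cell(cell: str) -> PRCounts:
    """Parse a cell such as '2/1/1' into (opened, merged, closed_unmerged)."""
    parts = cell.strip().split("/")
    if len(parts) != 3:
        raise ValueError(f"expected three '/'-separated counts, got {cell!r}")
    return PRCounts(*(int(p) for p in parts))


# A few cells copied from the table above.
SAMPLE_CELLS = {
    "Jay Rodge": "3/3/0",
    "Zenodia Charpy": "2/1/1",
    "dependabot[bot]": "6/0/6",
}

if __name__ == "__main__":
    for developer, cell in SAMPLE_CELLS.items():
        counts = parse_pr_cell(cell)
        print(f"{developer}: opened={counts.opened}, merged={counts.merged}, "
              f"closed unmerged={counts.closed_unmerged}")
```

Read this way, the dependabot[bot] row means six PRs were opened and six were closed without merging during the period.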

Detailed Reports

Report On: Fetch issues



Recent Activity Analysis

The NVIDIA/GenerativeAIExamples repository currently has 22 open issues, with recent activity indicating a mix of installation errors, library conflicts, and user inquiries about functionality. Notably, issues such as #197 (TesseractNotFoundError) and #196 (Failed to ingest document) highlight common setup problems that users face, particularly related to dependencies and environment configuration. A recurring theme in the issues is the need for clearer documentation on installation requirements and troubleshooting steps, as many users report missing dependencies or configuration errors.

Issue Details

Most Recently Created Issues

  1. Issue #197: TesseractNotFoundError

    • Priority: High
    • Status: Open
    • Created: 1 day ago
    • Description: User reports an error indicating that Tesseract is not installed or not in the PATH.
  2. Issue #196: ERROR:example:Failed to ingest document due to exception Unable to get page count.

    • Priority: High
    • Status: Open
    • Created: 1 day ago
    • Description: User encounters an error related to document ingestion, specifically needing Poppler installed.
  3. Issue #195: Conflict between the fitz and pymupdf

    • Priority: Medium
    • Status: Open
    • Created: 4 days ago
    • Description: Identifies a conflict in requirements.txt between fitz and pymupdf, proposing a PR for resolution.
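Issues #197 and #196 both come down to an external binary missing from the PATH. A minimal preflight sketch, assuming the examples reach Tesseract and Poppler through their command-line executables (`tesseract`, and Poppler's `pdfinfo` and `pdftoppm`), could check for them before running an example. The `missing_tools` helper is hypothetical and is not part of the repository.

```python
import shutil

# Executable name -> what it belongs to. The mapping to issue numbers is
# an assumption based on the error messages quoted in the issue titles.
REQUIRED_TOOLS = {
    "tesseract": "Tesseract OCR (see issue #197)",
    "pdfinfo": "Poppler (see issue #196)",
    "pdftoppm": "Poppler (see issue #196)",
}


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the executables from `tools` that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        for name in missing:
            print(f"missing: {name} - {REQUIRED_TOOLS[name]}")
        # On Debian/Ubuntu these are typically provided by the
        # tesseract-ocr and poppler-utils packages.
        raise SystemExit(1)
    print("all external tools found")
```

Running a check like this at container build time or at the top of a notebook would turn a late runtime exception into an early, readable message.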

Most Recently Updated Issues

  1. Issue #119: Unable to upload files in Q&A Chatbot's RAG service

    • Priority: Medium
    • Status: Open
    • Created: 128 days ago; Edited 3 days ago
    • Description: User struggles with file uploads in a chatbot service, with proposed modifications to requirements and Dockerfile.
  2. Issue #162: Aurora Mpox Sentinela OMS

    • Priority: Low
    • Status: Open
    • Created: 26 days ago
    • Description: User shares code snippets related to data collection and analysis modules.
  3. Issue #158: When I run /RetrievalAugmentedGeneration/examples/developer_rag/chains.py

    • Priority: Medium
    • Status: Open
    • Created: 34 days ago
    • Description: User reports an error related to API key requirements while running a specific example.

Summary of Themes and Commonalities

  • Many issues revolve around dependency management and installation errors, particularly with libraries like Tesseract and Poppler.
  • Users frequently request clarification on documentation regarding setup processes and troubleshooting steps.
  • There is a notable trend of users encountering conflicts in requirements.txt, suggesting that dependency management could be improved.
  • Several discussions indicate a need for better handling of user errors within the application, particularly when incorrect configurations lead to frontend failures or server crashes.
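The requirements.txt conflict described in issue #195 can be caught mechanically. The sketch below scans requirement lines for package names that should not be installed together; the only pair it knows about is fitz and pymupdf, because PyMuPDF is imported as `fitz` and an unrelated PyPI package also uses that name. The helper names and the sample file content are illustrative assumptions, not taken from the repository.

```python
import re

# Groups of distributions that should not appear together.
CONFLICTING_SETS = [{"fitz", "pymupdf"}]


def requirement_names(text: str) -> set:
    """Extract normalized package names from requirements.txt content."""
    names = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (simplified)
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            # Normalize names the way PEP 503 does: lowercase, runs of -_. -> -
            names.add(re.sub(r"[-_.]+", "-", match.group(0)).lower())
    return names


def find_conflicts(text: str, conflicting_sets=CONFLICTING_SETS) -> list:
    """Return each conflicting group that is fully present in the file."""
    names = requirement_names(text)
    return [sorted(group) for group in conflicting_sets if group <= names]


if __name__ == "__main__":
    sample = "fitz\nPyMuPDF\ngradio==4.43.0\n"  # hypothetical file content
    print(find_conflicts(sample))
```

A check of this kind could run in CI so that a conflicting pair is flagged in review instead of surfacing as an import error for users.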

This analysis highlights the importance of robust documentation and clear communication regarding installation requirements to enhance user experience and reduce the number of open issues.

Report On: Fetch pull requests



Report on Pull Requests

Overview

The NVIDIA Generative AI Examples repository currently has 8 open pull requests (PRs) and a total of 144 closed PRs. The open PRs focus on various enhancements, bug fixes, and updates to dependencies, showcasing ongoing development efforts to improve the repository's functionality and usability.

Summary of Pull Requests

  1. PR #194: Update requirements.txt

    • State: Open
    • Created: 4 days ago
    • Description: Resolves a conflict between fitz and pymupdf libraries in the requirements.txt file for the Multimodal RAG example. The fitz dependency was removed to ensure compatibility with pymupdf.
    • Significance: This change is crucial for maintaining the functionality of the example code without dependency conflicts.
  2. PR #186: Added langChain RAG notebook

    • State: Open
    • Created: 9 days ago
    • Description: Introduces a new notebook that demonstrates a copilot application using LangChain, integrating various models and providing a Gradio-based UI.
    • Significance: This addition expands the repository's offerings by showcasing practical applications of generative AI technologies.
  3. PR #185: Updating Evaluator Notebook

    • State: Open
    • Created: 9 days ago
    • Description: Modifies the Evaluator Notebook based on recent changes to the LM Eval Harness.
    • Significance: Ensures that evaluation processes remain up-to-date with the latest standards and practices.
  4. PR #183: Remove OpenAI dependency

    • State: Open
    • Created: 11 days ago
    • Description: Replaces the OpenAI library with ChatNVIDIA for interacting with NIMs, updating documentation and demo notebooks accordingly.
    • Significance: This transition reflects a strategic shift towards using NVIDIA's own tools, potentially enhancing performance and integration.
  5. PR #168: Nemo top-level folder restructure

    • State: Open
    • Created: 24 days ago
    • Description: Restructures folders related to NeMo for better organization and clarity.
    • Significance: Improved organization can enhance developer experience and ease navigation within the repository.
  6. PR #155: Update notebook examples

    • State: Open
    • Created: 47 days ago
    • Description: Fixes issues in multiple notebooks related to deprecated model names and container startup instructions.
    • Significance: Addresses usability concerns, ensuring that examples work as intended for users.
  7. PR #148: Knowledge Graph RAG setup fixes

    • State: Open
    • Created: 65 days ago
    • Description: Documents necessary changes to run the Knowledge Graph RAG example on a fresh Ubuntu installation, including external dependencies.
    • Significance: Enhances accessibility for new users attempting to set up the project.
  8. PR #110: Multiple file and session management added

    • State: Open
    • Created: 148 days ago
    • Description: Implements structured management for multiple user sessions and files within the application.
    • Significance: This enhancement improves user experience by allowing concurrent interactions without conflicts.

Analysis of Pull Requests

The current state of open pull requests in the NVIDIA Generative AI Examples repository reflects a focused effort on improving usability, enhancing functionality, and ensuring compatibility with evolving technologies in generative AI. The majority of these PRs are aimed at addressing specific issues or adding new features that align with user needs and community feedback.

Common Themes

A notable theme among the open pull requests is dependency management and conflict resolution. For instance, PR #194 resolves a library conflict between fitz and pymupdf that could break the Multimodal RAG example, while PR #183 moves away from an external dependency (OpenAI) towards an internal solution (ChatNVIDIA). This shift may simplify dependency management, and it is consistent with NVIDIA promoting its own tools.

Another commonality is the enhancement of user experience through improved documentation and examples. PRs such as #186 (adding a new LangChain notebook) and #155 (updating existing notebooks) demonstrate an ongoing commitment to providing clear, practical examples that facilitate user engagement with the technology.

Anomalies

Despite the active development reflected in these PRs, there remains a significant number of open issues (22), which may indicate challenges in addressing all community feedback or bugs promptly. This could lead to frustration among users seeking timely resolutions or enhancements.

Additionally, several older PRs remain unmerged or unresolved, such as PR #110 regarding session management which was created nearly five months ago. This delay may suggest resource constraints or prioritization issues within the development team.

Lack of Recent Merge Activity

While there is a healthy volume of open PRs, it is concerning that they range from 4 to 148 days old and that four of the eight have been open for more than three weeks without merging or significant reviewer activity. This could indicate bottlenecks in the review process or potential disagreements among contributors regarding implementation details.
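The age of the open PRs can be summarized from the "Created" fields listed in this report. The sketch below is a small worked calculation; the 30-day staleness threshold and the function name are assumptions chosen for illustration.

```python
from statistics import median

# PR number -> age in days, copied from the "Created" fields above.
OPEN_PR_AGES = {194: 4, 186: 9, 185: 9, 183: 11, 168: 24, 155: 47, 148: 65, 110: 148}


def stale_prs(ages: dict, threshold_days: int) -> list:
    """Return PR numbers older than threshold_days, oldest first."""
    old = [(age, number) for number, age in ages.items() if age > threshold_days]
    return [number for age, number in sorted(old, reverse=True)]


if __name__ == "__main__":
    print("median age (days):", median(OPEN_PR_AGES.values()))
    print("open more than 30 days:", stale_prs(OPEN_PR_AGES, 30))
```

With these figures the median age is 17.5 days, and three PRs (#110, #148, #155) have been open for more than 30 days.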

Conclusion

Overall, the current landscape of pull requests in the NVIDIA Generative AI Examples repository indicates an active development environment focused on improving user experience through enhanced documentation, feature additions, and strategic shifts in dependencies. However, attention should be given to resolving older PRs and addressing community issues more promptly to maintain engagement and satisfaction among users.

Report On: Fetch commits



Repo Commits Analysis

Development Team and Recent Activity

Team Members and Activities

  1. LynseyFabel

    • Recent Activity: Restructured and rewrote README.md (4 days ago) with significant changes (+99, -48 lines).
    • Collaborations: None reported.
  2. Nikhil Kulkarni (nv-nikkulkarni)

    • Recent Activity:
    • Updated Gradio version to 4.43.0 (4 days ago), fixing issue #188.
    • Updated documentation to delete ingested file using UI (10 days ago).
    • Updated Docker compose plugin version (10 days ago).
    • Collaborations: None reported.
  3. Swastika Dutta (sduttanv)

    • Recent Activity: Added support for NeMo Retriever Text Reranking in O-RAN chatbot (5 days ago), a modest change (+38, -24 lines).
    • Collaborations: None reported.
  4. Andreas Ennemoser (chiefenne)

    • Recent Activity: Fixed a typo in README.md (10 days ago).
    • Collaborations: None reported.
  5. Jay Rodge (jayrodge)

    • Recent Activity:
    • Updated README files to align with branding guidelines (12 days ago).
    • Added Multimodal RAG example to community projects (16 days ago), contributing significantly (+697, -6 lines).
    • Collaborations: Worked on pull requests with other team members.
  6. Katherine Huang (katherineh123)

    • Recent Activity:
    • Made extensive updates to the digital human security analyst project: 15 commits over the last 30 days touching 327 files (+82,654 lines).
    • Reverted previous changes related to the digital human security analyst project (17 days ago).
    • Collaborations: Collaborated with various team members on multiple pull requests.
  7. Daniel Glogowski (dglogo)

    • Recent Activity:
    • Moved NeMo evaluator notebook and removed unnecessary folders (23 days ago).
    • Contributed to updates for LLM, Embedding, and Reranking microservices used by RAG pipelines (21 days ago).
    • Collaborations: Worked with other team members on pull requests.
  8. Nick Reamaroon (nreamaroon)

    • Recent Activity: Updated versions for microservices used by RAG pipelines (21 days ago).
    • Collaborations: None reported.
  9. Marc (phrocker)

    • Recent Activity: Updated README.md (23 days ago).
    • Collaborations: None reported.
  10. Zenodia Charpy (Zenodia)

    • Recent Activity: Refactored LangChain agent notebook to use llama-3.1 (24 days ago), contributing significant changes (+257, -232 lines).
    • Collaborations: Worked on pull requests with other team members.
  11. Rohan Rao (rohrao)

    • Recent Activity: Multiple updates related to the 5-minute RAG examples including README and main.py adjustments (25 days ago).
    • Collaborations: None reported.
  12. Shubhadeep Das (shubhadeepd)

    • Recent Activity: Contributed upstream changes for the v0.8.0 release, totaling +29,947 lines across 300 files.
    • Collaborations: Worked on various pull requests with other team members.

Patterns and Themes

  • The development team is actively engaged in enhancing documentation and adding features, particularly around the NeMo framework and RAG pipelines.
  • Katherine Huang's contributions stand out due to their volume and impact, indicating a focus on substantial project components.
  • Several team members collaborated through pull requests, which suggests a cooperative environment even though most commits come from individual contributors.
  • The recent activities show a balance between feature development, bug fixes, and documentation improvements, reflecting a comprehensive approach to project maintenance.
  • The high number of open issues and pull requests may indicate ongoing challenges or areas for improvement within the project’s lifecycle management.

Conclusion

The NVIDIA Generative AI Examples repository reflects an active development environment with significant contributions from various team members focusing on both feature enhancements and documentation improvements. The collaborative nature of the team's work suggests effective communication and shared goals within the project framework.