The Dispatch

GitHub Repo Analysis: TensorRT-LLM


Software Project Analysis Summary

TensorRT-LLM is a toolbox for optimizing large language model inference on NVIDIA GPUs. While the project is under active development, a number of open issues and concerns affect its overall stability and usefulness.

State of the Project

  1. Installation and build issues abound, with users encountering problems across environments such as Windows, Docker containers, and AWS instances. [#32, #23, #18, #22, #45]
  2. Compatibility problems with various models lead to performance issues. [#24, #49, #47, #29, #27]
  3. Incorrect outputs and inadequate documentation create uncertainty for users trying to adopt the toolbox. [#53, #37, #52, #39]
  4. Dependency problems, especially outdated pins, are present in the project. [#16]
  5. Many open issues and pull requests are yet to be resolved, leaving users uncertain about the project's future progress.

Future Trajectory and Recommendations

TensorRT-LLM has clear potential given its usefulness for deploying large language models on NVIDIA GPUs. However, its continued success hinges on the developers' efforts to:

  1. Improve the build and installation process and ensure compatibility across different platforms and environments.
  2. Address performance and compatibility issues with various models.
  3. Update dependencies to avoid issues related to outdated libraries.
  4. Provide clearer, more comprehensive documentation to ease the learning curve for new users.
  5. Prioritize resolving open issues and pull requests to maintain user trust and engagement.

Overall, addressing these issues could enhance TensorRT-LLM's credibility as a reliable toolbox and garner further interest from the open-source community.

Detailed Reports

Report on issues



Software Project Issue Analysis

Themes

The issues in this software project cluster around several primary themes:

  • Installation and Build Issues: The majority of issues concern installation and build challenges. Users report problems building the project from source (#32), and build failures, Docker-specific problems, and NVIDIA driver compatibility issues (#23) are commonplace. Problems span environments such as AWS (#32), Windows (#18), various Linux distributions, and different Docker containers (#22). One issue reports a specific CUDA version requirement (#45).

  • Model and Performance-Related Issues: These issues mostly concern the compatibility, functionality, and performance of various models with TensorRT-LLM. For example, issues have been reported about the compatibility and performance of GPT-2, vLLM, Mistral 7B, and RWKV (#24, #49, #47, #29, #27), and a bug producing incorrect output in the GPT-2 example was reported (#53).

  • Dependency Problems: In issue #16, a user highlights an outdated dependency pin (transformers==4.31.0) that is causing installation conflicts; a minimal sketch of the problem follows this list.

  • Other Problems: These include unclear guidance on how to run TritonServer (#39), wrong outputs in examples (#53, #37), and requests for new releases or wheels (#49, #18, #52).
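To make the dependency problem in #16 concrete, here is a minimal sketch of why a hard version pin causes installation conflicts. It uses the `packaging` library; the version number 4.36.0 and the relaxed range pin are hypothetical illustrations, not values from the issue.

```python
# Sketch: a hard pin rejects every version except the pinned one, so any
# project that needs a newer transformers cannot be co-installed with it.
from packaging.requirements import Requirement
from packaging.version import Version

hard_pin = Requirement("transformers==4.31.0")    # the pin reported in #16
needed = Version("4.36.0")                        # hypothetical newer version a user needs

print(needed in hard_pin.specifier)               # False: the hard pin causes a conflict

range_pin = Requirement("transformers>=4.31,<5")  # a looser pin lets the resolver succeed
print(needed in range_pin.specifier)              # True
```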

Significant Problems

One of the most significant problems is compatibility: issues with certain drivers and CUDA versions (#23, #45) and with working across different platforms (#22, #18). The build and installation failures expose areas that need more robust testing (#32).

Performance issues (#29, #24) raise concerns about the tool's scalability and efficiency. Inadequate documentation and unclear guidance (#52, #39) suggest the need for better user guides, examples, and tutorials.

Major Uncertainties

The major uncertainties concern the project's compatibility with the models and software dependencies users rely on, such as PyTorch and various GPT variants (#27, #49, #47, #16).

Another uncertainty is the unknown turnaround time for resolving these issues, particularly those affecting the tool's usability in users' environments.

Worrying Anomalies

Worrying anomalies include incorrect output results (#53, #37) and a build running out of memory (#29). These issues point to potential defects that require attention to ensure the software functions as expected.

Another worrying sign is the out-of-date dependency (#16), which implies the software may not be kept current with, or optimized for, the latest libraries.

Newly Identified Issue

The most recently identified issue concerns a speed comparison with vLLM (#24), indicating a benchmark expectation or user requirement that the project may not yet fully meet.

Comparing Open and Closed Issues

Most of the issues are still open. The closed issues primarily concern build and installation problems, incompatibility with certain GPUs or models, and feature requests. These are largely the same categories that dominate the open issues: installation and build difficulties, dependency compatibility, and support for specific models.

Thus the core problems seen in resolved issues are still recurring and have not been fully addressed.

Report on pull requests



Summary and Analysis

Open Pull Requests:

  1. Fix link jump in windows readme.md: This pull request fixes a hyperlink in the Windows readme.md that jumps to an incorrect location. There is an active discussion, with contributors expressing appreciation for the fix. The stated intention is to merge it after handling the synchronization differences between the internal repository and the release branch that a direct merge would cause.

  2. fix Forward Compatibility mode is UNAVAILABLE error: The default value of the BASH_ENV variable is overwritten in the base image, triggering a 'Forward Compatibility mode is UNAVAILABLE' error; this pull request attempts to fix that.

  3. Bump onnx from 1.12.0 to 1.13.0: This pull request updates the onnx dependency from 1.12.0 to 1.13.0. No detailed discussion or reviewer response has appeared yet.

Notable Themes:

  1. Link and Reference Fixes: There is an ongoing theme of fixing dead, incorrect, or broken hyperlinks in markdown files, chiefly README and documentation files. This indicates that the maintainers are working to improve the project's documentation and readability.

  2. Update Dependencies: Another theme is keeping key dependencies current, as seen in the open pull request to bump the onnx version.

Closed Pull Requests:

The closed pull requests contain many examples of fixes and updates, including documentation improvements such as fixing dead links and other small issues. There are also several instances of library and dependency updates, specifically to the aarch64 libraries, the batch manager libraries, and the TensorRT-LLM code itself. A key takeaway is that the project is active, with regular updates and ongoing attempts at improvement.

Worrisome Anomalies / Major Uncertainties:

There do not appear to be any significant anomalies or major uncertainties in the pull requests. Within the limited context of the pull requests themselves, the project looks well managed and organized, with active contributors and effective issue handling. It would be useful to watch how quickly pull requests are reviewed and merged, as a signal of project health.

Report on README



Summary

TensorRT-LLM is a toolbox for Large Language Models (LLMs), developed to build TensorRT engines that perform inference efficiently on NVIDIA GPUs. The project provides a Python API similar to PyTorch's and supports models on a broad range of GPU configurations. It also supports several quantization modes, including INT4 and INT8 weights, and implements the SmoothQuant technique.
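As an illustration of the PyTorch-like workflow the README describes, here is a hedged sketch of defining and compiling an engine. The names and arguments below (Builder, create_builder_config, create_network, build_engine, and the precision options) are assumptions modeled on the README's description and the project's example scripts, not a verified API.

```python
# Hypothetical sketch of the workflow the README describes: declare the
# network with the PyTorch-like Python API, then compile a TensorRT engine.
# All names and arguments are illustrative assumptions, not verified API.
from tensorrt_llm import Builder  # assumed import

builder = Builder()
config = builder.create_builder_config(
    name="gpt",
    precision="float16",  # INT8/INT4 weight quantization is also supported
)
network = builder.create_network()

# Populate `network` with a model definition (e.g. a GPT variant) using the
# PyTorch-style modules the Python API provides, then build the engine.
engine = builder.build_engine(network, config)
with open("gpt_float16.engine", "wb") as f:
    f.write(bytearray(engine))  # serialize the engine to disk
```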

Key points of attention:

  • TensorRT-LLM must be built from source; installation instructions can be found in the 'Installation' section.
  • A Docker container for TensorRT-LLM is mentioned but is not yet available.
  • Certain features are not enabled for all models or GPUs.
  • Some memory-related issues are reported while building models; these can be mitigated by reducing the batch size and the input and output lengths, or by enabling plugins (see the sketch after this list).
  • There is an inconsistency between the TensorRT version mentioned in the 'Release notes' (9.1.0.4) and in the badges at the top of the project (9.1).
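To make the memory mitigation above concrete, here is a hedged sketch of the kind of build invocation it implies. The script path and flag names follow the pattern of TensorRT-LLM's example build scripts but should be treated as assumptions.

```python
# Hypothetical build invocation with reduced engine-build limits. Smaller
# maxima shrink the activation buffers TensorRT must plan for, and the
# attention plugin replaces more memory-hungry generic kernels. The script
# path and flag names are assumptions modeled on the example scripts.
import subprocess

subprocess.run(
    [
        "python3", "examples/gpt/build.py",  # assumed example script path
        "--max_batch_size", "4",             # smaller batches
        "--max_input_len", "512",            # shorter prompts
        "--max_output_len", "128",           # shorter generations
        "--use_gpt_attention_plugin",        # enable the attention plugin
    ],
    check=True,
)
```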