The Dispatch

GitHub Repo Analysis: Helicone/helicone


Executive Summary

The Helicone project is an open-source observability platform for large language models (LLMs), designed to facilitate monitoring, evaluation, and experimentation with AI models. Developed by Helicone and incubated in Y Combinator's Winter 2023 batch, the platform integrates with AI services like OpenAI and Anthropic, offering features such as agent tracing and prompt management. The project is in a state of active development with a focus on enhancing user experience and expanding functionality.

Recent Activity

Development Team Members:

  1. LinaLam
  2. Justin Torre (chitalian)
  3. Kavin Valli (kavinvalli)
  4. Nathan Baschez (nbashaw)
  5. Cole Gottdank (colegottdank)
  6. use-tusk[bot]
  7. koshyviv

Recent Commits and PRs:

Recent activities indicate a strong focus on both feature development and content updates, with ongoing efforts to address bugs and improve documentation.

Risks

Of Note

Overall, Helicone is progressing well with active development and community involvement but must address documentation and deployment challenges to ensure broader adoption and smoother user experiences.

Quantified Reports

Quantify issues



Recent GitHub Issues Activity

Timespan | Opened | Closed | Comments | Labeled | Milestones
7 Days | 4 | 0 | 0 | 0 | 1
30 Days | 9 | 1 | 0 | 1 | 1
90 Days | 29 | 4 | 53 | 5 | 1
1 Year | 66 | 61 | 104 | 31 | 1
All Time | 168 | 126 | - | - | -

Like all software activity quantification, these numbers are imperfect but sometimes useful. Comments, Labels, and Milestones refer to those issues opened in the timespan in question.
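
The table's numbers can be turned into a closed-to-opened ratio and a net backlog change per window. The sketch below is illustrative only: the type and function names are hypothetical, and it assumes the Closed column counts closures that happened within the window, which the report does not state explicitly.

```typescript
// Issue counts copied from the "Recent GitHub Issues Activity" table.
interface IssueWindow {
  timespan: string;
  opened: number;
  closed: number;
}

const issueWindows: IssueWindow[] = [
  { timespan: "7 Days", opened: 4, closed: 0 },
  { timespan: "30 Days", opened: 9, closed: 1 },
  { timespan: "90 Days", opened: 29, closed: 4 },
  { timespan: "1 Year", opened: 66, closed: 61 },
  { timespan: "All Time", opened: 168, closed: 126 },
];

// Closed issues as a fraction of opened issues; 0 when nothing was opened.
function closeRatio(w: IssueWindow): number {
  return w.opened === 0 ? 0 : w.closed / w.opened;
}

// Net change in the number of open issues over the window
// (assumption: "closed" means closed during the window).
function netBacklogGrowth(w: IssueWindow): number {
  return w.opened - w.closed;
}

for (const w of issueWindows) {
  const pct = (closeRatio(w) * 100).toFixed(1);
  const net = netBacklogGrowth(w);
  console.log(`${w.timespan}: ${pct}% closed, backlog ${net >= 0 ? "+" : ""}${net}`);
}
```

Under that assumption, the 90-day window adds 25 open issues (29 opened, 4 closed), which is the backlog growth the Delivery risk rating refers to.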

Rate pull requests



2/5
The pull request consists of minor formatting changes in a markdown file, specifically adding a space and adjusting indentation. These changes are insignificant in terms of functionality or documentation improvement. The PR lacks substantial content or impact, making it notably trivial. It doesn't introduce any new features, bug fixes, or significant documentation enhancements. Therefore, it is rated as 'Needs work' due to its minimal contribution to the project.
2/5
The pull request addresses a minor typographical error, changing 'editting' to 'editing' in a documentation file. While correcting typos is important for maintaining professionalism and clarity, this change is insignificant in terms of impact on the overall project. It does not introduce new features, fix critical bugs, or enhance functionality. Therefore, it is rated as needing work due to its trivial nature.
3/5
The pull request focuses on updating meta-descriptions and open-graph images across multiple pages, which is a necessary task for improving SEO and social media sharing. However, the changes are relatively minor and do not introduce any new features or significant improvements to the codebase. The PR is well-executed with no apparent flaws, but it remains an unremarkable update in terms of impact. Therefore, it deserves an average rating of 3.
3/5
The pull request introduces a substantial number of changes across multiple files, indicating a potentially significant update to the project. However, the changes appear to be primarily focused on refactoring and renaming, with many lines added and removed without clear documentation or explanation of the impact. While there are new features like usage tracking for evaluators, the overall significance and quality of the changes are not immediately clear from the diff alone. This makes it an average pull request that could benefit from more detailed documentation or testing to highlight its importance.
3/5
The pull request involves significant changes to the dashboard, including the addition of new graph components and updates to existing ones. While it introduces new features and refactors some parts of the code, the description suggests that the work is incomplete and some tasks are deferred. The changes are substantial but not exceptional, with a mix of additions and modifications across several files. The PR appears to be a work in progress rather than a polished, final submission, which aligns with an average rating.
3/5
This pull request introduces a new model 'o1' and its pinned version with associated costs and details. The changes are straightforward, adding new entries to an existing data structure without altering existing logic or functionality. While the addition is clear and well-structured, it lacks complexity or significant impact on the overall project, making it an average update. It does not introduce any notable flaws, but also does not stand out as a significant enhancement.
3/5
The pull request addresses a specific issue by enhancing the accordion scroll behavior in two components. The changes are minor, involving the addition of a useRef hook and a scrollIntoView function to improve user experience. While these updates are beneficial, they are not particularly complex or groundbreaking. The PR is well-executed but lacks significant impact or innovation, making it an average contribution.
3/5
The pull request adds a new blog post about Gemini 2.0 Flash, including comprehensive content and multiple images. The changes are primarily documentation-related, with a few minor code adjustments to integrate the new blog entry. While the content is well-structured and informative, it does not introduce significant code changes or enhancements to the project itself. The PR is valuable for content addition but lacks technical depth or complexity, making it an average contribution.
4/5
The pull request addresses a significant security flaw by moving authorization checks from the client side to the server side, which is crucial for preventing unauthorized access. The implementation is thorough, checking user tiers and usage limits effectively. However, while the change is important and well-executed, it lacks additional documentation or tests that could further enhance its robustness and maintainability. Overall, it's a commendable improvement but could benefit from more comprehensive validation.
4/5
The pull request introduces a well-structured feature for soft deleting properties, which is significant for managing property visibility without altering the existing schema. The implementation includes a new database table, API endpoints, and UI components, showcasing thoroughness and attention to detail. However, while the functionality is comprehensive, it lacks extensive testing details or performance considerations, which prevents it from being rated as exemplary. Overall, it's a well-executed and meaningful addition to the project.
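
The soft-delete idea in the last review above (hiding a property through a separate table rather than changing the existing schema) can be sketched roughly as follows. This is not the PR's implementation: `PropertyVisibilityStore` and its methods are hypothetical, and an in-memory map stands in for the new database table.

```typescript
// Hypothetical in-memory stand-in for a "hidden properties" table.
// Rows in the original property data are never modified or removed.
class PropertyVisibilityStore {
  // organization id -> set of property names currently hidden
  private hidden = new Map<string, Set<string>>();

  // Soft delete: record the property as hidden for this organization.
  hide(orgId: string, property: string): void {
    const set = this.hidden.get(orgId) ?? new Set<string>();
    set.add(property);
    this.hidden.set(orgId, set);
  }

  // Undo a soft delete; a no-op if the property was not hidden.
  restore(orgId: string, property: string): void {
    this.hidden.get(orgId)?.delete(property);
  }

  // Filter a full property list down to the ones that should be shown.
  visible(orgId: string, all: string[]): string[] {
    const set = this.hidden.get(orgId);
    return set ? all.filter((p) => !set.has(p)) : all;
  }
}

const store = new PropertyVisibilityStore();
store.hide("org_1", "environment");
console.log(store.visible("org_1", ["environment", "user_tier"]));
```

Because hiding only adds a marker, restoring a property is a matter of removing that marker, and other organizations' views are unaffected.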

Quantify commits



Quantified Commit Activity Over 14 Days

Developer | Branches | PRs | Commits | Files | Changes
Kavin Valli | 4 | 18/17/1 | 20 | 142 | 134065
Justin Torre | 5 | 19/18/1 | 50 | 378 | 113260
colegottdank | 2 | 10/10/0 | 11 | 71 | 104271
LinaLam | 2 | 11/11/0 | 14 | 139 | 4260
Nathan Baschez | 1 | 1/1/0 | 1 | 1 | 20
koshyviv | 1 | 0/1/0 | 1 | 1 | 3
use-tusk[bot] | 1 | 2/1/1 | 1 | 1 | 2
None (zm1355) | 0 | 1/0/0 | 0 | 0 | 0
Shoaib Akhtar (STAR-173) | 0 | 1/0/0 | 0 | 0 | 0
Ikko Eltociear Ashimine (eltociear) | 0 | 1/0/0 | 0 | 0 | 0
Rupali Kavale (coderquill) | 0 | 1/0/0 | 0 | 0 | 0
None (josh-sophon) | 0 | 1/0/0 | 0 | 0 | 0

PRs: created by that dev and opened/merged/closed-unmerged during the period
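
The PRs column packs three counts into one cell in the order opened/merged/closed-unmerged. As a minimal sketch (the helper names are hypothetical), the cells from the table can be parsed and totalled like this:

```typescript
interface PrCounts {
  opened: number;
  merged: number;
  closedUnmerged: number;
}

// Parse one "opened/merged/closed-unmerged" cell, e.g. "18/17/1".
function parsePrCounts(cell: string): PrCounts {
  const parts = cell.split("/").map((p) => Number.parseInt(p, 10));
  if (parts.length !== 3 || parts.some((n) => Number.isNaN(n))) {
    throw new Error(`Expected "opened/merged/closed-unmerged", got "${cell}"`);
  }
  const [opened = 0, merged = 0, closedUnmerged = 0] = parts;
  return { opened, merged, closedUnmerged };
}

// PR cells copied from the 14-day commit activity table, top to bottom.
const prCells = [
  "18/17/1", "19/18/1", "10/10/0", "11/11/0", "1/1/0", "0/1/0",
  "2/1/1", "1/0/0", "1/0/0", "1/0/0", "1/0/0", "1/0/0",
];

// Sum the three counts across all developers.
const prTotals = prCells.map(parsePrCounts).reduce<PrCounts>(
  (acc, c) => ({
    opened: acc.opened + c.opened,
    merged: acc.merged + c.merged,
    closedUnmerged: acc.closedUnmerged + c.closedUnmerged,
  }),
  { opened: 0, merged: 0, closedUnmerged: 0 },
);

console.log(prTotals);
```

Note that a merged count can exceed the opened count for a developer (as in the 0/1/0 row), since a PR opened before the period can be merged during it.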

Quantify risks



Project Risk Ratings

Risk | Level (1-5) | Rationale
Delivery | 4 | The project faces significant delivery risks due to a backlog of unresolved issues and a slow resolution rate. Over the last 90 days, 29 issues were opened with only 4 closed, indicating a growing backlog. The lack of structured issue management, as evidenced by minimal labeling and milestone setting, further exacerbates this risk. High-priority bugs like #3082 and #3081 require immediate attention to prevent further degradation of user experience.
Velocity | 4 | The project's velocity is at risk due to a disparity in contributions among developers and potential bottlenecks in the review process. While some developers like Justin Torre are highly productive, others show minimal activity, indicating uneven workload distribution. The high volume of changes without corresponding merges suggests potential delays in integrating new features, impacting overall project velocity.
Dependency | 3 | Dependency risks are moderate due to integration challenges with external systems like Azure OpenAI and Anthropic. Issues related to Docker deployment and self-hosting also indicate technical debt accumulation. While there are efforts to maintain comprehensive API documentation, the reliance on external systems requires careful management to prevent integration failures.
Team | 3 | The team faces moderate risks related to workload distribution and communication gaps. The disparity in commit activity among team members suggests potential engagement issues or uneven workload distribution. Additionally, the presence of incomplete or unclear pull requests indicates possible inefficiencies in the review process or communication gaps within the team.
Code Quality | 3 | Code quality risks are moderate due to a mix of well-executed security improvements and trivial updates with minimal impact. Significant changes like moving authorization checks server-side improve security but lack comprehensive documentation or tests, which could affect maintainability. The presence of trivial pull requests indicates a focus on minor corrections rather than substantial development progress.
Technical Debt | 4 | Technical debt is accumulating due to unresolved Docker-related errors and frequent changes without corresponding updates in test coverage. The high volume of changes by a few developers without timely integration increases the risk of technical debt if these changes are not managed effectively. Documentation gaps also contribute to this risk, as seen in issues like #3081 and #3079.
Test Coverage | 4 | Test coverage is insufficient across several areas, as highlighted by the absence of additional tests for significant security improvements and new features. The lack of detailed testing information for major pull requests like #3092 raises concerns about the robustness of new functionalities. This gap poses risks for catching bugs and regressions effectively.
Error Handling | 4 | Error handling is inadequate, with several issues highlighting poor error messages or handling mechanisms. Bugs such as improperly formatted API error responses (#3057) suggest insufficient error handling practices. The absence of dynamic content handling in files like 'evaluate.tsx' further underscores this risk.
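
One common way to avoid improperly formatted API errors of the kind the Error Handling rating mentions is to funnel every failure through a single function that always returns the same JSON shape. The sketch below is a generic illustration, not Helicone's code: `ApiErrorBody`, `toApiErrorBody`, and the `internal_error` code are all assumptions.

```typescript
// Hypothetical uniform error envelope returned by every endpoint.
interface ApiErrorBody {
  error: {
    code: string;
    message: string;
  };
}

// Convert anything that was thrown into the uniform envelope.
function toApiErrorBody(err: unknown, code = "internal_error"): ApiErrorBody {
  if (err instanceof Error) {
    return { error: { code, message: err.message } };
  }
  if (typeof err === "string") {
    return { error: { code, message: err } };
  }
  // Non-Error, non-string values (objects, numbers, undefined) get a generic message.
  return { error: { code, message: "Unknown error" } };
}

console.log(JSON.stringify(toApiErrorBody(new Error("Invalid API key"), "unauthorized")));
```

Clients can then rely on `error.code` and `error.message` being present regardless of where the failure originated.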

Detailed Reports

Report On: Fetch issues



Recent Activity Analysis

Recent GitHub issue activity for the Helicone project reveals a focus on addressing bugs, enhancing features, and improving documentation. Notably, several issues pertain to integration challenges with various AI platforms, such as Azure OpenAI and Anthropic, indicating ongoing efforts to streamline these processes. There are also multiple reports of discrepancies in token counting and API responses, suggesting a need for more robust error handling and validation mechanisms.

A notable recurring theme is inconsistent documentation, particularly in integration instructions and feature usage. This has led to user confusion and implementation errors, highlighting the importance of keeping documentation clear and up to date. Additionally, issues related to Docker deployment and self-hosting indicate potential barriers for users attempting to deploy Helicone independently.

Issue Details

  • #3082: [Bug]: Bad UX when someone accesses sessions with a wrong session id

    • Priority: High
    • Status: Open
    • Created: 2 days ago
  • #3081: [Bug]: Enumerate all required components

    • Priority: High
    • Status: Open
    • Created: 3 days ago
  • #3080: [Bug]: Docker compose, running locally

    • Priority: Medium
    • Status: Open
    • Created: 3 days ago
  • #3079: [Bug]: Pre-reqs - Really?

    • Priority: Medium
    • Status: Open
    • Created: 3 days ago
  • #3057: [Bug]: Improperly Formatted Error from API Call

    • Priority: Medium
    • Status: Open
    • Created: 9 days ago
  • #2977: [Bug]: Docker account creation error in supabase

    • Priority: Medium
    • Status: Open (Edited recently)
    • Created: 32 days ago
  • #2463: [Bug]: Helicone Helm Chart Repository Returns 404

    • Priority: Low
    • Status: Open (Edited recently)
    • Created: 126 days ago

The most recent issues primarily focus on bugs affecting user experience and deployment processes. High-priority issues like #3082 highlight user interface problems that could hinder usability. Issues such as #3081 and #3080 reflect ongoing challenges with local deployment and component enumeration, which are crucial for seamless integration and operation. The persistence of these issues suggests a need for enhanced testing and documentation efforts to ensure smoother user experiences.

Report On: Fetch pull requests



Analysis of Pull Requests for Helicone Project

Open Pull Requests

#3092: Add Custom Properties Soft Delete Functionality

  • State: Open
  • Created: 0 days ago
  • Details: This PR introduces a significant new feature allowing properties to be soft-deleted, which involves adding a new table and endpoints for managing property visibility. The implementation seems comprehensive, with updates across multiple components and API endpoints.
  • Notable Points: The PR is very recent and involves a database migration, which could have implications for deployment and backward compatibility. It requires careful review to ensure that the new functionality integrates well with existing features.

#3090: docs: update resell-a-model.mdx

  • State: Open
  • Created: 1 day ago
  • Details: A minor documentation update correcting a typo.
  • Notable Points: While minor, maintaining accurate documentation is crucial for user understanding and project credibility.

#3089: fix: update accordion scroll behavior in course generator

  • State: Open
  • Created: 1 day ago
  • Details: Fixes an issue related to scroll behavior in the course generator component.
  • Notable Points: This PR addresses a specific bug (#2981), which suggests it was identified as a priority fix. It should be tested thoroughly to ensure it resolves the issue without introducing new bugs.

#3088: New blog: gemini-2.0 Flash

  • State: Open
  • Created: 2 days ago
  • Details: Adds a new blog post about "gemini-2.0 Flash" with accompanying images and metadata.
  • Notable Points: Content updates like this are important for keeping the community engaged and informed. The visual elements need to be checked for quality and relevance.

#3085: thread deepmind

  • State: Open
  • Created: 2 days ago
  • Details: Appears to be an incomplete or placeholder PR with unclear objectives.
  • Notable Points: This PR lacks clarity and purpose. It should be either completed with clear objectives or closed if not needed.

Recently Closed Pull Requests

#3091: New blog: openai o3

  • State: Closed (Merged)
  • Created/Closed: 0 days ago
  • Details: Introduced a new blog post about "openai o3" with images and metadata.
  • Significance: Quick turnaround indicates efficient content management. Ensures timely dissemination of information.

#3087, #3086, #3084, #3083, #3078, #3077, #3076, #3075, #3073, #3072, #3071, #3070, #3068, #3067, #3065, #3064, #3063, #3062:

These PRs were closed within the last few days, indicating active development and maintenance. Notably:

  • Several PRs (#3087, #3086, etc.) involved UI/UX improvements, bug fixes, and content updates, reflecting ongoing efforts to enhance user experience and platform stability.
  • PRs like #3068 ("life of pi") suggest experimental or playful content additions, which can engage users but should be balanced with core functionality improvements.

Notable Issues

  1. Unmerged Closed PRs (#3052):

    • The presence of closed but unmerged PRs like #3052 suggests potential issues during review or testing phases that prevented merging. These should be revisited to determine if they can be resolved or if they highlight underlying problems needing attention.
  2. Older Open PRs (#3016, etc.):

    • Some open PRs have been pending for weeks (#3016). These may indicate low-priority tasks or bottlenecks in the review process that could benefit from additional resources or prioritization.

Recommendations

  1. Prioritize reviewing and merging critical feature updates like #3092 to maintain momentum on key functionalities.
  2. Address any ambiguities in open PRs such as #3085 to prevent clutter and ensure all contributions are purposeful.
  3. Monitor the impact of recent merges on system stability and user feedback to quickly address any unforeseen issues.
  4. Re-evaluate older open PRs to decide whether they should be prioritized or closed if no longer relevant.

Overall, the Helicone project is actively maintained with a focus on both feature development and content creation, which is essential for community engagement and platform growth.

Report On: Fetch Files For Assessment



Source Code Assessment

File: bifrost/app/blog/blogs/openai-o3/metadata.json

  • Structure and Content: This JSON file contains metadata for a blog post about OpenAI's new o3 model. It includes multiple title fields with the same content, a description, an image path, reading time, author, date, and a badge.
  • Quality: The file is well-structured for its purpose. However, having three identical title fields (title, title1, title2) seems redundant and could be streamlined to a single field unless there is a specific reason for this duplication.
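
If the three title fields are indeed always identical, the streamlining suggested above could look something like the following sketch. Only the field names `title`, `title1`, and `title2` come from the assessment; the `BlogMetadata` type, the `collapseTitles` helper, and the sample values are hypothetical.

```typescript
// Hypothetical shape for a blog metadata file; other fields pass through untouched.
interface BlogMetadata {
  title: string;
  title1?: string;
  title2?: string;
  [key: string]: unknown;
}

// Drop title1/title2 when they only repeat `title`; otherwise return the
// metadata unchanged, since differing values may be intentional.
function collapseTitles(meta: BlogMetadata): BlogMetadata {
  const { title1, title2, ...rest } = meta;
  const variants = [title1, title2].filter((t): t is string => t !== undefined);
  const allSame = variants.every((t) => t === meta.title);
  return allSame ? rest : meta;
}

// Sample values are placeholders, not the contents of the real file.
const sample: BlogMetadata = {
  title: "Example post",
  title1: "Example post",
  title2: "Example post",
  description: "Placeholder description",
};

console.log(collapseTitles(sample));
```

Leaving the metadata untouched when the variants differ keeps the helper safe to run across every post, including any that use the extra fields deliberately.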

File: bifrost/app/pi/total-cost/page.tsx

  • Structure and Functionality: This React component uses several hooks and libraries (e.g., react-query, recharts) to fetch and display cost data over time in a bar chart. It handles API key validation and redirects if the API key is invalid.
  • Quality: The code is well-organized with clear separation of concerns. The use of hooks for data fetching and state management is appropriate. However, some inline styles could be extracted to CSS classes for better maintainability.

File: docs/swagger.json

  • Structure and Content: This file is a Swagger/OpenAPI specification detailing the API endpoints, request/response schemas, and other configurations.
  • Quality: The file appears comprehensive and well-documented, providing essential information for API consumers. Given its size (over 17k lines), maintaining this file could be challenging; automated tools should be used to ensure consistency and accuracy.

File: valhalla/jawn/src/controllers/public/piController.ts

  • Structure and Functionality: This TypeScript controller defines several endpoints related to sessions, organization names, total costs, total requests, and costs over time. It uses TSOA decorators for routing and security.
  • Quality: The code is cleanly structured with proper error handling. The use of comments to explain the purpose of certain operations (e.g., session cleanup) is helpful. Consideration for potential performance issues (e.g., random session deletion) shows good foresight.

File: web/components/templates/users/UserMetrics.tsx

  • Structure and Functionality: This React component displays user metrics using charts. It includes UI elements like checkboxes and dropdowns for user interaction.
  • Quality: The component is well-designed with reusable sub-components (Chart). State management using hooks is appropriate. Consider optimizing the rendering logic by memoizing components where necessary to improve performance.

File: bifrost/app/community/communityPage.tsx

  • Structure and Functionality: This React component implements a tabbed interface to switch between different community-related views (Projects, Integrations, Customers).
  • Quality: The component is straightforward with clear logic for tab selection. The use of utility functions like clsx for conditional class names enhances readability.

File: bifrost/lib/clients/jawnTypes/private.ts

  • Structure and Content: This TypeScript file contains type definitions generated by an OpenAPI tool. It defines interfaces for various API paths and components.
  • Quality: As an auto-generated file, it should not be manually edited. Ensure that any changes to the API are reflected in the OpenAPI specification to keep this file up-to-date.

File: valhalla/jawn/src/managers/UserManager.ts

  • Structure and Functionality: This manager handles user metrics queries using ClickHouse for database operations. It includes methods for fetching user metrics overview and detailed metrics.
  • Quality: The code demonstrates good use of abstraction by separating query construction from execution. Error handling is consistent across methods. Consider adding more comments to explain complex query logic.
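
The pattern praised above, building a query separately from running it, can be illustrated in general terms as follows. This is a sketch of the pattern only, not the contents of UserManager.ts: the table and column names, the `?` placeholders (which are not ClickHouse's actual parameter syntax), and all identifiers are assumptions.

```typescript
// A query described as data: text plus positional parameters.
interface BuiltQuery {
  sql: string;
  params: Array<string | number>;
}

// Construction step: pure and synchronous, so it can be unit-tested
// without a database. Table and column names are hypothetical.
function buildUserMetricsQuery(orgId: string, limit: number): BuiltQuery {
  return {
    sql:
      "SELECT user_id, count(*) AS requests FROM requests " +
      "WHERE organization_id = ? GROUP BY user_id LIMIT ?",
    params: [orgId, limit],
  };
}

// Execution step: any function that can run a BuiltQuery, e.g. a thin
// wrapper around a database client, or a fake in tests.
type QueryRunner<Row> = (query: BuiltQuery) => Promise<Row[]>;

async function getUserMetrics<Row>(
  run: QueryRunner<Row>,
  orgId: string,
  limit: number,
): Promise<Row[]> {
  return run(buildUserMetricsQuery(orgId, limit));
}
```

In a test, `getUserMetrics` can be called with a fake runner that returns canned rows, which exercises the query-building logic without touching a real database.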

File: bifrost/app/changelog/changes/20241219-user-metrics/src.mdx

  • Structure and Content: This MDX file documents new features related to user histograms in the application.
  • Quality: The content is concise and informative, highlighting key features and use cases effectively. Ensure that this documentation is linked appropriately in user-facing areas of the application.

File: bifrost/packages/cost/providers/openai/fine-tuned-models.ts

  • Structure and Content: This TypeScript file defines cost structures for various fine-tuned OpenAI models.
  • Quality: The structure is simple but effective for defining model costs. Comments at the top provide clear instructions on editing restrictions, which helps maintain consistency across related files.

Overall, the source code files are well-organized with clear separation of concerns. There are opportunities for minor improvements in code maintainability through refactoring redundant elements or optimizing rendering logic in React components.

Report On: Fetch commits



Repo Commits Analysis

Development Team and Recent Activity

Team Members and Their Activities:

  1. LinaLam

    • Worked on blog content, adding new blogs and updating metadata.
    • Recent commits include work on the "openai o3 draft" and "Comparing Crewai vs Dify AI" blogs.
    • Collaborated with other team members on various documentation updates.
  2. Justin Torre (chitalian)

    • Focused on various bug fixes, feature enhancements, and infrastructure improvements.
    • Significant contributions to the "Pi" feature, user histogram functionality, and onboarding processes.
    • Collaborated with Kavin Valli on experiments and prompt management features.
  3. Kavin Valli (kavinvalli)

    • Worked on improving the experiments module, including input wrapping fixes and performance enhancements.
    • Contributed to the online evaluations feature and various UI improvements.
    • Collaborated with Justin Torre on several features related to experiments.
  4. Nathan Baschez (nbashaw)

    • Minor contribution involving updates to fine-tuned models.
  5. Cole Gottdank (colegottdank)

    • Engaged in UI enhancements and SEO improvements.
    • Worked on caching mechanisms and latency comparisons.
  6. use-tusk[bot]

    • Made minor improvements related to session details fetching limits.
  7. koshyviv

    • Updated documentation related to Docker setup.

Patterns, Themes, and Conclusions:

  • The development team is actively working on enhancing both frontend and backend functionalities, with a strong focus on improving user experience through UI/UX enhancements and bug fixes.
  • There is a continuous effort to expand the platform's capabilities with new features such as user histograms, online evaluations, and prompt management enhancements.
  • Collaboration among team members is evident in shared tasks like experiments module improvements and onboarding process refinements.
  • Documentation updates are frequent, indicating an emphasis on maintaining comprehensive guides for users.
  • The project shows a balanced mix of feature development, bug fixing, and infrastructure optimization, reflecting a mature development process aimed at scaling the platform's capabilities while ensuring stability.