The Dispatch

GitHub Repo Analysis: keephq/keep


Executive Summary

The "keephq/keep" project is an open-source platform for alert management and AIOps, designed to streamline incident management through features like alert deduplication, correlation, and integration with various monitoring tools. It is actively maintained and draws strong community interest, with 5,927 stars and 714 forks. Development is active, with frequent updates covering both backend improvements and user interface enhancements.

Recent Activity

Team Members and Activities (Reverse Chronological Order)

  1. Vladimir Filonov

    • Fixed alert formatting for AzureMonitoring.
    • Worked on parallel test execution and database migrations.
  2. Matvey Kukuy

    • Enhanced analytics features and demo mode.
    • Improved provider integrations.
  3. Tal Borenstein

    • Focused on PagerDuty improvements and UI enhancements.
    • Refactored API client.
  4. Shahar Glazner

    • Contributed to mobile UI features and demo experience.
    • Worked on API verbosity.
  5. Kirill Chernakov

    • Involved in UI refactoring and server-side sign-in page improvements.
  6. Jay Kumar

    • Improved deduplication page and manual alert enrichment.
  7. Furkan Pehlivan

    • Added retry mechanism to Google Chat provider.
  8. Hayk Davtyan

    • Fixed a typo in the README.
  9. Dependabot[bot]

    • Managed dependency updates for packages like aiohttp.

Patterns and Themes

Risks

Of Note

  1. AI Enhancements: The introduction of AI-powered features such as custom LLMs (#2704) indicates a strategic direction towards more intelligent alert management solutions.
  2. User Experience Focus: Multiple PRs aim to improve UX, such as ticketing modal enhancements (#2667) and topology UX improvements (#2661), reflecting a commitment to user-centric design.
  3. Performance Optimization: Efforts to reduce loading times (#2413) suggest a proactive approach to enhancing platform efficiency.

Quantified Reports

Quantify issues



Recent GitHub Issues Activity

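| Timespan | Opened | Closed | Comments | Labeled | Milestones |
|----------|--------|--------|----------|---------|------------|
| 7 Days   | 68     | 53     | 22       | 4       | 1          |
| 30 Days  | 192    | 139    | 36       | 8       | 1          |
| 90 Days  | 309    | 198    | 185      | 40      | 1          |
| 1 Year   | 366    | 198    | 293      | 66      | 1          |
| All Time | 984    | 815    | -        | -       | -          |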

Like all software activity quantification, these numbers are imperfect but sometimes useful. Comments, Labels, and Milestones refer to those issues opened in the timespan in question.
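As a rough illustration of how the closure trend cited in the risk ratings can be read off these figures, the sketch below divides closed by opened for each timespan. It assumes the two counts are directly comparable within a window, which the report does not state, so treat the ratios as indicative only. The function and variable names are made up for this example.

```python
# Opened/closed counts copied from the issues activity figures above.
ISSUE_COUNTS = {
    "7 Days": (68, 53),
    "30 Days": (192, 139),
    "90 Days": (309, 198),
    "1 Year": (366, 198),
    "All Time": (984, 815),
}


def closure_ratio(opened: int, closed: int) -> float:
    """Return closed issues as a fraction of opened issues."""
    if opened == 0:
        return 0.0
    return closed / opened


def closure_ratios(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Map each timespan to its closed/opened ratio, rounded to two places."""
    return {
        span: round(closure_ratio(opened, closed), 2)
        for span, (opened, closed) in counts.items()
    }


if __name__ == "__main__":
    for span, ratio in closure_ratios(ISSUE_COUNTS).items():
        print(f"{span}: {ratio:.0%}")
```

Under that assumption the ratio is lower for the 90-day and 1-year windows than for the 7-day and 30-day windows, which is one way to interpret the backlog concern raised later in the report.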

Rate pull requests



2/5
The pull request is a draft and lacks completion in several areas, as indicated by the unchecked items in the 'Checks' section. It introduces significant changes with a large number of lines added and removed, which increases the complexity and risk of introducing bugs. The CI checks have failed, suggesting potential issues with the code. Additionally, there is no clear description of the changes or their impact, making it difficult to assess their significance or quality. Overall, it requires further refinement and testing before it can be considered a good contribution.
3/5
The pull request introduces a Next.js build caching step to the e2e workflow, which is a useful enhancement for improving build efficiency. However, it remains a draft and lacks thorough documentation or testing evidence to assess its impact fully. The PR is relatively small in scope with minimal code changes, and while it addresses a specific issue (#2036), it does not demonstrate significant complexity or innovation. The coverage report indicates a decrease, but this may not be directly related to the changes introduced. Overall, it's an average contribution that could benefit from further development and refinement.
3/5
This pull request is significant in scope, as indicated by the XXL size label, and involves switching alert queries to use a new 'LastAlert' system. It includes numerous code changes across multiple files, with substantial additions and deletions. The PR addresses issue #2623 and is dependent on another PR (#2473). However, the complexity and size may introduce risks of bugs or integration issues. The description lacks detail on testing or potential impacts, and the checklist indicates that code style adherence and documentation updates are pending. Overall, while the change is potentially beneficial, the execution appears average due to incomplete checks and documentation.
3/5
The pull request introduces a feature to read the system's default theme, which is a moderately significant change. The implementation is functional and adheres to standard coding practices. However, it lacks comprehensive documentation updates and does not fully adhere to the project's code style guidelines as indicated by the unchecked items in the checklist. Additionally, while all modified lines are covered by tests, the overall project coverage is low, which might affect confidence in the change. Thus, it is an average contribution with room for improvement.
3/5
The pull request improves the user experience of a ticketing modal by adding a tooltip and ensuring the form resets upon closing. While these changes enhance usability, they are relatively minor in scope and impact. The code adheres to project style and includes necessary updates, but lacks significant innovation or complexity. Overall, it's a solid contribution with no major flaws, justifying an average rating.
3/5
The pull request addresses a specific bug related to SQL to CEL conversion by adding validation for parsed queries. The changes are minor, involving only a few lines of code, and the PR is labeled as size XS, indicating its limited scope. While it fixes an issue, the impact seems relatively small, and the PR lacks documentation updates or extensive testing details. Overall, it is an average contribution that resolves a particular problem without introducing new features or significant improvements.
3/5
The pull request improves the user experience of the topology feature by refactoring code to use incidents instead of alerts, which seems to be a necessary change for the application's functionality. It also includes updates for better tooltip positioning. However, the PR lacks documentation updates and adherence to code style checks, as indicated by unchecked items in the checklist. The changes are moderate in significance, neither introducing groundbreaking features nor being trivial. Therefore, it merits an average rating.
3/5
The pull request introduces a minor feature by adding an environment variable for configuring a local LLM with OpenAI, which is a useful addition but not particularly significant or complex. The changes are minimal, affecting only two lines across two files, and the documentation update is straightforward. There are no apparent bugs or security issues, but the impact of the change is limited. Overall, the PR is average and unremarkable, fitting the criteria for a rating of 3.
4/5
The pull request introduces a significant change by transitioning the alert management system from using alert IDs to fingerprints, which is a meaningful architectural improvement. It includes comprehensive updates across various components, such as database models, API routes, and tests. The PR also addresses potential issues with detailed refactor suggestions and validation improvements. However, the PR is quite large (XXL size), which could have been broken down into smaller parts for easier review. Overall, it's a well-executed feature addition with thorough documentation and testing.
4/5
The pull request addresses a significant issue by updating workflow parsing to correctly use 'id' and 'name', adds validation for configuration errors, and includes comprehensive test cases. The changes improve the robustness of the workflow handling by ensuring unique UUIDs are generated when 'id' is not provided, and marking workflows with configuration errors as invalid. However, while the changes are well-implemented and tested, they are not exceptionally groundbreaking or complex, which is why it does not receive a perfect score.
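The behaviour described for this pull request (fall back to a generated UUID when 'id' is absent, and flag misconfigured workflows as invalid) can be sketched as follows. This is not the project's code: the function name, the returned keys, and the rule that a missing 'name' counts as a configuration error are all assumptions made for illustration.

```python
import uuid
from typing import Any


def resolve_workflow_identity(workflow: dict[str, Any]) -> dict[str, Any]:
    """Return id, name, and an invalid flag for a parsed workflow mapping."""
    # Keep an explicit id; otherwise generate a fresh random UUID.
    workflow_id = workflow.get("id") or str(uuid.uuid4())
    name = workflow.get("name")
    # Hypothetical validation rule: a workflow without a name is invalid.
    invalid = not name
    return {"id": workflow_id, "name": name, "invalid": invalid}


if __name__ == "__main__":
    print(resolve_workflow_identity({"id": "wf-1", "name": "notify-oncall"}))
    print(resolve_workflow_identity({"name": "no-explicit-id"}))
```

Marking a workflow invalid instead of raising lets the rest of a batch load while the faulty entry is surfaced to the user, which appears to be the intent described in the review.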

Quantify commits



Quantified Commit Activity Over 14 Days

| Developer | Branches | PRs | Commits | Files | Changes |
|-----------|----------|-----|---------|-------|---------|
| Kirill Chernakov | 3 | 18/16/2 | 25 | 251 | 20771 |
| Shahar Glazner | 1 | 17/16/2 | 16 | 354 | 12753 |
| Tal | 2 | 27/26/1 | 29 | 174 | 5722 |
| Matvey Kukuy | 1 | 13/11/2 | 11 | 42 | 1757 |
| Jay Kumar | 1 | 9/7/1 | 7 | 21 | 1749 |
| Vladimir Filonov | 3 | 5/5/0 | 28 | 25 | 883 |
| dependabot[bot] | 1 | 2/2/0 | 2 | 3 | 464 |
| Furkan Pehlivan | 1 | 2/1/1 | 1 | 1 | 31 |
| Hayk Davtyan | 1 | 1/1/0 | 1 | 1 | 4 |
| metakotix (mrkito) | 0 | 2/0/1 | 0 | 0 | 0 |
| Vishwanath Martur (vishwamartur) | 0 | 1/0/0 | 0 | 0 | 0 |
| Rajesh Jonnalagadda (rajesh-jonnalagadda) | 0 | 1/0/1 | 0 | 0 | 0 |

PRs: created by that dev and opened/merged/closed-unmerged during the period

Quantify risks



Project Risk Ratings

| Risk | Level (1-5) | Rationale |
|------|-------------|-----------|
| Delivery | 3 | The project demonstrates active management of issues and pull requests, with a high volume of activity. However, the closure rate of issues has decreased over the past year, indicating a potential backlog that could impact delivery timelines. The introduction of significant architectural changes, such as the transition from alert IDs to fingerprints, suggests progress towards achieving project goals but requires careful documentation and testing to ensure smooth integration. Additionally, incomplete pull requests and dependency on external systems pose risks to delivery. |
| Velocity | 3 | The project exhibits a high level of development activity with substantial contributions from key developers. This indicates strong velocity but also poses risks of burnout and uneven workload distribution. The recent decrease in issue closure rates suggests potential slowdowns in addressing problems. While the presence of automated dependency updates is beneficial, their low frequency may affect overall velocity. The high volume of changes requires thorough testing and documentation to maintain momentum without accumulating technical debt. |
| Dependency | 4 | The project's functionality heavily relies on external systems and integrations, such as AI models and observability tools. This introduces significant dependency risks, as any changes or failures in these dependencies could impact reliability and performance. The presence of recurring bugs related to provider integrations further highlights potential weaknesses in managing dependencies effectively. While efforts are made to update dependencies through automated tools like Dependabot, the infrequency of these updates suggests gaps in keeping dependencies current. |
| Team | 3 | The team demonstrates active collaboration with significant contributions from multiple developers. However, the uneven distribution of work among team members poses risks related to potential burnout and bottlenecks if key contributors become unavailable. The lack of comprehensive documentation for some changes could hinder knowledge transfer and team dynamics. Additionally, the high volume of open issues indicates a potential strain on resources, which could affect team morale and effectiveness. |
| Code Quality | 3 | The project shows efforts to maintain code quality through modular design and error handling practices. However, the presence of incomplete pull requests and lack of detailed documentation for some changes pose risks to code quality. The use of wildcard imports in critical modules can lead to namespace pollution, affecting maintainability. While logging practices are robust, improvements in dynamic configuration management are needed to enhance code quality further. |
| Technical Debt | 3 | The rapid pace of development poses risks of accumulating technical debt if changes are not thoroughly reviewed and tested. Efforts are made to manage technical debt through features like OpenTelemetry instrumentation for performance monitoring. However, the lack of explicit references to testing practices or test cases leaves uncertainty about the comprehensiveness of test coverage, which is crucial for identifying and addressing technical debt early. |
| Test Coverage | 4 | While there are indications of good testing practices through detailed logging and exception handling, explicit references to test cases or coverage metrics are lacking. This leaves uncertainty about the comprehensiveness of test coverage, which is critical for catching bugs and regressions early. The recent update to support parallel test execution is a positive step but needs more emphasis on ensuring thorough test coverage across all components. |
| Error Handling | 3 | Error handling practices are generally robust, with try-except blocks implemented across critical functions. However, there are areas for improvement in ensuring all exceptions are consistently caught and logged to aid debugging. The lack of dynamic configuration validation poses risks in error handling as misconfigurations might not be caught early. Enhancements in alert management capabilities suggest ongoing efforts to improve error handling but require thorough validation and testing. |

Detailed Reports

Report On: Fetch issues



Recent Activity Analysis

The recent activity in the "keephq/keep" GitHub repository shows a dynamic and active development process with a focus on both feature enhancements and bug fixes. Over the past few days, there have been multiple issues created and closed, indicating ongoing maintenance and improvement efforts.

Notable anomalies include several issues related to bugs in provider integrations, such as #2703 concerning WebSocket connection issues and #2700 about exception handling in workflows for PagerDuty. These issues highlight potential areas of instability or complexity within the integration components. Additionally, there are several feature requests like #2704 for using custom LLMs, which suggest an interest in expanding the platform's flexibility and customization options.

A recurring theme among the issues is the enhancement of user experience and interface, as seen in requests for better navigation (#2535) and improved incident management (#2593). There is also a focus on performance optimization, with issues like #2413 aiming to reduce loading times.

Issue Details

Most Recently Created Issues

  • #2708: [🐛 Bug]: AzureMonitoring can send data with alertRule = NULL

    • Priority: High
    • Status: Closed
    • Created: 0 days ago
    • Updated: 0 days ago
  • #2707: [👨🏻‍💻 Internal]: better analytics

    • Priority: Medium
    • Status: Closed
    • Created: 0 days ago
    • Updated: 0 days ago
  • #2704: [➕ Feature]: Using myself LLM

    • Priority: Medium
    • Status: Open
    • Created: 1 day ago

Most Recently Updated Issues

  • #2708: [🐛 Bug]: AzureMonitoring can send data with alertRule = NULL

    • Priority: High
    • Status: Closed
    • Created: 0 days ago
    • Updated: 0 days ago
  • #2707: [👨🏻‍💻 Internal]: better analytics

    • Priority: Medium
    • Status: Closed
    • Created: 0 days ago
    • Updated: 0 days ago
  • #2703: [🐛 Bug]: Unexpected token '<', "<!DOCTYPE "... is not valid JSON unable to connect websocket on port forwarding and via Go Teleport

    • Priority: High
    • Status: Open
    • Created: 1 day ago

The project continues to evolve with a strong emphasis on addressing bugs promptly while also incorporating new features that enhance its capabilities. The active engagement from contributors and maintainers ensures that the platform remains robust and responsive to user needs.

Report On: Fetch pull requests



Pull Request Analysis

Open Pull Requests

  1. #2705: feat: Proposing new features for AI. Using myself LLM

    • State: Open
    • Created: 1 day ago
    • Labels: Documentation, size:XS
    • Notable Issues:
    • The Contributor License Agreement (CLA) has not been signed by all contributors, which is a blocker for merging.
    • The PR introduces a new environment variable OPENAI_BASE_URL, which needs to be documented and reviewed for potential impacts on deployment configurations.
  2. #2685: fix: workflow id and name usage

    • State: Open
    • Created: 2 days ago
    • Labels: size:M
    • Notable Issues:
    • The PR addresses a significant issue related to workflow parsing and validation, which is crucial for the stability of workflow management.
    • There are no linked issues, which could help in tracking the problem this PR aims to solve.
  3. #2667: chore: improve ticketing modal ux

    • State: Open
    • Created: 3 days ago
    • Labels: UI, size:M
    • Notable Issues:
    • The PR aims to enhance user experience but lacks detailed documentation on what specific UX improvements are being made.
    • It closes issue #2647, indicating it addresses a known problem.
  4. #2662: fix: fix sql to cel conversion

    • State: Open
    • Created: 3 days ago
    • Labels: size:XS
    • Notable Issues:
    • This PR fixes a conversion issue that could impact data processing workflows.
    • It closes issue #2599, suggesting it resolves a previously identified bug.
  5. #2661: fix: improve topology ux

    • State: Open
    • Created: 3 days ago
    • Labels: UI, size:M
    • Notable Issues:
    • The PR focuses on UX improvements but lacks specific details on the changes made.
    • It closes issue #2597, indicating it addresses a known UX concern.
  6. #2651: feat: read system default theme

    • State: Open
    • Created: 4 days ago
    • Labels: Feature, UI, size:M
    • Notable Issues:
    • Introduces a feature to read system default themes, which could enhance user experience by aligning with system preferences.
    • Closes issue #2592, suggesting it fulfills a feature request or enhancement.
  7. #2644: refactor: server side sign in redirects

    • State: Open (Draft)
    • Created: 4 days ago
    • Labels: size:XL
    • Notable Issues:
    • This is a large refactor that could have significant impacts on authentication flows.
    • The draft status indicates it's still under development or review.
  8. #2638: feat: switch all alert queries to use lastalert (dnm)

    • State: Open
    • Created: 5 days ago
    • Labels: Feature, size:XXL
    • Notable Issues:
    • Marked as "Do Not Merge" before another PR (#2473) is resolved.
    • Represents a substantial change in how alerts are queried, which requires careful review and testing.
  9. #2609: [WIP] ci: add nextjs build step to e2e action

    • State: Open (Draft)
    • Created: 6 days ago
    • Labels: size:S, lgtm
    • Notable Issues:
    • Work in progress to improve CI/CD processes by adding Next.js build steps.
    • Needs further development and testing before merging.
  10. #2473: feat: change relation between alerts and incidents to work with fingerprints instead of alert ids

    • State: Open
    • Created: 17 days ago
    • Labels: Feature, size:XXL
    • Notable Issues:
    • This PR introduces a major change in how alerts are associated with incidents, moving from alert IDs to fingerprints.
    • It has received extensive feedback and suggestions for improvement.

Recently Closed Pull Requests

  1. #2709: Fixed alert formatting for AzureMonitoring provider; merged quickly after creation.
  2. #2706: Introduced better analytics; merged quickly after creation.
  3. #2701: Improved exception handling in PagerDuty workflows; merged quickly after creation.
  4. #2699: Fixed an issue with PagerDuty body not being recognized as hash; merged quickly after creation.
  5. #2696: Corrected a typo in README.md; merged quickly after creation.

Notable Closed Without Merge

  • No recently closed PRs were notable for being closed without merging.

Summary

The project is actively maintained with numerous open pull requests addressing various aspects such as bug fixes, feature enhancements, and refactoring efforts. Notably, several open PRs focus on improving user experience and addressing critical bugs related to workflows and data processing. The recently closed PRs indicate quick resolution of issues and continuous improvement of the platform's features and documentation.

Report On: Fetch Files For Assessment



Source Code Assessment

1. azuremonitoring_provider.py

  • Structure and Clarity: The file is well-organized, with clear class definitions and methods. The use of constants for mapping Azure Monitor severities and statuses to the internal format enhances readability and maintainability.
  • Functionality: The _format_alert method is central to converting Azure Monitor alerts into the internal format. It extracts essential fields like severity, status, and timestamps efficiently.
  • Documentation: The class and methods are documented, providing a clear understanding of their purpose. However, the docstring for validate_config mentions Prometheus instead of Azure Monitor, which might be a copy-paste error.
  • Error Handling: There is minimal error handling in methods like _format_alert, where assumptions are made about the presence of certain keys in the input dictionary.
  • Overall Quality: High-quality code with minor documentation issues. Consider enhancing error handling for robustness.
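To make the error-handling suggestion concrete, here is a minimal sketch of a formatter that tolerates missing or null keys, including the `alertRule = NULL` case reported in #2708. It is not the provider's actual `_format_alert`. The payload layout (`data.essentials` with `alertRule`, `severity`, `monitorCondition`, `firedDateTime`) follows the Azure Monitor common alert schema as an assumption about what the provider receives, and the severity mapping, fallback values, and output keys are hypothetical.

```python
from typing import Any, Optional

# Hypothetical mapping; the provider's real constants may differ.
SEVERITY_MAP = {
    "Sev0": "critical",
    "Sev1": "high",
    "Sev2": "warning",
    "Sev3": "info",
    "Sev4": "info",
}


def format_alert_safely(event: dict[str, Any]) -> dict[str, Optional[str]]:
    """Extract alert fields without assuming any key is present or non-null."""
    # "or {}" also covers keys that exist but hold None.
    data = event.get("data") or {}
    essentials = data.get("essentials") or {}
    return {
        # Fall back to a placeholder when alertRule is missing or null.
        "name": essentials.get("alertRule") or "unknown-rule",
        # Unknown or missing severities default to "info".
        "severity": SEVERITY_MAP.get(essentials.get("severity"), "info"),
        "status": essentials.get("monitorCondition"),
        "fired_at": essentials.get("firedDateTime"),
    }


if __name__ == "__main__":
    payload = {"data": {"essentials": {"alertRule": None, "severity": "Sev1"}}}
    print(format_alert_safely(payload))
```

Using `.get()` with explicit fallbacks trades strictness for resilience; a stricter variant could instead log and reject payloads that lack required fields.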

2. db.py

  • Structure and Clarity: This file is extensive, covering various database operations. The use of SQLAlchemy ORM is appropriate for managing database interactions.
  • Functionality: The file contains numerous functions for CRUD operations on workflows, alerts, incidents, etc. It also includes utility functions for session management.
  • Documentation: Functions are generally well-documented, explaining their purpose and parameters. However, given the file's length, a module-level overview could help navigate its contents.
  • Error Handling: There is some error handling using try-except blocks, particularly in database operations. However, more granular error handling could improve reliability.
  • Performance Considerations: The use of context managers for session management is good practice. Consider reviewing query performance given the potential complexity of some operations.
  • Overall Quality: Comprehensive and well-organized but could benefit from additional documentation and refined error handling.

3. report_uptime.py

  • Structure and Clarity: The file is concise and focused on reporting uptime to PostHog. It uses asynchronous programming effectively with asyncio.
  • Functionality: The report_uptime_to_posthog function periodically reports uptime metrics, leveraging environment variables for configuration.
  • Documentation: Functions are documented with clear descriptions of their purpose.
  • Error Handling: Minimal error handling; consider adding checks around network requests to PostHog.
  • Overall Quality: High-quality code with clear functionality but could benefit from enhanced error handling.
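The suggestion to guard the periodic network call can be sketched as below. This does not reproduce `report_uptime_to_posthog`; the helper name, its parameters, and the injected `send` coroutine are assumptions, chosen so that the example runs without a PostHog client.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger(__name__)


async def report_periodically(
    send: Callable[[], Awaitable[None]],
    interval_seconds: float,
    max_reports: Optional[int] = None,
) -> int:
    """Call send() on a fixed interval, logging failures instead of stopping.

    Returns the number of successful reports. max_reports bounds the number
    of attempts (useful in tests); None means run until cancelled.
    """
    sent = 0
    attempts = 0
    while max_reports is None or attempts < max_reports:
        attempts += 1
        try:
            await send()
            sent += 1
        except Exception:
            # A failed report should not end the reporting loop.
            logger.exception("Uptime report failed; retrying on the next tick")
        await asyncio.sleep(interval_seconds)
    return sent


if __name__ == "__main__":
    async def fake_send() -> None:
        print("uptime reported")

    asyncio.run(report_periodically(fake_send, interval_seconds=0.1, max_reports=3))
```

Catching `Exception` rather than `BaseException` lets task cancellation propagate, so the loop can still be shut down cleanly.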

4. workflows.client.tsx

  • Structure and Clarity: The React component is structured logically, using hooks like useState and useRef effectively.
  • Functionality: Implements client-side logic for managing workflows, including fetching data via SWR and handling file uploads.
  • Documentation: Inline comments explain significant logic changes or features added to the component.
  • Error Handling: Error states are managed using state variables like fileError. Consider more robust error handling for network requests.
  • Overall Quality: Well-written component with clear separation of concerns but could improve in error handling robustness.

5. pagerduty_provider.py

  • Structure and Clarity: This file is large but well-organized into classes and methods that handle different aspects of PagerDuty integration.
  • Functionality: Supports both alerting and incident management through PagerDuty's API, including OAuth2 authentication flow.
  • Documentation: Methods are generally well-documented, though some complex methods could benefit from more detailed explanations.
  • Error Handling: There is some error handling around network requests and token refresh logic. More comprehensive checks could enhance reliability.
  • Security Considerations: Sensitive information like API keys should be handled securely; ensure sensitive data is never logged inadvertently.
  • Overall Quality: High-quality integration code with comprehensive functionality but could improve in documentation detail and security practices.

6. dependencies.py

  • Structure and Clarity: A small utility module primarily focused on extracting request bodies based on content type.
  • Functionality: Provides utility functions for request processing and Pusher client initialization based on environment variables.
  • Documentation: Functions are documented with clear descriptions of their purpose.
  • Error Handling: Basic error handling is present when parsing JSON bodies; consider expanding this to cover other potential errors.
  • Overall Quality: Simple yet effective utility module; consider enhancing error handling.

7. pyproject.toml

  • Structure and Clarity: Clearly structured configuration file specifying project metadata, dependencies, scripts, etc.
  • Functionality: Defines dependencies using Poetry; ensures version constraints are specified appropriately.
  • Documentation/Comments: Generally self-explanatory due to standard TOML format used by Poetry projects.
  • Overall Quality: Well-maintained configuration file; ensure dependencies are regularly updated to mitigate security vulnerabilities.

8. .github/workflows/test-pr.yml

  • Structure and Clarity: YAML workflow file structured clearly with jobs defined for testing purposes using GitHub Actions.
  • Functionality: Sets up environments using Docker services like MySQL and Elasticsearch for testing purposes; leverages caching strategies effectively.
  • Documentation/Comments: Inline comments explain specific steps or configurations within the workflow.
  • Overall Quality: Comprehensive CI/CD pipeline configuration; ensure it aligns with current testing requirements.

9. conftest.py

  • Structure and Clarity: Contains pytest fixtures for setting up test environments using Docker services; well-organized by fixture scope/type.
  • Functionality: Provides fixtures for database sessions, context management, service responsiveness checks (MySQL/Keycloak/Elastic), etc., facilitating comprehensive testing setups.
  • Documentation/Comments: Fixtures are documented succinctly; consider adding more context where fixture logic is complex or non-obvious.
  • Overall Quality: Robust test setup module supporting diverse testing needs; ensure fixtures remain aligned with evolving test requirements.

10. poetry.lock

  • Not directly reviewed as it's auto-generated by Poetry based on dependencies specified in pyproject.toml.

11. README.md

  • Provides a comprehensive overview of the project's purpose, features, supported integrations, and getting-started instructions, serving new users and contributors who need an introduction or guidance on usage and contribution.

This assessment highlights strengths across files while identifying areas where improvements could enhance robustness (e.g., error handling) or clarity (e.g., documentation).

Report On: Fetch commits



Repo Commits Analysis

Development Team and Recent Activity

Team Members and Activities

  • Vladimir Filonov

    • Recent work includes fixing alert formatting for AzureMonitoring, parallel test execution, and merging branches. Active in multiple branches with focus on database migrations and alert management.
    • Collaborated with Kirill Chernakov on database-related tasks.
  • Matvey Kukuy

    • Worked on better analytics, AI demo features, multiple bypass keys, and demo mode enhancements. Engaged in improving documentation and handling provider integrations.
    • Collaborated with Tal Borenstein on several features.
  • Tal Borenstein (talboren)

    • Focused on PagerDuty improvements, Slack provider support, incident timeline issues, and various bug fixes. Involved in UI enhancements and API client refactoring.
    • Frequent collaboration with Shahar Glazner and Kirill Chernakov.
  • Hayk Davtyan

    • Made a minor contribution by fixing a typo in the README.
  • Shahar Glazner

    • Contributed to API verbosity, mobile UI features, business hours feature, and various UI improvements. Worked on enhancing the demo experience.
    • Co-authored commits with Tal Borenstein and Matvey Kukuy.
  • Kirill Chernakov (Kiryous)

    • Involved in UI refactoring, improving alert table functionalities, and handling authentication configurations. Worked on server-side sign-in page refactor.
    • Collaborated with Tal Borenstein and Vladimir Filonov.
  • Jay Kumar (35C4n0r)

    • Worked on deduplication page improvements, manual alert enrichment, and service level operations for PagerDuty.
    • Co-authored commits with Tal Borenstein.
  • Furkan Pehlivan (pehlicd)

    • Added retry mechanism to Google Chat provider to handle rate limiting.
  • Dependabot[bot]

    • Managed dependency updates for packages like aiohttp.
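The retry mechanism noted above for the Google Chat provider (retrying when the remote side rate-limits requests) is commonly built as retry with exponential backoff. The sketch below shows that general technique only. `RateLimitedError`, `call_with_retry`, and the default delays are hypothetical and are not taken from the provider's code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical error raised when the remote API signals rate limiting."""


def call_with_retry(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on RateLimitedError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except RateLimitedError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the error to the caller.
            # Delay doubles each time: base, 2*base, 4*base, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    state = {"calls": 0}

    def send_message() -> str:
        state["calls"] += 1
        if state["calls"] < 3:
            raise RateLimitedError("too many requests")
        return "sent"

    print(call_with_retry(send_message, base_delay=0.01))
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; production code would typically also add jitter and honor any retry delay the server supplies.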

Patterns and Themes

  • The team is actively working on improving both backend functionalities (e.g., database migrations, API enhancements) and frontend UI components (e.g., alert tables, mobile UI).
  • There is a strong focus on enhancing integrations with third-party providers such as PagerDuty, Slack, Azure Monitoring, etc.
  • Collaboration is frequent among team members, especially between Tal Borenstein, Shahar Glazner, Kirill Chernakov, and Matvey Kukuy.
  • The project is under active development with a high volume of commits addressing both new features and bug fixes.
  • Documentation updates are ongoing to support new features and integrations.