The "keephq/keep" project is an open-source platform for alert management and AIOps, designed to streamline incident management through features such as alert deduplication, correlation, and integration with various monitoring tools. It is actively maintained and attracts strong community interest, with 5,927 stars and 714 forks. Development is ongoing, with frequent updates to both the backend and the user interface.
Vladimir Filonov
Matvey Kukuy
Tal Borenstein (talboren)
Shahar Glazner
Kirill Chernakov (Kiryous)
Jay Kumar (35C4n0r)
Furkan Pehlivan (pehlicd)
Hayk Davtyan
Dependabot[bot]
Timespan | Opened | Closed | Comments | Labeled | Milestones |
---|---|---|---|---|---|
7 Days | 68 | 53 | 22 | 4 | 1 |
30 Days | 192 | 139 | 36 | 8 | 1 |
90 Days | 309 | 198 | 185 | 40 | 1 |
1 Year | 366 | 198 | 293 | 66 | 1 |
All Time | 984 | 815 | - | - | - |
As with any quantification of software activity, these numbers are imperfect but sometimes useful. The Comments, Labeled, and Milestones columns refer to issues opened in the timespan in question.
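The closed-to-opened ratio for each timespan, which underlies this report's discussion of issue closure rates, can be computed directly from the table. This Python sketch only restates the table's numbers; the names `ISSUE_COUNTS` and `closure_ratio` are illustrative.

```python
# Issue counts from the table: timespan -> (opened, closed).
ISSUE_COUNTS = {
    "7 Days": (68, 53),
    "30 Days": (192, 139),
    "90 Days": (309, 198),
    "1 Year": (366, 198),
    "All Time": (984, 815),
}


def closure_ratio(opened: int, closed: int) -> float:
    """Return closed / opened, or 0.0 when nothing was opened."""
    return closed / opened if opened else 0.0


for span, (opened, closed) in ISSUE_COUNTS.items():
    print(f"{span:>8}: {closure_ratio(opened, closed):.0%}")
```

The ratio falls from about 78% over 7 days to about 54% over one year, which is consistent with the backlog concern raised in the risk assessment.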
Developer | Branches | PRs | Commits | Files | Changes |
---|---|---|---|---|---|
Kirill Chernakov | 3 | 18/16/2 | 25 | 251 | 20771 |
Shahar Glazner | 1 | 17/16/2 | 16 | 354 | 12753 |
Tal | 2 | 27/26/1 | 29 | 174 | 5722 |
Matvey Kukuy | 1 | 13/11/2 | 11 | 42 | 1757 |
Jay Kumar | 1 | 9/7/1 | 7 | 21 | 1749 |
Vladimir Filonov | 3 | 5/5/0 | 28 | 25 | 883 |
dependabot[bot] | 1 | 2/2/0 | 2 | 3 | 464 |
Furkan Pehlivan | 1 | 2/1/1 | 1 | 1 | 31 |
Hayk Davtyan | 1 | 1/1/0 | 1 | 1 | 4 |
metakotix (mrkito) | 0 | 2/0/1 | 0 | 0 | 0 |
Vishwanath Martur (vishwamartur) | 0 | 1/0/0 | 0 | 0 | 0 |
Rajesh Jonnalagadda (rajesh-jonnalagadda) | 0 | 1/0/1 | 0 | 0 | 0 |
PRs: pull requests created by that developer, shown as opened/merged/closed-unmerged during the period.
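The risk assessment describes the workload as unevenly distributed. One simple way to quantify that is each contributor's share of the Changes column. This Python sketch uses only the table's figures; `top_n_share` is an illustrative name.

```python
from __future__ import annotations

# "Changes" per developer from the contributor table (rows with zero changes omitted).
CHANGES = {
    "Kirill Chernakov": 20771,
    "Shahar Glazner": 12753,
    "Tal": 5722,
    "Matvey Kukuy": 1757,
    "Jay Kumar": 1749,
    "Vladimir Filonov": 883,
    "dependabot[bot]": 464,
    "Furkan Pehlivan": 31,
    "Hayk Davtyan": 4,
}


def top_n_share(changes: dict[str, int], n: int) -> float:
    """Return the fraction of all changes made by the n most active contributors."""
    total = sum(changes.values())
    if total == 0:
        return 0.0
    top = sorted(changes.values(), reverse=True)[:n]
    return sum(top) / total


print(f"Total changes: {sum(CHANGES.values())}")
print(f"Top 2 contributors: {top_n_share(CHANGES, 2):.0%}")
```

With these figures, the two most active contributors account for about 76% of the 44,134 recorded changes.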
Risk | Level (1-5) | Rationale |
---|---|---|
Delivery | 3 | The project manages issues and pull requests actively and at high volume. However, the ratio of closed to opened issues falls from about 78% over the last 7 days (53 of 68) to about 54% over the past year (198 of 366), indicating a backlog that could affect delivery timelines. Significant architectural changes, such as the transition from alert IDs to fingerprints, show progress toward project goals but require careful documentation and testing to integrate smoothly. Incomplete pull requests and dependence on external systems add further delivery risk. |
Velocity | 3 | The project exhibits a high level of development activity with substantial contributions from key developers. This indicates strong velocity but also poses risks of burnout and uneven workload distribution. The recent decrease in issue closure rates suggests potential slowdowns in addressing problems. While the presence of automated dependency updates is beneficial, their low frequency may affect overall velocity. The high volume of changes requires thorough testing and documentation to maintain momentum without accumulating technical debt. |
Dependency | 4 | The project's functionality heavily relies on external systems and integrations, such as AI models and observability tools. This introduces significant dependency risks, as any changes or failures in these dependencies could impact reliability and performance. The presence of recurring bugs related to provider integrations further highlights potential weaknesses in managing dependencies effectively. While efforts are made to update dependencies through automated tools like Dependabot, the infrequency of these updates suggests gaps in keeping dependencies current. |
Team | 3 | The team demonstrates active collaboration with significant contributions from multiple developers. However, the uneven distribution of work among team members poses risks related to potential burnout and bottlenecks if key contributors become unavailable. The lack of comprehensive documentation for some changes could hinder knowledge transfer and team dynamics. Additionally, the high volume of open issues indicates a potential strain on resources, which could affect team morale and effectiveness. |
Code Quality | 3 | The project shows efforts to maintain code quality through modular design and error handling practices. However, the presence of incomplete pull requests and lack of detailed documentation for some changes pose risks to code quality. The use of wildcard imports in critical modules can lead to namespace pollution, affecting maintainability. While logging practices are robust, improvements in dynamic configuration management are needed to enhance code quality further. |
Technical Debt | 3 | The rapid pace of development poses risks of accumulating technical debt if changes are not thoroughly reviewed and tested. Efforts are made to manage technical debt through features like OpenTelemetry instrumentation for performance monitoring. However, the lack of explicit references to testing practices or test cases leaves uncertainty about the comprehensiveness of test coverage, which is crucial for identifying and addressing technical debt early. |
Test Coverage | 4 | While detailed logging and exception handling indicate good testing practices, explicit references to test cases or coverage metrics are lacking. This leaves uncertainty about the comprehensiveness of test coverage, which is critical for catching bugs and regressions early. The recent update to support parallel test execution is a positive step, but more emphasis is needed on thorough test coverage across all components. |
Error Handling | 3 | Error handling practices are generally robust, with try-except blocks implemented across critical functions. However, there are areas for improvement in ensuring all exceptions are consistently caught and logged to aid debugging. The lack of dynamic configuration validation poses risks in error handling as misconfigurations might not be caught early. Enhancements in alert management capabilities suggest ongoing efforts to improve error handling but require thorough validation and testing. |
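The Code Quality row flags wildcard imports in critical modules. This Python sketch shows the underlying problem generically: `from module import *` silently rebinds existing names and can shadow built-ins, while an explicit `__all__` limits what gets exported. The modules `loose_helpers` and `strict_helpers` are hypothetical, built in memory so the example is self-contained, and do not correspond to any Keep file.

```python
import sys
import types


def make_module(name: str, source: str) -> types.ModuleType:
    """Create and register an in-memory module so the example needs no files."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module


# Hypothetical helper modules: one without __all__, one with it.
make_module(
    "loose_helpers",
    "status = 'from loose_helpers'\n"
    "def format(value, spec=''):\n"
    "    return 'shadowed'\n",
)
make_module(
    "strict_helpers",
    "__all__ = ['parse']\n"
    "status = 'from strict_helpers'\n"
    "def parse(value):\n"
    "    return value\n",
)

status = "local"
from loose_helpers import *  # noqa: E402,F403  rebinds `status` and shadows built-in format()
after_loose = (status, format(1))
del format  # remove the shadowing name so the built-in is visible again

status = "local"
from strict_helpers import *  # noqa: E402,F403  only `parse` is exported
after_strict = status

print(after_loose)   # ('from loose_helpers', 'shadowed')
print(after_strict)  # local
```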
Recent activity in the "keephq/keep" GitHub repository shows a dynamic development process focused on both feature enhancements and bug fixes. In the last 7 days alone, 68 issues were opened and 53 were closed, reflecting ongoing maintenance and improvement.
Notable anomalies include several issues related to bugs in provider integrations, such as #2703 concerning WebSocket connection issues and #2700 about exception handling in workflows for PagerDuty. These issues highlight potential areas of instability or complexity within the integration components. Additionally, there are several feature requests like #2704 for using custom LLMs, which suggest an interest in expanding the platform's flexibility and customization options.
A recurring theme among the issues is the enhancement of user experience and interface, as seen in requests for better navigation (#2535) and improved incident management (#2593). There is also a focus on performance optimization, with issues like #2413 aiming to reduce loading times.
#2708: [🐛 Bug]: AzureMonitoring can send data with alertRule = NULL
#2707: [👨🏻‍💻 Internal]: better analytics
#2704: [➕ Feature]: Using myself LLM
#2703: [🐛 Bug]: Unexpected token '<', "<!DOCTYPE "... is not valid JSON unable to connect websocket on port forwarding and via Go Teleport
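The error text in #2703 is the message JavaScript's `JSON.parse` produces in V8-based runtimes when the body starts with `<`, meaning an HTML page (here beginning with `<!DOCTYPE`) arrived where JSON was expected. A frequent cause is a proxy or port-forward routing the request to a page instead of the API, though the issue itself does not confirm the root cause. A minimal Python sketch of a guard that turns this into a clearer error; `parse_json_body` is a hypothetical helper, not part of Keep.

```python
import json
from typing import Any


def parse_json_body(body: str, source: str = "response") -> Any:
    """Parse a JSON body, failing with a clearer message if HTML was returned."""
    stripped = body.lstrip()
    if stripped.startswith("<"):
        # An HTML document (for example "<!DOCTYPE html>...") instead of JSON usually
        # means the request reached the wrong server or route.
        raise ValueError(
            f"Expected JSON from {source} but received HTML starting with {stripped[:15]!r}"
        )
    return json.loads(stripped)


print(parse_json_body('{"status": "ok"}'))  # {'status': 'ok'}
```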
The project continues to evolve, with a strong emphasis on addressing bugs promptly while also adding features that extend its capabilities. Active engagement from contributors and maintainers helps keep the platform robust and responsive to user needs.
#2705: feat: Proposing new features for AI. Using myself LLM
This PR uses `OPENAI_BASE_URL`, which needs to be documented and reviewed for potential impacts on deployment configurations.
#2685: fix: workflow id and name usage
#2667: chore: improve ticketing modal ux
#2662: fix: fix sql to cel conversion
#2661: fix: improve topology ux
#2651: feat: read system default theme
#2644: refactor: server side sign in redirects
#2638: feat: switch all alert queries to use lastalert (dnm)
#2609: [WIP] ci: add nextjs build step to e2e action
#2473: feat: change relation between alerts and incidents to work with fingerprints instead of alert ids
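#2473 changes the relation between alerts and incidents to use fingerprints instead of alert IDs. One way to picture the motivation: each firing of an alert may carry a new ID, while a fingerprint derived from identifying fields stays stable across repeats, so an incident keyed on fingerprints keeps matching later occurrences. This Python sketch is a simplified model under that assumption; the `Alert` and `Incident` classes and the use of `name` and `service` as fingerprint fields are hypothetical, and Keep's actual schema and fingerprinting rules are not described in this report.

```python
from __future__ import annotations

import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class Alert:
    # Hypothetical minimal alert shape.
    name: str
    service: str
    id: str = field(default_factory=lambda: str(uuid.uuid4()))  # new for every firing

    @property
    def fingerprint(self) -> str:
        # Illustrative fingerprint: a stable hash of the identifying fields.
        key = f"{self.name}|{self.service}".encode()
        return hashlib.sha256(key).hexdigest()


@dataclass
class Incident:
    # Linked by fingerprint rather than by per-occurrence alert id.
    fingerprints: set[str] = field(default_factory=set)

    def add(self, alert: Alert) -> None:
        self.fingerprints.add(alert.fingerprint)

    def contains(self, alert: Alert) -> bool:
        return alert.fingerprint in self.fingerprints


incident = Incident()
first = Alert("HighCPU", "api")
incident.add(first)

# A later firing of the same alert gets a new id but the same fingerprint,
# so it is still associated with the incident.
repeat = Alert("HighCPU", "api")
print(repeat.id != first.id, incident.contains(repeat))  # True True
print(incident.contains(Alert("DiskFull", "db")))        # False
```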
The project is actively maintained with numerous open pull requests addressing various aspects such as bug fixes, feature enhancements, and refactoring efforts. Notably, several open PRs focus on improving user experience and addressing critical bugs related to workflows and data processing. The recently closed PRs indicate quick resolution of issues and continuous improvement of the platform's features and documentation.
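#2705 (using one's own LLM) relies on `OPENAI_BASE_URL`, and its deployment impact still needs documenting. As an illustration of what such a review might cover, this Python sketch falls back to OpenAI's public endpoint when the variable is unset and rejects malformed values at startup. `resolve_llm_base_url` is a hypothetical helper; how Keep actually reads the variable is not shown in this report.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from urllib.parse import urlparse

# Default endpoint of OpenAI's hosted API.
DEFAULT_OPENAI_BASE_URL = "https://api.openai.com/v1"


def resolve_llm_base_url(env: Mapping[str, str] | None = None) -> str:
    """Return OPENAI_BASE_URL if set and well formed, else the OpenAI default."""
    env = os.environ if env is None else env
    raw = env.get("OPENAI_BASE_URL", "").strip()
    if not raw:
        return DEFAULT_OPENAI_BASE_URL
    parsed = urlparse(raw)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        # Fail fast at startup rather than on the first LLM request.
        raise ValueError(f"OPENAI_BASE_URL is not a valid http(s) URL: {raw!r}")
    return raw.rstrip("/")


# A self-hosted, OpenAI-compatible server (hypothetical URL).
print(resolve_llm_base_url({"OPENAI_BASE_URL": "http://localhost:11434/v1/"}))
```

Passing the environment as a parameter keeps the helper testable without mutating `os.environ`.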
azuremonitoring_provider.py: The `_format_alert` method is central to converting Azure Monitor alerts into the internal format and efficiently extracts essential fields such as severity, status, and timestamps. `validate_config` mentions Prometheus instead of Azure Monitor, which might be a copy-paste error. Error handling could be more defensive in `_format_alert`, where assumptions are made about the presence of certain keys in the input dictionary.
db.py
report_uptime.py: Runs asynchronously with `asyncio`. The `report_uptime_to_posthog` function periodically reports uptime metrics, leveraging environment variables for configuration.
workflows.client.tsx: Manages state with `useState` and `useRef` effectively. Error handling currently centers on `fileError`. Consider more robust error handling for network requests.
pagerduty_provider.py
dependencies.py
pyproject.toml
.github/workflows/test-pr.yml
conftest.py
poetry.lock
README.md
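The note that `_format_alert` in azuremonitoring_provider.py assumes certain keys exist, together with #2708 (`alertRule = NULL`), points toward a more defensive parsing style. This Python sketch assumes the Azure Monitor common alert schema, where fields such as `alertRule`, `severity`, and `monitorCondition` sit under `data.essentials`; the function name, fallback values, output keys, and severity mapping are illustrative assumptions, not Keep's implementation.

```python
from __future__ import annotations

from typing import Any

# Illustrative mapping from Azure Monitor severities to internal levels (an assumption).
SEVERITY_MAP = {
    "Sev0": "critical",
    "Sev1": "high",
    "Sev2": "warning",
    "Sev3": "info",
    "Sev4": "low",
}


def format_alert_defensively(event: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in for `_format_alert` that tolerates missing or null fields."""
    # `or {}` also covers keys that are present but explicitly null.
    essentials = (event.get("data") or {}).get("essentials") or {}
    return {
        "id": essentials.get("alertId") or "unknown-id",
        # A null alertRule (as reported in #2708) falls back to a placeholder name.
        "name": essentials.get("alertRule") or "unknown-alert-rule",
        "severity": SEVERITY_MAP.get(essentials.get("severity"), "info"),
        "status": "resolved" if essentials.get("monitorCondition") == "Resolved" else "firing",
        "fired_at": essentials.get("firedDateTime"),
    }


payload = {"data": {"essentials": {"alertRule": None, "severity": "Sev1", "monitorCondition": "Fired"}}}
print(format_alert_defensively(payload)["name"])  # unknown-alert-rule
```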
This assessment highlights strengths across files while identifying areas where improvements could enhance robustness (e.g., error handling) or clarity (e.g., documentation).
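For `report_uptime_to_posthog` in report_uptime.py, the assessment only states that it reports uptime periodically with `asyncio` and reads its configuration from environment variables. A minimal sketch of that pattern follows, assuming a hypothetical `UPTIME_REPORT_INTERVAL_SECONDS` variable and an injected `send` coroutine in place of a real PostHog client (neither is taken from Keep's code). Failed reports are logged rather than allowed to end the loop.

```python
from __future__ import annotations

import asyncio
import logging
import os
import time
from collections.abc import Awaitable, Callable, Mapping

logger = logging.getLogger("uptime")


def uptime_interval_seconds(env: Mapping[str, str] = os.environ) -> float:
    # Hypothetical variable name; the source does not say which variables are used.
    return float(env.get("UPTIME_REPORT_INTERVAL_SECONDS", "60"))


async def report_uptime_periodically(
    send: Callable[[dict], Awaitable[None]],
    interval: float,
    iterations: int | None = None,
) -> None:
    """Send the process uptime every `interval` seconds; run forever if `iterations` is None."""
    started = time.monotonic()
    sent = 0
    while True:
        try:
            await send({"uptime_seconds": time.monotonic() - started})
        except Exception as exc:  # a failed report should not stop future reports
            logger.warning("uptime report failed: %s", exc)
        sent += 1
        if iterations is not None and sent >= iterations:
            return
        await asyncio.sleep(interval)


async def demo() -> list[dict]:
    received: list[dict] = []

    async def fake_send(payload: dict) -> None:  # stands in for a real analytics client
        received.append(payload)

    await report_uptime_periodically(fake_send, interval=0.01, iterations=3)
    return received


payloads = asyncio.run(demo())
print(len(payloads))  # 3
```

Injecting `send` keeps the loop independent of any particular analytics SDK and makes it easy to test.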