The Dispatch

OSS Report: microsoft/qlib


Qlib Development Faces Challenges with Data Handling and Model Integration

Qlib, an open-source AI-oriented quantitative investment platform by Microsoft, continues to evolve with active user engagement and development efforts. However, recent activities highlight ongoing challenges in data handling and model integration, as evidenced by numerous user-reported issues.

Recent Activity

Recent issues and pull requests (PRs) indicate a focus on resolving data normalization errors and improving model integration. Users have reported complications with downloading datasets from Yahoo Finance and performance issues with high-frequency trading data. The development team has been addressing these concerns through various bug fixes and enhancements.

Team Members and Recent Activity

  1. you-n-g

    • 0 days ago: Updated README.md with data examples.
    • 21 days ago: Contributed to LLM-driven Auto Quant Factory feature.
    • 56 days ago: Worked on nested data loader.
  2. Linlang (SunsetWolf)

    • 16 days ago: Fixed image display issues in README.md.
    • 56 days ago: Worked on nested data loader.
  3. Young (afe.young@gmail.com)

    • 51 days ago: Updated models for dataset alignment.
  4. shenguanjiejie

    • 0 days ago: Updated README.md with data examples.

The development team is actively collaborating on documentation improvements and feature enhancements, particularly focusing on user experience and onboarding.

Quantified Reports

Quantify Issues



Recent GitHub Issues Activity

Timespan   Opened   Closed   Comments   Labeled   Milestones
7 Days          2        0          2         0            1
30 Days         5        2          2         1            1
90 Days        26        7         20         2            1
1 Year        144       94        176         8            1
All Time      904      690          -         -            -

Like all software activity quantification, these numbers are imperfect but sometimes useful. Comments, Labels, and Milestones refer to those issues opened in the timespan in question.
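
The figures in the table can be cross-checked with a little arithmetic. The following is a minimal sketch with the counts copied from the table above; the names `ISSUE_COUNTS`, `close_ratio`, and `net_open` are illustrative only and are not part of Qlib or of any reporting tool. It assumes "Closed" counts issues closed during the window, which need not be the same issues that were opened in it.

```python
# Issue counts copied from the table above: timespan -> (opened, closed).
ISSUE_COUNTS = {
    "7 Days": (2, 0),
    "30 Days": (5, 2),
    "90 Days": (26, 7),
    "1 Year": (144, 94),
    "All Time": (904, 690),
}


def close_ratio(opened: int, closed: int) -> float:
    """Closed issues per opened issue in the same window (0.0 if none opened)."""
    return closed / opened if opened else 0.0


def net_open(opened: int, closed: int) -> int:
    """Issues opened minus issues closed in the window."""
    return opened - closed


for span, (opened, closed) in ISSUE_COUNTS.items():
    print(f"{span:>8}: net {net_open(opened, closed):+d}, "
          f"close ratio {close_ratio(opened, closed):.0%}")
```

The all-time net figure (904 opened minus 690 closed) is 214, which agrees with the open-issue count quoted in the issues report.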

Quantify commits



Quantified Commit Activity Over 30 Days

Developer                   Branches   PRs     Commits   Files   Changes
Linlang                            2   1/1/0         2       2        36
Another                            1   1/1/0         1       1         6
you-n-g                            1   1/1/0         2       1         3
None (Finorita)                    0   1/0/1         0       0         0
Juanxi Tian (tianshijing)          0   0/0/1         0       0         0

PRs: created by that dev and opened/merged/closed-unmerged during the period
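
The three-part PRs notation can be unpacked mechanically. This is a small sketch, assuming each cell is always three slash-separated integers in the order opened/merged/closed-unmerged; `PRCounts` and `parse_pr_counts` are hypothetical names introduced here for illustration.

```python
from typing import NamedTuple


class PRCounts(NamedTuple):
    """PR activity for one developer during the reporting period."""
    opened: int
    merged: int
    closed_unmerged: int


def parse_pr_counts(cell: str) -> PRCounts:
    """Parse a PRs cell such as '1/0/1' into named counts."""
    parts = cell.split("/")
    if len(parts) != 3:
        raise ValueError(f"expected opened/merged/closed-unmerged, got {cell!r}")
    opened, merged, closed_unmerged = (int(p) for p in parts)
    return PRCounts(opened, merged, closed_unmerged)


# Example: a cell of 1/0/1 means one PR opened, none merged, one closed unmerged.
print(parse_pr_counts("1/0/1"))
```

Read this way, a row with 0 commits can still show PR activity, as the last two rows of the table do.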

Detailed Reports

Report On: Fetch issues



Recent Activity Analysis

The Qlib project currently has 214 open issues, a mix of bugs, questions, and enhancement requests that reflects ongoing user engagement and development effort. Recurring themes include data handling, model training problems, and feature requests for improved functionality.

Several issues exhibit anomalies or complications; for instance, there are multiple reports of errors related to data normalization and downloading datasets, particularly from Yahoo Finance. Additionally, users have raised concerns about the performance of various models and the handling of high-frequency trading data. The presence of numerous questions regarding model integration and custom data suggests that users are actively trying to adapt Qlib to their specific needs but are encountering challenges.

Issue Details

Below are some of the most recently created and updated issues:

  1. Issue #1845: Position和BasePosition代码的一点小瑕疵,不是bug! (in English: "A small flaw in the Position and BasePosition code, not a bug!")

    • Priority: Low
    • Status: Open
    • Created: 0 days ago
    • Updated: N/A
  2. Issue #1844: ModuleNotFoundError: No module named 'numpy._core'

    • Priority: High
    • Status: Open
    • Created: 7 days ago
    • Updated: 1 day ago
  3. Issue #1843: AttributeError: 'LocalDatasetProvider' object has no attribute '_dataset_uri'

    • Priority: Medium
    • Status: Open
    • Created: 11 days ago
    • Updated: N/A
  4. Issue #1828: ModuleNotFoundError: No module named raise when make a change in code

    • Priority: Medium
    • Status: Open
    • Created: 45 days ago
    • Updated: 8 days ago
  5. Issue #1826: kernels get killed OOM when running 1min data REG_CN

    • Priority: High
    • Status: Open
    • Created: 51 days ago
    • Updated: 8 days ago
  6. Issue #1818: dump_bin DumpDataUpdate mode append data error

    • Priority: Medium
    • Status: Open
    • Created: 64 days ago
    • Updated: 8 days ago
  7. Issue #1815: Deprecated class Text of module typing

    • Priority: Low
    • Status: Open
    • Created: 67 days ago
    • Updated: 8 days ago
  8. Issue #1525: Apple M1 not supported

    • Priority: High
    • Status: Open
    • Created: 459 days ago
    • Updated: 14 days ago
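
Of the issues listed, #1815 is the simplest to illustrate. In Python 3, `typing.Text` is an alias for `str` and has been deprecated since Python 3.11, so replacing it with `str` in annotations does not change behaviour. The sketch below shows the equivalence; the two functions are hypothetical examples and are not taken from Qlib's code base.

```python
import typing

# On Python 3, typing.Text is the same object as str.
assert typing.Text is str


def describe_old(name: typing.Text) -> typing.Text:
    """Annotated with the deprecated alias."""
    return f"instrument: {name}"


def describe_new(name: str) -> str:
    """Same function annotated with str, the preferred spelling."""
    return f"instrument: {name}"


# Both versions behave identically; only the annotation differs.
print(describe_old("SH600000") == describe_new("SH600000"))
```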

Important Themes

  • There is a clear focus on resolving issues related to data handling and model integration.
  • Users are experiencing challenges with high-frequency data processing and normalization.
  • The need for better documentation and examples is frequently mentioned, particularly regarding custom implementations and advanced features.
  • Many users are seeking assistance with error messages related to model training and backtesting processes.

This analysis highlights the active engagement within the Qlib community, as well as the ongoing challenges faced by users in adapting the platform to their specific use cases.

Report On: Fetch pull requests



Overview

The analysis of the pull requests (PRs) for the Qlib project reveals a total of 25 open PRs, with a focus on bug fixes, enhancements, and documentation updates. The PRs cover a range of topics including security updates, new features, and improvements to existing functionalities.

Summary of Pull Requests

  1. PR #1829: Update urllib3 to fix security issue

    • State: Open
    • Created: 44 days ago
    • Description: Updates the urllib3 dependency to address a security vulnerability.
    • Significance: Critical for maintaining security standards in the project.
  2. PR #1817: add dockerfile

    • State: Open
    • Created: 65 days ago
    • Description: Introduces a Dockerfile for easier deployment.
    • Significance: Enhances usability by allowing users to run the application in a containerized environment.
  3. PR #1790: fixing issue 1780

    • State: Open
    • Created: 101 days ago
    • Description: Fixes a specific issue related to model loss functions.
    • Significance: Addresses a user-reported bug that could affect model performance.
  4. PR #1677: Fix the empty price_s case and self.instruments in SBBStrategyEMA

    • State: Open
    • Created: 314 days ago
    • Description: Resolves issues with empty data handling in trading strategies.
    • Significance: Improves robustness of trading strategies against data inconsistencies.
  5. PR #1673: Improve pit performance

    • State: Open
    • Created: 316 days ago
    • Description: Enhances performance of data access methods in the pit data structure.
    • Significance: Aims to optimize data retrieval processes, crucial for high-frequency trading applications.
  6. PR #1666: fixamount

    • State: Open
    • Created: 322 days ago
    • Description: Adjusts logic in position management during trades.
    • Significance: Ensures accurate handling of stock amounts during transactions.
  7. PR #1661: fix duplicate log

    • State: Open
    • Created: 331 days ago
    • Description: Fixes an issue with duplicate logging messages.
    • Significance: Enhances logging clarity and reduces noise in logs.
  8. PR #1617: Bump cryptography from 36.0.1 to 41.0.3

    • State: Open
    • Created: 395 days ago
    • Description: Updates cryptography library for security improvements.
    • Significance: Important for maintaining secure dependencies.
  9. PR #1614: Bump certifi from 2021.10.8 to 2023.7.22

    • State: Open
    • Created: 402 days ago
    • Description: Updates certifi library for improved SSL certificate handling.
    • Significance: Enhances security related to HTTPS requests.
  10. PR #1587: Add algorithm trading example

    • State: Open
    • Created: 420 days ago
    • Description: Introduces an example for algorithmic trading using Qlib.
    • Significance: Provides practical usage examples for users.

Analysis of Pull Requests

The open pull requests for Qlib reflect several key themes and areas of focus within the project:

Security and Dependency Management

Three of the ten PRs listed above (#1829, #1617, and #1614) update dependencies: urllib3, cryptography, and certifi. These updates are crucial for maintaining the security of the software, especially given the increasing scrutiny of vulnerability management in open-source projects. PRs #1829 and #1617 directly address known vulnerabilities, which matters for user trust and for compliance with best practices in software development.

Feature Enhancements

Several PRs aim to enhance functionality or introduce new features, such as the addition of a Dockerfile (#1817) and algorithm trading examples (#1587). These enhancements not only improve usability but also broaden the scope of what can be achieved with Qlib, making it more appealing to potential users and contributors.

Bug Fixes and Stability Improvements

Bug fixes are a recurring theme across many PRs, including those addressing specific issues like duplicate logging (#1661) or empty data handling (#1677). This focus on stability is critical as it ensures that users can rely on Qlib for consistent performance in their quantitative trading strategies.

Documentation Updates

Documentation-related PRs are also prevalent, indicating an ongoing effort to improve user guidance and support materials. For example, PRs such as #1810 and #1751 focus on correcting typos or enhancing installation instructions, which are vital for user onboarding and reducing barriers to entry.

Community Engagement

The presence of comments from community members suggests an active engagement process where contributors are encouraged to discuss changes openly. This collaborative atmosphere is beneficial for fostering innovation and ensuring that multiple perspectives are considered when implementing changes.

Anomalies and Concerns

Despite the positive trends, many PRs have remained open for extended periods without being merged: seven of the ten listed above, including PR #1673, have been open for more than 300 days. This could indicate bottlenecks in the review process or resource allocation issues within the project team. Addressing these concerns promptly is essential to maintain momentum and community interest.
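
To make the staleness concern concrete, the ages from the summary list can be filtered against a cutoff. This is an illustrative sketch only: the 300-day threshold is an assumption chosen for the example, not a policy of the Qlib project, and `PR_AGE_DAYS` and `stale_prs` are hypothetical names.

```python
# Days open for each PR, copied from the summary list above.
PR_AGE_DAYS = {
    1829: 44, 1817: 65, 1790: 101, 1677: 314, 1673: 316,
    1666: 322, 1661: 331, 1617: 395, 1614: 402, 1587: 420,
}

STALE_AFTER_DAYS = 300  # assumed cutoff, chosen only for illustration


def stale_prs(ages: dict[int, int], threshold: int) -> list[int]:
    """Return PR numbers open longer than `threshold` days, oldest first."""
    stale = [pr for pr, days in ages.items() if days > threshold]
    return sorted(stale, key=lambda pr: ages[pr], reverse=True)


print(stale_prs(PR_AGE_DAYS, STALE_AFTER_DAYS))
```

With this cutoff, seven of the ten listed PRs qualify, including two of the three dependency bumps (#1617 and #1614).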

In summary, while Qlib's pull requests demonstrate a healthy level of activity focused on security, feature enhancement, bug fixing, and documentation improvement, attention should be given to expediting reviews and merges to sustain community engagement and project growth.

Report On: Fetch commits



Development Team and Recent Activity

Team Members and Recent Activity

1. Linlang (SunsetWolf)

  • Recent Commits:
    • 16 days ago: Fixed image display issues in README.md and pytorch_hist.py.
    • 56 days ago: Worked on nested data loader, including tests and fixing errors.
  • Collaborations: Co-authored several commits with you-n-g and others.

2. you-n-g

  • Recent Commits:
    • 0 days ago: Updated README.md with data examples.
    • 21 days ago: Contributed to the LLM-driven Auto Quant Factory feature.
    • 56 days ago: Collaborated on the nested data loader.
  • Collaborations: Frequently co-authored with Linlang and Young.

3. Young (afe.young@gmail.com)

  • Recent Commits:
    • 51 days ago: Made significant updates to the model for both datasets, including aligning with previous results and initializing models.
  • Collaborations: Worked closely with you-n-g on multiple features.

4. shenguanjiejie

  • Recent Commits:
    • 0 days ago: Updated README.md with data examples.
  • Collaborations: Minimal activity but involved in recent updates.

5. Finorita & tianshijing

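  • No commits in the past 30 days. Their only recorded activity is pull requests: Finorita opened one PR that was closed without being merged, and tianshijing had one PR closed without being merged.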

Patterns, Themes, and Conclusions

  • The team is actively working on documentation improvements, particularly in the README files, which indicates a focus on enhancing user experience and onboarding.
  • Significant collaborative efforts are evident, especially between you-n-g, Linlang, and Young, suggesting a strong team dynamic in tackling complex features like the LLM-driven Auto Quant Factory and nested data loaders.
  • Recent activities show a blend of bug fixes and feature enhancements, indicating a balanced approach to maintaining code quality while introducing new functionalities.
  • The frequency of commits from Linlang and you-n-g highlights their central roles in ongoing development efforts, particularly in addressing bugs and implementing new features.
  • The absence of commits from Finorita and tianshijing, whose only recorded activity in the period is PRs closed without merging, may suggest varying levels of engagement or focus among contributors.

Overall, the development team demonstrates a proactive approach to both feature development and maintenance, ensuring that the Qlib platform remains robust and user-friendly.