‹ Reports
The Dispatch

OSS Report: duckdb/pg_duckdb


DuckDB's Integration with PostgreSQL Faces Compatibility and Stability Challenges Amidst Active Development

pg_duckdb, a Postgres extension integrating DuckDB's analytics engine, is seeing active development focused on compatibility with newer Postgres versions and addressing critical bugs. The project aims to enhance high-performance analytics within Postgres environments.

Recent Activity

Recent issues and pull requests (PRs) in the duckdb/pg_duckdb repository indicate a focus on compatibility enhancements and bug fixes. Notable issues include #121 regarding query cancellation failures and #120 about resource leaks causing crashes. These highlight ongoing stability concerns. Enhancement requests for spatial data support (#141) and concurrent access (#134) suggest expanding capabilities.

Development Team and Recent Activities

  1. Jelte Fennema-Nio (JelteF)

    • Added NOTICE file for license attribution.
    • Improved make installcheck efficiency.
    • Implemented automatic DuckDB execution for specific functions.
    • Enhanced EXPLAIN ANALYZE support.
  2. Y. (Y--)

    • Improved error handling in parameterized queries.
    • Developed DuckDB singleton per connection process.
    • Worked on catalog sync features.
  3. mkaruza

    • Added read functions with optional arguments.
    • Improved caching for HTTPFS extensions.
  4. Heikki Linnakangas (hlinnaka)

    • Worked on PostgreSQL 17 compatibility.
  5. Thijs (Tishj)

    • Fixed type mappings and enhanced CI workflows.
  6. Japin Li (japinli)

    • Fixed variable name typo.
  7. Jonathan Dance (wuputah)

    • Improved documentation and contribution guidelines.
  8. Mark (Mytherin)

    • Changed license to MIT.

Of Note

Quantified Reports

Quantify Issues



Recent GitHub Issues Activity

Timespan Opened Closed Comments Labeled Milestones
7 Days 7 7 11 5 1
30 Days 26 12 43 14 2
90 Days 41 23 68 27 2
All Time 60 33 - - -

Like all software activity quantification, these numbers are imperfect but sometimes useful. Comments, Labels, and Milestones refer to those issues opened in the timespan in question.

Quantify commits



Quantified Commit Activity Over 30 Days

Developer Avatar Branches PRs Commits Files Changes
Jelte Fennema-Nio 7 22/18/1 38 122 33335
mkaruza 2 5/4/0 5 88 7906
Y. 6 8/8/0 26 51 6851
Thijs 3 4/3/0 22 28 2291
Jonathan Dance (JD) 1 5/5/0 5 3 313
Mark 1 1/1/0 1 1 203
Heikki Linnakangas 1 1/1/0 1 6 17
Japin Li 1 1/1/0 1 1 4
liu shengsong (lss602726449) 0 1/0/0 0 0 0
None (brettgriffin-md) 0 1/0/0 0 0 0

PRs: created by that dev and opened/merged/closed-unmerged during the period

Detailed Reports

Report On: Fetch issues



Recent Activity Analysis

The recent activity in the duckdb/pg_duckdb GitHub repository shows a total of 27 open issues, with notable discussions surrounding compatibility enhancements and bug fixes. A significant focus has emerged on ensuring compatibility with newer versions of Postgres, particularly Postgres 15 and 16, as well as addressing critical bugs related to query execution and resource management. The presence of multiple enhancement requests indicates an active interest in expanding the extension's capabilities, particularly in spatial data support and concurrent access.

Several issues highlight urgent concerns, such as #121 regarding query cancellation not functioning correctly and #120 about resource leaks leading to crashes. These issues suggest that while the project is evolving, there are underlying stability and performance concerns that need to be addressed promptly. Additionally, the discussions reveal a theme of community engagement, with contributors actively seeking to resolve issues through pull requests and collaborative dialogue.

Issue Details

Recently Created Issues

  1. Issue #150: Add compatibility with Postgres 16 in Windows x86-64

    • Priority: Enhancement
    • Status: Open
    • Created: 1 day ago
  2. Issue #145: Add compatibility with Postgres 15

    • Priority: Enhancement
    • Status: Open
    • Created: 5 days ago
    • Updated: 4 days ago
  3. Issue #141: Spatial extension support

    • Priority: Enhancement
    • Status: Open
    • Created: 5 days ago
    • Updated: 4 days ago
  4. Issue #134: Does this extension support concurrent access to DuckDB from multiple connections?

    • Priority: Question
    • Status: Open
    • Created: 8 days ago
    • Updated: 7 days ago
  5. Issue #121: Query cancellation not working

    • Priority: Bug
    • Status: Open
    • Created: 12 days ago
    • Updated: 4 days ago

Recently Updated Issues

  1. Issue #120: Resource leak on error, eventual crash on out of memory

    • Priority: Bug
    • Status: Open
    • Created: 12 days ago
    • Updated: 4 days ago
  2. Issue #118: Error with extended query protocol

    • Priority: Bug
    • Status: Open
    • Created: 12 days ago
    • Updated: 7 days ago
  3. Issue #116: Case insensitivity mismatch between DuckDB and Postgres can cause issues

    • Priority: Bug, Long-term
    • Status: Open
    • Created: 13 days ago
    • Updated: 1 day ago
  4. Issue #113: Use separated libduckdb.so rpm / deb

    • Priority: Enhancement
    • Status: Open
    • Created: 13 days ago
    • Updated: 1 day ago
  5. Issue #106: Misleading error when using read_xxx functions with invalid URLs

    • Priority: Enhancement
    • Status: Open
    • Created: 18 days ago
    • Updated: 7 days ago

This selection of issues illustrates both ongoing development efforts and critical areas requiring attention, particularly around compatibility and stability as the project integrates more deeply with Postgres functionalities.

Report On: Fetch pull requests



Overview

The analysis of the pull requests (PRs) for the duckdb/pg_duckdb repository reveals a total of 12 open PRs, with significant ongoing work aimed at enhancing the integration of DuckDB with PostgreSQL. The changes span various aspects, including architectural improvements, testing frameworks, and feature additions that support advanced query capabilities.

Summary of Pull Requests

Open Pull Requests

  • PR #147: Make DuckDB connection a singleton to support extended protocol
    This PR introduces a singleton pattern for DuckDB connections per PostgreSQL connection, enabling support for the extended protocol. It addresses issues related to query execution consistency and error handling but raises concerns regarding potential state issues when errors occur.

  • PR #143: Add pytest-based testing framework
    This PR adds a Python-based testing framework using pytest, which allows more flexible application-style testing compared to the existing pgregress. It aims to enhance test coverage for features like EXPLAIN ANALYZE.

  • PR #142: Add support for using http to connect S3
    This PR proposes to allow HTTP connections to S3, expanding the flexibility of data access methods. Review comments suggest enhancements in default settings for SSL usage.

  • PR #140: Vendor in Postgres ruleutils and handle public.read_csv
    This PR vendors in Postgres' ruleutils.c, enabling better query generation for DuckDB compatibility. It aims to address search path issues and improve query handling.

  • PR #128: feat: update readme
    A minor update to the README file, likely improving documentation clarity.

  • PR #97: Add the PostgresStorageExtension to DuckDB
    This PR introduces a new extension that is still in early development stages but aims to enhance storage capabilities within DuckDB.

  • PR #88: Local object cache for cached_httpfs extension
    This PR implements caching mechanisms for remote objects accessed via HTTPFS, though it raises concerns about concurrency and locking behavior during caching operations.

  • PR #75: Minimal create table support
    A work-in-progress PR that lays the groundwork for creating tables in DuckDB, indicating ongoing efforts to expand functionality.

  • PR #71: WIP IndexOnlyScan
    A draft PR focused on implementing an index-only scan feature, showcasing ongoing performance optimization efforts.

  • PR #63: Parallel Scan with multiple processes
    Another draft PR aimed at enhancing performance through parallel processing capabilities in scanning operations.

  • PR #57: SHLIB_LINK: prefer new libs to old libs
    A minor fix addressing linker errors when older versions of libraries are present during builds.

  • PR #53: Docker image
    This draft PR aims to create a Docker image for pg_duckdb, facilitating easier deployment and testing environments.

Closed Pull Requests

A total of 78 closed PRs indicate a robust development cycle with many successful merges addressing various issues, enhancements, and features. Notable closed PRs include those that added essential functionalities like EXPLAIN ANALYZE support (#136), filtering with VARCHAR (#137), and improvements in installation processes (#139).

Analysis of Pull Requests

The current landscape of open pull requests indicates a strong focus on enhancing the integration between DuckDB and PostgreSQL. The introduction of singleton connections (PR #147) is particularly significant as it addresses performance and consistency issues that arise from managing multiple connections. This architectural change not only improves efficiency but also aligns with modern best practices in resource management within database systems.

The addition of a pytest-based testing framework (PR #143) represents a strategic move towards improving code quality and reliability. By allowing more sophisticated application-level tests, this framework can help identify edge cases and ensure that new features do not introduce regressions. The emphasis on testing is further supported by other PRs focusing on specific functionalities such as EXPLAIN ANALYZE and filtering with VARCHAR.

Another notable theme is the ongoing effort to enhance data access capabilities through improved support for cloud storage solutions (e.g., HTTP connections to S3 in PR #142). This aligns with the project's goal of providing seamless integration with modern data architectures, allowing users to leverage cloud storage without complex ETL processes.

However, there are also potential concerns highlighted in several open PRs. For instance, the singleton connection approach may lead to issues if not managed correctly, particularly regarding error states (as noted in PR #147). Similarly, the caching mechanism proposed in PR #88 raises questions about concurrency control during simultaneous access attempts. These points underscore the importance of rigorous testing and validation as new features are integrated into the system.

The presence of multiple draft PRs focused on performance optimizations (e.g., parallel scans and index-only scans) indicates an active pursuit of efficiency improvements. These efforts are crucial as they directly impact user experience and system responsiveness, particularly in data-intensive applications where performance is paramount.

In conclusion, the current set of pull requests reflects a dynamic development environment focused on enhancing functionality, improving performance, and ensuring reliability within the pg_duckdb project. The community's engagement through discussions and reviews further strengthens this collaborative effort towards building a robust PostgreSQL extension that leverages DuckDB's powerful analytics capabilities.

Report On: Fetch commits



Repo Commits Analysis

Development Team and Recent Activity

Team Members and Recent Activities

  1. Jelte Fennema-Nio (JelteF)

    • Recent Commits: 38 commits across 7 branches.
    • Notable Contributions:
    • Added a NOTICE file for PostgreSQL license attribution.
    • Improved make installcheck to utilize the -j flag, enhancing build efficiency.
    • Implemented automatic DuckDB execution for DuckDB-only functions, increasing usability.
    • Enhanced support for EXPLAIN ANALYZE and added tests for various features.
    • Collaborated with Y. on multiple features, including extended protocol support and catalog synchronization.
  2. Y. (Y--)

    • Recent Commits: 26 commits across 6 branches.
    • Notable Contributions:
    • Added support for filtering with varchar and improved error handling in parameterized queries.
    • Worked on making DuckDB a singleton per connection process and implemented catalog sync background worker features.
    • Collaborated with JelteF on several enhancements, including CTAS support using DuckDB with PostgreSQL tables.
  3. mkaruza

    • Recent Commits: 5 commits across 2 branches.
    • Notable Contributions:
    • Contributed to the addition of read functions supporting optional arguments and improved caching mechanisms for HTTPFS extensions.
  4. Heikki Linnakangas (hlinnaka)

    • Recent Commits: 1 commit related to PostgreSQL 17 compatibility.
  5. Thijs (Tishj)

    • Recent Commits: 22 commits across 3 branches.
    • Notable Contributions:
    • Made various improvements to the codebase, including fixing type mappings and enhancing CI workflows.
  6. Japin Li (japinli)

    • Recent Commits: 1 commit fixing a variable name typo.
  7. Jonathan Dance (wuputah)

    • Recent Commits: 5 commits focused on documentation improvements and contribution guidelines.
  8. Mark (Mytherin)

    • Recent Commits: 1 commit changing the license to MIT.

Patterns, Themes, and Conclusions

  • Active Collaboration: The team exhibits strong collaboration, particularly between JelteF and Y., who frequently work together on features that enhance the integration of DuckDB with PostgreSQL.
  • Focus on Usability Enhancements: Recent activities highlight a concerted effort to improve user experience through better error handling, automatic feature enablement, and performance optimizations in build processes.
  • Testing and Quality Assurance: There is a significant emphasis on adding tests for new features, ensuring robustness in the integration of DuckDB functionalities within PostgreSQL.
  • Continuous Improvement of Documentation: Team members are actively updating documentation to reflect changes in functionality and installation procedures, which is crucial for community engagement and contribution.
  • Diverse Feature Set Development: The project is expanding its capabilities with features like catalog synchronization, support for additional data types, and enhanced query processing, indicating a forward-looking roadmap aimed at improving analytics performance.

Overall, the development team is actively engaged in enhancing the pg_duckdb project through collaborative efforts that focus on usability, performance, testing, and documentation improvements.