The Dispatch

OSS Report: mendableai/firecrawl


Firecrawl Development Stagnates Amidst High Activity in Issue Reporting

Firecrawl, an API service by Mendable.ai designed for web crawling and data extraction, has seen a surge in issue reporting and user engagement despite a lack of recent commits or pull requests in the last 30 days.

Recent Activity

The project currently has 68 open issues, with several new ones created in the past few days. These issues primarily focus on bugs related to encoding errors (#547) and scraping failures (#540), as well as feature requests like automatic retries for failed requests (#518). The high volume of issue reporting suggests a growing user base encountering challenges with the tool's current capabilities.

The development team, consisting of members like Nicolas (nickscamara) and Gergő Móricz (mogery), has not made any new commits recently. Their previous work involved significant contributions to API controllers and services, focusing on functionality improvements and bug fixes. The absence of recent activity may indicate a pause in development or a shift in focus to addressing existing issues.

Of Note

  1. Encoding Challenges: Persistent encoding issues with non-English websites highlight potential limitations in Firecrawl's global usability.
  2. High User Engagement: The influx of issue reports reflects active user involvement, suggesting a need for enhanced support and documentation.
  3. Stalled Development: No recent commits or PRs suggest a temporary halt in development efforts, raising questions about project priorities.
  4. Feature Requests: Users are actively requesting new features, such as better handling of JavaScript-rendered pages (#543), indicating demand for more robust capabilities.
  5. Community Contributions: Despite the lack of recent commits, the community remains engaged through discussions and feedback on GitHub issues and PRs.

Quantified Reports

Quantify commits



Quantified Commit Activity Over 30 Days

Developer                           Branches  PRs      Commits  Files  Changes
Eric Ciarla                         2         1/1/0    7        65     55526
Gergő Móricz                        2         3/4/0    73       66     14815
Nicolas                             7         12/12/0  80       73     6794
Rafael Miller                       7         17/11/1  42       116    5809
Kent (Chia-Hao), Hsu                1         2/2/0    3        11     1267
Kevin Swiber                        1         2/1/0    1        1      38
Thomas Kosmas                       1         0/0/0    2        1      33
Quan Ming                           1         1/1/0    3        2      10
tak-s                               1         1/1/0    2        2      9
Yuki Matsukura                      1         2/1/0    1        1      1
Alfred Nutile (alnutile)            0         1/0/0    0        0      0
Matt Joyce (mattjoyce)              0         0/0/1    0        0      0
Cherilyn Buren (NiuBlibing)         0         0/1/0    0        0      0
darker (Sanix-Darker)               0         0/0/1    0        0      0
Jakob Stadlhuber (JakobStadlhuber)  0         2/1/1    0        0      0
None (dependabot[bot])              0         18/0/18  0        0      0

PRs: created by that dev and opened/merged/closed-unmerged during the period
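
The PRs column packs three counts into one slash-separated field. A minimal Python sketch of unpacking it, using a few rows from the table above; the `PRCounts` type and `parse_pr_field` helper are illustrative names and not part of any report tooling:

```python
from typing import NamedTuple


class PRCounts(NamedTuple):
    opened: int
    merged: int
    closed_unmerged: int


def parse_pr_field(field: str) -> PRCounts:
    """Split an 'opened/merged/closed-unmerged' field such as '17/11/1'."""
    parts = field.split("/")
    if len(parts) != 3:
        raise ValueError(f"expected three slash-separated counts, got {field!r}")
    opened, merged, closed_unmerged = (int(p) for p in parts)
    return PRCounts(opened, merged, closed_unmerged)


# Rows copied from the table above.
rows = {
    "Nicolas": "12/12/0",
    "Rafael Miller": "17/11/1",
    "dependabot[bot]": "18/0/18",
}

for name, field in rows.items():
    counts = parse_pr_field(field)
    print(
        f"{name}: opened={counts.opened} merged={counts.merged} "
        f"closed-unmerged={counts.closed_unmerged}"
    )
```

Read this way, the dependabot row shows 18 PRs opened and 18 closed without merging during the period.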

Quantify Issues



Recent GitHub Issues Activity

Timespan  Opened  Closed  Comments  Labeled  Milestones
7 Days    14      5       20        1        1
30 Days   49      36      96        7        1
90 Days   183     143     414       27       1
All Time  251     183     -         -        -

Like all software activity quantification, these numbers are imperfect but sometimes useful. Comments, Labels, and Milestones refer to those issues opened in the timespan in question.
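
As a quick cross-check, the all-time row should reconcile with the 68 open issues cited elsewhere in this report. The Python sketch below recomputes the net change per timespan from the table. The closed-to-opened ratio is only a rough indicator, since the table does not say whether the closed issues were also opened within the same window; the helper names are illustrative.

```python
# (opened, closed) per timespan, copied from the table above.
issue_counts = {
    "7 Days": (14, 5),
    "30 Days": (49, 36),
    "90 Days": (183, 143),
    "All Time": (251, 183),
}


def net_open(opened: int, closed: int) -> int:
    """Issues opened minus issues closed in the window."""
    return opened - closed


def close_ratio(opened: int, closed: int) -> float:
    """Closed issues per opened issue; 0.0 when nothing was opened."""
    return closed / opened if opened else 0.0


summary = {
    span: (net_open(o, c), round(close_ratio(o, c), 2))
    for span, (o, c) in issue_counts.items()
}

for span, (net, ratio) in summary.items():
    print(f"{span}: net open {net:+d}, closed/opened {ratio}")
```

The all-time net of 251 - 183 = 68 matches the open-issue count reported above.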

Detailed Reports

Report On: Fetch issues



Recent Activity Analysis

The Firecrawl project has seen significant recent activity, with 68 open issues currently logged. Several issues have been created or updated within the last few days, indicating active engagement from both users and developers. Common themes among these issues include bugs related to encoding and scraping failures, questions about functionality, and feature requests aimed at enhancing the tool's capabilities.

Several issues stand out due to their urgency or complexity. For instance, Issue #540 regarding the failure to scrape content from a specific URL has been marked as high-priority, highlighting the need for immediate attention. Additionally, there are recurring reports of encoding problems, particularly with non-English websites (e.g., Issue #547), which could affect the tool's usability in diverse contexts.
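
The report does not state the root cause of Issue #547. One common source of such symptoms, in general, is a page served in a legacy encoding (for example GB18030) being decoded as UTF-8. The Python sketch below illustrates that general failure mode and a simple fallback; it is an assumption for illustration, not Firecrawl's code, and `decode_with_fallback` is a hypothetical helper.

```python
from typing import Iterable, Tuple


def decode_with_fallback(
    raw: bytes, candidates: Iterable[str] = ("utf-8", "gb18030")
) -> Tuple[str, str]:
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Last resort: decode as UTF-8 and mark undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"


# Simulated response body from a site that serves GB18030 bytes.
sample = "中文网站".encode("gb18030")
text, used = decode_with_fallback(sample)
print(text, used)
```

A production crawler would normally consult the HTTP `Content-Type` header and any `<meta charset>` declaration before guessing; the candidate list here is deliberately short.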

Another theme is the discussion around improving user experience by adding features such as automatic retries for failed requests (Issue #518) and better handling of JavaScript-rendered pages (Issue #543). The presence of multiple questions about functionality also suggests that users may require more guidance on how to effectively utilize Firecrawl's features.
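
To make the retry request in Issue #518 concrete, here is a minimal sketch of retrying a failed fetch with exponential backoff. It is not Firecrawl's implementation: `retry`, `flaky_fetch`, and all parameters are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry(
    fetch: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it succeeds or max_attempts is exhausted.

    After the n-th failure (n starting at 0) it waits base_delay * 2**n seconds.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Simulated link that fails twice before succeeding.
calls = {"n": 0}


def flaky_fetch() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated transient failure")
    return "page content"


delays: List[float] = []
result = retry(flaky_fetch, max_attempts=4, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the example fast to run and easy to test. A real crawler would likely retry only on transient errors (timeouts, 5xx responses) rather than on every exception.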

Issue Details

Most Recently Created Issues

  1. Issue #548: [BUG] Getting 408 when trying to run firecrawl locally

    • Priority: Bug
    • Status: Open
    • Created: 0 days ago
    • Description: User encounters a 408 error when attempting to run Firecrawl locally.
  2. Issue #547: [BUG] The encoding is not correct for some Chinese language sites

    • Priority: Bug
    • Status: Open
    • Created: 0 days ago
    • Description: Incorrect encoding reported for specific Chinese language URLs.
  3. Issue #546: [Question] Do you support crawling pages requires login?

    • Priority: Question
    • Status: Open
    • Created: 1 day ago
    • Description: Inquiry about the ability to crawl authenticated pages.
  4. Issue #545: [BUG] Doesn't work on /scrape

    • Priority: Bug
    • Status: Open
    • Created: 1 day ago
    • Description: A specific URL fails to return expected data during scraping.
  5. Issue #544: [Feat] Send "cancel" to fire-engine on timeout

    • Priority: Feature
    • Status: Open
    • Created: 2 days ago
    • Description: Suggestion to improve queue management by sending cancel requests on timeouts.

Most Recently Updated Issues

  1. Issue #540: [BUG] https://www.solvhealth.com/privacy Only main content causing no content to be returned?

    • Priority: High-priority Bug
    • Status: Open
    • Updated: 4 days ago
    • Description: Issue with returning no data despite a successful HTTP status code.
  2. Issue #538: Strange behaviors in concurrency

    • Priority: Bug/Question
    • Status: Open
    • Updated: 4 days ago
    • Description: Reports of unexpected behavior when handling concurrent requests.
  3. Issue #519: [Feat] What will happen to the links that uses authentication services?

    • Priority: Feature Request
    • Status: Open
    • Updated: 8 days ago
    • Description: Inquiry about handling links requiring multiple authentication methods.
  4. Issue #518: [Feat] Add automatic retries to failed links on crawl

    • Priority: Feature Request
    • Status: Open
    • Updated: 8 days ago
    • Description: Suggestion for automatic retries to enhance reliability during crawls.
  5. Issue #517: [Feat] Run actions like clicking or scrolling on page before extraction

    • Priority: Customer Request
    • Status: Open
    • Updated: 9 days ago
    • Description: Request for functionality allowing pre-extraction actions on pages.

Summary

The recent activity in Firecrawl's GitHub repository reflects a dynamic environment with numerous bugs being reported and addressed, alongside feature requests aimed at improving user experience and functionality. Key issues revolve around encoding problems, scraping failures, and enhancements for crawling capabilities, indicating areas where users seek more robust solutions or clearer documentation.

Report On: Fetch pull requests



Overview

The analysis of the pull requests (PRs) for the Firecrawl project reveals a total of 16 open PRs, reflecting ongoing development efforts focused on bug fixes, feature enhancements, and integration improvements. The activity indicates a collaborative environment with significant contributions from multiple developers.

Summary of Pull Requests

  1. PR #542: [Bug] Fixed go sdk workflow

    • State: Open
    • Created: 3 days ago
    • Significance: Addresses issues in the Go SDK workflow, ensuring proper deployment and testing processes. Comments indicate uncertainty about its effectiveness.
  2. PR #541: [Feat] Added attempts to sdks for db saving time

    • State: Open
    • Created: 3 days ago
    • Significance: Introduces new functionality to SDKs aimed at optimizing database saving times. This reflects a focus on performance improvements.
  3. PR #535: fix docker compose port setting

    • State: Open
    • Created: 6 days ago
    • Significance: Adjusts Docker Compose settings to resolve port conflicts, which is crucial for local development environments.
  4. PR #527: [V1] Release

    • State: Open (Draft)
    • Created: 6 days ago
    • Significance: A draft PR documenting various features and changes for the upcoming version 1 release, indicating significant project evolution.
  5. PR #516: Ensuring USE_DB_AUTHENTICATION is true in single URL scraper

    • State: Open
    • Created: 9 days ago
    • Significance: Fixes a critical bug related to database authentication checks, enhancing security measures.
  6. PR #505: Add another Open-Source Integration

    • State: Open
    • Created: 13 days ago
    • Significance: Expands the project's integration capabilities by adding support for a Laravel PHP RAG system.
  7. PR #493: [Feat] Added llama parser sdk and timeout for scrape

    • State: Open
    • Created: 16 days ago
    • Significance: Enhances scraping functionality with new SDK support and timeout management for larger files.
  8. PR #373: [Feat]: Add RUST SDK client for firecrawl API

    • State: Open
    • Created: 42 days ago
    • Significance: Introduces a Rust SDK client, broadening language support and potential user base.
  9. PR #355: feat: small room optimisation of the apps/api Dockerfile image

    • State: Open
    • Created: 46 days ago
    • Significance: Optimizes the Docker image size, improving deployment efficiency.
  10. PR #438: [Feat] Added rate limit singleton for redis

    • State: Open
    • Created: 30 days ago
    • Significance: Implements a rate-limiting mechanism using Redis, enhancing API performance under load.
  11. PR #389: [Feat] Proposal to resolve the redirect url

    • State: Open
    • Created: 38 days ago
    • Significance: Addresses URL redirection issues, improving user experience and functionality.
  12. PR #344: Adds support for npm i firecrawl

    • State: Open
    • Created: 48 days ago
    • Significance: Facilitates installation via npm, increasing accessibility for JavaScript developers.
  13. PR #343: Adds support for pip install firecrawl

    • State: Open
    • Created: 48 days ago
    • Significance: Similar to PR #344 but for Python users, further expanding the project's reach.
  14. PR #280: Add rendering service to improve scalability

    • State: Open
    • Created: 65 days ago
    • Significance: Introduces a rendering service that enhances scalability and performance against anti-bot measures.
  15. PR #278: Usage billing support for overuse

    • State: Open
    • Created: 66 days ago
    • Significance: Implements usage tracking features that are essential for subscription-based models.
  16. PR #10: Categorize gitignore items

    • State: Open
    • Created: 124 days ago
    • Significance: Enhances project organization by categorizing ignored files in .gitignore.

Analysis of Pull Requests

The recent activity in the Firecrawl repository shows a strong focus on improving both functionality and usability across the project. A notable trend is work on SDKs and integrations, such as the fix to the Go SDK workflow (#542) and the proposed Rust SDK client (#373), which broaden the project's appeal to developers using different programming languages. This aligns with Firecrawl's goal of being an accessible web crawling solution across multiple platforms.

Another significant theme is addressing bugs and tightening security, as seen in PR #516, which ensures that USE_DB_AUTHENTICATION is true in the single URL scraper. PR #541, which adds attempts to the SDKs to optimize database saving times, targets performance rather than security. Both kinds of change matter because they directly affect user trust and application reliability.

There is also an emphasis on optimizing performance through various means—reducing Docker image sizes (#355), implementing rate limiting with Redis (#438), and enhancing logging capabilities (#496). These optimizations are essential in maintaining efficient operations as user demand grows.

The discussions within PR comments reveal an active community engagement where contributors provide feedback and suggestions, fostering collaboration among developers. For instance, discussions around PR #535 regarding Docker Compose settings highlight the importance of clear communication in resolving technical issues collaboratively.

However, some closed PRs were not merged because their functionality overlapped with, or was superseded by, other changes (e.g., PR #506). This points to a need for better coordination among contributors to avoid redundant effort and streamline contributions.

Overall, the current state of pull requests reflects a healthy development cycle characterized by active contributions aimed at enhancing functionality, fixing bugs, and improving overall user experience while maintaining robust community engagement practices.

Report On: Fetch commits



Repo Commits Analysis

Development Team and Recent Activity

Team Members

  • Nicolas (nickscamara)

  • Gergő Móricz (mogery)

  • Rafael Miller (rafaelsideguide)

  • Thomas Kosmas (tomkosm)

  • Eric Ciarla (ericciarla)

  • Yuki Matsukura (matsubo)

  • Kent Hsu (KentHsu)

  • Kevin Swiber (kevinswiber)

  • Quan Ming (wahpiangle)

  • Caleb Peffer (calebpeffer)

  • Tak-S (tak-s)

  • Dependabot

Recent Activity Summary

Nicolas (nickscamara)

  • Commits: 80
  • Changes: 6794 across 73 files
  • Recent work includes:
    • Updates to various API controllers and services, focusing on improving functionality and fixing bugs.
    • Significant contributions to the v1-webscraper branch, particularly in enhancing the /map endpoint and adding tests.
    • Merged multiple pull requests related to feature enhancements and bug fixes.

Gergő Móricz (mogery)

  • Commits: 73
  • Changes: 14815 across 66 files
  • Recent work includes:
    • Numerous fixes to the queue-worker, addressing race conditions, logging issues, and job success propagation.
    • Development of new features in the v1-webscraper branch, including websocket functionality for crawl status.
    • Collaborated with Nicolas on various improvements and bug fixes.

Rafael Miller (rafaelsideguide)

  • Commits: 42
  • Changes: 5809 across 116 files
  • Recent work includes:
    • Added support for Python SDK and made enhancements to existing features.
    • Worked on fixing tests and improving logging functionality within the API.
    • Contributed to several branches, focusing on integrating new features and resolving issues.

Thomas Kosmas (tomkosm)

  • Commits: 2
  • Changes: 33 across 1 file
  • Recent work includes minor updates related to website parameters.

Eric Ciarla (ericciarla)

  • Commits: 7
  • Changes: 55526 across 65 files
  • Recent work includes significant updates to the UI components of the project.

Yuki Matsukura (matsubo)

  • Commits: 1
  • Minor changes related to Docker configuration.

Kent Hsu (KentHsu)

  • Commits: 3
  • Minor updates related to Go SDK integration.

Kevin Swiber (kevinswiber)

  • Commits: 1
  • Minor changes related to environment configurations.

Quan Ming (wahpiangle)

  • Commits: 3
  • Minor updates related to Redis configurations.

Caleb Peffer (calebpeffer)

  • Contributions focused on enhancing the crawling capabilities and adding tests for new features.

Tak-S (tak-s)

  • Minor contributions focused on documentation improvements.

Patterns and Themes

  1. Collaboration: There is a strong collaborative effort between team members, particularly between Nicolas and Gergő, who frequently work together on feature enhancements and bug fixes.
  2. Focus on Testing: A significant emphasis is placed on writing tests, particularly in recent commits, indicating a commitment to maintaining code quality.
  3. Feature Development: The team is actively developing new features, especially in the API's crawling capabilities and SDK integrations.
  4. Bug Fixes: Many recent commits are dedicated to fixing bugs, particularly in the queue worker service, which suggests ongoing stability issues that need addressing.
  5. Branch Activity: Multiple branches are actively being developed, indicating a structured approach to feature development and bug resolution.

Overall, the development team is engaged in a productive cycle of feature enhancement, testing, and collaboration aimed at improving the Firecrawl project.