    Repositories list

    • List of libraries, tools and APIs for web scraping and data processing.
      Makefile
      Other
      787000 · Updated Oct 29, 2024
    • Web data extraction tool implemented as a Chrome extension
      JavaScript
      GNU Lesser General Public License v3.0
      68000 · Updated Oct 29, 2024
    • A list of the most common User Agents used on the Internet.
      JavaScript
      MIT License
      16000 · Updated Oct 29, 2024
    • lol-html
      Low output latency streaming HTML parser/rewriter with CSS selector-based API
      Rust
      BSD 3-Clause "New" or "Revised" License
      83000 · Updated Oct 29, 2024
    • net/http.Client-like HTTP client with options to select specific client TLS fingerprints for requests.
      Go
      BSD 4-Clause "Original" or "Old" License
      163000 · Updated Oct 28, 2024
    • Scrapy stats exporter for Prometheus
      Python
      MIT License
      11000 · Updated Oct 28, 2024
    • adblocker
      Efficient embeddable adblocker library
      TypeScript
      Mozilla Public License 2.0
      1010010 · Updated Oct 28, 2024
    • Python
      27001 · Updated Oct 27, 2024
    • estela
      estela, an elastic web scraping cluster 🕸
      TypeScript
      MIT License
      13000 · Updated Oct 27, 2024
    • Python binding to the Modest engine (fast HTML5 parser with CSS selectors).
      Cython
      MIT License
      69000 · Updated Oct 26, 2024
    • Parse numbers written in natural language
      Python
      BSD 3-Clause "New" or "Revised" License
      23000 · Updated Oct 25, 2024
    • hero
      The web browser that’s nearly impossible for bot blockers to block
      TypeScript
      MIT License
      44001 · Updated Oct 25, 2024
    • HTTP client made for scraping, based on got.
      TypeScript
      44000 · Updated Oct 25, 2024
    • Zyte Data API integration for Scrapy
      Python
      BSD 3-Clause "New" or "Revised" License
      19000 · Updated Oct 24, 2024
    • Library that helps use Puppeteer in Scrapy.
      Python
      BSD 3-Clause "New" or "Revised" License
      3000 · Updated Oct 23, 2024
    • Python
      5001 · Updated Oct 23, 2024
    • TLS implementation in pure Python, focused on interoperability testing
      Python
      Other
      80000 · Updated Oct 22, 2024
    • creepjs
      Creepy device and browser fingerprinting
      TypeScript
      MIT License
      194000 · Updated Oct 20, 2024
    • HTML
      MIT License
      3000 · Updated Oct 20, 2024
    • dukpy
      Simple JavaScript interpreter for Python
      JavaScript
      MIT License
      43000 · Updated Oct 20, 2024
    • Contains the common item definitions used in Zyte.
      Python
      BSD 3-Clause "New" or "Revised" License
      7000 · Updated Oct 20, 2024
    • Common interface for data container classes
      Python
      BSD 3-Clause "New" or "Revised" License
      13000 · Updated Oct 19, 2024
    • Spider templates for automatic crawlers.
      Python
      BSD 3-Clause "New" or "Revised" License
      4000 · Updated Oct 19, 2024
    • w3lib
      Python library of web-related functions
      Python
      BSD 3-Clause "New" or "Revised" License
      104000 · Updated Oct 19, 2024
    • Page Object pattern for Scrapy
      Python
      BSD 3-Clause "New" or "Revised" License
      28000 · Updated Oct 19, 2024
    • 43 MB Google Chrome, compressed with Brotli, to fit inside an AWS Lambda Layer
      MIT License
      46000 · Updated Oct 19, 2024
    • Node.js implementation of a proxy server (think Squid) with support for SSL, authentication and upstream proxy chaining.
      JavaScript
      Apache License 2.0
      1440010 · Updated Oct 18, 2024
    • Crawlera middleware for Scrapy
      Python
      BSD 3-Clause "New" or "Revised" License
      88000 · Updated Oct 18, 2024
    • web-poet
      Web scraping Page Objects core library
      Python
      BSD 3-Clause "New" or "Revised" License
      15000 · Updated Oct 18, 2024
    • Remove DIVs and styling, and normalize HTML while preserving structural information
      Python
      MIT License
      1000 · Updated Oct 17, 2024