ZODs LLM Web Scrapper

An advanced web scraping service that intelligently extracts and processes web content using embedding models. The service makes smart decisions about content relevancy based on embedding analysis, with support for multi-language and region-specific scraping.

The main purpose of this service is to extract relevant data from the web and provide fresh data to LLMs (AI models).

Overview

This service performs intelligent web scraping by combining search engine results with deep content analysis. It uses embedding models to evaluate content relevance and processes HTML content to extract meaningful information across different languages and regions.
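The relevance step described above can be illustrated with a minimal sketch. It assumes the query and each content chunk have already been turned into embedding vectors by one of the supported providers, and it uses cosine similarity as the score. The names `cosineSimilarity`, `rankByRelevance`, and the `threshold` default are hypothetical and not taken from this repository; the service's actual scoring logic may differ.

```typescript
// Hypothetical sketch: rank content chunks by cosine similarity to a query embedding.
type Embedding = number[];

interface EmbeddedChunk {
  text: string;
  embedding: Embedding;
}

interface ScoredChunk {
  text: string;
  score: number;
}

// Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 for zero-length vectors.
function cosineSimilarity(a: Embedding, b: Embedding): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Embeddings must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep chunks whose score meets the (assumed) threshold, most relevant first.
function rankByRelevance(
  query: Embedding,
  chunks: EmbeddedChunk[],
  threshold: number = 0.5,
): ScoredChunk[] {
  return chunks
    .map((chunk) => ({
      text: chunk.text,
      score: cosineSimilarity(query, chunk.embedding),
    }))
    .filter((chunk) => chunk.score >= threshold)
    .sort((x, y) => y.score - x.score);
}
```

A caller would embed the search query once, embed each extracted HTML chunk, and pass both to `rankByRelevance`; chunks below the threshold are dropped before anything is handed to an LLM.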

Key Features

Intelligent web search and content discovery
Multi-language and region-specific content scraping
Smart HTML content processing and relevancy analysis
Multiple embedding model integrations:
- HuggingFace
- OpenAI
- Google VertexAI
- AWS Bedrock
- Anthropic
- Cohere
Built-in rate limiting and request management
Automated content relevancy scoring
Browser-based scraping with Playwright
Ad blocking and content cleaning
Metrics and monitoring integration
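As an illustration of the built-in rate limiting listed above, here is a minimal token-bucket sketch. This README does not say which algorithm the service uses, so the token bucket is an assumption, and the `TokenBucket` class with its `capacity` and `refillPerSecond` parameters is hypothetical.

```typescript
// Hypothetical sketch: a token-bucket limiter for outgoing scrape requests.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: () => number;
  private tokens: number;
  private lastRefill: number;

  // `now` returns the current time in milliseconds; injectable for testing.
  constructor(capacity: number, refillPerSecond: number, now: () => number = Date.now) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity; // start with a full bucket
    this.lastRefill = now();
  }

  // Returns true and consumes one token if a request may proceed right now.
  tryAcquire(): boolean {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    // Refill in proportion to elapsed time, never exceeding capacity.
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond,
    );
    this.lastRefill = current;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

With `new TokenBucket(5, 1)`, up to five requests can go out in a burst, after which roughly one request per second is allowed. A scraper would call `tryAcquire()` before each page fetch and delay or queue the request when it returns `false`.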

Tech Stack

TypeScript / Node.js
Express.js
Playwright for browser automation
MongoDB for data storage
Vite for build tooling
Winston/Pino for logging
Prometheus for metrics
Multiple AI provider SDKs

Getting Started

Quick setup guide for ZODs LLM Web Scrapper.

Prerequisites

Node.js >= 20
pnpm
MongoDB instance
API keys for:
- OpenAI
- Google API (VertexAI)
- HuggingFace
- Search API

Installation

Clone and install dependencies:

git clone https://github.com/ZODs-Labs/zods-llm-web-scrapper.git
cd zods-llm-web-scrapper
pnpm install

Set up environment variables:

cp .env.example .env
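After copying the file, fill in the API keys listed under Prerequisites. The sketch below shows one way a service could fail fast at startup when a required variable is missing. The contents of .env.example are not shown in this README, so the variable names in the usage comment are placeholders, and `requireEnv` is a hypothetical helper rather than part of this codebase.

```typescript
// Hypothetical sketch: verify that required environment variables are set.
type EnvSource = Record<string, string | undefined>;

// Returns the requested variables, or throws listing every missing name.
function requireEnv(names: string[], env: EnvSource): Record<string, string> {
  const values: Record<string, string> = {};
  const missing: string[] = [];
  for (const name of names) {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      missing.push(name);
    } else {
      values[name] = value;
    }
  }
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return values;
}

// Usage in a Node.js entry point (variable names are placeholders, not the
// names used in .env.example):
//   const config = requireEnv(["OPENAI_API_KEY", "MONGODB_URI"], process.env);
```

Collecting all missing names before throwing means a single startup error reports every key that still needs to be set.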

More setup steps will be added here.
