CSR SEO Page Crawler

A solution for building SEO-friendly Client-Side Rendered (CSR) blogs with dynamic content management capabilities, all while maintaining a zero-cost infrastructure.

Problem Statement

Building a blog platform with three key requirements:

  • Dynamic content management through a user-friendly interface
  • Zero infrastructure costs (excluding domain)
  • SEO optimization

Detailed Architecture Implementation

For a comprehensive understanding of the implementation and architecture, read the full blog post here: https://quochung.cyou/toi-uu-seo-website-csr-react-angular-bang-ki-thuat-precrawl/

Solution

This project implements a pre-crawling strategy that combines the best of both worlds:

  • CSR for regular users: Fast, dynamic content loading via API
  • Pre-rendered static HTML for search engine bots: Optimal SEO performance

Demo

React Site without Pre-Crawling


React Site with Pre-Crawling


Project Structure

project/
├── cloudflare-worker/        # Bot detection and routing
│   ├── src/
│   │   └── index.ts         # Main worker logic
│   ├── package.json         # Dependencies
│   └── wrangler.toml        # Worker configuration
├── src/                     # Main application code (scraper.ts, webScraper.ts)
└── README.md

Technical Implementation

1. Bot Detection & Routing

  • Intelligent user-agent detection for major search engines and social media bots (see the sketch below)
  • Separate handling for media files (images, videos, etc.)
  • Automatic routing of bot requests to pre-rendered HTML
  • Human visitors get the full CSR experience
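
The detection itself can be as simple as a user-agent substring check. Below is a minimal sketch of that idea in TypeScript; the authoritative logic lives in cloudflare-worker/src/index.ts, and the bot names and media extensions here are illustrative:

// Illustrative sketch only - see cloudflare-worker/src/index.ts for the real logic.
const BOT_AGENTS = [
  'googlebot', 'bingbot', 'yandexbot', 'duckduckbot', 'baiduspider',
  'facebookexternalhit', 'twitterbot', 'linkedinbot', 'slackbot',
];
const MEDIA_FILE = /\.(png|jpe?g|gif|webp|svg|ico|mp4|webm|css|js)$/i;

export function isBotRequest(request: Request): boolean {
  const ua = (request.headers.get('user-agent') ?? '').toLowerCase();
  const { pathname } = new URL(request.url);
  if (MEDIA_FILE.test(pathname)) return false; // media files always go to the origin
  return BOT_AGENTS.some((bot) => ua.includes(bot));
}

In the worker's fetch handler, requests for which isBotRequest returns true are answered from the HTML cache (see the lookup sketch under "Set up Cloudflare Worker Route" below); everything else is passed through to the origin.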

2. Caching Strategy

  • Pre-rendered HTML stored in Supabase Storage
  • Site identification using MD5 hashing (see the path sketch below)
  • Configurable cache duration
  • Fallback mechanisms for cache misses
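
A possible shape for the storage path, assuming the MD5 hash identifies the site and each page's HTML is stored under that prefix (the repo's actual scheme may differ):

import { createHash } from 'node:crypto';

// Illustrative path scheme: <md5-of-site-url>/<flattened-page-path>.html
export function siteId(siteUrl: string): string {
  return createHash('md5').update(siteUrl).digest('hex');
}

export function cachePath(siteUrl: string, pagePath: string): string {
  const page =
    pagePath === '/' ? 'index' : pagePath.replace(/^\/+|\/+$/g, '').replace(/\//g, '_');
  return `${siteId(siteUrl)}/${page}.html`;
}

// cachePath('https://quochung.cyou', '/') -> '<md5-of-site>/index.html'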

3. Infrastructure Components

  • Proxy Layer: Cloudflare Worker Route + Worker for bot detection and routing
  • Storage: Supabase Storage for HTML caching
  • Frontend: React-based static site hosted on Cloudflare Pages
  • Crawler: Automated crawling via GitHub Actions
  • Backend: Supabase Database + API for dynamic content


Setup & Usage

Prerequisites

  • Node.js and npm
  • Cloudflare account
  • Supabase account

Installation

  1. Configure environment variables (see .env.example and the illustrative sketch below)
  2. Install dependencies: npm install
  3. Run crawler: npm start
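
As a rough illustration only (all variable names and values below are made up; .env.example is authoritative), the .env file holds the Supabase and site settings:

# Illustrative values - check .env.example for the real variable names
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your-supabase-key
SUPABASE_BUCKET=prerendered-html
SITE_URL=https://your-site.example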

Development

Set up Supabase Storage for Pre-rendered HTML

  • Create a new project on Supabase

  • Create a Storage bucket for storing the pre-rendered HTML


  • Public Bucket: allows any client to read objects from the bucket without credentials
  • Bucket Name: the name the worker and crawler will reference
  • You can also set up restrictions (such as allowed file types) on the bucket
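
Once the bucket exists, the crawler can push each rendered page into it with @supabase/supabase-js. A minimal sketch (bucket and variable names are illustrative):

import { createClient } from '@supabase/supabase-js';

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_KEY!);

// Upload (or overwrite) one pre-rendered page in the bucket.
export async function storeHtml(path: string, html: string): Promise<void> {
  const { error } = await supabase.storage
    .from('prerendered-html') // your bucket name
    .upload(path, html, { contentType: 'text/html', upsert: true });
  if (error) throw error;
}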

Set up Cloudflare Worker Route

  • Check the cloudflare-worker/src/index.ts file
  • Edit the config in cloudflare-worker/wrangler.toml
  • Edit cloudflare-worker/src/index.ts to match your setup
  • Follow the Cloudflare docs to deploy the worker: https://developers.cloudflare.com/workers/

The config needs your domain name, your Supabase Storage URL, the storage bucket name, and a Supabase key for accessing the storage (create one in the Supabase dashboard).
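
A sketch of the worker-side lookup, assuming a public bucket and illustrative binding names declared in wrangler.toml (the page-naming scheme must mirror whatever the crawler uses, e.g. the MD5 prefix sketched earlier):

interface Env {
  SUPABASE_URL: string;    // e.g. https://your-project.supabase.co
  SUPABASE_BUCKET: string; // the bucket holding the pre-rendered HTML
}

async function servePrerendered(request: Request, env: Env): Promise<Response> {
  const { pathname } = new URL(request.url);
  const page =
    pathname === '/' ? 'index' : pathname.replace(/^\/+|\/+$/g, '').replace(/\//g, '_');
  // Supabase's standard public-object endpoint.
  const objectUrl = `${env.SUPABASE_URL}/storage/v1/object/public/${env.SUPABASE_BUCKET}/${page}.html`;

  const cached = await fetch(objectUrl);
  // On a cache miss, fall back to the live CSR site so bots never see an error.
  if (!cached.ok) return fetch(request);
  return new Response(cached.body, {
    headers: { 'content-type': 'text/html; charset=utf-8' },
  });
}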


  • Add a worker route for the domain and choose the worker you just deployed
  • You may still need a DNS record for the domain the worker applies to; it must be proxied through Cloudflare (the orange cloud)

Scrape Data Locally

# Edit the .env file first

# Start local development for scraping data
npm run dev

# Deploy the worker
npm run deploy

# Run tests
npm run test

The scraper will crawl the website and store the pre-rendered HTML in Supabase Storage; the core loop is sketched below.
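
The real implementation lives in src/scraper.ts and src/webScraper.ts; roughly, it looks like this sketch (storeHtml is the hypothetical upload helper from the Supabase section above):

import puppeteer from 'puppeteer';
import { storeHtml } from './storage'; // hypothetical module wrapping the upload sketch above

async function prerender(urls: string[]): Promise<void> {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  for (const url of urls) {
    // networkidle0 waits until the CSR app's API calls have settled.
    await page.goto(url, { waitUntil: 'networkidle0' });
    const html = await page.content(); // the fully rendered DOM
    const { pathname } = new URL(url);
    const name =
      pathname === '/' ? 'index' : pathname.replace(/^\/+|\/+$/g, '').replace(/\//g, '_');
    await storeHtml(`${name}.html`, html);
  }
  await browser.close();
}

prerender(['https://your-site.example/']).catch(console.error);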


Deploy the scraper using GitHub Actions

# Check the .github/workflows/scraper.yml file

Deploy the Cloudflare Worker for bot detection and routing

# Check the cloudflare-worker/src/index.ts file

# Edit the config files cloudflare-worker/wrangler.toml and cloudflare-worker/src/index.ts

# Deploy the worker, following https://developers.cloudflare.com/workers/
npm run deploy

Limitations & Considerations

  • GitHub Actions free tier limit (2,000 minutes/month)
  • Supabase free tier storage limit (1 GB)
  • Cloudflare Workers free tier limit (100,000 requests/day)
  • Crawl strategy must be tuned to the content update frequency
  • Cache invalidation needs careful handling
  • Bot detection requires regular updates for new user agents

Future Improvements

  1. Performance Optimization

    • Response compression
    • Smart cache invalidation based on each page's last-modified date
  2. Monitoring & Logging

    • Performance metrics collection
    • Enhanced error tracking
    • Request logging
  3. Features

    • Custom cache rules per route
    • Advanced bot detection
    • Automated cache warming

Tech Stack

  • Puppeteer for web crawling
  • Supabase for storage and database
  • Cloudflare for hosting and routing
  • GitHub Actions for automation
  • TypeScript/Node.js

Contributing

Feel free to submit issues and enhancement requests.

License

MIT
