
A CHEERIO Node.js scraper designed to extract and save content from Wikipedia pages. It uses axios for making HTTP requests, cheerio for parsing HTML, and fs-extra for file operations.


Devansh-Bhagania/NODEJS_WEB_SCRAPER_CHEERIO


Wikipedia Scraper

This project is a simple Node.js scraper designed to extract and save content from Wikipedia pages. It uses axios for making HTTP requests, cheerio for parsing HTML, and fs-extra for file operations.


Overview

This scraper fetches content from Wikipedia pages and saves the data in JSON format. It demonstrates how to perform web scraping, handle asynchronous operations, and manage file I/O in Node.js.

How It Works

  1. Fetch Data: The scraper uses axios to send HTTP GET requests to Wikipedia pages.
  2. Parse HTML: The HTML content of the pages is parsed using cheerio, which allows easy extraction of relevant information.
  3. Extract Information: The title of the page and its main content are extracted from the HTML.
  4. Save Data: The extracted data is saved as JSON files in a specified directory using fs-extra.
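The four steps above can be sketched as a small pipeline. This is a hypothetical outline, not the repository's code: fetchHtml, extract, save, scrapePage and scrapeAll are assumed names, and the step functions are passed in as parameters so the sketch runs without network access or npm packages. In the project, those roles are played by axios, cheerio and fs-extra respectively. Promise.allSettled (built into Node.js) lets every page finish even when one request fails.

```javascript
// Hypothetical sketch of the fetch -> parse/extract -> save pipeline.
// fetchHtml, extract and save are assumed names for injected step functions;
// in the real project these roles belong to axios, cheerio and fs-extra.
async function scrapePage(page, { fetchHtml, extract, save }) {
  const html = await fetchHtml(page); // 1. fetch the page HTML
  const record = extract(html);       // 2-3. parse the HTML, pull out title and content
  await save(page, record);           // 4. persist the record as JSON
  return record;
}

// Scrape all pages concurrently; a failed page does not abort the others.
async function scrapeAll(pages, steps) {
  const results = await Promise.allSettled(
    pages.map((page) => scrapePage(page, steps))
  );
  const saved = [];
  const failed = [];
  results.forEach((result, i) => {
    if (result.status === "fulfilled") {
      saved.push(pages[i]);
    } else {
      const reason = result.reason;
      failed.push({
        page: pages[i],
        error: reason instanceof Error ? reason.message : String(reason),
      });
    }
  });
  return { saved, failed };
}
```

With axios, fetchHtml could wrap axios.get(url) and return the response's data property; with cheerio, extract would call cheerio.load(html) and query the title and body elements.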

Setup and Installation

  1. Clone the Repository

    git clone https://github.com/Devansh-Bhagania/NODEJS_WEB_SCRAPER_CHEERIO.git
    cd NODEJS_WEB_SCRAPER_CHEERIO
  2. Install Dependencies

    npm install
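  3. Update Scraper Configuration

    Edit the PAGES array to list the Wikipedia pages to scrape, writing each title as it appears in the page URL (underscores instead of spaces):

    const PAGES = ['Node.js', 'JavaScript', 'Web_scraping'];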
  4. Run the Scraper

    node scraper.js
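A minimal sketch of turning the PAGES titles into request URLs, assuming the scraper targets English Wikipedia at https://en.wikipedia.org/wiki/. WIKI_BASE and pageUrl are hypothetical names for illustration, not taken from the repository.

```javascript
// Assumption: English Wikipedia; change the host for other language editions.
const WIKI_BASE = "https://en.wikipedia.org/wiki/";

const PAGES = ["Node.js", "JavaScript", "Web_scraping"];

// Hypothetical helper: build the article URL for a page title.
// Article URLs use underscores where the title has spaces, and
// reserved characters (such as "+") are percent-encoded.
function pageUrl(title) {
  if (typeof title !== "string" || title.trim() === "") {
    throw new TypeError(`Invalid page title: ${JSON.stringify(title)}`);
  }
  return WIKI_BASE + encodeURIComponent(title.trim().replace(/ /g, "_"));
}

const urls = PAGES.map((title) => pageUrl(title));
```

Percent-encoding keeps titles that contain reserved characters, such as C++, from producing malformed URLs, and rejecting empty titles surfaces configuration mistakes before any request is sent.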

After running the scraper, check the data directory: it contains one JSON file per scraped page, holding that page's title and content.
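A sketch of the save step and of reading the data directory back, using Node's built-in fs/promises so it runs without installing fs-extra. The { title, content } field names and the outputPath, saveRecord and loadRecords helpers are assumptions for illustration, not the repository's code.

```javascript
const fs = require("node:fs/promises");
const path = require("node:path");

// Hypothetical helper: one JSON file per page, named after the page title.
// Characters that are invalid in file names on common file systems are replaced.
function outputPath(dataDir, page) {
  const safeName = page.replace(/[\\/:*?"<>|]/g, "_");
  return path.join(dataDir, `${safeName}.json`);
}

// Write one { title, content } record, creating the directory if it is missing.
async function saveRecord(dataDir, page, record) {
  await fs.mkdir(dataDir, { recursive: true });
  const file = outputPath(dataDir, page);
  await fs.writeFile(file, JSON.stringify(record, null, 2) + "\n", "utf8");
  return file;
}

// Read back every JSON file in the data directory, keyed by file name.
async function loadRecords(dataDir) {
  const names = (await fs.readdir(dataDir))
    .filter((name) => name.endsWith(".json"))
    .sort();
  const records = {};
  for (const name of names) {
    records[name] = JSON.parse(await fs.readFile(path.join(dataDir, name), "utf8"));
  }
  return records;
}
```

With fs-extra, the directory creation and write would be fs.ensureDir(dataDir) followed by fs.writeJson(file, record, { spaces: 2 }).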
