Goodreads quotes scraper

Description

This is a small scraper built to extract quotes from the Goodreads website. It is written in Node.js with the Puppeteer library and, by default, scrapes 100 pages in one run, collecting approximately 8,000 records. Each record includes the quote text, the writer's name, and the writer's image. Follow the steps below to set up the project on your local machine; you can then modify and use it as needed.

Features

Scrapes quotes and image sources from Goodreads quotes pages.
Navigates through multiple pages of quotes.
Saves the collected data into a CSV file.

Prerequisites

Node.js (>=14.0.0)
npm (Node Package Manager)

Installation

Clone this repository:
```
git clone https://github.com/NomanSiddiqui0000/goodreads_quotes-scraper.git
```

Navigate to the project directory:
```
cd goodreads_quotes-scraper
```
Install the required npm packages:
```
npm install
```

Usage

Open the Scraper_script.js file and adjust the maxPages variable if you want to scrape fewer pages. By default, it is set to scrape 100 pages.
Run the script:
```
node Scraper_script.js
```
The script will navigate through the Goodreads quotes pages, scrape quotes and image sources, and save the data to quotes_and_images.csv.
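
The flow described above can be sketched roughly as follows. This is a minimal illustration and not the contents of Scraper_script.js: the listing URL pattern, the CSS selectors, and the helper names `buildPageUrl` and `scrapeQuotes` are assumptions made for the example. Only `maxPages` and the `headless` launch option are taken from this README.

```javascript
// Sketch only: the URL pattern, selectors, and function names are assumptions,
// not copied from Scraper_script.js.
const maxPages = 100; // default mentioned in this README

// Build the URL of one listing page (assumed pagination scheme).
function buildPageUrl(pageNumber) {
  return `https://www.goodreads.com/quotes?page=${pageNumber}`;
}

async function scrapeQuotes(pageCount = maxPages) {
  // Required lazily so buildPageUrl can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ headless: false });
  const records = [];
  try {
    const page = await browser.newPage();
    for (let n = 1; n <= pageCount; n++) {
      await page.goto(buildPageUrl(n), { waitUntil: 'domcontentloaded' });
      // Assumed selectors; adjust them to match the live page markup.
      const pageRecords = await page.$$eval('.quote', (nodes) =>
        nodes.map((node) => ({
          quote: node.querySelector('.quoteText')?.textContent.trim() ?? '',
          author: node.querySelector('.authorOrTitle')?.textContent.trim() ?? '',
          image: node.querySelector('img')?.getAttribute('src') ?? '',
        }))
      );
      records.push(...pageRecords);
    }
  } finally {
    // Close the browser even if a page fails to load.
    await browser.close();
  }
  return records;
}
```

Calling `scrapeQuotes(2)` would visit only the first two pages, which is a cheap way to check the selectors before starting a full run.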

Configuration

Puppeteer Launch Options: The script launches Puppeteer in non-headless mode for debugging purposes. Change { headless: false } to { headless: true } if you want to run it in headless mode.
CSV File Path: The CSV file will be saved as quotes_and_images.csv in the root directory of the project.
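
Writing the output file could look something like the sketch below. It is an illustration under assumptions: the column names (`quote`, `author`, `image`) and the helpers `toCsvField`, `toCsv`, and `saveCsv` are hypothetical and may not match what Scraper_script.js does. Only the file name quotes_and_images.csv comes from this README. Quotes often contain commas and double quotes, so each field is escaped before it is written.

```javascript
const fs = require('fs');
const path = require('path');

// Wrap a value in double quotes and double any embedded quotes (RFC 4180 style).
function toCsvField(value) {
  return `"${String(value).replace(/"/g, '""')}"`;
}

// Turn an array of { quote, author, image } objects into CSV text with a header row.
// The column names are assumptions made for this example.
function toCsv(records) {
  const header = ['quote', 'author', 'image'];
  const rows = records.map((record) =>
    header.map((key) => toCsvField(record[key] ?? '')).join(',')
  );
  return [header.join(','), ...rows].join('\n') + '\n';
}

// Write the file relative to the current working directory,
// assuming the script is started from the project root.
function saveCsv(records, fileName = 'quotes_and_images.csv') {
  const filePath = path.resolve(fileName);
  fs.writeFileSync(filePath, toCsv(records), 'utf8');
  return filePath;
}
```

Escaping every field, rather than only those containing special characters, keeps the helper short at the cost of a slightly larger file.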

Notes

Ensure you comply with Goodreads' terms of service when scraping their website.
This script assumes the structure of the Goodreads quotes pages remains consistent. If Goodreads updates their site, you may need to adjust the selectors used in the script.
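
One way to notice such a layout change early is to keep the selectors in one place and check that a scraped page still yields non-empty fields. The sketch below is only a suggestion: the `SELECTORS` values and the helper `findEmptyFields` are hypothetical and are not part of Scraper_script.js.

```javascript
// Hypothetical selector map; the real values live in Scraper_script.js and may differ.
const SELECTORS = {
  quote: '.quoteText',
  author: '.authorOrTitle',
  image: 'img',
};

// Return the field names that are empty in every record of a page.
// A field that is always empty suggests its selector no longer matches.
// An empty records array reports every field, since nothing was matched at all.
function findEmptyFields(records) {
  return Object.keys(SELECTORS).filter((field) =>
    records.every((record) => !record[field] || record[field].trim() === '')
  );
}
```

If `findEmptyFields` returns a non-empty list for the first page, it is probably worth stopping the run and reviewing the affected selectors before scraping further.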

License

This project is licensed under the MIT License.

Author

Muhammad Noman
