Web Scraper for Fashion Marketplace Sites

A Python tool for scraping multiple shopping websites such as Grailed, Depop, GOAT, and StockX, with possibly more to come.

Introduction

This project aims to provide a convenient interface for scraping product listings and related data from various online shopping platforms.

It grew out of my AP Computer Science Principles project, which was just a Grailed scraper. I wanted to expand it to more sites, so I created this. The original is here.

Project Plan

To-Do List / Possible Features:

Installation

Install using Poetry (recommended):

# clone repository
git clone https://github.com/peppapig450/FashionCrawler

# switch to directory
cd FashionCrawler

# install dependencies
poetry install

Install using a virtual environment:

# clone repository
git clone https://github.com/peppapig450/FashionCrawler

# switch to directory
cd FashionCrawler

# setup and activate virtual environment
python3 -m venv venv && source venv/bin/activate

# install dependencies
pip install -r requirements.txt

Usage

Below are the available options for running the scraper.

Options:

Site Selection:

By default, all supported sites are enabled, or the scraper uses the sites specified in the config.yaml file.
--enable-site ENABLE_SITE: Enable specific site(s) by providing a comma-separated list of supported site names.
--disable-site DISABLE_SITE: Disable specific site(s) by providing a comma-separated list of supported site names.
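
To illustrate how a comma-separated site list could be turned into a final set of sites, here is a minimal sketch using argparse. It is not the project's actual implementation: the `SUPPORTED_SITES` list and the `split_sites` and `resolve_sites` helpers are hypothetical names, the flag spelling follows the example command in this README, and the config.yaml fallback is left out.

```python
import argparse
from typing import Optional

# Hypothetical list of supported sites, based on the names this README mentions.
SUPPORTED_SITES = ["grailed", "depop", "goat", "stockx"]


def split_sites(value: str) -> list[str]:
    """Split a comma-separated string into lowercase site names."""
    return [name.strip().lower() for name in value.split(",") if name.strip()]


def resolve_sites(
    enabled: Optional[list[str]], disabled: Optional[list[str]]
) -> list[str]:
    """Start from the enabled list (or all sites) and drop the disabled ones."""
    selected = enabled if enabled else list(SUPPORTED_SITES)
    excluded = disabled or []
    unknown = [site for site in selected + excluded if site not in SUPPORTED_SITES]
    if unknown:
        raise ValueError(f"Unsupported site(s): {', '.join(unknown)}")
    return [site for site in selected if site not in excluded]


# Assumed flag names; argparse converts each value with split_sites.
parser = argparse.ArgumentParser(description="Site selection sketch")
parser.add_argument("--enable-site", type=split_sites, default=None)
parser.add_argument("--disable-site", type=split_sites, default=None)

args = parser.parse_args(["--enable-site", "Grailed,Depop"])
print(resolve_sites(args.enable_site, args.disable_site))  # ['grailed', 'depop']
```

Lowercasing the names is a design choice in this sketch so that "Grailed" and "grailed" are treated the same; the real tool may handle case differently.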

Search Options:

-s SEARCH, --search SEARCH: Specify a search query to scrape for.

Output Options:

If no output option is specified, the scraper prints the result as a table on the command line.
-j, --json: Output the result as JSON.
-c, --csv: Output the result as CSV.
-y, --yaml: Output the result as YAML.
-o OUTPUT, --output OUTPUT: Specify the output file name (without extension).
--output-dir OUTPUT_DIR: Specify the output directory.
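
As a rough sketch of how the output name, directory, and format could combine into a file on disk, the example below builds the path and writes JSON or CSV with the standard library. The function names `build_output_path` and `write_results` and the `title`/`price` fields are assumptions made for illustration, not the project's code. YAML is omitted because it needs a third-party library.

```python
import csv
import json
import tempfile
from pathlib import Path


def build_output_path(output_dir: str, name: str, fmt: str) -> Path:
    """Join the directory, the extension-less file name, and the format."""
    return Path(output_dir) / f"{name}.{fmt}"


def write_results(rows: list[dict], path: Path) -> None:
    """Write scraped rows as JSON or CSV, chosen by the file extension."""
    path.parent.mkdir(parents=True, exist_ok=True)
    if path.suffix == ".json":
        path.write_text(json.dumps(rows, indent=2), encoding="utf-8")
    elif path.suffix == ".csv":
        fieldnames = list(rows[0].keys()) if rows else []
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
    else:
        raise ValueError(f"Unsupported format: {path.suffix}")


# Hypothetical listing data; real field names depend on the scraper.
sample_rows = [{"title": "Nike Air Force 1", "price": 90}]

# Write into a temporary directory so the demo leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    target = build_output_path(tmp_dir, "output", "json")
    write_results(sample_rows, target)
    print(target.name)  # output.json
```

Keeping the extension out of the `-o` value, as the option description says, lets the format flag alone decide the suffix.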

Example Usage:

To enable only Grailed and Depop sites, search for "Nike Air Force", and output the result as JSON to a file named "output.json" in the "data" directory, the command would be:

poetry run python main.py --enable-site Grailed,Depop --search "Nike Air Force" -j -o output --output-dir data

License

Apache License 2.0
