
SpiderBolt

SpiderBolt is a fast and efficient Python web scraping script that extracts links from websites using multi-threading and random user agents. It categorizes links into HTML and other types, groups them by paths, and saves them in an organized file. Customizable settings ensure flexibility for various scraping needs.

Features

  • 🌟 Multi-threading: Handles up to 500 threads for fast and efficient web scraping.
  • 🌐 Custom User Agents: Mimics real browsers using random user-agent headers to avoid detection.
  • 📊 Link Categorization: Automatically categorizes links into HTML and other types, grouping them by paths for easy analysis.
  • 🛠️ Customizable Settings: Adjust the number of threads and tweak other settings to suit your scraping needs.
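
The first two features can be sketched in a few lines: a thread pool caps concurrency while each request carries a randomly chosen user-agent header. This is an illustration of the technique, not SpiderBolt's actual code — the function names and the two sample user-agent strings are assumptions.

```python
# Sketch: concurrent fetching with random user-agent headers.
import random
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

# Two ordinary desktop browser strings, purely as examples; SpiderBolt
# reads its list from user-agents.txt instead.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]

def fetch(url):
    """Request a URL with a random user-agent; return (url, HTTP status)."""
    req = Request(url, headers={"User-Agent": random.choice(USER_AGENTS)})
    with urlopen(req, timeout=10) as resp:
        return url, resp.status

def crawl(urls, threads=50):
    # The thread pool never runs more than `threads` requests at once.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(fetch, urls))
```

Raising `threads` speeds up crawling until the target site or your connection becomes the bottleneck; very high values mostly add scheduling overhead.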

Installation

  1. Clone the repository:
    git clone https://github.com/ogtirth/SpiderBolt.git
    cd SpiderBolt
  2. Install the required dependencies:
    pip install -r requirements.txt
  3. Make sure to add a `user-agents.txt` file with a list of user agents (one per line) in the project directory.
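
For step 3, the file is just one user-agent string per line. A minimal example (these are ordinary desktop browser strings; substitute whichever agents you prefer):

```shell
cat > user-agents.txt <<'EOF'
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Safari/605.1.15
Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0
EOF
```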

Usage

Run the script:

python spiderbolt.py

Follow the on-screen prompts to:

  1. Enter the domain to scrape links from.
  2. Specify the number of threads you want.

The script will handle the rest, providing you with real-time status updates for each request.
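
The categorization step described above can be pictured like this: links whose path ends in `.html`/`.htm` (or has no extension) count as HTML pages, everything else as "other", and both kinds are grouped by their URL path. The helper below is a hypothetical reimplementation of that idea, not SpiderBolt's source.

```python
# Sketch: split links into "html" vs "other" and group them by path.
from collections import defaultdict
from urllib.parse import urlparse
import posixpath

def categorize(links):
    groups = {"html": defaultdict(list), "other": defaultdict(list)}
    for link in links:
        path = urlparse(link).path or "/"
        ext = posixpath.splitext(path)[1].lower()
        # No extension is assumed to be an HTML page (e.g. /blog/archive).
        kind = "html" if ext in ("", ".html", ".htm") else "other"
        groups[kind][posixpath.dirname(path) or "/"].append(link)
    return groups

links = [
    "https://example.com/blog/post.html",
    "https://example.com/blog/archive",
    "https://example.com/assets/logo.png",
]
result = categorize(links)
# result["html"]["/blog"] holds both blog links;
# result["other"]["/assets"] holds the image link.
```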
