This project is a simple web crawler designed to scrape web pages and extract information. The main focus of the crawler is to demonstrate basic web scraping capabilities using Python.
- Class/Crawler.py: Contains the Crawler class responsible for fetching and parsing web pages.
- main.py: The main entry point of the project, which creates an instance of the Crawler class and crawls a specified URL.
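As a rough illustration of the fetch-and-parse split described above, here is a minimal sketch of what such a class could look like with requests and beautifulsoup4. The method names (fetch, parse_title, crawl) and the timeout parameter are assumptions made for this example; the actual code in Class/Crawler.py may be organized differently.

```python
import requests
from bs4 import BeautifulSoup


class Crawler:
    """Hypothetical sketch; the real class in Class/Crawler.py may differ."""

    def __init__(self, url, timeout=10):
        # timeout (in seconds) is an assumed parameter, not taken from the project
        self.url = url
        self.timeout = timeout

    def fetch(self):
        """Download the page and return its HTML as text."""
        response = requests.get(self.url, timeout=self.timeout)
        response.raise_for_status()  # raise an exception on 4xx/5xx responses
        return response.text

    @staticmethod
    def parse_title(html):
        """Return the text of the <title> tag, or None if the page has none."""
        soup = BeautifulSoup(html, "html.parser")
        if soup.title is None or soup.title.string is None:
            return None
        return soup.title.string.strip()

    def crawl(self):
        """Fetch the page at self.url and return its title."""
        return self.parse_title(self.fetch())
```

Keeping parsing separate from fetching means the title extraction can be checked against a plain HTML string without any network access. With this sketch, Crawler("https://example.com").crawl() would return the page title, where example.com is only a placeholder URL.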
- Python 3.x
- requests
- beautifulsoup4
- Clone the repository:
git clone https://github.com/yourusername/web-crawler.git
- Navigate to the project directory:
cd web-crawler
- Install the required dependencies:
pip install -r requirements.txt
You can test the crawler by running the Crawler.py file directly:
python Class/Crawler.py
This will crawl the Dota 2 Wiki and print the title of the page.
To use the crawler with a different URL, change the URL in main.py, then run:
python main.py
This creates an instance of the Crawler class and crawls the URL specified in main.py.
Here's a quick example of what you might see when running the test:
Crawling successful. Here is the title of the page:
Dota 2 Wiki
Feel free to submit issues or pull requests if you have any suggestions or improvements.
This project is licensed under the MIT License. See the LICENSE file for details.