Web Crawler Project

This project is a simple web crawler that fetches web pages and extracts information from them. Its main purpose is to demonstrate basic web scraping in Python with the requests and BeautifulSoup libraries.

Project Structure

  • Class/Crawler.py: Contains the Crawler class responsible for fetching and parsing web pages (see the sketch after this list).
  • main.py: The main entry point of the project; it creates an instance of the Crawler class and crawls a specified URL.
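
The class source is not reproduced in this README, but a minimal sketch of what Class/Crawler.py might contain, assuming the requests and beautifulsoup4 stack listed under Requirements, is shown here; the method names fetch and get_title are illustrative assumptions rather than the repository's actual API.

import requests
from bs4 import BeautifulSoup


class Crawler:
    """Fetch a web page and expose simple helpers for parsing it (illustrative sketch)."""

    def __init__(self, url):
        self.url = url
        self.soup = None

    def fetch(self):
        # Download the page and parse it with BeautifulSoup.
        response = requests.get(self.url, timeout=10)
        response.raise_for_status()
        self.soup = BeautifulSoup(response.text, "html.parser")
        return self.soup

    def get_title(self):
        # Return the <title> text of the fetched page, if any.
        if self.soup is None:
            self.fetch()
        return self.soup.title.string if self.soup.title else None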

Requirements

  • Python 3.x
  • requests
  • beautifulsoup4

Installation

  1. Clone the repository:
    git clone https://github.com/harivonyR/internal_site_link_crawling.git
  2. Navigate to the project directory:
    cd internal_site_link_crawling
  3. Install the required dependencies:
    pip install -r requirements.txt
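
The contents of requirements.txt are not shown here; assuming it simply mirrors the dependency list above, it would contain:

requests
beautifulsoup4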

Usage

Running the Test

You can test the crawler by running the Crawler.py file directly:

python Class/Crawler.py

This will crawl the Dota 2 Wiki and print the title of the page.
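
For Crawler.py to be runnable on its own like this, it presumably ends with a small test block. A hedged sketch follows; the exact Dota 2 Wiki URL and the printed message are assumptions based on the example output further below.

if __name__ == "__main__":
    # Assumed test hook; the real URL and wording may differ.
    crawler = Crawler("https://dota2.fandom.com/wiki/Dota_2")
    print("Crawling successful. Here is the title of the page:")
    print(crawler.get_title())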

Running the Main Script

To use the crawler with a different URL, edit the target URL in main.py and then run:

python main.py

This creates an instance of the Crawler class and crawls the URL specified in main.py.
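
A minimal main.py along these lines would do the job; the import path and placeholder URL below are assumptions about the project layout, not the repository's exact code.

# main.py -- illustrative sketch; replace the URL with the page you want to crawl.
from Class.Crawler import Crawler

if __name__ == "__main__":
    url = "https://example.com"
    crawler = Crawler(url)
    print("Crawling successful. Here is the title of the page:")
    print(crawler.get_title())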

Example

Here's a quick example of what you might see when running the test:

Crawling successful. Here is the title of the page:
Dota 2 Wiki

Contributing

Feel free to submit issues or pull requests if you have any suggestions or improvements.

License

This project is licensed under the MIT License. See the LICENSE file for details.

