This project scrapes data about university professors from Google Scholar, including their citation counts, h-index, and other profile information, and stores it in a CSV file for further analysis. It is implemented in Python using BeautifulSoup and the requests library, and runs in an IPython (Jupyter) Notebook.
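As a rough illustration of the parsing step, here is a minimal sketch using BeautifulSoup. It assumes a Google Scholar profile page whose markup uses the id `gsc_prf_in` for the professor's name and a table with id `gsc_rsb_st` whose `td.gsc_rsb_std` cells hold the Citations, h-index and i10-index values ("All" column first, then "Since YYYY"). These selectors are assumptions about Scholar's current HTML, which can change, and may not match what the notebook uses; `parse_profile` is a hypothetical helper that takes an HTML string so it can be tried without network access.

```python
from bs4 import BeautifulSoup


def parse_profile(html: str) -> dict:
    """Extract name, citations, h-index and i10-index from a profile page.

    Assumption: the stats table (id="gsc_rsb_st") has rows for Citations,
    h-index and i10-index, each with an "All" cell followed by a
    "Since YYYY" cell, both with class "gsc_rsb_std".
    """
    soup = BeautifulSoup(html, "html.parser")
    name_tag = soup.find(id="gsc_prf_in")
    cells = soup.select("#gsc_rsb_st td.gsc_rsb_std")
    # Keep only the "All" column: every other cell, starting at index 0.
    all_time = [int(c.get_text(strip=True) or 0) for c in cells[::2]]
    # Pad with None so a page without the stats table still yields three values.
    citations, h_index, i10_index = (all_time + [None, None, None])[:3]
    return {
        "name": name_tag.get_text(strip=True) if name_tag else None,
        "citations": citations,
        "h_index": h_index,
        "i10_index": i10_index,
    }
```

In practice the HTML would come from something like `requests.get(url, timeout=10).text`, where `url` is a profile address of the form `https://scholar.google.com/citations?user=<id>`.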
Before starting, make sure the following software is installed on your machine:
- Python 3.x
- Jupyter Notebook
- pip
- virtualenv (optional)
1. Clone the repository to your local machine: `git clone <repository_url>`
2. Change into the project directory: `cd <repository_name>`
3. Create a virtual environment (optional): `virtualenv env`
4. Activate the virtual environment (if you created one in step 3): `source env/bin/activate`
5. Install the required packages: `pip install -r requirements.txt`
6. Start Jupyter Notebook: `jupyter notebook`
7. Open the IPython Notebook file `google_scholar_scraping.ipynb`.
8. Follow the instructions in the notebook to scrape the data from Google Scholar.
9. The scraped data will be stored in a CSV file named `professors_data.csv`.
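Saving the results can be as simple as writing one row per professor with the standard library's `csv.DictWriter`. The sketch below is only an illustration: the column list in `FIELDNAMES` and the helper `save_professors` are hypothetical, and the notebook's actual columns may differ. Only the default file name `professors_data.csv` comes from this README.

```python
import csv
from pathlib import Path

# Hypothetical column set; the notebook's actual columns may differ.
FIELDNAMES = ["name", "affiliation", "citations", "h_index", "i10_index"]


def save_professors(rows, path="professors_data.csv"):
    """Write one CSV row per professor dict and return the output path.

    Keys missing from a row become empty cells (restval=""), and keys not in
    FIELDNAMES are silently dropped (extrasaction="ignore").
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f, fieldnames=FIELDNAMES, restval="", extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(rows)
    return Path(path)
```

Opening the file with `newline=""` is what the `csv` module documentation recommends, so rows are not separated by blank lines on Windows.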
Please note that Google Scholar may restrict automated scraping of its data. Use this project at your own risk and respect Google Scholar's terms of service.
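One way to act on this note is to check the site's `robots.txt` before fetching and to space out requests. The sketch below uses the standard library's `urllib.robotparser`; the helper names `make_robot_checker` and `Throttle` and the 5-second default interval are hypothetical choices, not values taken from Google Scholar's terms or from this project's notebook.

```python
import time
from urllib.robotparser import RobotFileParser


def make_robot_checker(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text (e.g. fetched beforehand)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class Throttle:
    """Enforce a minimum delay between successive requests."""

    def __init__(self, min_interval: float = 5.0):
        self.min_interval = min_interval
        self._last = None  # monotonic timestamp of the previous request

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        slept = 0.0
        if self._last is not None:
            elapsed = time.monotonic() - self._last
            slept = max(0.0, self.min_interval - elapsed)
            if slept:
                time.sleep(slept)
        self._last = time.monotonic()
        return slept
```

To check against the live file instead of a string, `RobotFileParser` also offers `set_url("https://scholar.google.com/robots.txt")` followed by `read()`; then call `can_fetch(user_agent, url)` and `throttle.wait()` before each request.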