AAER Scraper is a Python application that allows you to scrape table data from a webpage and store it in a Pandas DataFrame. It utilizes the requests library for making HTTP requests, BeautifulSoup for HTML parsing, and Pandas for data manipulation.
- AAER_Data.csv
- AAER_Data_2014_Onwards.csv
- AAER_Scraper.ipynb
- CODE_OF_CONDUCT
- LICENSE
- README.md
- config.ini
- http_cache.sqlite
- main.py
- requirements.txt
- scraper/
  - __init__.py
  - aaer_scraper.py
  - config_loader.py
Here's a brief description of each file and directory:
- AAER_Data.csv and AAER_Data_2014_Onwards.csv: Sample CSV files containing scraped data.
- AAER_Scraper.ipynb: Jupyter Notebook containing an example usage of the AAER Scraper.
- CODE_OF_CONDUCT: Community code of conduct guidelines.
- LICENSE: License information for the AAER Scraper.
- README.md: Readme file providing an overview of the AAER Scraper and instructions for installation and usage.
- config.ini: Configuration file containing necessary values for the AAER Scraper.
- http_cache.sqlite: SQLite database file used for caching HTTP requests.
- main.py: Python script for running the AAER Scraper.
- requirements.txt: File specifying the required dependencies for the AAER Scraper.
- scraper/: Directory containing the source code for the AAER Scraper. It includes the following files:
  - __init__.py: An empty file indicating that the directory is a Python package.
  - aaer_scraper.py: Python module containing the main implementation of the AAER Scraper.
  - config_loader.py: Python module for loading the configuration values from config.ini.
Please note that this is just a sample folder structure and may vary based on your specific implementation or usage requirements.
- Clone the repository:
  git clone https://github.com/pChitral/Accounting-and-Auditing-Enforcement-Releases-Web-Scraper.git
- Install the required dependencies using pip:
  pip install -r requirements.txt
- Update the config.ini file. It contains the necessary configuration values for the AAER Scraper. Update the following values as per your requirements:
  - base_url: The base URL of the webpage to scrape.
  - Any other configuration values specific to your use case.
- Run the main.py script:
  python main.py
The script will execute the scraping process and store the scraped data in a Pandas DataFrame.
Note: Ensure that you have the necessary permissions to access the webpage and that it contains a table with the required structure.
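For readers who want to see the general shape of the scraping step, here is a minimal sketch of fetching a page with requests, parsing its first table with BeautifulSoup, and loading it into a Pandas DataFrame. It is not the code in scraper/aaer_scraper.py: the function names table_to_dataframe and scrape_table are hypothetical, and the sketch assumes the first row of the table holds the column headers and that every row has the same number of cells.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup


def table_to_dataframe(html: str) -> pd.DataFrame:
    """Parse the first <table> in an HTML string into a DataFrame.

    Hypothetical helper: assumes the first non-empty row is the header.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        raise ValueError("no <table> element found in the page")

    rows = []
    for tr in table.find_all("tr"):
        # Collect header and data cells alike, stripped of whitespace.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)

    if not rows:
        return pd.DataFrame()

    header, *body = rows
    return pd.DataFrame(body, columns=header)


def scrape_table(url: str, timeout: float = 30.0) -> pd.DataFrame:
    """Download a page and return its first table as a DataFrame."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return table_to_dataframe(response.text)
```

A DataFrame returned this way could then be written out with df.to_csv("AAER_Data.csv", index=False), which would produce a file like the sample CSVs listed above.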
The config.ini file contains the configuration values for the AAER Scraper. Update these values according to your needs. Below is an explanation of the available options:
- base_url: The base URL of the webpage to scrape. Make sure to include the necessary query parameters if required.
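As an illustration of how a value such as base_url might be read, the sketch below uses Python's standard configparser module. It is not the code in scraper/config_loader.py: the section name [scraper] and the function name load_base_url are assumptions made for this example, so check your own config.ini for the actual section name.

```python
import configparser

# Assumed layout of config.ini for this sketch (the section name is hypothetical):
#
#   [scraper]
#   base_url = <URL of the page to scrape>


def load_base_url(path: str = "config.ini", section: str = "scraper") -> str:
    """Return the base_url option from the given section of an INI file."""
    # interpolation=None keeps literal '%' characters, which are common in
    # URL query strings, from being treated as interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(path):
        raise FileNotFoundError(f"could not read configuration file: {path}")
    # Raises NoSectionError or NoOptionError if the section or key is missing.
    return parser.get(section, "base_url")
```

Disabling interpolation is a deliberate choice here: with the default settings, a percent-encoded query parameter in base_url would raise an interpolation error when the value is read.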
Contributions to AAER Scraper are welcome! If you find any issues or have suggestions for improvements, please feel free to open an issue or submit a pull request. When contributing, please follow the existing coding style and guidelines.
AAER Scraper is released under the MIT License.