sec_scraping

Overview

This project is a collection of Python scripts that scrape PDF files hosted on the U.S. Securities and Exchange Commission (SEC) website. The scripts identify companies that either use specific software solutions or deal in cryptocurrencies. The extracted data can show how widely certain technologies have been adopted and how common cryptocurrency-related activity is among publicly traded companies.

Features

PDF Scraping: Utilizes pypdf to extract text and data from PDF documents.
Keyword Search: Searches for specific keywords related to software usage or cryptocurrency activities within the extracted text.
Data Output: Provides structured data output, such as CSV files or database entries, for further analysis.
Customizable: Can be adapted to different search criteria or PDF formats.
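
A minimal sketch of how the extraction, keyword-search, and CSV-output features listed above might fit together. The function names, the CSV column names, and the whole-word, case-insensitive matching rule are illustrative assumptions and are not taken from main.py or cryptor.py.

```python
import csv
import re


def extract_pdf_text(path):
    """Return the concatenated text of every page in a PDF (requires pypdf)."""
    # Imported lazily so the helpers below can be used without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() can return an empty result for image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def find_keywords(text, keywords):
    """Return the keywords found in text as whole words, ignoring case."""
    hits = []
    for keyword in keywords:
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(keyword)
    return hits


def write_hits_csv(rows, handle):
    """Write (company, keyword) pairs to an open text handle as CSV."""
    writer = csv.writer(handle)
    writer.writerow(["company", "keyword"])  # hypothetical column names
    writer.writerows(rows)
```

For example, find_keywords(extract_pdf_text("filing.pdf"), ["bitcoin", "ethereum"]) would return whichever of the two terms appear in a hypothetical filing.pdf. Whole-word matching avoids counting substrings, but it will miss keywords that begin or end with punctuation.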

Scripts

main.py: Scrapes PDF documents to identify companies that mention specific software solutions.
cryptor.py: Extracts information about companies involved in cryptocurrency-related activities from PDF files.

Usage

Clone the repository to your local machine:

git clone https://github.com/otisscott/sec_scraping.git

Install the required dependencies:

pip install -r requirements.txt

Run the desired script, providing necessary arguments such as keywords or file paths:

python main.py

Requirements

Python 3.x
Dependencies listed in requirements.txt
Internet access, to download PDF files from the SEC website
A locally saved copy of the XML file listing all registered investment advisers, available at https://adviserinfo.sec.gov/compilation
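
The layout of the adviser XML file is not described here, so a reasonable first step is to inspect it before relying on any particular element. The sketch below streams the file with the Python standard library and counts how often each element tag occurs. The function name and the path in the usage note are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(source):
    """Count element tags in an XML file (path or binary file object)."""
    counts = Counter()
    # Stream the document so a large file is not held in memory all at once.
    for _event, elem in ET.iterparse(source, events=("end",)):
        counts[elem.tag] += 1
        elem.clear()  # drop the element's children and text once counted
    return counts
```

Calling count_tags("path/to/advisers.xml").most_common(10) lists the ten most frequent tags, which usually makes it clear which element represents a single adviser.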

Contribution

Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.

Disclaimer

This project is intended for educational and research purposes only. The information extracted from SEC filings should be verified and used responsibly. The creators of this project are not responsible for any misuse of the data obtained through these scripts.

License

MIT License

