Welcome to the Information Retrieval repository! This project covers scraping data from Wildberries and turning the scraped content into vectors and multimodal embeddings.
- Wildberries Scraper: Utilizes web scraping techniques to extract data from Wildberries, as detailed in wb_scraper.ipynb.
- Content Vectorization: Implements methods to convert textual content into numerical vectors for machine learning.
- Multimodal Embeddings: Creates embeddings that combine different types of data (text, images, etc.) for richer representations.
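To make the last two features concrete, here is a minimal sketch of one way text can be vectorized and then combined with image features. It is not taken from the notebooks: TF-IDF stands in for whatever vectorization method the project actually uses, the fusion step is plain concatenation of L2-normalized vectors, and the product descriptions, image feature values, and function names (`tfidf_vectors`, `fuse`) are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately simple tokenizer."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (vocabulary, one TF-IDF vector per document)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        # Term frequency times a smoothed inverse document frequency.
        vectors.append([
            (counts[tok] / total) * (math.log((1 + n_docs) / (1 + df[tok])) + 1.0)
            for tok in vocab
        ])
    return vocab, vectors


def l2_normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(text_vec, image_vec):
    """Concatenate L2-normalized text and image vectors into one embedding."""
    return l2_normalize(text_vec) + l2_normalize(image_vec)


# Assumption: made-up descriptions and image features, for illustration only.
descriptions = ["red cotton dress", "blue cotton shirt"]
image_features = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.7]]

vocab, text_vectors = tfidf_vectors(descriptions)
multimodal = [fuse(t, i) for t, i in zip(text_vectors, image_features)]
print(vocab)
print(len(multimodal[0]))
```

Normalizing each modality before concatenation keeps either one from dominating distance computations just because its raw values are larger. In practice the image vectors would come from a pretrained vision model rather than hand-written numbers.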
To get started with the Information Retrieval project, follow these steps:
- Clone the Repository:
git clone https://github.com/ivanovsdesign/information_retrieval.git
- Navigate to the Project Directory:
cd information_retrieval
- Explore the Notebooks:
  - Open wb_scraper.ipynb to learn how to scrape data from Wildberries.
  - Open wb_content_vect_colab.ipynb to understand the workflow for content vectorization and creating multimodal embeddings.
This project is intended for educational and research purposes. The author and contributors do not condone or support the misuse of this scraper to violate the terms of service of Wildberries. Users are solely responsible for ensuring their use of this tool complies with all applicable laws and terms of service.
Contributions are welcome! Please read CONTRIBUTING.md for details on how to contribute to this project.
This project is licensed under the MIT License.
For questions or feedback, please open an issue on GitHub.
🌈 Thank you for visiting the repository! If you find this project helpful, please consider starring it to show your support. Happy coding! 🚀