Information Retrieval Project 📚🔍

Welcome to the Information Retrieval repository! This project covers web scraping from Wildberries, content vectorization, and multimodal embeddings.

🌟 Features

- Wildberries Scraper: Extracts data from Wildberries; see wb_scraper.ipynb for details.
- Content Vectorization: Converts textual content into numerical vectors that machine learning models can consume.
- Multimodal Embeddings: Builds embeddings that combine several data types (text, images, etc.) into a single, richer representation.
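To make the content vectorization idea concrete, here is a minimal TF-IDF sketch using only the Python standard library. It is an illustration, not the notebook's code: the presence of tfidf_vectorizer.pkl suggests TF-IDF is involved, but the actual tokenization and settings are not documented here. The helper names (`tokenize`, `tfidf_vectors`, `cosine`) and the sample texts are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split; a real pipeline likely does more.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (vocabulary, vectors): one L2-normalized TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for doc in tokenized for t in set(doc))
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def cosine(a, b):
    # Vectors are already unit length, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


# Hypothetical product titles, used only for illustration.
docs = ["red cotton dress", "blue cotton shirt", "wireless bluetooth headphones"]
vocab, vectors = tfidf_vectors(docs)
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

The first two titles share the term "cotton", so their similarity is positive, while the first and third share no terms and score zero.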

🛠️ Getting Started

To get started with the Information Retrieval project, follow these steps:

Clone the Repository:

git clone https://github.com/ivanovsdesign/information_retrieval.git

Navigate to the Project Directory:

cd information_retrieval

Explore the Notebooks:
- Open wb_scraper.ipynb to learn how to scrape data from Wildberries.
- Open wb_content_vect_colab.ipynb to understand the workflow for content vectorization and creating multimodal embeddings.
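As a rough picture of what a multimodal embedding can look like, the sketch below fuses a text vector and an image vector by normalizing each, weighting them, and concatenating. This is one common approach and an assumption on our part; the notebook may combine modalities differently. The function names, the `text_weight` parameter, and the sample vectors are hypothetical.

```python
import math


def l2_normalize(vec):
    # Scale a vector to unit length; leave an all-zero vector unchanged.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_embeddings(text_vec, image_vec, text_weight=0.5):
    """Concatenate weighted, L2-normalized text and image vectors.

    text_weight (hypothetical parameter) sets how much the text part
    contributes; the image part receives 1 - text_weight.
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    text_part = [text_weight * x for x in l2_normalize(text_vec)]
    image_part = [(1.0 - text_weight) * x for x in l2_normalize(image_vec)]
    # Normalize again so the fused vector is unit length for cosine search.
    return l2_normalize(text_part + image_part)


# Placeholder vectors; real ones would come from a text and an image encoder.
text_vec = [0.2, 0.0, 0.9]
image_vec = [3.0, 4.0]
fused = fuse_embeddings(text_vec, image_vec)
print(len(fused))
```

Normalizing each modality before concatenation keeps one encoder's scale from dominating the other, and the weight makes the trade-off between them explicit.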

📜 Disclaimer

This project is intended for educational and research purposes. The author and contributors do not condone or support the misuse of this scraper to violate the terms of service of Wildberries. Users are solely responsible for ensuring their use of this tool complies with all applicable laws and terms of service.

🤝 Contributing

Contributions are welcome! Please read CONTRIBUTING.md for details on how to contribute to this project.

📄 License

This project is licensed under the MIT License.

📬 Contact

For questions or feedback, please open an issue on GitHub.

🌈 Thank you for visiting the repository! If you find this project helpful, please consider starring it to show your support. Happy coding! 🚀

