GitHub - maladeep/Coventry-PureHub-Search-Engine: Python streamlit app to uncover the brilliance: explore profiles, groundbreaking work, and cutting-edge research by the exceptional minds of Coventry University.

Uncover the brilliance: Explore profiles, groundbreaking work, and cutting-edge research by the exceptional minds of Coventry University.

Overview

The Coventry PureHuB Search Engine is a web application for searching research publications and authors affiliated with Coventry University. It combines natural language processing techniques, such as stemming and TF-IDF, with an inverse indexer (more commonly called an inverted index) to return relevant search results in a user-friendly manner.
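
As a rough illustration of the inverted-index idea mentioned above, the sketch below maps each term to the documents that contain it and answers a query with a boolean AND. It is not the project's Indexer.py: the function names and sample titles are hypothetical, and stemming is left out to keep the example dependency-free.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return ids of documents that contain every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [set(index.get(term, [])) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical publication titles, used only for illustration.
titles = [
    "Deep learning for medical imaging",
    "Sustainable transport in urban areas",
    "Machine learning in transport planning",
]
index = build_inverted_index(titles)
print(search(index, "transport learning"))
```

Because a lookup only touches the posting lists of the query terms, the cost of a search depends on how many documents contain those terms rather than on the size of the whole collection.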

Features

Research Publication Search: Users can search for research publications by entering keywords or phrases. The search engine applies stemming and TF-IDF to match the query against the indexed publication data.
Author Search: Users can also search for authors by name or related keywords. The same techniques are applied to match the query against the indexed author data.
Stemming and TF-IDF: Stemming reduces words to their base or root form, so that different inflections of a word match the same index term. TF-IDF weights each term by its importance within a document and across the collection, and these weights produce the relevance scores used to rank search results.
Inverse Indexer: The search engine includes an inverse indexer (inverted index) that stores the publication and author data in a structured form, enabling efficient retrieval of relevant information.
Multinomial Naïve Bayes Classification: The search engine uses Multinomial Naïve Bayes classification to categorize publications into subject categories.
Cron job: The crawler is scheduled with the cron expression "0 0 * * 0" and the command file "Scrapper.py", so it runs every Sunday at midnight. This keeps the indexed data up to date by retrieving fresh information at the beginning of each week.
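
To make the Multinomial Naïve Bayes feature more concrete, here is a minimal hand-rolled sketch with Laplace smoothing. It is an illustration only, not the contents of classifier.py or model_MultiNB.pkl: the function names, the two labels, and the training tokens are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(samples, alpha=1.0):
    """Collect class frequencies and per-class word counts from (tokens, label) pairs."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in samples:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return {
        "class_docs": class_docs,
        "word_counts": word_counts,
        "vocabulary": vocabulary,
        "alpha": alpha,
        "total_docs": sum(class_docs.values()),
    }


def predict(model, tokens):
    """Return the label with the highest log posterior for the given tokens."""
    vocab_size = len(model["vocabulary"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_docs"].items():
        counts = model["word_counts"][label]
        total = sum(counts.values())
        # Start from the log prior of the class.
        score = math.log(n_docs / model["total_docs"])
        for token in tokens:
            if token not in model["vocabulary"]:
                continue  # skip words never seen during training
            # Laplace-smoothed log likelihood of the token given the class.
            score += math.log((counts[token] + alpha) / (total + alpha * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical, already-tokenized training data.
samples = [
    (["patient", "clinical", "trial"], "health"),
    (["hospital", "patient", "care"], "health"),
    (["engine", "vehicle", "design"], "engineering"),
    (["vehicle", "battery", "engine"], "engineering"),
]
model = train_multinomial_nb(samples)
print(predict(model, ["patient", "care"]))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing term keeps a word that is absent from one class from zeroing out that class entirely.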

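The cron expression "0 0 * * 0" reads as minute 0, hour 0, any day of the month, any month, day of week 0 (Sunday). The sketch below computes when such a schedule would next fire. The helper name is hypothetical and the function only models this one expression; it is not a general cron parser.

```python
from datetime import datetime, timedelta


def next_weekly_run(now):
    """Return the next Sunday 00:00 strictly after `now` (cron "0 0 * * 0")."""
    # datetime.weekday() numbers Monday as 0 and Sunday as 6.
    days_ahead = (6 - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    # If that Sunday midnight is now or already past, move to the following week.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


# 3 January 2024 was a Wednesday, so the next run falls on Sunday 7 January.
print(next_weekly_run(datetime(2024, 1, 3, 15, 30)))
```
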
Try PureHuB

Installation

Clone the repository:

git clone https://github.com/maladeep/Coventry-PureHub-Search-Engine.git
Install the required dependencies:

pip install -r requirements.txt

Usage

Run Live App

or

Run locally

streamlit run app.py

Open the provided URL in your web browser.
Enter your search query, select the search filter and search type, and click the "SEARCH" button.
View the search results displayed in cards.
Scroll down to view more search results.

Dependencies

The Coventry PureHub Search Engine relies on the following dependencies:

streamlit: The web application framework used for building the user interface.
Pillow: A library for opening and manipulating images, used to display an image in the streamlit application.
ujson: A fast JSON encoder and decoder library, used to load JSON data.
scikit-learn: A machine learning library, used for text preprocessing, TF-IDF vectorization, and cosine similarity calculation.
nltk: The Natural Language Toolkit, used for tokenization, stemming, and stop-word removal.
numpy: A powerful library for numerical computations in Python.
pandas: A data manipulation library, used for handling and processing structured data.
seaborn: A data visualization library, used for creating attractive and informative plots.
matplotlib: A versatile plotting library, used for generating various types of charts and graphs.
scikit-multilearn: A library for multi-label classification, used for advanced search features.
requests: A library for making HTTP requests, used for fetching external resources.
beautifulsoup4: A library for web scraping, used for extracting data from web pages.
selenium: A library for web automation, used for interacting with web pages.
webdriver_manager: A library for managing web drivers, used for browser automation.
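
The TF-IDF vectorization and cosine similarity that scikit-learn provides can be sketched by hand to show the arithmetic behind the ranking. The example below is an assumption-laden illustration rather than the code in app.py: the function names and the tokenized documents are hypothetical, and the idf formula is the smoothed form that scikit-learn's TfidfVectorizer uses by default.

```python
import math
from collections import Counter


def tf_idf_vectors(tokenized_docs):
    """Return one {term: weight} dict per document, plus the idf table."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))
    # Smoothed idf: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized_docs:
        tf = Counter(tokens)
        vectors.append({t: count * idf[t] for t, count in tf.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_tokens, vectors, idf):
    """Return ids of documents with a non-zero score, best match first."""
    tf = Counter(t for t in query_tokens if t in idf)
    query_vec = {t: count * idf[t] for t, count in tf.items()}
    scores = [(cosine(query_vec, vec), doc_id) for doc_id, vec in enumerate(vectors)]
    ordered = sorted(scores, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ordered if score > 0]


# Hypothetical, already-tokenized documents.
docs = [
    ["deep", "learning", "medical", "imaging"],
    ["urban", "transport", "policy"],
    ["machine", "learning", "transport", "planning"],
]
vectors, idf = tf_idf_vectors(docs)
print(rank(["transport", "planning"], vectors, idf))
```

Terms that appear in fewer documents receive a larger idf, so a match on a rare term such as "planning" lifts a document higher than a match on a term shared by several documents.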

Contributing

Contributions to this project are welcome. If you find any issues or would like to suggest improvements, please open an issue or submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for more information.

Note

This work was completed in partial fulfillment of the STW7071CEM Information Retrieval coursework at Coventry University.

Repository files (66 commits):
CPhub.png
Coventry PureHuB Dark.png
Coventry PureHuB light.png
Indexer.py
LICENSE
README.md
Scrapper.py
app.py
author_indexed_dictionary.json
author_list_stemmed.json
author_names.json
authorindexer.py
cire.png
classifier.py
crontab scrapper
crontab.save
model_MultiNB.pkl
pub_cu_author.json
pub_date.json
pub_name.json
pub_url.json
publication_indexed_dictionary.json
publication_list_stemmed.json
requirements.txt
scraper_results.json
