Wordlit.net aims to demystify the decisions and behaviors of Natural Language Processing (NLP) algorithms. It visualizes the entities and relationships extracted from text, offering insight into how NLP algorithms interpret and process language. The application is useful for NLP researchers, data scientists, and enthusiasts who want to understand how computational linguistics works in practice. Wordlit.net currently accepts PDF, Word, and TXT file uploads, as well as website URLs and pasted text.
- Entity Extraction: Leverages spaCy's NLP capabilities to identify entities in the text.
- Knowledge Graph Construction: Builds a graph using NetworkX, linking entities based on their relationships.
- Interactive Visualization: Utilizes Plotly and Streamlit for dynamic graph visualization.
- Customizable Graph Parameters: Offers options to adjust layout spacing, color scheme, node size, and more.
- Graph Analytics: Provides statistics like node and edge counts, graph density and centrality measures.
- Text Analytics: Calculates various text statistics such as token counts, sentence lengths, and unique tokens.
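The README does not specify how Wordlit.net decides that two entities are related. One common, simple rule is sentence co-occurrence: link any two entities that appear in the same sentence. The sketch below assumes that rule and uses only the standard library. In the app itself, the per-sentence entity lists would come from spaCy (for example `[[ent.text for ent in sent.ents] for sent in doc.sents]`) and the graph would be a NetworkX `Graph`, whose `nx.density` and `nx.degree_centrality` functions compute the same quantities as `graph_stats` for graphs with more than one node. `build_cooccurrence_graph` and `graph_stats` are hypothetical names.

```python
from itertools import combinations


def build_cooccurrence_graph(sentence_entities):
    """Link every pair of entities that appear in the same sentence.

    Assumption: "related" means "co-occur in a sentence".
    sentence_entities: list of lists of entity strings, one inner list per sentence.
    Returns an undirected adjacency dict {entity: set_of_neighbours}.
    """
    adj = {}
    for ents in sentence_entities:
        unique = sorted(set(ents))  # ignore repeats within one sentence
        for ent in unique:
            adj.setdefault(ent, set())  # keep entities even if they have no partner
        for a, b in combinations(unique, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def graph_stats(adj):
    """Node/edge counts, density, and degree centrality of an undirected graph."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge is stored twice
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    centrality = {
        node: (len(nbrs) / (n - 1) if n > 1 else 0.0) for node, nbrs in adj.items()
    }
    return {"nodes": n, "edges": m, "density": density, "degree_centrality": centrality}
```

For three sentences mentioning `Alice, Acme`, then `Acme, Paris, Bob`, then `Paris` alone, the graph has 4 nodes and 4 edges, a density of 8/12 (about 0.67), and `Acme`, which is linked to every other node, has degree centrality 1.0.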
To use this tool, you need to install the following dependencies:
pip install spacy networkx transformers streamlit plotly matplotlib pandas python-docx pdfplumber requests beautifulsoup4
Don't forget to download the spaCy language model:
python -m spacy download en_core_web_sm
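To confirm the environment is ready before launching the app, a small check like the following can help. It is a sketch that relies only on `importlib.util.find_spec`, which looks a module up without importing it. Some import names differ from the pip package names (`python-docx` imports as `docx`, `beautifulsoup4` as `bs4`). The list covers the libraries named in this README plus the `en_core_web_sm` model, which spaCy installs as an importable package. The `check_modules` function and the `MODULES` list are assumptions, not part of Wordlit.net.

```python
import importlib.util

# Import names (not pip names) of the libraries this README mentions.
MODULES = [
    "spacy", "networkx", "transformers", "streamlit", "plotly", "matplotlib",
    "pandas", "docx", "pdfplumber", "requests", "bs4",
    "en_core_web_sm",  # spaCy model, installed as a regular Python package
]


def check_modules(names=MODULES):
    """Return {module_name: True/False}, True when the module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name:15} {'ok' if found else 'MISSING'}")
```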
1. Start the Streamlit App: Run the following command:
streamlit run wordlit.py
2. Input Text: Provide text by uploading a file, entering a website URL, or pasting it directly into the text area.
3. Customize Graph: Adjust the graph parameters like layout spacing, node size, and color scheme using the sidebar options.
4. Generate Graph: Select 'Generate Graph' to visualize the knowledge graph based on your text.
5. Explore Graph Analytics: View various statistics and metrics related to the generated graph and the input text.
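As a rough illustration of the text statistics mentioned in step 5, the sketch below computes token, unique-token, sentence, and sentence-length counts, and times itself with `time.perf_counter`. It is a stand-in under stated assumptions: it splits sentences and words with regular expressions instead of spaCy, so punctuation is not counted as a token and its numbers will differ from spaCy's (where `len(doc)` counts every token, punctuation included, and `doc.sents` yields the sentences). `text_stats` is a hypothetical name.

```python
import re
import time

WORD = re.compile(r"\b\w+\b")  # word-like tokens only; punctuation is skipped


def text_stats(text):
    """Simple token and sentence statistics for a piece of text."""
    start = time.perf_counter()
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    tokens = WORD.findall(text.lower())  # lowercased so "The" and "the" match
    lengths = [len(WORD.findall(s)) for s in sentences]
    return {
        "tokens": len(tokens),
        "unique_tokens": len(set(tokens)),
        "sentences": len(sentences),
        "avg_sentence_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "max_sentence_length": max(lengths, default=0),
        "elapsed_s": time.perf_counter() - start,  # processing time in seconds
    }
```

For example, `text_stats("The cat sat. The cat ran away!")` reports 7 tokens, 5 unique tokens, 2 sentences, and an average sentence length of 3.5.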
Below is an example of a knowledge graph generated from a file. The nodes represent entities, and the edges represent their relationships. Each node's size reflects its degree (the number of connections it has), and colors vary with the selected color scheme.
[Video: Upload a File]
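The README does not give the exact formula for degree-based node sizing. A minimal sketch, assuming a linear rescale of each node's degree into a fixed size range (here a hypothetical default of 10 to 40), could look like this. The resulting values could be passed as the `marker.size` list of a Plotly scatter trace, with degrees taken from `dict(G.degree())` on a NetworkX graph. `degree_to_size`, `min_size`, and `max_size` are illustrative names.

```python
def degree_to_size(degrees, min_size=10, max_size=40):
    """Linearly rescale node degrees {node: degree} to sizes in [min_size, max_size]."""
    if not degrees:
        return {}
    lo, hi = min(degrees.values()), max(degrees.values())
    if hi == lo:  # every node equally connected: use the midpoint size
        mid = (min_size + max_size) / 2
        return {node: mid for node in degrees}
    scale = (max_size - min_size) / (hi - lo)
    return {node: min_size + (d - lo) * scale for node, d in degrees.items()}
```

For degrees `{"Acme": 3, "Bob": 2, "Alice": 1}`, the sizes are 40, 25, and 10.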
An example of a knowledge graph generated from manually entered text.
[Video: Enter Text Manually]
An example of a knowledge graph generated from a website URL.
[Video: Enter Website URL]
Python: The entire code is written in Python.
spaCy: An open-source library for advanced Natural Language Processing (NLP) in Python, used for tokenization, named entity recognition (NER), part-of-speech tagging, and dependency parsing.
NetworkX: A Python library used for building and analyzing network graphs.
Streamlit: An open-source Python library used to build and run the web application.
Plotly: A graphing library used to create the interactive knowledge graph visualizations.
Pandas: An open-source data analysis and manipulation tool built on top of the Python programming language.
Time Module: A Python standard-library module, used here to track processing time.
Python-Docx: A Python library for creating and updating Microsoft Word (.docx) files, used here to read text from uploaded Word documents.
Pdfplumber: Used for extracting text from PDF files. It allows detailed access to text, tables, and metadata in PDFs.
Requests: A simple HTTP library for Python, used here to fetch web pages from a URL.
Beautiful Soup (bs4): A Python library used here to parse HTML content.
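The following sketch shows how the file and URL inputs could be turned into plain text with the libraries listed above. It is not Wordlit.net's actual code: `extract_text` and `fetch_url_text` are hypothetical helpers, the third-party imports are done lazily so that the TXT path runs with the standard library alone, and python-docx reads only `.docx` files (not legacy `.doc`). With Streamlit, the arguments would typically be `uploaded_file.name` and `uploaded_file.getvalue()` from `st.file_uploader`.

```python
import io
from pathlib import Path


def extract_text(filename, data):
    """Return plain text from an uploaded file, given its name and raw bytes."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".txt":
        return data.decode("utf-8", errors="replace")
    if suffix == ".pdf":
        import pdfplumber  # imported lazily: only needed for PDFs
        with pdfplumber.open(io.BytesIO(data)) as pdf:
            # extract_text() can return None for pages without a text layer
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    if suffix == ".docx":
        import docx  # provided by the python-docx package
        document = docx.Document(io.BytesIO(data))
        return "\n".join(p.text for p in document.paragraphs)
    raise ValueError(f"Unsupported file type: {suffix!r}")


def fetch_url_text(url, timeout=10):
    """Download a web page and return its text with scripts and styles removed."""
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    soup = BeautifulSoup(response.text, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)
```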
Contributions to enhance Wordlit.net are welcome. Feel free to fork the repository, make changes, and create a pull request.
All code contributed to Wordlit.net © 2024 by Sahir Maharaj is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).
When using the code from Wordlit.net, please credit as follows:
Code sourced from Wordlit.net, authored by Sahir Maharaj, 2024.
Report a bug or request a feature: sahir@sahirmaharaj.com
LinkedIn: Sahir Maharaj