This project is a two-part program that will: 1/ Scrape the SCP Wiki for data. 2/ Apply pre-processing and some NLP algorithms.

b00731976/SCP_Wiki_NLP_Exploration

SCP Foundation Wiki - NLP Exploration

[Image: orga_ner]

Explore language patterns of the SCP Wiki

This project attempts to find patterns in the submissions to the SCP Foundation Wiki in 3 steps. Status to date:

1/ A scraping algorithm that scrapes the text of entries up to SCP-2700, along with their user ratings:
  • The page structure of the website varies more the more recent the SCP submissions are. A more flexible and robust scraping algorithm will be submitted soon.
  • The current and future data can easily be enriched with at least the tags and the number of discussions. A commit will be submitted soon.
2/ Pre-processing of the data and basic NLP implementations
  • To structure the project first, only basic NLP algorithms have been implemented so far.
3/ An Apache Spark implementation of some NLP algorithms using John Snow Labs' Spark NLP. The implementation is done within Databricks.
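
As a rough illustration of step 1, here is a minimal sketch of the parsing side of a scraper, using only the Python standard library. It is not the code from this repository. The host name in `BASE_URL`, the zero-padded `scp-NNN` URL pattern, and the `page-content` container id are assumptions about the wiki's layout, and `scp_url`, `PageContentExtractor`, and `extract_text` are hypothetical names. Fetching is left out on purpose, so the sketch runs on a small inline HTML sample.

```python
from html.parser import HTMLParser

# Assumed wiki host; adjust if the site is served from a different domain.
BASE_URL = "https://scp-wiki.wikidot.com"


def scp_url(number: int) -> str:
    """Build the assumed entry URL; numbers below 100 are zero-padded to 3 digits."""
    return f"{BASE_URL}/scp-{number:03d}"


class PageContentExtractor(HTMLParser):
    """Collect the text found inside <div id="page-content"> (assumed container id)."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # <div> nesting depth inside the target container
        self.chunks = []  # text fragments collected so far

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            self.depth += 1  # nested <div> inside the container
        elif dict(attrs).get("id") == "page-content":
            self.depth = 1   # entering the container

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1  # reaches 0 when the container closes

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Return the container text as a single space-separated string."""
    parser = PageContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Inline sample standing in for a downloaded page.
sample = (
    '<html><body><div id="header">Menu</div>'
    '<div id="page-content"><p><strong>Item #:</strong> SCP-173</p>'
    '<div class="inner"><p>Object Class: Euclid</p></div></div>'
    '<div id="footer">Footer</div></body></html>'
)

if __name__ == "__main__":
    print(scp_url(49))
    print(extract_text(sample))
```

Tracking the `<div>` depth rather than matching a fixed tag sequence keeps the extractor tolerant of nested blocks, which matters given that the page structure varies between older and newer entries.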

[Image: word_cloud]
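
A word cloud is typically drawn from term frequencies computed after basic pre-processing. The sketch below shows one way such counts could be produced with the Python standard library. It is an illustration under assumptions, not the pipeline used in this repository: the tiny `STOP_WORDS` set, the tokenization pattern, and the names `preprocess` and `term_frequencies` are all hypothetical.

```python
import re
from collections import Counter

# Deliberately tiny, illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "be", "at", "all"}

# Keeps hyphenated identifiers such as "scp-173" as a single token.
TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def preprocess(text: str) -> list[str]:
    """Lowercase, tokenize, and drop stop words and single-character tokens."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]


def term_frequencies(texts: list[str]) -> Counter:
    """Count token occurrences across a collection of documents."""
    counts = Counter()
    for text in texts:
        counts.update(preprocess(text))
    return counts


docs = [
    "SCP-173 is to be kept in a locked container at all times.",
    "The container is to be locked.",
]

if __name__ == "__main__":
    for term, count in term_frequencies(docs).most_common(5):
        print(term, count)
```

The resulting `Counter` maps each term to its count, which is the kind of input a word-cloud renderer usually expects.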
