APS (for « Analyse de papiers scientifiques », French for "analysis of scientific papers"): analysis of scientific papers and visualization of the results.
We did this project during our first year at Télécom ParisTech, in June 2016. We mined scientific publications from our school to extract insights about their topics and authors.
For instance, we built visualizations of the topics covered by the papers, and charts that highlight links between authors.
The project relies on the following Python libraries:
- nltk for natural language processing
- latexcodec to work with LaTeX
- Flask to code the server
- pdfMiner to extract text from PDF files
- bibtexparser to extract data from BibTeX files
- rdflib to work with RDF and represent information
- pyspotlight to connect to DBpedia and retrieve semantic categories/topics
- datetime for time computations
To install them all (Linux):
`sudo pip install nltk latexcodec Flask pdfMiner bibtexparser rdflib pyspotlight python-dateutil`
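After installing, it can help to check that the libraries are actually importable. The snippet below is a minimal sketch, not part of the project: the `missing_modules` helper is hypothetical, and the import names in `MODULES` are assumptions, since an import name can differ from the pip package name.

```python
from importlib.util import find_spec

# Assumed import names; they can differ from the pip package names
# (for example, pyspotlight is assumed here to be imported as "spotlight").
MODULES = ["nltk", "latexcodec", "flask", "pdfminer",
           "bibtexparser", "rdflib", "spotlight", "dateutil"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

If a name is reported as missing although the package is installed, the assumed import name is probably wrong rather than the installation.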
To generate the word cloud:
- Open a command line inside the source folder: after cloning our repository, run `cd aps/`.
- Run `python src/server.py` to launch the Flask server.
- Go to http://localhost:5000/, which is the index page.
- Start the computation by going to http://localhost:5000/init_wordcloud. You can follow its progress in the console.
- When the page is done loading, it will print something like `Word Cloud for biblio.bib done`. You can then check the word cloud on the index page.
- Enjoy!
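To give an idea of what a word cloud is built from, here is a minimal sketch of counting word frequencies over paper titles. It is an illustration only, not the project's code: the `word_frequencies` helper, the small `STOPWORDS` set, and the sample titles are all made up for this example, and the project presumably relies on nltk for tokenization and stop words.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for the example.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def word_frequencies(titles, top=10):
    """Return the `top` most frequent non-stop-words across the titles."""
    counts = Counter()
    for title in titles:
        words = re.findall(r"[a-z]+", title.lower())
        counts.update(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(top)


# Made-up titles, used only to exercise the function.
titles = [
    "Learning graph embeddings for networks",
    "A survey of graph algorithms",
    "Graph mining in social networks",
]
print(word_frequencies(titles, top=2))
```

In a word cloud, each word is then drawn with a size proportional to its count.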
To generate the graph:
- Make sure you have telecom.bib in your aps/data folder. (If you don't, you can put another .bib file there and rename it to telecom.bib.)
- Open a command line inside the source folder: after cloning our repository, run `cd aps/`.
- Run `python src/main_graph.py` to launch the computations.
- Run `python src/server.py` to start the server if it isn't already running. If it is, there is no need to restart it, as it refreshes.
- Go to http://localhost:5000/, which is the index page.
- Enjoy!
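As an illustration of how links between authors can be derived from a .bib file, the sketch below counts co-authorships from BibTeX `author` fields, where names are separated by `and`. It is a hedged example rather than the logic of `main_graph.py`: the `coauthor_edges` helper is hypothetical, and it assumes the author fields have already been extracted (for instance with bibtexparser).

```python
import re
from collections import Counter
from itertools import combinations


def coauthor_edges(author_fields):
    """Count how many papers each pair of authors has written together."""
    counts = Counter()
    for field in author_fields:
        # BibTeX separates author names with the word "and".
        names = {n.strip() for n in re.split(r"\s+and\s+", field) if n.strip()}
        # Sort names so that each pair is always stored in the same order.
        counts.update(combinations(sorted(names), 2))
    return counts


# Placeholder author fields, not taken from telecom.bib.
fields = ["Author A and Author B", "Author B and Author A and Author C"]
for (first, second), weight in sorted(coauthor_edges(fields).items()):
    print(first, "-", second, ":", weight)
```

Each pair can be read as an edge of the graph, with the count as its weight.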
Loïc Herbelot
Antoine Sueur
Romeo Brofiga
Adrien Lagasse