About
This module lets you automatically scrape Eurostat's online "Statistics Explained" pages and index their contents into a form of knowledge graph. It builds a graph of the inter-relationships between the pages while extracting their existing semantic content (documentation, concepts, glossary, ...).
documentation | |
status | since 2018 – in construction |
contributors | |
license | EUPL |
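As a rough illustration of the page graph described above, the following minimal sketch extracts internal links from already-downloaded HTML and records which page points to which. It is not the module's own code: the URL prefix `PREFIX`, the helper names `LinkExtractor` and `build_link_graph`, and the two sample pages are all assumptions made for the example, and the standard-library HTML parser stands in for a real scraper so that the snippet runs without extra dependencies.

```python
from html.parser import HTMLParser
from urllib.parse import unquote

# Assumed path prefix of Statistics Explained articles (not stated in this README).
PREFIX = "/eurostat/statistics-explained/index.php/"


class LinkExtractor(HTMLParser):
    """Collect the titles of internal pages linked from one HTML document."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.startswith(PREFIX):
            # Drop any fragment or query string, then decode the page title.
            title = href[len(PREFIX):].split("#")[0].split("?")[0]
            title = unquote(title)
            if title and title not in self.targets:
                self.targets.append(title)


def build_link_graph(pages):
    """Map each page title to the list of internal page titles it links to."""
    graph = {}
    for title, html in pages.items():
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        # Ignore self-links so that edges only relate distinct pages.
        graph[title] = [t for t in parser.targets if t != title]
    return graph


# Two made-up pages standing in for scraped content.
pages = {
    "Glossary:GDP": '<a href="/eurostat/statistics-explained/index.php/National_accounts">NA</a>',
    "National_accounts": (
        '<a href="/eurostat/statistics-explained/index.php/Glossary:GDP#Definition">GDP</a>'
        '<a href="https://example.org/">external</a>'
    ),
}
graph = build_link_graph(pages)
```

The resulting dictionary of lists is a plain adjacency structure; it can be handed to `networkx.DiGraph(graph)` if NetworkX is installed.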
- Framework `Scrapy` for extracting data from online websites.
- Natural language toolkit `nltk` to work with human language data.
- Package `NetworkX` for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
- Module `py2neo` for the `neo4j` graph database, though the bolt driver `neo4j-python-driver` does the job.
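To show how a page graph might end up in a neo4j database, here is a minimal sketch that turns an adjacency dictionary into parameterised Cypher `MERGE` statements. The node label `Page`, the relationship type `LINKS_TO`, the property name `title`, and the helper `link_statements` are hypothetical choices for this example, not names used by the module. Nothing is sent to a database here.

```python
# Hypothetical node label and relationship type; neither is taken from the module.
MERGE_LINK = (
    "MERGE (a:Page {title: $source}) "
    "MERGE (b:Page {title: $target}) "
    "MERGE (a)-[:LINKS_TO]->(b)"
)


def link_statements(graph):
    """Yield (query, parameters) pairs, one per directed link in the graph."""
    for source, targets in graph.items():
        for target in targets:
            yield MERGE_LINK, {"source": source, "target": target}


# Made-up adjacency data standing in for a scraped page graph.
graph = {"Glossary:GDP": ["National_accounts"], "National_accounts": []}
statements = list(link_statements(graph))
```

With the bolt driver, each pair could then be passed to `session.run(query, parameters)` inside an open session. Using `MERGE` rather than `CREATE` keeps repeated runs from duplicating pages or links.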
- Statistics Explained main page.
- How Open Are Official Statistics?