Python library to automatically collect relevant data from different data sources in the context of COVID19 in Spain
COnVIDa (convida.inf.um.es) is a tool developed by the Cybersecurity and Data Science Laboratory at the University of Murcia (Spain) that makes it easy to gather data related to the COVID19 pandemic from different data sources, in the context of Spain.
In particular, this project contains the Python library developed to collect data from the different data sources. The code is publicly available for researchers, data analysts and software developers, and is ready to be integrated as modules in Python scripts or IPython Notebooks. Moreover, it is specially designed to be modular and extensible to new data sources.
The COnVIDa library is composed of two main elements:
- /lib: Core implementation of the crawling functionality.
- /server: Example of the use of the library itself, when deploying it as a web service.
- Python3
- pip3
- Clone the repo
git clone https://github.com/CyberDataLab/COnVIDa-lib.git
- Change to COnVIDa-lib directory
- Install required packages (using a virtual environment such as conda is highly recommended)
pip3 install -r requirements.txt
The library can be easily used as shown in the test lib notebook. Keep the following considerations in mind:
- COnVIDa modules should be imported accordingly. The simplest way is to place your script or Notebook within the lib folder. However, you are free to manage the imports as desired.
- The class COnVIDa acts as a factory which encapsulates the low-level implementation of the different data sources. To use this library appropriately, it is only necessary to know its public functions. For more info, see lib documentation.
- The COnVIDa library always accesses external sources to retrieve the data, so it is worth keeping the data on disk to avoid requesting the same data several times. In fact, the server example is specifically oriented to work with the data locally once a data cache is built by using lib.
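The import and caching considerations above can be sketched as follows. This is a minimal illustration and not part of the COnVIDa API: `add_to_import_path`, `load_or_fetch` and the cache file location are hypothetical names, and `fetch` stands in for whatever COnVIDa call you use to retrieve data from the external sources.

```python
import pickle
import sys
from pathlib import Path
from typing import Any, Callable


def add_to_import_path(folder: str) -> None:
    """Make modules in `folder` importable when the script lives elsewhere."""
    resolved = str(Path(folder).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)


def load_or_fetch(cache_file: str, fetch: Callable[[], Any]) -> Any:
    """Return the data stored in `cache_file`; call `fetch` only on a cache miss."""
    path = Path(cache_file)
    if path.exists():
        # Cache hit: read the previously stored result from disk.
        with path.open("rb") as handle:
            return pickle.load(handle)
    # Cache miss: `fetch` is a hypothetical wrapper around the remote request.
    data = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as handle:
        pickle.dump(data, handle)
    return data
```

A script outside the lib folder could call `add_to_import_path` with the path to that folder before importing, then wrap each data request in `load_or_fetch` so repeated runs read from disk. Pickle files should only be loaded when you created them yourself, since unpickling untrusted data can execute arbitrary code.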
The service can be easily used as shown in the test server notebook. Keep the following considerations in mind:
- The COnVIDa server should be imported accordingly. The simplest way is to place your script or Notebook within the server folder. However, you are free to manage the imports as desired.
- The class convida_server also manages user queries from a high-level perspective. In this case, however, the queries are filtered locally against a data cache file (which is first generated with the data generation notebook and can be updated with the daily_update function). For more info, check the server documentation.
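To illustrate what filtering a query locally against cached data can look like, here is a small sketch. It does not reproduce convida_server: the record layout (a `date`, a `region`, and one key per data item) and the `filter_records` function are assumptions made for this example, and the actual cache format is described in the server documentation.

```python
from datetime import date
from typing import Dict, Iterable, List

# Hypothetical cache layout: one record per day and region,
# with one additional key per data item.
Record = Dict[str, object]


def filter_records(
    records: Iterable[Record],
    start: date,
    end: date,
    regions: Iterable[str],
    items: Iterable[str],
) -> List[Record]:
    """Select cached records by date range and region, keeping only `items`."""
    wanted_regions = set(regions)
    wanted_items = list(items)
    result: List[Record] = []
    for record in records:
        # Skip records outside the requested (inclusive) date range.
        if not (start <= record["date"] <= end):
            continue
        if record["region"] not in wanted_regions:
            continue
        row: Record = {"date": record["date"], "region": record["region"]}
        # Copy only the requested data items that exist in this record.
        row.update({item: record[item] for item in wanted_items if item in record})
        result.append(row)
    return result
```

Because the function only reads from the in-memory records, no external source is contacted at query time; only the cache update step needs network access.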
See the open issues for a list of proposed features (as well as known issues).
Contributions are what make the open source community such an amazing place to learn, create and get inspired. The COnVIDa library is specially designed to be extended with little effort; see the developer guidelines for more info.
Any contributions you make are greatly appreciated. To contribute, follow these steps:
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
CyberDataLab - @cyberdatalab
Contact us through convida@listas.um.es
Entire COnVIDa project: https://github.com/CyberDataLab/COnVIDa