The goal of this guided activity is to use Pandas, the Python library, to analyze and extract real-time Covid-19 data.
One of the main aims of this exercise is to convert and clean data found in different sources so that it can then be reinterpreted graphically.
URL: https://api.covid19api.com/countries
We use the !pip command to install Pandas (the ! prefix runs a shell command from inside the notebook).
!pip install pandas
Requirement already satisfied: pandas in c:\users\gabri\anaconda3\lib\site-packages (1.4.4)
Requirement already satisfied: pytz>=2020.1 in c:\users\gabri\anaconda3\lib\site-packages (from pandas) (2022.1)
Requirement already satisfied: python-dateutil>=2.8.1 in c:\users\gabri\anaconda3\lib\site-packages (from pandas) (2.8.2)
Requirement already satisfied: numpy>=1.18.5 in c:\users\gabri\anaconda3\lib\site-packages (from pandas) (1.21.5)
Requirement already satisfied: six>=1.5 in c:\users\gabri\anaconda3\lib\site-packages (from python-dateutil>=2.8.1->pandas) (1.16.0)
We import Pandas under its conventional alias, pd:
import pandas as pd
The = sign assigns a value to a variable; the link goes in quotes because it is a string (a sequence of characters).
miurl = "https://api.covid19api.com/countries"
We evaluate miurl to check that it was assigned correctly.
miurl
'https://api.covid19api.com/countries'
Passing miurl to type() confirms that it is a string:
type(miurl)
str
df is the usual abbreviation for DataFrame. The read_json() function reads data in JSON format; to read the URL, we pass it as the argument inside the parentheses.
df = pd.read_json(miurl)
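read_json() turns a JSON array of objects into a DataFrame with one row per object and one column per key. Since the endpoint needs network access, the sketch below parses a small hypothetical sample held in memory instead, assuming the response has the same keys (Country, Slug, ISO2) shown in the table that follows.

```python
import io

import pandas as pd

# Hypothetical sample with the same shape as the /countries response:
# a JSON array of objects with the keys Country, Slug and ISO2.
sample_json = """[
  {"Country": "Angola", "Slug": "angola", "ISO2": "AO"},
  {"Country": "Georgia", "Slug": "georgia", "ISO2": "GE"},
  {"Country": "Ireland", "Slug": "ireland", "ISO2": "IE"}
]"""

# read_json accepts a URL, a file path or a file-like object;
# StringIO lets us parse an in-memory string without the network.
df_sample = pd.read_json(io.StringIO(sample_json))

print(df_sample.shape)          # (3, 3)
print(list(df_sample.columns))  # ['Country', 'Slug', 'ISO2']
```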
We evaluate the object to display the data: Pandas shows it as a table with the entries of the DataFrame.
df
| | Country | Slug | ISO2 |
|---|---|---|---|
| 0 | Angola | angola | AO |
| 1 | Georgia | georgia | GE |
| 2 | Ireland | ireland | IE |
| 3 | Slovenia | slovenia | SI |
| 4 | French Guiana | french-guiana | GF |
| ... | ... | ... | ... |
| 243 | Sri Lanka | sri-lanka | LK |
| 244 | Canada | canada | CA |
| 245 | Kuwait | kuwait | KW |
| 246 | Libya | libya | LY |
| 247 | Seychelles | seychelles | SC |

248 rows × 3 columns
To see the first rows of the table we use head(); passing 6 shows the first six:
df.head(6)
| | Country | Slug | ISO2 |
|---|---|---|---|
| 0 | Fiji | fiji | FJ |
| 1 | Hong Kong, SAR China | hong-kong-sar-china | HK |
| 2 | Palestinian Territory | palestine | PS |
| 3 | Sierra Leone | sierra-leone | SL |
| 4 | Turkey | turkey | TR |
| 5 | Uzbekistan | uzbekistan | UZ |
With df.tail() we see the last rows:
df.tail(6)
| | Country | Slug | ISO2 |
|---|---|---|---|
| 242 | Republic of Kosovo | kosovo | XK |
| 243 | Zambia | zambia | ZM |
| 244 | Argentina | argentina | AR |
| 245 | Burundi | burundi | BI |
| 246 | Monaco | monaco | MC |
| 247 | Seychelles | seychelles | SC |
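head(n) and tail(n) return the first and last n rows; with no argument both default to 5. The sketch below checks this on a small hypothetical frame rather than on the API data.

```python
import pandas as pd

# Small illustrative frame (hypothetical values) with a RangeIndex 0..9.
demo = pd.DataFrame({"n": range(10)})

first_default = demo.head()  # no argument: the first 5 rows
first_six = demo.head(6)     # the first 6 rows, as in df.head(6)
last_six = demo.tail(6)      # the last 6 rows, as in df.tail(6)

print(len(first_default))       # 5
print(first_six["n"].tolist())  # [0, 1, 2, 3, 4, 5]
print(last_six["n"].tolist())   # [4, 5, 6, 7, 8, 9]
```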
To see information about the variables (columns) in df, we use info():
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 248 entries, 0 to 247
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Country 248 non-null object
1 Slug 248 non-null object
2 ISO2 248 non-null object
dtypes: object(3)
memory usage: 5.9+ KB
To display a single variable (column):
df['Country']
0 Fiji
1 Hong Kong, SAR China
2 Palestinian Territory
3 Sierra Leone
4 Turkey
...
243 Zambia
244 Argentina
245 Burundi
246 Monaco
247 Seychelles
Name: Country, Length: 248, dtype: object
To see a specific value of one of the variables:
df['Country'][66]
'British Indian Ocean Territory'
df['ISO2'].head()
0 FJ
1 HK
2 PS
3 SL
4 TR
Name: ISO2, dtype: object
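df['Country'][66] first selects the column and then the row label (chained indexing). A single .loc[row, column] call, or .at for one scalar, reads the same value in one step, and for assignments it avoids the chained-assignment problems Pandas warns about. A minimal sketch on a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical three-row frame standing in for df.
countries = pd.DataFrame({
    "Country": ["Fiji", "Turkey", "Zambia"],
    "ISO2": ["FJ", "TR", "ZM"],
})

# Chained indexing: select the column (a Series), then the label.
chained = countries["Country"][1]

# One .loc call with (row label, column label): same value, one step.
with_loc = countries.loc[1, "Country"]

# .at is the scalar-only variant of .loc.
with_at = countries.at[1, "Country"]

print(chained, with_loc, with_at)  # Turkey Turkey Turkey
```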
The URL used is: https://api.covid19api.com/country/colombia/status/confirmed/live
We store the data, this time adding the suffix co to the name (df_co) to indicate that we are working only with this country.
url_co = 'https://api.covid19api.com/country/colombia/status/confirmed/live'
df_co = pd.read_json(url_co)
df_co
| | Country | CountryCode | Province | City | CityCode | Lat | Lon | Cases | Status | Date |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Colombia | CO | | | | 4.57 | -74.3 | 0 | confirmed | 2020-01-22 00:00:00+00:00 |
| 1 | Colombia | CO | | | | 4.57 | -74.3 | 0 | confirmed | 2020-01-23 00:00:00+00:00 |
| 2 | Colombia | CO | | | | 4.57 | -74.3 | 0 | confirmed | 2020-01-24 00:00:00+00:00 |
| 3 | Colombia | CO | | | | 4.57 | -74.3 | 0 | confirmed | 2020-01-25 00:00:00+00:00 |
| 4 | Colombia | CO | | | | 4.57 | -74.3 | 0 | confirmed | 2020-01-26 00:00:00+00:00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1037 | Colombia | CO | | | | 4.57 | -74.3 | 6312657 | confirmed | 2022-11-24 00:00:00+00:00 |
| 1038 | Colombia | CO | | | | 4.57 | -74.3 | 6312657 | confirmed | 2022-11-25 00:00:00+00:00 |
| 1039 | Colombia | CO | | | | 4.57 | -74.3 | 6312657 | confirmed | 2022-11-26 00:00:00+00:00 |
| 1040 | Colombia | CO | | | | 4.57 | -74.3 | 6312657 | confirmed | 2022-11-27 00:00:00+00:00 |
| 1041 | Colombia | CO | | | | 4.57 | -74.3 | 6312657 | confirmed | 2022-11-28 00:00:00+00:00 |

1042 rows × 10 columns
Column names:
df_co.columns
Index(['Country', 'CountryCode', 'Province', 'City', 'CityCode', 'Lat', 'Lon',
'Cases', 'Status', 'Date'],
dtype='object')
First rows (head):
df_co.head(10)
| | Country | CountryCode | Province | City | CityCode | Lat | Lon | Cases | Status | Date |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Colombia | CO | | | | 4.57 | -74.3 | 0 | confirmed | 2020-01-22 00:00:00+00:00 |
| 1 | Colombia | CO | | | | 4.57 | -74.3 | 0 | confirmed | 2020-01-23 00:00:00+00:00 |
| 2 | Colombia | CO | | | | 4.57 | -74.3 | 0 | confirmed | 2020-01-24 00:00:00+00:00 |
| 3 | Colombia | CO | | | | 4.57 | -74.3 | 0 | confirmed | 2020-01-25 00:00:00+00:00 |
| 4 | Colombia | CO | | | | 4.57 | -74.3 | 0 | confirmed | 2020-01-26 00:00:00+00:00 |
| 5 | Colombia | CO | | | | 4.57 | -74.3 | 0 | confirmed | 2020-01-27 00:00:00+00:00 |
| 6 | Colombia | CO | | | | 4.57 | -74.3 | 0 | confirmed | 2020-01-28 00:00:00+00:00 |
| 7 | Colombia | CO | | | | 4.57 | -74.3 | 0 | confirmed | 2020-01-29 00:00:00+00:00 |
| 8 | Colombia | CO | | | | 4.57 | -74.3 | 0 | confirmed | 2020-01-30 00:00:00+00:00 |
| 9 | Colombia | CO | | | | 4.57 | -74.3 | 0 | confirmed | 2020-01-31 00:00:00+00:00 |
df_co.tail(10)
| | Country | CountryCode | Province | City | CityCode | Lat | Lon | Cases | Status | Date |
|---|---|---|---|---|---|---|---|---|---|---|
| 1032 | Colombia | CO | | | | 4.57 | -74.3 | 6312657 | confirmed | 2022-11-19 00:00:00+00:00 |
| 1033 | Colombia | CO | | | | 4.57 | -74.3 | 6312657 | confirmed | 2022-11-20 00:00:00+00:00 |
| 1034 | Colombia | CO | | | | 4.57 | -74.3 | 6312657 | confirmed | 2022-11-21 00:00:00+00:00 |
| 1035 | Colombia | CO | | | | 4.57 | -74.3 | 6312657 | confirmed | 2022-11-22 00:00:00+00:00 |
| 1036 | Colombia | CO | | | | 4.57 | -74.3 | 6312657 | confirmed | 2022-11-23 00:00:00+00:00 |
| 1037 | Colombia | CO | | | | 4.57 | -74.3 | 6312657 | confirmed | 2022-11-24 00:00:00+00:00 |
| 1038 | Colombia | CO | | | | 4.57 | -74.3 | 6312657 | confirmed | 2022-11-25 00:00:00+00:00 |
| 1039 | Colombia | CO | | | | 4.57 | -74.3 | 6312657 | confirmed | 2022-11-26 00:00:00+00:00 |
| 1040 | Colombia | CO | | | | 4.57 | -74.3 | 6312657 | confirmed | 2022-11-27 00:00:00+00:00 |
| 1041 | Colombia | CO | | | | 4.57 | -74.3 | 6312657 | confirmed | 2022-11-28 00:00:00+00:00 |
df_co.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1042 entries, 0 to 1041
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Country 1042 non-null object
1 CountryCode 1042 non-null object
2 Province 1042 non-null object
3 City 1042 non-null object
4 CityCode 1042 non-null object
5 Lat 1042 non-null float64
6 Lon 1042 non-null float64
7 Cases 1042 non-null int64
8 Status 1042 non-null object
9 Date 1042 non-null datetime64[ns, UTC]
dtypes: datetime64[ns, UTC](1), float64(2), int64(1), object(6)
memory usage: 81.5+ KB
For a statistical summary of the numeric variables in the df (count, mean, standard deviation, minimum, maximum and quartiles, where the 50% quartile is the median):
df_co.describe()
| | Lat | Lon | Cases |
|---|---|---|---|
| count | 1.042000e+03 | 1.042000e+03 | 1.042000e+03 |
| mean | 4.570000e+00 | -7.430000e+01 | 3.430246e+06 |
| std | 2.043791e-14 | 1.464421e-12 | 2.436522e+06 |
| min | 4.570000e+00 | -7.430000e+01 | 0.000000e+00 |
| 25% | 4.570000e+00 | -7.430000e+01 | 8.882092e+05 |
| 50% | 4.570000e+00 | -7.430000e+01 | 4.109543e+06 |
| 75% | 4.570000e+00 | -7.430000e+01 | 6.076698e+06 |
| max | 4.570000e+00 | -7.430000e+01 | 6.312657e+06 |
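By default describe() summarizes only the numeric columns, which is why Lat, Lon and Cases appear but the text columns do not; it reports no mode (that is a separate method, Series.mode()). A small sketch on hypothetical values:

```python
import pandas as pd

# Hypothetical cumulative case counts; 'Status' is a text column.
demo = pd.DataFrame({
    "Cases": [0, 10, 25, 40],
    "Status": ["confirmed"] * 4,
})

summary = demo.describe()        # numeric columns only by default
print(summary.columns.tolist())  # ['Cases']
print(summary.index.tolist())
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

# The 50% row is the median of the column.
print(summary.loc["50%", "Cases"] == demo["Cases"].median())  # True
```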
For the chart, the x-axis will be the dates and the y-axis the cases.
We set the date as the index:
df_co.set_index('Date')
| | Country | CountryCode | Province | City | CityCode | Lat | Lon | Cases | Status |
|---|---|---|---|---|---|---|---|---|---|
| Date | | | | | | | | | |
| 2020-01-22 00:00:00+00:00 | Colombia | CO | | | | 4.57 | -74.3 | 0 | confirmed |
| 2020-01-23 00:00:00+00:00 | Colombia | CO | | | | 4.57 | -74.3 | 0 | confirmed |
| 2020-01-24 00:00:00+00:00 | Colombia | CO | | | | 4.57 | -74.3 | 0 | confirmed |
| 2020-01-25 00:00:00+00:00 | Colombia | CO | | | | 4.57 | -74.3 | 0 | confirmed |
| 2020-01-26 00:00:00+00:00 | Colombia | CO | | | | 4.57 | -74.3 | 0 | confirmed |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2022-11-24 00:00:00+00:00 | Colombia | CO | | | | 4.57 | -74.3 | 6312657 | confirmed |
| 2022-11-25 00:00:00+00:00 | Colombia | CO | | | | 4.57 | -74.3 | 6312657 | confirmed |
| 2022-11-26 00:00:00+00:00 | Colombia | CO | | | | 4.57 | -74.3 | 6312657 | confirmed |
| 2022-11-27 00:00:00+00:00 | Colombia | CO | | | | 4.57 | -74.3 | 6312657 | confirmed |
| 2022-11-28 00:00:00+00:00 | Colombia | CO | | | | 4.57 | -74.3 | 6312657 | confirmed |

1042 rows × 9 columns
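Evaluating df_co again shows that the original DataFrame has not changed: set_index() returns a new DataFrame and leaves df_co as it was.
df_co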
| | Country | CountryCode | Province | City | CityCode | Lat | Lon | Cases | Status | Date |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Colombia | CO | | | | 4.57 | -74.3 | 0 | confirmed | 2020-01-22 00:00:00+00:00 |
| 1 | Colombia | CO | | | | 4.57 | -74.3 | 0 | confirmed | 2020-01-23 00:00:00+00:00 |
| 2 | Colombia | CO | | | | 4.57 | -74.3 | 0 | confirmed | 2020-01-24 00:00:00+00:00 |
| 3 | Colombia | CO | | | | 4.57 | -74.3 | 0 | confirmed | 2020-01-25 00:00:00+00:00 |
| 4 | Colombia | CO | | | | 4.57 | -74.3 | 0 | confirmed | 2020-01-26 00:00:00+00:00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1037 | Colombia | CO | | | | 4.57 | -74.3 | 6312657 | confirmed | 2022-11-24 00:00:00+00:00 |
| 1038 | Colombia | CO | | | | 4.57 | -74.3 | 6312657 | confirmed | 2022-11-25 00:00:00+00:00 |
| 1039 | Colombia | CO | | | | 4.57 | -74.3 | 6312657 | confirmed | 2022-11-26 00:00:00+00:00 |
| 1040 | Colombia | CO | | | | 4.57 | -74.3 | 6312657 | confirmed | 2022-11-27 00:00:00+00:00 |
| 1041 | Colombia | CO | | | | 4.57 | -74.3 | 6312657 | confirmed | 2022-11-28 00:00:00+00:00 |

1042 rows × 10 columns
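To keep Date as the index permanently, the result of set_index() has to be stored, either by reassigning it or with inplace=True. A minimal sketch on a hypothetical two-day excerpt using the same column names as df_co:

```python
import pandas as pd

# Hypothetical two-day excerpt with the same column names as df_co.
demo = pd.DataFrame({
    "Cases": [0, 3],
    "Date": pd.to_datetime(["2020-01-22", "2020-01-23"], utc=True),
})

indexed = demo.set_index("Date")  # a NEW DataFrame indexed by Date

print(demo.index.tolist())        # [0, 1]  (demo itself is unchanged)
print(indexed.index.name)         # Date
print(indexed.columns.tolist())   # ['Cases']

# Reassigning the result makes the change stick.
demo = demo.set_index("Date")
```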
Since the chart should show the cases, we select the Cases column indexed by date.
df_co.set_index('Date')['Cases']
Date
2020-01-22 00:00:00+00:00 0
2020-01-23 00:00:00+00:00 0
2020-01-24 00:00:00+00:00 0
2020-01-25 00:00:00+00:00 0
2020-01-26 00:00:00+00:00 0
...
2022-11-24 00:00:00+00:00 6312657
2022-11-25 00:00:00+00:00 6312657
2022-11-26 00:00:00+00:00 6312657
2022-11-27 00:00:00+00:00 6312657
2022-11-28 00:00:00+00:00 6312657
Name: Cases, Length: 1042, dtype: int64
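The Cases values never decrease (the last rows repeat 6312657), which suggests the column holds cumulative totals. Assuming that reading is right, Series.diff() would turn the totals into day-to-day increases. The sketch below uses hypothetical values, not the Colombian data.

```python
import pandas as pd

# Hypothetical cumulative totals indexed by date, shaped like
# df_co.set_index('Date')['Cases'].
dates = pd.date_range("2022-11-24", periods=4, freq="D", tz="UTC")
cumulative = pd.Series([100, 120, 120, 150], index=dates, name="Cases")

# Each value minus the previous one; the first day has no predecessor (NaN).
daily_new = cumulative.diff()

print(daily_new.tolist())                  # [nan, 20.0, 0.0, 30.0]
print(cumulative.is_monotonic_increasing)  # True
```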
We build the chart from this Series with the plot() method.
df_co.set_index('Date')['Cases'].plot()
Matplotlib is building the font cache; this may take a moment.
<AxesSubplot:xlabel='Date'>
We use the title parameter to give the chart a name:
df_co.set_index('Date')['Cases'].plot(title="Casos de Covid19 en Colombia")
<AxesSubplot:title={'center':'Casos de Covid19 en Colombia'}, xlabel='Date'>
We repeat the same steps for Spain, the Dominican Republic and Ecuador.
URL: https://api.covid19api.com/country/spain/status/confirmed/live
We store the data, adding the suffix es so that df_es holds only this country's data.
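Since the same load, index and select steps are repeated for every country, they could be wrapped in a small function. The helper name cases_by_date is hypothetical; the sketch checks it offline on a made-up two-record payload with the API's field names, and the commented line shows how it might be called with one of the notebook's URLs.

```python
import io

import pandas as pd


def cases_by_date(source):
    """Hypothetical helper: load one country's records from a URL or
    file-like object and return the Cases column indexed by Date."""
    frame = pd.read_json(source)
    return frame.set_index("Date")["Cases"]


# Offline check with a hypothetical two-record payload in the API's shape.
sample = io.StringIO(
    '[{"Country": "Spain", "Cases": 0, "Date": "2020-01-22T00:00:00Z"},'
    ' {"Country": "Spain", "Cases": 5, "Date": "2020-01-23T00:00:00Z"}]'
)
series = cases_by_date(sample)
print(series.tolist())  # [0, 5]

# With network access, the URL could be passed directly, e.g.:
# cases_by_date(url_es).plot(title="Casos de Covid19 en España")
```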
url_es = 'https://api.covid19api.com/country/spain/status/confirmed/live'
df_es = pd.read_json(url_es)
df_es
| | Country | CountryCode | Province | City | CityCode | Lat | Lon | Cases | Status | Date |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Spain | ES | | | | 40.46 | -3.75 | 0 | confirmed | 2020-01-22 00:00:00+00:00 |
| 1 | Spain | ES | | | | 40.46 | -3.75 | 0 | confirmed | 2020-01-23 00:00:00+00:00 |
| 2 | Spain | ES | | | | 40.46 | -3.75 | 0 | confirmed | 2020-01-24 00:00:00+00:00 |
| 3 | Spain | ES | | | | 40.46 | -3.75 | 0 | confirmed | 2020-01-25 00:00:00+00:00 |
| 4 | Spain | ES | | | | 40.46 | -3.75 | 0 | confirmed | 2020-01-26 00:00:00+00:00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1037 | Spain | ES | | | | 40.46 | -3.75 | 13573721 | confirmed | 2022-11-24 00:00:00+00:00 |
| 1038 | Spain | ES | | | | 40.46 | -3.75 | 13595504 | confirmed | 2022-11-25 00:00:00+00:00 |
| 1039 | Spain | ES | | | | 40.46 | -3.75 | 13595504 | confirmed | 2022-11-26 00:00:00+00:00 |
| 1040 | Spain | ES | | | | 40.46 | -3.75 | 13595504 | confirmed | 2022-11-27 00:00:00+00:00 |
| 1041 | Spain | ES | | | | 40.46 | -3.75 | 13595504 | confirmed | 2022-11-28 00:00:00+00:00 |

1042 rows × 10 columns
df_es.set_index('Date')['Cases'].plot(title="Casos de Covid19 en España")
<AxesSubplot:title={'center':'Casos de Covid19 en España'}, xlabel='Date'>
URL: https://api.covid19api.com/country/Dominican%20Republic/status/confirmed/live
We store the data, adding the suffix do so that df_do holds only this country's data.
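In this URL the space in "Dominican Republic" is written as %20, because spaces are not allowed in URLs and must be percent-encoded. The standard library's urllib.parse.quote() does that encoding; the helper name and template below are assumptions modeled on the URLs used in this notebook.

```python
from urllib.parse import quote

# Template modeled on the per-country URLs used in this notebook.
BASE = "https://api.covid19api.com/country/{}/status/confirmed/live"


def confirmed_live_url(country):
    """Hypothetical helper: build the per-country URL.
    quote() percent-encodes unsafe characters, so a space becomes %20."""
    return BASE.format(quote(country))


print(confirmed_live_url("Dominican Republic"))
# https://api.covid19api.com/country/Dominican%20Republic/status/confirmed/live
```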
url_do = 'https://api.covid19api.com/country/Dominican%20Republic/status/confirmed/live'
df_do = pd.read_json(url_do)
df_do
| | Country | CountryCode | Province | City | CityCode | Lat | Lon | Cases | Status | Date |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Dominican Republic | DO | | | | 18.74 | -70.16 | 0 | confirmed | 2020-01-22 00:00:00+00:00 |
| 1 | Dominican Republic | DO | | | | 18.74 | -70.16 | 0 | confirmed | 2020-01-23 00:00:00+00:00 |
| 2 | Dominican Republic | DO | | | | 18.74 | -70.16 | 0 | confirmed | 2020-01-24 00:00:00+00:00 |
| 3 | Dominican Republic | DO | | | | 18.74 | -70.16 | 0 | confirmed | 2020-01-25 00:00:00+00:00 |
| 4 | Dominican Republic | DO | | | | 18.74 | -70.16 | 0 | confirmed | 2020-01-26 00:00:00+00:00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1037 | Dominican Republic | DO | | | | 18.74 | -70.16 | 648456 | confirmed | 2022-11-24 00:00:00+00:00 |
| 1038 | Dominican Republic | DO | | | | 18.74 | -70.16 | 649150 | confirmed | 2022-11-25 00:00:00+00:00 |
| 1039 | Dominican Republic | DO | | | | 18.74 | -70.16 | 649150 | confirmed | 2022-11-26 00:00:00+00:00 |
| 1040 | Dominican Republic | DO | | | | 18.74 | -70.16 | 649834 | confirmed | 2022-11-27 00:00:00+00:00 |
| 1041 | Dominican Republic | DO | | | | 18.74 | -70.16 | 649834 | confirmed | 2022-11-28 00:00:00+00:00 |

1042 rows × 10 columns
df_do.set_index('Date')['Cases'].plot(title="Casos de Covid19 en República Dominicana")
<AxesSubplot:title={'center':'Casos de Covid19 en República Dominicana'}, xlabel='Date'>
URL: https://api.covid19api.com/country/ecuador/status/confirmed/live
We store the data, adding the suffix ec so that df_ec holds only this country's data.
url_ec = 'https://api.covid19api.com/country/ecuador/status/confirmed/live'
df_ec = pd.read_json(url_ec)
df_ec
| | Country | CountryCode | Province | City | CityCode | Lat | Lon | Cases | Status | Date |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ecuador | EC | | | | -1.83 | -78.18 | 0 | confirmed | 2020-01-22 00:00:00+00:00 |
| 1 | Ecuador | EC | | | | -1.83 | -78.18 | 0 | confirmed | 2020-01-23 00:00:00+00:00 |
| 2 | Ecuador | EC | | | | -1.83 | -78.18 | 0 | confirmed | 2020-01-24 00:00:00+00:00 |
| 3 | Ecuador | EC | | | | -1.83 | -78.18 | 0 | confirmed | 2020-01-25 00:00:00+00:00 |
| 4 | Ecuador | EC | | | | -1.83 | -78.18 | 0 | confirmed | 2020-01-26 00:00:00+00:00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1037 | Ecuador | EC | | | | -1.83 | -78.18 | 1009958 | confirmed | 2022-11-24 00:00:00+00:00 |
| 1038 | Ecuador | EC | | | | -1.83 | -78.18 | 1009958 | confirmed | 2022-11-25 00:00:00+00:00 |
| 1039 | Ecuador | EC | | | | -1.83 | -78.18 | 1009958 | confirmed | 2022-11-26 00:00:00+00:00 |
| 1040 | Ecuador | EC | | | | -1.83 | -78.18 | 1009958 | confirmed | 2022-11-27 00:00:00+00:00 |
| 1041 | Ecuador | EC | | | | -1.83 | -78.18 | 1011132 | confirmed | 2022-11-28 00:00:00+00:00 |

1042 rows × 10 columns
df_ec.set_index('Date')['Cases'].plot(title="Casos de Covid19 en Ecuador")
<AxesSubplot:title={'center':'Casos de Covid19 en Ecuador'}, xlabel='Date'>
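To compare the four countries on a single chart, their Cases series could be joined with pd.concat(axis=1), which aligns them on the shared Date index and produces one column per country. The values below are hypothetical stand-ins for the series built from df_co, df_es, df_do and df_ec; plotting is left as a comment because it needs Matplotlib.

```python
import pandas as pd

dates = pd.date_range("2022-11-27", periods=2, freq="D", tz="UTC")

# Hypothetical values; in the notebook these would be the
# set_index('Date')['Cases'] series of each country.
series_by_country = {
    "Colombia": pd.Series([10, 12], index=dates),
    "Ecuador": pd.Series([4, 5], index=dates),
}

# concat with axis=1 aligns on the Date index: one column per country.
combined = pd.concat(series_by_country, axis=1)
print(combined.columns.tolist())  # ['Colombia', 'Ecuador']

# With Matplotlib installed, one call draws every country on the same axes;
# logy=True may help when totals differ by orders of magnitude:
# combined.plot(title="Casos de Covid19", logy=True)
```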
In conclusion, this activity put to the test everything learned during the course, from using GitHub and programming to web scraping and, finally, data visualization. One advantage of this exercise over other methods of graphical representation is that the content can be updated in real time and the data obtained quickly.