df_profession.tail()
df_profession_category.tail()
-
+
@@ -722,7 +722,7 @@
df_age
-
+
@@ -786,7 +786,7 @@
df_geography.tail()
-
+
@@ -847,7 +847,7 @@
indices_to_drop = df_profession[df_profession['Code'] < 10].index
df_profession.drop(indices_to_drop, inplace=True)
df_profession
-
+
@@ -1117,7 +1117,7 @@
df_profession.isna().sum()
df_profession_category.isna().sum()
df_age.isna().sum()
-
+
age_group 0
GPGmedian 0
GPGmean 0
@@ -1128,7 +1128,7 @@
# Let's plot the mean and median Gender Pay Gap (GPG)
df_profession.boxplot(column=['GPGmedian', 'GPGmean'])
-
+
<Axes: >
@@ -1139,7 +1139,7 @@
# Let's look at the distribution of the values in the columns
df_profession.describe()
-
+
@@ -1210,7 +1210,7 @@
# Let's try to visualise what's going on with a histogram - what type of skew do you notice?
df_profession[['GPGmedian']].plot(kind='hist', ec='black')
-
+
<Axes: ylabel='Frequency'>
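One quick way to answer the skew question without reading it off the histogram is to compare the mean with the median. The sketch below uses only the standard library and made-up numbers (the `sample_gaps` list is hypothetical, not taken from `df_profession`). The comparison is a rule of thumb rather than a formal skewness statistic.

```python
from statistics import mean, median


def skew_direction(values):
    """Guess the skew direction from the mean/median relationship (rule of thumb)."""
    m, med = mean(values), median(values)
    if m > med:
        return "right (positive) skew"
    if m < med:
        return "left (negative) skew"
    return "roughly symmetric"


# Hypothetical pay-gap percentages, invented for illustration only.
sample_gaps = [2, 3, 3, 4, 5, 6, 30]
direction = skew_direction(sample_gaps)
print(direction)
```

A single large value pulls the mean above the median, which is why a long right-hand tail in the histogram usually goes together with a mean that is larger than the median.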
@@ -1230,26 +1230,26 @@
width=600,
height=400
)
-
+
-
+
-What’s that?!
+Wait, what’s that?! That’s not what we were expecting!
-Because the Earth is round, and maps are flat, geospatial data needs to be “projected”. There are many types of projecting geospatial data, and all of them come with some tradeoff in terms of distorting area and/or distance (in other words, none of them are perfect). You can read more here.
-Now, the geospatial dataset that we are using for this notebook was downloaded from and uses a Coordinate Reference System (CRS) known as EPSG:27700 - OSGB36 / British National Grid. Regretfully, Altair works with a different CRS: WGS 84 (also known as epsg:4326), and this is creating the conflict.
-We have two options: either reproject our data using geopandas, or according to Altair documentation try using the project configuration `(type: ‘identity’, reflectY’: True)``. It draws the geometries without applying a projection.
+Because the Earth is round, and maps are flat, geospatial data needs to be “projected”. There are many types of projecting geospatial data, and all of them come with some tradeoff in terms of distorting area and/or distance (in other words, none of them are perfect). You can read more here.
+Now, the geospatial dataset that we are using for this notebook was downloaded from the Office for National Statistics’ Geoportal and uses a Coordinate Reference System (CRS) known as EPSG:27700 - OSGB36 / British National Grid. Regretfully, Altair works with a different CRS: WGS 84 (also known as epsg:4326), and this is creating the conflict.
+We have two options: either reproject our data using geopandas or, following the Altair documentation, use the projection configuration `type='identity', reflectY=True`, which draws the geometries without applying a projection.
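To see why the y-flip matters, here is a toy illustration of what an identity projection does to planar coordinates such as British National Grid eastings and northings. It is a sketch of the idea only, not Altair's or Vega's actual implementation; the function name `fit_identity` and the coordinates are hypothetical. Northings grow upwards while screen pixels grow downwards, so without the flip the map would be drawn upside down.

```python
def fit_identity(points, width, height, reflect_y=True):
    """Scale planar (x, y) points uniformly into a width x height pixel box.

    No map projection is applied: coordinates are only shifted, scaled and,
    optionally, flipped vertically.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    # A single scale factor for both axes keeps the shape undistorted.
    scale = min(width / (max_x - min_x), height / (max_y - min_y))
    pixels = []
    for x, y in points:
        px = (x - min_x) * scale
        # With the flip, the largest northing lands at the top (pixel y = 0).
        py = (max_y - y) * scale if reflect_y else (y - min_y) * scale
        pixels.append((px, py))
    return pixels


# Hypothetical easting/northing pairs in metres, invented for illustration.
corners = [(400000, 100000), (500000, 100000), (500000, 300000)]
flipped = fit_identity(corners, width=600, height=400)
unflipped = fit_identity(corners, width=600, height=400, reflect_y=False)
print(flipped)
print(unflipped)
```

In the flipped result the northernmost point sits at pixel row 0 (the top of the chart). In the unflipped result the southernmost point ends up there instead, which is the upside-down map that `reflectY=True` avoids.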
@@ -1704,26 +1716,26 @@
reflectY=True
)
pre_GPG_England
-
+
-
+