Class Activity
Week 1
For this activity, we'll be using a dataset called iris, which is available from the UC Irvine Machine Learning Repository at http://archive.ics.uci.edu/ml/datasets/Iris and is great for extra practice.
ANSWER BELOW
import pandas as pd

csv_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# Use the attribute information as the column names
col_names = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width', 'Class']
iris = pd.read_csv(csv_url, names=col_names)
iris

|  | Sepal_Length | Sepal_Width | Petal_Length | Petal_Width | Class |
---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| ... | ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |
| 149 | 5.9 | 3.0 | 5.1 | 1.8 | Iris-virginica |

150 rows × 5 columns
Use info() to see some more information about the data set.
iris.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   Sepal_Length  150 non-null    float64
 1   Sepal_Width   150 non-null    float64
 2   Petal_Length  150 non-null    float64
 3   Petal_Width   150 non-null    float64
 4   Class         150 non-null    object
dtypes: float64(4), object(1)
memory usage: 6.0+ KB

Note the float64 and object below the Dtype column. Use Google to find out what those mean.

float64: ...

object: ...

Dtype: ...
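While looking up those terms, it can help to inspect the types directly. The following is a minimal sketch, not part of the original activity: it assumes a small hand-typed DataFrame (the name `sample` is hypothetical) built from the first three rows of the table so that it runs without downloading anything. Depending on the pandas version, the text column may be reported as object or as a dedicated string type.

```python
import pandas as pd

# First three rows of the iris table shown earlier, typed in by hand
# (hypothetical stand-in for the full `iris` DataFrame).
sample = pd.DataFrame({
    "Sepal_Length": [5.1, 4.9, 4.7],
    "Petal_Length": [1.4, 1.4, 1.3],
    "Class": ["Iris-setosa"] * 3,
})

# One dtype per column, the same information info() lists under "Dtype"
print(sample.dtypes)

# Check a single column's dtype by name
is_float = sample["Sepal_Length"].dtype == "float64"
print(is_float)
```

The same two calls should work on the full `iris` DataFrame loaded above.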
A sepal is the (typically) green part of the flower. It serves as protection for the flower while it is in bud, and often as support for the petals when in bloom. Source: Wikipedia
Use Altair to create a scatter plot to explore the relationship between Petal_Length and Sepal_Length for only the Virginica iris species. Put Petal_Length on the y-axis and Sepal_Length on the x-axis. Give your axes human-readable labels.
import altair as alt

ANSWER BELOW
# Keep only the rows for the virginica species
virginica = iris[iris["Class"] == "Iris-virginica"]

iris_plot = alt.Chart(virginica).mark_point().encode(
    alt.X('Sepal_Length')
        .title('Sepal Length')
        .scale(zero=False),
    alt.Y('Petal_Length')
        .title('Petal Length')
        .scale(zero=False)
)

iris_plot

Using the visualization we made, what can we say about the relationship between these variables?
ANSWER BELOW

There is a positive relationship between petal length and sepal length in the virginica species: as sepal length increases, so does petal length.

Because a straight line would fit nicely through the points, we would also say it's linear.

Because all the points are tightly clustered around where we would draw a straight line, we would also say that it is a strong relationship.

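One way to back up a visual judgment about direction and strength is a Pearson correlation coefficient. The sketch below is an addition, not part of the original answer: `pearson_r` and `virginica_sample` are hypothetical names, and the sample holds only the last five virginica rows shown in the table (typed in by hand so the code runs offline). Five rows are far too few to judge the relationship, so treat the printed value as a demonstration of the call only.

```python
import pandas as pd

# Last five virginica rows from the table above, typed in by hand.
# This is a tiny illustrative sample, not the full 50-row subset.
virginica_sample = pd.DataFrame({
    "Sepal_Length": [6.7, 6.3, 6.5, 6.2, 5.9],
    "Petal_Length": [5.2, 5.0, 5.2, 5.4, 5.1],
})


def pearson_r(df, x_col, y_col):
    """Return the Pearson correlation between two numeric columns."""
    # Series.corr uses the Pearson method by default
    return df[x_col].corr(df[y_col])


r = pearson_r(virginica_sample, "Sepal_Length", "Petal_Length")
print(f"Pearson r on the five-row sample: {r:.2f}")
```

To check the answer against the real data, pass the full `virginica` DataFrame from the plotting code instead of the sample. Values close to +1 would support calling the relationship strong, positive, and linear; values near 0 would not.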