From bbb78e2bd2cf6009baa1659b3308f37fb132099a Mon Sep 17 00:00:00 2001 From: Andrew DalPino Date: Wed, 27 Jan 2021 04:09:48 -0600 Subject: [PATCH] Fix links to docs --- LICENSE | 4 ++-- README.md | 10 +++++----- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/LICENSE b/LICENSE index 60c1e93..49a0d31 100644 --- a/LICENSE +++ b/LICENSE @@ -1,7 +1,7 @@ MIT License -Copyright (c) 2020 Rubix ML -Copyright (c) 2020 Andrew DalPino +Copyright (c) 2021 Rubix ML +Copyright (c) 2021 Andrew DalPino Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal diff --git a/README.md b/README.md index d27904d..e66f049 100644 --- a/README.md +++ b/README.md @@ -16,12 +16,12 @@ $ composer create-project rubix/iris ## Tutorial ### Introduction -The Iris dataset consists of 50 samples for each of three species of Iris flower - Iris setosa, Iris virginica, and Iris versicolor (pictured below). Each sample is comprised of 4 measurements or *features* - sepal length, sepal width, petal length, and petal width. Our objective is to train a [K Nearest Neighbors](https://docs.rubixml.com/classifiers/k-nearest-neighbors.html) (KNN) classifier to determine the species of Iris flower from a set of unknown test samples using the Iris dataset. Let's get started! +The Iris dataset consists of 50 samples for each of three species of Iris flower - Iris setosa, Iris virginica, and Iris versicolor (pictured below). Each sample is comprised of 4 measurements or *features* - sepal length, sepal width, petal length, and petal width. Our objective is to train a [K Nearest Neighbors](https://docs.rubixml.com/latest/classifiers/k-nearest-neighbors.html) (KNN) classifier to determine the species of Iris flower from a set of unknown test samples using the Iris dataset. Let's get started! ![Iris Flower Species](https://raw.githubusercontent.com/RubixML/Iris/master/docs/images/iris-species.png) ### Extracting the Data -The first step is to extract the Iris dataset from the `dataset.ndjson` file in our project folder into our training script. You'll notice that we've provided the Iris dataset in CSV (Comma-separated Values) format as well. This is strictly for convenience in case you wanted to view the dataset in your favorite spreadsheet software. To instantiate a new [Labeled](https://docs.rubixml.com/datasets/labeled.html) dataset object we'll pass an [NDJSON](https://docs.rubixml.com/extractors/ndjson.html) extractor pointing to the dataset file in our project folder to the `fromIterator()` factory method. The factory uses the last column of the data table for the labels and the rest of the columns for the values of the sample features. We'll call this our *training* set. +The first step is to extract the Iris dataset from the `dataset.ndjson` file in our project folder into our training script. You'll notice that we've provided the Iris dataset in CSV (Comma-separated Values) format as well. This is strictly for convenience in case you wanted to view the dataset in your favorite spreadsheet software. To instantiate a new [Labeled](https://docs.rubixml.com/latest/datasets/labeled.html) dataset object we'll pass an [NDJSON](https://docs.rubixml.com/latest/extractors/ndjson.html) extractor pointing to the dataset file in our project folder to the `fromIterator()` factory method. The factory uses the last column of the data table for the labels and the rest of the columns for the values of the sample features. We'll call this our *training* set. > **Note:** The source code for this example can be found in the [train.php](https://github.com/RubixML/Iris/blob/master/train.php) file in project root. @@ -39,7 +39,7 @@ $testing = $dataset->randomize()->take(10); ``` ### Instantiating the Learner -Next, we'll instantiate the [K Nearest Neighbors](https://docs.rubixml.com/classifiers/k-nearest-neighbors.html) classifier and choose the value of the `k` hyper-parameter. Hyper-parameters are constructor parameters that effect the behavior of the learner during training and inference. KNN is a distance-based algorithm that finds the *k* closest samples from the training set and predicts the label that is most common. For example, if we choose `k` equal to 5, then we may get 4 labels that are `Iris setosa` and 1 that is `Iris virginica`. In this case, the estimator would predict Iris-setosa because that is the most common label. To instantiate the learner, pass the value of hyper-parameter `k` to the constructor of the learner. Refer to the docs for more info on KNN's additional hyper-parameters. +Next, we'll instantiate the [K Nearest Neighbors](https://docs.rubixml.com/latest/classifiers/k-nearest-neighbors.html) classifier and choose the value of the `k` hyper-parameter. Hyper-parameters are constructor parameters that effect the behavior of the learner during training and inference. KNN is a distance-based algorithm that finds the *k* closest samples from the training set and predicts the label that is most common. For example, if we choose `k` equal to 5, then we may get 4 labels that are `Iris setosa` and 1 that is `Iris virginica`. In this case, the estimator would predict Iris-setosa because that is the most common label. To instantiate the learner, pass the value of hyper-parameter `k` to the constructor of the learner. Refer to the docs for more info on KNN's additional hyper-parameters. ```php use Rubix\ML\Classifiers\KNearestNeighbors; @@ -66,7 +66,7 @@ During inference, the KNN algorithm interprets the features of the samples as sp ![Iris Dataset 3D Plot](https://raw.githubusercontent.com/RubixML/Iris/master/docs/images/iris-dataset-3d-plot.png) ### Validation Score -We can test the model generated during training by comparing the predictions it makes to the ground-truth labels from the testing set. We'll need to choose a cross validation [Metric](https://docs.rubixml.com/cross-validation/metrics/api.html) to output a score that we'll interpret as the generalization ability of our newly trained estimator. The [Accuracy](https://docs.rubixml.com/cross-validation/metrics/accuracy.html) is a simple classification metric that ranges from 0 to 1 and is calculated as the number of correct predictions to the total number of predictions. To obtain the accuracy score, pass the predictions we generated from the model earlier along with the labels from the testing set to the `score` method on the metric instance. +We can test the model generated during training by comparing the predictions it makes to the ground-truth labels from the testing set. We'll need to choose a cross validation [Metric](https://docs.rubixml.com/latest/cross-validation/metrics/api.html) to output a score that we'll interpret as the generalization ability of our newly trained estimator. The [Accuracy](https://docs.rubixml.com/latest/cross-validation/metrics/accuracy.html) is a simple classification metric that ranges from 0 to 1 and is calculated as the number of correct predictions to the total number of predictions. To obtain the accuracy score, pass the predictions we generated from the model earlier along with the labels from the testing set to the `score` method on the metric instance. ```php use Rubix\ML\CrossValidation\Metrics\Accuracy; @@ -90,7 +90,7 @@ Accuracy is 90% ``` ### Next Steps -Congratulations on completing the introduction to machine learning in PHP with Rubix ML using the Iris dataset. Now you're ready to experiment on your own. For example, you may want to try different values of `k` or swap out the default [Euclidean](https://docs.rubixml.com/kernels/distance/euclidean.html) distance kernel for another one such as [Manhattan](https://docs.rubixml.com/kernels/distance/manhattan.html) or [Minkowski](https://docs.rubixml.com/kernels/distance/minkowski.html). +Congratulations on completing the introduction to machine learning in PHP with Rubix ML using the Iris dataset. Now you're ready to experiment on your own. For example, you may want to try different values of `k` or swap out the default [Euclidean](https://docs.rubixml.com/latest/kernels/distance/euclidean.html) distance kernel for another one such as [Manhattan](https://docs.rubixml.com/latest/kernels/distance/manhattan.html) or [Minkowski](https://docs.rubixml.com/latest/kernels/distance/minkowski.html). ## Original Dataset Creator: Ronald Fisher