restructuring of ANN for flexibility and performance gains #2
Comments
Hey @beeedy, thank you so much for your advice. I truly believe that the method you advised for simplifying the NN using matrix and vector operations is more feasible than the solution MLKit currently provides, which is why I will be taking steps to overhaul and revise the NN so that the Metal implementation becomes much easier later down the road. Thank you again. I have just created a new branch if you are interested in contributing or checking out updates for the new NN class. I'll be working on it very soon.
@iamtrask has a good example of a 3-layer network using this implementation in Python, which I will paste below. Here is the article the example was pulled from: http://iamtrask.github.io/2015/07/12/basic-python-network/ I will take a stab at helping out if/when I am able to find the time to do so!
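(The pasted snippet did not survive in this copy of the thread; the following is a sketch along the lines of the three-layer network from that article, using the tutorial's `l0`/`l1`/`l2` and `syn0`/`syn1` naming convention.)

```python
import numpy as np

# Sigmoid activation; when deriv=True, x is assumed to already be the sigmoid's output.
def sigmoid(x, deriv=False):
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

# Toy dataset: 4 samples, 3 input features, 1 binary output.
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
y = np.array([[0], [1], [1], [0]])

np.random.seed(1)

# Weight matrices: input -> hidden (3x4) and hidden -> output (4x1).
syn0 = 2 * np.random.random((3, 4)) - 1
syn1 = 2 * np.random.random((4, 1)) - 1

for _ in range(60000):
    # Forward propagation through layers 0, 1, and 2.
    l0 = X
    l1 = sigmoid(l0.dot(syn0))
    l2 = sigmoid(l1.dot(syn1))

    # Back-propagate the error, scaled by the slope of the activation.
    l2_delta = (y - l2) * sigmoid(l2, deriv=True)
    l1_delta = l2_delta.dot(syn1.T) * sigmoid(l1, deriv=True)

    # Update weights with the dot product of each layer's input and its delta.
    syn1 += l1.T.dot(l2_delta)
    syn0 += l0.T.dot(l1_delta)
```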
@Somnibyte I looked over your changes during my lunch and they are looking good! When you say you have a NN architecture working based off that tutorial, does that mean you have handwriting recognition working? If so, that would be an impressive demonstration of the library's capabilities and would be worth adding as an example! I might make a PR over the weekend if I can find some time to implement some small changes. A ReLU and a leaky ReLU would be good activation functions to add; you can read more about them here:

One thing I noticed is that you have weights and biases somewhat separated. There is nothing wrong with this at all, but I just wanted to bring your attention to one way this is commonly implemented. What you will sometimes see is people adding an extra neuron to each layer that is fixed at a constant value (usually 1.0). This neuron never takes any inputs (or all of its input weights are stuck at 0.0) and always outputs a value of 1.0 to the next layer. This 'special' neuron effectively wraps up all the functionality of a bias without having to explicitly train weights and biases separately. No reason to go back and change this unless you feel so inclined; I just wanted to bring it to your attention since you will most likely run into it implemented this way at some point :)

I will hopefully take some time this evening to play around more in-depth with the changes! All in all, very good work!
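For reference, a minimal sketch of ReLU and leaky ReLU, together with the derivatives used during backpropagation; the function names here are illustrative and not part of MLKit's API.

```python
import numpy as np

def relu(x):
    # max(0, x), applied element-wise.
    return np.maximum(0.0, x)

def relu_deriv(x):
    # Slope is 1 where the input was positive, 0 otherwise.
    return (x > 0).astype(float)

def leaky_relu(x, alpha=0.01):
    # Like ReLU, but lets a small gradient (alpha) through for negative inputs.
    return np.where(x > 0, x, alpha * x)

def leaky_relu_deriv(x, alpha=0.01):
    return np.where(x > 0, 1.0, alpha)
```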
@beeedy Thank you for your feedback! I'm planning on working on an MNIST digit recognition example where MLKit provides a separate example project: users can draw digits and the app will try to predict which digit was drawn. Currently, this branch does not include that handwriting example. Thank you for the links on ReLU; I'll definitely give those a good read later today. Also, good note on the bias implementation. What you described is (sort of) what I did in the last version of MLKit, except the programmer had to input the value of 1 themselves. This version was based on how the tutorial handled bias: the user can manually set the bias values to 1, but to make it easier I'll try to package the weights and bias together in an upcoming update.
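As an illustration of packaging weights and bias together (the numbers below are arbitrary), appending a constant-1 "bias neuron" output to the input and a corresponding column to the weight matrix gives the same result as keeping a separate bias vector, so the bias gets trained like any other weight:

```python
import numpy as np

# Hypothetical layer with separate weights (m x n) and bias (m,).
W = np.array([[0.2, -0.5],
              [0.7,  0.1]])
b = np.array([0.3, -0.4])
x = np.array([1.5, -2.0])

# Separate weights and bias:
out_separate = W.dot(x) + b

# Equivalent: append the bias as an extra column of W and a fixed 1.0 to the input.
W_aug = np.hstack([W, b.reshape(-1, 1)])   # m x (n + 1)
x_aug = np.append(x, 1.0)                  # the fixed "bias neuron" output

out_augmented = W_aug.dot(x_aug)

assert np.allclose(out_separate, out_augmented)
```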
First off, what you are working on here is immensely impressive. I just wanted to point out some things I have learned implementing NNs myself and pass along any possible insight.
If I understand the current structure, you have an overarching NN class that holds references to a layer class, which in turn holds references to your neurons. A possible simplification you may want to look into is to eliminate the neuron class entirely and instead represent each layer in the network as a single 2D matrix/tensor of dimension m×n, where m is the number of neurons in the layer and n is the number of neurons in the previous layer. With this approach you can compute forward propagation at each layer by taking that layer's m×n matrix and dotting it with the previous layer's output vector, producing an output vector that either feeds into the next layer or serves as the output of the network as a whole.
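A minimal sketch of that idea in NumPy, assuming sigmoid activations and arbitrary layer sizes:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Each layer is just an m x n weight matrix, where n is the size of the
# previous layer's output and m is the number of neurons in this layer.
layers = [
    np.random.randn(4, 3),  # hidden layer: 4 neurons fed by 3 inputs
    np.random.randn(1, 4),  # output layer: 1 neuron fed by 4 hidden outputs
]

def forward(x, layers):
    # Dot each weight matrix with the previous layer's output vector.
    activation = x
    for W in layers:
        activation = sigmoid(W.dot(activation))
    return activation

output = forward(np.array([0.5, -1.2, 0.3]), layers)
```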
This approach not only simplifies your forward propagation, it also simplifies your back propagation, since you can use the same dot product to go back through the layers and calculate the amount by which each weight should be adjusted. If you have a layer's input vector, its output error as a vector, and a vector containing the derivative of the activation function at each output, then the weight adjustment is the dot (outer) product of the input vector with the element-wise product of the output error vector and the derivative vector. Hopefully I explained that well enough; apologies if not :(
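A small sketch of that update for a single layer, under the same sigmoid assumption as above (shapes and numbers chosen purely for illustration):

```python
import numpy as np

def sigmoid_deriv_from_output(out):
    # Derivative of the sigmoid expressed in terms of its own output.
    return out * (1.0 - out)

# Quantities for one layer:
layer_input = np.array([0.5, -1.2, 0.3])   # n inputs to the layer
layer_output = np.array([0.8, 0.3])        # m outputs of the layer
output_error = np.array([0.1, -0.4])       # error attributed to each output

# Scale the error by the slope of the activation at each output...
delta = output_error * sigmoid_deriv_from_output(layer_output)

# ...then the weight adjustment is the outer product of that delta with the
# layer's input, giving one correction per weight (an m x n matrix).
weight_adjustment = np.outer(delta, layer_input)

# With a learning rate, the update would be:
learning_rate = 0.1
# W += learning_rate * weight_adjustment
```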
You have alluded to a desire to implement some performance increases using Metal down the road, and I feel you may also find that dot products are ideal for parallelization on a GPU.
Anyhow, feel free to ignore this; I just wanted to pass it along. Excited to see where this project ends up!