---
title: Neural Nets
teaser: 'Neural nets are a common buzzword when it comes to the modern world of programming
  and machine learning. But what is a neural net? Why is it talked about so much?
  How does it work? In this post we are going to learn about all these things and
  more!

  '
tags: neural networks,machine learning
author: Fraser Mince
published_on: 2019-03-06
---

[![xkcd](https://images.thoughtbot.com/blog-vellum-image-uploads/CZHjYUc9QBOTCueCBQIT_machine_learning.png
)](https://xkcd.com/1838/)

Neural nets are a common buzzword when it comes to the modern world of
programming and machine learning. But what is a neural net? Why is it talked
about so much? How does it work? First of all, if you are completely new to
machine learning, I recommend [our introduction to machine
learning](https://thoughtbot.com/blog/what-is-machine-learning) which offers
a great general overview.

A neural net is, at its core, a flexible program that can be adapted to solve
many different problems. There are two phases to using a neural net: first it
is trained to recognize patterns in data to solve a specific problem, and then
it can be given new examples of the problem and come up with the right answers.
It is loosely modeled on the way a human brain works. It consists of layers of
nodes that have numeric weights and biases associated with them. As we feed
data and expected answers into this network, we slowly adjust these weights,
giving us better answers over time until the results converge on something good
enough to use in production. This can be used for a large variety of problems,
anywhere from recognizing dogs in pictures, to training a bot to play a video
game, to learning to translate text. The neural net is behind all of these
technologies.

A simple neural net might look something like this:

![Image of a neural net with an input layer, a hidden layer, and an output
layer](https://images.thoughtbot.com/blog-vellum-image-uploads/QSkHQr5PR0iOwErik6rJ_neural_net.png
"Fully Connected Neural Net")

At a high level, we feed in inputs as i<sub>1</sub>, i<sub>2</sub>, and
i<sub>3</sub>, and the network outputs an answer in the form of o<sub>1</sub>
and o<sub>2</sub>. We call i<sub>1</sub>, i<sub>2</sub>, and i<sub>3</sub> the
input layer and call o<sub>1</sub> and o<sub>2</sub> the output layer. Any
layer in between (in this case layer 2) is called a hidden layer. Each arrow
has a value associated with it called the weight. This number determines the
importance of that arrow and is updated over time. b<sub>1</sub> is the bias
for the hidden layer. Instead of influencing a single arrow as a weight does,
it is applied to everything in the layer. b<sub>2</sub> is also a bias, and it
influences the whole output layer.
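To make the layout concrete, here is a minimal sketch of how the weights and biases in a diagram like this could be stored. The `Layer` class and its `arrow_count` method are hypothetical names invented for this example, the values are placeholders, and the hidden layer size of 4 is an assumption rather than something taken from the diagram.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    # weights[i][j] is the weight on the arrow from node i of the previous
    # layer to node j of this layer.
    weights: list
    # One bias shared by every node in the layer, as described above.
    bias: float

    def arrow_count(self):
        """Number of arrows (weights) feeding into this layer."""
        return sum(len(row) for row in self.weights)


# Placeholder weights of 0.0. The hidden layer size (4) is an assumption.
hidden = Layer(weights=[[0.0] * 4 for _ in range(3)], bias=0.0)  # 3 inputs -> 4 hidden
output = Layer(weights=[[0.0] * 2 for _ in range(4)], bias=0.0)  # 4 hidden -> 2 outputs

print(hidden.arrow_count() + output.arrow_count())  # 20
```

Every arrow gets its own number, while each layer here carries a single bias, which is why the parameter count is dominated by the weights.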

We initialize each of these arrows with a random weight. We then feed in
training data, multiplying the inputs by the weights to scale them by
importance. For each node, we sum the products arriving along its incoming
arrows. We take the result of this and treat it as the probability of a certain
result. So in the diagram above, o<sub>1</sub> could be the probability that a
picture contains a dog and o<sub>2</sub> could be the probability that a
picture contains a cat. That way we could see if an image contains a cat, a
dog, or both. We then compare the output to the expected answer and calculate
the error between the expected and the actual. We can then use this error to
determine how the weights should be adjusted. This is done through a process
called backpropagation. It treats every node in the neural net as a
mathematical function and uses the chain rule to take the partial derivative of
the error with respect to each weight and bias, which tells us how much effect
that value has on the overall error. Using backpropagation we can see in which
direction (up or down in this case) we should move each of the weights and
biases. A practical example of this in action might look like this:

![Shows a neural net finding divisions between quadrants of differently colored
data.](https://images.thoughtbot.com/blog-vellum-image-uploads/LwQl33tRk6LOvICcLzJy_cropped-playground.gif
"Playground Gif")

Notice that given an input of a horizontal divider and a vertical divider, the
network is able to fit the expected shape over time. The background color of
the output box represents what the network is predicting, and the dots
themselves are the data that we are feeding in. Also notice that the thickness
of the connecting lines throughout the net changes over time. This represents
how the weights are changing as we feed data in. Blue represents positive
weights and orange represents negative ones. You can see that in the hidden
layer, different shapes are being looked for, each with a different weight
attached. Then, by combining those shapes with different weights, we can get
new shapes. In this example we have several negative and positive lines of
roughly equal strength going into the final layer, and the top output layer
also has a strong positive and three strong negatives. However, we can also
have a more straightforward example like this:

![Shows a straight line being placed to divide two groups of data of different
kinds.](https://images.thoughtbot.com/blog-vellum-image-uploads/BZWK4HhQREaj9a278ec2_cropped-straightline.gif
"Straightline Playground Gif")

Here we reach the result much faster and it’s clear that some of the shapes are
being relied upon much more than the others. Feel free to play around with this
[here](https://playground.tensorflow.org).
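The training cycle described earlier (forward pass, error, backpropagation, weight adjustment) can be sketched for a single node. This is a minimal illustration rather than the full algorithm: it assumes one input, one weight, one bias, a logistic activation, a squared-error loss, and a learning rate of 0.5. The function name `train_step` and all starting values are made up for the example.

```python
import math


def logistic(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def train_step(w, b, x, target, learning_rate=0.5):
    """One forward pass and one gradient update for a single node.

    Assumes a squared-error loss: error = 0.5 * (prediction - target) ** 2.
    """
    # Forward pass: weighted input plus bias, then the logistic function.
    z = w * x + b
    prediction = logistic(z)
    error = 0.5 * (prediction - target) ** 2

    # Backward pass: the chain rule, one factor per step of the forward pass.
    d_error_d_prediction = prediction - target
    d_prediction_d_z = prediction * (1.0 - prediction)
    grad_w = d_error_d_prediction * d_prediction_d_z * x    # dz/dw = x
    grad_b = d_error_d_prediction * d_prediction_d_z * 1.0  # dz/db = 1

    # Move each parameter against its gradient to reduce the error.
    return w - learning_rate * grad_w, b - learning_rate * grad_b, error


# Hypothetical starting values; a real network would start from random ones.
w, b = 0.8, -0.2
errors = []
for _ in range(200):
    w, b, error = train_step(w, b, x=1.0, target=0.0)
    errors.append(error)

print(errors[0] > errors[-1])  # True: the error shrinks as we train
```

The sign of each gradient is what tells us whether to move a weight up or down, and the learning rate controls how far we move it on each step.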

Finally, I want to use the steps described above and walk through a simple and
slightly contrived example. Let’s say we want to train a neural net to tell if
a number is even. It could look like this:

![A simpler neural net with 1 and 0 as the inputs](https://images.thoughtbot.com/blog-vellum-image-uploads/72XnWgzOQoK4YaHKQreN_two_neural_net.png
"Second Neural Net")

The inputs are the binary representation of the number. For the output, a 1
represents true (the number is even) and a 0 represents false (the number is
odd). So if the answer is closer to 1, the number has a higher probability of
being even, and if it’s closer to 0, it has a higher probability of being odd.
Let’s examine what a single forward pass of this network looks like. We start
by randomly initializing the weights and biases of the neural net.
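As a rough sketch, the training data and starting values for this example could be set up as follows. The helper `to_bits`, the bit order (most significant bit first), the two-bit width, the `-1` to `1` range for the random values, and the fixed seed are all assumptions made for illustration.

```python
import random


def to_bits(number, width=2):
    """Return the binary digits of `number`, most significant bit first."""
    return [(number >> shift) & 1 for shift in reversed(range(width))]


# Each example pairs the input bits with the expected answer:
# 1 if the number is even, 0 if it is odd.
training_data = [(to_bits(n), 1 if n % 2 == 0 else 0) for n in range(4)]
print(training_data)
# [([0, 0], 1), ([0, 1], 0), ([1, 0], 1), ([1, 1], 0)]

# Random starting weights and biases for a network with two inputs,
# two hidden nodes, and one output (assumed layout).
random.seed(0)  # fixed seed only so the sketch is repeatable
hidden_weights = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
output_weights = [random.uniform(-1, 1) for _ in range(2)]
hidden_bias = random.uniform(-1, 1)
output_bias = random.uniform(-1, 1)
```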

![Same neural net as above but now with weights and biases](https://images.thoughtbot.com/blog-vellum-image-uploads/IDVnePgQHKNfWWiIM1yA_two_neural_net_weights.png
"Weighted Neural Net")

The actual equation for calculating the hidden layer with these weights is:

![Equation for hidden layer values in linear algebra form. We take the inputs
and multiply them by the hidden weights and add the bias.](https://images.thoughtbot.com/blog-vellum-image-uploads/XWR1rkgDRvOsM5QofTBL_hidden.png
"Hidden Equation")

Here the inputs are transposed, so they form a 1 x 2 matrix instead of a 2 x 1
vector. Worked out for each hidden node, the equation looks like this:

![Equation for hidden 1 extracted from the linear algebra equation](https://images.thoughtbot.com/blog-vellum-image-uploads/r2vJoFEQLWBwXWoCFfbW_h1.png
"Hidden 1") ![Equation for hidden 2 extracted from the linear algebra equation](https://images.thoughtbot.com/blog-vellum-image-uploads/gcIvwBhNTmCjqc7H28Ld_h2.png
"Hidden 2")

Once we have calculated these values, we want to scale them to ensure that they
will be between zero and one. This is not technically necessary here because
our inputs will always be either 0 or 1, but for teaching purposes I will walk
through it. In this case we will be using the logistic function, which looks
like this:

![Logistic Function](https://images.thoughtbot.com/blog-vellum-image-uploads/iNNHIaQvRu9acEH3WA8a_logistic.png
"Logistic")

We use this function to scale h<sub>1</sub> and h<sub>2</sub>:

![Result of applying the logistic function to the h1 result](https://images.thoughtbot.com/blog-vellum-image-uploads/nRLxzz65TR60vk8mudmE_logistic_h1.png
"Logistic H1")

![Result of applying the logistic function to the h2 result](https://images.thoughtbot.com/blog-vellum-image-uploads/l1rH130RMmHBrKhlrRBk_logistic_h2.png
"Logistic H2")

We then use the scaled values of h<sub>1</sub> and h<sub>2</sub> as the inputs
to the output layer and repeat the same process to find o<sub>1</sub>.
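Put together, the forward pass for a network of this shape might be sketched like this. The weights and biases below are hypothetical and are not the values shown in the images in this post, and the `layer` helper is a name invented for the example. Each layer uses a single shared bias, as described earlier.

```python
import math


def logistic(x):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, bias):
    """Weighted sum plus shared bias for each node, scaled by the logistic function.

    weights[i][j] is the weight from input i to node j of this layer.
    """
    sums = [
        sum(value * row[j] for value, row in zip(inputs, weights)) + bias
        for j in range(len(weights[0]))
    ]
    return [logistic(s) for s in sums]


# Hypothetical weights and biases, chosen only for illustration.
hidden_weights = [[0.5, -0.25],
                  [0.75, 0.125]]
hidden_bias = 0.25
output_weights = [[0.5],
                  [-0.5]]
output_bias = 0.0

inputs = [0, 1]
h1, h2 = layer(inputs, hidden_weights, hidden_bias)
(o1,) = layer([h1, h2], output_weights, output_bias)

print(0.0 < o1 < 1.0)  # True: the logistic function keeps the output in range
```

Because the same `layer` function handles both steps, adding more hidden layers would only mean calling it again with another set of weights.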

![Final result](https://images.thoughtbot.com/blog-vellum-image-uploads/vXlrK7SNS3e5nbFQY5OD_output.png
"Output")

Once again we transpose the result from the previous step, and we end up with
0.73681285. After applying the logistic function to that value we get
0.676298522. This is our prediction. Because 0.676298522 is closer to 1 than to
0, we are predicting, based on our randomized weights, that the number one is
even. Note that the closer the output is to 1 or 0, the stronger our claim
about that answer, so this is not a strong prediction. It is also obviously not
correct. However, we have made a prediction, and this concludes the forward
pass of our neural net. From here we calculate the error given the expected
answer and the prediction, and adjust the weights based on that error. I will
not delve into the backward pass in this article, but if you are interested,
[this](https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/)
is a great place to get started.
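As a quick check of the final step, the snippet below applies the logistic function to the 0.73681285 from the walkthrough and then computes an error against the expected answer. The squared-error formula is an assumption made for illustration; the post does not say which error function is used.

```python
import math


def logistic(x):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


weighted_sum = 0.73681285  # output-layer value from the walkthrough
prediction = logistic(weighted_sum)
print(round(prediction, 4))  # 0.6763

expected = 0  # the number one is odd
predicted_even = prediction > 0.5
print(predicted_even)  # True, which is the wrong answer for the number one

# Assumed squared-error loss; a larger value means a worse prediction.
error = 0.5 * (expected - prediction) ** 2
```

This error is the starting point for the backward pass: the further the prediction is from the expected answer, the larger the adjustments to the weights.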

Included are some links that are great resources for learning more about
machine learning:

[A Neural Network Playground](https://playground.tensorflow.org)

[A Step By Step Backpropagation Example](https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/)

[Under The Hood Of Neural Network Forward Propagation - The Dreaded Matrix Multiplication](https://towardsdatascience.com/under-the-hood-of-neural-network-forward-propagation-the-dreaded-matrix-multiplication-a5360b33426)

[The fast.ai Course](https://course.fast.ai)