Making a neural network with Numpy



Welcome, in this blog I will teach you to program a neural network in a few code lines using only the Numpy library.

It is recommended that you have previous basic knowledge of python, numpy and a bit of lieal algebra.

What is NumPy?

NumPy is a library for the Python programming language that supports creating large multidimensional arrays and vectors, along with a large collection of high-level mathematical functions to operate on them.
More info Here.



  • Python 3
    First we need to install python, any version> 3.5 will do, in this case we will use Python 3.8
    On another occasion I will blog on how to install python correctly, for now you should find out how to do it.
  • A code editor
    You can use the one you like the most, here some examples:


We will use a virtual environment to facilitate things using virtualenv that is already included in the python installation.

For create a virtual environment run this in your terminal.

# We first create our project folder.
mkdir neural_network 

# We move to the folder.
cd neural_network

# This creates the virtual environment in my_env folder
python3.8 -m venv my_env 

For activate the virtual environment run.

source my_env/bin/activate

Now we are going to install the Numpy library in the virtual environment.

# upgrade the pip
pip install -u pip

#install numpy library
pip install numpy


Neural network and perceptron

A neural network is made up of functions like artificial neurons or perceptrons.RedNeuronalEsquema1
A perceptron is composed of a linear function, and an activation function. neurona
Mathematically the perceptron would look like this:

  • y = f(wx + b)
  • "x" is the data or signal it receives from another neuron
  • "w" is the weight that the signal received will have.
  • "b" is the bias of the neuron that corrects the prediction, in a linear equation "y" = "b" when all "x" is equal to 0.
    In our implementation we will calculate the predictions in this way.
  • "f()", is the activation function, then we explain about this

The variables "w" and "b" control the result of the output "y". See this example


  • "X,W,B" are matrixs
  • "m" is the input examples number
  • "i" is the inputs number
  • "n" is the outputs number


Open your code editor, create the file "" and write this.

class Layer:
    def __init__(self, neurons):
        self.neurons:int = neurons # the layer neurons
        self.shape:tuple = None # input shape
        self.W = None # the wight matrix, shape (inputs, neurons)
        self.B = None # the bias matrix,  shape (1, neurons)

    def forward(self, X): # output shape (input_examples, neurons)
        if self.shape is None:
        return np.matmul(X, self.W) + self.B # matrix multiplication

    def init_layer(self, X):
        self.shape = (X.shape[-1], self.neurons) # X.shape (input_examples, input_data_size)
        # create initial parameters
        self.W = np.random.rand(*self.shape) * 2 - 1
        self.B = np.random.rand(1, self.neurons) * 2 - 1

    def __call__(self, X, training=False):
        Y = self.forward(X)
        if training: # if the model is training
            self.X = X
            self.Y = Y
        return Y

This is the implementation of a layer, a layer is a set of neurons receiving the same inputsRedNeuronalEsquema1-1
But, this is layers of linear functions, only it can do linear regressions.
For make a neural network, the layers need activation functions. we will implement it.

Activation function

First we implement the activation functions, this are the sigmoid activation and tanh activation



We implement it:

# a dict for select a activation function
activations = {
    'sigmoid': {
        'function': lambda X: 1 / (1 + np.e**(-X)),
        'derivative': lambda X: np.e**(-X) / ((1 + np.e**(-X))**2)
    'tanh': {
        'function': lambda X: np.tanh(X),
        'derivative': lambda X: 1 - np.tanh(X) ** 2
    'relu': {
        'function': lambda X: ((X / (X.max() + 1)) * (X > 0)),
        'derivative': lambda X: 1 * (X >= 0)

Now, the Layer with activation function.

class Dense(Layer): # this inherits from the Layer class
    def __init__(self, neurons, activation='sigmoid'): # sigmoid activations for default
        super().__init__(neurons) # init the layer 
        self.activation:dict = activations[activation] # get specific activations functions

    def forward(self, X):
        Y = super().forward(X)
        return self.activation['function'](Y) # calculate with activations function
Neural Network Model

Finally, the neural network.

class Model:
    def __init__(self, layers:list=[]):
        self.layers:list = layers

    def add(self, layer: Layer):# add a Layer

    def forward(self, X, training=False): # forward in all layers
        Y = X
        for layer in self.layers:
            Y = layer(Y, training)
        return Y

    def __call__(self, X, training=False):
        return self.forward(X, training)

Testing our model.

In other file "" copy this:

import unittest
import numpy as np
from neural_network import Dense, Layer, Model

def testing_model_outputs(model: Model, condition):
    examples: int = np.random.randint(512) + 1  # random number of examples between 1-512
    inputs: int = np.random.randint(256) + 1  # random number of inputs between 1-256
    X = np.random.rand(examples, inputs)
    Y = model(X)
    assertion = condition(X, Y)
    return assertion

def add_many_layers_to_model(model: Model, layer:Layer):
    layers = np.random.randint(10) + 1  # random number of layers between 1-10
    for _ in range(layers):
        neurons = np.random.randint(128) + 1

class TestNeuralNetwork(unittest.TestCase):
    def test_model_without_layers(self):
        The model has no intermediate layers, the input and output must be the same 
        because the data was processed
        assert testing_model_outputs(
            # input and output are the same, so their subtraction is 0
            condition=lambda X, Y: np.mean(Y - X) == 0

    def test_model_number_of_examples_constant(self):
        The input and the output are different but must have the same number of examples.
        If the input is from (m, i) the output must be from (m, n)
        model = Model([])
        add_many_layers_to_model(model, Layer) # let's pass the Layer class as an argument
        assert testing_model_outputs(
            # X.shape[0] is the number of examples
            condition=lambda X, Y: Y.shape[0] == (X.shape[0])

    def test_model_output_shape(self):
        if you have an input with the form (m, i) and the last layer has "n" neurons, 
        the output must have the form (m, n)
        model = Model([])
        add_many_layers_to_model(model, Layer) # let's pass the Layer class as an argument
        last_layer_neurons = 3
        assert testing_model_outputs(
            # X.shape[0] is the number of examples of the input
            condition=lambda X, Y: Y.shape == (X.shape[0], last_layer_neurons)
if __name__ == "__main__":

Don't panic!, this is just a test to see if the model works.
In this file 3 tests are carried out:

  • test_model_without_layers: the model has no layers, the input is not rendered, so the model returns the input as output.

  • test_model_number_of_examples_constant: the models have a random number of layers, but the number of examples must be the same, for example, if it receives 4 mathematical exercises it will deliver 4 mathematical exercises

  • test_model_output_shape: in the previous test we saw that the number of examples is constant, but each example must have the same number of outputs as the number of neurons in the last layer.




Ran 3 tests in 0.009s

This means it works. As an exercise you can add a method to the TestNeuralNetwork class and create a model with Dense layers, the model must return random values between 0-1
You can also initialize the Dense layers with a second argument "tanh" if the last layer has this parameter it will return values between -1 to 1

Training the model

We have the neural network implemented, now we need to train it.

Calculation of losses

We create a new file in the same folder "" and we start typing this:

import time
import numpy as np

# we import our model
from neural_network import Model

def mse(Y, target):
    return np.mean((target - Y) ** 2, axis=1)

def d_mse(Y, target):
    return (2 * (Y - target)) / Y.shape[1]

"mse" mean square error, the error calculation model

  • "m", is the output examples number.
  • "t", targets
  • "y", output prediction for the model

Calculation of gradients

Now type this in the same file:

def train_step(model: Model, X: np.ndarray, target: np.ndarray, lr: float):
Y = model(X, training=True) # model prediction
    acc = np.int64(target.argmax(axis=1) == Y.argmax(axis=1)).mean() # accuracy
    loss: np.ndarray = mse(Y, target) # prediction loss
    gradients = [] # gradients 
    for layer in reversed(model.layers): # Back-propagation
        derivative = layer.activation['derivative'] # activation derivative
        if layer == model.layers[-1]:
            fault: np.ndarray = d_mse(Y, target)
        # calculating the fault of the neuron
        fault = fault * derivative(layer.Y)
        # Gradient calculation
        gradient: np.ndarray = np.matmul(fault.T, layer.X).T
        b_gradient: np.ndarray = np.matmul(
            fault.T, np.ones((X.shape[0], 1))).T
        gradients.append((gradient, b_gradient)) # save to gradients
        # update fault for back layer
        fault: np.ndarray = np.matmul(fault, layer.W.T)

    for layer, gradient in zip(reversed(model.layers), gradients):
        gradient, b_gradient = gradient
        layer.W = layer.W - lr * gradient # Update weights
        layer.B = layer.B - lr * b_gradient # Update bias

    return loss.mean(), acc

Important functions and variables:

  • gradient, is a vector that indicates where the new value of the weights "W" should move to
  • b_gradient, gradients for the bias "B"
  • update_parameters, this functions update the "W" and "B using the gradients"
  • lr learning rate, is the step size that the weights "W" travel to approach the point of least loss

Gradient calculations are very complex. It is calculated in this way.


Training cycles

Type this code:

def fit(model: Model, X: np.ndarray, targets: np.ndarray,
        batch_size: int, epochs: int, learning_rate: float = 0.0001):
    # X and targets must have the same number of examples
    assert X.shape[0] == targets.shape[0]
    for epoch in range(epochs):

        last_examples = len(X) % batch_size
        # batching the inputs
        x = list(X[last_examples:].reshape(
            len(X) // batch_size, -1, X.shape[-1]))
        if X[:last_examples].size > 0:
        # batching the targets
        target = list(targets[last_examples:].reshape(
            len(X) // batch_size, -1, targets.shape[-1]))
        # metrics variables
        error = None
        accuracy = None
        start = time.time()
        # epoch iteration
        print(f'Epoch: {epoch+1}:')
        for i, inp, tar in zip(range(len(x)), x, target):
            loss, acc = train_step(model, inp, tar, learning_rate)
            # metrics calculation
            if error is None and accuracy is None:
                error = loss
                accuracy = acc
                error = (error + loss) / 2
                accuracy = (accuracy + acc) / 2
            # printing metrics
            msg = f'batchs: {i+1}/{len(x)}, Error: {error} Acc: {acc}'
            print("", end='\r')
            print(msg, end='')
        print("", end='\r')
        print(msg + f' Time: {time.time() - start}')

Here we have the function "**fit ** ", here we define the training cycles that our model will have, and some parameters to adjust.

  • X is a matrix with the shape (m,i) contains the examples for the training of the model.
  • targets are the expected outputs with the shape (m, n), "n" is the number of the outputs.
  • batch_size is the size of the batch in which the examples are grouped, the data can be grouped in small batches to reduce the computational cost of training and help the model to generalize the information, if its value is 1 it will be trained the model sequentially.
  • epochs is the number of times the model will cycle through all the examples
  • learning_rate is already explaining

Testing our model with real data

Now we are going to use a jupyter notebook, if you aren't using vscode, you can code in a
we create a file "test_model.ipynb" if you are using vscode or "" if you use another code editor.
first we import what we need
We Install tensorflow to use a dataset and matplotlib to visualize the data.

pip install tensorflow matplotlib
# libraries
import numpy as np
from matplotlib import pyplot as plt
# Our code
from neural_network import Model, Dense
from training_model import fit, d_mse
# mnist dataset from tensorflow
from tensorflow.keras.datsets import mnist

Now we load the dataset

# Load dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

Viewing the dataset.

plt.figure(figsize=(15, 8))
for i, img, label in zip(range(10), x_train[:10], y_train[:10]):
    plt.subplot(2, 5, i + 1)

If you managed to run the code until this point you should be able to view images of handwritten numbers.
Now we transform the data so that the model can process it.

x_train = x_train.reshape(len(x_train), -1) # shape (60000, 28*28)
x_train = x_train / 255 # normalizing data to values between 0-1
zeros = np.zeros(
    shape=(y_train.size, int(y_train.max() + 1)) # zeros matrix shape (m, i)
zeros[np.arange(len(y_train)), y_train] = 1 # one hot encodig
y_train = zeros # shape (60000, 10)

x_test = x_test.reshape(len(x_test), -1) # shape (10000, 28*28)
x_test = x_test / 255 # normalizing data to values between 0-1
zeros = np.zeros(
    shape=(y_test.size, int(y_test.max()) + 1) # zeros matrix shape (m, i)
zeros[np.arange(len(y_test)), y_test] = 1 # one hot encodig
y_test = zeros # shape (10000, 10)

We create the model:

model = Model([])
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='sigmoid')) # output layer

The model has 2 layers, the hidden layer that has 128 neurons with the relu function and the output layer that has 10 neurons with the sigmoid function, the output layer will tell us what number it sees in each image.
Note that the number of neurons in the last layer is equal to the number of numerical characters that we have available in the dataset [0 1 2 3 4 5 6 7 8 9].
Now to train the modelĀ°

fit(model, x_train, y_train, batch_size=256, epochs=5, learning_rate=0.01)

Console output:
I explain:

  • x_train is the input data for the model.
  • y_train is the target, the expected output.
  • batch_size the model loops through the dataset in batches of 256 samples.
  • epoch 5 laps.
  • learning_rate 0.01, The step size of the weights is small enough to avoid sudden changes in training.
    Apparently the model predicts with an acc = 0.9, that is, an accuracy of 90%. We will see if this is true using the test data.
    Type this in the file:
predictions = model(x_test)
ACC = np.int64(y_test.argmax(axis=1) == predictions.argmax(axis=1)).mean()
print (f'accuracy: {ACC}')

If the console output looks like this accuracy: 0.9063, the model was training correctly.
Try it with other values in the parameters of the fit function to see what results you can get. You can also try increasing more layers to the model with the number of neurons you want, the possible values for the "activation" parameter are the strings: "sigmoid ","tanh" and "relu".
Now we are going to visualize the test results with the test data
In the same file write:

def plot_image(i, predictions_array, true_label, img):
  predictions_array, true_label, img = predictions_array, true_label[i], img[i]
  predicted_label = np.argmax(predictions_array)
  true_label = np.argmax(true_label)
  if predicted_label == true_label:
    color = 'blue'
    color = 'red'
  plt.xlabel("{} {:2.0f}% ({})".format(predicted_label,

def plot_value_array(i, predictions_array, true_label):
  predictions_array, true_label = predictions_array, true_label[i]
  thisplot =, predictions_array, color="#777777")
  plt.ylim([0, 1])
  predicted_label = np.argmax(predictions_array)
  true_label = np.argmax(true_label)


num_rows = 5
num_cols = 3
num_images = num_rows*num_cols
plt.figure(figsize=(2*2*num_cols, 2*num_rows))
for i in range(num_images):
  plt.subplot(num_rows, 2*num_cols, 2*i+1)
  plot_image(i, predictions[i], y_test, x_test.reshape(-1,28,28))
  plt.subplot(num_rows, 2*num_cols, 2*i+2)
  plot_value_array(i, predictions[i], y_test)

You should see a result like this.

That's it, I hope you have learned something from this blog.


  • Willy samuel paz Colque

Source code