Deep Learning with Keras - Quick Guides

Deep Learning With Keras Study Notes

Deep Learning essentially means training an Artificial Neural Network (ANN) with a huge amount of data. In deep learning, the network learns by itself and thus requires humongous data for learning. In this study notes, you will learn the use of Keras in building deep neural networks. We shall look at the practical examples for learning.


This study notes is prepared for professionals who are aspiring to make a career in the field of deep learning and neural network framework. This study notes is intended to make you comfortable in getting started with the Keras framework ideas.


Before proceeding with the different types of concepts given in this study notes, we assume that the readers have basic understanding of deep learning framework. In addition to this, it will be very useful, if the readers have a sound knowledge of Python and Machine Learning.

Deep Learning With Keras - Introduction

Deep Learning has become a buzzword in now a days in the field of Artificial Intelligence (AI). For many years, we used Machine Learning (ML) for imparting intelligence to machines. In recent days, deep learning has become more famous due to its supremacy in predictions as compared to traditional ML techniques.

Deep Learning essentially means training an Artificial Neural Network (ANN) with a large amount of data. In deep learning, the network learns by itself and thus requires humongous data for learning. While traditional machine learning is essentially a set of algorithms that parse data and learn from it. They then utilized this learning for making intelligent decisions.

Now, coming to Keras, it is a high-level neural networks API that runs on top of TensorFlow - an end-to-end open source machine learning platform. Utilizing Keras, you easily define complex ANN architectures to experiment on your big data. Keras also supports GPU, which becomes essential for processing huge amount of data and developing machine learning models.

In this study notes, you will learn the use of Keras in building deep neural networks. We shall look at the practical examples for learning. The problem at hand is recognizing handwritten digits using a neural network that is trained with deep learning.

Just to get you more excited in deep learning, below is a screenshot of Google trends on deep learning here −




As you can see from the screenshot, the interest in deep learning is steadily growing over the last several years. There are many areas such as computer vision, natural language processing, speech recognition, bioinformatics, drug design, and so on, where the deep learning has been successfully applied. This study notes will get you quickly started on deep learning.

So keep reading and happy reading!

Deep Learning With Keras - Deep Learning

As said in the introduction, deep learning is a process of training an artificial neural network with a large amount of data. Once trained, the network will be able to give us the predictions on unseen data. Before I go further in explaining what deep learning is, let us quickly study some terms used in training a neural network.

Neural Networks

The concept of artificial neural network was derived from neural networks in our brain. A typical neural network consists of three layers — input, output and hidden layer as shown in the image below.




This is also called a shallow neural network, as it contains only one hidden layer. You add more hidden layers in the above architecture to create a more complex architecture.

Deep Networks

The following image shows a deep network consisting of four hidden layers, an input layer and an output layer.




As the number of hidden layers are added to the network, its training becomes more complex in terms of required resources and the time it takes to fully train the network.

Network Training

After you define the network architecture, you train it for doing certain kinds of predictions. Training a network is a process of finding the proper weights for each link in the network. During training, the data flows from Input to Output layers through different hidden layers. As the data always moves in one direction from input to output, we call this network as Feed-forward Network and we call the data propagation as Forward Propagation.

Activation Function

At every layer, we calculate the weighted sum of inputs and feed it to an Activation function. The activation function brings nonlinearity to the network. It is simply some mathematical function that discretizes the output. Some of the most commonly used activations functions are sigmoid, hyperbolic, tangent (tanh), ReLU and Softmax.


Backpropagation is an algorithm for supervised learning. In Backpropagation, the errors propagate backwards from the output to the input layer. Given an error function, we calculate the gradient of the error function with respect to the weights assigned at each connection. The calculation of the gradient proceeds backwards through the network. The gradient of the final layer of weights is calculated first and the gradient of the first layer of weights is calculated last.

At every layer, the partial computations of the gradient are reused in the computation of the gradient for the previous layer. This is called Gradient Descent.

In this project-based study notes you will define a feed-forward deep neural network and train it with backpropagation and gradient descent methods. Luckily, Keras provides us all high level APIs for defining network architecture and training it using gradient descent. Next, you will learn how to do this in Keras.

Handwritten Digit Recognition System

In this small project, you will apply the methods described earlier. You will create a deep learning neural network that will be trained for recognizing handwritten digits. In any machine learning project, the first challenge is collecting the data. Especially, for deep learning networks, you need humongous data. Fortunately, for the problem that we are trying to solve, somebody has already created a dataset for training. This is called mnist, which is available as a part of Keras libraries. The dataset consists of several 28x28 pixel pictures of handwritten digits. You will train your model on the major portion of this dataset and the rest of the data would be used for validating your trained model.

Project Description

The mnist dataset consists of 70000 pictures of handwritten digits. A few sample pictures are reproduced here for your reference




Each picture is of size 28 x 28 pixels making it a total of 768 pixels of different gray scale levels. Most of the pixels tend towards black shade while only few of them are towards white. We will put the distribution of these pixels in an array or a vector. For example, the distribution of pixels for a typical picture of digits 4 and 5 is shown in the image below.

Each image is of size 28 x 28 pixels making it a total of 768 pixels of different gray scale levels. Most of the pixels tend towards black shade while only few of them are towards white. We will put the distribution of these pixels in an array or a vector. For example, the distribution of pixels for a typical picture of digits 4 and 5 is shown in the image below.




Clearly, you can see that the distribution of the pixels (especially those tending towards white tone) differ, this distinguishes the digits they represent. We will feed this distribution of 784 pixels to our network as its input. The output of the network will consist of 10 categories representing a digit between 0 and 9.

Our network will consist of 4 layers — one input layer, one output layer and two hidden layers. Each hidden layer will contain 512 nodes. Each layer is fully assosiated to the next layer. When we train the network, we will be computing the weights for each connection. We train the network by applying backpropagation and gradient descent that we talked earlier.

Deep Learning With Keras - Setting Up Project

With this background, let us now begin creating the project.

Setting Up Project

We will use Jupyter through Anaconda navigator for our project. As our project uses TensorFlow and Keras, you will need to install those in Anaconda setup. To install Tensorflow, run the following command in your console window:

>conda install -c anaconda tensorflow

To install Keras, use the following command −

>conda install -c anaconda keras

You are now ready to start Jupyter.

Starting Jupyter

When you start the Anaconda navigator, you would see the following opening screen.




Click ‘Jupyter’ to start it. The screen will show up the existing projects, if any, on your drive.

Starting a New Project

Start a new Python 3 project in Anaconda by selecting the following menu option −

File | New Notebook | Python 3

The screenshot of the menu selection is appered for your quick reference −




A new blank project will show up on your screen as shown below −


Change the project name to DeepLearningDigitRecognition by clicking and editing on the default name “Project Name”.

Deep Learning With Keras - Importing Libraries

We first import the different libraries required by the code in our project.

Array Handling and Plotting

As typical, we use numpy for array handling and matplotlib for plotting. These libraries are imported in our project using the following import statements

import numpy as np
import matplotlib
import matplotlib.pyplot as plot

Suppressing Warnings

As both Tensorflow and Keras keep on revising, if you do not sync their appropriate versions in the project, at runtime you would see plenty of warning errors. As they distract your attention from learning, we shall be suppressing all the warnings in this project. This is done with the following lines of code −

# silent all warnings
import os
import warnings
from tensorflow.python.util import deprecation


We use Keras libraries to import dataset. We will use the mnist dataset for handwritten digits. We import the required package using the following statement

from keras.datasets import mnist

We will be defining our deep learning neural network using Keras packages. We import the Sequential, Dense, Dropout and Activation packages for defining the network architecture. We use load_model package for saving and retrieving our model. We also use np_utils for a some utilities that we need in our project. These imports are finished with the following program statements −

from keras.models import Sequential, load_model
from keras.layers.core import Dense, Dropout, Activation
from keras.utils import np_utils

When you run this code, you will see a message on the console that says that Keras uses TensorFlow at the backend. The screenshot at this step is appeared here −




Now, as we have all the imports required by our project, we will proceed to charecterize the architecture for our Deep Learning network.

Deep Learning With Keras - Creating Deep Learning Model

Our neural network model will consist of a linear stack of layers. To define such a model, we call the Sequential function −

model = Sequential()

Input Layer

We characterize the input layer, which is the first layer in our network using the following program statement −

model.add(Dense(512, input_shape=(784,)))

This creates a layer with 512 nodes (neurons) with 784 input nodes. This is depicted in the image below −




Note that all the input nodes are fully connected to the Layer 1, that is each input node is connected to all 512 nodes of Layer 1.

Next, we have to add the activation function for the output of Layer 1. We will use ReLU as our activation. The activation function is added using the following program statement −


Next, we add Dropout of 20% using the statement below. Dropout is a technique used to prevent model from overfitting.


At this point, our input layer is fully defined. Next, we will add a hidden layer.

Hidden Layer

Our hidden layer will consist of 512 nodes. The input to the hidden layer comes from our previously defined input layer. All the nodes are fully connected as in the earlier case. The output of the hidden layer will go to the next layer in the network, which is going to be our final and output layer. We will use the same ReLU activation as for the previous layer and a dropout of 20%. The code for adding this layer is shown here −


The network at this stage can be visualized as follows −




Next, we will add the final layer to our network, which is the output layer. Note that you may add any number of hidden layers using the code similar to the one which you have used here. Adding more layers would make the network complex for training; however, giving a definite advantage of better outputs in many cases though not all.

Output Layer

The output layer consists of just 10 nodes as we want to classify the given pictures in 10 distinct digits. We add this layer, using the following statement −


As we want to classify the output in 10 distinct units, we use the softmax activation. In case of ReLU, the output is binary. We add the activation using the following statement −


At this point, our network can be visualized as appeared in the below image −




At this point, our network model is fully defined in the software. Run the code cell and if there are no errors, you will get a confirmation message on the screen.

Next, we have to compile the model.

Deep Learning With Keras - Compiling The Model

The compilation is performed using one single method call called compile.

model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')

The compile method requires several parameters. The loss parameter is specified to have type 'categorical_crossentropy'. The metrics parameter is set to 'accuracy' and finally we use the adam optimizer for training the network. The output at this step is appeared below −




Now, we are ready to feed in the data to our network.

Loading Data

As said earlier, we will use the mnist dataset provided by Keras. When we load the data into our system, we will split it in the training and test data. The data is loaded by calling the load_data method as appeared −

(X_train, y_train), (X_test, y_test) = mnist.load_data()

The output at this stage looks like the following −




Now, we shall learn the structure of the loaded dataset.

The data that is provided to us are the graphic pictures of size 28 x 28 pixels, each containing a single digit between 0 and 9. We will display the first ten pictures on the console. The code for doing so is given below −

# printing first 10 images
for i in range(10):

plot.imshow(X_train[i], cmap='gray', interpolation='none')
plot.title("Digit: {}".format(y_train[i]))

In an iterative loop of 10 counts, we create a subplot on each iteration and show an image from X_train vector in it. We title each image from the corresponding y_train vector. Note that the y_train vector contains the actual values for the corresponding image in X_train vector. We remove the x and y axes markings by calling the two methods xticks and yticks with null argument. When you run the code, you would see the following output −




Next, we will prepare data for feeding it into our network.

Deep Learning With Keras - Evaluating Model Performance

To evaluate the model performance, we call evaluate method as follows −

loss_and_metrics = model.evaluate(X_test, Y_test, verbose=2)

To evaluate the model performance, we call evaluate method as follows −

loss_and_metrics = model.evaluate(X_test, Y_test, verbose=2)

We will print the loss and accuracy utilizing the following two statements −

print("Test Loss", loss_and_metrics[0])
print("Test Accuracy", loss_and_metrics[1])

When you run the above statements, you would see the following output −

Test Loss 0.08041584826191042
Test Accuracy 0.9837

This shows a test accuracy of 98%, which should be acceptable to us. What it means to us that in 2% of the cases, the handwritten digits would not be classified correctly. We will also plot accuracy and loss metrics to see how the model performs on the test data.

Plotting Accuracy Metrics

We use the recorded history during our training to get a plot of accuracy metrics. The following code will plot the accuracy on each epoch. We pick up the training data accuracy (“acc”) and the validation data accuracy (“val_acc”) for plotting.

plot.title('model accuracy')
plot.legend(['train', 'test'], loc='lower right')

The output plot is shown as below −




As you can see in the picture, the accuracy increases rapidly in the first two epochs, indicating that the network is learning fast. Afterwards, the curve flattens indicating that not too many epochs are required to train the model further. Normally, if the training data accuracy (“acc”) keeps improving while the validation data accuracy (“val_acc”) gets worse, you are encountering overfitting. It indicates that the model is starting to memorize the data.

We will also plot the loss metrics to check our model’s performance.

Plotting Loss Metrics

Again, we plot the loss on both the training (“loss”) and test (“val_loss”) data. This is done utilizing the following code −

plot.title('model loss')
plot.legend(['train', 'test'], loc='upper right')

The output of this code is shown as below −




As you can see in the image, the loss on the training set decreases rapidly for the first two epochs. For the test set, the loss does not decrease at the same rate as the training set, but remains almost flat for multiple epochs. This means our model is generalizing well to unseen data.

Now, we will use our trained model to predict the digits in our test data.

Deep Learning With Keras - Predicting On Test Data

To predict the digits in an unseen data is very easy. You simply need to call the predict_classes method of the model by passing it to a vector consisting of your unknown data points.

predictions = model.predict_classes(X_test)

The method call returns the predictions in a vector that can be tested for 0’s and 1’s against the actual values. This is finished using the following two statements −

correct_predictions = np.nonzero(predictions == y_test)[0]
incorrect_predictions = np.nonzero(predictions != y_test)[0]

At last, we will print the count of correct and incorrect predictions using the following two program statements −

print(len(correct_predictions)," classified correctly")
print(len(incorrect_predictions)," classified incorrectly")

When you run the code, you will get the following output as below −

9137 classified correctly
263 classified incorrectly

Now, as you have satisfactorily trained the model, we will save it for future refrence.

Deep Learning With Keras - Conclusion

Keras provides a high level API for creating deep neural network. In this study notes, you learned to create a deep neural network that was trained for finding the digits in handwritten text. A multi-layer network was created for this purpose. Keras permits you to define an activation function of your choice at each layer. Using gradient descent, the network was trained on the training data. The accuracy of the trained network in predicting the unseen data was tested on the test data. You learned to plot the accuracy and error metrics. After the network is fully trained, you saved the network model for future reference.

Input your Topic Name and press Enter.