There’s not much code required, but we’ll step through it slowly so you can build your own models in the future.

  1. Load Data.
  2. Define Keras Model.
  3. Compile Keras Model.
  4. Fit Keras Model.
  5. Evaluate Keras Model.
  6. Tying it together.
  7. Make Predictions.

This Keras tutorial has a few requirements:

  1. You have Python 2 or 3 installed and configured.
  2. You have SciPy (including NumPy) installed and configured.
  3. You have Keras and a backend (Theano or TensorFlow) installed and configured.

1. Load Data

The first step is to define the functions and classes we intend to use in this tutorial.

We will load our dataset using the NumPy library, and we will use two classes from the Keras library to define our model.

The required imports are listed below.
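As a minimal sketch, assuming Keras is installed with either the TensorFlow or Theano backend (on recent setups the same classes are also available under tensorflow.keras):

```python
# first neural network with Keras: the required imports
from numpy import loadtxt            # loads the CSV dataset as a matrix of numbers
from keras.models import Sequential  # the model type: a linear stack of layers
from keras.layers import Dense       # fully connected layers
```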

We can now load our dataset.

We will use the Pima Indians onset of diabetes dataset in this Keras tutorial. This is a standard machine learning dataset from the UCI Machine Learning repository. It describes patient medical record data for Pima Indians and whether they had an onset of diabetes within five years.

As such, this is a binary classification problem (onset of diabetes as 1 or not as 0). All the input variables describing each patient are numerical. This makes it easy to use directly with neural networks, which expect numerical input and output values, and makes it ideal for our first Keras neural network.

The dataset is available from here:

Download and place the dataset in your local working folder, where your Python file is located.

Save it with the filename:

pima-indians-diabetes.csv

We can now load the file as a matrix of numbers using the NumPy loadtxt() function.

There are eight input variables and one output variable (the last column). We will learn a model to map rows of input variables (X) to an output variable (y), which we often summarize as y = f(X).

The variables can be summarized as follows:

Input Variables (X):

  1. Number of times pregnant
  2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
  3. Diastolic blood pressure (mm Hg)
  4. Triceps skin fold thickness (mm)
  5. 2-Hour serum insulin (mu U/ml)
  6. Body mass index (weight in kg/(height in m)^2)
  7. Diabetes pedigree function
  8. Age (years)

Output Variables (y):

  1. Class variable (0 or 1)

Once the CSV file is loaded into memory, we can split the columns of data into input and output variables.

The data is stored in a 2D array where the first dimension is rows and the second dimension is columns, e.g. [rows, columns].

We can select subsets of columns using the standard NumPy slice operator. We can select the first 8 columns, from index 0 to index 7, via the slice 0:8. The output column (the 9th variable) can then be selected via index 8.

We are now prepared to define the model of our neural network.

2. Define Keras Model

Models in Keras are defined as a sequence of layers.

We create a Sequential model and add layers one at a time until we are happy with our network architecture.

The first thing to get right is to ensure the input layer has the correct number of input features. This can be specified when creating the first layer with the input_dim argument, setting it to 8 for the 8 input variables.

How do we know the number of layers and their types?

It’s a very hard question. There are heuristics you can use, and often the best network structure is found through a process of trial and error experimentation (I explain more here). Generally, you need a network large enough to capture the structure of the problem.

In this example, we will use a fully-connected network structure with three layers.

Fully connected layers are defined using the Dense class. The first argument sets the number of neurons, or nodes, in the layer, and the activation function is specified using the activation argument.

We will use the rectified linear unit activation function, called ReLU, on the first two layers and the sigmoid function on the output layer.

The sigmoid and tanh activation functions used to be preferred for all layers. These days, better performance is achieved with the ReLU activation function. We use a sigmoid on the output layer to ensure our network output is between 0 and 1, making it easy to map either to a probability of class 1 or to a hard classification of either class with a default threshold of 0.5.

We can piece it all together by adding each layer:

  • The model expects rows of data with 8 variables (the input_dim=8 argument)
  • The first hidden layer has 12 nodes and uses the relu activation function.
  • The second hidden layer has 8 nodes and uses the relu activation function.
  • The output layer has one node and uses the sigmoid activation function.
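The four points above translate into the following model definition sketch (the layer sizes are those described; other reasonable sizes would also work):

```python
from keras.models import Sequential
from keras.layers import Dense

# define the keras model layer by layer
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))  # first hidden layer: 12 nodes
model.add(Dense(8, activation='relu'))                # second hidden layer: 8 nodes
model.add(Dense(1, activation='sigmoid'))             # output layer: 1 node
```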

3. Compile Keras Model

Now that the model is defined, we can compile it.

The model is compiled using efficient numerical libraries under the covers (the so-called backend), such as Theano or TensorFlow. The backend automatically chooses the best way to represent the network for training and making predictions on your hardware, such as a CPU or GPU.

When compiling, we must specify some additional properties required to train the network. Recall that training a network means finding the best set of weights to map inputs to outputs in our dataset.

We must specify the loss function used to evaluate a set of weights, the optimizer used to search through different weights for the network, and any optional metrics we would like to collect and report during training.

In this case, we will use cross-entropy as the loss function. This loss is suitable for binary classification problems and is defined in Keras as binary_crossentropy.

We will define the optimizer as adam, an efficient stochastic gradient descent algorithm. It is a popular version of gradient descent because it automatically tunes itself and gives good results on a broad range of problems.

Finally, because it is a classification problem, we will collect and report the classification accuracy, specified via the metrics argument.
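Compiling is then a single call. The sketch below repeats the model definition from the previous section so it runs on its own:

```python
from keras.models import Sequential
from keras.layers import Dense

# the model from the previous section
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# compile with cross-entropy loss, the adam optimizer,
# and classification accuracy as the reported metric
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```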

4. Fit Keras Model

We have defined our model and compiled it, ready for efficient computation.

Now it is time to execute the model on some data.

We can train or fit our model on our loaded data by calling the fit() function on the model.

Training occurs over epochs and each epoch is split into batches.

  • Epoch: One pass through all of the rows in the training dataset.
  • Batch: One or more samples considered by the model within an epoch before weights are updated.

One epoch comprises one or more batches, based on the chosen batch size, and the model is fit for many epochs.

The training process runs for a fixed number of iterations through the dataset, called epochs, which we must specify using the epochs argument. We must also set the number of dataset rows considered before the model weights are updated within each epoch, called the batch size, set using the batch_size argument.

For this problem, we will run for a small number of epochs (150) and use a relatively small batch size of 10. This means each epoch involves roughly 77 model weight updates (the dataset's 768 rows divided by a batch size of 10).

These configurations can be chosen experimentally by trial and error. We want to train the model enough so that it learns a good (or good enough) mapping of rows of input data to the output classification. The model will always have some error, but the amount of error will level out after some point for a given model configuration. This is called model convergence.

This is where the work happens on your CPU or GPU.
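Fitting is a single call: model.fit(X, y, epochs=150, batch_size=10). The sketch below uses small random stand-in arrays shaped like the Pima data so it runs on its own; in the tutorial, X and y come from the loaded CSV:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# stand-in data shaped like the Pima dataset (8 inputs, binary output)
X = np.random.rand(50, 8)
y = np.random.randint(0, 2, size=50)

model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# fit for 150 epochs, updating the weights after every batch of 10 rows
# (verbose=0 silences the per-epoch progress bars)
history = model.fit(X, y, epochs=150, batch_size=10, verbose=0)
```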

5. Evaluate Keras Model

We have trained our neural network on the entire dataset, and we can evaluate the performance of the network on the same dataset.

This will only give us an idea of how well we have modeled the dataset (e.g. train accuracy), but no idea of how well the algorithm might perform on new data. We have done this for simplicity, but ideally, you would separate your data into train and test datasets for training and evaluating your model.

You can evaluate your model on your training dataset using the evaluate() function, passing it the same input and output used to train the model.

This will generate a prediction for each input and output pair and collect scores, including the average loss and any metrics you have configured, such as accuracy.

The evaluate() function will return a list of two values. The first is the loss of the model on the dataset, and the second is the accuracy of the model on the dataset. We are only interested in reporting the accuracy, so we will ignore the loss value.
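A sketch of the evaluation step, again using random stand-in data so the snippet is self-contained (in the tutorial, X and y are the loaded dataset and the model is the one fitted above):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

X = np.random.rand(50, 8)
y = np.random.randint(0, 2, size=50)

model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=10, batch_size=10, verbose=0)

# evaluate() returns [loss, accuracy]; ignore the loss and report the accuracy
_, accuracy = model.evaluate(X, y, verbose=0)
print('Accuracy: %.2f' % (accuracy * 100))
```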

6. Tying it together

You just saw how easily your first neural network model can be created in Keras.

Let’s tie it all together into a complete code example.

You can copy all of the code into your Python file and save it as keras_first_network.py in the same directory as your data file pima-indians-diabetes.csv. You can then run the Python file as a script from your command line (command prompt) as follows:

python keras_first_network.py

Running this example, you should see a message for each of the 150 epochs printing the loss and accuracy, followed by the final evaluation of the trained model on the training dataset.

It takes about 10 seconds to execute on my workstation running on the CPU.

Ideally, we would like the loss to go to zero and the accuracy to go to 1.0 (e.g. 100%). This is not possible for any but the most trivial machine learning problems. Instead, we will always have some error in our model. The goal is to choose a model configuration and training configuration that achieve the lowest loss and highest accuracy possible for a given dataset.

One note: the fit() and evaluate() calls print progress bars during training, which can cause problems in some environments. You can easily turn these off by setting verbose=0 in the calls to the fit() and evaluate() functions, for example:
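A compact sketch with the progress bars silenced (random stand-in data is used so the snippet runs on its own):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# stand-in data so the snippet runs on its own
X = np.random.rand(20, 8)
y = np.random.randint(0, 2, size=20)

model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# verbose=0 turns off the progress bar output
model.fit(X, y, epochs=150, batch_size=10, verbose=0)
_, accuracy = model.evaluate(X, y, verbose=0)
```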

Neural networks are stochastic algorithms, meaning that running the same algorithm on the same data can train a different model each time the code is run.

This variance in model performance means that, to get a reasonable estimate of how well your model performs, you may need to fit it many times and calculate the average of the accuracy scores.

For example, below are the accuracy scores from re-running the example 5 times:

We can see that all accuracy scores are around 77% and the average is 76.924%.

7. Make Predictions

How can I use my model after training to predict new data?

Great question.

We can adapt the above example and use it to generate predictions on the training dataset, pretending it is a new dataset we have not seen before.

Making predictions is as easy as calling the predict() function on the model. We are using a sigmoid activation function on the output layer, so the predictions will be probabilities in the range between 0 and 1. By rounding them, we can easily convert them into crisp binary predictions for this classification task.

For example:
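A sketch of probability predictions followed by rounding (stand-in data is used again so the snippet runs standalone):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

X = np.random.rand(20, 8)
y = np.random.randint(0, 2, size=20)

model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=5, batch_size=10, verbose=0)

# predict() returns sigmoid probabilities in the range 0 to 1
probabilities = model.predict(X)
# rounding converts each probability into a crisp 0/1 class
rounded = [round(float(p[0])) for p in probabilities]
```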

Alternatively, you can call the predict_classes() function on the model to directly predict crisp classes, for example:
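Note that predict_classes() was available in older Keras versions but was removed in newer releases, where thresholding the predict() output is the equivalent; a brief sketch with stand-in inputs:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

X = np.random.rand(5, 8)  # stand-in rows shaped like the Pima inputs

model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# older Keras versions offered a one-call shortcut:
#   classes = model.predict_classes(X)
# newer versions removed predict_classes(); thresholding is the equivalent
classes = (model.predict(X) > 0.5).astype('int32')
```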

The complete example below makes predictions for each example in the dataset, then prints the input data, predicted class, and expected class for the first five examples in the dataset.

The progress bar is not shown when running this example, as before, because the verbose argument is set to 0.

After the model is fit, predictions are made for all examples in the dataset, and the input rows and predicted class values for the first five examples are printed and compared to the expected class values.

We can see that most rows are correctly predicted. In fact, based on the estimated performance of the model in the previous section, we would expect approximately 76.9% of the rows to be correctly predicted.

[6.0, 148.0, 72.0, 35.0, 0.0, 33.6, 0.627, 50.0] => 0 (expected 1)
[1.0, 85.0, 66.0, 29.0, 0.0, 26.6, 0.351, 31.0] => 0 (expected 0)
[8.0, 183.0, 64.0, 0.0, 0.0, 23.3, 0.672, 32.0] => 1 (expected 1)
[1.0, 89.0, 66.0, 23.0, 94.0, 28.1, 0.167, 21.0] => 0 (expected 0)
[0.0, 137.0, 40.0, 35.0, 168.0, 43.1, 2.288, 33.0] => 1 (expected 1)