Understanding deep learning requires familiarity with:
- Tensors
- Tensor operations
- Differentiation
- Gradient descent
As a first example, we will try to classify grayscale images of handwritten digits (28 by 28 pixels) into their 10 categories (0-9). This example uses the MNIST dataset containing 60,000 training images and 10,000 test images, assembled by the National Institute of Standards and Technology (NIST) in the 1980s.
library(keras)
mnist <- dataset_mnist()
train_images <- mnist$train$x
train_labels <- mnist$train$y
test_images <- mnist$test$x
test_labels <- mnist$test$y
The images are encoded as 3D arrays and the labels are a 1D array of digits ranging from 0 to 9. The images and labels have a one-to-one correspondence.
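To confirm this structure (a quick check, assuming the data was loaded as above), we can inspect the arrays:
str(train_images)   # int [1:60000, 1:28, 1:28] - 60,000 images of 28 x 28 pixels
str(train_labels)   # int [1:60000(1d)] - one label per image
dim(test_images)    # 10000 28 28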
First we'll build the network.
network <- keras_model_sequential() %>%
  layer_dense(units = 512, activation = "relu", input_shape = c(28 * 28)) %>%
  layer_dense(units = 10, activation = "softmax")
The core building block of neural networks is the layer, a data-processing module that you can think of as a filter for data: some data goes in, and it comes out in a more useful form. Specifically, layers extract representations out of the data fed into them, and hopefully these representations are more meaningful for the problem at hand. Most of deep learning consists of chaining together simple layers that implement a form of progressive data distillation. A deep-learning model is like a sieve for data processing, made of a succession of increasingly refined data filters: the layers.
Our network consists of a sequence of two layers, which are densely connected (also called fully connected) neural layers. The second (and last) layer is a 10-way softmax layer, which means it will return an array of 10 probability scores (summing to 1). Each score will be the probability that the current digit image belongs to one of the 10 digit classes.
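Before compiling, it can help to print a model summary; the parameter counts in the comments below are what we'd expect from the 512-unit and 10-unit dense layers defined above:
summary(network)   # output shape and parameter count per layer
                   # first dense layer: (28 * 28) * 512 + 512 = 401,920 parameters
                   # second dense layer: 512 * 10 + 10 = 5,130 parameters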
We need to select three more things, as part of the compilation step, to get the network ready for training:
- A loss function - how the network measures its performance on the training data, and thus how it steers itself in the right direction.
- An optimizer - the mechanism through which the network updates itself based on the data it sees and its loss function.
- Metrics to monitor during training and testing - here we only care about accuracy, the fraction of images that are correctly classified.
Below is the compilation step.
network %>% compile(
  loss = "categorical_crossentropy",
  optimizer = "rmsprop",
  metrics = "accuracy"
)
Next, we'll preprocess the data by reshaping it into the shape the network expects and scaling the values such that they are in the [0, 1] interval. The original data is stored in an array of shape (60000, 28, 28) of type integer with values in the [0, 255] interval. We'll transform it into a double array of shape (60000, 28 * 28) with values between 0 and 1.
train_images <- array_reshape(train_images, c(60000, 28 * 28))  # flatten each 28 x 28 image into a vector of 784 values
train_images <- train_images / 255                              # scale pixel values from [0, 255] to [0, 1]
train_labels <- to_categorical(train_labels)                    # one-hot encode the labels
test_images <- array_reshape(test_images, c(10000, 28 * 28))
test_images <- test_images / 255
test_labels <- to_categorical(test_labels)
The fit function is used to fit the model to its training data.
network %>%
fit(train_images, train_labels, epochs = 5, batch_size = 128)
Two quantities are displayed during training: the loss of the network over the training data and the accuracy of the network over the training data.
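If we want to keep those quantities for later inspection (a small variation on the call above, not shown in the original listing), fit() returns a history object that can be plotted:
history <- network %>%
  fit(train_images, train_labels, epochs = 5, batch_size = 128)
plot(history)   # loss and accuracy per epoch for the training run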
Now to test whether the model performs well on the test set:
metrics <- network %>% evaluate(test_images, test_labels)
metrics
To generate class predictions for new samples, use predict and take the arg max of the probability scores:
network %>% predict(test_images[1:10, ]) %>% k_argmax()
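To check these against the ground truth (test_labels was one-hot encoded above, so we go back to the raw labels in mnist$test$y; the which.max conversion below is simply one way to recover the digit from each row of probability scores):
probs <- network %>% predict(test_images[1:10, ])
preds <- apply(probs, 1, which.max) - 1   # row-wise arg max, shifted to digits 0-9
preds
mnist$test$y[1:10]                        # true digits for the same ten images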
Next, we'll learn about tensors, which are the data-storing objects going into the network, tensor operations, and gradient descent, which allows the network to learn from its training examples.