Data representations – Deep Learning

In the first example, we started from data stored in multidimensional arrays, which are also called tensors. Tensors are a generalisation of vectors and matrices to an arbitrary number of dimensions (in the context of tensors, a dimension is often called an axis). In R, vectors are used to create and manipulate 1D tensors and matrices are used for 2D tensors. For tensors with more than two dimensions, array objects (which support any number of dimensions) are used.

Scalars (0D tensors)

A tensor that contains only one number is called a scalar (or scalar tensor, or zero-dimensional tensor, or 0D tensor). R doesn't have a data type to represent scalars (all numeric objects are vectors, matrices, or arrays), but an R vector of length one is conceptually similar to a scalar.
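As a small illustration of the point above, the following sketch inspects a single number in base R. The variable name s is only an illustrative choice.

```r
# A length-one numeric vector stands in for a scalar in R.
s <- 42

is.vector(s)       # TRUE: even a single number is stored as a vector
length(s)          # 1
dim(s)             # NULL: a plain vector carries no dim attribute
dim(as.array(s))   # 1: coerced to an array, it has one axis of length one
```

So the closest thing R offers to a 0D tensor is really a 1D tensor with a single entry.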

Vectors (1D tensors)

A one-dimensional array of numbers is called a vector, or 1D tensor. A 1D tensor is said to have exactly one axis.

x <- c(12, 3, 6, 14, 10)
str(x)
dim(as.array(x))

The vector above has five entries and is called a five-dimensional vector, not to be confused with a 5D tensor. A 5D vector has only one axis and has five dimensions along its axis, whereas a 5D tensor has five axes (and may have any number of dimensions along each axis). Dimensionality can denote either the number of entries along a specific axis (as in the case of the 5D vector) or the number of axes in a tensor (such as a 5D tensor). It is technically more correct to talk about a tensor of rank 5 (the rank of a tensor being the number of axes) but the ambiguous notation of 5D tensor is common, regardless.
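The two meanings of "5D" can be checked directly in R. This is a minimal sketch; the axis sizes of the 5D tensor are arbitrary values chosen for illustration.

```r
# A five-dimensional vector: one axis holding five entries.
v <- c(12, 3, 6, 14, 10)
length(v)                  # 5 entries along the single axis
length(dim(as.array(v)))   # 1 axis, so the rank is 1

# A 5D tensor: five axes (the sizes along each axis are arbitrary here).
t5 <- array(0, dim = c(2, 3, 2, 3, 2))
length(dim(t5))            # 5 axes, so the rank is 5
dim(t5)                    # 2 3 2 3 2
```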

Matrices (2D tensors)

A two-dimensional array of numbers is a matrix, or 2D tensor. A matrix has two axes (often referred to as rows and columns).

matrix(rep(0, 3*5), nrow = 3, ncol = 5)

3D tensors and higher-dimensional tensors

Matrices that are packed in a new array become a 3D tensor.

array(rep(0, 2*3*2), dim = c(2, 3, 2))

By packing 3D tensors in an array, a 4D tensor is created, and so on.
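The packing idea can be sketched with base R alone. The names m1, m2, t3, and t4 are illustrative. Because R stores arrays in column-major order, concatenating with c() stacks the pieces along a new last axis; this is an assumption of the sketch, not the only possible layout.

```r
# Two 2 x 3 matrices with recognisable contents.
m1 <- matrix(1:6,  nrow = 2, ncol = 3)
m2 <- matrix(7:12, nrow = 2, ncol = 3)

# Pack the two matrices into a 3D tensor of shape (2, 3, 2).
t3 <- array(c(m1, m2), dim = c(2, 3, 2))
dim(t3)                    # 2 3 2
identical(t3[, , 1], m1)   # TRUE: the first slice is the first matrix

# Pack two 3D tensors into a 4D tensor of shape (2, 3, 2, 2).
t4 <- array(c(t3, t3 + 100L), dim = c(2, 3, 2, 2))
dim(t4)                    # 2 3 2 2
```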

Key attributes

A tensor is defined by three key attributes:

Number of axes (rank) - a matrix has two axes and a 3D tensor has three axes
Shape - this is an integer vector that describes how many dimensions the tensor has along each axis. The matrix example has a (3, 5) shape and the 3D tensor example has a (2, 3, 2) shape. A vector has a shape with a single element, such as (5).
Data type - this is the type of data contained in the tensor; for instance, a tensor's type could be integer or double. On rare occasions, you may encounter a character tensor. These are uncommon because tensors live in preallocated contiguous memory segments, and strings, being variable-length, do not fit that storage scheme.
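The three attributes listed above can be read off any R array with base functions. The sketch below uses a small array named x, which is an illustrative stand-in rather than real data.

```r
# A small integer tensor with the same shape as the earlier 3D example.
x <- array(0L, dim = c(2, 3, 2))

length(dim(x))   # 3: number of axes (rank)
dim(x)           # 2 3 2: shape
typeof(x)        # "integer": data type

# The same shape filled with 0 instead of 0L is stored as doubles.
y <- array(0, dim = c(2, 3, 2))
typeof(y)        # "double"
```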

Example

Load the MNIST dataset.

library(keras)
mnist <- dataset_mnist()
train_images <- mnist$train$x
train_labels <- mnist$train$y
test_images <- mnist$test$x
test_labels <- mnist$test$y

Number of axes in the train_images tensor:

length(dim(train_images))

Shape of train_images:

dim(train_images)

Datatype of train_images:

typeof(train_images)

train_images is a 3D tensor of integers; more precisely, it is an array of 60,000 matrices of 28 x 28 integers. Each matrix is a grayscale image, with coefficients between 0 and 255. For example, the fifth digit in this 3D tensor is accessed and plotted as follows:

digit <- train_images[5,,]
plot(as.raster(digit, max = 255))

Slicing tensors in R

Selecting specific elements in a tensor is called tensor slicing. The following example selects training images 10 through 99 (90 images in total):

train_images[10:99,,]

In general, you may select between any two indices along each tensor axis. To select 14 x 14 pixels in the bottom-right corner of all images:

train_images[, 15:28, 15:28]
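To check the shapes these slices produce without downloading MNIST, the sketch below uses a zero-filled stand-in array called images (an assumption for illustration, not the real dataset). It also shows that selecting a single index drops that axis unless drop = FALSE is passed.

```r
# Stand-in for train_images: 200 blank 28 x 28 "images".
images <- array(0L, dim = c(200, 28, 28))

dim(images[10:99, , ])             # 90 28 28: images 10 through 99
dim(images[, 15:28, 15:28])        # 200 14 14: bottom-right corner
dim(images[, 8:21, 8:21])          # 200 14 14: patch centred in the image

# Selecting one sample drops the first axis by default.
dim(images[5, , ])                 # 28 28
dim(images[5, , , drop = FALSE])   # 1 28 28: the samples axis is kept
```

Keeping the samples axis with drop = FALSE can matter when later code expects a 3D tensor even for a single image.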

Data batches

In general, the first axis in all data tensors in deep learning will be the samples axis (sometimes called the samples dimension). In the MNIST example, samples are images of digits. In addition, deep-learning models do not process an entire dataset at once; rather, they break the data into small batches. For example, here are the first two batches of the MNIST digits, with a batch size of 128:

train_images[1:128,,]
train_images[129:256,,]

When considering such a batch tensor, the first axis is called the batch axis or batch dimension. This term is frequently encountered when using Keras and other deep-learning libraries.
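The two explicit slices generalise to the n-th batch. The sketch below is one possible way to write this; get_batch is a hypothetical helper (not part of Keras), and images is a zero-filled stand-in for train_images.

```r
# Stand-in for train_images: 1000 blank 28 x 28 "images".
images <- array(0L, dim = c(1000, 28, 28))
batch_size <- 128

# Hypothetical helper: return the n-th batch along the first (batch) axis.
get_batch <- function(x, n, batch_size) {
  start <- (n - 1) * batch_size + 1
  end <- min(n * batch_size, dim(x)[1])   # the last batch may be shorter
  x[start:end, , , drop = FALSE]
}

dim(get_batch(images, 1, batch_size))     # 128 28 28
dim(get_batch(images, 2, batch_size))     # 128 28 28 (samples 129 to 256)
dim(get_batch(images, 8, batch_size))     # 104 28 28 (samples 897 to 1000)
```

With 1000 samples and a batch size of 128, there are ceiling(1000 / 128) = 8 batches, and only the last one is smaller than 128.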

Examples of data tensors

The data most likely encountered will fall into one of the following categories:

Vector data - 2D tensors of shape (samples, features). Each single data point can be encoded as a vector and thus a batch of data will be encoded as a 2D tensor, where the first axis is the samples axis and the second axis is the features axis.
- For example, a dataset of people where we consider each person's age, ZIP code, and income can be characterised as a vector of three values and thus a dataset of 100,000 people can be stored in a 2D tensor of shape (100000, 3).
Timeseries data or sequence data - 3D tensors of shape (samples, timesteps, features). Whenever time or sequence order is measured, it makes sense to store the data in a 3D tensor with an explicit time axis. Each sample can be encoded as a sequence of vectors (a 2D tensor) and thus a batch of data will be encoded as a 3D tensor.
- A dataset of stock prices can be stored as a 3D tensor, where every minute we store the current price, highest price, and lowest price of a stock. Thus every minute is encoded as a 3D vector and an entire day of trading is encoded as a 2D tensor of shape (390, 3) (there are 390 minutes in a trading day), and 250 days' worth of data can be stored in a 3D tensor of shape (250, 390, 3). Each sample would be one day's worth of data.
Images - 4D tensors of shape (samples, height, width, channels) or (samples, channels, height, width). Images typically have three dimensions: height, width, and colour depth. Although grayscale images have only a single colour channel and could be stored in 2D tensors, by convention image tensors are always 3D, with a one-dimensional colour channel for grayscale images. A batch of 128 grayscale images of size 256 x 256 could thus be stored in a tensor of shape (128, 256, 256, 1) and a batch of 128 colour images could be stored in a tensor of shape (128, 256, 256, 3).
Video - 5D tensors of shape (samples, frames, height, width, channels) or (samples, frames, channels, height, width). Video data is one of the few types of real-world data that requires 5D tensors. A video can be understood as a sequence of frames, each frame being a colour image. Since each frame can be stored in a 3D tensor (height, width, colour depth), a sequence of frames can be stored in a 4D tensor (frames, height, width, colour depth), and thus a batch of different videos can be stored in a 5D tensor of shape (samples, frames, height, width, colour depth).
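The shapes in the categories above can be mocked up with zero-filled R arrays. This is only a sketch: the people and stock shapes follow the text, while the image and video sizes (8 images of 64 x 64, 2 videos of 10 frames) are smaller, assumed values chosen to keep memory use low. All variable names are illustrative.

```r
# Vector data: (samples, features)
people <- matrix(0, nrow = 100000, ncol = 3)

# Timeseries data: (samples, timesteps, features)
stocks <- array(0, dim = c(250, 390, 3))

# Images: (samples, height, width, channels); sizes reduced for illustration
grey_images   <- array(0, dim = c(8, 64, 64, 1))
colour_images <- array(0, dim = c(8, 64, 64, 3))

# Video: (samples, frames, height, width, channels); sizes reduced as well
videos <- array(0, dim = c(2, 10, 64, 64, 3))

# Number of axes of each tensor, in the order defined above.
ranks <- sapply(list(people, stocks, grey_images, colour_images, videos),
                function(x) length(dim(x)))
ranks                 # 2 3 4 4 5

# One sample of the stock data is one trading day: a (390, 3) matrix.
dim(stocks[1, , ])    # 390 3
```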