1D Convolutions

Application domains with a time-series nature (a natural temporal ordering) include biomedical signals (e.g. EEG and ECG), financial data (e.g. stock market and currency exchange rates), industrial devices (e.g. gas sensors and laser excitation), biometrics (e.g. voice, signature and gesture), video processing, music mining, forecasting, and weather.

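So, 1D convolutions:

The filter moves in just 1 direction (the time axis) to calculate the convolution.
input = [W], filter = [k], output = [W] (with padding that preserves the length; without padding the output length is W - k + 1)
Example: input = [1,1,1,1,1], filter = [0.25,0.5,0.25], output = [1,1,1] without padding, or [0.75,1,1,1,0.75] with zero padding.
The output shape is a 1D array.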
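
A minimal sketch of the 1D case in plain Python, using the input and filter from the example above. The helper name `conv1d` and its `pad` argument are illustrative assumptions, not part of any library. It shows how the output length depends on padding and how zero padding affects the border values.

```python
def conv1d(signal, kernel, pad=0):
    """Slide `kernel` along `signal` (cross-correlation, as CNN layers do).

    `pad` zeros are added on both ends; pad=0 means no padding.
    """
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    k = len(kernel)
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(len(padded) - k + 1)
    ]


signal = [1, 1, 1, 1, 1]
kernel = [0.25, 0.5, 0.25]

valid = conv1d(signal, kernel)         # no padding: length W - k + 1 = 3
same = conv1d(signal, kernel, pad=1)   # zero padding: length W = 5

print(valid)  # [1.0, 1.0, 1.0]
print(same)   # [0.75, 1.0, 1.0, 1.0, 0.75]
```

The filter weights sum to 1, so a constant signal is reproduced wherever the filter fits entirely inside it; only the zero-padded borders differ.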

Human Activity Recognition

The data can be downloaded from the UCI repository.

Each person performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) while wearing a smartphone on the waist.
Using the phone's embedded accelerometer and gyroscope, 3-axial linear acceleration and 3-axial angular velocity were captured at a constant rate of 50Hz. In total, the data has 3 groups of x-y-z signals (9 channels): body acceleration, angular velocity, and total acceleration.

We can process the data into tensor form. The generated tensors have the following dimensions:

(batch, seq_len, n_channels)

where batch is the number of training examples in each batch, seq_len is the number of steps in the time series (128) and n_channels is the number of channels where observations are made (9).
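
As a rough sketch of how such a tensor could be assembled with NumPy: the per-channel arrays below are random stand-ins (an assumption; the real files are not loaded here), and `iterate_batches` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
n_windows, seq_len, n_channels = 32, 128, 9

# Hypothetical stand-in for the 9 per-channel arrays,
# each of shape (n_windows, seq_len).
channels = [rng.standard_normal((n_windows, seq_len)) for _ in range(n_channels)]

# Stack the channels along a new last axis -> (n_windows, seq_len, n_channels).
data = np.stack(channels, axis=-1)


def iterate_batches(array, batch_size):
    """Yield consecutive mini-batches of shape (batch, seq_len, n_channels)."""
    for start in range(0, len(array), batch_size):
        yield array[start:start + batch_size]


batches = list(iterate_batches(data, batch_size=8))
first_batch_shape = batches[0].shape
print(first_batch_shape)  # (8, 128, 9)
```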

Text Classification

We denote the dimensionality of the word vectors by d. If the length of a given sentence is s, then the dimensionality of the sentence matrix is s x d. For example, s=7, d=5. Although the sentence matrix is 2D, the convolution moves in only 1 direction (along the sentence).

Here we depict three filter region sizes: 2, 3 and 4, each of which has 2 filters. Every filter performs convolution on the sentence matrix and generates (variable-length) feature maps. Then 1-max pooling is performed over each map, i.e., the largest number from each feature map is recorded. Thus a univariate feature vector is generated from all six maps, and these 6 features are concatenated to form a feature vector for the penultimate layer. The final softmax layer then receives this feature vector as input and uses it to classify the sentence; here we assume binary classification and hence depict two possible output states.

Source: Zhang, Y., & Wallace, B. (2015). A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification.
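
The pipeline described above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: all weights are random and untrained, bias terms and activation functions are omitted for brevity, and variable names such as `W_out` are made up for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
s, d = 7, 5                                # sentence length, word-vector size
sentence = rng.standard_normal((s, d))     # sentence matrix, s x d

features = []
map_lengths = []
for region_size in (2, 3, 4):
    # Two filters per region size; each spans the full width d.
    filters = rng.standard_normal((2, region_size, d))
    # windows[i] holds `region_size` consecutive word vectors starting at word i.
    # Shape: (s - region_size + 1, region_size, d).
    windows = sliding_window_view(sentence, (region_size, d))[:, 0]
    # One feature map per filter. Shape: (2, s - region_size + 1).
    feature_maps = np.einsum("wrd,frd->fw", windows, filters)
    map_lengths.append(feature_maps.shape[1])
    # 1-max pooling: keep the largest value of each map.
    features.extend(feature_maps.max(axis=1))

feature_vector = np.array(features)        # 6 pooled features

# Softmax over two classes (binary classification).
W_out = rng.standard_normal((6, 2))
logits = feature_vector @ W_out
probs = np.exp(logits - logits.max())
probs /= probs.sum()

print(map_lengths)           # [6, 5, 4]
print(feature_vector.shape)  # (6,)
```

The feature maps have different lengths (6, 5 and 4), but 1-max pooling reduces each to a single number, which is why the concatenated vector always has one entry per filter.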

Even though the input is 2D (e.g. 20x14), the output shape is not a 2D matrix but a 1D array,
because the filter height L must match the input height L.
The convolution is computed in 1 direction (x) only, not 2.
input = [W,L], filter = [k,L], output = [W]
What if we want to train N filters (N is the number of filters)?
Then the output is N stacked 1D arrays, i.e. a 2D matrix of shape [W,N].
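
A small NumPy sketch of these shapes, assuming no padding (so the output length is W - k + 1 rather than W) and random values for both input and filters. The sizes k = 3 and N = 8 are arbitrary choices for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
W, L, k, N = 20, 14, 3, 8
x = rng.standard_normal((W, L))            # input = [W, L]
filters = rng.standard_normal((N, k, L))   # N filters, each = [k, L]

# All length-k windows along the W axis. Shape: (W - k + 1, k, L).
windows = sliding_window_view(x, k, axis=0).transpose(0, 2, 1)

single = np.einsum("wkl,kl->w", windows, filters[0])    # one filter -> 1D output
stacked = np.einsum("wkl,nkl->wn", windows, filters)    # N filters -> 2D output

print(single.shape)   # (18,)
print(stacked.shape)  # (18, 8)
```

Each filter covers the whole L axis at once, so there is nothing left to slide over in that direction; only the W axis produces multiple output positions.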

2D Convolutions

The filter moves in 2 directions (x,y) to calculate the convolution.
The output shape is a 2D matrix.
input = [W,H], filter = [k,k], output = [W,H]
Examples: computer vision, edge detection algorithms, the Sobel edge filter.
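
To make the edge-detection example concrete, here is a minimal NumPy sketch that applies the horizontal Sobel kernel to a tiny synthetic image. The image is an assumption made up for illustration, and no padding is used, so a 6x6 input yields a 4x4 output.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# 6x6 image: dark left half (0), bright right half (1) -> one vertical edge.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# All 3x3 patches, shape (4, 4, 3, 3): the filter moves along both axes.
patches = sliding_window_view(image, sobel_x.shape)
edges = np.einsum("ijkl,kl->ij", patches, sobel_x)

print(edges[0])  # [0. 4. 4. 0.]
```

The response is zero in the flat regions and 4 wherever the 3x3 window straddles the step from 0 to 1.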

2D Convolutions with 3D input - LeNet, VGG, ...

Even though the input is 3D (e.g. 224x224x3 or 112x112x32, where 3 is for RGB), the output shape is not a 3D volume but a 2D matrix,
because the filter depth L must match the number of input channels L.
The convolution is computed in 2 directions (x,y) only, not 3.
input = [W,H,L], filter = [k,k,L], output = [W,H]
What if we want to train N filters (N is the number of filters)?
Then the output is N stacked 2D matrices, i.e. a 3D volume of shape [W,H,N].
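
The same shape bookkeeping can be checked with a short NumPy sketch. It assumes no padding (so each spatial size shrinks by k - 1) and uses random values; the sizes are arbitrary small numbers chosen for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
W, H, L, k, N = 8, 8, 3, 3, 4
x = rng.standard_normal((W, H, L))            # input = [W, H, L]
filters = rng.standard_normal((N, k, k, L))   # N filters, each = [k, k, L]

# Windows over the two spatial axes only.
# Shape: (W - k + 1, H - k + 1, L, k, k).
patches = sliding_window_view(x, (k, k), axis=(0, 1))

one_map = np.einsum("whlij,ijl->wh", patches, filters[0])   # one filter -> 2D matrix
volume = np.einsum("whlij,nijl->whn", patches, filters)     # N filters -> stacked maps

print(one_map.shape)  # (6, 6)
print(volume.shape)   # (6, 6, 4)
```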

Animation (2D Conv with 3D-inputs)

The author: Martin Görner
Twitter: @martin_gorner
Google +: plus.google.com/+MartinGorne

1x1 conv in CNN - GoogLeNet, ...

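A 1x1 conv is confusing when you think of it as a 2D image filter like Sobel.
For a 1x1 conv in a CNN, the input is a 3D volume [W,H,L], as in the previous section.
It filters along the depth (channel) axis: each output value is a weighted sum of the L channel values at one position.
input = [W,H,L], filter = [1,1,L], output = [W,H]
With N filters, the stacked output is a 3D volume of shape [W,H,N].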
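
One way to see what a 1x1 conv does is to write it as a matrix multiplication over the channel axis. The sketch below uses random values and arbitrary sizes; storing the N filters as columns of an (L, N) matrix is a convenience chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
W, H, L, N = 5, 5, 16, 4
x = rng.standard_normal((W, H, L))       # input = [W, H, L]
weights = rng.standard_normal((L, N))    # N filters of shape [1, 1, L], as columns

# Every position's L-vector is multiplied by the same (L, N) matrix.
y = x @ weights                          # output = [W, H, N]

print(y.shape)  # (5, 5, 4)
```

With N < L this reduces the number of channels while leaving the spatial size untouched, which is how such layers are typically used to cut computation before larger filters.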

3D Convolutions

3D convolutional networks are more computationally expensive than 2D ones. A 3D volume needs more memory, and a 3D convolution needs more calculations than a 2D convolution.

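The filter moves in 3 directions (x,y,z) to calculate the convolution.
The output shape is a 3D volume.
input = [W,H,L], filter = [k,k,d], output = [W,H,M]
d < L is important for producing a volume output: without padding M = L - d + 1, so d = L would collapse the output to a single 2D matrix.
Example: the C3D video descriptor.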
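
A brief NumPy sketch of a single 3D filter, assuming no padding and random values (the sizes are arbitrary). Because the filter depth d is smaller than the input depth L, the filter can also slide along the third axis.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
W, H, L = 8, 8, 6          # input = [W, H, L]
k, d = 3, 2                # filter = [k, k, d], with d < L
x = rng.standard_normal((W, H, L))
kernel = rng.standard_normal((k, k, d))

# Windows over all three axes.
# Shape: (W - k + 1, H - k + 1, L - d + 1, k, k, d).
blocks = sliding_window_view(x, kernel.shape)
volume = np.einsum("whmijd,ijd->whm", blocks, kernel)

print(volume.shape)  # (6, 6, 5)
```

Here the third output size is L - d + 1 = 5, so the result is still a volume rather than a single 2D map.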

For example, an application of a 3D convolutional neural network to QSM (quantitative susceptibility mapping):

All 20 three-dimensional images are re-sized to the same voxel size (1mm, 1mm, 3mm) and cropped to matrix size (160, 220, 48). This provides a volume coverage of (16cm, 22cm, 14.4cm), large enough for average human brain.
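
The quoted volume coverage follows directly from voxel size times matrix size, as this short check shows. The voxel count is added here only to give a sense of the input size a 3D network has to handle.

```python
voxel_size_mm = (1, 1, 3)
matrix_size = (160, 220, 48)

# Coverage per axis in cm = voxel size (mm) * number of voxels / 10.
coverage_cm = tuple(v * n / 10 for v, n in zip(voxel_size_mm, matrix_size))
n_voxels = matrix_size[0] * matrix_size[1] * matrix_size[2]

print(coverage_cm)  # (16.0, 22.0, 14.4)
print(n_voxels)     # 1689600
```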

Summary

Ignoring number of dimensions briefly, the following can be considered strengths of a convolutional neural network (CNN), compared to fully-connected models, when dealing with certain types of data:

The use of shared weights for each location that the convolution processes significantly reduces the number of parameters that need to be learned, compared to the same data processed through a fully-connected network.
Weight sharing is a form of regularisation.
The structure of a convolutional model makes strong assumptions about local relationships in the data, which when true make it a good fit to the problem.
- Local patterns provide good predictive data (and/or can be usefully combined into more complex predictive patterns in higher layers)
- The types of pattern found in the data can be found in multiple places. Finding the same pattern in a different set of data points is meaningful.
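
To put a rough number on the parameter savings from weight sharing, the sketch below compares a 1D convolutional layer with a fully-connected layer that produces the same number of outputs. The input shape is the one from the activity-recognition example; the filter size and filter count are hypothetical values chosen for illustration.

```python
seq_len, n_channels = 128, 9      # input shape from the activity-recognition example
k, n_filters = 3, 64              # hypothetical layer size

out_len = seq_len - k + 1         # 126 output steps without padding

# Convolution: one (k x n_channels) weight block per filter plus a bias,
# reused at every time step.
conv_params = k * n_channels * n_filters + n_filters

# Fully-connected layer mapping the flattened input to the same number of outputs.
n_inputs = seq_len * n_channels
n_outputs = out_len * n_filters
dense_params = n_inputs * n_outputs + n_outputs

print(conv_params)   # 1792
print(dense_params)  # 9297792
```

Under these assumptions the fully-connected layer needs roughly 5,000 times as many parameters as the convolutional one.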

These properties of CNNs are independent of the number of dimensions.

Reference

Classification of Time-Series Images Using Deep Convolutional Neural Networks
In a nutshell, convolutional direction & output shape is important!
What is an 1D Convolutional Layer in Deep Learning?
