
Understanding 1D, 2D, and 3D convolutional layers in deep neural networks

In deep learning, convolutional layers have been major building blocks in many deep neural networks. The design was inspired by the visual cortex, where individual neurons respond to a restricted region of the visual field known as the receptive field. A collection of such fields overlap to cover the entire visible area.

Though convolutional layers were initially applied in computer vision, their shift-invariant characteristics have allowed convolutional layers to be applied in natural language processing, time series, recommender systems, and signal processing.

When we say Convolutional Neural Network (CNN), we generally mean a 2-dimensional CNN used for image classification. But two other types of convolutional neural networks are used in practice: 1-dimensional and 3-dimensional CNNs. In this article, we will see how convolutional layers work and explore the effect of each parameter. I am assuming you are already familiar with the concept of convolutional networks in general.

2D Convolution | Conv2D

Operation of 2D CNN

Overview:

  • The convolutional kernel moves in 2 directions (x, y) to calculate the convolutional output.
  • The output is a 2D matrix.
  • Use cases: Image Classification, Generating New Images, Image Inpainting, Image Colorization, etc.

This is the standard convolutional neural network, first introduced in the LeNet-5 architecture. Conv2D is generally used on image data. It is called a 2-dimensional CNN because the kernel slides along 2 dimensions of the data, as shown in the following image.

Kernel sliding over the image

The main advantage of a CNN is that its kernels extract spatial features from the data, which fully connected networks are not designed to do. For example, a CNN can detect edges, color distributions, and similar patterns in an image, which makes these networks very robust for image classification and for other data with spatial properties.
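To make the edge-detection idea concrete, here is a minimal pure-Python sketch of a kernel sliding in two directions. The function name `conv2d_valid`, the toy image, and the hand-picked kernel are illustrative assumptions, not part of any library; a trained CNN learns its kernels instead.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists) with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):        # slide down (y direction)
        row = []
        for j in range(out_w):    # slide right (x direction)
            total = 0
            for a in range(kh):
                for b in range(kw):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


# Toy 4x5 image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(4)]

# Hand-picked kernel that responds where brightness increases from left to right.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

edges = conv2d_valid(image, vertical_edge)
print(edges)  # [[3, 3, 0], [3, 3, 0]]
```

The output is large where the window straddles the dark-to-bright boundary and zero where the window sees a uniform region.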

Implementation

import keras

from keras.layers import Conv2D

model = keras.models.Sequential()

model.add(Conv2D(1, kernel_size=(3,3), input_shape = (128, 128, 3)))

model.summary()

Argument input_shape (128, 128, 3) represents (height, width, depth) of the image. Argument kernel_size (3, 3) represents (height, width) of the kernel, and kernel depth will be the same as the depth of the image.
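As a rough check on what model.summary() reports, the output shape and parameter count can be worked out by hand. The sketch below assumes the Keras defaults used above (stride 1, no padding, a bias term); `conv_layer_summary` is a hypothetical helper, not a Keras function.

```python
def conv_layer_summary(input_shape, kernel_size, filters, use_bias=True):
    """Output shape and parameter count for a convolution with stride 1 and no padding.

    input_shape: spatial dimensions followed by channels, e.g. (128, 128, 3).
    kernel_size: one entry per spatial dimension, e.g. (3, 3).
    """
    *spatial, in_channels = input_shape
    # Each spatial dimension shrinks by (kernel size - 1).
    out_spatial = [size - k + 1 for size, k in zip(spatial, kernel_size)]
    # Each filter spans the full depth of the input.
    weights_per_filter = in_channels
    for k in kernel_size:
        weights_per_filter *= k
    params = filters * weights_per_filter + (filters if use_bias else 0)
    return tuple(out_spatial) + (filters,), params


shape, params = conv_layer_summary((128, 128, 3), (3, 3), filters=1)
print(shape, params)  # (126, 126, 1) 28
```

The 28 parameters are the 3 × 3 × 3 kernel weights plus one bias.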

Here we will perform 2D convolution in TensorFlow 2.x

import tensorflow as tf
import numpy as np
from PIL import Image

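# Replace <some image> with the path to an image file.
# tf.nn.conv2d needs a float input that matches the kernel dtype, so cast to float32.
im = np.array(Image.open(<some image>).convert('L'), dtype=np.float32)#/255.0
# Add batch and channel axes: [batch, height, width, in_channels].
x = np.expand_dims(np.expand_dims(im,0),-1)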


kernel_init = np.array(
    [
     [[[-1, 1.0/9, 0]],[[-1, 1.0/9, -1]],[[-1, 1.0/9, 0]]],
     [[[-1, 1.0/9, -1]],[[8, 1.0/9,5]],[[-1, 1.0/9,-1]]],
     [[[-1, 1.0/9,0]],[[-1, 1.0/9,-1]],[[-1, 1.0/9, 0]]]
     ])

kernel = tf.Variable(kernel_init, dtype=tf.float32)

out = tf.nn.conv2d(x, kernel, strides=[1,1,1,1], padding='SAME')

What might this look like in real life?

Here you can see the output produced by the above code. The first image is the original; going clockwise, you have the outputs of the 1st, 2nd, and 3rd filters.

Original image with the output of three kernels

What do multiple channels mean?

In the context of 2D convolution, it is much easier to understand what these multiple channels mean. Say you are doing face recognition. You can think of each filter as representing an eye, a mouth, a nose, and so on (a very unrealistic simplification, but it gets the point across). Each feature map would then be a binary representation of whether that feature is present in the image you provided. For a face recognition model, those are very valuable features. The figure below illustrates this:

Kernel designed to find eyes-like features in the image

1D Convolution | Conv1D

Operation of 1D CNN

Overview:

  • The convolutional kernel/filter moves in just one direction (say, along the time axis) to calculate the output.
  • The output is a 1D array.
  • Use cases: Signal smoothing, sentence classification

Before going through Conv1D, let me give you a hint. In Conv1D, the kernel slides along one dimension. Pause here and think: which type of data requires the kernel to slide in only one dimension and still has spatial properties?

The answer is Time-Series data. Let’s look at the following data.

Time series data from an accelerometer

This data is collected from an accelerometer worn on a person's arm. The data represent the acceleration along all 3 axes. A 1D CNN can perform activity recognition from accelerometer data, such as detecting whether the person is standing, walking, or jumping. This data has 2 dimensions: the first is the time steps and the second is the acceleration values along the 3 axes.

The following plot illustrates how the kernel will move on accelerometer data. Each row represents time series acceleration for some axis. The kernel can only move in one dimension along the axis of time.

Kernel sliding over accelerometer data

Implementation:

import keras

from keras.layers import Conv1D

model = keras.models.Sequential()

model.add(Conv1D(1, kernel_size=5, input_shape = (120, 3)))

model.summary()

Argument input_shape (120, 3), represents 120 time steps with 3 data points in each time step. These 3 data points are acceleration for the x, y, and z axes. Argument kernel_size is 5, representing the width of the kernel, and kernel height will be the same as the number of data points in each time step.
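The sketch below illustrates, in plain Python, how a Conv1D kernel spans every channel of a time step while sliding along time only. The function name `conv1d_multichannel` and all values are made up for illustration; stride 1, no padding, and no bias are assumed.

```python
def conv1d_multichannel(signal, kernel):
    """signal: list of time steps, each a list of channel values.
    kernel: list of `width` rows, each holding one weight per channel.
    Stride 1, no padding, no bias.
    """
    width = len(kernel)
    out = []
    for t in range(len(signal) - width + 1):  # slide along the time axis only
        total = 0.0
        for k in range(width):
            for c in range(len(kernel[k])):   # the kernel covers all channels
                total += signal[t + k][c] * kernel[k][c]
        out.append(total)
    return out


# 6 time steps with 3 channels each (x, y, z acceleration); values are made up.
signal = [[1, 0, 2], [2, 0, 2], [3, 1, 2], [4, 1, 2], [5, 0, 2], [6, 0, 2]]
# A kernel of width 2 that spans all 3 channels; weights are made up.
kernel = [[1, 1, 1], [1, 1, 1]]

out = conv1d_multichannel(signal, kernel)
print(out)  # [7.0, 10.0, 13.0, 14.0, 15.0]
```

By the same logic, the Keras layer above turns 120 time steps into 116 outputs and holds 5 × 3 weights plus one bias, 16 parameters in total.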

Similarly, 1D CNNs are also used on audio and text data, since sound and text can also be represented as time series data. Please refer to the image below.

Text data as Time Series

Conv1D is widely applied to sensory data, and accelerometer data is one of them.

Here is how we perform 1D convolution in TensorFlow 2.x.

import tensorflow as tf
import numpy as np

inp = np.array([[[0],[1],[2],[3],[4]],[[5],[4],[3],[2],[1]]]).astype(np.float32)
kernel = tf.Variable(tf.initializers.glorot_uniform()([5, 1, 4]), dtype=tf.float32)
out = tf.nn.conv1d(inp, kernel, stride=1, padding='SAME') # Notice the use of 1D conv. 

print(inp.shape, kernel.shape, out.shape)

We observe that the shapes are:

  • Input 1D vector – [batch size, width, in channels] (e.g. 2, 5, 1)
  • Convolutional kernel – [width, in channels, out channels] (e.g. 5, 1, 4)
  • Output Volume – [batch size, width, out_channels] (e.g. 2, 5, 4)
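The output width equals the input width here because of padding='SAME'. The sketch below follows the SAME padding rule as described in the TensorFlow documentation; `same_padding` is a hypothetical helper written for this article, not a TensorFlow function.

```python
import math


def same_padding(in_width, kernel_width, stride=1):
    """Output width and (left, right) zero padding under the SAME scheme."""
    out_width = math.ceil(in_width / stride)
    total_pad = max((out_width - 1) * stride + kernel_width - in_width, 0)
    pad_left = total_pad // 2
    pad_right = total_pad - pad_left  # any odd extra zero goes on the right
    return out_width, pad_left, pad_right


print(same_padding(5, 5))  # (5, 2, 2)
print(same_padding(5, 4))  # (5, 1, 2)
```

For the example above (width 5, kernel width 5, stride 1), two zeros are added on each side, so all five positions produce an output.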

What might this look like in real life?

Let’s understand what this is doing using a signal smoothing example. On the left is the original signal, and on the right is the output of a 1D convolution with 3 output channels.

Left: the original signal. Right: the output of the 1D CNN.

What do multiple channels mean?

Multiple channels are basically multiple feature representations of an input. In this example, you have three representations obtained by three different filters. The first channel is the equally-weighted smoothing filter. The second is a filter that weights the middle of the filter more than the boundaries. The final filter does the opposite of the second. So you can see how these different filters bring about different effects.

Visual representation of 1D convolutional kernel
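The three kinds of filters described above can be sketched in plain Python. The exact weights and the toy signal below are assumptions chosen for illustration; they are not the values used to produce the figure.

```python
def smooth(signal, weights):
    """1D convolution with stride 1 and no padding."""
    width = len(weights)
    return [sum(signal[i + k] * weights[k] for k in range(width))
            for i in range(len(signal) - width + 1)]


# A made-up spiky signal.
signal = [0, 0, 6, 0, 0, 6, 0]

# One filter per output channel (weights are illustrative).
filters = {
    "equal": [1 / 3, 1 / 3, 1 / 3],      # equally weighted smoothing
    "centre_heavy": [0.25, 0.5, 0.25],   # weights the middle more
    "edge_heavy": [0.5, 0.0, 0.5],       # weights the boundaries more
}

channels = {name: smooth(signal, w) for name, w in filters.items()}
for name, values in channels.items():
    print(name, values)
```

Each filter produces its own output channel from the same input, which is what "multiple feature representations" means in practice.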

3D Convolution | Conv3D

Overview

  • The convolutional kernel moves in 3 directions (x,y,z) to calculate the convolutional output.
  • The output is a 3D volume.
  • Use Case: Conv3D is mostly used with 3D image data such as Magnetic Resonance Imaging (MRI) or Computerized Tomography (CT) Scan.

In Conv3D, the kernel slides in 3 dimensions, as shown below. Let’s think again: which type of data requires the kernel to move across 3 dimensions?

Kernel sliding on 3D data

Conv3D is mostly used with 3D image data, such as Magnetic Resonance Imaging (MRI) data. MRI is widely used for examining the brain, spinal cord, internal organs, and more. A Computerized Tomography (CT) scan is also an example of 3D data; it is created by combining a series of X-ray images taken from different angles around the body. We can use Conv3D to classify this medical data or extract features from it.

Cross Section of 3D Image of CT Scan and MRI

One more example of 3D data is video. A video is a sequence of image frames stacked along the time axis. We can apply Conv3D to video as well, since it has spatial features.

Following is the code to add the Conv3D layer in Keras.

import keras

from keras.layers import Conv3D

model = keras.models.Sequential()

model.add(Conv3D(1, kernel_size=(3,3,3), input_shape = (128, 128, 128, 3)))
         
model.summary()

Here, the argument input_shape (128, 128, 128, 3) has 4 dimensions. A 3D image is 4-dimensional data, where the fourth dimension represents the number of color channels, just as a flat 2D image has 3 dimensions, where the third represents the color channels. The argument kernel_size (3, 3, 3) represents the (height, width, depth) of the kernel, and the 4th dimension of the kernel will be the same as the number of color channels.
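To see the kernel moving in three directions, here is a minimal single-channel sketch in plain Python. The function name `conv3d_valid` and the toy volume are illustrative assumptions; stride 1, no padding, and no bias are assumed.

```python
def conv3d_valid(volume, kernel):
    """volume[z][y][x] and kernel[c][b][a] are nested lists; stride 1, no padding."""
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out_d = len(volume) - kd + 1
    out_h = len(volume[0]) - kh + 1
    out_w = len(volume[0][0]) - kw + 1
    output = []
    for z in range(out_d):              # slide along depth
        plane = []
        for y in range(out_h):          # slide along height
            row = []
            for x in range(out_w):      # slide along width
                total = 0
                for c in range(kd):
                    for b in range(kh):
                        for a in range(kw):
                            total += volume[z + c][y + b][x + a] * kernel[c][b][a]
                row.append(total)
            plane.append(row)
        output.append(plane)
    return output


# Toy 3x3x3 volume in which every voxel stores its slice index z (made-up data).
volume = [[[z] * 3 for _ in range(3)] for z in range(3)]
# 2x2x2 kernel of ones.
kernel = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]

out = conv3d_valid(volume, kernel)
print(out)  # [[[4, 4], [4, 4]], [[12, 12], [12, 12]]]
```

With the same stride-1, no-padding rule, the Keras layer above maps a (128, 128, 128, 3) input to a (126, 126, 126, 1) output, using 3 × 3 × 3 × 3 weights plus one bias, 82 parameters in total.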

More details on 1D convolution

How does convolution work? (Kernel size = 1)

Convolution is a linear operation that multiplies a set of weights with the input to produce an output. The multiplication is performed between an array of input data and an array of weights, called a kernel (or a filter). The operation applied between the input window and the kernel is an element-wise multiplication followed by a sum, that is, a dot product. The result of each operation is a single value.

Let us start with the simplest example, using 1D convolution when you have 1D data. Applying a convolution on a 1D array performs the multiplication of the value in the kernel with every value in the input vector.

Assume that the value in our kernel (also known as the “weight”) is “2”. We multiply each element in the input vector by 2, one after another until the end of the input vector, and get our output vector. The size of the output vector is the same as the size of the input.

Apply convolution with a kernel of size 1

First, we multiply 1 by the weight, 2, and get “2” for the first element. Then we shift the kernel by 1 step, multiply 2 by the weight, 2 to get “4”. We repeat this until the last element, 6, and multiply 6 by the weight, and we get “12”. This process produces the output vector.

import torch
import torch.nn as nn


class TestConv1d(nn.Module):
    def __init__(self):
        super(TestConv1d, self).__init__()
        self.conv = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=1, bias=False)
        self.init_weights()

    def forward(self, x):
        return self.conv(x)
    
    def init_weights(self):
        # Assign in place under no_grad; autograd rejects in-place writes
        # to a parameter that requires grad otherwise.
        with torch.no_grad():
            self.conv.weight[0,0,0] = 2.


# Input shape is [batch, channels, length].
in_x = torch.tensor([[[1,2,3,4,5,6]]]).float()
print("in_x.shape", in_x.shape)
print(in_x)

net = TestConv1d()
out_y = net(in_x)

print("out_y.shape", out_y.shape)
print(out_y)

Output:

in_x.shape torch.Size([1, 1, 6])  
tensor([[[1., 2., 3., 4., 5., 6.]]])  
out_y.shape torch.Size([1, 1, 6])  
tensor([[[ 2.,  4.,  6.,  8., 10., 12.]]], grad_fn=<SqueezeBackward1>)

Effect of kernel size (Kernel size = 2)

A different-sized kernel will detect differently sized features in the input and, in turn, will produce a different-sized feature map. Let’s look at another example, where the kernel size is 1×2 and both weights are “2”. Like before, we slide the kernel across the input vector one element at a time. We perform the convolution by multiplying each element under the kernel by the corresponding weight and adding up the products to get the output value. We repeat this multiplication and addition, one step after another until the end of the input vector, and produce the output vector.

Apply convolution with a kernel of size 2.

First, we multiply 1 by 2 and get “2”, and multiply 2 by 2 and get “4”. Then we add the two numbers, 2 and 4, and get “6”, which is the first element in the output vector. We repeat the same process until the end of the input vector and produce the output vector.

class TestConv1d(nn.Module):
    def __init__(self):
        super(TestConv1d, self).__init__()
        self.conv = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=2, bias=False)
        self.init_weights()

    def forward(self, x):
        return self.conv(x)
    
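    def init_weights(self):
        # Assign in place under no_grad; autograd rejects in-place writes
        # to a parameter that requires grad otherwise.
        with torch.no_grad():
            self.conv.weight[0,0,0] = 2.
            self.conv.weight[0,0,1] = 2.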


in_x = torch.tensor([[[1,2,3,4,5,6]]]).float()
print("in_x.shape", in_x.shape)
print(in_x)

net = TestConv1d()
out_y = net(in_x)

print("out_y.shape", out_y.shape)
print(out_y)

Output:

in_x.shape torch.Size([1, 1, 6])  
tensor([[[1., 2., 3., 4., 5., 6.]]])  
out_y.shape torch.Size([1, 1, 5])  
tensor([[[ 6., 10., 14., 18., 22.]]], grad_fn=<SqueezeBackward1>)

How to calculate the output vector’s shape

As you might have noticed, the output vector is slightly smaller than before. That is because we increased the kernel’s size, from 1×1 to 1×2. Looking at the PyTorch documentation, we can calculate the output vector’s length with the following:

Calculate the shape of the output. [https://pytorch.org/docs/stable/generated/torch.nn.Conv1d.html#torch.nn.Conv1d]

If we apply a kernel with size 1×2 on an input vector of size 1×6, we can substitute the values accordingly and get the output length of 1×5:

Shape of output vector after applying the 1x2 kernel.
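For reference, the formula in the PyTorch Conv1d documentation is L_out = floor((L_in + 2 × padding − dilation × (kernel_size − 1) − 1) / stride + 1). The small helper below is a sketch of that formula; the function name `conv1d_output_length` is our own, not a PyTorch API.

```python
def conv1d_output_length(length_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1D convolution, following the PyTorch Conv1d formula."""
    numerator = length_in + 2 * padding - dilation * (kernel_size - 1) - 1
    return numerator // stride + 1


print(conv1d_output_length(6, kernel_size=1))  # 6
print(conv1d_output_length(6, kernel_size=2))  # 5
```

With an input of length 6 and a kernel of size 2, this gives (6 − 1 − 1) / 1 + 1 = 5, matching the output shape printed above.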

Common kernel sizes are in odd numbers (Kernel size = 3)

In the previous example, a kernel size of 2 is a little uncommon, so let’s take another example where the kernel size is 3 and all its weights are “2”. Like before, we perform the convolution by multiplying each element under the kernel by the corresponding weight and adding up the products. We repeat this process until the end of the input vector, which produces the output vector.

Apply convolution with a kernel of size 3.

Likewise, the output vector is smaller than the input. Applying a 1×3 kernel on a 1×6 input vector will result in a feature vector with a size of 1×4.

In image processing, it is common to use 3×3, 5×5 sized kernels. Sometimes we might use kernels of size 7×7 for larger input images.

class TestConv1d(nn.Module):
    def __init__(self):
        super(TestConv1d, self).__init__()
        self.conv = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=3, bias=False)
        self.init_weights()

    def forward(self, x):
        return self.conv(x)
    
    def init_weights(self):
        new_weights = torch.ones(self.conv.weight.shape) * 2.
        self.conv.weight = torch.nn.Parameter(new_weights, requires_grad=False)


in_x = torch.tensor([[[1,2,3,4,5,6]]]).float()
print("in_x.shape", in_x.shape)
print(in_x)

net = TestConv1d()
out_y = net(in_x)

print("out_y.shape", out_y.shape)
print(out_y)

Output:

in_x.shape torch.Size([1, 1, 6])  
tensor([[[1., 2., 3., 4., 5., 6.]]])  
out_y.shape torch.Size([1, 1, 4])  
tensor([[[12., 18., 24., 30.]]])

How to produce an output vector of the same size? (Padding)

Applying convolution with a 1×3 kernel on a 1×6 input, we got a shorter output vector, 1×4. By default, a kernel starts on the left of the vector. The kernel is then stepped across the input vector one element at a time until the rightmost kernel element is on the last element of the input vector. Thus, the larger the kernel size is, the smaller the output vector is going to be.

When to use padding? Sometimes, it is desirable to produce a feature vector of the same length as the input vector. We can achieve that by adding padding. Padding means adding zeros at the beginning and the end of the input vector.

By adding a padding of 1 to the 1×6 input vector, we artificially create an input vector of size 1×8. This adds one element at the beginning and one at the end of the input vector. When we perform convolution with a kernel size of 3, the output vector is the same size as the input vector. The padding has zero value, so it contributes nothing to the dot product when the kernel is applied.

Apply convolution with a kernel with padding.

For a convolution with a kernel size of 5, we can also produce an output vector of the same length by adding a padding of 2 at the front and the end of the input vector. Likewise, for images, when applying a 3×3 kernel to a 128×128 image, we can add a border of one pixel around the outside of the image to produce a 128×128 output feature map.

class TestConv1d(nn.Module):
    def __init__(self):
        super(TestConv1d, self).__init__()
        self.conv = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=3, padding=1, bias=False)
        self.init_weights()

    def forward(self, x):
        return self.conv(x)
    
    def init_weights(self):
        new_weights = torch.ones(self.conv.weight.shape) * 2.
        self.conv.weight = torch.nn.Parameter(new_weights, requires_grad=False)


in_x = torch.tensor([[[1,2,3,4,5,6]]]).float()
print("in_x.shape", in_x.shape)
print(in_x)

net = TestConv1d()
out_y = net(in_x)

print("out_y.shape", out_y.shape)
print(out_y)

Output:

in_x.shape torch.Size([1, 1, 6])  
tensor([[[1., 2., 3., 4., 5., 6.]]])  
out_y.shape torch.Size([1, 1, 6])  
tensor([[[ 6., 12., 18., 24., 30., 22.]]])

The same with a kernel size of 5 and a padding of 2:

class TestConv1d(nn.Module):
    def __init__(self):
        super(TestConv1d, self).__init__()
        self.conv = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=5, padding=2, bias=False)
        self.init_weights()

    def forward(self, x):
        return self.conv(x)
    
    def init_weights(self):
        new_weights = torch.ones(self.conv.weight.shape) * 2.
        self.conv.weight = torch.nn.Parameter(new_weights, requires_grad=False)


in_x = torch.tensor([[[1,2,3,4,5,6]]]).float()
print("in_x.shape", in_x.shape)
print(in_x)

net = TestConv1d()
out_y = net(in_x)

print("out_y.shape", out_y.shape)
print(out_y)

Output:

in_x.shape torch.Size([1, 1, 6])  
tensor([[[1., 2., 3., 4., 5., 6.]]])  
out_y.shape torch.Size([1, 1, 6])  
tensor([[[12., 20., 30., 40., 36., 30.]]])
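The two padded results can be reproduced by hand. The sketch below is a plain-Python illustration of zero padding followed by a stride-1 convolution; the function name `conv1d_padded` is an assumption for this example, not a library call.

```python
def conv1d_padded(signal, weights, padding=0):
    """Zero-pad both ends of `signal`, then convolve with stride 1."""
    padded = [0] * padding + list(signal) + [0] * padding
    width = len(weights)
    return [sum(padded[i + k] * weights[k] for k in range(width))
            for i in range(len(padded) - width + 1)]


signal = [1, 2, 3, 4, 5, 6]
print(conv1d_padded(signal, [2, 2, 2], padding=1))        # [6, 12, 18, 24, 30, 22]
print(conv1d_padded(signal, [2, 2, 2, 2, 2], padding=2))  # [12, 20, 30, 40, 36, 30]
```

The first and last outputs are smaller than their neighbours because part of the kernel sits on the zero padding at the edges.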

We can shift the kernel by more steps (Stride)

So far, we have been sliding the kernel 1 step at a time. The distance the kernel moves across the input at each step is referred to as the “stride”, and the default stride is 1. But we can shift the kernel by any number of elements by increasing the stride.

For example, we can shift our kernel with a stride of 3. First, we will multiply and sum the first three elements. Then we will slide the kernel by three steps and perform the same operation for the next three elements. As a result, our output vector is of size 2.

Apply convolution with a kernel with a stride size of 3.

When to increase stride size? In most cases, we increase the stride to down-sample the input vector. Applying a stride of 2 reduces the length of the vector by roughly half. Sometimes, we can use a larger stride in place of pooling layers to reduce the spatial size, which shrinks the feature maps and speeds up computation.

class TestConv1d(nn.Module):
    def __init__(self):
        super(TestConv1d, self).__init__()
        self.conv = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=3, stride=3, bias=False)
        self.init_weights()

    def forward(self, x):
        return self.conv(x)
    
    def init_weights(self):
        new_weights = torch.ones(self.conv.weight.shape) * 2.
        self.conv.weight = torch.nn.Parameter(new_weights, requires_grad=False)


in_x = torch.tensor([[[1,2,3,4,5,6]]]).float()
print("in_x.shape", in_x.shape)
print(in_x)

net = TestConv1d()
out_y = net(in_x)

print("out_y.shape", out_y.shape)
print(out_y)

Output:

in_x.shape torch.Size([1, 1, 6])  
tensor([[[1., 2., 3., 4., 5., 6.]]])  
out_y.shape torch.Size([1, 1, 2])  
tensor([[[12., 30.]]])
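The effect of the stride can also be traced by hand. The sketch below is a plain-Python illustration; the function name `conv1d_strided` is an assumption for this example.

```python
def conv1d_strided(signal, weights, stride=1, padding=0):
    """Zero-pad both ends, then move the kernel `stride` elements per step."""
    padded = [0] * padding + list(signal) + [0] * padding
    width = len(weights)
    last_start = len(padded) - width
    return [sum(padded[start + k] * weights[k] for k in range(width))
            for start in range(0, last_start + 1, stride)]


signal = [1, 2, 3, 4, 5, 6]
weights = [2, 2, 2]
print(conv1d_strided(signal, weights, stride=3))             # [12, 30]
print(conv1d_strided(signal, weights, stride=2))             # [12, 24]
print(conv1d_strided(signal, weights, stride=2, padding=1))  # [6, 18, 30]
```

Note that a stride of 2 halves the length exactly (6 to 3) only when combined with a padding of 1 here; without padding, the kernel size also eats into the output length.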

Increase the convolution’s receptive field (Dilation)

While reading deep learning literature, you may have noticed the term “dilated convolutions”. Dilated convolutions “inflate” the kernel by inserting spaces between the kernel elements, and a parameter controls the dilation rate. A dilation rate of 2 means there is a gap of one element between adjacent kernel elements. A convolution kernel with dilation = 1 corresponds to a regular convolution.

Dilated convolutions are used in the DeepLab architecture, and they are how atrous spatial pyramid pooling (ASPP) works. ASPP extracts features from high-resolution feature maps and encodes image context at multiple scales. For signal processing, dilation can effectively increase the receptive field without increasing the kernel size (and therefore without increasing the model’s size).

Apply convolution with a kernel with dilation rate of 2.

When to use dilated convolutions? Generally, dilated convolutions have shown better segmentation performance in DeepLab and in Multi-Scale Context Aggregation by Dilated Convolutions. You might want to use dilated convolutions if you want an exponential expansion of the receptive field without loss of resolution or coverage. This allows us to have a larger receptive field with the same computation and memory costs while preserving resolution.

class TestConv1d(nn.Module):
    def __init__(self):
        super(TestConv1d, self).__init__()
        self.conv = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=3, dilation=2, bias=False)
        self.init_weights()

    def forward(self, x):
        return self.conv(x)
    
    def init_weights(self):
        new_weights = torch.ones(self.conv.weight.shape) * 2.
        self.conv.weight = torch.nn.Parameter(new_weights, requires_grad=False)


in_x = torch.tensor([[[1,2,3,4,5,6]]]).float()
print("in_x.shape", in_x.shape)
print(in_x)

net = TestConv1d()
out_y = net(in_x)

print("out_y.shape", out_y.shape)
print(out_y)

Output:

in_x.shape torch.Size([1, 1, 6])  
tensor([[[1., 2., 3., 4., 5., 6.]]])  
out_y.shape torch.Size([1, 1, 2])  
tensor([[[18., 24.]]])
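The dilated result above, and the way stacked dilated layers widen the receptive field, can be sketched in plain Python. Both function names are assumptions for this example, and `receptive_field` assumes stride-1 layers that all use the same kernel size.

```python
def conv1d_dilated(signal, weights, dilation=1):
    """Stride-1 convolution whose kernel elements are `dilation` steps apart."""
    span = dilation * (len(weights) - 1) + 1  # input elements covered by the kernel
    return [sum(signal[start + k * dilation] * weights[k] for k in range(len(weights)))
            for start in range(len(signal) - span + 1)]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 layers, one dilation rate per layer."""
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


print(conv1d_dilated([1, 2, 3, 4, 5, 6], [2, 2, 2], dilation=2))  # [18, 24]
print(receptive_field(3, [1, 1, 1]))  # 7
print(receptive_field(3, [1, 2, 4]))  # 15
```

Three regular layers with kernel size 3 see 7 input elements, while three layers with dilation rates 1, 2, and 4 see 15, using the same number of weights.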

Separate the weights (Groups)

By default, the “groups” parameter is set to 1, where all the input channels are convolved to all outputs. To use groupwise convolution, we can increase the “groups” value; this splits the input channels into separate groups, each convolved with its own set of filters. Both in_channels and out_channels must be divisible by groups.

When groups=2, this is essentially equivalent to having two convolution layers side by side, where each only processes half the input channels. Each group then produces half the output channels, and the results are concatenated to form the final output.

Apply groupwise convolution.

class TestConv1d(nn.Module):
    def __init__(self):
        super(TestConv1d, self).__init__()
        self.conv = nn.Conv1d(in_channels=2, out_channels=2, kernel_size=1, groups=2, bias=False)
        self.init_weights()

    def forward(self, x):
        return self.conv(x)
    
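    def init_weights(self):
        print(self.conv.weight.shape)
        # Assign in place under no_grad; autograd rejects in-place writes
        # to a parameter that requires grad otherwise.
        with torch.no_grad():
            self.conv.weight[0,0,0] = 2.
            self.conv.weight[1,0,0] = 4.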


in_x = torch.tensor([[[1,2,3,4,5,6],[10,20,30,40,50,60]]]).float()
print("in_x.shape", in_x.shape)
print(in_x)

net = TestConv1d()
out_y = net(in_x)

print("out_y.shape", out_y.shape)
print(out_y)

Output:

in_x.shape torch.Size([1, 2, 6])  
tensor([[[ 1.,  2.,  3.,  4.,  5.,  6.],  
         [10., 20., 30., 40., 50., 60.]]])  
torch.Size([2, 1, 1])  
out_y.shape torch.Size([1, 2, 6])  
tensor([[[  2.,   4.,   6.,   8.,  10.,  12.],  
         [ 40.,  80., 120., 160., 200., 240.]]], grad_fn=<SqueezeBackward1>)

Depthwise convolution. Groups are used when we want to perform depthwise convolution, for example, to extract image features from the R, G, and B channels separately. When groups == in_channels and out_channels == K * in_channels, where K is a positive integer, the operation is known in the literature as depthwise convolution.

class TestConv1d(nn.Module):
    def __init__(self):
        super(TestConv1d, self).__init__()
        self.conv = nn.Conv1d(in_channels=2, out_channels=4, kernel_size=1, groups=2, bias=False)
        self.init_weights()

    def forward(self, x):
        return self.conv(x)
    
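    def init_weights(self):
        print(self.conv.weight.shape)
        # Assign in place under no_grad; autograd rejects in-place writes
        # to a parameter that requires grad otherwise.
        with torch.no_grad():
            self.conv.weight[0,0,0] = 2.
            self.conv.weight[1,0,0] = 4.
            self.conv.weight[2,0,0] = 6.
            self.conv.weight[3,0,0] = 8.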


in_x = torch.tensor([[[1,2,3,4,5,6],[10,20,30,40,50,60]]]).float()
print("in_x.shape", in_x.shape)
print(in_x)

net = TestConv1d()
out_y = net(in_x)

print("out_y.shape", out_y.shape)
print(out_y)

Output:

in_x.shape torch.Size([1, 2, 6])  
tensor([[[ 1.,  2.,  3.,  4.,  5.,  6.],  
         [10., 20., 30., 40., 50., 60.]]])  
torch.Size([4, 1, 1])  
out_y.shape torch.Size([1, 4, 6])  
tensor([[[  2.,   4.,   6.,   8.,  10.,  12.],  
         [  4.,   8.,  12.,  16.,  20.,  24.],  
         [ 60., 120., 180., 240., 300., 360.],  
         [ 80., 160., 240., 320., 400., 480.]]], grad_fn=<SqueezeBackward1>)
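Both grouped results can be reproduced with a short plain-Python sketch. The function name `grouped_pointwise_conv` is an assumption for this example; it handles only a kernel size of 1 and lays out its weights as [out_channels][in_channels / groups], mirroring the weight shapes printed above.

```python
def grouped_pointwise_conv(channels, weights, groups):
    """Grouped convolution with kernel size 1 and no bias.

    channels: list of input channels, each a list of values.
    weights[o][i]: weight applied to the i-th input channel of the group
    that feeds output channel o.
    """
    in_per_group = len(channels) // groups
    out_per_group = len(weights) // groups
    output = []
    for o, w in enumerate(weights):
        group = o // out_per_group  # which group this output channel belongs to
        group_inputs = channels[group * in_per_group:(group + 1) * in_per_group]
        length = len(group_inputs[0])
        output.append([sum(w[i] * group_inputs[i][t] for i in range(in_per_group))
                       for t in range(length)])
    return output


x = [[1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60]]

two_outputs = grouped_pointwise_conv(x, [[2], [4]], groups=2)
four_outputs = grouped_pointwise_conv(x, [[2], [4], [6], [8]], groups=2)
print(two_outputs)
print(four_outputs)
```

Each output channel only ever sees the input channels of its own group, which is why the second input channel never influences the first output rows.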

In 2012, grouped convolutions were introduced in the AlexNet paper, where the primary motivation was to allow the network to be trained over two GPUs. However, this engineering hack had an interesting side effect: the network learned better representations. Training AlexNet with and without grouped convolutions yields different accuracy and computational efficiency. AlexNet without grouped convolutions is less efficient and also slightly less accurate.

In my work, I have also applied grouped convolutions to effectively train a scalable multi-task learning model. I can scale it to any number of tasks by adjusting the “groups” parameter.

1×1 convolution

Several papers use 1×1 convolutions, as first investigated by Network in Network. At first sight, a 1×1 convolution can be confusing and may seem pointless, as it looks like just pointwise scaling.

However, this is not the case because, for example, in computer vision, we are operating over 3-dimensional volumes; the kernels always extend through the full depth of the input. If the input is 128x128x3, then doing 1×1 convolutions would effectively be doing 3-dimensional dot products since the input depth is 3 channels.
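A minimal plain-Python sketch of this idea: a 1×1 convolution computes, at every pixel, a dot product across the channels. The function name `conv1x1`, the tiny image, and the weights are made up for illustration; a single output channel and no bias are assumed.

```python
def conv1x1(image, weights):
    """image[y][x] is a list of channel values; weights holds one weight per channel.

    Returns a single-channel feature map of the same height and width.
    """
    return [[sum(w * c for w, c in zip(weights, pixel)) for pixel in row]
            for row in image]


# 2x2 RGB image with made-up values.
image = [[[255, 0, 0], [0, 255, 0]],
         [[0, 0, 255], [100, 100, 100]]]
# One weight per input channel (made up).
weights = [0.5, 0.25, 0.25]

feature_map = conv1x1(image, weights)
print(feature_map)  # [[127.5, 63.75], [63.75, 100.0]]
```

The spatial size is unchanged while 3 channels are mixed into 1, which is the sense in which 1×1 convolutions change the depth of a feature map rather than its width or height.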

In GoogLeNet, the 1×1 kernel was used for dimensionality reduction and for increasing the dimensionality of feature maps. The 1×1 kernel is also used to increase the number of feature maps after pooling; this artificially creates more feature maps of the downsampled features.

In ResNet, the 1×1 kernel was used as a projection to match the number of channels of the input to that of the residual block’s output.

In TCN, the 1×1 kernel was added to account for discrepant input-output widths, as the input and output could have different widths. 1×1 kernel convolution ensures that the elementwise addition receives tensors of the same shape.

Summary

The following charts summarize the key differences between 1D, 2D, and 3D convolutional neural networks. Note that the input and output shapes are for TensorFlow.

Input shape for 1D, 2D, and 3D CNN in TensorFlow.

Output shape for 1D, 2D, and 3D CNN in TensorFlow.

Direction of operation for 1D, 2D, and 3D CNN in TensorFlow.

  • In 1D CNN, the kernel moves in 1 direction. Input and output data of 1D CNN are 2-dimensional (excluding the batch dimension). Mostly used on time-series data.
  • In 2D CNN, the kernel moves in 2 directions. Input and output data of 2D CNN are 3-dimensional (excluding the batch dimension). Mostly used on image data.
  • In 3D CNN, the kernel moves in 3 directions. Input and output data of 3D CNN are 4-dimensional (excluding the batch dimension). Mostly used on 3D image data (MRI, CT scans, video).

Resources:

Understanding 1D and 3D Convolution Neural Network | Keras | by Shiva Verma | Towards Data Science
https://towardsdatascience.com/understanding-1d-and-3d-convolution-neural-network-keras-9d8f76e29610

Intuitive understanding of 1D, 2D, and 3D convolutions in convolutional neural networks. – Weights & Biases
https://wandb.ai/ayush-thakur/dl-question-bank/reports/Intuitive-understanding-of-1D-2D-and-3D-convolutions-in-convolutional-neural-networks—VmlldzoxOTk2MDA

How do Convolutional Layers Work in Deep Learning Neural Networks? – Hong Jing (Jingles)
https://jinglescode.github.io/2020/11/01/how-convolutional-layers-work-deep-learning-neural-networks/

Amir Masoud Sefidian
Machine Learning Engineer
