Calculate CNN backprop with padding and stride set to 2
by Hide Inada
In this article, I'll discuss how you can calculate backprop for one layer of a convolutional neural network (CNN) when stride is set to two.
I would like to thank Mr. Sujit Rai for explaining backpropagation calculation in his article
Forward And Backpropagation in Convolutional Neural Network
in which backprop calculation for 3x3 input, 2x2 kernel, 2x2 output is discussed. I got an idea to calculate
backprop from this article and applied to calculating equations for 4x4 zero-padded input, 3x3 kernel, 2x2 output when stride is set to 2 is used during front prop.
I was trying to be careful in coming up with equations and calculations, but if you see any error,
please open an issue in
my ML repository.
Please note that any error in this article is mine.
0. Prerequisite
This article is focused on a specific area of CNN backprop and knowledge in the following areas will be required as a prerequisite:
- How forward prop is calculated for CNN including cases where padding is used and stride is not 1
- How backprop is calculated for a plain-vanilla multi-layer neural network
- Basic multi-variable calculus
1. Objectives
The objectives of this article are to calculate the gradient of overall loss with respect to the previous layer's
activation as well as the gradient of overall loss with respect to the weights stored in the
convolutional kernel of the current layer when front prop of the layer was calculated by the following equation:
\begin{equation}
Z = zero\_pad(a_{prev}) *_{stride2} W
\end{equation}
where
- \(Z\) is the set of values of this layer before activation function (e.g. sigmoid) is applied.
- \(a_{prev}\) is previous layer's activation.
- zero_pad is a function to put 0 around the matrix so that the convolution operation output will be the same size as the input matrix.
- \(W\) is the convolution kernel (filter).
- \(*\) is the convolution operator. Note that convolution in machine learning means cross-correlation, and I'm following that naming convention.
- \(*_{stride2}\) denotes the convolution operation with stride is set to 2.
The matrix form of each variable is given as the following:
\begin{equation}
Z =
\begin{bmatrix}
z_{11} & z_{12} \\
z_{21} & z_{22}
\end{bmatrix}
\end{equation}
\begin{equation}
a_{prev} =
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34} \\
a_{41} & a_{42} & a_{43} & a_{44}
\end{bmatrix}
\end{equation}
\begin{equation}
zero\_pad(a_{prev}) =
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
0 & a_{11} & a_{12} & a_{13} & a_{14} & 0 \\
0 & a_{21} & a_{22} & a_{23} & a_{24} & 0 \\
0 & a_{31} & a_{32} & a_{33} & a_{34} & 0 \\
0 & a_{41} & a_{42} & a_{43} & a_{44} & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\end{equation}
\begin{equation}
W =
\begin{bmatrix}
w_{11} & w_{12} & w_{13} \\
w_{21} & w_{22} & w_{23} \\
w_{31} & w_{32} & w_{33}
\end{bmatrix}
\end{equation}
Each element of Z is expressed by the following equations. Note that the convolution kernel W is shifted by two slots after each operation since stride is set to 2:
\begin{equation}
Z_{11} =
0 \cdot w_{11} + 0 \cdot w_{12} + 0 \cdot w_{13} +
0 \cdot w_{21} + a_{11} \cdot w_{22} + a_{12} \cdot w_{23} +
0 \cdot w_{31} + a_{21} \cdot w_{32} + a_{22} \cdot w_{33}
\end{equation}
\begin{equation}
Z_{12} =
0 \cdot w_{11} + 0 \cdot w_{12} + 0 \cdot w_{13} +
a_{12} \cdot w_{21} + a_{13} \cdot w_{22} + a_{14} \cdot w_{23} +
a_{22} \cdot w_{31} + a_{23} \cdot w_{32} + a_{24} \cdot w_{33}
\end{equation}
\begin{equation}
Z_{21} =
0 \cdot w_{11} + a_{21} \cdot w_{12} + a_{22} \cdot w_{13} +
0 \cdot w_{21} + a_{31} \cdot w_{22} + a_{32} \cdot w_{23} +
0 \cdot w_{31} + a_{41} \cdot w_{32} + a_{42} \cdot w_{33}
\end{equation}
\begin{equation}
Z_{22} =
a_{22} \cdot w_{11} + a_{23} \cdot w_{12} + a_{24} \cdot w_{13} +
a_{32} \cdot w_{21} + a_{33} \cdot w_{22} + a_{34} \cdot w_{23} +
a_{42} \cdot w_{31} + a_{43} \cdot w_{32} + a_{44} \cdot w_{33}
\end{equation}
During a backprop, gradients of the loss are propagated from the final layer over each layer of the network.
Specifically propagation will take in the form of \( \frac{\partial L}{\partial a} \) for this layer
where
- \(L\) is the loss calculated by the cost function of the network.
- \(a\) is the activation of this layer.
Once you know \( \frac{\partial L}{\partial a}\), you can apply the following chain rule to calculate the gradient with respect to Z.
This spefic calculation is an element-wise operation, so it is straightforward.
\begin{equation}
\frac{\partial L}{\partial Z} = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial Z}
\end{equation}
\begin{equation}
\frac{\partial L}{\partial Z} =
\begin{bmatrix}
\frac{\partial L}{\partial z_{11}} & \frac{\partial L}{\partial z_{12}} \\
\frac{\partial L}{\partial z_{21}} & \frac{\partial L}{\partial z_{22}} \\
\end{bmatrix}
\end{equation}
The objectives of backprop for this layer are to find the values for the following two matrices:
\begin{equation}
\frac{\partial L}{\partial a_{prev}} =
\begin{bmatrix}
\frac{\partial L}{\partial a_{11}} & \frac{\partial L}{\partial a_{12}} & \frac{\partial L}{\partial a_{13}} & \frac{\partial L}{\partial a_{14}} \\
\frac{\partial L}{\partial a_{21}} & \frac{\partial L}{\partial a_{22}} & \frac{\partial L}{\partial a_{23}} & \frac{\partial L}{\partial a_{24}} \\
\frac{\partial L}{\partial a_{31}} & \frac{\partial L}{\partial a_{32}} & \frac{\partial L}{\partial a_{33}} & \frac{\partial L}{\partial a_{34}} \\
\frac{\partial L}{\partial a_{41}} & \frac{\partial L}{\partial a_{42}} & \frac{\partial L}{\partial a_{43}} & \frac{\partial L}{\partial a_{44}}
\end{bmatrix}
\end{equation}
\begin{equation}
\frac{\partial L}{\partial W} =
\begin{bmatrix}
\frac{\partial L}{\partial w_{11}} & \frac{\partial L}{\partial w_{12}} & \frac{\partial L}{\partial w_{13}} \\
\frac{\partial L}{\partial w_{21}} & \frac{\partial L}{\partial w_{22}} & \frac{\partial L}{\partial w_{23}} \\
\frac{\partial L}{\partial w_{31}} & \frac{\partial L}{\partial w_{32}} & \frac{\partial L}{\partial w_{33}}
\end{bmatrix}
\end{equation}
2. Solution
To find \(\frac{\partial L}{\partial a_{prev}}\), we just have to calculate each of 16 elements of the matrix \( (12) \).
Note that for all the 16 equations below, the second equation is derived by substituting w for the partial
derivative of z with respect to a. This is done by differentiating both sides of equations
\( (6) \) through \( (9) \) with respect to \( a \).
Let's take an example of \( (6) \).
Differentiating both sides of the equation with respect to \( a_{11} \) yields
\begin{equation}
\frac{\partial z_{11}}{\partial a_{11}} = w_{22}
\end{equation}
This is because all other terms on the right side will disappear during differentiation.
2.1. Calculate \(\frac{\partial L}{\partial a_{prev}}\)
2.1.1. Calculate 16 elements
\( \frac{\partial L}{\partial a_{11}} \)
\begin{equation}
\frac{\partial L}{\partial a_{11}} =
\frac{\partial L}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial a_{11}} +
\frac{\partial L}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial a_{11}} +
\frac{\partial L}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial a_{11}} +
\frac{\partial L}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial a_{11}}
\end{equation}
\begin{equation}
=
\frac{\partial L}{\partial z_{11}} \cdot w_{22} +
\frac{\partial L}{\partial z_{12}} \cdot 0 +
\frac{\partial L}{\partial z_{21}} \cdot 0 +
\frac{\partial L}{\partial z_{22}} \cdot 0
\end{equation}
\begin{equation}
\frac{\partial L}{\partial a_{11}} =
\frac{\partial L}{\partial z_{11}} \cdot w_{22}
\end{equation}
\( \frac{\partial L}{\partial a_{12}} \)
\begin{equation}
\frac{\partial L}{\partial a_{12}} =
\frac{\partial L}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial a_{12}} +
\frac{\partial L}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial a_{12}} +
\frac{\partial L}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial a_{12}} +
\frac{\partial L}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial a_{12}}
\end{equation}
\begin{equation}
=
\frac{\partial L}{\partial z_{11}} \cdot w_{23} +
\frac{\partial L}{\partial z_{12}} \cdot w_{21} +
\frac{\partial L}{\partial z_{21}} \cdot 0 +
\frac{\partial L}{\partial z_{22}} \cdot 0
\end{equation}
\begin{equation}
\frac{\partial L}{\partial a_{12}} =
\frac{\partial L}{\partial z_{11}} \cdot w_{23} +
\frac{\partial L}{\partial z_{12}} \cdot w_{21}
\end{equation}
\( \frac{\partial L}{\partial a_{13}} \)
\begin{equation}
\frac{\partial L}{\partial a_{13}} =
\frac{\partial L}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial a_{13}} +
\frac{\partial L}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial a_{13}} +
\frac{\partial L}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial a_{13}} +
\frac{\partial L}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial a_{13}}
\end{equation}
\begin{equation}
=
\frac{\partial L}{\partial z_{11}} 0 +
\frac{\partial L}{\partial z_{12}} \cdot w_{22} +
\frac{\partial L}{\partial z_{21}} \cdot 0 +
\frac{\partial L}{\partial z_{22}} \cdot 0
\end{equation}
\begin{equation}
\frac{\partial L}{\partial a_{13}} =
\frac{\partial L}{\partial z_{12}} \cdot w_{22}
\end{equation}
\( \frac{\partial L}{\partial a_{14}} \)
\begin{equation}
\frac{\partial L}{\partial a_{14}} =
\frac{\partial L}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial a_{14}} +
\frac{\partial L}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial a_{14}} +
\frac{\partial L}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial a_{14}} +
\frac{\partial L}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial a_{14}}
\end{equation}
\begin{equation}
=
\frac{\partial L}{\partial z_{11}} \cdot 0 +
\frac{\partial L}{\partial z_{12}} \cdot w_{23} +
\frac{\partial L}{\partial z_{21}} \cdot 0 +
\frac{\partial L}{\partial z_{22}} \cdot 0
\end{equation}
\begin{equation}
\frac{\partial L}{\partial a_{14}} =
\frac{\partial L}{\partial z_{12}} \cdot w_{23}
\end{equation}
\( \frac{\partial L}{\partial a_{21}} \)
\begin{equation}
\frac{\partial L}{\partial a_{21}} =
\frac{\partial L}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial a_{21}} +
\frac{\partial L}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial a_{21}} +
\frac{\partial L}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial a_{21}} +
\frac{\partial L}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial a_{21}}
\end{equation}
\begin{equation}
=
\frac{\partial L}{\partial z_{11}} \cdot w_{32} +
\frac{\partial L}{\partial z_{12}} \cdot 0 +
\frac{\partial L}{\partial z_{21}} \cdot w_{12} +
\frac{\partial L}{\partial z_{22}} \cdot 0
\end{equation}
\begin{equation}
\frac{\partial L}{\partial a_{21}} =
\frac{\partial L}{\partial z_{11}} \cdot w_{32} +
\frac{\partial L}{\partial z_{21}} \cdot w_{12}
\end{equation}
\( \frac{\partial L}{\partial a_{22}} \)
\begin{equation}
\frac{\partial L}{\partial a_{22}} =
\frac{\partial L}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial a_{22}} +
\frac{\partial L}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial a_{22}} +
\frac{\partial L}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial a_{22}} +
\frac{\partial L}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial a_{22}}
\end{equation}
\begin{equation}
\frac{\partial L}{\partial a_{22}} =
\frac{\partial L}{\partial z_{11}} \cdot w_{33} +
\frac{\partial L}{\partial z_{12}} \cdot w_{31} +
\frac{\partial L}{\partial z_{21}} \cdot w_{13} +
\frac{\partial L}{\partial z_{22}} \cdot w_{11}
\end{equation}
\( \frac{\partial L}{\partial a_{23}} \)
\begin{equation}
\frac{\partial L}{\partial a_{23}} =
\frac{\partial L}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial a_{23}} +
\frac{\partial L}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial a_{23}} +
\frac{\partial L}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial a_{23}} +
\frac{\partial L}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial a_{23}}
\end{equation}
\begin{equation}
=
\frac{\partial L}{\partial z_{11}} \cdot 0 +
\frac{\partial L}{\partial z_{12}} \cdot w_{32} +
\frac{\partial L}{\partial z_{21}} \cdot 0 +
\frac{\partial L}{\partial z_{22}} \cdot w_{12}
\end{equation}
\begin{equation}
\frac{\partial L}{\partial a_{23}} =
\frac{\partial L}{\partial z_{12}} \cdot w_{32} +
\frac{\partial L}{\partial z_{22}} \cdot w_{12}
\end{equation}
\( \frac{\partial L}{\partial a_{24}} \)
\begin{equation}
\frac{\partial L}{\partial a_{24}} =
\frac{\partial L}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial a_{24}} +
\frac{\partial L}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial a_{24}} +
\frac{\partial L}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial a_{24}} +
\frac{\partial L}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial a_{24}}
\end{equation}
\begin{equation}
=
\frac{\partial L}{\partial z_{11}} \cdot 0 +
\frac{\partial L}{\partial z_{12}} \cdot w_{33} +
\frac{\partial L}{\partial z_{21}} \cdot 0 +
\frac{\partial L}{\partial z_{22}} \cdot w_{13}
\end{equation}
\begin{equation}
\frac{\partial L}{\partial a_{24}} =
\frac{\partial L}{\partial z_{12}} \cdot w_{33} +
\frac{\partial L}{\partial z_{22}} \cdot w_{13}
\end{equation}
\( \frac{\partial L}{\partial a_{31}} \)
\begin{equation}
\frac{\partial L}{\partial a_{31}} =
\frac{\partial L}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial a_{31}} +
\frac{\partial L}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial a_{31}} +
\frac{\partial L}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial a_{31}} +
\frac{\partial L}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial a_{31}}
\end{equation}
\begin{equation}
=
\frac{\partial L}{\partial z_{11}} \cdot 0 +
\frac{\partial L}{\partial z_{12}} \cdot 0 +
\frac{\partial L}{\partial z_{21}} \cdot w_{22} +
\frac{\partial L}{\partial z_{22}} 0
\end{equation}
\begin{equation}
\frac{\partial L}{\partial a_{31}} =
\frac{\partial L}{\partial z_{21}} \cdot w_{22}
\end{equation}
\( \frac{\partial L}{\partial a_{32}} \)
\begin{equation}
\frac{\partial L}{\partial a_{32}} =
\frac{\partial L}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial a_{32}} +
\frac{\partial L}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial a_{32}} +
\frac{\partial L}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial a_{32}} +
\frac{\partial L}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial a_{32}}
\end{equation}
\begin{equation}
=
\frac{\partial L}{\partial z_{11}} \cdot 0 +
\frac{\partial L}{\partial z_{12}} \cdot 0 +
\frac{\partial L}{\partial z_{21}} \cdot w_{23} +
\frac{\partial L}{\partial z_{22}} \cdot w_{21}
\end{equation}
\begin{equation}
\frac{\partial L}{\partial a_{32}} =
\frac{\partial L}{\partial z_{21}} \cdot w_{23} +
\frac{\partial L}{\partial z_{22}} \cdot w_{21}
\end{equation}
\( \frac{\partial L}{\partial a_{33}} \)
\begin{equation}
\frac{\partial L}{\partial a_{33}} =
\frac{\partial L}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial a_{33}} +
\frac{\partial L}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial a_{33}} +
\frac{\partial L}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial a_{33}} +
\frac{\partial L}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial a_{33}}
\end{equation}
\begin{equation}
=
\frac{\partial L}{\partial z_{11}} \cdot 0 +
\frac{\partial L}{\partial z_{12}} \cdot 0 +
\frac{\partial L}{\partial z_{21}} \cdot 0 +
\frac{\partial L}{\partial z_{22}} \cdot w_{22}
\end{equation}
\begin{equation}
\frac{\partial L}{\partial a_{33}} =
\frac{\partial L}{\partial z_{22}} \cdot w_{22}
\end{equation}
\( \frac{\partial L}{\partial a_{34}} \)
\begin{equation}
\frac{\partial L}{\partial a_{34}} =
\frac{\partial L}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial a_{34}} +
\frac{\partial L}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial a_{34}} +
\frac{\partial L}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial a_{34}} +
\frac{\partial L}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial a_{34}}
\end{equation}
\begin{equation}
=
\frac{\partial L}{\partial z_{11}} \cdot 0 +
\frac{\partial L}{\partial z_{12}} \cdot 0 +
\frac{\partial L}{\partial z_{21}} \cdot 0 +
\frac{\partial L}{\partial z_{22}} \cdot w_{23}
\end{equation}
\begin{equation}
\frac{\partial L}{\partial a_{34}} =
\frac{\partial L}{\partial z_{22}} \cdot w_{23}
\end{equation}
\( \frac{\partial L}{\partial a_{41}} \)
\begin{equation}
\frac{\partial L}{\partial a_{41}} =
\frac{\partial L}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial a_{41}} +
\frac{\partial L}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial a_{41}} +
\frac{\partial L}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial a_{41}} +
\frac{\partial L}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial a_{41}}
\end{equation}
\begin{equation}
=
\frac{\partial L}{\partial z_{11}} \cdot 0 +
\frac{\partial L}{\partial z_{12}} \cdot 0 +
\frac{\partial L}{\partial z_{21}} \cdot w_{32} +
\frac{\partial L}{\partial z_{22}} 0
\end{equation}
\begin{equation}
\frac{\partial L}{\partial a_{41}} =
\frac{\partial L}{\partial z_{21}} \cdot w_{32}
\end{equation}
\( \frac{\partial L}{\partial a_{42}} \)
\begin{equation}
\frac{\partial L}{\partial a_{42}} =
\frac{\partial L}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial a_{42}} +
\frac{\partial L}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial a_{42}} +
\frac{\partial L}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial a_{42}} +
\frac{\partial L}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial a_{42}}
\end{equation}
\begin{equation}
=
\frac{\partial L}{\partial z_{11}} \cdot 0 +
\frac{\partial L}{\partial z_{12}} \cdot 0 +
\frac{\partial L}{\partial z_{21}} \cdot w_{33} +
\frac{\partial L}{\partial z_{22}} \cdot w_{31}
\end{equation}
\begin{equation}
\frac{\partial L}{\partial a_{42}} =
\frac{\partial L}{\partial z_{21}} \cdot w_{33} +
\frac{\partial L}{\partial z_{22}} \cdot w_{31}
\end{equation}
\( \frac{\partial L}{\partial a_{43}} \)
\begin{equation}
\frac{\partial L}{\partial a_{43}} =
\frac{\partial L}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial a_{43}} +
\frac{\partial L}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial a_{43}} +
\frac{\partial L}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial a_{43}} +
\frac{\partial L}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial a_{43}}
\end{equation}
\begin{equation}
=
\frac{\partial L}{\partial z_{11}} \cdot 0 +
\frac{\partial L}{\partial z_{12}} \cdot 0 +
\frac{\partial L}{\partial z_{21}} \cdot 0 +
\frac{\partial L}{\partial z_{22}} \cdot w_{32}
\end{equation}
\begin{equation}
\frac{\partial L}{\partial a_{43}} =
\frac{\partial L}{\partial z_{22}} \cdot w_{32}
\end{equation}
\( \frac{\partial L}{\partial a_{44}} \)
\begin{equation}
\frac{\partial L}{\partial a_{43}} =
\frac{\partial L}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial a_{44}} +
\frac{\partial L}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial a_{44}} +
\frac{\partial L}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial a_{44}} +
\frac{\partial L}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial a_{44}}
\end{equation}
\begin{equation}
=
\frac{\partial L}{\partial z_{11}} \cdot 0 +
\frac{\partial L}{\partial z_{12}} \cdot 0 +
\frac{\partial L}{\partial z_{21}} \cdot 0 +
\frac{\partial L}{\partial z_{22}} \cdot w_{33}
\end{equation}
\begin{equation}
\frac{\partial L}{\partial a_{44}} =
\frac{\partial L}{\partial z_{22}} \cdot w_{33}
\end{equation}
Now if we put all the elements into a matrix, we will get:
\begin{equation}
\frac{\partial L}{\partial a_{prev}} =
\begin{bmatrix}
\frac{\partial L}{\partial z_{11}} \cdot w_{22} &
\frac{\partial L}{\partial z_{11}} \cdot w_{23} +
\frac{\partial L}{\partial z_{12}} \cdot w_{21} &
\frac{\partial L}{\partial z_{12}} \cdot w_{22} &
\frac{\partial L}{\partial z_{12}} \cdot w_{23} \\
\frac{\partial L}{\partial z_{11}} \cdot w_{32} +
\frac{\partial L}{\partial z_{21}} \cdot w_{12} &
\frac{\partial L}{\partial z_{11}} \cdot w_{33} +
\frac{\partial L}{\partial z_{12}} \cdot w_{31} +
\frac{\partial L}{\partial z_{21}} \cdot w_{13} +
\frac{\partial L}{\partial z_{22}} \cdot w_{11} &
\frac{\partial L}{\partial z_{12}} \cdot w_{32} +
\frac{\partial L}{\partial z_{22}} \cdot w_{12} &
\frac{\partial L}{\partial z_{12}} \cdot w_{33} +
\frac{\partial L}{\partial z_{22}} \cdot w_{13} \\
\frac{\partial L}{\partial z_{21}} \cdot w_{22} &
\frac{\partial L}{\partial z_{21}} \cdot w_{23} +
\frac{\partial L}{\partial z_{22}} \cdot w_{21} &
\frac{\partial L}{\partial z_{22}} \cdot w_{22} &
\frac{\partial L}{\partial z_{22}} \cdot w_{23} \\
\frac{\partial L}{\partial z_{21}} \cdot w_{32} &
\frac{\partial L}{\partial z_{21}} \cdot w_{33} +
\frac{\partial L}{\partial z_{22}} \cdot w_{31} &
\frac{\partial L}{\partial z_{22}} \cdot w_{32} &
\frac{\partial L}{\partial z_{22}} \cdot w_{33}
\end{bmatrix}
\end{equation}
2.1.2. Simplifying the equations
This looks hairy and seems hard to calculate. However, you'll soon see that that's not the case.
Step 1. Interweave \( \frac{\partial L}{\partial z} \) with zeros
First step is to put a row and a column after every element of \( \frac{\partial L}{\partial z} \):
\begin{equation}
\frac{\partial L}{\partial z} =
\begin{bmatrix}
\frac{\partial L}{\partial z_{11}} & \frac{\partial L}{\partial z_{12}} \\
\frac{\partial L}{\partial z_{21}} & \frac{\partial L}{\partial z_{22}}
\end{bmatrix}
\end{equation}
\begin{equation}
zero \_ interweave(\frac{\partial L}{\partial z}) =
\begin{bmatrix}
\frac{\partial L}{\partial z_{11}} & 0 & \frac{\partial L}{\partial z_{12}} & 0 \\
0 & 0 & 0 & 0 \\
\frac{\partial L}{\partial z_{21}} & 0 & \frac{\partial L}{\partial z_{22}} & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}
\end{equation}
Step 2. zero-pad along the edges
Next step is to put zero along the edges.
\begin{equation}
zero\_pad(zero \_ interweave(\frac{\partial L}{\partial z})) =
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
0 & \frac{\partial L}{\partial z_{11}} & 0 & \frac{\partial L}{\partial z_{12}} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & \frac{\partial L}{\partial z_{21}} & 0 & \frac{\partial L}{\partial z_{22}} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
\end{bmatrix}
\end{equation}
Step 3. Flip W vertically and horizontally.
\begin{equation}
W_{2} = flip_{verthorz}(W) =
\begin{bmatrix}
w_{33} & w_{32} & w_{31} \\
w_{23} & w_{22} & w_{21} \\
w_{13} & w_{12} & w_{11}
\end{bmatrix}
\end{equation}
where \(flip_{verthorz}(W) \) is an operator to flip the matrix both vertically and horizontally.
Step 4. Convolve two matrices
\begin{equation}
\frac{\partial L}{\partial a_{prev}} = zero\_pad(zero \_ interweave(\frac{\partial L}{\partial z})) * W_{2}
\end{equation}
This yields the same matrix as the equation \( (62) \).
2.2. Calculate \(\frac{\partial L}{\partial W}\)
Calculating \(\frac{\partial L}{\partial W}\) is similar to the steps above.
We just have to calculate each of the 9 elements of the matrix \( (13) \) this time.
2.2.1. Calculate 9 elements
\( \frac{\partial L}{\partial w_{11}} \)
\begin{equation}
\frac{\partial L}{\partial w_{11}} =
\frac{\partial L}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial w_{11}} +
\frac{\partial L}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial w_{11}} +
\frac{\partial L}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial w_{11}} +
\frac{\partial L}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial w_{11}}
\end{equation}
\begin{equation}
\frac{\partial L}{\partial w_{11}} =
\frac{\partial L}{\partial z_{11}} \cdot 0 +
\frac{\partial L}{\partial z_{12}} \cdot 0 +
\frac{\partial L}{\partial z_{21}} \cdot 0 +
\frac{\partial L}{\partial z_{22}} \cdot a_{22}
\end{equation}
\begin{equation}
\frac{\partial L}{\partial w_{11}} =
\frac{\partial L}{\partial z_{22}} \cdot a_{22}
\end{equation}
\( \frac{\partial L}{\partial w_{12}} \)
\begin{equation}
\frac{\partial L}{\partial w_{12}} =
\frac{\partial L}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial w_{12}} +
\frac{\partial L}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial w_{12}} +
\frac{\partial L}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial w_{12}} +
\frac{\partial L}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial w_{12}}
\end{equation}
\begin{equation}
\frac{\partial L}{\partial w_{12}} =
\frac{\partial L}{\partial z_{11}} \cdot 0 +
\frac{\partial L}{\partial z_{12}} \cdot 0 +
\frac{\partial L}{\partial z_{21}} \cdot a_{21} +
\frac{\partial L}{\partial z_{22}} \cdot a_{23}
\end{equation}
\begin{equation}
\frac{\partial L}{\partial w_{12}} =
\frac{\partial L}{\partial z_{21}} \cdot a_{21} +
\frac{\partial L}{\partial z_{22}} \cdot a_{23}
\end{equation}
\( \frac{\partial L}{\partial w_{13}} \)
\begin{equation}
\frac{\partial L}{\partial w_{13}} =
\frac{\partial L}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial w_{13}} +
\frac{\partial L}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial w_{13}} +
\frac{\partial L}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial w_{13}} +
\frac{\partial L}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial w_{13}}
\end{equation}
\begin{equation}
\frac{\partial L}{\partial w_{13}} =
\frac{\partial L}{\partial z_{11}} \cdot 0 +
\frac{\partial L}{\partial z_{12}} \cdot 0 +
\frac{\partial L}{\partial z_{21}} \cdot a_{22} +
\frac{\partial L}{\partial z_{22}} \cdot a_{24}
\end{equation}
\begin{equation}
\frac{\partial L}{\partial w_{13}} =
\frac{\partial L}{\partial z_{21}} \cdot a_{22} +
\frac{\partial L}{\partial z_{22}} \cdot a_{24}
\end{equation}
\( \frac{\partial L}{\partial w_{21}} \)
\begin{equation}
\frac{\partial L}{\partial w_{21}} =
\frac{\partial L}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial w_{21}} +
\frac{\partial L}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial w_{21}} +
\frac{\partial L}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial w_{21}} +
\frac{\partial L}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial w_{21}}
\end{equation}
\begin{equation}
\frac{\partial L}{\partial w_{21}} =
\frac{\partial L}{\partial z_{11}} \cdot 0 +
\frac{\partial L}{\partial z_{12}} \cdot a_{12} +
\frac{\partial L}{\partial z_{21}} \cdot 0 +
\frac{\partial L}{\partial z_{22}} \cdot a_{32}
\end{equation}
\begin{equation}
\frac{\partial L}{\partial w_{21}} =
\frac{\partial L}{\partial z_{12}} \cdot a_{12} +
\frac{\partial L}{\partial z_{22}} \cdot a_{32}
\end{equation}
\( \frac{\partial L}{\partial w_{22}} \)
\begin{equation}
\frac{\partial L}{\partial w_{22}} =
\frac{\partial L}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial w_{22}} +
\frac{\partial L}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial w_{22}} +
\frac{\partial L}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial w_{22}} +
\frac{\partial L}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial w_{22}}
\end{equation}
\begin{equation}
\frac{\partial L}{\partial w_{22}} =
\frac{\partial L}{\partial z_{11}} \cdot a_{11} +
\frac{\partial L}{\partial z_{12}} \cdot a_{13} +
\frac{\partial L}{\partial z_{21}} \cdot a_{31} +
\frac{\partial L}{\partial z_{22}} \cdot a_{33}
\end{equation}
\( \frac{\partial L}{\partial w_{23}} \)
\begin{equation}
\frac{\partial L}{\partial w_{23}} =
\frac{\partial L}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial w_{23}} +
\frac{\partial L}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial w_{23}} +
\frac{\partial L}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial w_{23}} +
\frac{\partial L}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial w_{23}}
\end{equation}
\begin{equation}
\frac{\partial L}{\partial w_{23}} =
\frac{\partial L}{\partial z_{11}} \cdot a_{12} +
\frac{\partial L}{\partial z_{12}} \cdot a_{14} +
\frac{\partial L}{\partial z_{21}} \cdot a_{32} +
\frac{\partial L}{\partial z_{22}} \cdot a_{34}
\end{equation}
\( \frac{\partial L}{\partial w_{31}} \)
\begin{equation}
\frac{\partial L}{\partial w_{21}} =
\frac{\partial L}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial w_{31}} +
\frac{\partial L}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial w_{31}} +
\frac{\partial L}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial w_{31}} +
\frac{\partial L}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial w_{31}}
\end{equation}
\begin{equation}
\frac{\partial L}{\partial w_{31}} =
\frac{\partial L}{\partial z_{11}} \cdot 0 +
\frac{\partial L}{\partial z_{12}} \cdot a_{22} +
\frac{\partial L}{\partial z_{21}} \cdot 0 +
\frac{\partial L}{\partial z_{22}} \cdot a_{42}
\end{equation}
\begin{equation}
\frac{\partial L}{\partial w_{31}} =
\frac{\partial L}{\partial z_{12}} \cdot a_{22} +
\frac{\partial L}{\partial z_{22}} \cdot a_{42}
\end{equation}
\( \frac{\partial L}{\partial w_{32}} \)
\begin{equation}
\frac{\partial L}{\partial w_{32}} =
\frac{\partial L}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial w_{32}} +
\frac{\partial L}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial w_{32}} +
\frac{\partial L}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial w_{32}} +
\frac{\partial L}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial w_{32}}
\end{equation}
\begin{equation}
\frac{\partial L}{\partial w_{32}} =
\frac{\partial L}{\partial z_{11}} \cdot a_{21} +
\frac{\partial L}{\partial z_{12}} \cdot a_{23} +
\frac{\partial L}{\partial z_{21}} \cdot a_{41} +
\frac{\partial L}{\partial z_{22}} \cdot a_{43}
\end{equation}
\( \frac{\partial L}{\partial w_{33}} \)
\begin{equation}
\frac{\partial L}{\partial w_{23}} =
\frac{\partial L}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial w_{33}} +
\frac{\partial L}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial w_{33}} +
\frac{\partial L}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial w_{33}} +
\frac{\partial L}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial w_{33}}
\end{equation}
\begin{equation}
\frac{\partial L}{\partial w_{33}} =
\frac{\partial L}{\partial z_{11}} \cdot a_{22} +
\frac{\partial L}{\partial z_{12}} \cdot a_{24} +
\frac{\partial L}{\partial z_{21}} \cdot a_{42} +
\frac{\partial L}{\partial z_{22}} \cdot a_{44}
\end{equation}
\begin{equation}
\frac{\partial L}{\partial W} =
\begin{bmatrix}
\frac{\partial L}{\partial z_{22}} \cdot a_{22} &
\frac{\partial L}{\partial z_{21}} \cdot a_{21} +
\frac{\partial L}{\partial z_{22}} \cdot a_{23} &
\frac{\partial L}{\partial z_{21}} \cdot a_{22} +
\frac{\partial L}{\partial z_{22}} \cdot a_{24} \\
\frac{\partial L}{\partial z_{12}} \cdot a_{12} +
\frac{\partial L}{\partial z_{22}} \cdot a_{32} &
\frac{\partial L}{\partial z_{11}} \cdot a_{11} +
\frac{\partial L}{\partial z_{12}} \cdot a_{13} +
\frac{\partial L}{\partial z_{21}} \cdot a_{31} +
\frac{\partial L}{\partial z_{22}} \cdot a_{33} &
\frac{\partial L}{\partial z_{11}} \cdot a_{12} +
\frac{\partial L}{\partial z_{12}} \cdot a_{14} +
\frac{\partial L}{\partial z_{21}} \cdot a_{32} +
\frac{\partial L}{\partial z_{22}} \cdot a_{34} \\
\frac{\partial L}{\partial z_{12}} \cdot a_{22} +
\frac{\partial L}{\partial z_{22}} \cdot a_{42} &
\frac{\partial L}{\partial z_{11}} \cdot a_{21} +
\frac{\partial L}{\partial z_{12}} \cdot a_{23} +
\frac{\partial L}{\partial z_{21}} \cdot a_{41} +
\frac{\partial L}{\partial z_{22}} \cdot a_{43} &
\frac{\partial L}{\partial z_{11}} \cdot a_{22} +
\frac{\partial L}{\partial z_{12}} \cdot a_{24} +
\frac{\partial L}{\partial z_{21}} \cdot a_{42} +
\frac{\partial L}{\partial z_{22}} \cdot a_{44}
\end{bmatrix}
\end{equation}
2.2.2. Simplifying the equations
This matrix also looks complicated, but we can simplify using the below steps.
Step 1. Interweave \( \frac{\partial L}{\partial z} \) with zeros
First step is to put a row and a column after every element of \( Z \). This operation is the same as the one
discussed in 2.1.2.
\begin{equation}
zero \_ interweave(\frac{\partial L}{\partial z}) =
\begin{bmatrix}
\frac{\partial L}{\partial z_{11}} & 0 & \frac{\partial L}{\partial z_{12}} & 0 \\
0 & 0 & 0 & 0 \\
\frac{\partial L}{\partial z_{21}} & 0 & \frac{\partial L}{\partial z_{22}} & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}
\end{equation}
Step 2. Zero-pad \(a_{prev} \)
Put zeros along the edges of \( a_{prev} \). This is the same operation as in \( (4) \).
\begin{equation}
zero\_pad(a_{prev}) =
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
0 & a_{11} & a_{12} & a_{13} & a_{14} & 0 \\
0 & a_{21} & a_{22} & a_{23} & a_{24} & 0 \\
0 & a_{31} & a_{32} & a_{33} & a_{34} & 0 \\
0 & a_{41} & a_{42} & a_{43} & a_{44} & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\end{equation}
Step 3. Convolve two matrices
\begin{equation}
\frac{\partial L}{\partial W} = zero\_pad(a_{prev}) * zero \_ interweave(\frac{\partial L}{\partial Z})
\end{equation}
This yields the same matrix as the equation \( (91) \), and we are done.