CNN
A CNN (Convolutional Neural Network) is a type of neural network that achieves high accuracy in fields such as image recognition and speech recognition.
Figure: single-channel (Mono) and multi-channel (Multi) convolution (images omitted).
forward
$$
\begin{cases}
\begin{aligned}
a_{i,j,c'}^{k+1}
&= \sum_c\sum_{m=0}^{M-1}\sum_{n=0}^{N-1}w_{m,n,c,c'}^{k+1}z_{i+m,j+n,c}^{k} + b_{c'}^{k+1}\\
z_{i,j,c'}^{k}
&= h^{k}\left(a_{i,j,c'}^{k}\right)
\end{aligned}
\end{cases}
$$
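As a sanity check, the forward formula above can be sketched directly in NumPy. This is a naive loop implementation over the "valid" region with unit stride, for illustration only (it is not kerasy's actual `Conv2D`):

```python
import numpy as np

def conv2d_forward(z, w, b):
    """a[i,j,c'] = sum_{c,m,n} w[m,n,c,c'] * z[i+m,j+n,c] + b[c'] (valid region, unit stride)."""
    H, W, C = z.shape
    M, N, _, Cp = w.shape
    a = np.empty((H - M + 1, W - N + 1, Cp))
    for i in range(H - M + 1):
        for j in range(W - N + 1):
            patch = z[i:i+M, j:j+N, :]  # (M, N, C) window
            # contract over m, n, c; the output-channel axis c' remains
            a[i, j, :] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2])) + b
    return a

z = np.random.randn(5, 5, 3)     # input feature map z^k
w = np.random.randn(3, 3, 3, 2)  # kernel w^{k+1}, shape (M, N, C, C')
b = np.zeros(2)                  # bias b^{k+1}
a = conv2d_forward(z, w, b)
print(a.shape)  # (3, 3, 2)
```

Note that a 5×5 input with a 3×3 kernel yields a 3×3 output per output channel, as expected from the summation limits.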
backprop
$w_{m,n,c,c'}^k, b_{c'}^k$
$$
\begin{aligned}
\frac{\partial E}{\partial w_{m,n,c,c'}^{k+1}}
&= \sum_{i}\sum_{j}\frac{\partial E}{\partial a_{i,j,c'}^{k+1}}\frac{\partial a_{i,j,c'}^{k+1}}{\partial w_{m,n,c,c'}^{k+1}}\\
&= \sum_{i}\sum_{j}\frac{\partial E}{\partial a_{i,j,c'}^{k+1}}z_{i+m,j+n,c}^{k}\\
&= \sum_{i}\sum_{j}\delta_{i,j,c'}^{k+1}\cdot z_{i+m,j+n,c}^{k}\\
\frac{\partial E}{\partial b_{c'}^{k+1}}
&= \sum_{i}\sum_{j}\delta_{i,j,c'}^{k+1}
\end{aligned}
$$
$\delta_{i,j,c}^k$
$$
\begin{aligned}
\delta_{i,j,c}^{k}
&= \frac{\partial E}{\partial a_{i,j,c}^{k}} \\
&= \sum_{c'}\sum_{m=0}^{M-1}\sum_{n=0}^{N-1}\left(\frac{\partial E}{\partial a_{i-m,j-n,c'}^{k+1}}\right)\left(\frac{\partial a_{i-m,j-n,c'}^{k+1}}{\partial a_{i,j,c}^k}\right)\\
&= \sum_{c'}\sum_{m=0}^{M-1}\sum_{n=0}^{N-1} \left(\delta_{i-m,j-n,c'}^{k+1}\right)\left(w_{m,n,c,c'}^{k+1}h'\left(a_{i,j,c}^k\right)\right) \\
&= h'\left(a_{i,j,c}^k\right)\sum_{c'}\sum_{m=0}^{M-1}\sum_{n=0}^{N-1} \delta_{i-m,j-n,c'}^{k+1}\cdot w_{m,n,c,c'}^{k+1}
\end{aligned}
$$
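The weight, bias, and delta formulas above can also be checked numerically. The following is an illustrative NumPy sketch (unit stride, valid region), not kerasy's implementation; `delta_next` plays the role of $\delta^{k+1}$ and `h_prime_a` of $h'(a^k)$:

```python
import numpy as np

def conv2d_grads(delta_next, z, w, h_prime_a):
    """Gradients for a valid, unit-stride convolution layer.
    delta_next : (Ho, Wo, Cp)  delta^{k+1}
    z          : (H, W, C)     z^k
    w          : (M, N, C, Cp) w^{k+1}
    h_prime_a  : (H, W, C)     h'(a^k)
    """
    Ho, Wo, Cp = delta_next.shape
    M, N, C, _ = w.shape
    H, W_, _ = z.shape
    # dE/dw[m,n,c,c'] = sum_{i,j} delta_next[i,j,c'] * z[i+m, j+n, c]
    dW = np.empty_like(w)
    for m in range(M):
        for n in range(N):
            patch = z[m:m+Ho, n:n+Wo, :]  # (Ho, Wo, C)
            dW[m, n] = np.tensordot(patch, delta_next, axes=([0, 1], [0, 1]))
    # dE/db[c'] = sum_{i,j} delta_next[i,j,c']
    db = delta_next.sum(axis=(0, 1))
    # delta^k[i,j,c] = h'(a^k[i,j,c]) * sum_{c',m,n} delta_next[i-m,j-n,c'] * w[m,n,c,c']
    delta = np.zeros((H, W_, C))
    for m in range(M):
        for n in range(N):
            # scatter each shifted contribution: (Ho,Wo,Cp) @ (Cp,C) -> (Ho,Wo,C)
            delta[m:m+Ho, n:n+Wo, :] += delta_next @ w[m, n].T
    delta *= h_prime_a
    return dW, db, delta

delta_next = np.ones((3, 3, 2))
z = np.random.randn(5, 5, 3)
w = np.random.randn(3, 3, 3, 2)
h_prime_a = np.ones((5, 5, 3))
dW, db, delta = conv2d_grads(delta_next, z, w, h_prime_a)
print(dW.shape, db.shape, delta.shape)  # (3, 3, 3, 2) (2,) (5, 5, 3)
```

With `delta_next` all ones, `db` reduces to the output size $3\times3=9$ per channel, matching the bias formula term by term.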
Pooling Layer
Pooling layers come in several varieties, such as Max, Average, Global-Max, and Global-Average. For simplicity, only the forward and backward computations of Max-Pooling are covered here.
forward
backprop
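Both passes can be sketched in a few lines of NumPy. The forward pass takes the maximum of each window and records which position achieved it; the backward pass routes the upstream gradient only to that position. This sketch assumes non-overlapping $p\times p$ windows and is not kerasy's `MaxPooling2D`:

```python
import numpy as np

def maxpool_forward(z, p=2):
    """Non-overlapping p x p max pooling; also returns the argmax mask for backprop."""
    H, W, C = z.shape
    Ho, Wo = H // p, W // p
    patches = z[:Ho*p, :Wo*p].reshape(Ho, p, Wo, p, C)
    out = patches.max(axis=(1, 3))
    # mask marks, within each window, the position(s) that achieved the max
    # (if two entries tie, both receive the gradient in this sketch)
    mask = (patches == out[:, None, :, None, :])
    return out, mask

def maxpool_backprop(delta_out, mask, p=2):
    """Route the upstream gradient to the argmax position of each window."""
    Ho, _, Wo, _, C = mask.shape
    grad = mask * delta_out[:, None, :, None, :]
    # invert the window reshape used in the forward pass
    return grad.reshape(Ho * p, Wo * p, C)

z = np.arange(16, dtype=float).reshape(4, 4, 1)
out, mask = maxpool_forward(z)
print(out[:, :, 0])  # [[ 5.  7.] [13. 15.]]
grad = maxpool_backprop(np.ones((2, 2, 1)), mask)
```

In the example, each 2×2 window's gradient lands on its maximal entry (5, 7, 13, 15) and every other input position receives zero.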
MNIST
MNIST is a task commonly used in image-processing tutorials: the model must recognize handwritten digits ($0\sim9$) like those shown below.
Here, for simplicity, we train the model on a single image of the digit $0$.
※ If the model already predicted $0$ from the start, we could not observe the training dynamics, so for convenience the class with the lowest predicted probability is used as the ground-truth label.
Figure: sample handwritten-digit images for classes $0$–$9$ (images omitted).
In [1]:
import cv2
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
In [2]:
from kerasy.layers.convolutional import Conv2D
from kerasy.layers.pool import MaxPooling2D
from kerasy.layers.core import Input, Dense, Flatten
from kerasy.engine.sequential import Sequential
In [3]:
cv_path = "/Users/iwasakishuto/Github/portfolio/Kerasy/doc/theme/img/MNIST-sample/0.png"
image = np.expand_dims(cv2.imread(cv_path, 0), axis=2)/255
In [4]:
plt.imshow(cv2.cvtColor(cv2.imread(cv_path), cv2.COLOR_BGR2RGB))
plt.xticks([]), plt.yticks([]), plt.title("Sample Training Data.")
plt.show()
In [5]:
model = Sequential()
model.add(Input(input_shape=(28,28,1)))
model.add(Conv2D(filters=32, kernel_size=(3, 3), activation='relu'))
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='tanh'))
model.add(Dense(10, activation='softmax'))
In [6]:
model.compile(optimizer='sgd', loss="categorical_crossentropy")
In [7]:
for layer in model.layers:
    print(f"== {layer.name} ==")
    print(f"output shape: {layer.output_shape}")
    print(f"kernel shape: {layer._losses.get('kernel', np.zeros(shape=())).shape}")
    print(f"bias shape  : {layer._losses.get('bias', np.zeros(shape=())).shape}")
In [8]:
# #=== Train Only Dense Layer ===
# model.layers[1].trainable = False
# model.layers[2].trainable = False
In [9]:
#=== Train Only Convolutional Layer ===
model.layers[-1].trainable = False
model.layers[-2].trainable = False
In [10]:
x_train = np.expand_dims(image, axis=0)
In [11]:
original_pred = model.predict(x_train)
print(f"original prediction: {np.argmax(original_pred)}\n{original_pred}")
In [12]:
ans_label = np.argmin(original_pred)
print(f"ans_label: {ans_label}")
In [13]:
y_true = np.zeros(shape=(10,))
y_true[ans_label] = 1
In [14]:
preds = []
for _ in range(100):
    y_pred = model.forward(x_train[0])
    model.backprop(y_true, y_pred)
    pred = np.argmax(y_pred)
    model.updates(1)
    preds.append([y_pred[ans_label], pred])
    if ans_label == pred:
        break
In [15]:
prob, label = np.array(preds).T
In [16]:
counts = np.arange(len(prob))
plt.plot(counts, prob, color="black", alpha=0.3)
for l in np.unique(label):
    ix = np.where(label == l)
    plt.scatter(counts[ix], prob[ix], color=cm.hsv(float(l) / 10), label=l)
plt.legend(), plt.grid()
plt.show()
In [17]:
y_pred
Out[17]: