Chiustin: How to implement a neural network

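This may come as a surprise: neural networks aren't that complicated! The term "neural network" gets thrown around as a buzzword, but in practice they are often much simpler than people imagine.

This post is written for complete beginners and assumes ZERO prior knowledge of machine learning. We'll work out how neural networks operate by implementing one from scratch in Python.

Let's get started!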

1. Building Blocks: Neurons
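First, we need to talk about neurons, the basic unit of a neural network. A neuron takes inputs, does some math with them, and produces one output. Consider a neuron with 2 inputs, x1 and x2.

Three things happen inside it. First, each input is multiplied by a weight: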

x_1 \rightarrow x_1 * w_1

x_2 \rightarrow x_2 * w_2

Next, all the weighted inputs are added together with a bias b:

(x_1 * w_1) + (x_2 * w_2) + b

Finally, the sum is passed through an activation function:

y = f(x_1 * w_1 + x_2 * w_2 + b)

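The activation function turns an unbounded input into an output that has a nice, predictable form. This example uses the sigmoid function as its activation function:

f(x) = \frac{1}{1 + e^{-x}}

The sigmoid function only outputs numbers in the range (0, 1). You can think of it as compressing (-\infty, +\infty) to (0, 1): big negative numbers become ~0, and big positive numbers become ~1.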
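To make the squashing behavior concrete, here is a small sketch that evaluates the sigmoid at a few inputs. The sample inputs (-10, 0, 10) are arbitrary values picked for illustration.

```python
import numpy as np

def sigmoid(x):
  # f(x) = 1 / (1 + e^(-x))
  return 1 / (1 + np.exp(-x))

# Sample inputs chosen only for illustration
samples = np.array([-10.0, 0.0, 10.0])
outputs = sigmoid(samples)

for x, y in zip(samples, outputs):
  print("f(%g) = %.5f" % (x, y))
```

The output for 0 is exactly 0.5, while the outputs for -10 and 10 land within 0.0001 of 0 and 1 respectively.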

A Simple Example
Assume we have a 2-input neuron that uses the sigmoid activation function and has the following parameters:

w = [0, 1]

b = 4

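w = [0, 1] is just a way of writing w1 = 0, w2 = 1 in vector form. Now, let's give the neuron an input of x = [2, 3]. We'll use the dot product to write things more concisely:

(w \cdot x) + b = ((w_1 * x_1) + (w_2 * x_2)) + b = 0 * 2 + 1 * 3 + 4 = 7

y = f(w \cdot x + b) = f(7) = 0.999

Given the input x = [2, 3], the neuron outputs 0.999. This process of passing inputs forward to get an output is known as feedforward.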

Coding a Neuron
Time to implement a neuron! We'll use NumPy, a popular and powerful computing library for Python, to help us do the math:

import numpy as np

def sigmoid(x):
  # Our activation function: f(x) = 1 / (1 + e^(-x))
  return 1 / (1 + np.exp(-x))

class Neuron:
  def __init__(self, weights, bias):
    self.weights = weights
    self.bias = bias

  def feedforward(self, inputs):
    # Weight inputs, add bias, then use the activation function
    total = np.dot(self.weights, inputs) + self.bias
    return sigmoid(total)

weights = np.array([0, 1]) # w1 = 0, w2 = 1
bias = 4                   # b = 4
n = Neuron(weights, bias)

x = np.array([2, 3])       # x1 = 2, x2 = 3
print(n.feedforward(x))    # 0.9990889488055994

Recognize those numbers? That's the example we just did! We get the same answer of 0.999.

2. Combining Neurons into a Neural Network
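A neural network is nothing more than a bunch of neurons connected together. Consider the following simple network.

This network has 2 inputs, a hidden layer with 2 neurons (h1 and h2), and an output layer with 1 neuron (o1). Notice that the inputs for o1 are the outputs from h1 and h2. That's what makes this a network.

A hidden layer is any layer between the input (first) layer and the output (last) layer. There can be multiple hidden layers!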

An Example: Feedforward
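Let's use the network described above and assume all neurons have the same weights w = [0, 1], the same bias b = 0, and the same sigmoid activation function. Let h1, h2, o1 denote the outputs of the neurons they represent. What happens if we pass in the input x = [2, 3]?

h_1 = h_2 = f(w \cdot x + b) = f((0 * 2) + (1 * 3) + 0) = f(3) = 0.9526

o_1 = f(w \cdot [h_1, h_2] + b) = f((0 * h_1) + (1 * h_2) + 0) = f(0.9526) = 0.7216

The output of the neural network for input x = [2, 3] is 0.7216. Pretty simple, right? A neural network can have any number of layers with any number of neurons in those layers. The basic idea stays the same: feed the inputs forward through the neurons in the network to get the outputs at the end. For simplicity, we'll keep using the network described above for the rest of this post.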

Coding a Neural Network: Feedforward

Let's implement feedforward for our neural network:

import numpy as np

# ... code from previous section here

class OurNeuralNetwork:
  '''
  A neural network with:
    - 2 inputs
    - a hidden layer with 2 neurons (h1, h2)
    - an output layer with 1 neuron (o1)
  Each neuron has the same weights and bias:
    - w = [0, 1]
    - b = 0
  '''
  def __init__(self):
    weights = np.array([0, 1])
    bias = 0

    # The Neuron class here is from the previous section
    self.h1 = Neuron(weights, bias)
    self.h2 = Neuron(weights, bias)
    self.o1 = Neuron(weights, bias)

  def feedforward(self, x):
    out_h1 = self.h1.feedforward(x)
    out_h2 = self.h2.feedforward(x)

    # The inputs for o1 are the outputs from h1 and h2
    out_o1 = self.o1.feedforward(np.array([out_h1, out_h2]))

    return out_o1

network = OurNeuralNetwork()
x = np.array([2, 3])
print(network.feedforward(x)) # 0.7216325609518421

We got 0.7216 again!
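The same forward pass can also be written with one weight matrix per layer, which is closer to how larger networks are usually organized. The following is a minimal sketch under that assumption; the names W_hidden, b_hidden, W_out, b_out and feedforward_matrix are hypothetical and not part of the code above.

```python
import numpy as np

def sigmoid(x):
  return 1 / (1 + np.exp(-x))

# Each row holds the weights of one neuron (hypothetical names).
W_hidden = np.array([[0, 1],   # h1: w = [0, 1]
                     [0, 1]])  # h2: w = [0, 1]
b_hidden = np.array([0, 0])
W_out = np.array([[0, 1]])     # o1: w = [0, 1]
b_out = np.array([0])

def feedforward_matrix(x):
  # Hidden layer: both neurons computed at once with a matrix-vector product
  h = sigmoid(W_hidden @ x + b_hidden)
  # Output layer takes the hidden outputs as its inputs
  return sigmoid(W_out @ h + b_out)

x = np.array([2, 3])
print(feedforward_matrix(x))  # a 1-element array, approximately 0.7216
```

Adding a neuron to a layer then only means adding a row to that layer's matrix, rather than writing a new attribute by hand.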

3. Training a Neural Network, Part 1
Say we have the following measurements:

Name	Weight (lb)	Height (in)	Gender
Alice	133	65	F
Bob	160	72	M
Charlie	152	70	M
Diana	120	60	F

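Let's train our network to predict someone's gender given their weight and height.

We'll represent Male with a 0 and Female with a 1, and we'll also shift the data to make it easier to use: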

Name	Weight (minus 135)	Height (minus 66)	Gender
Alice	-2	-1	1
Bob	25	6	0
Charlie	17	4	0
Diana	-15	-6	1

I arbitrarily chose the shift amounts (135 and 66) to make the numbers look nice. Normally, you'd shift by the mean.
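For comparison, shifting by the mean could look like the sketch below. It uses the four raw measurements from the first table; the variable names (raw, means, centered) are made up for this example. The resulting numbers differ from the table above, which uses the hand-picked shifts of 135 and 66.

```python
import numpy as np

# Raw measurements from the first table: [weight (lb), height (in)]
raw = np.array([
  [133, 65],  # Alice
  [160, 72],  # Bob
  [152, 70],  # Charlie
  [120, 60],  # Diana
], dtype=float)

# Per-column means: one for weight, one for height
means = raw.mean(axis=0)

# Subtract each column's mean from that column (NumPy broadcasting)
centered = raw - means

print(means)     # mean weight and mean height
print(centered)  # each column now averages to 0
```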

Loss
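Before we train our network, we first need a way to quantify how "good" it's doing so that it can try to do "better". That's what the loss is.

We'll use the mean squared error (MSE) loss:

MSE = \frac{1}{n} \sum_{i=1}^{n} (y_{true} - y_{pred})^2

Let's break this down:

(i) n is the number of samples, which is 4 (Alice, Bob, Charlie, Diana).
(ii) y represents the variable being predicted, which is Gender.
(iii) $y_{true}$ is the true value of the variable (the "correct answer"). For example, $y_{true}$ for Alice would be 1 (Female).
(iv) $y_{pred}$ is the predicted value of the variable. It's whatever our network outputs.

$(y_{true} - y_{pred})^2$ is known as the squared error. Our loss function simply takes the average over all squared errors (hence the name mean squared error). The better our predictions are, the lower our loss will be!

Better predictions = lower loss.

Training a network = trying to minimize its loss.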

An Example Loss Calculation
Let's say our network always outputs 0; in other words, it's confident all humans are Male 🤔. What would our loss be?

Name	$y_{true}$	$(y_{true} - y_{pred})^2$
Alice	1	1
Bob	0	0
Charlie	0	0
Diana	1	1

MSE = \frac{1}{4} (1 + 0 + 0 + 1) = 0.5

Code: MSE Loss
Here's some code to calculate loss for us:

import numpy as np

def mse_loss(y_true, y_pred):
  # y_true and y_pred are numpy arrays of the same length.
  return ((y_true - y_pred) ** 2).mean()

y_true = np.array([1, 0, 0, 1])
y_pred = np.array([0, 0, 0, 0])

print(mse_loss(y_true, y_pred)) # 0.5

4. Training a Neural Network, Part 2
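We now have a clear goal: minimize the loss of the neural network. We know we can change the network's weights and biases to influence its predictions, but how do we do so in a way that decreases loss?

This section uses a bit of multivariable calculus. If you're uncomfortable with calculus, feel free to skip over the math parts.

For simplicity, let's pretend we only have Alice in our dataset: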

Name	Weight (minus 135)	Height (minus 66)	Gender
Alice	-2	-1	1

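Then the mean squared error loss is just Alice's squared error:

MSE = \frac{1}{1} \sum_{i=1}^{1} (y_{true} - y_{pred})^2 = (y_{true} - y_{pred})^2 = (1 - y_{pred})^2

Another way to think about loss is as a function of the weights and biases. Let's label each weight and bias in the network: w1 and w2 connect the two inputs to h1 (with bias b1), w3 and w4 connect them to h2 (with bias b2), and w5 and w6 connect h1 and h2 to o1 (with bias b3).

Then, we can write the loss as a multivariable function:

L(w_1, w_2, w_3, w_4, w_5, w_6, b_1, b_2, b_3)

Imagine we wanted to tweak w1. How would the loss L change if we changed w1? That's a question the partial derivative

\frac{\partial L}{\partial w_1}

can answer. How do we calculate it?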

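Here's where the math starts to get more complex. Don't be discouraged! I recommend getting a pen and paper to follow along; it'll help you understand.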

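To start, let's rewrite the partial derivative using the chain rule:

\frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial y_{pred}} * \frac{\partial y_{pred}}{\partial w_1}

We defined L earlier:

L = (1 - y_{pred})^2

Substituting and differentiating gives

\frac{\partial L}{\partial y_{pred}} = \frac{\partial (1 - y_{pred})^2}{\partial y_{pred}} = -2(1 - y_{pred})

Just like before, let h1, h2, o1 be the outputs of the neurons they represent. Then

y_{pred} = o_1 = f(w_5 h_1 + w_6 h_2 + b_3)

where f is the sigmoid activation function. Since w1 only affects h1 (not h2), we can write

\frac{\partial y_{pred}}{\partial w_1} = \frac{\partial y_{pred}}{\partial h_1} * \frac{\partial h_1}{\partial w_1}

\frac{\partial y_{pred}}{\partial h_1} = w_5 * f'(w_5 h_1 + w_6 h_2 + b_3)

h_1 = f(w_1 x_1 + w_2 x_2 + b_1)

\frac{\partial h_1}{\partial w_1} = x_1 * f'(w_1 x_1 + w_2 x_2 + b_1)

Here x1 is weight and x2 is height. Taking the derivative of the sigmoid function gives

f(x) = \frac{1}{1 + e^{-x}}

f'(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = f(x) * (1 - f(x))

So, finally, we can break

\frac{\partial L}{\partial w_1}

down into

\frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial y_{pred}} * \frac{\partial y_{pred}}{\partial h_1} * \frac{\partial h_1}{\partial w_1}

This system of calculating partial derivatives by working backwards is known as backpropagation, or "backprop".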

Example: Calculating the Partial Derivative

We'll continue pretending only Alice is in our dataset:

Name	Weight (minus 135)	Height (minus 66)	Gender
Alice	-2	-1	1

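Let's initialize all the weights to 1 and all the biases to 0. If we do a feedforward pass through the network, we get:

h_1 = f(w_1 x_1 + w_2 x_2 + b_1) = f(-2 + -1 + 0) = f(-3) = 0.0474

h_2 = f(w_3 x_1 + w_4 x_2 + b_2) = f(-3) = 0.0474

o_1 = f(w_5 h_1 + w_6 h_2 + b_3) = f(0.0474 + 0.0474 + 0) = f(0.0948) = 0.524

The network outputs $y_{pred}$ = 0.524, which doesn't strongly favor Male (0) or Female (1). Let's calculate $\frac{\partial L}{\partial w_1}$ with backpropagation:

\frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial y_{pred}} * \frac{\partial y_{pred}}{\partial h_1} * \frac{\partial h_1}{\partial w_1}

\frac{\partial L}{\partial y_{pred}} = -2(1 - y_{pred}) = -2(1 - 0.524) = -0.952

\frac{\partial y_{pred}}{\partial h_1} = w_5 * f'(w_5 h_1 + w_6 h_2 + b_3) = 1 * f'(0.0948) = f(0.0948) * (1 - f(0.0948)) = 0.249

\frac{\partial h_1}{\partial w_1} = x_1 * f'(w_1 x_1 + w_2 x_2 + b_1) = -2 * f'(-3) = -2 * f(-3) * (1 - f(-3)) = -0.0904

\frac{\partial L}{\partial w_1} = -0.952 * 0.249 * -0.0904 \approx 0.0214

This tells us that if we were to increase w1, L would increase a tiny bit as a result.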
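One way to sanity-check a backpropagation result like this is to compare it against a finite-difference estimate of the same derivative. The sketch below does that for w1 under the setup of this example (Alice only, all weights 1, all biases 0). The helper loss and the step size eps are assumptions made for illustration, not part of the original code.

```python
import numpy as np

def sigmoid(x):
  return 1 / (1 + np.exp(-x))

def deriv_sigmoid(x):
  fx = sigmoid(x)
  return fx * (1 - fx)

x1, x2, y_true = -2.0, -1.0, 1.0  # Alice

def loss(w1):
  # All other weights are 1 and all biases are 0, as in the example
  h1 = sigmoid(w1 * x1 + 1 * x2 + 0)
  h2 = sigmoid(1 * x1 + 1 * x2 + 0)
  o1 = sigmoid(1 * h1 + 1 * h2 + 0)
  return (y_true - o1) ** 2

# Backpropagation: multiply the three chain-rule factors
w1 = 1.0
sum_h1 = w1 * x1 + x2
h1 = sigmoid(sum_h1)
h2 = sigmoid(x1 + x2)
sum_o1 = h1 + h2
y_pred = sigmoid(sum_o1)

d_L_d_ypred = -2 * (y_true - y_pred)
d_ypred_d_h1 = 1 * deriv_sigmoid(sum_o1)  # w5 = 1
d_h1_d_w1 = x1 * deriv_sigmoid(sum_h1)
grad_backprop = d_L_d_ypred * d_ypred_d_h1 * d_h1_d_w1

# Central finite difference (eps is an arbitrary small step)
eps = 1e-6
grad_numeric = (loss(w1 + eps) - loss(w1 - eps)) / (2 * eps)

print("backprop: %.4f" % grad_backprop)
print("numeric:  %.4f" % grad_numeric)
```

Both estimates come out at roughly 0.021. Small differences from the hand calculation are due to rounding the intermediate values there.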

Training: Stochastic Gradient Descent

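We now have all the tools we need to train a neural network! We'll use an optimization algorithm called stochastic gradient descent (SGD), which tells us how to change our weights and biases to minimize loss. It's basically just this update equation:

w_1 \leftarrow w_1 - \eta \frac{\partial L}{\partial w_1}

$\eta$ is a constant called the learning rate that controls how fast we train. All we're doing is subtracting $\eta \frac{\partial L}{\partial w_1}$ from w1:

If $\frac{\partial L}{\partial w_1}$ is positive, w1 will decrease, which makes L decrease.
If $\frac{\partial L}{\partial w_1}$ is negative, w1 will increase, which makes L decrease.

If we do this for every weight and bias in the network, the loss will slowly decrease and our network will improve.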
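As a rough illustration of a single update, the sketch below applies one gradient descent step to w1 only, using Alice as the sample with every other weight fixed at 1 and every bias at 0. The helper forward is hypothetical, and the learning rate of 0.1 matches the value used in the complete code later in this post.

```python
import numpy as np

def sigmoid(x):
  return 1 / (1 + np.exp(-x))

def deriv_sigmoid(x):
  fx = sigmoid(x)
  return fx * (1 - fx)

x1, x2, y_true = -2.0, -1.0, 1.0  # Alice
learn_rate = 0.1                  # eta

def forward(w1):
  # Other weights fixed at 1, biases at 0 (assumed setup)
  sum_h1 = w1 * x1 + x2
  h1 = sigmoid(sum_h1)
  h2 = sigmoid(x1 + x2)
  sum_o1 = h1 + h2
  return sum_h1, sum_o1, sigmoid(sum_o1)

w1 = 1.0
sum_h1, sum_o1, y_pred = forward(w1)
loss_before = (y_true - y_pred) ** 2

# dL/dw1 via the chain rule (w5 = 1, so it drops out of the product)
grad = -2 * (y_true - y_pred) * deriv_sigmoid(sum_o1) * x1 * deriv_sigmoid(sum_h1)

# The update: w1 <- w1 - eta * dL/dw1
w1_new = w1 - learn_rate * grad
loss_after = (y_true - forward(w1_new)[2]) ** 2

print(w1_new < w1)               # True: the gradient is positive, so w1 decreases
print(loss_after < loss_before)  # True: the loss went down
```

A single step only lowers the loss slightly; training repeats this for every weight and bias over many samples.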

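Our training process will look like this:

1. Choose one sample from our dataset. This is what makes it stochastic gradient descent: we only operate on one sample at a time.

2. Calculate all the partial derivatives of the loss with respect to the weights and biases (e.g. $\frac{\partial L}{\partial w_1}$, $\frac{\partial L}{\partial w_2}$, and so on).

3. Use the update equation to update each weight and bias.

4. Go back to step 1.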

Code: A Complete Neural Network

It's finally time to implement a complete neural network:

Name	Weight (minus 135)	Height (minus 66)	Gender
Alice	-2	-1	1
Bob	25	6	0
Charlie	17	4	0
Diana	-15	-6	1

import numpy as np

def sigmoid(x):
  # Sigmoid activation function: f(x) = 1 / (1 + e^(-x))
  return 1 / (1 + np.exp(-x))

def deriv_sigmoid(x):
  # Derivative of sigmoid: f'(x) = f(x) * (1 - f(x))
  fx = sigmoid(x)
  return fx * (1 - fx)

def mse_loss(y_true, y_pred):
  # y_true and y_pred are numpy arrays of the same length.
  return ((y_true - y_pred) ** 2).mean()

class OurNeuralNetwork:
  '''
  A neural network with:
    - 2 inputs
    - a hidden layer with 2 neurons (h1, h2)
    - an output layer with 1 neuron (o1)

  *** DISCLAIMER ***:
  The code below is intended to be simple and educational, NOT optimal.
  Real neural net code looks nothing like this. DO NOT use this code.
  Instead, read/run it to understand how this specific network works.
  '''
  def __init__(self):
    # Weights
    self.w1 = np.random.normal()
    self.w2 = np.random.normal()
    self.w3 = np.random.normal()
    self.w4 = np.random.normal()
    self.w5 = np.random.normal()
    self.w6 = np.random.normal()

    # Biases
    self.b1 = np.random.normal()
    self.b2 = np.random.normal()
    self.b3 = np.random.normal()

  def feedforward(self, x):
    # x is a numpy array with 2 elements.
    h1 = sigmoid(self.w1 * x[0] + self.w2 * x[1] + self.b1)
    h2 = sigmoid(self.w3 * x[0] + self.w4 * x[1] + self.b2)
    o1 = sigmoid(self.w5 * h1 + self.w6 * h2 + self.b3)
    return o1

  def train(self, data, all_y_trues):
    '''
    - data is a (n x 2) numpy array, n = # of samples in the dataset.
    - all_y_trues is a numpy array with n elements.
      Elements in all_y_trues correspond to those in data.
    '''
    learn_rate = 0.1
    epochs = 1000 # number of times to loop through the entire dataset

    for epoch in range(epochs):
      for x, y_true in zip(data, all_y_trues):
        # --- Do a feedforward (we'll need these values later)
        sum_h1 = self.w1 * x[0] + self.w2 * x[1] + self.b1
        h1 = sigmoid(sum_h1)

        sum_h2 = self.w3 * x[0] + self.w4 * x[1] + self.b2
        h2 = sigmoid(sum_h2)

        sum_o1 = self.w5 * h1 + self.w6 * h2 + self.b3
        o1 = sigmoid(sum_o1)
        y_pred = o1

        # --- Calculate partial derivatives.
        # --- Naming: d_L_d_w1 represents "partial L / partial w1"
        d_L_d_ypred = -2 * (y_true - y_pred)

        # Neuron o1
        d_ypred_d_w5 = h1 * deriv_sigmoid(sum_o1)
        d_ypred_d_w6 = h2 * deriv_sigmoid(sum_o1)
        d_ypred_d_b3 = deriv_sigmoid(sum_o1)

        d_ypred_d_h1 = self.w5 * deriv_sigmoid(sum_o1)
        d_ypred_d_h2 = self.w6 * deriv_sigmoid(sum_o1)

        # Neuron h1
        d_h1_d_w1 = x[0] * deriv_sigmoid(sum_h1)
        d_h1_d_w2 = x[1] * deriv_sigmoid(sum_h1)
        d_h1_d_b1 = deriv_sigmoid(sum_h1)

        # Neuron h2
        d_h2_d_w3 = x[0] * deriv_sigmoid(sum_h2)
        d_h2_d_w4 = x[1] * deriv_sigmoid(sum_h2)
        d_h2_d_b2 = deriv_sigmoid(sum_h2)

        # --- Update weights and biases
        # Neuron h1
        self.w1 -= learn_rate * d_L_d_ypred * d_ypred_d_h1 * d_h1_d_w1
        self.w2 -= learn_rate * d_L_d_ypred * d_ypred_d_h1 * d_h1_d_w2
        self.b1 -= learn_rate * d_L_d_ypred * d_ypred_d_h1 * d_h1_d_b1

        # Neuron h2
        self.w3 -= learn_rate * d_L_d_ypred * d_ypred_d_h2 * d_h2_d_w3
        self.w4 -= learn_rate * d_L_d_ypred * d_ypred_d_h2 * d_h2_d_w4
        self.b2 -= learn_rate * d_L_d_ypred * d_ypred_d_h2 * d_h2_d_b2

        # Neuron o1
        self.w5 -= learn_rate * d_L_d_ypred * d_ypred_d_w5
        self.w6 -= learn_rate * d_L_d_ypred * d_ypred_d_w6
        self.b3 -= learn_rate * d_L_d_ypred * d_ypred_d_b3

      # --- Calculate total loss at the end of each epoch
      if epoch % 10 == 0:
        y_preds = np.apply_along_axis(self.feedforward, 1, data)
        loss = mse_loss(all_y_trues, y_preds)
        print("Epoch %d loss: %.3f" % (epoch, loss))

# Define dataset
data = np.array([
  [-2, -1],  # Alice
  [25, 6],   # Bob
  [17, 4],   # Charlie
  [-15, -6], # Diana
])
all_y_trues = np.array([
  1, # Alice
  0, # Bob
  0, # Charlie
  1, # Diana
])

# Train our neural network!
network = OurNeuralNetwork()
network.train(data, all_y_trues)

Our loss steadily decreases as the network learns.

We can now use the network to predict genders:

# Make some predictions
emily = np.array([-7, -3]) # 128 pounds, 63 inches
frank = np.array([20, 2])  # 155 pounds, 68 inches
print("Emily: %.3f" % network.feedforward(emily)) # 0.951 - F
print("Frank: %.3f" % network.feedforward(frank)) # 0.039 - M
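The comments above convert pounds and inches to shifted inputs by hand and read the output as F or M by eye. A small sketch of how those two steps might be wrapped up follows; to_features and to_label are hypothetical helpers, and the 0.5 cutoff is an assumption that the post itself does not specify.

```python
import numpy as np

# Shift amounts used for the dataset in this post
WEIGHT_SHIFT = 135
HEIGHT_SHIFT = 66

def to_features(weight_lb, height_in):
  # Hypothetical helper: convert raw measurements to network inputs
  return np.array([weight_lb - WEIGHT_SHIFT, height_in - HEIGHT_SHIFT])

def to_label(output, threshold=0.5):
  # Hypothetical helper: outputs near 1 mean Female, near 0 mean Male.
  # The 0.5 cutoff is an assumption made for this sketch.
  return "F" if output >= threshold else "M"

print(to_features(128, 63))  # Emily's inputs: -7 and -3
print(to_features(155, 68))  # Frank's inputs: 20 and 2
print(to_label(0.951))       # F
print(to_label(0.039))       # M
```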

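A quick recap of what we did:

1. Introduced neurons, the building blocks of neural networks.
2. Used the sigmoid activation function in our neurons.
3. Saw that neural networks are just neurons connected together.
4. Created a dataset with Weight and Height as inputs (or features) and Gender as the output (or label).
5. Learned about loss functions and the mean squared error (MSE) loss.
6. Realized that training a network is just minimizing its loss.
7. Used backpropagation to calculate partial derivatives.
8. Used stochastic gradient descent (SGD) to train our network.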

References
https://victorzhou.com/blog/intro-to-neural-networks/
https://peterroelants.github.io/posts/neural-network-implementation-part01/
https://ithelp.ithome.com.tw/articles/10198147
https://gadictos.com/neural-network-pt1/?fbclid=IwAR2nB0--RIceu9lBLFdTUFsofK7GMnG7onSjwuKbT1g2Ra6_cuZy3s43Me0


Saturday, July 27, 2019
