Implementing a Convolutional Neural Network in Python

The implementation below is adapted from Andrew Ng's programming assignment.

1. Padding

import numpy as np

def zero_pad(X, pad):
    """
    Pad with zeros all images of the dataset X. The padding is applied to the height and width of an image, 
    as illustrated in Figure 1.
    
    Argument:
    X -- python numpy array of shape (m, n_H, n_W, n_C) representing a batch of m images
    pad -- integer, amount of padding around each image on vertical and horizontal dimensions
    
    Returns:
    X_pad -- padded image of shape (m, n_H + 2*pad, n_W + 2*pad, n_C)
    """
    # Pad only the height and width axes; the batch (m) and channel (n_C) axes get (0, 0)
    X_pad = np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode='constant', constant_values=(0, 0))

    return X_pad

As zero_pad shows, padding is applied only to the height and width of each image matrix. The batch dimension m (the number of images) and the channel dimension n_C are left unchanged.
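A quick way to see this is to pad a tiny batch and inspect the result. The array X below is made-up example data (2 images, 3×3 pixels, 1 channel, in the same (m, n_H, n_W, n_C) layout that zero_pad assumes).

```python
import numpy as np

# Hypothetical batch: 2 images, 3x3 pixels, 1 channel (m, n_H, n_W, n_C)
X = np.arange(18, dtype=float).reshape(2, 3, 3, 1)
pad = 1

# Same call as in zero_pad: pad height and width only, leave m and n_C alone
X_pad = np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)),
               mode="constant", constant_values=0)

print(X.shape, "->", X_pad.shape)   # (2, 3, 3, 1) -> (2, 5, 5, 1)
print(X_pad[0, :, :, 0])            # the original 3x3 block framed by a border of zeros
```

Stripping the border again (X_pad[:, 1:-1, 1:-1, :]) recovers X exactly, which is what the convolution backward pass relies on later.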

2. Convolution Computation

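def conv_single_step(a_slice_prev, W, b):
    """
    Apply one filter (W, b) to a single slice of the input.

    Arguments:
    a_slice_prev -- slice of input data, numpy array of shape (f, f, n_C_prev)
    W -- weights of one filter, numpy array of shape (f, f, n_C_prev)
    b -- bias of that filter, numpy array of shape (1, 1, 1)

    Returns:
    Z -- a scalar, the result of convolving the window with the filter
    """
    # Element-wise product between the window and the filter weights
    s = a_slice_prev * W
    # Sum over all entries of s
    Z = np.sum(s)
    # Add the bias; .item() extracts the single value from the size-1 array b
    Z = Z + np.asarray(b).item()

    return Z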

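In this step, a_slice_prev is the current window (slice) of the input image matrix, with shape (f, f, n_C_prev), and W holds the parameters of one filter with the same shape. We multiply the two element-wise, sum all the products, and then add the bias b.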
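A worked example with made-up numbers: a 2×2 window with one channel, a 2×2 filter, and a bias of 0.5. It follows the same multiply, sum, add-bias sequence as conv_single_step.

```python
import numpy as np

# Hypothetical 2x2 window (1 channel) and a filter of the same shape
a_slice_prev = np.array([[1.0, 2.0],
                         [3.0, 4.0]]).reshape(2, 2, 1)
W = np.array([[1.0, 0.0],
              [0.0, -1.0]]).reshape(2, 2, 1)
b = np.array([0.5]).reshape(1, 1, 1)

s = a_slice_prev * W          # element-wise product: [[1, 0], [0, -4]]
Z = np.sum(s) + b.item()      # 1 + 0 + 0 - 4 + 0.5 = -2.5
print(Z)                      # -2.5
```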

3. Convolution Forward Pass

def conv_forward(A_prev, W, b, hparameters):
    """
    Implements the forward propagation for a convolution function
    
    Arguments:
    A_prev -- output activations of the previous layer, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    W -- Weights, numpy array of shape (f, f, n_C_prev, n_C)
    b -- Biases, numpy array of shape (1, 1, 1, n_C)
    hparameters -- python dictionary containing "stride" and "pad"
        
    Returns:
    Z -- conv output, numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache of values needed for the conv_backward() function
    """
    
    ### START CODE HERE ###
    # Retrieve dimensions from A_prev's shape (≈1 line)
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    
    # Retrieve dimensions from W's shape (≈1 line)
    (f, f, n_C_prev, n_C) = W.shape
    
    # Retrieve information from "hparameters" (≈2 lines)
    stride = hparameters['stride']
    pad = hparameters['pad']
    
    # Compute the dimensions of the CONV output volume: n = floor((n_prev + 2*pad - f) / stride) + 1.
    # Hint: use int() to floor. (≈2 lines)
    n_H = int((n_H_prev + 2 * pad - f) / stride + 1)
    n_W = int((n_W_prev + 2 * pad - f) / stride + 1)

    # Initialize the output volume Z with zeros. (≈1 line)
    Z = np.zeros((m, n_H, n_W, n_C))
    
    # Create A_prev_pad by padding A_prev
    A_prev_pad = zero_pad(A_prev, pad)
    
    for i in range(m):                               # loop over the batch of training examples
        a_prev_pad = A_prev_pad[i]                   # Select ith training example's padded activation
        for h in range(n_H):                         # loop over vertical axis of the output volume
            for w in range(n_W):                     # loop over horizontal axis of the output volume
                for c in range(n_C):                 # loop over channels (= #filters) of the output volume
                    
                    # Find the corners of the current "slice" (≈4 lines)
                    vert_start = h * stride
                    vert_end = h * stride + f
                    horiz_start = w * stride
                    horiz_end = w * stride + f
                    
                    # Use the corners to define the (3D) slice of a_prev_pad, covering all n_C_prev channels. (≈1 line)
                    a_slice_prev = a_prev_pad[vert_start : vert_end, horiz_start : horiz_end]
                    
                    # Convolve the (3D) slice with the correct filter W and bias b, to get back one output neuron. (≈1 line)
                    Z[i, h, w, c] = conv_single_step(a_slice_prev, W[:, :, :, c], b[:, :, :, c])
                                        
    ### END CODE HERE ###
    
    # Making sure your output shape is correct
    assert(Z.shape == (m, n_H, n_W, n_C))
    
    # Save information in "cache" for the backprop
    cache = (A_prev, W, b, hparameters)
    
    return Z, cache

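The arguments are the input images A_prev, the filter weights W and biases b, and the hyperparameters pad and stride. We first unpack all shape information from the shape tuples and initialize the output Z accordingly. We then loop over every window of every image: the window's top-left corner sits at (h * stride, w * stride), and adding the filter size f gives its end coordinates. conv_single_step then produces each output value. Note that the same filter parameters W[:, :, :, c] and b[:, :, :, c] are reused for every window of the image (parameter sharing).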
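As a cross-check on the loop version, here is a minimal vectorized sketch, assuming NumPy 1.20 or later (for numpy.lib.stride_tricks.sliding_window_view). conv_forward_vectorized is a hypothetical name; unlike conv_forward it takes stride and pad directly instead of the hparameters dict and returns only Z, without the cache. It extracts every f×f window as a view, keeps only the windows reached with the given stride, and contracts them with W using np.einsum.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def conv_forward_vectorized(A_prev, W, b, stride, pad):
    """Sketch: same result as the loop-based conv_forward, without Python loops."""
    f = W.shape[0]
    # Zero-pad height and width only (np.pad defaults to mode="constant", value 0)
    A_pad = np.pad(A_prev, ((0, 0), (pad, pad), (pad, pad), (0, 0)))
    # All f x f windows: shape (m, H_pad - f + 1, W_pad - f + 1, n_C_prev, f, f)
    windows = sliding_window_view(A_pad, (f, f), axis=(1, 2))
    # Keep the window positions 0, stride, 2*stride, ... along both spatial axes
    windows = windows[:, ::stride, ::stride]
    # For each output channel n: sum over window rows i, columns j and input channels c
    Z = np.einsum("mhwcij,ijcn->mhwn", windows, W) + b.reshape(1, 1, 1, -1)
    return Z

rng = np.random.default_rng(0)
A_prev = rng.standard_normal((2, 5, 7, 3))
W = rng.standard_normal((3, 3, 3, 4))
b = rng.standard_normal((1, 1, 1, 4))

Z = conv_forward_vectorized(A_prev, W, b, stride=2, pad=1)
print(Z.shape)  # (2, 3, 4, 4): n_H = (5 + 2 - 3) // 2 + 1 = 3, n_W = (7 + 2 - 3) // 2 + 1 = 4
```

The loops in conv_forward are easier to follow step by step; the vectorized form is usually much faster in NumPy because the work happens inside compiled routines rather than in Python-level loops.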

4. Pooling Layer

def pool_forward(A_prev, hparameters, mode = "max"):
    """
    Implements the forward pass of the pooling layer
    
    Arguments:
    A_prev -- Input data, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    hparameters -- python dictionary containing "f" and "stride"
    mode -- the pooling mode you would like to use, defined as a string ("max" or "average")
    
    Returns:
    A -- output of the pool layer, a numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache used in the backward pass of the pooling layer, contains the input and hparameters 
    """
    
    # Retrieve dimensions from the input shape
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    
    # Retrieve hyperparameters from "hparameters"
    f = hparameters["f"]
    stride = hparameters["stride"]
    
    # Define the dimensions of the output
    n_H = int(1 + (n_H_prev - f) / stride)
    n_W = int(1 + (n_W_prev - f) / stride)
    n_C = n_C_prev
    
    # Initialize output matrix A
    A = np.zeros((m, n_H, n_W, n_C))              
    
    ### START CODE HERE ###
    for i in range(m):                         # loop over the training examples
        for h in range(n_H):                     # loop on the vertical axis of the output volume
            for w in range(n_W):                 # loop on the horizontal axis of the output volume
                for c in range(n_C):             # loop over the channels of the output volume
                    
                    # Find the corners of the current "slice" (≈4 lines)
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    
                    # Use the corners to define the current slice on the ith training example of A_prev, channel c. (≈1 line)
                    a_prev_slice = A_prev[i, vert_start : vert_end, horiz_start : horiz_end, c]
                    
                    # Compute the pooling operation on the slice. Use an if statement to differentiate the modes. Use np.max/np.mean.
                    if mode == "max":
                        A[i, h, w, c] = np.max(a_prev_slice)
                    elif mode == "average":
                        A[i, h, w, c] = np.mean(a_prev_slice)
    
    ### END CODE HERE ###
    
    # Store the input and hparameters in "cache" for pool_backward()
    cache = (A_prev, hparameters)
    
    # Making sure your output shape is correct
    assert(A.shape == (m, n_H, n_W, n_C))
    
    return A, cache

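The pooling computation is much like the convolution layer's. The differences to note: the mode argument selects between max and average pooling; pooling has no learnable parameters and no padding here; and each channel is pooled independently, so the slice takes only channel c and the output keeps n_C = n_C_prev channels.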
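When the windows do not overlap (stride equal to f) and the input height and width are multiples of f, pooling can be written without loops by reshaping each spatial axis into (number of windows, f). pool_forward_reshape is a hypothetical sketch that covers only that special case; the general loop-based pool_forward handles any f and stride.

```python
import numpy as np

def pool_forward_reshape(A_prev, f, mode="max"):
    """Sketch for stride == f, with n_H_prev and n_W_prev divisible by f."""
    m, n_H_prev, n_W_prev, n_C = A_prev.shape
    if n_H_prev % f or n_W_prev % f:
        raise ValueError("sketch assumes non-overlapping windows that tile the input exactly")
    # Split each spatial axis into (window index, offset inside window): (m, n_H, f, n_W, f, n_C)
    blocks = A_prev.reshape(m, n_H_prev // f, f, n_W_prev // f, f, n_C)
    if mode == "max":
        return blocks.max(axis=(2, 4))
    if mode == "average":
        return blocks.mean(axis=(2, 4))
    raise ValueError(f"unknown mode: {mode}")

# Made-up 4x4 single-channel input holding the values 0..15
A_prev = np.arange(16, dtype=float).reshape(1, 4, 4, 1)
A_max = pool_forward_reshape(A_prev, 2, "max")       # block maxima: 5, 7, 13, 15
A_avg = pool_forward_reshape(A_prev, 2, "average")   # block means: 2.5, 4.5, 10.5, 12.5
print(A_max[0, :, :, 0])
print(A_avg[0, :, :, 0])
```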

5. Convolution Layer Backward Pass

def conv_backward(dZ, cache):
    """
    Implement the backward propagation for a convolution function
    
    Arguments:
    dZ -- gradient of the cost with respect to the output of the conv layer (Z), numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache of values needed for the conv_backward(), output of conv_forward()
    
    Returns:
    dA_prev -- gradient of the cost with respect to the input of the conv layer (A_prev),
               numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    dW -- gradient of the cost with respect to the weights of the conv layer (W)
          numpy array of shape (f, f, n_C_prev, n_C)
    db -- gradient of the cost with respect to the biases of the conv layer (b)
          numpy array of shape (1, 1, 1, n_C)
    """
    
    ### START CODE HERE ###
    # Retrieve information from "cache"
    (A_prev, W, b, hparameters) = cache
    
    # Retrieve dimensions from A_prev's shape
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    
    # Retrieve dimensions from W's shape
    (f, f, n_C_prev, n_C) = W.shape
    
    # Retrieve information from "hparameters"
    stride = hparameters['stride']
    pad = hparameters['pad']
    
    # Retrieve dimensions from dZ's shape
    (m, n_H, n_W, n_C) = dZ.shape
    
    # Initialize dA_prev, dW, db with the correct shapes
    dA_prev = np.zeros(A_prev.shape)                           
    dW = np.zeros(W.shape)
    db = np.zeros(b.shape)

    # Pad A_prev and dA_prev
    A_prev_pad = zero_pad(A_prev, pad)
    dA_prev_pad = zero_pad(dA_prev, pad)
    
    for i in range(m):                       # loop over the training examples
        
        # select ith training example from A_prev_pad and dA_prev_pad
        a_prev_pad = A_prev_pad[i]
        da_prev_pad = dA_prev_pad[i]
        
        for h in range(n_H):                   # loop over vertical axis of the output volume
            for w in range(n_W):               # loop over horizontal axis of the output volume
                for c in range(n_C):           # loop over the channels of the output volume
                    
                    # Find the corners of the current "slice"
                    vert_start = h * stride
                    vert_end = h * stride + f
                    horiz_start = w * stride
                    horiz_end = w * stride + f
                    
                    # Use the corners to define the slice from a_prev_pad
                    a_slice = a_prev_pad[vert_start : vert_end, horiz_start : horiz_end, : ]

                    # Accumulate gradients for the input window and for this filter's parameters
                    da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:,:,:,c] * dZ[ i, h, w ,c]

                    dW[:,:,:,c] += a_slice * dZ[ i, h, w ,c]
                    db[:,:,:,c] += dZ[ i, h, w ,c]
                    
        # Set the ith training example's dA_prev to the unpadded da_prev_pad.
        # Explicit end indices also work for pad = 0, where X[pad:-pad] would be an empty slice.
        dA_prev[i, :, :, :] = da_prev_pad[pad:pad + n_H_prev, pad:pad + n_W_prev, :]
    ### END CODE HERE ###
    
    # Making sure your output shape is correct
    assert(dA_prev.shape == (m, n_H_prev, n_W_prev, n_C_prev))
    
    return dA_prev, dW, db

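The computation of dW and db here resembles backpropagation in a fully connected (BP) network. Every output position (i, h, w, c) contributes to the gradients: dW[:, :, :, c] accumulates a_slice * dZ[i, h, w, c], db[:, :, :, c] accumulates dZ[i, h, w, c], and the matching window of da_prev_pad accumulates W[:, :, :, c] * dZ[i, h, w, c]. Because the same filter is reused at every position, its gradient is the sum of these contributions over all positions of all images.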
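One way to convince yourself that this accumulation is right is a finite-difference gradient check. The sketch below is self-contained and uses hypothetical helpers: conv (a loop convolution on an already padded input, bias omitted) and conv_dW (dW accumulated window by window, the same rule as in conv_backward). It defines a scalar cost L = sum(Z * dZ), so dZ is exactly the upstream gradient, and compares the analytic dW with central differences on every entry of W.

```python
import numpy as np

def conv(A_pad, W, stride):
    """Loop convolution on an already padded NHWC input, no bias."""
    _, H, Wd, _ = A_pad.shape
    f, _, _, n_C = W.shape
    n_H, n_W = (H - f) // stride + 1, (Wd - f) // stride + 1
    Z = np.zeros((A_pad.shape[0], n_H, n_W, n_C))
    for h in range(n_H):
        for w in range(n_W):
            sl = A_pad[:, h*stride:h*stride+f, w*stride:w*stride+f, :]   # (m, f, f, n_C_prev)
            Z[:, h, w, :] = np.einsum("mijc,ijcn->mn", sl, W)
    return Z

def conv_dW(A_pad, W, dZ, stride):
    """dW[..., c] += a_slice * dZ[i, h, w, c], summed over examples and positions."""
    f = W.shape[0]
    dW = np.zeros_like(W)
    for h in range(dZ.shape[1]):
        for w in range(dZ.shape[2]):
            sl = A_pad[:, h*stride:h*stride+f, w*stride:w*stride+f, :]
            dW += np.einsum("mijc,mn->ijcn", sl, dZ[:, h, w, :])
    return dW

rng = np.random.default_rng(1)
A_pad = rng.standard_normal((2, 6, 6, 3))
W = rng.standard_normal((3, 3, 3, 2))
stride = 1
dZ = rng.standard_normal(conv(A_pad, W, stride).shape)   # upstream gradient of L = sum(Z * dZ)

analytic = conv_dW(A_pad, W, dZ, stride)

# Central differences: perturb one weight at a time and measure the change in L
eps = 1e-6
numeric = np.zeros_like(W)
for idx in np.ndindex(W.shape):
    Wp, Wm = W.copy(), W.copy()
    Wp[idx] += eps
    Wm[idx] -= eps
    numeric[idx] = (np.sum(conv(A_pad, Wp, stride) * dZ)
                    - np.sum(conv(A_pad, Wm, stride) * dZ)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))   # expected to be tiny (rounding error only)
```

The same check extends to db (compare with dZ.sum(axis=(0, 1, 2))) and to dA_prev by perturbing the input instead of W.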

6. Pooling Layer Backward Pass

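With the forward mechanics of pooling in place, the backward pass follows from how each mode computes its output. For max pooling, only the largest entry of each window affected the output, so we create a mask that marks where that maximum sits; the upstream gradient flows only to that position.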

def create_mask_from_window(x):
    """
    Creates a mask from an input matrix x, to identify the max entry of x.
    
    Arguments:
    x -- Array of shape (f, f)
    
    Returns:
    mask -- Array of the same shape as window, contains a True at the position corresponding to the max entry of x.
    """
    
    ### START CODE HERE ### (≈1 line)
    mask = (x == np.max(x))
    ### END CODE HERE ###
    
    return mask
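One detail worth knowing: the rule x == np.max(x) marks every position that ties for the maximum, so a tied window passes the full upstream gradient to each tied entry. With random floating-point inputs ties are rare, but a common alternative is to keep only the first maximum. The window below is made-up data chosen to contain a tie.

```python
import numpy as np

x = np.array([[1.0, 3.0],
              [2.0, 3.0]])          # hypothetical 2x2 window with a tie at the maximum

# Same rule as create_mask_from_window: both 3.0 entries are marked
mask_all = (x == np.max(x))
print(mask_all.sum())               # 2

# Alternative: mark only the first maximum in row-major order (np.argmax returns the first hit)
mask_first = np.zeros_like(x, dtype=bool)
mask_first[np.unravel_index(np.argmax(x), x.shape)] = True
print(mask_first.sum())             # 1
```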

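For average pooling, every entry of the window contributed equally to the output, so the upstream gradient is distributed evenly across all entries of the window.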

def distribute_value(dz, shape):
    """
    Distributes the input value in the matrix of dimension shape
    
    Arguments:
    dz -- input scalar
    shape -- the shape (n_H, n_W) of the output matrix for which we want to distribute the value of dz
    
    Returns:
    a -- Array of size (n_H, n_W) for which we distributed the value of dz
    """
    
    ### START CODE HERE ###
    # Retrieve dimensions from shape (≈1 line)
    (n_H, n_W) = shape
    
    # Compute the value to distribute on the matrix: dz split equally over n_H * n_W entries (≈1 line)
    average = dz / (n_H * n_W)
    
    # Create a matrix where every entry is the "average" value (≈1 line)
    a = average * np.ones((n_H, n_W))
    ### END CODE HERE ###
    
    return a
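The reasoning behind distribute_value: the average-pooling output is the mean of its window, so its derivative with respect to each of the n_H × n_W inputs is 1 / (n_H × n_W). A small worked example with made-up numbers, computed inline rather than by calling distribute_value:

```python
import numpy as np

dz = 2.0            # hypothetical upstream gradient for one pooled output
shape = (2, 2)      # window size f x f

# Each of the 4 inputs contributed 1/4 of the mean, so each receives dz / 4
a = np.full(shape, dz / (shape[0] * shape[1]))
print(a)            # every entry is 0.5
print(a.sum())      # 2.0, the distributed gradient still sums to dz
```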

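We can then traverse the input exactly as in the convolution backward pass, adding each window's gradient contribution (the masked or evenly distributed entry of dA) into dA_prev. Summing these contributions over all windows gives the gradient with respect to this layer's input.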

def pool_backward(dA, cache, mode = "max"):
    """
    Implements the backward pass of the pooling layer
    
    Arguments:
    dA -- gradient of cost with respect to the output of the pooling layer, same shape as A
    cache -- cache output from the forward pass of the pooling layer, contains the layer's input and hparameters 
    mode -- the pooling mode you would like to use, defined as a string ("max" or "average")
    
    Returns:
    dA_prev -- gradient of cost with respect to the input of the pooling layer, same shape as A_prev
    """
    
    ### START CODE HERE ###
    
    # Retrieve information from cache (≈1 line)
    (A_prev, hparameters) = cache
    
    # Retrieve hyperparameters from "hparameters" (≈2 lines)
    stride = hparameters['stride']
    f = hparameters['f']
    
    # Retrieve dimensions from A_prev's shape and dA's shape (≈2 lines)
    m, n_H_prev, n_W_prev, n_C_prev = A_prev.shape
    m, n_H, n_W, n_C = dA.shape
    
    # Initialize dA_prev with zeros (≈1 line)
    dA_prev = np.zeros(A_prev.shape)
    
    for i in range(m):                       # loop over the training examples
        
        # select training example from A_prev (≈1 line)
        a_prev = A_prev[i]
        
        for h in range(n_H):                   # loop on the vertical axis
            for w in range(n_W):               # loop on the horizontal axis
                for c in range(n_C):           # loop over the channels (depth)
                    
                    # Find the corners of the current "slice" (≈4 lines)
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    
                    # Compute the backward propagation in both modes.
                    if mode == "max":
                        
                        # Use the corners and "c" to define the current slice from a_prev (≈1 line)
                        a_prev_slice = a_prev[vert_start : vert_end, horiz_start : horiz_end, c]
                        # Create the mask from a_prev_slice (≈1 line)
                        mask = create_mask_from_window(a_prev_slice)
                        # Set dA_prev to be dA_prev + (the mask multiplied by the correct entry of dA) (≈1 line)
                        dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += mask * dA[i, h, w, c]
                        
                    elif mode == "average":
                        
                        # Get the value a from dA (≈1 line)
                        da = dA[i, h, w, c]
                        # Define the shape of the filter as fxf (≈1 line)
                        shape = (f, f)
                        # Distribute it to get the correct slice of dA_prev. i.e. Add the distributed value of da. (≈1 line)
                        dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += distribute_value(da, shape)
                        
    ### END CODE ###
    
    # Making sure your output shape is correct
    assert(dA_prev.shape == A_prev.shape)
    
    return dA_prev
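A finite-difference check also works for pooling. This self-contained sketch uses hypothetical helpers avg_pool and avg_pool_backward that follow the same window loop as pool_forward and pool_backward in average mode. It deliberately uses overlapping windows (f = 3, stride = 1) so that the += accumulation into dA_prev is exercised.

```python
import numpy as np

def avg_pool(A, f, stride):
    """Average pooling over f x f windows, NHWC layout."""
    m, H, W, C = A.shape
    n_H, n_W = (H - f) // stride + 1, (W - f) // stride + 1
    out = np.zeros((m, n_H, n_W, C))
    for h in range(n_H):
        for w in range(n_W):
            out[:, h, w, :] = A[:, h*stride:h*stride+f, w*stride:w*stride+f, :].mean(axis=(1, 2))
    return out

def avg_pool_backward(dA, A_shape, f, stride):
    """Each window input receives dA / (f*f); overlapping windows accumulate."""
    dA_prev = np.zeros(A_shape)
    for h in range(dA.shape[1]):
        for w in range(dA.shape[2]):
            dA_prev[:, h*stride:h*stride+f, w*stride:w*stride+f, :] += dA[:, h:h+1, w:w+1, :] / (f * f)
    return dA_prev

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 5, 5, 3))
f, stride = 3, 1
dA = rng.standard_normal(avg_pool(A, f, stride).shape)   # upstream gradient of L = sum(out * dA)

analytic = avg_pool_backward(dA, A.shape, f, stride)

eps = 1e-6
numeric = np.zeros_like(A)
for idx in np.ndindex(A.shape):
    Ap, Am = A.copy(), A.copy()
    Ap[idx] += eps
    Am[idx] -= eps
    numeric[idx] = (np.sum(avg_pool(Ap, f, stride) * dA)
                    - np.sum(avg_pool(Am, f, stride) * dA)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))   # expected to be tiny (rounding error only)
```

For max pooling the same check passes as long as no window contains a tie and eps is small enough not to change which entry is the maximum.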