API Documentation

When you design a neural network, each layer consists of a number of neurons, and a single neuron is made up of a linear function followed by an activation function. The following functions are atomic, independent building blocks for creating a neural net.

  1. Linear Layer

    nn.linear(W, X, b)

    This function applies the linear equation

    \[Z = W . X + b\]
    Parameters:
    • W – weight matrix of layer l

    • X – input matrix, or the output of the previous layer (X or A[l-1])

    • b – bias vector for all the neurons in the layer

    Examples::
    >>> W = np.array([[.5, .8],[1, .4]])
    >>> b = np.ones((2,1))
    >>> X = np.array([[1],[2]])
    >>> Z = linear(W, X, b)
    >>> Z.shape
    (2, 1)
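As a minimal sketch (assuming NumPy arrays and a standalone `linear` helper, which is an assumption and not the library's actual implementation), the forward step can be written so that the bias broadcasts across a batch of m examples:

```python
import numpy as np

def linear(W, X, b):
    # Sketch of the linear step: Z = W.X + b; b has shape (neurons, 1)
    # and broadcasts across the m columns (training examples) of X.
    return np.dot(W, X) + b

W = np.array([[.5, .8], [1., .4]])   # (2, 2): 2 neurons, 2 inputs each
b = np.ones((2, 1))                  # (2, 1): one bias per neuron
X = np.array([[1., 3.], [2., 4.]])   # (2, 2): 2 features, m=2 examples
Z = linear(W, X, b)
print(Z.shape)  # (2, 2): one activation column per example
```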
    
  2. Linear Backward

    nn.linear_backward(dZ, cache, lambd=0.0)

    This function computes the backward pass for the linear layer: the gradients of the loss with respect to W, X, and b, obtained by multiplying each local gradient by dZ (the chain rule).

    The local derivative with respect to W is its corresponding value in X. For example, if c = a*b, then the derivative of c with respect to a is b. We then multiply by dZ to apply the chain rule. The following is the vectorized version, including the L2 regularization term:

    \[dW = \frac{1}{m}.dZ.X^T + \frac{\lambda}{m}*W\]

    The derivative of a sum with respect to each term is 1, so the local gradient of b is 1 and whatever gradient arrives in dZ passes straight through to db. If we have only 1 training example, the shape of dZ is (l, 1); with m training examples it is (l, m). In the latter case we sum all the dZ values for the first neuron (the first row), and likewise for every row, which is why the implementation sums over axis 1:

    \[db = \frac{1}{m}\sum_{i=1}^{m}dZ^{(i)}\]

    Finally, you calculate the derivative with respect to the input X, which may itself be the output of the previous layer:

    \[dX = W^T.dZ\]
    Parameters:
    • dZ – gradient of the loss with respect to Z, obtained from activation backward

    • cache – tuple of matrices (W, X)

    • lambd – L2 regularization penalty value; default is 0.0
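The three equations above can be sketched directly in NumPy. This is an illustrative implementation under the stated formulas, not the library's actual code; the cache layout `(W, X)` follows the parameter description above:

```python
import numpy as np

def linear_backward(dZ, cache, lambd=0.0):
    # dW = (1/m) dZ.X^T + (lambda/m) W ; db sums dZ over axis 1 ;
    # dX = W^T.dZ (sketch of the backward pass described above).
    W, X = cache
    m = X.shape[1]                     # number of training examples
    dW = (1. / m) * np.dot(dZ, X.T) + (lambd / m) * W
    db = (1. / m) * np.sum(dZ, axis=1, keepdims=True)
    dX = np.dot(W.T, dZ)
    return dW, dX, db

W = np.array([[.5, .8], [1., .4]])
X = np.array([[1., 3.], [2., 4.]])     # m = 2 examples
dZ = np.ones((2, 2))                   # stand-in upstream gradient
dW, dX, db = linear_backward(dZ, (W, X))
print(dW.shape, dX.shape, db.shape)  # (2, 2) (2, 2) (2, 1)
```

Note that db keeps its `(l, 1)` shape thanks to `keepdims=True`, so it subtracts cleanly from b during the parameter update.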

  3. Activation

    nn.activation(Z, a_name='relu')

    This function applies an activation to the given input. It supports ReLU, Sigmoid, and Softmax.

    Parameters:
    • Z – output matrix of linear function.

    • a_name – activation name that you want to apply

    \[ \begin{align}\begin{aligned}Z = W.X + b\\A = g(Z)\end{aligned}\end{align} \]

    Here the matrix Z is the output of the linear function and is the input to the activation function g(). The supported activations are:

    • ReLU

    \[A = max(0,Z)\]
    • Sigmoid

    \[A = \frac{1}{1+e^{-Z}}\]
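A minimal sketch of the dispatch on `a_name` (assumed behaviour based on the formulas above, showing only ReLU and Sigmoid; not the library's actual implementation):

```python
import numpy as np

def activation(Z, a_name='relu'):
    # Apply the named activation elementwise (sketch).
    if a_name == 'relu':
        return np.maximum(0, Z)        # A = max(0, Z)
    if a_name == 'sigmoid':
        return 1. / (1. + np.exp(-Z))  # A = 1 / (1 + e^-Z)
    raise ValueError("unknown activation: " + a_name)

Z = np.array([[-1.], [2.]])
print(activation(Z, 'relu'))      # [[0.], [2.]]
print(activation(Z, 'sigmoid'))   # approx [[0.269], [0.881]]
```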
  4. Softmax Activation

    nn.Softmax(Z)

    Softmax activation function

    Z : of shape (C, 1), where C is the number of classes

    Return type:

    A of shape (C, 1), where the values are probabilities that sum to 1

    Let's say \(Z\) is a matrix of shape \((C, m)\), where C is the number of classes and m is the number of training samples, and \(z\) is a vector of values \([z_1, z_2, ..., z_C]\). Then the softmax output for the vector z is given by the following equation.

    \[A(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{C}e^{z_j}}\]
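In practice the exponentials can overflow for large logits, so implementations usually subtract the column maximum first, which leaves the result unchanged. A sketch of such a numerically stable softmax (an assumed helper, not necessarily how `nn.Softmax` is implemented):

```python
import numpy as np

def softmax(Z):
    # Stable softmax: subtracting the per-column max does not change
    # the ratio e^z_i / sum_j e^z_j but avoids overflow in exp.
    shifted = Z - np.max(Z, axis=0, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=0, keepdims=True)

Z = np.array([[1.], [2.], [3.]])   # C = 3 classes, m = 1 sample
A = softmax(Z)
print(np.allclose(A.sum(axis=0), 1.0))  # True: each column sums to 1
```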
  5. Neuron / Layer

    nn.neuron(W, X, b, a_name='relu', drpout=False, keep_prob=0.5)

    This function can work as a single neuron or a single layer that computes linear->activation with the help of the previously defined atomic functions. It takes the weight and bias matrices, the output of the previous layer (or the input, in the case of the first layer), the activation name, a dropout flag, and keep_prob. You can use this function directly as a layer: you give it an input and it gives you an output.
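Composing the pieces above, a layer is just linear followed by activation, with optional inverted dropout. The dropout behaviour here (mask then rescale by keep_prob) is an assumption based on the `keep_prob` parameter, and the whole function is a sketch rather than the library's code:

```python
import numpy as np

def neuron(W, X, b, a_name='relu', drpout=False, keep_prob=0.5):
    # Sketch of one layer: Z = W.X + b, then A = g(Z), then
    # (optionally) inverted dropout.
    Z = np.dot(W, X) + b
    A = np.maximum(0, Z) if a_name == 'relu' else 1. / (1. + np.exp(-Z))
    if drpout:
        mask = np.random.rand(*A.shape) < keep_prob   # keep each unit w.p. keep_prob
        A = A * mask / keep_prob                      # rescale to keep expectation
    return A

W = np.array([[.5, .8], [1., .4]])
b = np.ones((2, 1))
X = np.array([[1.], [2.]])
A = neuron(W, X, b)
print(A.shape)  # (2, 1)
```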

  6. Initialize Neural Network Parameters

    nn.initialize_parameters(layer_dims)

    Takes layer_dims : a list of layer sizes, e.g. [nx, layer_1, 2, 3], where the first entry is the input size.

    Let \(L\) be the number of layers and \(\text{layer_dims} = [n_0, n_1, ..., n_L]\), where \(n_l\) denotes the number of neurons in the \(l^{th}\) layer. Then \(W\) and \(b\) are the weight and bias matrices, where \(W^{[l]} \in R^{n_l \times n_{l-1}}\) and \(b^{[l]} \in R^{n_l \times 1}\).

    And \(parameters = \{W^{[1]}, b^{[1]}, ..., W^{[L]}, b^{[L]}\}\)

    Examples::
    >>> nx = 3
    >>> layer_dims = [nx, 4, 4, 1]
    >>> parameters = initialize_parameters(layer_dims)
    >>> len(parameters)
    6
    >>> parameters["W1"].shape
    (4, 3)
    >>> parameters["W2"].shape
    (4, 4)
    >>> parameters["W3"].shape
    (1, 4)
    >>> parameters["b1"].shape
    (4, 1)