API Documentation¶
When you design a neural network, each layer consists of a number of neurons, and each neuron computes a linear function followed by an activation function. The following functions are atomic, independent building blocks for creating a neural net.
Linear Layer
- nn.linear(W, X, b)¶
This function applies the linear equation
\[Z = W.X + b\]
- Parameters:
W – weight matrix of the layer
X – input, or output of the previous layer of the neural net (X or A[l-1])
b – bias vector for all the nodes of the layer
- Examples::
>>> W = np.array([[.5, .8],[1, .4]])
>>> b = np.ones((2,1))
>>> X = np.array([[1],[2]])
>>> Z = linear(W, X, b)
>>> Z.shape
(2, 1)
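The example above can be reproduced with a minimal NumPy sketch of the linear step (an illustration, not the library's actual implementation):

```python
import numpy as np

def linear(W, X, b):
    # Z = W.X + b; b broadcasts across the m columns of X
    return W @ X + b

W = np.array([[.5, .8], [1, .4]])
b = np.ones((2, 1))
X = np.array([[1], [2]])
Z = linear(W, X, b)
print(Z.shape)  # (2, 1)
```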
Linear Backward
- nn.linear_backward(dZ, cache, lambd=0.0)¶
This function computes the backward pass for the linear layer: it calculates the local gradients of Z with respect to W, X, and b, and multiplies each by dZ to apply the chain rule.
The local gradient of W is its corresponding value in X. For example, if c = a*b, then the derivative of c with respect to a is b; we then multiply by the derivative of Z to apply the chain rule. The vectorized version, including the L2 regularization term, is
\[dW = \frac{1}{m}.dZ.X^T + \frac{\lambda}{m}*W\]
The derivative of a sum is 1, so the local gradient of b is 1, and whatever gradient arrives in dZ passes straight through to b. If we have only 1 training sample, the shape of dZ is (l, 1); with m training examples it is (l, m). In the second case we need to sum all the dZ values for the first node, that is the first row, and likewise for every node, which is why the implementation sums over axis 1:
\[db = \frac{1}{m}\sum_{i=1}^{m} dZ^{(i)}\]
Finally, you calculate the derivative of the input X, which can be the output of the previous layer:
\[dX = W^T.dZ\]
- Parameters:
dZ – gradient matrix of Z, obtained from the activation backward pass
cache – tuple of matrices (W, X)
lambd – L2 regularization penalty value; default is 0.0
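Putting the three formulas together, a minimal NumPy sketch of the backward pass could look like this (the `(W, X)` cache layout follows the parameter description above; this is an illustration, not the library's code):

```python
import numpy as np

def linear_backward(dZ, cache, lambd=0.0):
    W, X = cache                                 # forward-pass inputs
    m = X.shape[1]                               # number of training examples
    dW = (dZ @ X.T) / m + (lambd / m) * W        # local grad of W is X, plus L2 term
    db = np.sum(dZ, axis=1, keepdims=True) / m   # sum dZ over the m examples (axis 1)
    dX = W.T @ dZ                                # gradient flowing to the previous layer
    return dW, db, dX

W = np.array([[.5, .8], [1., .4]])
X = np.array([[1., 2.], [3., 4.]])   # 2 features, m = 2 examples
dZ = np.ones((2, 2))
dW, db, dX = linear_backward(dZ, (W, X))
print(dW.shape, db.shape, dX.shape)  # (2, 2) (2, 1) (2, 2)
```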
Activation
- nn.activation(Z, a_name='relu')¶
This function applies an activation to the given input. It supports
ReLU, Sigmoid and Softmax.
- Parameters:
Z – output matrix of linear function.
a_name – activation name that you want to apply
ReLU
\[ \begin{align}\begin{aligned}Z = W.X + b\\A = g(Z)\end{aligned}\end{align} \]
Here the matrix Z is the output of the linear function and the input for the activation function g().
ReLU
\[A = max(0,Z)\]
Sigmoid
\[A = \frac{1}{1+e^{-Z}}\]
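The two formulas above can be sketched as a small dispatcher in NumPy (an illustration, not the library's implementation; Softmax is omitted here since it has its own function):

```python
import numpy as np

def activation(Z, a_name='relu'):
    # select the activation by name, as described above
    if a_name == 'relu':
        return np.maximum(0, Z)          # A = max(0, Z)
    if a_name == 'sigmoid':
        return 1 / (1 + np.exp(-Z))      # A = 1 / (1 + e^{-Z})
    raise ValueError(f"unsupported activation: {a_name}")

Z = np.array([[-1.0, 2.0]])
print(activation(Z, 'relu'))             # [[0. 2.]]
print(activation(Z, 'sigmoid'))
```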
Softmax Activation
- nn.Softmax(Z)¶
Softmax activation function
Z : matrix of shape (C, 1), where C is the number of classes
- Return type:
A of shape (C, 1), where the values are probabilities that sum to 1
Let's say \(Z\) is a matrix of shape \((C, m)\), where
C is the number of classes and m is the number of training samples, and \(z\) is a vector of values \([z_1, z_2,...z_C]\). Then the softmax output of the vector z is given by the following equation.\[A(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{C}e^{z_j}}\]
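The equation above can be sketched in NumPy. Subtracting the column-wise max before exponentiating is a standard numerical-stability trick added in this sketch; it does not change the result, since it cancels in the ratio:

```python
import numpy as np

def softmax(Z):
    # shift by the column max so exp() cannot overflow; the shift cancels out
    e = np.exp(Z - Z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)   # normalize each column to sum to 1

Z = np.array([[1.0], [2.0], [3.0]])  # C = 3 classes, m = 1 sample
A = softmax(Z)
print(A.sum(axis=0))                 # each column sums to 1
```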
Neuron / Layer
- nn.neuron(W, X, b, a_name='relu', drpout=False, keep_prob=0.5)¶
This function can work as a single neuron or as a single layer: it computes linear->activation with the help of the previously defined atomic functions. It takes the weight and bias matrices, the output of the previous layer (or the input, in the case of the first layer), the activation name, a dropout flag, and keep_prob. You can use this function directly as a layer: you give it an input and it returns the layer's output.
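A sketch of how such a layer could compose the atomic pieces; the inverted-dropout scaling by keep_prob is an assumption about the intended dropout scheme, not confirmed by the source:

```python
import numpy as np

def neuron(W, X, b, a_name='relu', drpout=False, keep_prob=0.5):
    # linear -> activation, composed from the atomic functions above
    Z = W @ X + b
    A = np.maximum(0, Z) if a_name == 'relu' else 1 / (1 + np.exp(-Z))
    if drpout:
        # inverted dropout (assumed scheme): keep each unit with probability
        # keep_prob and rescale so the expected activation is unchanged
        mask = np.random.rand(*A.shape) < keep_prob
        A = A * mask / keep_prob
    return A

np.random.seed(0)
W = np.random.randn(4, 3) * 0.01
b = np.zeros((4, 1))
X = np.random.randn(3, 5)     # 3 features, 5 examples
A = neuron(W, X, b)
print(A.shape)                # (4, 5)
```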
Initialize Neural Network Parameters
- nn.initialize_parameters(layer_dims)¶
Takes layer_dims, e.g. [nx, layer_1, 2, 3]
Let \(L\) be the number of layers and \(\text{layer_dims} = [n_0, n_1, ..., n_L]\), where \(n_l\) denotes the number of neurons in the \(l^{th}\) layer. Then \(W\) and \(b\) are the weight and bias matrices, where \(W^{[l]} \in R^{n_l \times n_{l-1}}\), \(b^{[l]} \in R^{n_l \times 1}\).
And \(parameters = \{W^{[1]}, b^{[1]}, ..., W^{[L]}, b^{[L]}\}\)
- Examples::
>>> nx = 3
>>> layer_dims = [nx, 4, 4, 1]
>>> parameters = initialize_parameters(layer_dims)
>>> len(parameters)
6
>>> parameters["W1"].shape
(4, 3)
>>> parameters["W2"].shape
(4, 4)
>>> parameters["W3"].shape
(1, 4)
>>> parameters["b1"].shape
(4, 1)
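The shapes in the example follow directly from \(W^{[l]} \in R^{n_l \times n_{l-1}}\) and \(b^{[l]} \in R^{n_l \times 1}\). A minimal sketch of such an initializer (small random weights and zero biases are assumptions about the scheme, not confirmed by the source):

```python
import numpy as np

def initialize_parameters(layer_dims):
    # W[l]: (n_l, n_{l-1}), b[l]: (n_l, 1)
    np.random.seed(1)   # fixed seed only so this sketch is reproducible
    parameters = {}
    for l in range(1, len(layer_dims)):
        parameters[f"W{l}"] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        parameters[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return parameters

layer_dims = [3, 4, 4, 1]
parameters = initialize_parameters(layer_dims)
print(len(parameters))          # 6
print(parameters["W1"].shape)   # (4, 3)
```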