{ "cells": [ { "cell_type": "markdown", "id": "f1a05dc7", "metadata": {}, "source": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "

MTH786 Machine Learning with Python

\n", "
\n", " \n", "

Semester A, 2023/2024

\n", "
\n", "

Coursework 9

\n", "
\n", " \n", "

Dr Nicola Perra

\n", "
" ] }, { "cell_type": "code", "execution_count": null, "id": "49e97a81", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from numpy.testing import assert_array_almost_equal, assert_array_equal\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "id": "10b3ac2c", "metadata": {}, "source": [ "### Multinomial logistic classification\n", "We now move on to multinomial logistic regression for multi-class classfication problems. To this end we will use a synthetic dataset built for these types of taks. The dataset consits on s samples, 10 features and different categories\n", "\n", "\n", "We start by loading data. **Important:** please check that file $\\mathtt{dataset.csv}$ is located in the same folder with your Jupyter notebook." ] }, { "cell_type": "code", "execution_count": null, "id": "70a0d912", "metadata": {}, "outputs": [], "source": [ "complete_data = np.genfromtxt(\"dataset.csv\", skip_header = 1, delimiter = ',')\n", "labels = complete_data[:, -1].astype(int) - np.min(complete_data[:, -1].astype(int))\n", "inputs = complete_data[:, :-1]" ] }, { "cell_type": "markdown", "id": "ca482452", "metadata": {}, "source": [ "1. Implement function $\\mathtt{linear\\_regression\\_data}$ that computes (and outputs) the linear regression data matrix. The function should take the NumPy array _data_inputs_ as argument." ] }, { "cell_type": "code", "execution_count": null, "id": "69bb0b37", "metadata": {}, "outputs": [], "source": [ "def linear_regression_data(data_inputs):\n", " # YOUR CODE HERE\n", " raise NotImplementedError()" ] }, { "cell_type": "markdown", "id": "b61222da", "metadata": {}, "source": [ "2. Write two functions \n", "- $\\mathtt{standardise}$ to standardise the columns of a multi-dimensional array. The function $\\mathtt{standardise}$\ttakes the multi-dimensional array _data_matrix_ as its input argument. It subtracts the means from each column and divides by the standard deviations. It returns the _standardised_matrix_, the _row_of_means_ and the _row_of_standard_deviations_.\n", "- $\\mathtt{de\\_standardise}$ to de-standardise the columns of a multi-dimensional array. The function $\\mathtt{de\\_standardise}$ reverses the above operation. It takes a _standardised_matrix_, the _row_of_means_ and the _row_of_standard_deviations_ as its arguments and returns a matrix for which the standardisation process is reversed." ] }, { "cell_type": "code", "execution_count": null, "id": "e4ddbbe3", "metadata": {}, "outputs": [], "source": [ "def standardise(data_matrix):\n", " # YOUR CODE HERE\n", " raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "id": "27125f41", "metadata": {}, "outputs": [], "source": [ "def de_standardise(standardised_matrix, row_of_means, row_of_stds):\n", " # YOUR CODE HERE\n", " raise NotImplementedError()" ] }, { "cell_type": "markdown", "id": "5e766a4e", "metadata": {}, "source": [ "Now we standardise the input data and prepare data matrix for a multinomial logistic regression. " ] }, { "cell_type": "code", "execution_count": null, "id": "cdda6017", "metadata": {}, "outputs": [], "source": [ "data_inputs, row_of_means, row_of_stds = standardise(inputs)\n", "data_matrix = linear_regression_data(data_inputs)" ] }, { "cell_type": "markdown", "id": "4252f7a9", "metadata": {}, "source": [ "3. As a first step, implement the softmax function $\\mathtt{softmax\\_function}$ as defined in the lectures. The function takes the NumPy array _argument_ as its main argument, but also has an optional _axis_ argument to determine across which array-dimension you apply the softmax operation (axis = 0 for columns, axis = 1 for rows, ...). If this argument is not specified (or set to _None_), then the softmax operation is applied to the entire array. Make sure your function works at least for NumPy arrays _argument_ with arbitrary numerical values and dimension one or two." ] }, { "cell_type": "code", "execution_count": null, "id": "fdce2591", "metadata": {}, "outputs": [], "source": [ "def softmax_function(argument, axis=None): \n", " # YOUR CODE HERE\n", " raise NotImplementedError()" ] }, { "cell_type": "markdown", "id": "71213ee4", "metadata": {}, "source": [ "Test your $\\mathtt{softmax\\_function}$ function with the following cell. Passing the test is awarded with **1 mark**. " ] }, { "cell_type": "code", "execution_count": null, "id": "6711c0a7", "metadata": {}, "outputs": [], "source": [ "assert_array_almost_equal(softmax_function(np.array([[1.5], [0.3], [-3.7]])), \\\n", " np.array([[0.76528029], [0.23049799], [0.00422172]]))" ] }, { "cell_type": "code", "execution_count": null, "id": "a0bfa8a5", "metadata": {}, "outputs": [], "source": [ "assert_array_almost_equal(softmax_function(np.array([[1.5, 3], [0.3, -0.7], [-3.7, 2]]), axis=0), \\\n", " np.array([[0.76528029, 0.71807976], [0.23049799, 0.01775346], \\\n", " [0.00422172, 0.26416678]]))" ] }, { "cell_type": "code", "execution_count": null, "id": "5a23453a", "metadata": {}, "outputs": [], "source": [ "assert_array_almost_equal(softmax_function(np.array([[1.5, 3], [0.3, -0.7], [-3.7, 2]]), axis=1), \\\n", " np.array([[0.182426, 0.817574],\\\n", " [0.731059, 0.268941],\\\n", " [0.003335, 0.996665]]))" ] }, { "cell_type": "markdown", "id": "fb2ce94b", "metadata": {}, "source": [ "4. Implement a function $\\mathtt{model\\_function}$ that outputs the values of the linear model function. Unlike the case of binary classification, the $\\mathtt{model\\_function}$ should return a vector of dimension $K$ for each data sample. This vector is defined as\n", "$$\n", "f\\left(\\mathbf{x},\\mathbf{W}\\right) = \n", "\\left( \\left\\langle \\phi\\left(\\mathbf{x}\\right), \\mathbf{w}^{(1)}\\right\\rangle \\quad \\left\\langle \\phi\\left(\\mathbf{x}\\right), \\mathbf{w}^{(2)}\\right\\rangle \\quad \\ldots \\quad \\left\\langle \\phi\\left(\\mathbf{x}\\right), \\mathbf{w}^{(K)}\\right\\rangle\\right),\n", "$$\n", "where \n", "$$\n", "\\mathbf{W} = \n", "\\begin{pmatrix}\n", "\\vdots & \\vdots & \\ldots & \\vdots \\\\\n", "\\mathbf{w}^{(1)} & \\mathbf{w}^{(2)} & \\ldots & \\mathbf{w}^{(K)}\\\\\n", "\\vdots & \\vdots & \\ldots & \\vdots \\\\\n", "\\end{pmatrix}\n", "$$\n", "is a mathematical representation of the weights matrix.\n", "\n", "As in the binary classification case, the arguments of $\\mathtt{model\\_function}$ are the data matrix _data_matrix_ and weights that are now named _weights_matrix_." ] }, { "cell_type": "code", "execution_count": null, "id": "a86c8d0f", "metadata": {}, "outputs": [], "source": [ "def model_function(data_matrix, weight_matrix):\n", " # YOUR CODE HERE\n", " raise NotImplementedError()" ] }, { "cell_type": "markdown", "id": "a75408cf", "metadata": {}, "source": [ "Test your $\\mathtt{model\\_function}$ function with the following cell. Passing the test is awarded with **1 mark**. " ] }, { "cell_type": "code", "execution_count": null, "id": "ad3927ee", "metadata": {}, "outputs": [], "source": [ "test_data_matrix = np.array([[1,2,3],[1,4,5],[1,6,7]])\n", "test_weight_matrix = np.array([[-1,1],[0,1],[1,0]])\n", "assert_array_almost_equal(model_function(test_data_matrix, test_weight_matrix),np.array([[2,3],[4,5],[6,7]]))" ] }, { "cell_type": "markdown", "id": "4394f672", "metadata": {}, "source": [ "5. Write a function $\\mathtt{multinomial\\_prediction\\_function}$ that turns your model function into labels. The function takes the arguments _data_matrix_ and _weight_matrix_ as inputs and returns a vector of labels with values in $\\{0, K - 1 \\}$ as its output." ] }, { "cell_type": "code", "execution_count": null, "id": "439c4b9e", "metadata": {}, "outputs": [], "source": [ "def multinomial_prediction_function(data_matrix, weight_matrix): \n", " # YOUR CODE HERE\n", " raise NotImplementedError()" ] }, { "cell_type": "markdown", "id": "93f53902", "metadata": {}, "source": [ "Test your $\\mathtt{multinomial\\_prediction\\_function}$ function with the following cell." ] }, { "cell_type": "code", "execution_count": null, "id": "5966c24c", "metadata": {}, "outputs": [], "source": [ "test_data_matrix = np.array([[6, 4, 5], [1, 2, 8], [-3, 3, 6], [6, 5, -100], [5, 7, 2]])\n", "test_weight_matrix = np.array([[2, 1, -2, -4], [ 2, -5, 1, 4], [-2, -3, -1, -2]])\n", "assert_array_almost_equal(multinomial_prediction_function(test_data_matrix, test_weight_matrix),np.array([0,2,3,1,0]))" ] }, { "cell_type": "markdown", "id": "b7a7de52", "metadata": {}, "source": [ "6. Implement a function $\\mathtt{gradient\\_descent}$ that performs gradient descent to numerically approximate a minimiser of a convex function. The function should take the following arguments\n", "- *objective* - a lambda-function representing function $E$. This itself should take a NumPy array as its argument and return a real number.\n", "- *gradient* - a lambda-function representing function $\\nabla E$. This itself should take a NumPy array as its argument and return a NumPy array representation of the gradient $\\nabla E$.\n", "- *initial_ weights* - a NumPy array with initial values $\\mathbf{w}^{(0)}$ for the first iterate \n", "- *step_size* - a step-size parameter $\\tau$ for the gradient descent step\n", "- *no_of_iterations* - an integer parameter that controls the number of iterations\n", "- *print_output* - an integer parameter that controls how often you are printing an intermediate result. If say *print_output = 100*, then after every 100th iteration you are asked to print your current iterate and a value of the objective as *Iteration k/m, objective = o.*, where $k$ is a number of current iteration, $m$ is a total number of iterations, and $o$ is a value of the objective evaluated at current iterate.\n", "\n", "Implement the function so that it returns a NumPy array of the weights obtained after gradient descent together with a list of objective values for all iterates." ] }, { "cell_type": "code", "execution_count": null, "id": "55f86058", "metadata": {}, "outputs": [], "source": [ "def gradient_descent(objective,\n", " gradient,\n", " initial_weights,\n", " step_size=1,\n", " no_of_iterations=100,\n", " print_output=10):\n", " # YOUR CODE HERE\n", " raise NotImplementedError()" ] }, { "cell_type": "markdown", "id": "884d213a", "metadata": {}, "source": [ "Test your function with the following unit test" ] }, { "cell_type": "code", "execution_count": null, "id": "cd016e65", "metadata": {}, "outputs": [], "source": [ "test_matrix_m = np.array([[3, 1], [2, 4]])\n", "test_vector_v = np.array([5, 6])\n", "test_objective = lambda x: x.T @ (test_matrix_m @ x) + x @ test_vector_v\n", "test_gradient = lambda x: (test_matrix_m + test_matrix_m.T) @ x + test_vector_v\n", "test_initial_weights = np.array([0.0, 0.0])\n", "test_step_size = 0.9 / (np.linalg.norm(test_matrix_m + test_matrix_m.T))\n", "test_no_of_iterations = 100\n", "test_print_output = 10\n", "assert_array_almost_equal(gradient_descent(test_objective, test_gradient, \\\n", " test_initial_weights,test_step_size,\\\n", " test_no_of_iterations, test_print_output)[0],np.array([-0.564103, -0.538462]))" ] }, { "cell_type": "markdown", "id": "ce725032", "metadata": {}, "source": [ "7. Next, write a function $\\mathtt{one\\_hot\\_vector\\_encoding}$ that converts an NumPy array _labels_ with values in the range of $\\{0, K - 1\\}$ into so-called one-hot vector encodings. For example, for $K = 3$ and a label vector $\\text{labels} = \n", "\\begin{pmatrix} 2 & 0 & 1 & 2\\end{pmatrix}^\\top$, the output of $\\mathtt{one\\_hot\\_vector\\_encoding}$ should be a two-dimensional NumPy array of the form\n", "$$\n", "\\begin{pmatrix} 0 & 0 & 1 \\\\ 1 & 0 & 0 \\\\ 0 & 1 & 0 \\\\ 0 & 0 & 1 \n", "\\end{pmatrix}.\n", "$$" ] }, { "cell_type": "code", "execution_count": null, "id": "706144e7", "metadata": {}, "outputs": [], "source": [ "def one_hot_vector_encoding(labels): \n", " # YOUR CODE HERE\n", " raise NotImplementedError()" ] }, { "cell_type": "markdown", "id": "092da0fe", "metadata": {}, "source": [ "Test your $\\mathtt{one\\_hot\\_vector\\_encoding}$ function with the following cell" ] }, { "cell_type": "code", "execution_count": null, "id": "158f6038", "metadata": {}, "outputs": [], "source": [ "assert_array_almost_equal(one_hot_vector_encoding(np.array([1, 2, 0, 3])), \\\n", " np.array([[0,1,0,0],[0,0,1,0],[1,0,0,0],[0,0,0,1]]))" ] }, { "cell_type": "code", "execution_count": null, "id": "79b88762", "metadata": {}, "outputs": [], "source": [ "assert_array_almost_equal(one_hot_vector_encoding(np.array([1,0,1,0])), \\\n", " np.array([[0,1],[1,0],[0,1],[1,0]]))" ] }, { "cell_type": "markdown", "id": "8758fb30", "metadata": {}, "source": [ "8. Implement the cost function and gradient for the multinomial logistic regression in terms of two functions $\\mathtt{multinomial\\_logistic\\_regression\\_cost\\_function}$ and $\\mathtt{multinomial\\_logistic\\_regression\\_gradient}$. As in the binary classification case, the arguments are the data matrix _data_matrix_ and weights that are now named _weights_matrix_. Instead of passing on labels as _outputs_ as in the binary case, you pass the one hot vector encoding representation _one_hot_vector_encodings_ as your third argument. Return the cost function value, respectively the gradient, following the mathematical formulas in the lecture notes.\n", "\n", "$$\n", "L\\left(\\mathbf{W}\\right) = \\sum\\limits_{i=1}^s\\log\\left[\\sum\\limits_{k=1}^K \\mathrm{e}^{f\\left(\\mathbf{x}^{(i)},\\mathbf{W}\\right)_k}\\right]\n", "- \\sum\\limits_{i=1}^s\\sum\\limits_{k=1}^K \\mathbf{1}_{y_i=k}f\\left(\\mathbf{x}^{(i)},\\mathbf{W}\\right)_k,\n", "$$\n", "where $f\\left(\\mathbf{x}, \\mathbf{W}\\right)$ is a model function. In the case of linear model function \n", "$$\n", "f\\left(\\mathbf{x},\\mathbf{W}\\right) = \n", "\\left( \\left\\langle \\phi\\left(\\mathbf{x}\\right), \\mathbf{w}^{(1)}\\right\\rangle \\quad \\left\\langle \\phi\\left(\\mathbf{x}\\right), \\mathbf{w}^{(2)}\\right\\rangle \\quad \\ldots \\quad \\left\\langle \\phi\\left(\\mathbf{x}\\right), \\mathbf{w}^{(K)}\\right\\rangle\\right)\n", "$$\n", "one can write\n", "$$\n", "\\begin{align*}\n", "L\\left(\\mathbf{W}\\right) &=& \\sum\\limits_{i=1}^s\\log\\left[\\sum\\limits_{k=1}^K \\mathrm{e}^{\\left\\langle \\phi\\left(\\mathbf{x}^{(i)}\\right), \\mathbf{w}^{(k)}\\right\\rangle}\\right]\n", "- \\sum\\limits_{i=1}^s\\sum\\limits_{k=1}^K \\mathbf{1}_{y_i=k}\\left\\langle \\phi\\left(\\mathbf{x}^{(i)}\\right), \\mathbf{w}^{(k)}\\right\\rangle\n", "\\\\\n", "\\frac{\\partial L\\left(\\mathbf{W}\\right)}{\\partial \\mathbf{w}^{(p)}_q} &=& \\sum\\limits_{i=1}^s \\phi\\left(\\mathbf{x}^{(i)}\\right)_q\\mathrm{softmax}\\left(f\\left(\\mathbf{x}^{(i)}, \\mathbf{W}\\right)\\right)_p\n", "- \\sum\\limits_{i=1}^s\\mathbf{1}_{y_i = p} \\phi\\left(\\mathbf{x}^{(i)}\\right)_q.\n", "\\end{align*}\n", "$$" ] }, { "cell_type": "code", "execution_count": null, "id": "aaa86a79", "metadata": {}, "outputs": [], "source": [ "def multinomial_logistic_regression_cost_function(data_matrix, weight_matrix, one_hot_vector_encodings): \n", " # YOUR CODE HERE\n", " raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "id": "c3a35fc6", "metadata": {}, "outputs": [], "source": [ "def multinomial_logistic_regression_gradient(data_matrix, weight_matrix, one_hot_vector_encodings): \n", " # YOUR CODE HERE\n", " raise NotImplementedError()" ] }, { "cell_type": "markdown", "id": "1f3abd1d", "metadata": {}, "source": [ "Test your functions with the following cells" ] }, { "cell_type": "code", "execution_count": null, "id": "72641160", "metadata": {}, "outputs": [], "source": [ "test_data_matrix = np.array([[6, 4, 5], [1, 2, 8], [-3, 3, 6], [6, 5, -100], [5, 7, 2]])\n", "test_weight_matrix = np.array([[2, 1, -2, -4], [ 2, -5, 1, 4], [-2, -3, -1, -2]])\n", "test_one_hot_vector_encoding = np.array([[1., 0., 0., 0.], [0., 0., 1., 0.], \\\n", " [0., 0., 0., 1.], [0., 1., 0., 0.], [1., 0., 0., 0.]])\n", "assert_array_almost_equal(multinomial_logistic_regression_cost_function(test_data_matrix, \\\n", " test_weight_matrix, \\\n", " test_one_hot_vector_encoding),\\\n", " 0.1430551433917744)" ] }, { "cell_type": "code", "execution_count": null, "id": "42807d8b", "metadata": {}, "outputs": [], "source": [ "test_data_matrix = np.array([[6, 4, 5], [1, 2, 8], [-3, 3, 6], [6, 5, -100], [5, 7, 2]])\n", "test_weight_matrix = np.array([[2, 1, -2, -4], [ 2, -5, 1, 4], [-2, -3, -1, -2]])\n", "test_one_hot_vector_encoding = np.array([[1., 0., 0., 0.], [0., 0., 1., 0.], \\\n", " [0., 0., 0., 1.], [0., 1., 0., 0.], [1., 0., 0., 0.]])\n", "assert_array_almost_equal(multinomial_logistic_regression_gradient(test_data_matrix, \\\n", " test_weight_matrix, \\\n", " test_one_hot_vector_encoding),\\\n", " np.array([[ 1.173099e-01, 1.203832e-11, -1.335569e-01, 1.624699e-02],\\\n", " [ 2.346201e-01, 2.407656e-11, -2.660032e-01, 3.138308e-02],\\\n", " [ 9.384832e-01, 9.630610e-11, -1.064753e+00, 1.262698e-01]]))" ] }, { "cell_type": "markdown", "id": "283f27bf", "metadata": {}, "source": [ "9. Write a function $\\mathtt{classification\\_accuracy}$ that takes two NumPy array arguments $\\mathtt{predicted\\_labels}$ and $\\mathtt{true\\_labels}$ and evaluates the ratio of labels that coincide in both arrays to the number of all labels." ] }, { "cell_type": "code", "execution_count": null, "id": "088df900", "metadata": {}, "outputs": [], "source": [ "def classification_accuracy(estimated_labels, true_labels):\n", " # YOUR CODE HERE\n", " raise NotImplementedError()" ] }, { "cell_type": "markdown", "id": "80d71c59", "metadata": {}, "source": [ "Test your function with the following unit tests." ] }, { "cell_type": "code", "execution_count": null, "id": "08cc51d2", "metadata": {}, "outputs": [], "source": [ "test_estimated_labels = np.array([0,4,0,0,2,4,0,0,2,2,2,3,3,1,3,4,0,3,4,0,1,1,2,0,0,0,\\\n", " 1,4,4,4,3,0,4,2,4,4,4,2,2,1,4,3,2,2,1,1,4,3,3,0,4,3,\\\n", " 0,0,0,2,4,3,4,3,1,3,2,4,2,3,2,3,2,3,1,0,4,3,2,3,1,3,\\\n", " 4,1,3,1,4,0,4,4,1,2,3,1,1,4,3,1,3,0,2,0,0,1])\n", "test_true_labels = np.array([4,4,4,4,1,3,0,4,1,0,2,3,1,2,0,1,3,4,3,4,4,4,2,0,3,4,3,2,3,\\\n", " 0,1,0,4,3,2,2,1,2,2,1,4,0,4,4,3,0,0,0,3,2,3,0,0,3,3,4,2,2,\\\n", " 3,4,2,3,3,0,0,2,0,3,0,4,4,4,4,1,2,3,1,3,1,1,2,0,2,3,3,3,4,\\\n", " 0,0,3,1,0,3,0,3,1,0,0,4,4])\n", "assert_array_almost_equal(\n", " classification_accuracy(test_estimated_labels, test_true_labels), 0.26)" ] }, { "cell_type": "markdown", "id": "b66c1a2d", "metadata": {}, "source": [ "10. Finally, we would like to apply all the above to the data. I.e.,\n", "- define two lambda functions called $\\mathtt{objective}$, $\\mathtt{gradient}$ that take $1$ argument _weight_;\n", "- define _initial_weights_ to be a zero matrix of appropriate size;\n", "- let the step size parameter _step_size_ be equal to $3.9/\\left\\|\\mathbf{X}\\right\\|^2$;\n", "- run 10,000 iterations of the gradient descent procedure to find an optimal weights _optimal_weights_;\n", "- evaluate an accuracy rate _accuracy_rate_ evaluated for the model defined to have _optimal_weights_ weights." ] }, { "cell_type": "code", "execution_count": null, "id": "e6a1a719", "metadata": {}, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "id": "45805085", "metadata": {}, "outputs": [], "source": [ "print(\"The mulnomial logistic regression successfully classified {acc:2.2f} % of data\".format(acc = 100*accuracy_rate))" ] }, { "cell_type": "markdown", "id": "e3adba86", "metadata": {}, "source": [ "### Two layers Neural Network for mutliclass classification\n", "\n", "In the second part of the assignment you are asked to try the same classification task but considering a neural network with $L = 2$ layers. All layers have the same activation function which is given by the $\\mathtt{softmax\\_function}$ and same type of model function given by an affine-linear transformation. Namely, we define\n", "$$\n", "\\left\\{\n", "\\begin{array}{ll}\n", "Z^{(\\ell)} &= \\widetilde{X}^{(\\ell-1)}W^{(\\ell)},\\\\\n", "X^{(\\ell)} &= \\mathrm{softmax}\\left(Z^{(\\ell)}\\right)\n", "\\end{array}\n", "\\right.,\n", "\\label{eq:nn_model}\n", "$$\n", "for $\\ell=1,\\ldots,L$, where \n", "- $X^{(\\ell)}$ is a mathematical representation of data _inputs_ at the layer $\\ell+1$, in particular $X^{(0)}$ is a mathematical representation of an input at the first layer, which just represents data samples;\n", "- $Z^{(\\ell)}$ is a mathematical representation of model function values at the layer $\\ell$, in particular $Z^{(L)}$ is a mathematical representation the final output of neural network;\n", "- for every matrix $M$ we write $\\widetilde{M}$ to denote an augmented matrix $M$, i.e. the one with artificial column of ones inserted at the beginning of a matrix; \n", "- $W^{(\\ell)}$ are weight matrices representing linear transformation applied at the layer $\\ell$ for $\\ell = \\overline{1,L}$ that have dimensions $\\left( d_{\\ell} + 1 \\right)\\times d_{\\ell+1}$ with $d_1 = d$, $d_{L+1} = K$, while $d_2,\\ldots,d_L$ can take any integer values. In this exercise we would assume $d_2 = \\ldots = d_L = d$.\n", "\n", "By training a neural network we understand identifying optimal values of parameters $\\mathbf{W} = \\left(\\mathbf{W}^{(1)}, \\mathbf{W}^{(2)}, \\ldots, \\mathbf{W}^{(L)}\\right)$ such that the cost function\n", "$$\n", "L\\left(\\mathbf{W}\\right) = \n", "\\sum\\limits_{i=1}^s \\log\\left(\\sum\\limits_{j=1}^K\n", "\\mathrm{e}^{Z^{(L)}_{i,j}}\\right)\n", "-\n", "\\sum\\limits_{i=1}^s\\sum\\limits_{j=1}^K\n", "\\mathbf{1}_{y_i = j} Z^{(L)}_{i,j}.\n", "$$\n", "is minimised.\n", "\n", "The optimisation problem is set to be solved by using the gradient descent method. This involves a backpropagation approach to a calculation of the gradient $\\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d}\\mathbf{W}}$ (see lecture notes). To help you with the problem, we provide you with a full set of equations you need to use to solve the problem. Every derivative here is thought as a gradient and is written in the form of a corresponding size matrix. For example,\n", "the derivative $\\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d}W^{(1)}}$ should be thought as a gradient of the cost function $L\\left(\\mathbf{W}\\right)$ with respect to the arguments $W^{(1)}_{i,j}$ that is written as a matrix of size $\\left(d+1\\right)\\times d$. In what follows, we denote a matrix $M$ with the first row removed as $\\widehat{M}$ and also write $\\mathrm{OHV}$ for a one hot vector representation of input data.\n", "$$\n", "\\boxed{\n", "\\begin{array}{rcl}\n", "\\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d} Z^{(\\ell)}}&=& \n", "\\begin{cases}\n", "\\mathrm{softmax}\\left(Z^{(L)}\\right) - \n", "\\mathrm{OHV},& \\ell = L,\\\\\n", "X^{(\\ell)}\\odot\\left[\n", "\\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d} X^{(\\ell)}}\n", "- \\Sigma\\left(X^{(\\ell)}\\odot\n", "\\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d} X^{(\\ell)}} \\right)\n", "\\right]\n", ",& \\ell < L\n", "\\end{cases}\n", "\\\\\n", "\\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d} X^{(\\ell)}} &=& \\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d} Z^{(\\ell+1)}}\\cdot \\left(\\widehat{W}^{(l+1)}\\right)^{\\top}\n", "\\\\\n", "\\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d} W^{(\\ell)}} &=& \n", "\\left(\\widetilde{X}^{(\\ell-1)}\\right)^{\\top}\\cdot\n", "\\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d} Z^{(\\ell)}}\n", "\\end{array}\n", "},\n", "$$\n", "where for any matrix $M$ we define $\\Sigma\\left(M\\right)$ as a matrix of the same dimensions as $M$ but with every matrix element swapped with a row-sum of matrix elements of matrix $M$.\n", "\n", "\n", "In this section we will assume that the weights are now represented by a three dimensional NumPy array, which has a form of a row vector, where each element is a two dimensional NumPy array. I.e., weights have a form of\n", "\n", "$$\n", "\\mathbf{W} = \\left[\\mathbf{W}^{(1)}, \\mathbf{W}^{(2)}, \\ldots, \\mathbf{W}^{(L)}\\right],\n", "$$\n", "and $\\mathit{weights}[i,j,k] = \\mathbf{W}^{(i)}_{j,k}$. \n", "\n", "The neural network state is now described by $2\\cdot L + 1$ matrices. This is given by\n", "$$\n", "\\mathit{state} = \\left[\\mathbf{X}^{(0)}, \\mathbf{Z}^{(1)}, \\mathbf{X}^{(1)}, \\mathbf{Z}^{(2)}, \\mathbf{X}^{(2)}, \\ldots, \\mathbf{Z}^{(L)}, \\mathbf{X}^{(L)}\\right].\n", "$$\n", "\n", "\n", "Ideally, the code you will write here should work for any number of layers, despite your task being only to model $2$ layered network. Let us introduce a variable that stores the number of layers, which you can later change to see how this will affect the output." ] }, { "cell_type": "code", "execution_count": null, "id": "02def757", "metadata": {}, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "markdown", "id": "8966f2aa", "metadata": {}, "source": [ "1. Start by writing the $\\mathtt{nn\\_forward\\_propagation}$ function that takes two arguments\n", "- _data_input_ - input data for the first layer ($\\mathbf{X}^{(0)}$);\n", "- _weight_matrix_ - is the list of weight matrices as described above.\n", "\n", "The function should return you the state of neural network following the above formulas. The output should have a form\n", "$$\n", "\\left[\\mathbf{X}^{(0)}, \\mathbf{Z}^{(1)}, \\mathbf{X}^{(1)}, \\mathbf{Z}^{(2)}, \\mathbf{X}^{(2)}, \\ldots, \\mathbf{Z}^{(L)}, \\mathbf{X}^{(L)}\\right].\n", "$$" ] }, { "cell_type": "code", "execution_count": null, "id": "8d6aad82", "metadata": {}, "outputs": [], "source": [ "def nn_forward_propagation(data_input, weight_matrix): \n", " # YOUR CODE HERE\n", " raise NotImplementedError()" ] }, { "cell_type": "markdown", "id": "7243d27b", "metadata": {}, "source": [ "Test your function with the following unit test" ] }, { "cell_type": "code", "execution_count": null, "id": "831c84c1", "metadata": {}, "outputs": [], "source": [ "test_data_input = np.array([[1,2],[3,4],[5,6]])\n", "test_weight_matrix = np.array([[[1,2],[3,4],[5,6]], [[-1,2],[-2,3],[3,1]]])\n", "assert_array_almost_equal(nn_forward_propagation(test_data_input,test_weight_matrix)[0],np.array([[1,2],[3,4],[5,6]]))\n", "assert_array_almost_equal(nn_forward_propagation(test_data_input,test_weight_matrix)[1],np.array([[14,18],[30,38],[46,58]]))\n", "assert_array_almost_equal(nn_forward_propagation(test_data_input,test_weight_matrix)[2],\\\n", " np.array([[1.798621e-02, 9.820138e-01],[3.353501e-04, 9.996646e-01],[6.144175e-06, 9.999939e-01]]))\n", "assert_array_almost_equal(nn_forward_propagation(test_data_input,test_weight_matrix)[3],\\\n", " np.array([[1.910069, 3.035972],[1.998323, 3.000671],[1.999969, 3.000012]]))\n", "assert_array_almost_equal(nn_forward_propagation(test_data_input,test_weight_matrix)[4],\\\n", " np.array([[0.244918, 0.755082], [0.26848 , 0.73152 ], [0.268933, 0.731067]]))" ] }, { "cell_type": "markdown", "id": "9414cbff", "metadata": {}, "source": [ "2. Implement the function $\\mathtt{nn\\_back\\_propagation}$ that takes $3$ arguments:\n", "- _data_input_ - input data for the first layer ($\\mathbf{X}^{(0)}$);\n", "- _weight_matrix_ - is the list of weight matrices as described above;\n", "- _one_hot_vector_encoding_ - one hot vector encoding of training data samples.\n", "\n", "Your function should output a gradient $\\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d}\\mathbf{W}}$ as per the above formulas. it should have the same shape as _weight_matrix_. You should first evaluate a state of the neural network by following forward propagation method, and then evaluate corresponding derivatives by using the above formulas." ] }, { "cell_type": "code", "execution_count": null, "id": "f8339df6", "metadata": {}, "outputs": [], "source": [ "def nn_back_propagation(data_input, weight_matrix, one_hot_vector_encodings): \n", " # YOUR CODE HERE\n", " raise NotImplementedError()" ] }, { "cell_type": "markdown", "id": "23867b33", "metadata": {}, "source": [ "Test your function with the following unit test." ] }, { "cell_type": "code", "execution_count": null, "id": "330b0f75", "metadata": {}, "outputs": [], "source": [ "test_data_input = np.array([[1,2],[3,4],[5,6]])\n", "test_weight_matrix = np.array([[[1,2],[3,4],[5,6]], [[-1,2],[-2,3],[3,1]]])\n", "assert_array_almost_equal(nn_back_propagation(test_data_input, test_weight_matrix, np.array([[0,1],[1,0],[0,1]]))[0],\\\n", " np.array([[-0.028576, 0.028576], [-0.025189, 0.025189], [-0.053766, 0.053766]]))\n", "assert_array_almost_equal(nn_back_propagation(test_data_input, test_weight_matrix, np.array([[0,1],[1,0],[0,1]]))[1],\\\n", " np.array([[-0.217669, 0.217669], [ 0.004161, -0.004161], [-0.22183 , 0.22183 ]]))" ] }, { "cell_type": "markdown", "id": "16bece8c", "metadata": {}, "source": [ "3. Write two functions \n", "- $\\mathtt{nn\\_cost\\_function}$ which evaluates a cost function by following the above formula. This function should take $3$ arguments: _data_input_, _weight_matrix_, _one_hot_vector_encodings_ and return a value of cost function. \n", "- $\\mathtt{nn\\_prediction\\_function}$ which evaluates class labels for data samples. This function should take $2$ arguments: _data_input_, _weight_matrix_ and return a vector of class labels." ] }, { "cell_type": "code", "execution_count": null, "id": "8ea8e6f0", "metadata": {}, "outputs": [], "source": [ "def nn_cost_function(data_input, weight_matrix, one_hot_vector_encodings): \n", " # YOUR CODE HERE\n", " raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "id": "56db5292", "metadata": {}, "outputs": [], "source": [ "def nn_prediction_function(data_input, weight_matrix): \n", " # YOUR CODE HERE\n", " raise NotImplementedError()" ] }, { "cell_type": "markdown", "id": "fbf3229e", "metadata": {}, "source": [ "4. Finally, we would like to apply all the above to data. I.e.,\n", "- define two lambda functions called $\\mathtt{nn\\_objective}$, $\\mathtt{nn\\_gradient}$ that take $1$ argument _weight_;\n", "- define nn_initial_weights_ to be a vector of zero matrices of appropriate sizes;\n", "- let the step size parameter _nn_step_size_ be equal to $3.9/\\left\\|\\mathbf{X}\\right\\|^2$;\n", "- run 10,000 iterations of the gradient descent procedure to find an optimal weights nn_optimal_weights_;\n", "- evaluate an accuracy rate nn_accuracy_rate_ evaluated for the model defined to have nn_optimal_weights_ weights." ] }, { "cell_type": "code", "execution_count": null, "id": "d4dda842", "metadata": {}, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "id": "23255c3c", "metadata": {}, "outputs": [], "source": [ "print(\"The two layered neural network successfully classified {acc:2.2f} % of data\".format(acc = 100 * nn_accuracy_rate))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.17" } }, "nbformat": 4, "nbformat_minor": 5 }