{ "cells": [ { "cell_type": "markdown", "id": "f1a05dc7", "metadata": {}, "source": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "

MTH786 Machine Learning with Python

\n", "
\n", " \n", "

Semester A, 2023/2024

\n", "
\n", "

Coursework 10

\n", "
\n", " \n", "

Dr Nicola Perra

\n", "
" ] }, { "cell_type": "code", "execution_count": 1, "id": "49e97a81", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from numpy.testing import assert_array_almost_equal, assert_array_equal\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "id": "10b3ac2c", "metadata": {}, "source": [ "### Multinomial logistic classification\n", "We now move on to multinomial logistic regression for multi-class classfication problems. To this end we will use a synthetic dataset built for these types of taks. The dataset consits on s samples, 10 features and different categories\n", "\n", "\n", "We start by loading data. **Important:** please check that file $\\mathtt{dataset.csv}$ is located in the same folder with your Jupyter notebook." ] }, { "cell_type": "code", "execution_count": 2, "id": "70a0d912", "metadata": {}, "outputs": [], "source": [ "# first step is to get the data, labels and inputs\n", "complete_data = np.genfromtxt(\"dataset.csv\", skip_header = 1, delimiter = ',')\n", "labels = complete_data[:, -1].astype(int) - np.min(complete_data[:, -1].astype(int))\n", "inputs = complete_data[:, :-1]" ] }, { "cell_type": "markdown", "id": "ca482452", "metadata": {}, "source": [ "1. Implement function $\\mathtt{linear\\_regression\\_data}$ that computes (and outputs) the linear regression data matrix. The function should take the NumPy array _data_inputs_ as argument." ] }, { "cell_type": "code", "execution_count": 3, "id": "69bb0b37", "metadata": {}, "outputs": [], "source": [ "def linear_regression_data(data_inputs):\n", " first_column = np.ones((len(data_inputs), 1))\n", " X_matrix = np.c_[first_column,data_inputs]\n", " return X_matrix" ] }, { "cell_type": "markdown", "id": "b61222da", "metadata": {}, "source": [ "2. Write two functions \n", "- $\\mathtt{standardise}$ to standardise the columns of a multi-dimensional array. The function $\\mathtt{standardise}$\ttakes the multi-dimensional array _data_matrix_ as its input argument. It subtracts the means from each column and divides by the standard deviations. It returns the _standardised_matrix_, the _row_of_means_ and the _row_of_standard_deviations_.\n", "- $\\mathtt{de\\_standardise}$ to de-standardise the columns of a multi-dimensional array. The function $\\mathtt{de\\_standardise}$ reverses the above operation. It takes a _standardised_matrix_, the _row_of_means_ and the _row_of_standard_deviations_ as its arguments and returns a matrix for which the standardisation process is reversed." ] }, { "cell_type": "code", "execution_count": 4, "id": "e4ddbbe3", "metadata": {}, "outputs": [], "source": [ "def standardise(data_matrix):\n", " row_of_means = np.mean(data_matrix, axis=0)\n", " standardised_matrix = data_matrix - row_of_means\n", " row_of_stds = np.std(standardised_matrix, axis=0)\n", " return (standardised_matrix / row_of_stds), row_of_means, row_of_stds" ] }, { "cell_type": "code", "execution_count": 5, "id": "27125f41", "metadata": {}, "outputs": [], "source": [ "def de_standardise(standardised_matrix, row_of_means, row_of_stds):\n", " matrix = np.copy(standardised_matrix * row_of_stds)\n", " return matrix + row_of_means" ] }, { "cell_type": "markdown", "id": "5e766a4e", "metadata": {}, "source": [ "Now we standardise the input data and prepare data matrix for a multinomial logistic regression. 
" ] }, { "cell_type": "code", "execution_count": 6, "id": "cdda6017", "metadata": {}, "outputs": [], "source": [ "data_inputs, row_of_means, row_of_stds = standardise(inputs)\n", "data_matrix = linear_regression_data(data_inputs)" ] }, { "cell_type": "markdown", "id": "4252f7a9", "metadata": {}, "source": [ "3. As a first step, implement the softmax function $\\mathtt{softmax\\_function}$ as defined in the lectures. The function takes the NumPy array _argument_ as its main argument, but also has an optional _axis_ argument to determine across which array-dimension you apply the softmax operation (axis = 0 for columns, axis = 1 for rows, ...). If this argument is not specified (or set to _None_), then the softmax operation is applied to the entire array. Make sure your function works at least for NumPy arrays _argument_ with arbitrary numerical values and dimension one or two." ] }, { "cell_type": "code", "execution_count": 7, "id": "fdce2591", "metadata": {}, "outputs": [], "source": [ "def softmax_function(argument, axis=None): \n", " if axis == None:\n", " output = np.exp(argument - np.max(argument))\n", " output = output / np.sum(output)\n", " else:\n", " output = np.exp(argument - np.expand_dims(np.max(argument, axis), axis))\n", " output = output / np.expand_dims(np.sum(output, axis), axis)\n", " return output" ] }, { "cell_type": "markdown", "id": "71213ee4", "metadata": {}, "source": [ "Test your $\\mathtt{softmax\\_function}$ function with the following cell. " ] }, { "cell_type": "code", "execution_count": 8, "id": "6711c0a7", "metadata": {}, "outputs": [], "source": [ "assert_array_almost_equal(softmax_function(np.array([[1.5], [0.3], [-3.7]])), \\\n", " np.array([[0.76528029], [0.23049799], [0.00422172]]))" ] }, { "cell_type": "code", "execution_count": 9, "id": "a0bfa8a5", "metadata": {}, "outputs": [], "source": [ "assert_array_almost_equal(softmax_function(np.array([[1.5, 3], [0.3, -0.7], [-3.7, 2]]), axis=0), \\\n", " np.array([[0.76528029, 0.71807976], [0.23049799, 0.01775346], \\\n", " [0.00422172, 0.26416678]]))" ] }, { "cell_type": "code", "execution_count": 10, "id": "5a23453a", "metadata": {}, "outputs": [], "source": [ "assert_array_almost_equal(softmax_function(np.array([[1.5, 3], [0.3, -0.7], [-3.7, 2]]), axis=1), \\\n", " np.array([[0.182426, 0.817574],\\\n", " [0.731059, 0.268941],\\\n", " [0.003335, 0.996665]]))" ] }, { "cell_type": "markdown", "id": "fb2ce94b", "metadata": {}, "source": [ "4. Implement a function $\\mathtt{model\\_function}$ that outputs the values of the linear model function. Unlike the case of binary classification, the $\\mathtt{model\\_function}$ should return a vector of dimension $K$ for each data sample. 
This vector is defined as\n", "$$\n", "f\\left(\\mathbf{x},\\mathbf{W}\\right) = \n", "\\left( \\left\\langle \\phi\\left(\\mathbf{x}\\right), \\mathbf{w}^{(1)}\\right\\rangle \\quad \\left\\langle \\phi\\left(\\mathbf{x}\\right), \\mathbf{w}^{(2)}\\right\\rangle \\quad \\ldots \\quad \\left\\langle \\phi\\left(\\mathbf{x}\\right), \\mathbf{w}^{(K)}\\right\\rangle\\right),\n", "$$\n", "where \n", "$$\n", "\\mathbf{W} = \n", "\\begin{pmatrix}\n", "\\vdots & \\vdots & \\ldots & \\vdots \\\\\n", "\\mathbf{w}^{(1)} & \\mathbf{w}^{(2)} & \\ldots & \\mathbf{w}^{(K)}\\\\\n", "\\vdots & \\vdots & \\ldots & \\vdots \\\\\n", "\\end{pmatrix}\n", "$$\n", "is a mathematical representation of the weights matrix.\n", "\n", "As in the binary classification case, the arguments of $\\mathtt{model\\_function}$ are the data matrix _data_matrix_ and weights that are now named _weights_matrix_." ] }, { "cell_type": "code", "execution_count": 11, "id": "a86c8d0f", "metadata": {}, "outputs": [], "source": [ "def model_function(data_matrix, weight_matrix):\n", " return data_matrix @ weight_matrix" ] }, { "cell_type": "markdown", "id": "a75408cf", "metadata": {}, "source": [ "Test your $\\mathtt{model\\_function}$ function with the following cell. " ] }, { "cell_type": "code", "execution_count": 12, "id": "ad3927ee", "metadata": {}, "outputs": [], "source": [ "test_data_matrix = np.array([[1,2,3],[1,4,5],[1,6,7]])\n", "test_weight_matrix = np.array([[-1,1],[0,1],[1,0]])\n", "assert_array_almost_equal(model_function(test_data_matrix, test_weight_matrix),np.array([[2,3],[4,5],[6,7]]))" ] }, { "cell_type": "markdown", "id": "4394f672", "metadata": {}, "source": [ "5. Write a function $\\mathtt{multinomial\\_prediction\\_function}$ that turns your model function into labels. The function takes the arguments _data_matrix_ and _weight_matrix_ as inputs and returns a vector of labels with values in $\\{0, K - 1 \\}$ as its output." ] }, { "cell_type": "code", "execution_count": 13, "id": "439c4b9e", "metadata": {}, "outputs": [], "source": [ "def multinomial_prediction_function(data_matrix, weight_matrix): \n", " return np.argmax(model_function(data_matrix, weight_matrix), axis=1)" ] }, { "cell_type": "markdown", "id": "93f53902", "metadata": {}, "source": [ "Test your $\\mathtt{multinomial\\_prediction\\_function}$ function with the following cell." ] }, { "cell_type": "code", "execution_count": null, "id": "5966c24c", "metadata": {}, "outputs": [], "source": [ "test_data_matrix = np.array([[6, 4, 5], [1, 2, 8], [-3, 3, 6], [6, 5, -100], [5, 7, 2]])\n", "test_weight_matrix = np.array([[2, 1, -2, -4], [ 2, -5, 1, 4], [-2, -3, -1, -2]])\n", "assert_array_almost_equal(multinomial_prediction_function(test_data_matrix, test_weight_matrix),np.array([0,2,3,1,0]))" ] }, { "cell_type": "markdown", "id": "b7a7de52", "metadata": {}, "source": [ "6. Implement a function $\\mathtt{gradient\\_descent}$ that performs gradient descent to numerically approximate a minimiser of a convex function. The function should take the following arguments\n", "- *objective* - a lambda-function representing function $E$. This itself should take a NumPy array as its argument and return a real number.\n", "- *gradient* - a lambda-function representing function $\\nabla E$. 
This itself should take a NumPy array as its argument and return a NumPy array representation of the gradient $\\nabla E$.\n", "- *initial_ weights* - a NumPy array with initial values $\\mathbf{w}^{(0)}$ for the first iterate \n", "- *step_size* - a step-size parameter $\\tau$ for the gradient descent step\n", "- *no_of_iterations* - an integer parameter that controls the number of iterations\n", "- *print_output* - an integer parameter that controls how often you are printing an intermediate result. If say *print_output = 100*, then after every 100th iteration you are asked to print your current iterate and a value of the objective as *Iteration k/m, objective = o.*, where $k$ is a number of current iteration, $m$ is a total number of iterations, and $o$ is a value of the objective evaluated at current iterate.\n", "\n", "Implement the function so that it returns a NumPy array of the weights obtained after gradient descent together with a list of objective values for all iterates." ] }, { "cell_type": "code", "execution_count": 14, "id": "55f86058", "metadata": {}, "outputs": [], "source": [ "def gradient_descent(objective,\n", " gradient,\n", " initial_weights,\n", " step_size=1,\n", " no_of_iterations=100,\n", " print_output=10):\n", " \n", " objective_values = []\n", " weights = np.copy(initial_weights)\n", " objective_values.append(objective(weights))\n", " for counter in range(no_of_iterations):\n", " weights -= step_size * gradient(weights)\n", " objective_values.append(objective(weights))\n", " if (counter + 1) % print_output == 0:\n", " print(\"Iteration {k}/{m}, objective = {o}.\".format(k=counter+1, \n", " m=no_of_iterations, o=objective_values[counter]))\n", " print(\"Iteration completed after {k}/{m}, objective = {o}.\".format(k=counter+ 1, \n", " m=no_of_iterations,\n", " o=objective_values[counter]))\n", " return weights, objective_values" ] }, { "cell_type": "markdown", "id": "884d213a", "metadata": {}, "source": [ "Test your function with the following unit test" ] }, { "cell_type": "code", "execution_count": 15, "id": "cd016e65", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iteration 10/100, objective = -3.0256011293288547.\n", "Iteration 20/100, objective = -3.0256410067129824.\n", "Iteration 30/100, objective = -3.025641025632046.\n", "Iteration 40/100, objective = -3.0256410256410207.\n", "Iteration 50/100, objective = -3.025641025641026.\n", "Iteration 60/100, objective = -3.0256410256410255.\n", "Iteration 70/100, objective = -3.0256410256410264.\n", "Iteration 80/100, objective = -3.0256410256410255.\n", "Iteration 90/100, objective = -3.0256410256410255.\n", "Iteration 100/100, objective = -3.0256410256410255.\n", "Iteration completed after 100/100, objective = -3.0256410256410255.\n" ] } ], "source": [ "test_matrix_m = np.array([[3, 1], [2, 4]])\n", "test_vector_v = np.array([5, 6])\n", "# usual energy function\n", "test_objective = lambda x: x.T @ (test_matrix_m @ x) + x @ test_vector_v\n", "# we know the expression for the gradient\n", "test_gradient = lambda x: (test_matrix_m + test_matrix_m.T) @ x + test_vector_v\n", "\n", "test_initial_weights = np.array([0.0, 0.0])\n", "test_step_size = 0.9 / (np.linalg.norm(test_matrix_m + test_matrix_m.T))\n", "test_no_of_iterations = 100\n", "test_print_output = 10\n", "assert_array_almost_equal(gradient_descent(test_objective, test_gradient, \\\n", " test_initial_weights,test_step_size,\\\n", " test_no_of_iterations, test_print_output)[0],np.array([-0.564103, -0.538462]))" ] }, { 
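"cell_type": "markdown", "id": "ad0c1e01", "metadata": {}, "source": [ "*Optional check (an added note, not part of the coursework tasks):* for the quadratic test objective above, setting the gradient to zero gives the closed-form minimiser as the solution of $\left(M + M^\top\right)\mathbf{w} = -\mathbf{v}$. The short sketch below reuses $\mathtt{test\_matrix\_m}$ and $\mathtt{test\_vector\_v}$ from the previous cell and should reproduce the gradient descent result." ] }, { "cell_type": "code", "execution_count": null, "id": "ad0c1e02", "metadata": {}, "outputs": [], "source": [ "# optional sanity check (added sketch): the quadratic test objective has a closed-form\n", "# minimiser, obtained by setting the gradient to zero, i.e. by solving (M + M^T) w = -v\n", "closed_form_solution = np.linalg.solve(test_matrix_m + test_matrix_m.T, -test_vector_v)\n", "print(closed_form_solution) # approximately [-0.564103, -0.538462], matching gradient descent" ] }, {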
"cell_type": "markdown", "id": "ce725032", "metadata": {}, "source": [ "7. Next, write a function $\\mathtt{one\\_hot\\_vector\\_encoding}$ that converts an NumPy array _labels_ with values in the range of $\\{0, K - 1\\}$ into so-called one-hot vector encodings. For example, for $K = 3$ and a label vector $\\text{labels} = \n", "\\begin{pmatrix} 2 & 0 & 1 & 2\\end{pmatrix}^\\top$, the output of $\\mathtt{one\\_hot\\_vector\\_encoding}$ should be a two-dimensional NumPy array of the form\n", "$$\n", "\\begin{pmatrix} 0 & 0 & 1 \\\\ 1 & 0 & 0 \\\\ 0 & 1 & 0 \\\\ 0 & 0 & 1 \n", "\\end{pmatrix}.\n", "$$\n", "\n", "Note how we have 1 in each row correspondent to the value of the label" ] }, { "cell_type": "code", "execution_count": 16, "id": "706144e7", "metadata": {}, "outputs": [], "source": [ "def one_hot_vector_encoding(labels): \n", " no_of_classes = np.max(labels) + 1\n", " output = np.zeros((len(labels), no_of_classes)) # as many rows as data points, as many columns as classes\n", " output[np.arange(len(labels)), labels] = 1\n", " # why it works?\n", " # np.arange(x) creates an array from 0 to x-1\n", " # labels is an array that tells us the id of each row that should be one!\n", " return output" ] }, { "cell_type": "code", "execution_count": 17, "id": "6cd1a19b", "metadata": {}, "outputs": [], "source": [ "# this is equivalent to\n", "def one_hot_vector_encoding_2(labels): \n", " no_of_classes = np.max(labels) + 1\n", " output = np.zeros((len(labels), no_of_classes))\n", " counter=0\n", " for i in labels:\n", " output[counter,i]=1\n", " counter+=1\n", " return output" ] }, { "cell_type": "markdown", "id": "092da0fe", "metadata": {}, "source": [ "Test your $\\mathtt{one\\_hot\\_vector\\_encoding}$ function with the following cell" ] }, { "cell_type": "code", "execution_count": 18, "id": "158f6038", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0. 1. 0. 0.]\n", " [0. 0. 1. 0.]\n", " [1. 0. 0. 0.]\n", " [0. 0. 0. 1.]]\n", "-------------------------\n", "[[0. 1. 0. 0.]\n", " [0. 0. 1. 0.]\n", " [1. 0. 0. 0.]\n", " [0. 0. 0. 1.]]\n" ] } ], "source": [ "# let's try it \n", "labels=np.array([1, 2, 0, 3])\n", "print (one_hot_vector_encoding(labels))\n", "print ('-------------------------')\n", "print (one_hot_vector_encoding_2(labels))" ] }, { "cell_type": "code", "execution_count": 19, "id": "79b88762", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0. 1.]\n", " [1. 0.]\n", " [0. 1.]\n", " [1. 0.]]\n" ] } ], "source": [ "# let's try another one\n", "labels=np.array([1,0,1,0]) # since we have two classes we have just two columns\n", "print (one_hot_vector_encoding(labels))" ] }, { "cell_type": "markdown", "id": "8758fb30", "metadata": {}, "source": [ "8. Implement the cost function and gradient for the multinomial logistic regression in terms of two functions $\\mathtt{multinomial\\_logistic\\_regression\\_cost\\_function}$ and $\\mathtt{multinomial\\_logistic\\_regression\\_gradient}$. As in the binary classification case, the arguments are the data matrix _data_matrix_ and weights that are now named _weights_matrix_. Instead of passing on labels as _outputs_ as in the binary case, you pass the one hot vector encoding representation _one_hot_vector_encodings_ as your third argument. 
Return the cost function value, respectively the gradient, following the mathematical formulas in the lecture notes.\n", "\n", "$$\n", "L\\left(\\mathbf{W}\\right) = \\sum\\limits_{i=1}^s\\log\\left[\\sum\\limits_{k=1}^K \\mathrm{e}^{f\\left(\\mathbf{x}^{(i)},\\mathbf{W}\\right)_k}\\right]\n", "- \\sum\\limits_{i=1}^s\\sum\\limits_{k=1}^K \\mathbf{1}_{y_i=k}f\\left(\\mathbf{x}^{(i)},\\mathbf{W}\\right)_k,\n", "$$\n", "where $f\\left(\\mathbf{x}, \\mathbf{W}\\right)$ is a model function. In the case of linear model function \n", "$$\n", "f\\left(\\mathbf{x},\\mathbf{W}\\right) = \n", "\\left( \\left\\langle \\phi\\left(\\mathbf{x}\\right), \\mathbf{w}^{(1)}\\right\\rangle \\quad \\left\\langle \\phi\\left(\\mathbf{x}\\right), \\mathbf{w}^{(2)}\\right\\rangle \\quad \\ldots \\quad \\left\\langle \\phi\\left(\\mathbf{x}\\right), \\mathbf{w}^{(K)}\\right\\rangle\\right)\n", "$$\n", "one can write\n", "$$\n", "\\begin{align*}\n", "L\\left(\\mathbf{W}\\right) &=& \\sum\\limits_{i=1}^s\\log\\left[\\sum\\limits_{k=1}^K \\mathrm{e}^{\\left\\langle \\phi\\left(\\mathbf{x}^{(i)}\\right), \\mathbf{w}^{(k)}\\right\\rangle}\\right]\n", "- \\sum\\limits_{i=1}^s\\sum\\limits_{k=1}^K \\mathbf{1}_{y_i=k}\\left\\langle \\phi\\left(\\mathbf{x}^{(i)}\\right), \\mathbf{w}^{(k)}\\right\\rangle\n", "\\\\\n", "\\frac{\\partial L\\left(\\mathbf{W}\\right)}{\\partial \\mathbf{w}^{(p)}_q} &=& \\sum\\limits_{i=1}^s \\phi\\left(\\mathbf{x}^{(i)}\\right)_q\\mathrm{softmax}\\left(f\\left(\\mathbf{x}^{(i)}, \\mathbf{W}\\right)\\right)_p\n", "- \\sum\\limits_{i=1}^s\\mathbf{1}_{y_i = p} \\phi\\left(\\mathbf{x}^{(i)}\\right)_q.\n", "\\end{align*}\n", "$$" ] }, { "cell_type": "code", "execution_count": 20, "id": "aaa86a79", "metadata": {}, "outputs": [], "source": [ "def multinomial_logistic_regression_cost_function(data_matrix, weight_matrix, one_hot_vector_encodings): \n", " model_evaluation = model_function(data_matrix, weight_matrix)\n", " return np.sum(np.log(np.sum(np.exp(model_evaluation), axis=1)) - np.sum(one_hot_vector_encodings * model_evaluation, axis=1))" ] }, { "cell_type": "code", "execution_count": 21, "id": "c3a35fc6", "metadata": {}, "outputs": [], "source": [ "def multinomial_logistic_regression_gradient(data_matrix, weight_matrix, one_hot_vector_encodings): \n", " model_evaluation = model_function(data_matrix, weight_matrix)\n", " return data_matrix.T @ (softmax_function(model_evaluation, axis=1) -one_hot_vector_encodings)" ] }, { "cell_type": "markdown", "id": "1f3abd1d", "metadata": {}, "source": [ "Test your functions with the following cells" ] }, { "cell_type": "code", "execution_count": 22, "id": "72641160", "metadata": {}, "outputs": [], "source": [ "test_data_matrix = np.array([[6, 4, 5], [1, 2, 8], [-3, 3, 6], [6, 5, -100], [5, 7, 2]])\n", "test_weight_matrix = np.array([[2, 1, -2, -4], [ 2, -5, 1, 4], [-2, -3, -1, -2]])\n", "\n", "\n", "test_one_hot_vector_encoding = np.array([[1., 0., 0., 0.], [0., 0., 1., 0.], \\\n", " [0., 0., 0., 1.], [0., 1., 0., 0.], [1., 0., 0., 0.]])\n", "assert_array_almost_equal(multinomial_logistic_regression_cost_function(test_data_matrix, \\\n", " test_weight_matrix, \\\n", " test_one_hot_vector_encoding),\\\n", " 0.1430551433917744)" ] }, { "cell_type": "code", "execution_count": 23, "id": "42807d8b", "metadata": {}, "outputs": [], "source": [ "test_data_matrix = np.array([[6, 4, 5], [1, 2, 8], [-3, 3, 6], [6, 5, -100], [5, 7, 2]])\n", "test_weight_matrix = np.array([[2, 1, -2, -4], [ 2, -5, 1, 4], [-2, -3, -1, -2]])\n", "test_one_hot_vector_encoding = 
np.array([[1., 0., 0., 0.], [0., 0., 1., 0.], \\\n", " [0., 0., 0., 1.], [0., 1., 0., 0.], [1., 0., 0., 0.]])\n", "assert_array_almost_equal(multinomial_logistic_regression_gradient(test_data_matrix, \\\n", " test_weight_matrix, \\\n", " test_one_hot_vector_encoding),\\\n", " np.array([[ 1.173099e-01, 1.203832e-11, -1.335569e-01, 1.624699e-02],\\\n", " [ 2.346201e-01, 2.407656e-11, -2.660032e-01, 3.138308e-02],\\\n", " [ 9.384832e-01, 9.630610e-11, -1.064753e+00, 1.262698e-01]]))" ] }, { "cell_type": "markdown", "id": "283f27bf", "metadata": {}, "source": [ "9. Write a function $\\mathtt{classification\\_accuracy}$ that takes two NumPy array arguments $\\mathtt{predicted\\_labels}$ and $\\mathtt{true\\_labels}$ and evaluates the ratio of labels that coincide in both arrays to the number of all labels." ] }, { "cell_type": "code", "execution_count": 24, "id": "088df900", "metadata": {}, "outputs": [], "source": [ "def classification_accuracy(estimated_labels, true_labels):\n", " equal_labels = estimated_labels == true_labels\n", " return np.mean(equal_labels)" ] }, { "cell_type": "markdown", "id": "80d71c59", "metadata": {}, "source": [ "Test your function with the following unit tests." ] }, { "cell_type": "code", "execution_count": 40, "id": "08cc51d2", "metadata": {}, "outputs": [], "source": [ "test_estimated_labels = np.array([0,4,0,0,2,4,0,0,2,2,2,3,3,1,3,4,0,3,4,0,1,1,2,0,0,0,\\\n", " 1,4,4,4,3,0,4,2,4,4,4,2,2,1,4,3,2,2,1,1,4,3,3,0,4,3,\\\n", " 0,0,0,2,4,3,4,3,1,3,2,4,2,3,2,3,2,3,1,0,4,3,2,3,1,3,\\\n", " 4,1,3,1,4,0,4,4,1,2,3,1,1,4,3,1,3,0,2,0,0,1])\n", "test_true_labels = np.array([4,4,4,4,1,3,0,4,1,0,2,3,1,2,0,1,3,4,3,4,4,4,2,0,3,4,3,2,3,\\\n", " 0,1,0,4,3,2,2,1,2,2,1,4,0,4,4,3,0,0,0,3,2,3,0,0,3,3,4,2,2,\\\n", " 3,4,2,3,3,0,0,2,0,3,0,4,4,4,4,1,2,3,1,3,1,1,2,0,2,3,3,3,4,\\\n", " 0,0,3,1,0,3,0,3,1,0,0,4,4])\n", "assert_array_almost_equal(\n", " classification_accuracy(test_estimated_labels, test_true_labels), 0.26)" ] }, { "cell_type": "markdown", "id": "b66c1a2d", "metadata": {}, "source": [ "10. Finally, we would like to apply all the above to the data. I.e.,\n", "- define two lambda functions called $\\mathtt{objective}$, $\\mathtt{gradient}$ that take $1$ argument _weight_;\n", "- define _initial_weights_ to be a zero matrix of appropriate size;\n", "- let the step size parameter _step_size_ be equal to $3.9/\\left\\|\\mathbf{X}\\right\\|^2$;\n", "- run 10,000 iterations of the gradient descent procedure to find an optimal weights _optimal_weights_;\n", "- evaluate an accuracy rate _accuracy_rate_ evaluated for the model defined to have _optimal_weights_ weights." 
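] }, { "cell_type": "markdown", "id": "ad0c1e03", "metadata": {}, "source": [ "*Optional check (an added sketch, not part of the coursework tasks):* before running gradient descent on the full dataset, it can be reassuring to compare the analytic gradient from task 8 with a central finite-difference approximation of the cost function. The cell below does this on the small test problem from the previous cells; the helper $\mathtt{finite\_difference\_gradient}$ is introduced here purely for illustration." ] }, { "cell_type": "code", "execution_count": null, "id": "ad0c1e04", "metadata": {}, "outputs": [], "source": [ "# optional sanity check (added sketch): compare the analytic gradient with a central\n", "# finite-difference approximation of the cost function on the small test problem above\n", "def finite_difference_gradient(cost, weight_matrix, epsilon=1e-6):\n", "    weight_matrix = weight_matrix.astype(float)\n", "    approximation = np.zeros_like(weight_matrix)\n", "    for index in np.ndindex(*weight_matrix.shape):\n", "        shift = np.zeros_like(weight_matrix)\n", "        shift[index] = epsilon\n", "        approximation[index] = (cost(weight_matrix + shift) - cost(weight_matrix - shift)) / (2 * epsilon)\n", "    return approximation\n", "\n", "test_cost = lambda W: multinomial_logistic_regression_cost_function(test_data_matrix, W, test_one_hot_vector_encoding)\n", "analytic_gradient = multinomial_logistic_regression_gradient(test_data_matrix, test_weight_matrix, test_one_hot_vector_encoding)\n", "# the maximal difference should be very small (of the order of the finite-difference error)\n", "print(np.max(np.abs(finite_difference_gradient(test_cost, test_weight_matrix) - analytic_gradient)))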
] }, { "cell_type": "code", "execution_count": 25, "id": "e6a1a719", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iteration 1000/10000, objective = 1439.189196039968.\n", "Iteration 2000/10000, objective = 1433.3815618034994.\n", "Iteration 3000/10000, objective = 1431.1092776592886.\n", "Iteration 4000/10000, objective = 1429.9411000503446.\n", "Iteration 5000/10000, objective = 1429.254721096464.\n", "Iteration 6000/10000, objective = 1428.8165528337333.\n", "Iteration 7000/10000, objective = 1428.5204216070545.\n", "Iteration 8000/10000, objective = 1428.3116885725212.\n", "Iteration 9000/10000, objective = 1428.1596600836444.\n", "Iteration 10000/10000, objective = 1428.045946444532.\n", "Iteration completed after 10000/10000, objective = 1428.045946444532.\n" ] } ], "source": [ "# let us get back the data \n", "\n", "complete_data = np.genfromtxt(\"dataset.csv\", skip_header = 1, delimiter = ',')\n", "labels = complete_data[:, -1].astype(int) - np.min(complete_data[:, -1].astype(int))\n", "inputs = complete_data[:, :-1]\n", "\n", "#don't forget to standardise it\n", "data_inputs, row_of_means, row_of_stds = standardise(inputs)\n", "data_matrix = linear_regression_data(inputs)\n", "\n", "# we need the OHV\n", "OHV = one_hot_vector_encoding(labels)\n", "\n", "# then as usual the objective function and it gradient\n", "objective = lambda weights: multinomial_logistic_regression_cost_function(data_matrix, weights,OHV)\n", "gradient = lambda weights: multinomial_logistic_regression_gradient(data_matrix, weights, OHV)\n", "\n", "step_size = 3.9/(np.linalg.norm(data_matrix))**2\n", "\n", "initial_weight_matrix = np.zeros((data_matrix.shape[1], OHV.shape[1]))\n", "optimal_weights, objective_values = gradient_descent(objective, gradient,\n", " initial_weight_matrix, step_size, \n", " no_of_iterations=10000, print_output=1000)\n", "\n", "accuracy_rate = classification_accuracy(multinomial_prediction_function(data_matrix,\n", " optimal_weights),labels)" ] }, { "cell_type": "code", "execution_count": 44, "id": "45805085", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The mulnomial logistic regression successfully classified 60.94 % of data\n" ] } ], "source": [ "print(\"The mulnomial logistic regression successfully classified {acc:2.2f} % of data\".format(acc = 100*accuracy_rate))" ] }, { "cell_type": "markdown", "id": "e3adba86", "metadata": {}, "source": [ "### Two layers Neural Network for mutliclass classification\n", "\n", "In the second part of the assignment you are asked to try the same classification task but considering a neural network with $L = 2$ layers. All layers have the same activation function which is given by the $\\mathtt{softmax\\_function}$ and same type of model function given by an affine-linear transformation. 
Namely, we define\n", "$$\n", "\\left\\{\n", "\\begin{array}{ll}\n", "Z^{(\\ell)} &= \\widetilde{X}^{(\\ell-1)}W^{(\\ell)},\\\\\n", "X^{(\\ell)} &= \\mathrm{softmax}\\left(Z^{(\\ell)}\\right)\n", "\\end{array}\n", "\\right.,\n", "\\label{eq:nn_model}\n", "$$\n", "for $\\ell=1,\\ldots,L$, where \n", "- $X^{(\\ell)}$ is a mathematical representation of data _inputs_ at the layer $\\ell+1$, in particular $X^{(0)}$ is a mathematical representation of an input at the first layer, which just represents data samples;\n", "- $Z^{(\\ell)}$ is a mathematical representation of model function values at the layer $\\ell$, in particular $Z^{(L)}$ is a mathematical representation the final output of neural network;\n", "- for every matrix $M$ we write $\\widetilde{M}$ to denote an augmented matrix $M$, i.e. the one with artificial column of ones inserted at the beginning of a matrix; \n", "- $W^{(\\ell)}$ are weight matrices representing linear transformation applied at the layer $\\ell$ for $\\ell = \\overline{1,L}$ that have dimensions $\\left( d_{\\ell} + 1 \\right)\\times d_{\\ell+1}$ with $d_1 = d$, $d_{L+1} = K$, while $d_2,\\ldots,d_L$ can take any integer values. In this exercise we would assume $d_2 = \\ldots = d_L = d$.\n", "\n", "By training a neural network we understand identifying optimal values of parameters $\\mathbf{W} = \\left(\\mathbf{W}^{(1)}, \\mathbf{W}^{(2)}, \\ldots, \\mathbf{W}^{(L)}\\right)$ such that the cost function\n", "$$\n", "L\\left(\\mathbf{W}\\right) = \n", "\\sum\\limits_{i=1}^s \\log\\left(\\sum\\limits_{j=1}^K\n", "\\mathrm{e}^{Z^{(L)}_{i,j}}\\right)\n", "-\n", "\\sum\\limits_{i=1}^s\\sum\\limits_{j=1}^K\n", "\\mathbf{1}_{y_i = j} Z^{(L)}_{i,j}.\n", "$$\n", "is minimised.\n", "\n", "The optimisation problem is set to be solved by using the gradient descent method. This involves a backpropagation approach to a calculation of the gradient $\\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d}\\mathbf{W}}$ (see lecture notes). To help you with the problem, we provide you with a full set of equations you need to use to solve the problem. Every derivative here is thought as a gradient and is written in the form of a corresponding size matrix. For example,\n", "the derivative $\\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d}W^{(1)}}$ should be thought as a gradient of the cost function $L\\left(\\mathbf{W}\\right)$ with respect to the arguments $W^{(1)}_{i,j}$ that is written as a matrix of size $\\left(d+1\\right)\\times d$. 
In what follows, we denote a matrix $M$ with the first row removed as $\\widehat{M}$ and also write $\\mathrm{OHV}$ for a one hot vector representation of input data.\n", "$$\n", "\\boxed{\n", "\\begin{array}{rcl}\n", "\\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d} Z^{(\\ell)}}&=& \n", "\\begin{cases}\n", "\\mathrm{softmax}\\left(Z^{(L)}\\right) - \n", "\\mathrm{OHV},& \\ell = L,\\\\\n", "X^{(\\ell)}\\odot\\left[\n", "\\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d} X^{(\\ell)}}\n", "- \\Sigma\\left(X^{(\\ell)}\\odot\n", "\\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d} X^{(\\ell)}} \\right)\n", "\\right]\n", ",& \\ell < L\n", "\\end{cases}\n", "\\\\\n", "\\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d} X^{(\\ell)}} &=& \\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d} Z^{(\\ell+1)}}\\cdot \\left(\\widehat{W}^{(l+1)}\\right)^{\\top}\n", "\\\\\n", "\\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d} W^{(\\ell)}} &=& \n", "\\left(\\widetilde{X}^{(\\ell-1)}\\right)^{\\top}\\cdot\n", "\\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d} Z^{(\\ell)}}\n", "\\end{array}\n", "},\n", "$$\n", "where for any matrix $M$ we define $\\Sigma\\left(M\\right)$ as a matrix of the same dimensions as $M$ but with every matrix element swapped with a row-sum of matrix elements of matrix $M$.\n", "\n", "\n", "In this section we will assume that the weights are now represented by a three dimensional NumPy array, which has a form of a row vector, where each element is a two dimensional NumPy array. I.e., weights have a form of\n", "\n", "$$\n", "\\mathbf{W} = \\left[\\mathbf{W}^{(1)}, \\mathbf{W}^{(2)}, \\ldots, \\mathbf{W}^{(L)}\\right],\n", "$$\n", "and $\\mathit{weights}[i,j,k] = \\mathbf{W}^{(i)}_{j,k}$. \n", "\n", "The neural network state is now described by $2\\cdot L + 1$ matrices. This is given by\n", "$$\n", "\\mathit{state} = \\left[\\mathbf{X}^{(0)}, \\mathbf{Z}^{(1)}, \\mathbf{X}^{(1)}, \\mathbf{Z}^{(2)}, \\mathbf{X}^{(2)}, \\ldots, \\mathbf{Z}^{(L)}, \\mathbf{X}^{(L)}\\right].\n", "$$\n", "\n", "\n", "Ideally, the code you will write here should work for any number of layers, despite your task being only to model $2$ layered network. Let us introduce a variable that stores the number of layers, which you can later change to see how this will affect the output." ] }, { "cell_type": "code", "execution_count": 52, "id": "02def757", "metadata": {}, "outputs": [], "source": [ "no_of_layers = 2" ] }, { "cell_type": "markdown", "id": "8966f2aa", "metadata": {}, "source": [ "1. Start by writing the $\\mathtt{nn\\_forward\\_propagation}$ function that takes two arguments\n", "- _data_input_ - input data for the first layer ($\\mathbf{X}^{(0)}$);\n", "- _weight_matrix_ - is the list of weight matrices as described above.\n", "\n", "The function should return you the state of neural network following the above formulas. 
The output should have a form\n", "$$\n", "\\left[\\mathbf{X}^{(0)}, \\mathbf{Z}^{(1)}, \\mathbf{X}^{(1)}, \\mathbf{Z}^{(2)}, \\mathbf{X}^{(2)}, \\ldots, \\mathbf{Z}^{(L)}, \\mathbf{X}^{(L)}\\right].\n", "$$" ] }, { "cell_type": "code", "execution_count": 53, "id": "8d6aad82", "metadata": {}, "outputs": [], "source": [ "def nn_forward_propagation(data_input, weight_matrix): \n", " no_of_layers = weight_matrix.shape[0]\n", " nn_state = np.empty(2*no_of_layers+1, dtype = np.ndarray)\n", " nn_state[0] = data_input\n", " for l in range(no_of_layers):\n", " nn_state[2*l+1] = model_function(linear_regression_data(nn_state[2*l]),weight_matrix[l])\n", " nn_state[2*l+2] = softmax_function(nn_state[2*l+1], axis = 1)\n", " return nn_state" ] }, { "cell_type": "markdown", "id": "7243d27b", "metadata": {}, "source": [ "Test your function with the following unit test" ] }, { "cell_type": "code", "execution_count": 46, "id": "831c84c1", "metadata": {}, "outputs": [], "source": [ "test_data_input = np.array([[1,2],[3,4],[5,6]])\n", "test_weight_matrix = np.array([[[1,2],[3,4],[5,6]], [[-1,2],[-2,3],[3,1]]])\n", "assert_array_almost_equal(nn_forward_propagation(test_data_input,test_weight_matrix)[0],np.array([[1,2],[3,4],[5,6]]))\n", "assert_array_almost_equal(nn_forward_propagation(test_data_input,test_weight_matrix)[1],np.array([[14,18],[30,38],[46,58]]))\n", "assert_array_almost_equal(nn_forward_propagation(test_data_input,test_weight_matrix)[2],\\\n", " np.array([[1.798621e-02, 9.820138e-01],[3.353501e-04, 9.996646e-01],[6.144175e-06, 9.999939e-01]]))\n", "assert_array_almost_equal(nn_forward_propagation(test_data_input,test_weight_matrix)[3],\\\n", " np.array([[1.910069, 3.035972],[1.998323, 3.000671],[1.999969, 3.000012]]))\n", "assert_array_almost_equal(nn_forward_propagation(test_data_input,test_weight_matrix)[4],\\\n", " np.array([[0.244918, 0.755082], [0.26848 , 0.73152 ], [0.268933, 0.731067]]))" ] }, { "cell_type": "markdown", "id": "9414cbff", "metadata": {}, "source": [ "2. Implement the function $\\mathtt{nn\\_back\\_propagation}$ that takes $3$ arguments:\n", "- _data_input_ - input data for the first layer ($\\mathbf{X}^{(0)}$);\n", "- _weight_matrix_ - is the list of weight matrices as described above;\n", "- _one_hot_vector_encoding_ - one hot vector encoding of training data samples.\n", "\n", "Your function should output a gradient $\\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d}\\mathbf{W}}$ as per the above formulas. it should have the same shape as _weight_matrix_. You should first evaluate a state of the neural network by following forward propagation method, and then evaluate corresponding derivatives by using the above formulas." 
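] }, { "cell_type": "markdown", "id": "ad0c1e05", "metadata": {}, "source": [ "*Added illustration (not part of the coursework tasks):* the $\Sigma\left(\cdot\right)$ operation in the boxed backpropagation formulas replaces every entry of a matrix by the sum of the entries in its row. A minimal NumPy sketch of this operation, on a hypothetical $2\times 3$ example matrix, is shown below before it is used in the backward pass." ] }, { "cell_type": "code", "execution_count": null, "id": "ad0c1e06", "metadata": {}, "outputs": [], "source": [ "# added sketch: the Sigma(.) operation from the backpropagation formulas replaces\n", "# every entry of a matrix by the sum of the entries in its row\n", "example_matrix = np.array([[1., 2., 3.], [4., 5., 6.]])\n", "row_sums = np.sum(example_matrix, axis=1, keepdims=True) # shape (2, 1)\n", "sigma_of_example = np.broadcast_to(row_sums, example_matrix.shape)\n", "print(sigma_of_example) # [[ 6. 6. 6.] [15. 15. 15.]]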
] }, { "cell_type": "code", "execution_count": 47, "id": "f8339df6", "metadata": {}, "outputs": [], "source": [ "def nn_back_propagation(data_input, weight_matrix, one_hot_vector_encodings): \n", " nn_state = nn_forward_propagation(data_input, weight_matrix)\n", " no_of_layers = weight_matrix.shape[0]\n", " full_X_gradient = np.empty(no_of_layers, dtype = np.ndarray)\n", " full_W_gradient = np.empty(no_of_layers, dtype = np.ndarray)\n", " full_Z_gradient = np.empty(no_of_layers, dtype = np.ndarray)\n", " \n", " full_Z_gradient[no_of_layers-1] = nn_state[-1] - one_hot_vector_encodings\n", " for layer in range(no_of_layers,0,-1):\n", " full_W_gradient[layer-1] = linear_regression_data(nn_state[2*layer-2]).T@full_Z_gradient[layer-1]\n", " full_X_gradient[layer-1] = full_Z_gradient[layer-1] @ (np.delete(weight_matrix[layer-1],(0), axis = 0)).T\n", " if layer != 1:\n", " product_matrix = full_X_gradient[layer-1]*nn_state[2*layer-2]\n", " sigma_matrix = np.sum(product_matrix,axis=1)\n", " full_Z_gradient[layer-2] = product_matrix - (nn_state[2*layer-2].T*sigma_matrix).T\n", " \n", " return full_W_gradient" ] }, { "cell_type": "markdown", "id": "23867b33", "metadata": {}, "source": [ "Test your function with the following unit test." ] }, { "cell_type": "code", "execution_count": 48, "id": "330b0f75", "metadata": {}, "outputs": [], "source": [ "test_data_input = np.array([[1,2],[3,4],[5,6]])\n", "test_weight_matrix = np.array([[[1,2],[3,4],[5,6]], [[-1,2],[-2,3],[3,1]]])\n", "assert_array_almost_equal(nn_back_propagation(test_data_input, test_weight_matrix, np.array([[0,1],[1,0],[0,1]]))[0],\\\n", " np.array([[-0.028576, 0.028576], [-0.025189, 0.025189], [-0.053766, 0.053766]]))\n", "assert_array_almost_equal(nn_back_propagation(test_data_input, test_weight_matrix, np.array([[0,1],[1,0],[0,1]]))[1],\\\n", " np.array([[-0.217669, 0.217669], [ 0.004161, -0.004161], [-0.22183 , 0.22183 ]]))" ] }, { "cell_type": "markdown", "id": "16bece8c", "metadata": {}, "source": [ "3. Write two functions \n", "- $\\mathtt{nn\\_cost\\_function}$ which evaluates a cost function by following the above formula. This function should take $3$ arguments: _data_input_, _weight_matrix_, _one_hot_vector_encodings_ and return a value of cost function. \n", "- $\\mathtt{nn\\_prediction\\_function}$ which evaluates class labels for data samples. This function should take $2$ arguments: _data_input_, _weight_matrix_ and return a vector of class labels." ] }, { "cell_type": "code", "execution_count": 49, "id": "8ea8e6f0", "metadata": {}, "outputs": [], "source": [ "def nn_cost_function(data_input, weight_matrix, one_hot_vector_encodings): \n", " nn_state = nn_forward_propagation(data_input, weight_matrix)\n", " return np.sum(np.log(np.sum(np.exp(nn_state[-2]), axis=1))- np.sum(one_hot_vector_encodings*nn_state[-2], axis=1))" ] }, { "cell_type": "code", "execution_count": 50, "id": "56db5292", "metadata": {}, "outputs": [], "source": [ "def nn_prediction_function(data_input, weight_matrix): \n", " nn_state = nn_forward_propagation(data_input, weight_matrix)\n", " return np.argmax(nn_state[-1], axis=1)" ] }, { "cell_type": "markdown", "id": "fbf3229e", "metadata": {}, "source": [ "4. Finally, we would like to apply all the above to data. 
I.e.,\n", "- define two lambda functions called $\\mathtt{nn\\_objective}$, $\\mathtt{nn\\_gradient}$ that take $1$ argument _weight_;\n", "- define nn_initial_weights_ to be a vector of zero matrices of appropriate sizes;\n", "- let the step size parameter _nn_step_size_ be equal to $3.9/\\left\\|\\mathbf{X}\\right\\|^2$;\n", "- run 10,000 iterations of the gradient descent procedure to find an optimal weights nn_optimal_weights_;\n", "- evaluate an accuracy rate nn_accuracy_rate_ evaluated for the model defined to have nn_optimal_weights_ weights.\n" ] }, { "cell_type": "code", "execution_count": 55, "id": "d4dda842", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iteration 1000/10000, objective = 1856.2330085332792.\n", "Iteration 2000/10000, objective = 1520.4452454103941.\n", "Iteration 3000/10000, objective = 1499.9629638478948.\n", "Iteration 4000/10000, objective = 1494.5719839263595.\n", "Iteration 5000/10000, objective = 1491.7019577532692.\n", "Iteration 6000/10000, objective = 1489.8868257828972.\n", "Iteration 7000/10000, objective = 1488.6348941628878.\n", "Iteration 8000/10000, objective = 1487.7227615588802.\n", "Iteration 9000/10000, objective = 1487.0321824162013.\n", "Iteration 10000/10000, objective = 1486.4941591560323.\n", "Iteration completed after 10000/10000, objective = 1486.4941591560323.\n", "The two layered neural network successfully classified 61.14 % of data\n" ] } ], "source": [ "complete_data = np.genfromtxt(\"dataset.csv\", skip_header = 1,delimiter = ',')\n", "labels = complete_data[:, -1].astype(int) - np.min(complete_data[:, -1].astype(int))\n", "inputs = complete_data[:, :-1]\n", "\n", "no_of_layers = 2\n", "data_inputs, row_of_means, row_of_stds = standardise(inputs)\n", "data_matrix = linear_regression_data(data_inputs)\n", "\n", "OHV = one_hot_vector_encoding(labels)\n", "\n", "\n", "nn_objective = lambda weight_matrix: nn_cost_function(data_inputs,weight_matrix,OHV)\n", "nn_gradient = lambda weight_matrix: nn_back_propagation(data_inputs,weight_matrix,OHV)\n", "nn_step = 3.9/(np.linalg.norm(data_matrix, 2) ** 2)\n", "nn_initial_weight_matrix = np.empty(no_of_layers, dtype = np.ndarray)\n", "\n", "\n", "for weight_pos in range(no_of_layers - 1):\n", " nn_initial_weight_matrix[weight_pos] = np.zeros((data_inputs.shape[1]+1,data_inputs.shape[1]))\n", "\n", "nn_initial_weight_matrix[no_of_layers - 1] = np.zeros((data_inputs.shape[1]+1,OHV.shape[1]))\n", "nn_optimal_weights, _ = gradient_descent(nn_objective,nn_gradient, \n", " nn_initial_weight_matrix,step_size=nn_step,\n", " no_of_iterations=10000, print_output=1000)\n", "\n", "nn_accuracy_rate = classification_accuracy(nn_prediction_function(data_inputs, nn_optimal_weights),labels,)\n", "print(\"The two layered neural network successfully classified {acc:2.2f} % of data\".format(acc = 100 * nn_accuracy_rate))" ] }, { "cell_type": "code", "execution_count": null, "id": "2f4099cf", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.17" } }, "nbformat": 4, "nbformat_minor": 5 }