{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "f1a05dc7",
   "metadata": {},
   "source": [
    "<table width=100%>\n",
    "    <tr>\n",
    "        <td width = 40% align = \"left\">\n",
    "            <h3> MTH786 Machine Learning with Python</h3>\n",
    "        </td>\n",
    "        <td width = 35%>            \n",
    "        </td>\n",
    "        <td width = 25% align = \"left\">\n",
    "            <h3>Semester A, 2023/2024 </h3>\n",
    "        </td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td width = 40% align = \"left\">\n",
    "            <h3> Coursework 9</h3>\n",
    "        </td>\n",
    "        <td width = 35%>            \n",
    "        </td>\n",
    "        <td width = 25% align = \"left\">\n",
    "            <h3>Dr Nicola Perra </h3>\n",
    "        </td>\n",
    "    </tr>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "49e97a81",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from numpy.testing import assert_array_almost_equal, assert_array_equal\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "10b3ac2c",
   "metadata": {},
   "source": [
    "### Multinomial logistic classification\n",
    "We now move on to multinomial logistic regression for multi-class classfication problems. To this end we will use a synthetic dataset built for these types of taks. The dataset consits on s samples, 10 features and different categories\n",
    "\n",
    "\n",
    "We start by loading data. **Important:** please check that file $\\mathtt{dataset.csv}$ is located in the same folder with your Jupyter notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "70a0d912",
   "metadata": {},
   "outputs": [],
   "source": [
    "complete_data = np.genfromtxt(\"dataset.csv\", skip_header = 1, delimiter = ',')\n",
    "labels = complete_data[:, -1].astype(int) - np.min(complete_data[:, -1].astype(int))\n",
    "inputs = complete_data[:, :-1]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ca482452",
   "metadata": {},
   "source": [
    "1. Implement function $\\mathtt{linear\\_regression\\_data}$ that computes (and outputs) the linear regression data matrix. The function should take the NumPy array _data_inputs_ as argument."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "69bb0b37",
   "metadata": {},
   "outputs": [],
   "source": [
    "def linear_regression_data(data_inputs):\n",
    "    # YOUR CODE HERE\n",
    "    raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b61222da",
   "metadata": {},
   "source": [
    "2. Write two functions \n",
    "- $\\mathtt{standardise}$ to standardise the columns of a multi-dimensional array. The function $\\mathtt{standardise}$\ttakes the multi-dimensional array _data_matrix_ as its input argument. It subtracts the means from each column and divides by the standard deviations. It returns the _standardised_matrix_, the _row_of_means_ and the _row_of_standard_deviations_.\n",
    "- $\\mathtt{de\\_standardise}$ to de-standardise the columns of a multi-dimensional array. The function $\\mathtt{de\\_standardise}$ reverses the above operation. It takes a _standardised_matrix_, the _row_of_means_ and the _row_of_standard_deviations_ as its arguments and returns a matrix for which the standardisation process is reversed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e4ddbbe3",
   "metadata": {},
   "outputs": [],
   "source": [
    "def standardise(data_matrix):\n",
    "    # YOUR CODE HERE\n",
    "    raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "27125f41",
   "metadata": {},
   "outputs": [],
   "source": [
    "def de_standardise(standardised_matrix, row_of_means, row_of_stds):\n",
    "    # YOUR CODE HERE\n",
    "    raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5e766a4e",
   "metadata": {},
   "source": [
    "Now we standardise the input data and prepare data matrix for a multinomial logistic regression. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cdda6017",
   "metadata": {},
   "outputs": [],
   "source": [
    "data_inputs, row_of_means, row_of_stds = standardise(inputs)\n",
    "data_matrix = linear_regression_data(data_inputs)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4252f7a9",
   "metadata": {},
   "source": [
    "3. As a first step, implement the softmax function $\\mathtt{softmax\\_function}$ as defined in the lectures. The function takes the NumPy array _argument_ as its main argument, but also has an optional _axis_ argument to determine across which array-dimension you apply the softmax operation (axis = 0 for columns, axis = 1 for rows, ...). If this argument is not specified (or set to _None_), then the softmax operation is applied to the entire array. Make sure your function works at least for NumPy arrays _argument_ with arbitrary numerical values and dimension one or two."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fdce2591",
   "metadata": {},
   "outputs": [],
   "source": [
    "def softmax_function(argument, axis=None):    \n",
    "    # YOUR CODE HERE\n",
    "    raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "71213ee4",
   "metadata": {},
   "source": [
    "Test your $\\mathtt{softmax\\_function}$ function with the following cell. Passing the test is awarded with **1 mark**. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6711c0a7",
   "metadata": {},
   "outputs": [],
   "source": [
    "assert_array_almost_equal(softmax_function(np.array([[1.5], [0.3], [-3.7]])), \\\n",
    "                          np.array([[0.76528029], [0.23049799], [0.00422172]]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a0bfa8a5",
   "metadata": {},
   "outputs": [],
   "source": [
    "assert_array_almost_equal(softmax_function(np.array([[1.5, 3], [0.3, -0.7], [-3.7, 2]]), axis=0), \\\n",
    "                          np.array([[0.76528029, 0.71807976], [0.23049799, 0.01775346], \\\n",
    "                                    [0.00422172, 0.26416678]]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5a23453a",
   "metadata": {},
   "outputs": [],
   "source": [
    "assert_array_almost_equal(softmax_function(np.array([[1.5, 3], [0.3, -0.7], [-3.7, 2]]), axis=1), \\\n",
    "                          np.array([[0.182426, 0.817574],\\\n",
    "                                    [0.731059, 0.268941],\\\n",
    "                                    [0.003335, 0.996665]]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fb2ce94b",
   "metadata": {},
   "source": [
    "4. Implement a function $\\mathtt{model\\_function}$ that outputs the values of the linear model function. Unlike the case of binary classification, the $\\mathtt{model\\_function}$ should return a vector of dimension $K$ for each data sample. This vector is defined as\n",
    "$$\n",
    "f\\left(\\mathbf{x},\\mathbf{W}\\right) = \n",
    "\\left( \\left\\langle \\phi\\left(\\mathbf{x}\\right), \\mathbf{w}^{(1)}\\right\\rangle \\quad \\left\\langle \\phi\\left(\\mathbf{x}\\right), \\mathbf{w}^{(2)}\\right\\rangle \\quad \\ldots \\quad \\left\\langle \\phi\\left(\\mathbf{x}\\right), \\mathbf{w}^{(K)}\\right\\rangle\\right),\n",
    "$$\n",
    "where \n",
    "$$\n",
    "\\mathbf{W} = \n",
    "\\begin{pmatrix}\n",
    "\\vdots & \\vdots & \\ldots & \\vdots \\\\\n",
    "\\mathbf{w}^{(1)} & \\mathbf{w}^{(2)} & \\ldots & \\mathbf{w}^{(K)}\\\\\n",
    "\\vdots & \\vdots & \\ldots & \\vdots \\\\\n",
    "\\end{pmatrix}\n",
    "$$\n",
    "is a mathematical representation of the weights matrix.\n",
    "\n",
    "As in the binary classification case, the arguments of $\\mathtt{model\\_function}$ are the data matrix _data_matrix_ and weights that are now named _weights_matrix_."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a86c8d0f",
   "metadata": {},
   "outputs": [],
   "source": [
    "def model_function(data_matrix, weight_matrix):\n",
    "    # YOUR CODE HERE\n",
    "    raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a75408cf",
   "metadata": {},
   "source": [
    "Test your $\\mathtt{model\\_function}$ function with the following cell. Passing the test is awarded with **1 mark**. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ad3927ee",
   "metadata": {},
   "outputs": [],
   "source": [
    "test_data_matrix = np.array([[1,2,3],[1,4,5],[1,6,7]])\n",
    "test_weight_matrix = np.array([[-1,1],[0,1],[1,0]])\n",
    "assert_array_almost_equal(model_function(test_data_matrix, test_weight_matrix),np.array([[2,3],[4,5],[6,7]]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4394f672",
   "metadata": {},
   "source": [
    "5. Write a function $\\mathtt{multinomial\\_prediction\\_function}$ that turns your model function into labels. The function takes the arguments _data_matrix_ and _weight_matrix_ as inputs and returns a vector of labels with values in $\\{0, K - 1 \\}$ as its output."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "439c4b9e",
   "metadata": {},
   "outputs": [],
   "source": [
    "def multinomial_prediction_function(data_matrix, weight_matrix):    \n",
    "    # YOUR CODE HERE\n",
    "    raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "93f53902",
   "metadata": {},
   "source": [
    "Test your $\\mathtt{multinomial\\_prediction\\_function}$ function with the following cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5966c24c",
   "metadata": {},
   "outputs": [],
   "source": [
    "test_data_matrix = np.array([[6, 4, 5], [1, 2, 8], [-3, 3, 6], [6, 5, -100], [5, 7, 2]])\n",
    "test_weight_matrix = np.array([[2,  1, -2, -4], [ 2, -5,  1,  4], [-2, -3, -1, -2]])\n",
    "assert_array_almost_equal(multinomial_prediction_function(test_data_matrix, test_weight_matrix),np.array([0,2,3,1,0]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b7a7de52",
   "metadata": {},
   "source": [
    "6. Implement a function $\\mathtt{gradient\\_descent}$ that performs gradient descent to numerically approximate a minimiser of a convex function. The function should take the following arguments\n",
    "- *objective* - a lambda-function representing function $E$. This itself should take a NumPy array as its argument and return a real number.\n",
    "- *gradient* - a lambda-function representing function $\\nabla E$. This itself should take a NumPy array as its argument and return a NumPy array representation of the gradient $\\nabla E$.\n",
    "- *initial_ weights* - a NumPy array with initial values $\\mathbf{w}^{(0)}$ for the first iterate \n",
    "- *step_size* - a step-size parameter $\\tau$ for the gradient descent step\n",
    "- *no_of_iterations* - an integer parameter that controls the number of iterations\n",
    "- *print_output* - an integer parameter that controls how often you are printing an intermediate result. If say *print_output = 100*, then after every 100th iteration you are asked to print your current iterate and a value of the objective as *Iteration k/m, objective = o.*, where $k$ is a number of current iteration, $m$ is a total number of iterations, and $o$ is a value of the objective evaluated at current iterate.\n",
    "\n",
    "Implement the function so that it returns a NumPy array of the weights obtained after gradient descent together with a list of objective values for all iterates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "55f86058",
   "metadata": {},
   "outputs": [],
   "source": [
    "def gradient_descent(objective,\n",
    "                     gradient,\n",
    "                     initial_weights,\n",
    "                     step_size=1,\n",
    "                     no_of_iterations=100,\n",
    "                     print_output=10):\n",
    "    # YOUR CODE HERE\n",
    "    raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "884d213a",
   "metadata": {},
   "source": [
    "Test your function with the following unit test"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cd016e65",
   "metadata": {},
   "outputs": [],
   "source": [
    "test_matrix_m = np.array([[3, 1], [2, 4]])\n",
    "test_vector_v = np.array([5, 6])\n",
    "test_objective = lambda x: x.T @ (test_matrix_m @ x) + x @ test_vector_v\n",
    "test_gradient = lambda x: (test_matrix_m + test_matrix_m.T) @ x + test_vector_v\n",
    "test_initial_weights = np.array([0.0, 0.0])\n",
    "test_step_size = 0.9 / (np.linalg.norm(test_matrix_m + test_matrix_m.T))\n",
    "test_no_of_iterations = 100\n",
    "test_print_output = 10\n",
    "assert_array_almost_equal(gradient_descent(test_objective, test_gradient, \\\n",
    "                                           test_initial_weights,test_step_size,\\\n",
    "                                           test_no_of_iterations, test_print_output)[0],np.array([-0.564103, -0.538462]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ce725032",
   "metadata": {},
   "source": [
    "7. Next, write a function $\\mathtt{one\\_hot\\_vector\\_encoding}$ that converts an NumPy array _labels_ with values in the range of $\\{0, K - 1\\}$ into so-called one-hot vector encodings. For example, for $K = 3$ and a label vector $\\text{labels} = \n",
    "\\begin{pmatrix} 2 & 0 & 1 & 2\\end{pmatrix}^\\top$, the output of $\\mathtt{one\\_hot\\_vector\\_encoding}$ should be a two-dimensional NumPy array of the form\n",
    "$$\n",
    "\\begin{pmatrix} 0 & 0 & 1 \\\\ 1 & 0 & 0 \\\\ 0 & 1 & 0 \\\\ 0 & 0 & 1 \n",
    "\\end{pmatrix}.\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "706144e7",
   "metadata": {},
   "outputs": [],
   "source": [
    "def one_hot_vector_encoding(labels):    \n",
    "    # YOUR CODE HERE\n",
    "    raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "092da0fe",
   "metadata": {},
   "source": [
    "Test your $\\mathtt{one\\_hot\\_vector\\_encoding}$ function with the following cell"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "158f6038",
   "metadata": {},
   "outputs": [],
   "source": [
    "assert_array_almost_equal(one_hot_vector_encoding(np.array([1, 2, 0, 3])), \\\n",
    "                          np.array([[0,1,0,0],[0,0,1,0],[1,0,0,0],[0,0,0,1]]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "79b88762",
   "metadata": {},
   "outputs": [],
   "source": [
    "assert_array_almost_equal(one_hot_vector_encoding(np.array([1,0,1,0])), \\\n",
    "                          np.array([[0,1],[1,0],[0,1],[1,0]]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8758fb30",
   "metadata": {},
   "source": [
    "8. Implement the cost function and gradient for the multinomial logistic regression in terms of two functions $\\mathtt{multinomial\\_logistic\\_regression\\_cost\\_function}$ and $\\mathtt{multinomial\\_logistic\\_regression\\_gradient}$. As in the binary classification case, the arguments are the data matrix _data_matrix_ and weights that are now named _weights_matrix_. Instead of passing on labels as _outputs_ as in the binary case, you pass the one hot vector encoding representation _one_hot_vector_encodings_ as your third argument. Return the cost function value, respectively the gradient, following the mathematical formulas in the lecture notes.\n",
    "\n",
    "$$\n",
    "L\\left(\\mathbf{W}\\right) = \\sum\\limits_{i=1}^s\\log\\left[\\sum\\limits_{k=1}^K \\mathrm{e}^{f\\left(\\mathbf{x}^{(i)},\\mathbf{W}\\right)_k}\\right]\n",
    "- \\sum\\limits_{i=1}^s\\sum\\limits_{k=1}^K \\mathbf{1}_{y_i=k}f\\left(\\mathbf{x}^{(i)},\\mathbf{W}\\right)_k,\n",
    "$$\n",
    "where $f\\left(\\mathbf{x}, \\mathbf{W}\\right)$ is a model function. In the case of linear model function \n",
    "$$\n",
    "f\\left(\\mathbf{x},\\mathbf{W}\\right) = \n",
    "\\left( \\left\\langle \\phi\\left(\\mathbf{x}\\right), \\mathbf{w}^{(1)}\\right\\rangle \\quad \\left\\langle \\phi\\left(\\mathbf{x}\\right), \\mathbf{w}^{(2)}\\right\\rangle \\quad \\ldots \\quad \\left\\langle \\phi\\left(\\mathbf{x}\\right), \\mathbf{w}^{(K)}\\right\\rangle\\right)\n",
    "$$\n",
    "one can write\n",
    "$$\n",
    "\\begin{align*}\n",
    "L\\left(\\mathbf{W}\\right) &=& \\sum\\limits_{i=1}^s\\log\\left[\\sum\\limits_{k=1}^K \\mathrm{e}^{\\left\\langle \\phi\\left(\\mathbf{x}^{(i)}\\right), \\mathbf{w}^{(k)}\\right\\rangle}\\right]\n",
    "- \\sum\\limits_{i=1}^s\\sum\\limits_{k=1}^K \\mathbf{1}_{y_i=k}\\left\\langle \\phi\\left(\\mathbf{x}^{(i)}\\right), \\mathbf{w}^{(k)}\\right\\rangle\n",
    "\\\\\n",
    "\\frac{\\partial L\\left(\\mathbf{W}\\right)}{\\partial \\mathbf{w}^{(p)}_q} &=& \\sum\\limits_{i=1}^s \\phi\\left(\\mathbf{x}^{(i)}\\right)_q\\mathrm{softmax}\\left(f\\left(\\mathbf{x}^{(i)}, \\mathbf{W}\\right)\\right)_p\n",
    "- \\sum\\limits_{i=1}^s\\mathbf{1}_{y_i = p} \\phi\\left(\\mathbf{x}^{(i)}\\right)_q.\n",
    "\\end{align*}\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aaa86a79",
   "metadata": {},
   "outputs": [],
   "source": [
    "def multinomial_logistic_regression_cost_function(data_matrix, weight_matrix, one_hot_vector_encodings):    \n",
    "    # YOUR CODE HERE\n",
    "    raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c3a35fc6",
   "metadata": {},
   "outputs": [],
   "source": [
    "def multinomial_logistic_regression_gradient(data_matrix, weight_matrix, one_hot_vector_encodings):    \n",
    "    # YOUR CODE HERE\n",
    "    raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1f3abd1d",
   "metadata": {},
   "source": [
    "Test your functions with the following cells"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "72641160",
   "metadata": {},
   "outputs": [],
   "source": [
    "test_data_matrix = np.array([[6, 4, 5], [1, 2, 8], [-3, 3, 6], [6, 5, -100], [5, 7, 2]])\n",
    "test_weight_matrix = np.array([[2,  1, -2, -4], [ 2, -5,  1,  4], [-2, -3, -1, -2]])\n",
    "test_one_hot_vector_encoding = np.array([[1., 0., 0., 0.], [0., 0., 1., 0.], \\\n",
    "                                         [0., 0., 0., 1.], [0., 1., 0., 0.], [1., 0., 0., 0.]])\n",
    "assert_array_almost_equal(multinomial_logistic_regression_cost_function(test_data_matrix, \\\n",
    "                                                                        test_weight_matrix, \\\n",
    "                                                                        test_one_hot_vector_encoding),\\\n",
    "                         0.1430551433917744)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "42807d8b",
   "metadata": {},
   "outputs": [],
   "source": [
    "test_data_matrix = np.array([[6, 4, 5], [1, 2, 8], [-3, 3, 6], [6, 5, -100], [5, 7, 2]])\n",
    "test_weight_matrix = np.array([[2,  1, -2, -4], [ 2, -5,  1,  4], [-2, -3, -1, -2]])\n",
    "test_one_hot_vector_encoding = np.array([[1., 0., 0., 0.], [0., 0., 1., 0.], \\\n",
    "                                         [0., 0., 0., 1.], [0., 1., 0., 0.], [1., 0., 0., 0.]])\n",
    "assert_array_almost_equal(multinomial_logistic_regression_gradient(test_data_matrix, \\\n",
    "                                                                   test_weight_matrix, \\\n",
    "                                                                   test_one_hot_vector_encoding),\\\n",
    "                         np.array([[ 1.173099e-01,  1.203832e-11, -1.335569e-01,  1.624699e-02],\\\n",
    "                                   [ 2.346201e-01,  2.407656e-11, -2.660032e-01,  3.138308e-02],\\\n",
    "                                   [ 9.384832e-01,  9.630610e-11, -1.064753e+00,  1.262698e-01]]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "283f27bf",
   "metadata": {},
   "source": [
    "9. Write a function $\\mathtt{classification\\_accuracy}$ that takes two NumPy array arguments $\\mathtt{predicted\\_labels}$ and $\\mathtt{true\\_labels}$ and evaluates the ratio of labels that coincide in both arrays to the number of all labels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "088df900",
   "metadata": {},
   "outputs": [],
   "source": [
    "def classification_accuracy(estimated_labels, true_labels):\n",
    "    # YOUR CODE HERE\n",
    "    raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "80d71c59",
   "metadata": {},
   "source": [
    "Test your function with the following unit tests."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "08cc51d2",
   "metadata": {},
   "outputs": [],
   "source": [
    "test_estimated_labels = np.array([0,4,0,0,2,4,0,0,2,2,2,3,3,1,3,4,0,3,4,0,1,1,2,0,0,0,\\\n",
    "                                  1,4,4,4,3,0,4,2,4,4,4,2,2,1,4,3,2,2,1,1,4,3,3,0,4,3,\\\n",
    "                                  0,0,0,2,4,3,4,3,1,3,2,4,2,3,2,3,2,3,1,0,4,3,2,3,1,3,\\\n",
    "                                  4,1,3,1,4,0,4,4,1,2,3,1,1,4,3,1,3,0,2,0,0,1])\n",
    "test_true_labels = np.array([4,4,4,4,1,3,0,4,1,0,2,3,1,2,0,1,3,4,3,4,4,4,2,0,3,4,3,2,3,\\\n",
    "                             0,1,0,4,3,2,2,1,2,2,1,4,0,4,4,3,0,0,0,3,2,3,0,0,3,3,4,2,2,\\\n",
    "                             3,4,2,3,3,0,0,2,0,3,0,4,4,4,4,1,2,3,1,3,1,1,2,0,2,3,3,3,4,\\\n",
    "                             0,0,3,1,0,3,0,3,1,0,0,4,4])\n",
    "assert_array_almost_equal(\n",
    "    classification_accuracy(test_estimated_labels, test_true_labels), 0.26)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b66c1a2d",
   "metadata": {},
   "source": [
    "10. Finally, we would like to apply all the above to the data. I.e.,\n",
    "- define two lambda functions called $\\mathtt{objective}$, $\\mathtt{gradient}$ that take $1$ argument _weight_;\n",
    "- define _initial_weights_ to be a zero matrix of appropriate size;\n",
    "- let the step size parameter _step_size_ be equal to $3.9/\\left\\|\\mathbf{X}\\right\\|^2$;\n",
    "- run 10,000 iterations of the gradient descent procedure to find an optimal weights _optimal_weights_;\n",
    "- evaluate an accuracy rate _accuracy_rate_ evaluated for the model defined to have _optimal_weights_ weights."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e6a1a719",
   "metadata": {},
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "45805085",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"The mulnomial logistic regression successfully classified {acc:2.2f} % of data\".format(acc = 100*accuracy_rate))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e3adba86",
   "metadata": {},
   "source": [
    "### Two layers Neural Network for mutliclass classification\n",
    "\n",
    "In the second part of the assignment you are asked to try the same classification task but considering a neural network with $L = 2$ layers. All layers have the same activation function which is given by the $\\mathtt{softmax\\_function}$ and same type of model function given by an affine-linear transformation. Namely, we define\n",
    "$$\n",
    "\\left\\{\n",
    "\\begin{array}{ll}\n",
    "Z^{(\\ell)} &=  \\widetilde{X}^{(\\ell-1)}W^{(\\ell)},\\\\\n",
    "X^{(\\ell)} &=  \\mathrm{softmax}\\left(Z^{(\\ell)}\\right)\n",
    "\\end{array}\n",
    "\\right.,\n",
    "\\label{eq:nn_model}\n",
    "$$\n",
    "for $\\ell=1,\\ldots,L$, where \n",
    "- $X^{(\\ell)}$ is a mathematical representation of data _inputs_ at the layer $\\ell+1$, in particular $X^{(0)}$ is a mathematical representation of an input at the first layer, which just represents data samples;\n",
    "- $Z^{(\\ell)}$ is a mathematical representation of model function values at the layer $\\ell$, in particular $Z^{(L)}$ is a mathematical representation the final output of neural network;\n",
    "- for every matrix $M$ we write $\\widetilde{M}$ to denote an augmented matrix $M$, i.e. the one with artificial column of ones inserted at the beginning of a matrix; \n",
    "- $W^{(\\ell)}$ are weight matrices representing linear transformation applied at the layer $\\ell$ for $\\ell = \\overline{1,L}$ that have dimensions $\\left( d_{\\ell} + 1 \\right)\\times d_{\\ell+1}$ with $d_1 = d$, $d_{L+1} = K$, while $d_2,\\ldots,d_L$ can take any integer values. In this exercise we would assume $d_2 = \\ldots = d_L = d$.\n",
    "\n",
    "By training a neural network we understand identifying optimal values of parameters $\\mathbf{W} = \\left(\\mathbf{W}^{(1)}, \\mathbf{W}^{(2)}, \\ldots, \\mathbf{W}^{(L)}\\right)$ such that the cost function\n",
    "$$\n",
    "L\\left(\\mathbf{W}\\right) = \n",
    "\\sum\\limits_{i=1}^s \\log\\left(\\sum\\limits_{j=1}^K\n",
    "\\mathrm{e}^{Z^{(L)}_{i,j}}\\right)\n",
    "-\n",
    "\\sum\\limits_{i=1}^s\\sum\\limits_{j=1}^K\n",
    "\\mathbf{1}_{y_i = j} Z^{(L)}_{i,j}.\n",
    "$$\n",
    "is minimised.\n",
    "\n",
    "The optimisation problem is set to be solved by using the gradient descent method. This involves a backpropagation approach to a calculation of the gradient $\\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d}\\mathbf{W}}$ (see lecture notes). To help you with the problem, we provide you with a full set of equations you need to use to solve the problem. Every derivative here is thought as a gradient and is written in the form of a corresponding size matrix. For example,\n",
    "the derivative $\\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d}W^{(1)}}$ should be thought as a gradient of the cost function $L\\left(\\mathbf{W}\\right)$ with respect to the arguments $W^{(1)}_{i,j}$ that is written as a matrix of size $\\left(d+1\\right)\\times d$. In what follows, we denote a matrix $M$ with the first row removed as $\\widehat{M}$ and also write $\\mathrm{OHV}$ for a one hot vector representation of input data.\n",
    "$$\n",
    "\\boxed{\n",
    "\\begin{array}{rcl}\n",
    "\\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d} Z^{(\\ell)}}&=& \n",
    "\\begin{cases}\n",
    "\\mathrm{softmax}\\left(Z^{(L)}\\right) - \n",
    "\\mathrm{OHV},& \\ell = L,\\\\\n",
    "X^{(\\ell)}\\odot\\left[\n",
    "\\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d} X^{(\\ell)}}\n",
    "- \\Sigma\\left(X^{(\\ell)}\\odot\n",
    "\\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d} X^{(\\ell)}} \\right)\n",
    "\\right]\n",
    ",& \\ell < L\n",
    "\\end{cases}\n",
    "\\\\\n",
    "\\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d} X^{(\\ell)}} &=& \\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d} Z^{(\\ell+1)}}\\cdot \\left(\\widehat{W}^{(l+1)}\\right)^{\\top}\n",
    "\\\\\n",
    "\\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d} W^{(\\ell)}} &=& \n",
    "\\left(\\widetilde{X}^{(\\ell-1)}\\right)^{\\top}\\cdot\n",
    "\\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d} Z^{(\\ell)}}\n",
    "\\end{array}\n",
    "},\n",
    "$$\n",
    "where for any matrix $M$ we define $\\Sigma\\left(M\\right)$ as a matrix of the same dimensions as $M$ but with every matrix element swapped with a row-sum of matrix elements of matrix $M$.\n",
    "\n",
    "\n",
    "In this section we will assume that the weights are now represented by a three dimensional NumPy array, which has a form of a row vector, where each element is a two dimensional NumPy array. I.e., weights have a form of\n",
    "\n",
    "$$\n",
    "\\mathbf{W} = \\left[\\mathbf{W}^{(1)}, \\mathbf{W}^{(2)}, \\ldots, \\mathbf{W}^{(L)}\\right],\n",
    "$$\n",
    "and $\\mathit{weights}[i,j,k] = \\mathbf{W}^{(i)}_{j,k}$. \n",
    "\n",
    "The neural network state is now described by $2\\cdot L + 1$ matrices. This is given by\n",
    "$$\n",
    "\\mathit{state} = \\left[\\mathbf{X}^{(0)}, \\mathbf{Z}^{(1)}, \\mathbf{X}^{(1)}, \\mathbf{Z}^{(2)}, \\mathbf{X}^{(2)}, \\ldots, \\mathbf{Z}^{(L)}, \\mathbf{X}^{(L)}\\right].\n",
    "$$\n",
    "\n",
    "\n",
    "Ideally, the code you will write here should work for any number of layers, despite your task being only to model $2$ layered network. Let us introduce a variable that stores the number of layers, which you can later change to see how this will affect the output."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "02def757",
   "metadata": {},
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8966f2aa",
   "metadata": {},
   "source": [
    "1. Start by writing the $\\mathtt{nn\\_forward\\_propagation}$ function that takes two arguments\n",
    "- _data_input_ - input data for the first layer ($\\mathbf{X}^{(0)}$);\n",
    "- _weight_matrix_ - is the list of weight matrices as described above.\n",
    "\n",
    "The function should return you the state of neural network following the above formulas. The output should have a form\n",
    "$$\n",
    "\\left[\\mathbf{X}^{(0)}, \\mathbf{Z}^{(1)}, \\mathbf{X}^{(1)}, \\mathbf{Z}^{(2)}, \\mathbf{X}^{(2)}, \\ldots, \\mathbf{Z}^{(L)}, \\mathbf{X}^{(L)}\\right].\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8d6aad82",
   "metadata": {},
   "outputs": [],
   "source": [
    "def nn_forward_propagation(data_input, weight_matrix):    \n",
    "    # YOUR CODE HERE\n",
    "    raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7243d27b",
   "metadata": {},
   "source": [
    "Test your function with the following unit test"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "831c84c1",
   "metadata": {},
   "outputs": [],
   "source": [
    "test_data_input = np.array([[1,2],[3,4],[5,6]])\n",
    "test_weight_matrix = np.array([[[1,2],[3,4],[5,6]], [[-1,2],[-2,3],[3,1]]])\n",
    "assert_array_almost_equal(nn_forward_propagation(test_data_input,test_weight_matrix)[0],np.array([[1,2],[3,4],[5,6]]))\n",
    "assert_array_almost_equal(nn_forward_propagation(test_data_input,test_weight_matrix)[1],np.array([[14,18],[30,38],[46,58]]))\n",
    "assert_array_almost_equal(nn_forward_propagation(test_data_input,test_weight_matrix)[2],\\\n",
    "                          np.array([[1.798621e-02, 9.820138e-01],[3.353501e-04, 9.996646e-01],[6.144175e-06, 9.999939e-01]]))\n",
    "assert_array_almost_equal(nn_forward_propagation(test_data_input,test_weight_matrix)[3],\\\n",
    "                          np.array([[1.910069, 3.035972],[1.998323, 3.000671],[1.999969, 3.000012]]))\n",
    "assert_array_almost_equal(nn_forward_propagation(test_data_input,test_weight_matrix)[4],\\\n",
    "                          np.array([[0.244918, 0.755082], [0.26848 , 0.73152 ], [0.268933, 0.731067]]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9414cbff",
   "metadata": {},
   "source": [
    "2. Implement the function $\\mathtt{nn\\_back\\_propagation}$ that takes $3$ arguments:\n",
    "- _data_input_ - input data for the first layer ($\\mathbf{X}^{(0)}$);\n",
    "- _weight_matrix_ - is the list of weight matrices as described above;\n",
    "- _one_hot_vector_encoding_ - one hot vector encoding of training data samples.\n",
    "\n",
    "Your function should output a gradient $\\frac{\\mathrm{d}L\\left(\\mathbf{W}\\right)}{\\mathrm{d}\\mathbf{W}}$ as per the above formulas. it should have the same shape as _weight_matrix_. You should first evaluate a state of the neural network by following forward propagation method, and then evaluate corresponding derivatives by using the above formulas."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f8339df6",
   "metadata": {},
   "outputs": [],
   "source": [
    "def nn_back_propagation(data_input, weight_matrix, one_hot_vector_encodings):    \n",
    "    # YOUR CODE HERE\n",
    "    raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23867b33",
   "metadata": {},
   "source": [
    "Test your function with the following unit test."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "330b0f75",
   "metadata": {},
   "outputs": [],
   "source": [
    "test_data_input = np.array([[1,2],[3,4],[5,6]])\n",
    "test_weight_matrix = np.array([[[1,2],[3,4],[5,6]], [[-1,2],[-2,3],[3,1]]])\n",
    "assert_array_almost_equal(nn_back_propagation(test_data_input, test_weight_matrix, np.array([[0,1],[1,0],[0,1]]))[0],\\\n",
    "                         np.array([[-0.028576,  0.028576], [-0.025189,  0.025189], [-0.053766,  0.053766]]))\n",
    "assert_array_almost_equal(nn_back_propagation(test_data_input, test_weight_matrix, np.array([[0,1],[1,0],[0,1]]))[1],\\\n",
    "                         np.array([[-0.217669,  0.217669], [ 0.004161, -0.004161], [-0.22183 ,  0.22183 ]]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "16bece8c",
   "metadata": {},
   "source": [
    "3. Write two functions \n",
    "- $\\mathtt{nn\\_cost\\_function}$ which evaluates a cost function by following the above formula. This function should take $3$ arguments: _data_input_, _weight_matrix_, _one_hot_vector_encodings_ and return a value of cost function. \n",
    "- $\\mathtt{nn\\_prediction\\_function}$ which evaluates class labels for data samples. This function should take $2$ arguments: _data_input_, _weight_matrix_ and return a vector of class labels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8ea8e6f0",
   "metadata": {},
   "outputs": [],
   "source": [
    "def nn_cost_function(data_input, weight_matrix, one_hot_vector_encodings):    \n",
    "    # YOUR CODE HERE\n",
    "    raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "56db5292",
   "metadata": {},
   "outputs": [],
   "source": [
    "def nn_prediction_function(data_input, weight_matrix):    \n",
    "    # YOUR CODE HERE\n",
    "    raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fbf3229e",
   "metadata": {},
   "source": [
    "4. Finally, we would like to apply all the above to data. I.e.,\n",
    "- define two lambda functions called $\\mathtt{nn\\_objective}$, $\\mathtt{nn\\_gradient}$ that take $1$ argument _weight_;\n",
    "- define nn_initial_weights_ to be a vector of zero matrices of appropriate sizes;\n",
    "- let the step size parameter _nn_step_size_ be equal to $3.9/\\left\\|\\mathbf{X}\\right\\|^2$;\n",
    "- run 10,000 iterations of the gradient descent procedure to find an optimal weights nn_optimal_weights_;\n",
    "- evaluate an accuracy rate nn_accuracy_rate_ evaluated for the model defined to have nn_optimal_weights_ weights."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d4dda842",
   "metadata": {},
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "23255c3c",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"The two layered neural network successfully classified {acc:2.2f} % of data\".format(acc = 100 * nn_accuracy_rate))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.17"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}