{ "cells": [ { "cell_type": "markdown", "id": "03946d3c", "metadata": {}, "source": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "

MTH786 Machine Learning with Python

\n", "
\n", " \n", "

Semester A, 2023/2024

\n", "
\n", "

Coursework 8

\n", "
\n", " \n", "

Dr Nicola Perra

\n", "
" ] }, { "cell_type": "markdown", "id": "208002af", "metadata": {}, "source": [ "We start by loading necessary libraries, including NumPy (used for linear algebra calculations) and MatPlotLib (used for visualisation)." ] }, { "cell_type": "code", "execution_count": null, "id": "16d0a39c", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from numpy.testing import assert_array_almost_equal, assert_array_equal\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "id": "ec7a193b", "metadata": {}, "source": [ "### Regression analysis for height-weight data via gradient descent method\n", "In this exercise you will work with a data set you are familiar with: $\\mathtt{heigh\\_weight.csv}$. By completing this exercise you will learn how to solve linear/polynomial regression problems using the gradient descent method. In the second part we will learn ho to solve the LASSO problem using either smoothing technique or proximal maps.\n", "\n", "We start by loading data. **Important:** please check that the file $\\mathtt{height\\_weight.csv}$ is located in the same folder with your Jupyter notebook.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "c09eb8a4", "metadata": {}, "outputs": [], "source": [ "heights = np.genfromtxt(\"height_weight.csv\",\n", " delimiter=\",\",\n", " skip_header=1,\n", " usecols=[0])\n", "\n", "weights = np.genfromtxt(\"height_weight.csv\",\n", " delimiter=\",\",\n", " skip_header=1,\n", " usecols=[1])\n", "indices = np.argsort(heights)\n", "heights = heights[indices].reshape((len(heights), 1))\n", "weights = weights[indices].reshape((len(weights), 1))\n", "\n", "plt.scatter(heights, weights, s=1)\n", "plt.xlabel('Height', fontsize=16)\n", "plt.xticks(fontsize=16)\n", "plt.ylabel('Weight', fontsize=16)\n", "plt.yticks(fontsize=16)\n", "plt.tight_layout;" ] }, { "cell_type": "markdown", "id": "f40c6523", "metadata": {}, "source": [ "#### Data standardisation \n", "In real-world problems we usually get a raw data in the form of $s$ samples each of which is described by numeric several values corresponding to different characteristics of the object. Such a data could be highly non-uniform. The goal of applying $\\textit{standardisation}$ is to make sure different features of objects are on almost on the same scale so that each feature is equally important and make it easier to process by most ML algorithms. The result of standardisation is that the features will be rescaled to ensure the mean and the standard deviation to be $0$ and $1$, respectively. This means that for a data given by $\\mathbf{X} = \\left(\n", "\\left(\\mathbf{x}^{(1)}\\right)^{\\top},\\left(\\mathbf{x}^{(2)}\\right)^{\\top},\\ldots,\\left(\\mathbf{x}^{(s)}\\right)^{\\top}\n", "\\right) \\in \\mathbb{R}^{s\\times d}$ we define a new, rescaled data as:\n", "$$\n", "\\hat{\\mathbf{x}}^{(i)}_k = \\frac{\\mathbf{x}^{(i)}_k - \\left\\langle \\mathbf{x}_k \\right\\rangle }{\\left(\\sigma_{\\mathbf{x}}\\right)_k},\n", "$$\n", "where $\\left\\langle \\mathbf{x}_k \\right\\rangle = \\frac{1}{s}\\sum\\limits_{j=1}^s \\mathbf{x}^{(j)}_k$, and\n", "$\\left(\\sigma_\\mathbf{x}\\right)_k = \\sqrt{\n", "\t\\frac{1}{s}\\sum\\limits_{j=1}^s \\left(\\mathbf{x}^{(j)}_k-\\left\\langle \\mathbf{x}_k \\right\\rangle\\right)^2}$\n", "are the mean and standard deviation of data vector $\\mathbf{x}$. \n", "\n", "Write two functions \n", "1. **standardise** to standardise the columns of a multi-dimensional array. 
The function **standardise**\ttakes the multi-dimensional array *data_matrix* as its input argument. It subtracts the means from each column and divides by the standard deviations. It returns the *standardised_matrix*, the *row_of_means* and the *row_of_standard_deviations*.\n", "2. **de_standardise** to de-standardise the columns of a multi-dimensional array. The function **de_standardise** reverses the above operation. It takes a *standardised_matrix*, the *row_of_means* and the *row_of_standard_deviations* as its arguments and returns a matrix for which the standardisation process is reversed.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "id": "3b2f941b", "metadata": {}, "outputs": [], "source": [ "def standardise(data_matrix):\n", " row_of_means = np.mean(data_matrix, axis=0)\n", " standardised_matrix = data_matrix - row_of_means\n", " row_of_stds = np.std(data_matrix, axis=0)\n", " return (standardised_matrix / row_of_stds), row_of_means, row_of_stds" ] }, { "cell_type": "markdown", "id": "1b08e7eb", "metadata": {}, "source": [ "Test your function with the following unit tests" ] }, { "cell_type": "code", "execution_count": null, "id": "2e69ba16", "metadata": {}, "outputs": [], "source": [ "test_data_matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])\n", "test_standardise_data_matrix = np.array([[-1.224745, -1.224745, -1.224745],\n", " [0., 0., 0.],\n", " [1.224745, 1.224745, 1.224745]])\n", "test_row_of_means = np.array([4, 5, 6])\n", "test_row_of_stds = np.array(np.sqrt([6, 6, 6]))\n", "\n", "test_result_standardise_data_matrix, test_result_row_of_means, test_result_row_of_stds = standardise(\n", " test_data_matrix)\n", "assert_array_almost_equal(test_result_standardise_data_matrix,\n", " test_standardise_data_matrix)\n", "assert_array_almost_equal(test_result_row_of_means, test_row_of_means)\n", "assert_array_almost_equal(test_result_row_of_stds, test_row_of_stds)" ] }, { "cell_type": "code", "execution_count": null, "id": "6483763d", "metadata": {}, "outputs": [], "source": [ "def de_standardise(standardised_matrix, row_of_means, row_of_stds):\n", " matrix = np.copy(standardised_matrix * row_of_stds)\n", " return matrix + row_of_means" ] }, { "cell_type": "markdown", "id": "3214eb52", "metadata": {}, "source": [ "Test your function with the following unit tests" ] }, { "cell_type": "code", "execution_count": null, "id": "09520e80", "metadata": {}, "outputs": [], "source": [ "test_standardise_data_matrix = np.array([[-1.224745, -1.224745, -1.224745],\n", " [0., 0., 0.],\n", " [1.224745, 1.224745, 1.224745]])\n", "test_row_of_means = np.array([4, 5, 6])\n", "test_row_of_stds = np.array(np.sqrt([6, 6, 6]))\n", "\n", "assert_array_almost_equal(\n", " np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),\n", " de_standardise(test_standardise_data_matrix, test_row_of_means,\n", " test_row_of_stds))" ] }, { "cell_type": "code", "execution_count": null, "id": "25d682a9", "metadata": {}, "outputs": [], "source": [ "# let's standardize the data input and output\n", "data_input, input_row_of_means, input_row_of_stds = standardise(heights)\n", "data_output, output_row_of_means, output_row_of_stds = standardise(weights)" ] }, { "cell_type": "markdown", "id": "f26577f9", "metadata": {}, "source": [ "#### Building linear/polynomial regressions\n", "In this part you are given a piece of the code from previous assignments that evaluates coefficients of polynomial regression of fixed degree." 
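, "\n", "For reference, the function $\mathtt{polynomial\_regression}$ below computes the least-squares weights by solving the normal equations\n", "$$\n", "\mathbf{\Phi}^{\top}\left(\mathbf{X}\right)\mathbf{\Phi}\left(\mathbf{X}\right)\mathbf{w} = \mathbf{\Phi}^{\top}\left(\mathbf{X}\right)\mathbf{Y},\n", "$$\n", "where $\mathbf{\Phi}\left(\mathbf{X}\right)$ is the polynomial feature matrix produced by $\mathtt{polynomial\_basis}$ and $\mathbf{Y}$ collects the outputs.\n"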
] }, { "cell_type": "code", "execution_count": null, "id": "cbb40f7a", "metadata": {}, "outputs": [], "source": [ "def polynomial_basis(data_inputs, degree):\n", " X_matrix = np.ones((len(data_inputs), 1))\n", " for i in range(degree):\n", " X_matrix = np.c_[X_matrix, np.power(data_inputs, i + 1)]\n", " return X_matrix\n", "\n", "\n", "def polynomial_regression(data_matrix, data_outputs):\n", " return np.linalg.solve(data_matrix.T @ data_matrix,\n", " data_matrix.T @ data_outputs)\n", "\n", "\n", "def prediction_function(data_matrix, weights):\n", " return data_matrix @ weights" ] }, { "cell_type": "markdown", "id": "8bb17b60", "metadata": {}, "source": [ "Now use the above functions to evaluate optimal weights in two cases: $d=1$ (linear regression) and $d=5$ (polynomial regression). Plot corresponding results." ] }, { "cell_type": "code", "execution_count": null, "id": "43194ef2", "metadata": {}, "outputs": [], "source": [ "degree_linear = 1\n", "degree_polynomial = 5\n", "data_matrix_linear = polynomial_basis(data_input, degree_linear)\n", "data_matrix_polynomial = polynomial_basis(data_input, degree_polynomial)\n", "optimal_weights_linear = polynomial_regression(data_matrix_linear, data_output)\n", "optimal_weights_polynomial = polynomial_regression(data_matrix_polynomial,\n", " data_output)\n", "\n", "print(\"Optimal linear regression coefficients are equal to: {w}. \\\n", " \\nOptimal polynomial regression (d = {d}) coefficients are equal to: {p}\" .\\\n", " format(w = optimal_weights_linear.T, d = degree_polynomial, p = optimal_weights_polynomial.T))" ] }, { "cell_type": "markdown", "id": "e78ca0a2", "metadata": {}, "source": [ "**Remark:** please note that an optimal polynomial of degree $5$ is almost equal to a linear function. We will see later that this is not the case for corresponding LASSO problem." ] }, { "cell_type": "code", "execution_count": null, "id": "4e8fe202", "metadata": {}, "outputs": [], "source": [ "predictions_linear = de_standardise(prediction_function(data_matrix_linear, optimal_weights_linear),\\\n", " output_row_of_means, output_row_of_stds)\n", "predictions_polynomial = de_standardise(prediction_function(data_matrix_polynomial, optimal_weights_polynomial),\\\n", " output_row_of_means, output_row_of_stds)\n", "\n", "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(35, 15))\n", "fig.suptitle('Linear and polynomial regressions for height weight data',\n", " fontsize=30)\n", "for ax in fig.get_axes():\n", " ax.scatter(heights, weights, s=2)\n", " ax.set(xlabel='Height', ylabel='Weight')\n", " for item in ([ax.title, ax.xaxis.label, ax.yaxis.label] +\n", " ax.get_xticklabels() + ax.get_yticklabels()):\n", " item.set_fontsize(30)\n", " ax.label_outer()\n", "ax1.plot(heights, predictions_linear, linewidth=2, color='r')\n", "ax2.plot(heights, predictions_polynomial, linewidth=2, color='r')\n", "plt.savefig('figure.pdf')\n", "plt.tight_layout;" ] }, { "cell_type": "markdown", "id": "82e6c5b1", "metadata": {}, "source": [ "#### Solving regression problem with the gradient descent\n", "The gradient descent is a method for finding a minimum of a function by running an iterative algorithm. Let $E\\left(\\mathbf{w}\\right): \\mathbb{R}^n \\to \\mathbb{R}$ be a real valued, convex, differentiable function of $n$-dimensional argument $\\mathbf{w}=\\left(w_1,w_2,\\ldots,w_n\\right)$. Let $\\mathbf{w}^{(0)}$ be an arbitrary $n$-dimensional vector. 
Then the minimiser of the function $E$ can be approximated by consecutively evaluating\n", "$$\n", "\\mathbf{w}^{(k+1)} = \\mathbf{w}^{(k)} - \\tau \\nabla E\\left(\\mathbf{w}^{(k)}\\right), \\quad k\\geq 0,\n", "$$\n", "where $\\tau > 0$ is a step-size parameter.\n", "\n", "1. Implement a function $\\mathtt{gradient\\_descent}$ that performs gradient descent to numerically approximate a minimiser of a convex function. The function should take the following arguments\n", "- *objective* - a lambda-function representing function $E$. This itself should take a NumPy array as its argument and return a real number.\n", "- *gradient* - a lambda-function representing function $\\nabla E$. This itself should take a NumPy array as its argument and return a NumPy array representation of the gradient $\\nabla E$.\n", "- *initial_ weights* - a NumPy array with initial values $\\mathbf{w}^{(0)}$ for the first iterate \n", "- *step_size* - a step-size parameter $\\tau$ for the gradient descent step\n", "- *no_of_iterations* - an integer parameter that controls the number of iterations\n", "- *print_output* - an integer parameter that controls how often you are printing an intermediate result. If say *print_output = 100*, then after every 100th iteration you are asked to print your current iterate and a value of the objective as *Iteration k/m, objective = o.*, where $k$ is a number of current iteration, $m$ is a total number of iterations, and $o$ is a value of the objective evaluated at current iterate.\n", "\n", "Implement the function so that it returns a NumPy array of the weights obtained after gradient descent together with a list of objective values for all iterates." ] }, { "cell_type": "code", "execution_count": null, "id": "29eaa5f9", "metadata": {}, "outputs": [], "source": [ "def gradient_descent(objective, gradient, initial_weights, step_size,no_of_iterations=100, print_output=10):\n", " \n", " objective_values = [] # this is a list where we will append the value of the energy function\n", " weights = np.copy(initial_weights) # this is a copy of the initial values of W, why a copy?!\n", " objective_values.append(objective(weights))\n", " for counter in range(no_of_iterations):\n", " weights -= step_size * gradient(weights)\n", " objective_values.append(objective(weights))\n", " if (counter + 1) % print_output == 0:\n", " print(\"Iteration {k}/{m}, objective = {o}, weights ={w}.\".format(k=counter+1, m=no_of_iterations, o=objective_values[counter],w=weights))\n", " \n", " return weights, objective_values" ] }, { "cell_type": "markdown", "id": "73028911", "metadata": {}, "source": [ "### Example 1\n", "\n", "Imagine to have an energy function \n", "\n", "$E(x)=x^2-2x+1$\n", "\n", "The function is convex. \n", "The gradient is \n", "\n", "$\\nabla E(x)=2x-2$\n", "\n", "Hence we know that the minimizer can be obtained imposing the gradient to zero. This is equivalent to find the argmin\n", "of the Energy function. 
Hence\n", "\n", "$\\nabla E(x)=2x-2=0 \\rightarrow x=1$\n", "\n", "Now let's see whether gradient descent would find the same solution\n" ] }, { "cell_type": "code", "execution_count": null, "id": "65cfc11d", "metadata": {}, "outputs": [], "source": [ "test_objective = lambda x: pow(x, 2) - 2 * x + 1 # Energy function\n", "test_gradient = lambda x: 2 * x - 2 #Gradient\n", "test_initial_weights = np.array([0.0])\n", "test_step_size = 0.4\n", "test_no_of_iterations = 10\n", "test_print_output = 2\n", "gradient_descent(test_objective, test_gradient,test_initial_weights,test_step_size,\n", " test_no_of_iterations, test_print_output)[0]" ] }, { "cell_type": "markdown", "id": "63d2eae7", "metadata": {}, "source": [ "### Example 2\n", "\n", "In one of the courseworks we had\n", "\n", "$E(x)=\\langle X, AX \\rangle + \\langle v, X \\rangle$\n", "\n", "The gradient is \n", "\n", "$\\nabla E(x)=(A+A^\\top)X+v$" ] }, { "cell_type": "code", "execution_count": null, "id": "e1164ca3", "metadata": {}, "outputs": [], "source": [ "test_matrix_A = np.array([[3, 1], [2, 4]])\n", "test_vector_v = np.array([5, 6])\n", "test_objective = lambda x: x.T @ (test_matrix_A @ x) + test_vector_v.T @ x\n", "test_gradient = lambda x: (test_matrix_A + test_matrix_A.T) @ x + test_vector_v\n", "\n", "test_initial_weights = np.array([0.0, 0.0])\n", "test_step_size = 0.9 / (np.linalg.norm(test_matrix_A.T+test_matrix_A)) # remember this comes from the lectures and the \n", "# conditions of tau-smoothness\n", "test_no_of_iterations = 100\n", "test_print_output = 10\n", "assert_array_almost_equal(gradient_descent(test_objective, test_gradient, \\\n", " test_initial_weights,test_step_size,\\\n", " test_no_of_iterations, test_print_output)[0],np.array([-0.564103, -0.538462]))" ] }, { "cell_type": "markdown", "id": "2c2624e3", "metadata": {}, "source": [ "2. Write two functions $\\mathtt{mean\\_squared\\_error}$ and $\\mathtt{mean\\_squared\\_error\\_gradient}$ that implement the mean squared error and its gradient as defined below. Both functions take a two-dimensional NumPy array *data_matrix*, a two-dimensional NumPy array *weights* and a two-dimensional NumPy array *data_outputs* as arguments. The first function should return a real number, while the second one should return a matrix representation of the MSE gradient. 
The MSE function and its gradient are given by\n", "$$\n", "MSE\\left(\\mathbf{w}\\right) = \n", "\\frac{1}{2s}\\left\\|\\mathbf{\\Phi}\\left(\\mathbf{X}\\right)\\mathbf{w} - \\mathbf{Y}\\right\\|^2,\n", "\\quad\n", "\\nabla MSE\\left(\\mathbf{w}\\right) = \n", "\\frac{1}{s}\\mathbf{\\Phi}^{\\top}\\left(\\mathbf{X}\\right)\\left(\\mathbf{\\Phi}\\left(\\mathbf{X}\\right)\\mathbf{w} - \\mathbf{Y}\\right),\n", "$$\n", "where $\\mathbf{\\Phi}\\left(\\mathbf{X}\\right)$ is a mathematical representation of *data_matrix* and $\\mathbf{Y}$ is a mathematical representation of *data_output*\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "id": "1c37d5c5", "metadata": {}, "outputs": [], "source": [ "def mean_squared_error(data_matrix, data_outputs, weights):\n", " return (np.linalg.norm(data_matrix @ weights -data_outputs))**2 / (2 * len(data_matrix))" ] }, { "cell_type": "code", "execution_count": null, "id": "2626fc8b", "metadata": {}, "outputs": [], "source": [ "def mean_squared_error_gradient(data_matrix, data_outputs, weights):\n", " return data_matrix.T @ (data_matrix @ weights -data_outputs) / len(data_matrix)" ] }, { "cell_type": "markdown", "id": "7a10e8b3", "metadata": {}, "source": [ "Test your functions with the following unit tests." ] }, { "cell_type": "code", "execution_count": null, "id": "b0dab5c0", "metadata": {}, "outputs": [], "source": [ "test_data_matrix = np.array([[1, 0.98], [1, 1.02]])\n", "test_data_outputs = np.array([[-0.1], [0.3]])\n", "test_weights = np.array([[-9.9], [10]])\n", "assert_array_almost_equal(\n", " mean_squared_error(test_data_matrix, test_data_outputs, test_weights), 0)" ] }, { "cell_type": "code", "execution_count": null, "id": "83630b8a", "metadata": {}, "outputs": [], "source": [ "test_data_matrix = np.array([[1, 1, -1], [1, 2, 2]])\n", "test_data_outputs = np.array([[-1, 2], [1, 3]])\n", "test_weights = np.array([[0, 0], [1, 2], [3, 4]])\n", "assert_array_almost_equal(\n", " mean_squared_error(test_data_matrix, test_data_outputs, test_weights),\n", " 36.75)" ] }, { "cell_type": "markdown", "id": "e4f8a848", "metadata": {}, "source": [ "3. Use the function $\\mathtt{gradient\\_descent}$ from previous question to minimise the mean squared error for linear and polynomial regression problems discussed in the first section (and thus, to compute their solutions). Take the step-size parameter equal to\n", "$$\n", "\\tau = 0.9\\cdot s\\cdot\\left\\|\\mathbf{\\Phi}\\left(\\mathbf{X}\\right)\\right\\|^{-2},\n", "$$\n", "and the number of iterations equal to $100$ for linear regressions, and $50000$ for polynomial one. Start your iterations from a zero vector of corresponding length. Compare your results with the ones you obtained in the first section." 
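, "\n", "**Why this step size?** A short justification (with $\left\|\cdot\right\|$ taken as the Frobenius norm returned by $\mathtt{np.linalg.norm}$, which upper-bounds the spectral norm): since\n", "$$\n", "\nabla MSE\left(\mathbf{w}\right) - \nabla MSE\left(\mathbf{v}\right) = \frac{1}{s}\mathbf{\Phi}^{\top}\left(\mathbf{X}\right)\mathbf{\Phi}\left(\mathbf{X}\right)\left(\mathbf{w} - \mathbf{v}\right),\n", "$$\n", "the gradient of the MSE is Lipschitz continuous with constant at most $L = \frac{1}{s}\left\|\mathbf{\Phi}\left(\mathbf{X}\right)\right\|^2$. Gradient descent on a convex function with $L$-Lipschitz gradient converges for any step size $\tau < 2/L$, and the choice $\tau = 0.9\cdot s\cdot\left\|\mathbf{\Phi}\left(\mathbf{X}\right)\right\|^{-2}$ lies safely inside this range.\n"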
] }, { "cell_type": "code", "execution_count": null, "id": "b16548b1", "metadata": {}, "outputs": [], "source": [ "objective_linear = lambda weights: mean_squared_error(data_matrix_linear,data_output, weights)\n", "gradient_linear = lambda weights: mean_squared_error_gradient(data_matrix_linear, data_output, weights)\n", "\n", "initial_weights_linear = np.zeros((degree_linear + 1, 1))\n", "step_size_linear = 0.9 * len(data_matrix_linear) / (np.linalg.norm(data_matrix_linear))**2\n", "\n", "gd_optimal_weights_linear = gradient_descent(objective_linear, gradient_linear,\n", " initial_weights_linear,step_size_linear, 100, 10)[0]" ] }, { "cell_type": "code", "execution_count": null, "id": "7075aa50", "metadata": {}, "outputs": [], "source": [ "objective_polynomial = lambda weights: mean_squared_error(data_matrix_polynomial, data_output, weights)\n", "gradient_polynomial = lambda weights: mean_squared_error_gradient(data_matrix_polynomial, data_output, weights)\n", "initial_weights_polynomial = np.zeros((degree_polynomial + 1, 1))\n", "step_size_polynomial = 0.9 * len(data_matrix_polynomial) / (np.linalg.norm(data_matrix_polynomial))**2\n", "\n", "\n", "gd_optimal_weights_polynomial = gradient_descent(objective_polynomial,\n", " gradient_polynomial,initial_weights_polynomial,\n", " step_size_polynomial, 50000,5000)[0]" ] }, { "cell_type": "markdown", "id": "4af72244", "metadata": {}, "source": [ "## Solutions we got applying the normal equation directly\n", "Optimal linear regression coefficients are equal to: [8.57473564e-16 9.24756299e-01]. \n", "Optimal polynomial regression (d = 5) coefficients are equal to: [-1.54785290e-03 1.05220213e+00 4.03175447e-03 -7.60510612e-02\n", " -4.05488657e-04 6.63445950e-03]" ] }, { "cell_type": "markdown", "id": "1c5c7983", "metadata": {}, "source": [ "Test your results with the following unit tests" ] }, { "cell_type": "code", "execution_count": null, "id": "7cad6d30", "metadata": {}, "outputs": [], "source": [ "assert_array_almost_equal(gd_optimal_weights_linear, optimal_weights_linear)" ] }, { "cell_type": "code", "execution_count": null, "id": "989c1d2a", "metadata": {}, "outputs": [], "source": [ "assert_array_almost_equal(gd_optimal_weights_polynomial,\n", " optimal_weights_polynomial)" ] }, { "cell_type": "markdown", "id": "4159b39c", "metadata": {}, "source": [ "As one can see from above tests, the weights obtained with gradient descent coincide with the ones obtained by solving the normal equation." ] }, { "cell_type": "markdown", "id": "25fbd575", "metadata": {}, "source": [ "### The LASSO problem for height-weight data\n", "\n", "\n", "1. Write a function $\\mathtt{lasso\\_cost\\_function}$ that implements the LASSO cost function as introduced in the lecture notes. The arguments include *data_matrix*, *weights* and *outputs* similar to $\\mathtt{mean\\_squared\\_error}$, but further a positive scalar *regularisation_parameter* that controls the balance between mean squared error and the one norm." 
] }, { "cell_type": "code", "execution_count": null, "id": "3ada25f0", "metadata": {}, "outputs": [], "source": [ "def lasso_cost_function(data_matrix, outputs, weights,regularisation_parameter):\n", " return mean_squared_error(data_matrix, outputs,weights) + regularisation_parameter * np.sum(np.abs(weights))" ] }, { "cell_type": "markdown", "id": "1efb9cac", "metadata": {}, "source": [ "Apply your function to the data below" ] }, { "cell_type": "code", "execution_count": null, "id": "13b8a5fc", "metadata": {}, "outputs": [], "source": [ "test_samples, test_dimensions, test_output_dimensions, test_regularisation_parameter = np.random.randint(low=2, high=100, size=4)\n", "\n", "test_inputs = np.random.rand(test_samples, test_dimensions)\n", "\n", "test_data_outputs = np.random.rand(test_samples, test_output_dimensions)\n", "test_data_matrix = np.c_[np.ones((len(test_inputs), 1)), test_inputs]\n", "test_weights = np.random.rand(test_dimensions + 1, test_output_dimensions)\n", "\n", "lasso_cost_function(test_data_matrix,test_data_outputs, test_weights, test_regularisation_parameter)" ] }, { "cell_type": "markdown", "id": "c89bce4c", "metadata": {}, "source": [ "We now implement two methods of solving the LASSO problem. Namely,\n", "- Smoothing of one-norm\n", "- Proximal gradient descent\n", "\n", "#### Smoothing of one-norm\n", "\n", "\n", "1. Write a function $\\mathtt{huber\\_loss}$ that evaluates a value of the Huber loss function for a vector. Your function should take a NumPy array *argument* and a scalar *smoothing_parameter* as its arguments. This function is defined as\n", "$$\n", "H_{\\tau}\\left(\\mathbf{w}\\right) = \n", "\\sum\\limits_{j=0}^{d} \\left|w_j\\right|_{\\tau},\\quad\n", "\\mbox{with} \\quad \n", "\\left|x\\right|_{\\tau} = \n", "\\begin{cases}\n", "\\left|x\\right| - \\frac{\\tau}{2}, &\\left|x\\right|\\geq \\tau,\\\\\n", "\\frac{x^2}{2\\tau}, &\\left|x\\right|< \\tau,\n", "\\end{cases}\n", "$$\n", "where $\\tau$ is a mathematical representation of *smoothing_parameter*, while $\\mathbf{w}$ is mathematical representation of *argument*.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "87b96d40", "metadata": {}, "outputs": [], "source": [ "def huber_loss(argument, smoothing_parameter):\n", " \n", " H=0.\n", " for x in argument: \n", " if np.abs(x)>=smoothing_parameter:\n", " H+=np.abs(x)-smoothing_parameter/2\n", " else:\n", " H+=np.power(x,2)/(2*smoothing_parameter)\n", " \n", " return H" ] }, { "cell_type": "markdown", "id": "53cc86ab", "metadata": {}, "source": [ "Test your function with the following unit tests" ] }, { "cell_type": "code", "execution_count": null, "id": "76aab4e8", "metadata": {}, "outputs": [], "source": [ "assert_array_almost_equal(huber_loss(np.array([1, 2, 3, -4]), 2), 6.25)" ] }, { "cell_type": "markdown", "id": "45ba5959", "metadata": {}, "source": [ "2. Implement the gradient of Huber loss function $\\mathtt{huber\\_loss\\_gradient}$. Your function should take a NumPy array *argument* and a scalar *smoothing_parameter* as its arguments and perform the evaluation of the derivative of Huber loss function\n", "$$\n", "\\left|x\\right|'_{\\tau}\n", "=\n", "\\begin{cases}\n", "\\mathrm{sgn}\\left(x\\right), &\\left|x\\right|\\geq \\tau,\\\\\n", "\\frac{x}{\\tau}, &\\left|x\\right|< \\tau,\n", "\\end{cases}\n", "$$\n", "to each component of the array." 
] }, { "cell_type": "code", "execution_count": null, "id": "0a7bfbec", "metadata": {}, "outputs": [], "source": [ "def huber_loss_gradient(argument, smoothing_parameter):\n", " values=[]\n", " for x in argument:\n", " if np.abs(x)>=smoothing_parameter:\n", " values.append(np.sign(x))\n", " else:\n", " values.append(x/smoothing_parameter)\n", " return np.array(values)" ] }, { "cell_type": "markdown", "id": "fac2def1", "metadata": {}, "source": [ "Test your function with the following unit test" ] }, { "cell_type": "code", "execution_count": null, "id": "85ac93bd", "metadata": {}, "outputs": [], "source": [ "assert_array_almost_equal(huber_loss_gradient(np.array([1, 2, 3, -4]), 2),\n", " np.array([0.5, 1., 1., -1.]))" ] }, { "cell_type": "markdown", "id": "6236ef47", "metadata": {}, "source": [ "3. Use the function $\\mathtt{gradient\\_descent}$ from previous question to minimise the lasso cost function for linear and polynomial regression problems discussed in the first section (and thus, to compute their solutions). Take the gradient step-size parameter equal to\n", "$$\n", "\\tau = 0.45\\cdot s\\cdot\\left\\|\\mathbf{\\Phi}\\left(\\mathbf{X}\\right)\\right\\|^{-2},\n", "$$\n", "while assign the Huber loss smoothing parameter with $\\hat\\tau = 2\\cdot \\alpha\\cdot\\tau$, where the regularisation parameter $\\alpha$ is equal to $\\frac{1}{2}$. Take the number of iterations equal to $1000$ for linear regressions, and $50000$ for polynomial one. Start your iterations from a zero vector of corresponding length. Compare your results with the ones you obtained in the first section." ] }, { "cell_type": "code", "execution_count": null, "id": "3283bf68", "metadata": {}, "outputs": [], "source": [ "test_regularisation_parameter = 0.5\n", "test_step_size_linear = 0.45 * len(data_matrix_linear) / ((np.linalg.norm(data_matrix_linear))**2)\n", "test_smoothing_parameter_linear = 2* test_step_size_linear * test_regularisation_parameter\n", "\n", "# the cost function is the LASSO expression\n", "test_objective_linear = lambda weights: lasso_cost_function(data_matrix_linear, data_output, weights, test_regularisation_parameter)\n", "# the gradient is the sum of the gradient of MSE plus the gradient of the Huber loss\n", "test_gradient_linear = lambda weights: (mean_squared_error_gradient(data_matrix_linear, data_output, weights)+ \n", " test_regularisation_parameter * \n", " huber_loss_gradient(weights, test_smoothing_parameter_linear))\n", "\n", "test_initial_weights_linear = np.zeros((degree_linear + 1, 1)) # initital vector\n", "\n", "\n", "# we then apply gradient descent\n", "hl_optimal_weights_linear = gradient_descent(test_objective_linear,\n", " test_gradient_linear,\n", " test_initial_weights_linear,\n", " test_step_size_linear, 1000,\n", " 100)[0]" ] }, { "cell_type": "code", "execution_count": null, "id": "9786a703", "metadata": {}, "outputs": [], "source": [ "test_regularisation_parameter = 0.5\n", "test_step_size_polynomial = 1.9 * len(data_matrix_polynomial) / ((np.linalg.norm(data_matrix_polynomial))**(2))\n", "test_smoothing_parameter_polynomial = 2*test_step_size_polynomial * test_regularisation_parameter\n", "test_objective_polynomial = lambda weights: lasso_cost_function(data_matrix_polynomial, data_output, weights, test_regularisation_parameter)\n", "\n", "test_gradient_polynomial = lambda weights: (mean_squared_error_gradient(\n", "data_matrix_polynomial, data_output, weights\n", ") + test_regularisation_parameter * huber_loss_gradient(\n", "weights, 
test_smoothing_parameter_polynomial))\n", "test_initial_weights_polynomial = np.zeros((degree_polynomial + 1, 1))\n", "\n", "hl_optimal_weights_polynomial = gradient_descent(\n", " test_objective_polynomial, test_gradient_polynomial,\n", " test_initial_weights_polynomial, test_step_size_polynomial, 50000, 5000)[0]" ] }, { "cell_type": "code", "execution_count": null, "id": "8972e0aa", "metadata": {}, "outputs": [], "source": [ "print(\"Optimal linear regression coefficients are equal to: {l}. \\\n", " \\nOptimal polynomial regression (d = {d}) coefficients are equal to: {p}\" .\\\n", " format(l = hl_optimal_weights_linear.T, d = degree_polynomial, p = hl_optimal_weights_polynomial.T))" ] }, { "cell_type": "markdown", "id": "0581c7a1", "metadata": {}, "source": [ "## Note: \n", "Remember that LASSO will \"sparsify\" the solution; hence, in the case of a small number of variables (as here) it might not be the optimal choice." ] }, { "cell_type": "markdown", "id": "7b971094", "metadata": {}, "source": [ "Test your results with the following unit tests" ] }, { "cell_type": "code", "execution_count": null, "id": "8dae4e48", "metadata": {}, "outputs": [], "source": [ "assert_array_almost_equal(hl_optimal_weights_linear,\n", " np.array([[1.372813e-16], [4.247563e-01]]))" ] }, { "cell_type": "code", "execution_count": null, "id": "26299665", "metadata": {}, "outputs": [], "source": [ "assert_array_almost_equal(\n", " hl_optimal_weights_polynomial,\n", " np.array([[-6.045255e-05], [4.988579e-03], [-1.228225e-05],\n", " [1.92627433e-01], [-3.09616971e-04], [1.77316147e-03]]))" ] }, { "cell_type": "markdown", "id": "9aa5ca37", "metadata": {}, "source": [ "You can now observe that the polynomial function is very different from the linear one. It is rather cubic now. We finish by plotting the resulting regression functions." ] }, { "cell_type": "code", "execution_count": null, "id": "df2da704", "metadata": {}, "outputs": [], "source": [ "hl_predictions_linear = de_standardise(prediction_function(data_matrix_linear, hl_optimal_weights_linear),\\\n", " output_row_of_means, output_row_of_stds)\n", "hl_predictions_polynomial = de_standardise(prediction_function(data_matrix_polynomial, hl_optimal_weights_polynomial),\\\n", " output_row_of_means, output_row_of_stds)\n", "\n", "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(35, 15))\n", "fig.suptitle('Linear and polynomial regressions for height weight data',\n", " fontsize=30)\n", "for ax in fig.get_axes():\n", " ax.scatter(heights, weights, s=2)\n", " ax.set(xlabel='Height', ylabel='Weight')\n", " for item in ([ax.title, ax.xaxis.label, ax.yaxis.label] +\n", " ax.get_xticklabels() + ax.get_yticklabels()):\n", " item.set_fontsize(30)\n", " ax.label_outer()\n", "ax1.plot(heights, hl_predictions_linear, linewidth=2, color='r')\n", "ax2.plot(heights, hl_predictions_polynomial, linewidth=2, color='r')\n", "plt.tight_layout()" ] }, { "cell_type": "markdown", "id": "79bbae5f", "metadata": {}, "source": [ "#### Proximal gradient descent\n", "In this part we extend the concept of gradient descent to proximal gradient descent. Subsequently, we use proximal gradient descent to solve the LASSO problem introduced in the lecture.\n", "\n", "1. Based on your function $\mathtt{gradient\_descent}$ from above, write a function $\mathtt{proximal\_gradient\_descent}$ that takes all the arguments of $\mathtt{gradient\_descent}$ as well as a function *proximal_map*."
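, "\n", "As a reminder of the update rule (a sketch consistent with the comments in the cells below): for a composite objective $E\left(\mathbf{w}\right) = L\left(\mathbf{w}\right) + R\left(\mathbf{w}\right)$ with smooth part $L$ and non-smooth part $R$, each iteration of proximal gradient descent applies the proximal map of $R$ to an ordinary gradient step on $L$:\n", "$$\n", "\mathbf{w}^{(k+1)} = \mathrm{prox}_{\tau R}\left(\mathbf{w}^{(k)} - \tau\nabla L\left(\mathbf{w}^{(k)}\right)\right), \quad k \geq 0.\n", "$$\n"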
] }, { "cell_type": "code", "execution_count": null, "id": "22b624fe", "metadata": {}, "outputs": [], "source": [ "def proximal_gradient_descent(objective, gradient, proximal_map,\n", " initial_weights, step_size=1, no_of_iterations=100, print_output=10):\n", " objective_values = []\n", " weights = initial_weights\n", " objective_values.append(objective(weights))\n", " \n", " for counter in range(no_of_iterations):\n", " weights = proximal_map(weights - step_size * gradient(weights)) # the weights are the iterative output here\n", " objective_values.append(objective(weights))\n", " if (counter + 1) % print_output == 0:\n", " print(\"Iteration {k}/{m}, objective = {o}.\".format(k=counter+1, m=no_of_iterations, o=objective_values[counter]))\n", " print(\"Iteration completed after {k}/{m}, objective = {o}.\".format(k=counter+ 1, m=no_of_iterations, o=objective_values[counter]))\n", " return weights, objective_values" ] }, { "cell_type": "markdown", "id": "630a0452", "metadata": {}, "source": [ "Apply your function below" ] }, { "cell_type": "code", "execution_count": null, "id": "0f040c37", "metadata": {}, "outputs": [], "source": [ "test_matrix_m = np.array([[3, 1], [2, 4]])\n", "test_vector_v = np.array([5, 6])\n", "\n", "test_objective = lambda x: x.T @ (test_matrix_m @ x) + x @ test_vector_v + 2 * np.linalg.norm(x, ord=1)\n", "test_gradient = lambda x: (test_matrix_m + test_matrix_m.T) @ x + test_vector_v\n", "# why is this the gradient? Remember: the proximal map is applied to w-tau*Nabla(L), where E(w)=L(w)+R(w)!\n", "\n", "test_initial_weights = np.array([0.0, 0.0])\n", "test_step_size = 0.9 / (np.linalg.norm(test_matrix_m + test_matrix_m.T))\n", "test_no_of_iterations = 1000\n", "test_print_output = 100\n", "\n", "test_proximal_map = lambda x: np.sign(x) * np.maximum(0,np.abs(x) - test_step_size)\n", "\n", "proximal_gradient_descent(test_objective,test_gradient, \n", " test_proximal_map,test_initial_weights,test_step_size,\n", " test_no_of_iterations,test_print_output)[0]" ] }, { "cell_type": "markdown", "id": "0968e263", "metadata": {}, "source": [ "2. Implement the proximal map for the one-norm, also known as soft thresholding. Write a function $\\mathtt{proximal\\_map}$ that takes a NumPy array *argument* and a scalar *threshold* as its arguments and performs the soft-thresholding operation to each component of the array. This is defined as\n", "$$\n", "\\mathrm{soft}_{\\tau}\\left(x\\right) = \n", "x - \\tau\\cdot H'_{\\tau}\\left(x\\right) = \n", "\\begin{cases}\n", "x - \\mathrm{sgn}\\left(x\\right)\\cdot\\tau, &\\left|x\\right|\\geq \\tau,\\\\\n", "0, &\\left|x\\right| < \\tau.\n", "\\end{cases}\n", "$$" ] }, { "cell_type": "code", "execution_count": null, "id": "04bb0eb3", "metadata": {}, "outputs": [], "source": [ "def soft_thresholding(argument, threshold):\n", " return np.sign(argument) * np.maximum(0, np.abs(argument) - threshold)" ] }, { "cell_type": "markdown", "id": "16322d0a", "metadata": {}, "source": [ "3. Use the function $\\mathtt{proximal\\_gradient\\_descent}$ from previous question to minimise the lasso cost function for linear and polynomial regression problems discussed in the above. Take the gradient step-size parameter equal to\n", "$$\n", "\\tau = 0.9\\cdot s\\cdot\\left\\|\\mathbf{\\Phi}\\left(\\mathbf{X}\\right)\\right\\|^{-2},\n", "$$\n", "while assign the proximal map smoothing parameter with $\\hat\\tau = \\alpha\\cdot\\tau$, where the regularisation parameter $\\alpha$ is equal to $\\frac{1}{2}$. 
Take the number of iterations equal to $1000$ for the linear regression and $50000$ for the polynomial one. Start your iterations from a zero vector of corresponding length. Compare your results with the ones you obtained in the first section." ] }, { "cell_type": "code", "execution_count": null, "id": "b8be4e7a", "metadata": {}, "outputs": [], "source": [ "test_regularisation_parameter = 0.5\n", "test_step_size_linear = 0.9 * len(data_matrix_linear) * ((np.linalg.norm(data_matrix_linear))**(-2))\n", "\n", "test_smoothing_parameter_linear = test_step_size_linear * test_regularisation_parameter\n", "\n", "test_objective_linear = lambda weights: lasso_cost_function(data_matrix_linear, data_output, weights, test_regularisation_parameter)\n", "test_gradient_linear = lambda weights: mean_squared_error_gradient(data_matrix_linear, data_output, weights)\n", "test_proximal_map_linear = lambda weights: soft_thresholding(weights, test_smoothing_parameter_linear)\n", "\n", "test_initial_weights_linear = np.zeros((degree_linear + 1, 1))\n", "\n", "pm_optimal_weights_linear = proximal_gradient_descent(\n", " test_objective_linear, test_gradient_linear, test_proximal_map_linear,\n", " test_initial_weights_linear, test_step_size_linear, 1000, 100)[0]" ] }, { "cell_type": "markdown", "id": "b356bc71", "metadata": {}, "source": [ "Test your result with the following unit test" ] }, { "cell_type": "code", "execution_count": null, "id": "ea1849fe", "metadata": {}, "outputs": [], "source": [ "assert_array_almost_equal(pm_optimal_weights_linear,\n", " np.array([[-0.], [0.424756]]))" ] }, { "cell_type": "code", "execution_count": null, "id": "906786a6", "metadata": {}, "outputs": [], "source": [ "test_regularisation_parameter = 0.5\n", "test_step_size_polynomial = 0.9 * len(data_matrix_polynomial) * ((np.linalg.norm(data_matrix_polynomial))**(-2))\n", "test_smoothing_parameter_polynomial = test_step_size_polynomial * test_regularisation_parameter\n", "\n", "test_objective_polynomial = lambda weights: lasso_cost_function(data_matrix_polynomial, data_output, weights, test_regularisation_parameter)\n", "test_gradient_polynomial = lambda weights: mean_squared_error_gradient(data_matrix_polynomial, data_output, weights)\n", "test_proximal_map_polynomial = lambda weights: soft_thresholding(weights, test_smoothing_parameter_polynomial)\n", "\n", "test_initial_weights_polynomial = np.zeros((degree_polynomial + 1, 1))\n", "\n", "pm_optimal_weights_polynomial = proximal_gradient_descent(\n", " test_objective_polynomial, test_gradient_polynomial,\n", " test_proximal_map_polynomial, test_initial_weights_polynomial,\n", " test_step_size_polynomial, 50000, 5000)[0]" ] }, { "cell_type": "markdown", "id": "11af80ff", "metadata": {}, "source": [ "Test your result with the following unit test" ] }, { "cell_type": "code", "execution_count": null, "id": "0df34cd5", "metadata": {}, "outputs": [], "source": [ "assert_array_almost_equal(pm_optimal_weights_polynomial,\n", " np.array([[0], [0], [0], [0.17920935], [0], [0]]))" ] }, { "cell_type": "markdown", "id": "4c9eb6cd", "metadata": {}, "source": [ "We would like to emphasise that the optimal polynomial has the form $p\left(x\right) = 0.17920935\cdot x^3$ (in the standardised variables). This is in complete agreement with the LASSO method: all coefficients except one are zero. We finish by plotting the optimal linear and polynomial regression functions. 
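\n", "\n", "A quick optional sanity check of the sparsity pattern (a sketch only; it assumes the variables defined in the cells above are in scope):\n", "```python\n", "# indices and values of the non-zero LASSO coefficients (expected: only index 3, the x^3 term)\n", "nonzero_indices = np.flatnonzero(pm_optimal_weights_polynomial)\n", "print(\"non-zero coefficient indices:\", nonzero_indices)\n", "print(\"non-zero coefficient values:\", pm_optimal_weights_polynomial.ravel()[nonzero_indices])\n", "```\n", "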
" ] }, { "cell_type": "code", "execution_count": null, "id": "7f636f24", "metadata": {}, "outputs": [], "source": [ "pm_predictions_linear = de_standardise(prediction_function(data_matrix_linear, pm_optimal_weights_linear),\\\n", " output_row_of_means, output_row_of_stds)\n", "pm_predictions_polynomial = de_standardise(prediction_function(data_matrix_polynomial, pm_optimal_weights_polynomial),\\\n", " output_row_of_means, output_row_of_stds)\n", "\n", "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(35, 15))\n", "fig.suptitle('Linear and polynomial regressions for height weight data',\n", " fontsize=30)\n", "for ax in fig.get_axes():\n", " ax.scatter(heights, weights, s=2)\n", " ax.set(xlabel='Height', ylabel='Weight')\n", " for item in ([ax.title, ax.xaxis.label, ax.yaxis.label] +\n", " ax.get_xticklabels() + ax.get_yticklabels()):\n", " item.set_fontsize(30)\n", " ax.label_outer()\n", "ax1.plot(heights, pm_predictions_linear, linewidth=2, color='r')\n", "ax2.plot(heights, pm_predictions_polynomial, linewidth=2, color='r')\n", "plt.tight_layout;" ] }, { "cell_type": "code", "execution_count": null, "id": "fa3009d5", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.17" } }, "nbformat": 4, "nbformat_minor": 5 }