Closed Form Solution of Linear Regression

The Closed Form Solution of Linear Regression, particularly for linear regression with L2 regularization (also known as Ridge Regression), is a mathematical approach to finding the optimal parameters (weights) of the model without iterative optimization methods such as Gradient Descent. It directly computes the weights that minimize the regularized cost function of the linear regression model.

Mathematical Expression

Given a dataset with \(n\) samples, each with \(d\) features, the Closed Form Solution for \(\theta\) in Ridge Regression is given by:

\[\theta = (X^TX + \lambda I)^{-1}X^TY\]

where:

  • \(X\) is the feature matrix,

  • \(Y\) is the vector of labels,

  • \(\lambda\) is the regularization parameter,

  • \(X^T\) is the transpose of \(X\),

  • \(I\) is the identity matrix of size \((d+1) \times (d+1)\) (optionally, to exclude the bias term from regularization, the first diagonal element \(I_{00}\) is set to 0; the implementation below regularizes the bias term as well),

  • \((X^TX + \lambda I)^{-1}\) is the inverse of the matrix \((X^TX + \lambda I)\).
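
As a quick illustration, here is a minimal NumPy sketch of this formula on small synthetic data (the toy data and variable names are made up for this example and are not part of the assignment code below). It also shows the optional step of setting \(I_{00} = 0\) so the bias term is not regularized, and uses np.linalg.solve rather than forming an explicit inverse, which evaluates the same expression more stably.

import numpy as np

# Toy data: 50 samples with 3 features, plus a leading bias column of ones (illustrative only)
rng = np.random.default_rng(0)
X = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 3))])
Y = rng.normal(size=50)
lambda_factor = 0.5

# Identity matrix with I[0, 0] = 0 so the bias term is not penalized
I = np.identity(X.shape[1])
I[0, 0] = 0.0

# theta = (X^T X + lambda * I)^{-1} X^T Y, computed without an explicit matrix inverse
theta = np.linalg.solve(X.T @ X + lambda_factor * I, X.T @ Y)
print(theta)  # (d + 1,) weight vector; theta[0] is the intercept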

Python Implementation

After seeing the problem, your classmate Alice immediately argues that we can apply a linear regression model, since the labels are numbers from 0-9, very similar to the example we learned in Unit 1. Though a little doubtful, you decide to give it a try and start simple by using the raw pixel values of each image as features.

Alice wrote a skeleton function run_linear_regression_on_MNIST in main.py, but she needs your help to complete the code and make the model work.

import mnist.utils as U
import numpy as np
import mnist.test_utils as T

# Jupyter notebooks cache imported modules, so we need to reload the module to pick up the latest changes
import importlib
importlib.reload(T)

# Create a link to the test_utils.py file
# ln -s /Users/n03an/Documents/projects/AI/edx/MIT-MachineLearning-MicroMasters/test/utils.py test_utils.py 

def run_linear_regression_on_MNIST(lambda_factor=1, dataset='mnist/Datasets/mnist.pkl.gz'):
    """
    Trains linear regression, classifies test data, computes test error on test set

    Returns:
        Final test error
    """
    # Load MNIST data
    train_x, train_y, test_x, test_y = U.get_MNIST_data(dataset)
    # Add bias dimension (adds a column of ones to the training features)
    # This bias term allows the linear regression model to learn an intercept term.
    train_x_bias = np.hstack([np.ones([train_x.shape[0], 1]), train_x])
    # Add bias dimension to test data
    test_x_bias = np.hstack([np.ones([test_x.shape[0], 1]), test_x])
    # Compute parameter / weights using closed form solution using training data
    theta = closed_form(train_x_bias, train_y, lambda_factor)
    # Compute test error using test data
    test_error = compute_test_error_linear(test_x_bias, test_y, theta)
    return test_error

def closed_form(X, Y, lambda_factor):
    """
    Computes the closed form solution of linear regression with L2 regularization

    Args:
        X - (n, d + 1) NumPy array (n datapoints each with d features plus the bias feature in the first dimension)
        Y - (n, ) NumPy array containing the labels (a number from 0-9) for each
            data point
        lambda_factor - the regularization constant (scalar)
    Returns:
        theta - (d + 1, ) NumPy array containing the weights of linear regression. Note that theta[0]
        represents the y-axis intercept of the model and therefore X[0] = 1
    """

    I = np.identity(X.shape[1])
    #  np.dot(X.T, X) == X.T @ X
    theta = np.linalg.inv((X.T @ X) + (lambda_factor * I)) @ X.T @ Y
    return theta

def compute_test_error_linear(test_x, Y, theta):
    # Look at individual predictions and compare to actual
    # (nsamples, nfeatures) = test_x.shape
    # for i in range(nsamples):
    #     prediction = np.round(np.dot(test_x[i], theta))
    #     print(f"Prediction: {prediction}, Actual: {Y[i]}")
    test_y_predict = np.round(np.dot(test_x, theta))
    test_y_predict[test_y_predict < 0] = 0
    test_y_predict[test_y_predict > 9] = 9
    return 1 - np.mean(test_y_predict == Y)

# Test
def check_closed_form():
    ex_name = "Closed form"
    X = np.arange(1, 16).reshape(3, 5)
    Y = np.arange(1, 4)
    lambda_factor = 0.5
    exp_res = np.array([-0.03411225,  0.00320187,  0.04051599,  0.07783012,  0.11514424])
    if T.check_array(
            ex_name, closed_form,
            exp_res, X, Y, lambda_factor):
        return

    T.log(T.green("PASS"), ex_name, "")

check_closed_form()

# Run linear regression on MNIST dataset with varying regularization constants
print('Linear Regression test_error with regularization constant as 1  =', run_linear_regression_on_MNIST(lambda_factor=1))
print('Linear Regression test_error with regularization constant as 0.1 =', run_linear_regression_on_MNIST(lambda_factor=0.1))
print('Linear Regression test_error with regularization constant as 0.01 =', run_linear_regression_on_MNIST(lambda_factor=0.01))
PASS Closed form 
Linear Regression test_error with regularization constant as 1  = 0.7697
Linear Regression test_error with regularization constant as 0.1 = 0.7698
Linear Regression test_error with regularization constant as 0.01 = 0.7702
def compute_test_error_linear(test_x, Y, theta):
    test_y_predict = np.round(np.dot(test_x, theta))
    test_y_predict[test_y_predict < 0] = 0
    test_y_predict[test_y_predict > 9] = 9
    return 1 - np.mean(test_y_predict == Y)

The function compute_test_error_linear in the provided code snippet calculates the test error for linear regression. Here’s a step-by-step explanation of how it works:

  1. Predict Test Labels: It uses the linear regression model’s weights (theta) to predict the labels for the test dataset. This is done by multiplying the test feature matrix (test_x) with the weight vector (theta). The result is a vector of predicted continuous values.

  2. Round Predictions: The continuous predictions are rounded to the nearest integer using np.round, as the MNIST labels are integers from 0 to 9. This step converts the continuous predictions into discrete class labels.

  3. Clip Predictions: The predictions are then clipped to the range [0, 9] using test_y_predict[test_y_predict < 0] = 0 and test_y_predict[test_y_predict > 9] = 9. This ensures that after rounding, any prediction outside the valid range of labels is corrected to the nearest valid label.

  4. Calculate Test Error: Finally, the test error is calculated by comparing the predicted labels (test_y_predict) with the actual labels (Y). The comparison test_y_predict == Y produces a boolean array, where True indicates a correct prediction and False indicates an incorrect prediction. The mean of this boolean array gives the accuracy (the proportion of correct predictions). Subtracting this accuracy from 1 gives the error rate, i.e., the proportion of incorrect predictions.

The function returns the final test error, which is the proportion of the test dataset that was incorrectly classified by the linear regression model.
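
For comparison, the same computation can be written more compactly with np.clip. This is just an equivalent sketch, not a replacement for the assignment code above:

def compute_test_error_linear_compact(test_x, Y, theta):
    # Predict continuous values, round to the nearest digit, then clip into the valid label range [0, 9]
    test_y_predict = np.clip(np.round(test_x @ theta), 0, 9)
    # Error rate is 1 minus the fraction of predictions that exactly match the true labels
    return 1 - np.mean(test_y_predict == Y)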

Note

Notice how the error rate stays around 77% irrespective of the regularization constant \(\lambda\). The function compute_test_error_linear measures the test error of a linear regression model applied to a classification problem (MNIST digit classification). Linear regression is not a good fit for classification tasks, especially multi-class problems like MNIST: it treats the digit labels as ordered continuous quantities, so rounding its output to the nearest digit discards most of the structure of the problem. What limits performance here is therefore the choice of model, not the regularization constant.