Linear Algebra in Machine Learning Problem space#

Questions in n03@n’s head

  • Did linear algebra started out as pure mathematics and then found a use case?

  • Maths such as linear algebra is used in so many diverse applications, which one started it first? Is there an alternate way to solve without using Maths?

  • When studing Maths in school or universy, it always sounds super accademics without any applications. Why?

History of Linear Algebra#

The development of linear algebra has a rich history involving numerous mathematicians and spanning several centuries. Here are some key milestones and figures in its development:

  1. Ancient Civilizations:

  • Babylonians (circa 3000 BCE): They used methods that can be interpreted as solving systems of linear equations.

  • Chinese (circa 200 BCE): “The Nine Chapters on the Mathematical Art” contains methods for solving systems of linear equations using a matrix-like approach.

  1. Determinants and Early Matrix Concepts:

  • Gottfried Wilhelm Leibniz (1646-1716): One of the earliest mathematicians to develop the concept of determinants.

  • Gabriel Cramer (1704-1752): Known for Cramer’s Rule, a method for solving systems of linear equations using determinants.

  • Alexandre-Théophile Vandermonde (1735-1796): Worked on determinants and developed what is now known as the Vandermonde matrix.

  1. Vector Spaces and Algebraic Structures:

  • Hermann Grassmann (1809-1877): Developed the concept of vector spaces and linear algebra in his work “Die Lineale Ausdehnungslehre” (The Theory of Linear Extension).

  • William Rowan Hamilton (1805-1865): Introduced quaternions, a mathematical system that extended complex numbers and laid the groundwork for vector space theory.

  1. Formalization of Linear Algebra:

  • Giuseppe Peano (1858-1932): Made significant contributions to the axiomatization of vector spaces and linear algebra.

  • David Hilbert (1862-1943): His work on functional analysis and Hilbert spaces was crucial in the formal development of linear algebra.

  1. Matrix Theory and Further Development:

  • Arthur Cayley (1821-1895): One of the founders of matrix theory, Cayley introduced the concept of the matrix as an algebraic object.

  • James Joseph Sylvester (1814-1897): Worked closely with Cayley and contributed to the development of matrix theory and invariant theory.

  1. Applications in Physics and Engineering:

  • Werner Heisenberg (1901-1976): Used matrix mechanics, a form of quantum mechanics that employs matrices, to describe quantum systems.

  • John von Neumann (1903-1957): Applied linear algebra in functional analysis and quantum mechanics.

  1. Modern Development and Computational Methods:

In the 20th and 21st centuries, linear algebra has become foundational in computer science, data science, machine learning, and many engineering disciplines. It is used for algorithms, optimizations, and numerous applications.

Almost any engineering problem is solved in the real world by breaking it down into a large collection of linear pieces, and then using some form of finite element analysis to solve it. This gives an approximate solution to the true, non-linear solution. This is how mechanical engineers analyse the complex stresses on a bike frame, car frame, or air frame in practice, for example.

History of Machine Learning#

Here is the history of various machine learning techniques, their inventors, the years they were developed, the problems they solved along with real-world examples, the key algorithms, and the mathematical subjects involved.

1950s-1960s: Early Foundations#

  1. Perceptron (1958)

  • Inventor: Frank Rosenblatt

  • Problem Solved: Binary classification

  • Real-World Example: Early character recognition systems

  • Key Algorithm: Perceptron learning algorithm

  • Mathematics: Linear algebra (Vectors, Dot Products, Matrix Operations), calculus (Differentiation, Gradients), Probability and Statistics (Probability distributions, loss function), Optimization (Gradient Descent, Learning Rate).

1970s-1980s: Statistical Methods and Neural Networks#

  1. Linear Regression (early 19th century)

  • Inventors: Adrien-Marie Legendre, Carl Friedrich Gauss

  • Problem Solved: Predicting continuous outcomes

  • Real-World Example: Predicting housing prices based on features like size, location, and number of rooms

  • Key Algorithm: Ordinary Least Squares (OLS)

  • Mathematics: Linear algebra (Vectors, Dot Products, Matrix Operations, Determinants and inverses), calculus (Differentiation, Gradients descent), Probability and Statistics etc

  1. Logistic Regression (1944)

  • Inventor: David Cox

  • Problem Solved: Binary classification

  • Real-World Example: Predicting whether an email is spam or not

  • Key Algorithm: Logistic regression model

  • Mathematics: Probability (logistic function), statistics - (likelihood estimation)

  1. k-Nearest Neighbors (k-NN) (1967)

  • Inventors: Evelyn Fix and Joseph Hodges

  • Problem Solved: Classification and regression

  • Real-World Example: Recommending products to users based on their nearest neighbors in feature space

  • Key Algorithm: k-NN algorithm

  • Mathematics: Geometry (Euclidean distance), distance metrics

  1. Backpropagation (1986)

  • Inventors: David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams

  • Problem Solved: Training deep neural networks

  • Real-World Example: Image recognition tasks, such as identifying objects in photographs

  • Key Algorithm: Backpropagation algorithm

  • Mathematics: Calculus (chain rule for gradients), linear algebra (matrix multiplications)

1990s: Expansion of Machine Learning Techniques#

  1. Support Vector Machines (SVMs) (1995)

  • Inventor: Vladimir Vapnik

  • Problem Solved: Classification and regression

  • Real-World Example: Handwriting recognition for digit classification

  • Key Algorithm: SVM algorithm

  • Mathematics: Linear algebra (support vectors), calculus (Lagrange multipliers), optimization (maximizing margin)

  1. Ensemble Methods (1996)

  • Inventors: Leo Breiman (Bagging), Yoav Freund and Robert Schapire (Boosting)

  • Problem Solved: Improving predictive performance

  • Real-World Example: Combining multiple decision trees in random forests for better prediction accuracy in credit scoring

  • Key Algorithms: Bagging (Bootstrap Aggregating), AdaBoost (Adaptive Boosting)

  • Mathematics: Probability (bootstrap sampling), statistics (aggregating predictions)

2000s: Rise of Probabilistic Models and Deep Learning#

  1. Random Forests (2001)

  • Inventor: Leo Breiman

  • Problem Solved: Classification, regression, and feature importance

  • Real-World Example: Predicting customer churn in telecom by analyzing user behavior

  • Key Algorithm: Random forest algorithm

  • Mathematics: Probability (random sampling), statistics (aggregating multiple decision trees)

  1. Hidden Markov Models (HMMs) (1960s)

  • Inventor: Leonard E. Baum and colleagues

  • Problem Solved: Sequential data modeling

  • Real-World Example: Speech recognition systems, where spoken words are treated as sequences of sounds

  • Key Algorithms: Forward-backward algorithm, Viterbi algorithm

  • Mathematics: Probability (state transitions), statistics (likelihood estimation), linear algebra (matrix operations)

  1. Deep Belief Networks (DBNs) (2006)

  • Inventors: Geoffrey Hinton, Simon Osindero, Yee-Whye Teh

  • Problem Solved: Unsupervised learning, feature extraction

  • Real-World Example: Image and voice recognition systems

  • Key Algorithm: Greedy layer-wise training

  • Mathematics: Linear algebra (matrix operations), calculus (gradient-based optimization), probability (Bayesian networks)

2010s: Advanced Techniques and Modern Developments#

  1. Convolutional Neural Networks (CNNs)

  • Inventors: Yann LeCun, Yoshua Bengio, Geoffrey Hinton

  • Problem Solved: Image and video recognition

  • Real-World Example: Automated tagging of friends in Facebook photos

  • Key Algorithms: Convolutional layers, pooling layers

  • Mathematics: Linear algebra (convolution operations), calculus (gradient descent)

  1. Recurrent Neural Networks (RNNs)

  • Inventors: Jürgen Schmidhuber, Sepp Hochreiter (LSTM) Problem Solved: Sequential data modeling

  • Real-World Example: Language translation applications like Google Translate

  • Key Algorithms: RNNs, Long Short-Term Memory (LSTM) networks

  • Mathematics: Linear algebra (matrix multiplications), calculus (backpropagation through time), dynamic programming (sequence optimization)

  1. Reinforcement Learning (RL)

  • Inventors: Richard Sutton, Andrew Barto

  • Problem Solved: Decision making and control problems

  • Real-World Example: AlphaGo, the system that defeated human champions in the game of Go

  • Key Algorithms: Q-learning, Deep Q-Network (DQN)

  • Mathematics: Probability (Markov decision processes), statistics (reward estimation), dynamic programming (policy optimization)

  1. Generative Adversarial Networks (GANs) (2014)

  • Inventor: Ian Goodfellow

  • Problem Solved: Generative modeling

  • Real-World Example: Generating realistic synthetic images, such as DeepFake videos

  • Key Algorithm: Minimax game between generator and discriminator

  • Mathematics: Probability (adversarial loss), statistics (distribution estimation), optimization (gradient descent) Modern Applications and Research Areas

  1. Natural Language Processing (NLP)

  • Key Techniques: Word2Vec, GloVe, transformers (e.g., BERT, GPT)

  • Problem Solved: Text understanding, translation, and generation

  • Real-World Example: Chatbots and virtual assistants like Amazon Alexa and Google Assistant

  • Mathematics: Linear algebra (vector embeddings), probability (language models), statistics (frequency analysis)

  1. Transfer Learning

  • Key Techniques: Pre-trained models, fine-tuning

  • Problem Solved: Efficient model training on new tasks with limited data

  • Real-World Example: Using pre-trained image recognition models to identify specific items in retail inventory

  • Mathematics: Linear algebra (feature transfer), calculus (fine-tuning), optimization (loss minimization)

  1. AutoML and Meta-Learning

  • Key Techniques: Automated model selection and hyperparameter tuning

  • Problem Solved: Simplifying the machine learning pipeline

  • Real-World Example: Google AutoML, which automates the design of machine learning models

  • Mathematics: Optimization (search algorithms), probability (model selection), statistics (performance evaluation)

Jeremy Kun#

Here’s just a fraction of what you can do with linear algebra

Ranking in search engines:#

The most high-profile use of linear algebra (whether or not you know it) was in the creation of Google. Their original ranking algorithm, which has since surely become far more complicated, used a lot of linear algebra to rank which webpages should show up first. More generally, any time you want to analyze a random walk in a network, you’re likely to need some linear algebra. Here’s a series of blog posts I wrote deriving the method.

Linear programming:#

The most widely used application of linear algebra is definitely optimization, and the most widely used kind of optimization is linear programming. You can optimize budgets, your diet, and your route to work using linear programming, and this only scratches the surface of the applications. Here’s a series on the mathematics behind linear programming. The primary technique for solving them, called the simplex algorithm, is essentially a beefed up Gaussian elimination.

Error correcting codes:#

Another unseen but widespread use of linear algebra is in coding theory. The problem is to encode data in such a way that if the encoded data is tampered with a little bit, you can still recover the unencoded data. Such schemes are called error correcting codes, and the simplest ones encode data as vectors in a vector space. Error correcting codes are used in DVDs to prevent scratches from ruining a movie. They’re also used on deep space probes to transmit data back to Earth, and they allowed us to get the first ever close-up pictures of Saturn and Jupiter. Here’s an article describing the simplest kind of error correcting code, the Hamming code.

Signal analysis:#

The field of signal analysis gives one massively useful tools for encoding, analyzing, and manipulating “signals” that can be audio, images, video, or things like x-rays and light refracting through a crystal. The simplest way to understand the Fourier transform is as a linear map that performs a change of basis. Fourier analysis has even been used to make art. Here’s the first post in a series deriving Fourier analysis from scratch, although much of it can be abbreviated, skipped, or skimmed if you have a strong understanding of linear algebra. A discrete cousin of Fourier analysis has been part of many theoretical techniques in computer science as well.

Graphics:#

Pretty much all graphics innovation since computers have existed have come from video games and movies. The central part of graphics is projecting a three-dimensional scene onto a two-dimensional screen. Projection is already a linear map. On top of that, rotations, scaling, and perspective are all implemented and analyzed properly using linear algebra.

Facial recognition:#

A cool (but not the best) method for doing automated facial recognition uses a linear algebraic technique called principal component analysis. Essentially this is just finding a particularly good basis to represent a database of face images, and using eigenvectors (“eigenfaces”) to rebuild the images. Here is an article describing that method without prior knowledge of PCA, and a more general article showing PCA for any dataset. Here is a picture of what an “eigenface” might look like.

image.png

Prediction:#

The simplest models of prediction are linear models, and these are developed and understood with linear algebra. Here, for example is an article describing how to do linear regression.

Community detection:#

The leading methods for detecting communities in networks of people (or any other kind of network) use a linear-algebraic tool called the spectrum of the network. Like ranking webpages, community detection techniques also rely on random walks. Here is an overview of some notions of community detection.

Greedy algorithms:#

Greedy algorithms are characterized by a kind of generalization of linear systems called a matroid. In other words, every problem solvable by a greedy algorithm can be represented as a matroid and every matroid can be optimized by a greedy algorithm. Understanding linear algebra is not a requirement to understand matroids, but it makes the process much easier. Here is an article proving what I just said.

Quantum computing:#

All of quantum computing is literally just linear algebra, as is general quantum mechanics. You can understand how quantum computers can break cryptosystems without any physics, as long as you understand linear algebra. Here is the first of a series of articles (in progress) doing this, and it gives you an idea of how linear algebra is the primary tool.