07 Univariate Gaussians

07 Univariate Gaussians#

A univariate Gaussian or normal distributions can be completely determined by its mean and variance.

Gaussian distributions can be applied to a large numbers of problems because of the central limit theorem (CLT). The CLT posits that when a large number of independent and identically distributed ((i.i.d.) random variables are added, the cumulative distribution function (cdf) of their sum is approximated by the cdf of a normal distribution.

Recall the probability density function of the univariate Gaussian with mean \(mu\) and variance \(\sigma ^2\) , \(\mathcal{N}(\mu , \sigma ^2)\) :

\[ \displaystyle f_ X(x) = \frac{1}{\sqrt{2\pi \sigma ^2}} e^{-(x - \mu )^2/(2\sigma ^2)}. \]

Question#

Let \( \displaystyle X\sim \mathcal{N}\left(\mu ,\sigma ^2\right)\), i.e. the pdf of X is

\[ f_ X(x)=\frac{1}{\sigma \sqrt{2\pi }} \exp \left(-\frac{(x-\mu )^2}{2 \sigma ^2}\right). \]

Let \(Y=2X\). Write down the pdf of the random variable . (Your answer should be in terms of \(y\), \(\sigma\) and \(mu\).

Explanation If \( \displaystyle X\sim \mathcal{N}\left(\mu ,\sigma ^2\right)\) and \( \displaystyle Y = 2X\sim \mathcal{N}\left(2\mu ,4\sigma ^2\right)\)

\[ \frac {1}{2\sigma\sqrt{2\pi}}\exp(-\frac {(y-2\mu)^2}{2(4\sigma^2)}) \]

The maximum of a Probability Density Function (PDF) is the highest value that the PDF can take. This maximum value depends on the specific distribution.

For example:

  1. For a Uniform Distribution on the interval [a, b], the PDF is constant and equal to 1 / (b - a). So, the maximum value of the PDF is 1 / (b - a).

  2. For a Normal (Gaussian) Distribution with mean μ and standard deviation σ, the PDF is a bell-shaped curve and its maximum value is at the mean μ. The maximum value is 1 / * sqrt(2π)).

It’s important to note that the maximum value of a PDF does not represent a probability. The PDF gives the density of probability at a given point, and the actual probability of a range of values is given by the integral of the PDF over that range. The maximum value of the PDF can be greater than 1, especially for continuous distributions where the range of possible outcomes is small.

import numpy as np

def normal_pdf(x, mu, sigma):
    sqrt_two_pi = np.sqrt(2 * np.pi)
    return (np.exp(-(x-mu)**2 / 2 / sigma**2) / (sqrt_two_pi * sigma))

print(normal_pdf(2, 1, 2) - normal_pdf(0.5, 1, 2))
-0.017301395019274884
from scipy.stats import norm

mu = 1
sigma = np.sqrt(2)  # standard deviation is the square root of variance

# Calculate the probability that X is between 0.5 and 2
prob = norm.cdf(2, mu, sigma) - norm.cdf(0.5, mu, sigma)

print(prob)
print(norm.cdf(2, mu, sigma))
print(normal_pdf(2, mu, sigma))
0.3984131339906417
0.7602499389065233
0.21969564473386122

Cumulative distribution function#

The Normal Distribution Function and the Cumulative Distribution Function (CDF) are related but serve different purposes:

  1. Normal Distribution Function (PDF): This is also known as the Probability Density Function. It describes the probability density of a continuous random variable at a specific point. The value of the PDF at any two points can be used to compute the probability that the random variable will fall within that interval. However, the PDF itself does not give the probability of an event but rather the density of the probability at that point. The integral of the PDF over an interval gives the probability that the random variable takes a value within that interval.

  2. Cumulative Distribution Function (CDF): The CDF for a random variable is defined as the probability that the variable takes a value less than or equal to a certain value. The CDF at a particular point is the area under the PDF curve to the left of that point. In other words, it’s the cumulative sum of the probabilities of all outcomes up to and including that point.

When calculating the probability of a random variable falling within a certain range, it’s often easier to use the CDF. For a range [a, b], the probability is given by CDF(b) - CDF(a). This is because CDF(b) gives the probability of the variable being less than or equal to b, and CDF(a) gives the probability of the variable being less than or equal to a. The difference between these two probabilities is the probability of the variable falling within the range [a, b].

Question#

Let \( \displaystyle X\sim \mathcal{N}\left(1 ,2\right)\) i.e. random variable X is normally distributted with mean 1 and variance 2. What is the probability that \(X \in [0.5,2]\)

To calculate the probability that a normally distributed random variable falls within a certain range, we need to integrate the probability density function (PDF) over that range. In Python, you can use the scipy.stats.norm module, which provides a cdf function to calculate the cumulative distribution function.

from scipy.stats import norm

mu = 1
sigma = np.sqrt(2)  # standard deviation is the square root of variance

# Calculate the probability that X is between 0.5 and 2
prob = norm.cdf(2, mu, sigma) - norm.cdf(0.5, mu, sigma)

print(prob)
0.3984131339906417