2. Maximum likelihood estimation#

2.1. Reading materials#

2.2. Definition#

  • In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data.

  • Therefore, we must first assume a distribution. Taking the normal distribution as an example, we assume

    1. The mean (average) has the highest probability density

    2. The distribution is roughly symmetric around the mean (no skewness)


2.3. Steps in MLE#

  1. Write the likelihood function, where

    • \(x_i\) is the observed value

    • \(\theta_j\) is a parameter of the assumed distribution

    • \(f(x_i;\theta)\) is the probability function

    \[ L(\theta) = L(x_1,x_2,...,x_n;\theta_1,\theta_2,...,\theta_m) = \prod_{i=1}^n f(x_i;\theta_1,\theta_2,...,\theta_m) \]


  2. Take the logarithm of the likelihood function

    • The goal is to maximize the likelihood function, but the likelihood function is a product of many probabilities, which makes the derivatives hard to compute.

    • Taking the logarithm does not change the location of the maximum or minimum, and it turns the product into a summation.


  3. Take the partial derivative with respect to each distribution parameter \(\theta\)

    • Solve the equation below, then check which solution makes \(ln(L)\) attain its maximum
      
      • If there is no solution (no stationary point), the maximum is attained at a boundary of the parameter range, i.e., at \(min(\theta)\) or \(max(\theta)\)

\[ \frac{\partial ln(L)}{\partial \theta} = 0 \]
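
The same recipe can also be carried out numerically. Below is a minimal sketch, assuming numpy and scipy are available and using synthetic data as a hypothetical stand-in for real observations: it builds the negative log-likelihood of a normal model and minimizes it (equivalent to maximizing \(ln(L)\)).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Synthetic observations (hypothetical stand-in for real data)
rng = np.random.default_rng(0)
x = rng.normal(loc=28, scale=2, size=500)

def neg_log_likelihood(params, data):
    """Negative of ln(L); scipy only minimizes, so we flip the sign."""
    mu, sigma = params
    if sigma <= 0:  # sigma must stay positive
        return np.inf
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

# Steps 2-3 done numerically: maximize ln(L) over (mu, sigma)
result = minimize(neg_log_likelihood, x0=[20.0, 1.0], args=(x,),
                  method="Nelder-Mead")
mu_hat, sigma_hat = result.x
print(mu_hat, sigma_hat)  # close to the true values 28 and 2
```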

2.4. MLE for normal distribution#

2.4.1. Understand the parameter of normal distribution#

\[ P(x|\mu,\sigma) = \frac{1}{\sqrt{2\pi}\sigma} e ^{-\frac{(x-\mu)^2}{2\sigma^2}} \]
  • What is the probability density of observing \(x=32\) from a normal distribution \(N(\mu=28,\sigma=2)\)?

\[ P(x=32|\mu=28,\sigma=2) = \frac{1}{\sqrt{2\pi}*2} e ^{-\frac{(32-28)^2}{2*2^2}} = 0.03 \]
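
As a quick check, assuming scipy is available, the same value can be computed directly:

```python
from scipy.stats import norm

# Density of x = 32 under N(mu=28, sigma=2); matches the hand calculation above
print(norm.pdf(32, loc=28, scale=2))  # ~0.027, i.e., 0.03 after rounding
```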

2.4.2. Get logarithm of likelihood function#

  1. \[ ln(\frac{1}{\sqrt{2\pi}\sigma} e ^{-\frac{(x-\mu)^2}{2\sigma^2}}) \to ln(\frac{1}{\sqrt{2\pi}\sigma}) + ln(e ^{-\frac{(x-\mu)^2}{2\sigma^2}}) \]
  2. \[ ln(\frac{1}{\sqrt{2\pi}\sigma}) + ln(e ^{-\frac{(x-\mu)^2}{2\sigma^2}}) \to -\frac{1}{2}ln(2\pi\sigma^2) - \frac{(x-\mu)^2}{2\sigma^2} \]
  3. \[ -\frac{1}{2}ln(2\pi\sigma^2) - \frac{(x-\mu)^2}{2\sigma^2} \to -\frac{1}{2}ln(2\pi) - ln(\sigma) - \frac{(x-\mu)^2}{2\sigma^2} \]
  4. Sum the logarithm of the likelihood function over all observations

\[ ln[L(\mu,\sigma|x_1,x_2,...,x_n)] = \sum_{i=1}^n ln(f(x_i)) \]
\[ \sum_{i=1}^n ln(f(x_i)) = -\frac{n}{2}ln(2\pi) - n*ln(\sigma) - \frac{(x_1-\mu)^2}{2\sigma^2}-...-\frac{(x_n-\mu)^2}{2\sigma^2} \]
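
To sanity-check this expansion, the sketch below (assuming numpy and scipy, with a small hypothetical sample) evaluates both the expanded formula and the direct sum of log-densities; the two agree:

```python
import numpy as np
from scipy.stats import norm

x = np.array([27.1, 29.4, 28.2, 31.0, 26.5])  # hypothetical observations
mu, sigma = 28.0, 2.0
n = len(x)

# Expanded form from step 4
ll_formula = (-n / 2 * np.log(2 * np.pi)
              - n * np.log(sigma)
              - np.sum((x - mu) ** 2) / (2 * sigma ** 2))

# Direct sum of log-densities
ll_direct = np.sum(norm.logpdf(x, loc=mu, scale=sigma))

print(np.isclose(ll_formula, ll_direct))  # True
```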

2.4.3. Estimate \(\mu\) and \(\sigma\)#

  1. Get the partial derivative with respect to \(\mu\)

\[ \frac{\partial}{\partial \mu} ln[L(\mu,\sigma|x_1,x_2,...,x_n)] = 0-0+ \frac{(x_1-\mu)}{\sigma^2}+...+\frac{(x_n-\mu)}{\sigma^2} \]
\[ \frac{\partial}{\partial \mu} ln[L(\mu,\sigma|x_1,x_2,...,x_n)] = \frac{1}{\sigma^2}[(x_1+...+x_n)-n*\mu] \]
  2. Set the partial derivative to 0 and solve for \(\mu\)

\[ \frac{1}{\sigma^2}[(x_1+...+x_n)-n*\mu] = 0 \to \mu = \frac{(x_1+...+x_n)}{n} \]
  3. Get the partial derivative with respect to \(\sigma\)

\[ \frac{\partial}{\partial \sigma} ln[L(\mu,\sigma|x_1,x_2,...,x_n)] = - \frac{n}{\sigma} + \frac{1}{\sigma^3}[(x_1-\mu)^2+...+(x_n-\mu)^2] \]
  4. Set the partial derivative to 0 and solve for \(\sigma\)

\[ -\frac{n}{\sigma} + \frac{1}{\sigma^3}[(x_1-\mu)^2+...+(x_n-\mu)^2] = 0 \to -n + \frac{1}{\sigma^2}[(x_1-\mu)^2+...+(x_n-\mu)^2] = 0 \]
\[ -n + \frac{1}{\sigma^2}[(x_1-\mu)^2+...+(x_n-\mu)^2] = 0 \to n * \sigma^2 = [(x_1-\mu)^2+...+(x_n-\mu)^2] \]
\[ n * \sigma^2 = [(x_1-\mu)^2+...+(x_n-\mu)^2] \to \sigma = \sqrt{\frac{[(x_1-\mu)^2+...+(x_n-\mu)^2]}{n}} \]
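
Both closed-form estimates are easy to verify in code. A minimal check, assuming numpy and a synthetic sample: note that the MLE of \(\sigma\) divides by \(n\), not \(n-1\), so it matches np.std with ddof=0.

```python
import numpy as np

x = np.random.default_rng(1).normal(loc=28, scale=2, size=1000)

mu_hat = np.mean(x)                               # MLE of mu: the sample mean
sigma_hat = np.sqrt(np.mean((x - mu_hat) ** 2))   # MLE of sigma: n in the denominator

print(np.isclose(sigma_hat, np.std(x, ddof=0)))   # True: same n-denominator
print(mu_hat, sigma_hat)                          # close to 28 and 2
```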