Point Estimation
If a parameter \(\theta\in\Theta\) can be expressed in terms of the moments of the population \(\mathbb{P}_{X;\,\theta},\) then \(\theta\) can be estimated by the method of moments, which equates the population moments with their corresponding sample moments:
- Given a random sample \(\underline{X}\) of size \(n\) distributed according to the population \(\mathbb{P}_{X;\,\theta}\) and assuming \(\theta\) contains \(k\) parameters to be estimated, the first \(k\) sample moments \(\frac{1}{n}\sum_{i=1}^n X_i^j,\) \(j=1,\dots,k,\) are calculated.
- The sample moments are equated with their corresponding population moments.
- The system of equations is solved to find the unknown \(\theta.\)
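As a minimal sketch of the three steps above, assume (purely for illustration) that the population is a gamma distribution with shape \(\alpha\) and scale \(\beta,\) so that \(\mathbb{E}\left[X\right]=\alpha\beta\) and \(\mathbb{E}\left[X^2\right]=\alpha\beta^2\left(1+\alpha\right).\) Equating these with the first two sample moments and solving gives \(\hat{\beta}=\left(m_2-m_1^2\right)/m_1\) and \(\hat{\alpha}=m_1^2/\left(m_2-m_1^2\right).\) The function name `gamma_method_of_moments`, the seed and the simulated sample are hypothetical choices, not part of the text above.

```python
import random


def gamma_method_of_moments(sample):
    """Method-of-moments estimates (shape, scale) for an assumed gamma population."""
    n = len(sample)
    # Step 1: first k = 2 sample moments.
    m1 = sum(sample) / n
    m2 = sum(x * x for x in sample) / n
    # Steps 2 and 3: equate with E[X] = shape * scale and
    # E[X^2] = shape * scale^2 * (1 + shape), then solve for the parameters.
    spread = m2 - m1 ** 2  # equals shape * scale^2 under the model
    if spread <= 0:
        raise ValueError("sample has no spread; the equations have no solution")
    scale = spread / m1
    shape = m1 ** 2 / spread
    return shape, scale


# Simulated data with known parameters (illustrative assumption):
# random.gammavariate(alpha, beta) takes the shape first and the scale second.
rng = random.Random(0)
sample = [rng.gammavariate(2.0, 3.0) for _ in range(50_000)]
shape_hat, scale_hat = gamma_method_of_moments(sample)
print(f"shape estimate: {shape_hat:.3f}, scale estimate: {scale_hat:.3f}")
```

With a large simulated sample the estimates should land close to the values used to generate the data, although the method of moments does not guarantee this for small samples.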
Given an observed random sample \(\underline{x}\) of size \(n\) from a population \(\mathbb{P}_{X;\,\theta}\) with pmf or pdf \(f_{X;\,\theta}\left(x;\,\theta\right),\) the likelihood function \(\mathcal{L}\left(\theta\mid\underline{x}\right)\) is the joint pmf or pdf of the sample, evaluated at \(\underline{x}\) and regarded as a function of \(\theta:\)
\[\mathcal{L}\left(\theta\mid\underline{x}\right)=f_{\underline{X};\,\theta}\left(\underline{x};\,\theta\right)\]If \(\underline{X}\) contains iid random variables, the joint pmf or pdf factorises and the likelihood can be expressed as:
\[\mathcal{L}\left(\theta\mid\underline{x}\right)=\prod_{i=1}^n f_{X;\,\theta}\left(x_i;\,\theta\right)\]In the case where \(\underline{X}\) contains independent but not identically distributed random variables, the likelihood is:
\[\mathcal{L}\left(\theta\mid\underline{x}\right)=\prod_{i=1}^n f_{X_i;\,\theta}\left(x_i;\,\theta\right)\]The maximum likelihood estimate (MLE) \(\hat{\theta}\left(\underline{x}\right)\) is the value of \(\theta\) that maximises the likelihood for a given \(\underline{x}:\)
\[\hat{\theta}\left(\underline{x}\right)=\operatorname*{arg\,max}_{\theta\in\Theta}\,\mathcal{L}\left(\theta\mid\underline{x}\right)\]The likelihood is a product of \(n\) factors and often contains exponential terms, so it is usually easier to work with its logarithm. The log-likelihood function is defined as:
\[\ell\left(\theta\mid\underline{x}\right)=\log_e\mathcal{L}\left(\theta\mid\underline{x}\right)\]Because the logarithm is a strictly increasing function, \(\hat{\theta}\left(\underline{x}\right)\) also maximises the log-likelihood \(\ell\left(\theta\mid\underline{x}\right),\) which is often more practical to maximise than the likelihood itself:
\[\hat{\theta}\left(\underline{x}\right)=\operatorname*{arg\,max}_{\theta\in\Theta}\,\ell\left(\theta\mid\underline{x}\right)\]The maximum likelihood estimator (also abbreviated MLE) is the estimator \(\hat{\theta}\left(\underline{X}\right),\) that is, the same function applied to the random sample \(\underline{X}\) rather than to the observed values \(\underline{x}.\)
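The definitions above can be sketched numerically. The example assumes, for illustration only, an iid exponential population with rate \(\lambda,\) for which the log-likelihood is \(n\log_e\lambda-\lambda\sum_{i=1}^n x_i\) and setting its derivative to zero gives the closed-form MLE \(\hat{\lambda}=n/\sum_{i=1}^n x_i.\) The helper names (`likelihood`, `log_likelihood`, `maximise`), the search interval and the simulated data are hypothetical. The golden-section search is just one simple way to maximise a unimodal function and is not prescribed by the text.

```python
import math
import random


def likelihood(rate, sample):
    """Product of exponential densities rate * exp(-rate * x) over the sample."""
    return math.prod(rate * math.exp(-rate * x) for x in sample)


def log_likelihood(rate, sample):
    """Exponential log-likelihood: n * log(rate) - rate * sum(x)."""
    return len(sample) * math.log(rate) - rate * sum(sample)


def maximise(f, lo, hi, tol=1e-8):
    """Golden-section search for the maximiser of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)  # lower interior point
        d = a + inv_phi * (b - a)  # upper interior point
        if f(c) > f(d):
            b = d  # the maximiser lies in [a, d]
        else:
            a = c  # the maximiser lies in [c, b]
    return (a + b) / 2


# Simulated data with a known rate (illustrative assumption).
rng = random.Random(1)
sample = [rng.expovariate(2.0) for _ in range(5_000)]

rate_closed = len(sample) / sum(sample)  # closed-form MLE
rate_numeric = maximise(lambda r: log_likelihood(r, sample), 1e-6, 50.0)

print(f"closed-form MLE: {rate_closed:.6f}")
print(f"numerical MLE:   {rate_numeric:.6f}")
# The raw likelihood underflows in floating point for a sample this large,
# while the log-likelihood stays finite.
print(f"likelihood at the MLE:     {likelihood(rate_closed, sample)}")
print(f"log-likelihood at the MLE: {log_likelihood(rate_closed, sample):.2f}")
```

The two estimates should agree to within the tolerance of the search. The last two printed lines illustrate one practical reason for preferring the log-likelihood: a product of several thousand densities is too small to represent as a floating-point number, whereas the sum of their logarithms is not.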