Notes are primarily taken from the textbook Mathematical Statistics and Data Analysis by John Rice, 3rd edition.
Usage:
Test whether two samples $X_1, \dots, X_n$ and $Y_1, \dots, Y_m$ have the same mean. The assumption is that the $X_i$ and the $Y_j$ are independent iid samples from normal distributions with the same variance $\sigma^2$. (There is a variant of the method for the case where the variances are not assumed equal.) If the underlying distributions are not normal but the sample sizes are large, the use of the t distribution can be justified by the CLT. At the same time, a t distribution with many degrees of freedom is close to the standard normal distribution.
Motivation:
Notice that the MLE for $\mu_x - \mu_y$ is $\bar{X} - \bar{Y}$. Since the $X_i$ and $Y_j$ are normally distributed, $\bar{X} - \bar{Y}$ is a linear combination of independent normally distributed random variables, so it is itself normal, with distribution:
$$\bar{X} - \bar{Y} \sim N\left(\mu_x - \mu_y,\ \sigma^2\left(\frac{1}{n} + \frac{1}{m}\right)\right)$$
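The variance term comes from the independence of the two samples. As a short check, using only the standard fact that the mean of $n$ iid observations with variance $\sigma^2$ has variance $\sigma^2/n$:

$$
\begin{aligned}
\mathrm{Var}(\bar{X} - \bar{Y}) &= \mathrm{Var}(\bar{X}) + \mathrm{Var}(\bar{Y}) \\
&= \frac{\sigma^2}{n} + \frac{\sigma^2}{m} \\
&= \sigma^2\left(\frac{1}{n} + \frac{1}{m}\right)
\end{aligned}
$$

The variances add (rather than subtract) because $\mathrm{Var}(-\bar{Y}) = \mathrm{Var}(\bar{Y})$ and the covariance between $\bar{X}$ and $\bar{Y}$ is zero.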
So if the variance $\sigma^2$ were known, we could construct a confidence interval for $\mu_x - \mu_y$ based on the standard normal statistic
$$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sigma\sqrt{\frac{1}{n} + \frac{1}{m}}}$$
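As an illustration of the known-variance case, the interval $(\bar{X} - \bar{Y}) \pm z_{\alpha/2}\,\sigma\sqrt{1/n + 1/m}$ can be sketched with the Python standard library. The function name `z_interval_diff_means`, the sample values, and the choice `sigma=0.5` are made up for this example and do not come from the textbook.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_interval_diff_means(x, y, sigma, confidence=0.95):
    """Confidence interval for mu_x - mu_y, assuming the common
    standard deviation sigma is known (hypothetical helper)."""
    n, m = len(x), len(y)
    diff = mean(x) - mean(y)               # point estimate of mu_x - mu_y
    se = sigma * sqrt(1 / n + 1 / m)       # standard deviation of Xbar - Ybar
    # Upper alpha/2 quantile of the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return diff - z * se, diff + z * se


# Made-up data, for illustration only
x = [5.1, 4.9, 5.6, 5.8, 5.0]
y = [4.4, 4.7, 5.0, 4.2]
low, high = z_interval_diff_means(x, y, sigma=0.5)
print(f"95% CI for mu_x - mu_y: ({low:.3f}, {high:.3f})")
```

If the interval excludes 0, the data are inconsistent (at the chosen level) with the two means being equal, under the assumption that `sigma` really is the common standard deviation.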
However, $\sigma^2$ is generally unknown and is estimated by the pooled sample variance