t-test

Notes are primarily taken from the textbook Mathematical Statistics and Data Analysis by John Rice, 3rd edition.

Usage:

Test whether two samples $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$ have the same mean. The assumption is that the $X_i$ and the $Y_j$ are independent iid samples from normal distributions with the same variance $\sigma^2$. (A variant of the test exists for the case where the variances are not equal.) If the underlying distributions are not normal but the sample sizes are large, the use of the t distribution can be justified by the CLT; at the same time, a t distribution with many degrees of freedom is close to the normal distribution.

Motivation:

Notice that the MLE of $\mu_x - \mu_y$ is $\bar{X} - \bar{Y}$. Since the $X_i$ and $Y_j$ are independent and normally distributed, $\bar{X} - \bar{Y}$ is a linear combination of independent normal random variables and is therefore itself normal, with distribution:

$$\bar{X} - \bar{Y} \sim N\left(\mu_x - \mu_y,\ \sigma^2 \left(\frac{1}{n} + \frac{1}{m}\right)\right)$$

So if the variance $\sigma^2$ is known, we can construct a confidence interval for $\mu_x - \mu_y$ based on the standard normal statistic

$$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sigma \sqrt{\frac{1}{n} + \frac{1}{m}}}$$
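As a rough illustration of the known-variance case, the sketch below builds the normal-based interval using only the Python standard library. The function name `z_interval`, the sample values, and the choice `sigma=2.0` are assumptions made up for this example, not taken from the textbook.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_interval(x, y, sigma, alpha=0.05):
    """Interval for mu_x - mu_y when the common standard deviation sigma is known."""
    n, m = len(x), len(y)
    diff = mean(x) - mean(y)
    # Standard deviation of X-bar minus Y-bar: sigma * sqrt(1/n + 1/m)
    se = sigma * sqrt(1 / n + 1 / m)
    # Upper alpha/2 point of the standard normal distribution
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return diff - z_crit * se, diff + z_crit * se


# Hypothetical data, with sigma assumed known
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0]
lo, hi = z_interval(x, y, sigma=2.0)
print(f"95% interval for mu_x - mu_y: ({lo:.3f}, {hi:.3f})")
```

The interval is centered at the observed difference in sample means, and its half-width depends only on $\sigma$, $n$, $m$ and $\alpha$.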

However, $\sigma^2$ is generally unknown and is estimated by the pooled sample variance

$$\begin{align*} & s_p^2 = \frac{(n-1)s_x^2 + (m-1)s_y^2}{m+n-2} \\ & s_x^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2 \\ & s_y^2 = \frac{1}{m-1}\sum_{j=1}^m (Y_j - \bar{Y})^2 \end{align*}$$
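A minimal sketch of the pooled variance computation, assuming the data are held in plain Python lists. The helper name `pooled_variance` and the sample values are illustrative only; `statistics.variance` already uses the $n-1$ divisor, so it matches $s_x^2$ and $s_y^2$.

```python
from statistics import variance


def pooled_variance(x, y):
    """Weighted average of the two sample variances, weighted by degrees of freedom."""
    n, m = len(x), len(y)
    return ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)


# Hypothetical data: s_x^2 = 2.5 (n = 5) and s_y^2 = 4.0 (m = 3)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0]
sp2 = pooled_variance(x, y)
print(sp2)  # (4 * 2.5 + 2 * 4.0) / 6 = 3.0
```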

Formal Definition:

The following statistic follows a $t$ distribution with $m+n-2$ degrees of freedom

$$t = \frac{(\bar{X}-\bar{Y}) - (\mu_x - \mu_y)}{s_p \sqrt{\frac{1}{n} + \frac{1}{m}}}$$
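The statistic can be computed directly from its definition. The sketch below assumes a hypothesized difference `delta0` (a parameter name introduced here for $\mu_x - \mu_y$, defaulting to 0) and uses made-up data; it relies only on the standard library.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(x, y, delta0=0.0):
    """Pooled two-sample t statistic and its degrees of freedom.

    delta0 is the hypothesized value of mu_x - mu_y (an illustrative parameter).
    """
    n, m = len(x), len(y)
    df = n + m - 2
    # Pooled sample variance s_p^2
    sp2 = ((n - 1) * variance(x) + (m - 1) * variance(y)) / df
    # Estimated standard error of X-bar minus Y-bar
    se = sqrt(sp2) * sqrt(1 / n + 1 / m)
    return (mean(x) - mean(y) - delta0) / se, df


# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0]
t_stat, df = two_sample_t(x, y)
print(f"t = {t_stat:.4f} on {df} degrees of freedom")
```

If SciPy is available, `scipy.stats.ttest_ind(x, y, equal_var=True)` should report the same statistic for `delta0 = 0`.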

The $100(1-\alpha)\%$ confidence interval for the difference in means can be constructed as

$$\begin{align*} & (\bar{X} - \bar{Y}) \pm t_{m+n-2}(\alpha / 2)\, s_{\bar{X}- \bar{Y}} \\ & s_{\bar{X}- \bar{Y}} = s_p \sqrt{\frac{1}{n} + \frac{1}{m}} \end{align*}$$
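A sketch of the interval computation follows. Because the Python standard library has no t quantile function, the critical value $t_{m+n-2}(\alpha/2)$ is passed in as an argument here; the value 2.447 used below is the tabulated upper 2.5% point for 6 degrees of freedom. The function name `t_interval` and the data are assumptions for illustration.

```python
from math import sqrt
from statistics import mean, variance


def t_interval(x, y, t_crit):
    """Interval (X-bar - Y-bar) +/- t_crit * s_p * sqrt(1/n + 1/m).

    t_crit is the upper alpha/2 point of the t distribution with
    n + m - 2 degrees of freedom, supplied by the caller.
    """
    n, m = len(x), len(y)
    sp2 = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)
    se = sqrt(sp2) * sqrt(1 / n + 1 / m)
    diff = mean(x) - mean(y)
    return diff - t_crit * se, diff + t_crit * se


# Hypothetical data with n + m - 2 = 6 degrees of freedom
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0]
lo, hi = t_interval(x, y, t_crit=2.447)  # tabulated t_6(0.025)
print(f"95% interval for mu_x - mu_y: ({lo:.3f}, {hi:.3f})")
```

With SciPy installed, the critical value could instead be obtained from `scipy.stats.t.ppf(1 - alpha / 2, df)`.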

To test $H_0: \mu_x = \mu_y$, set $\mu_x - \mu_y = 0$ in $t$. At level $\alpha$, the null hypothesis is rejected in favor of the alternative when:

$$\begin{align*} & \text{Two-sided}: |t| > t_{n+m-2}(\alpha / 2) \\ & \text{One-sided greater}: t > t_{n+m-2}(\alpha) \end{align*}$$
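The two rejection rules can be written as a small decision function. This is only a sketch: `reject_null` and its `alternative` argument are names chosen for this example, and the caller is assumed to supply the matching critical value ($t_{n+m-2}(\alpha/2)$ for the two-sided rule, $t_{n+m-2}(\alpha)$ for the one-sided rule).

```python
def reject_null(t_stat, t_crit, alternative="two-sided"):
    """Return True when the observed t statistic falls in the rejection region."""
    if alternative == "two-sided":
        return abs(t_stat) > t_crit
    if alternative == "greater":
        return t_stat > t_crit
    raise ValueError("alternative must be 'two-sided' or 'greater'")


# Tabulated critical values for 6 degrees of freedom:
# t_6(0.025) = 2.447 and t_6(0.05) = 1.943
two_sided = reject_null(-0.79, 2.447)           # |t| is below the cutoff
one_sided = reject_null(2.0, 1.943, "greater")  # t exceeds the cutoff
print(two_sided, one_sided)
```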
