Introductory Statistical Analysis – “t-test”

A t-test is a statistical test used to determine whether there is a significant correlation between two sets of data. It works by first formulating a null hypothesis and then running the t-test, and in the process either validating or invalidating the null hypothesis. The test is run on sample sizes that are assumed to be small and normally distributed, and relies on an estimate of standard deviation.

History of the T-test

The t-test was developed by William Gosset in Dublin in the year 1908, a statistician working for the popular Guinness-brand brewery in Ireland. Here, he used the method of the t-test to improve the efficiency of the brewing process, giving the company an edge over its competition. The t-test was used as an efficient method to measure product quality. Although he was not allowed to reveal the nature of its development, many of his colleagues were able to identify his work, and thus we are aware of the developer’s identity today.

Null Hypothesis

The null hypothesis is a plausible scenario which may explain a given set of data. This is opposite to the alternative hypothesis, which suggests that an observation is genuine. Therefore, the null hypothesis hypothesizes that there is a difference between two samples of data, while the alternative hypothesis aims to show that they coincide. A null-hypothesis test, therefore, asks the question: “Assuming that the null hypothesis is true, what is the probability of observing a value for the test statistic that is at least as extreme as the value that was actually observed?” [wikipedia]

Determining Sample Type

Samples may be from two distinct, unrelated areas, or from more than one measurement of the same situation or a repeated measurement.

Fractional Nature of T-Test

The t-test is a ratio between the difference of the two means of the two sets of data, and the variability (depression) of the scores. Therefore, t = (mean1 – mean2) / (variability of scores).

Risk Level

Also called the alpha level, it describes the threshold around which data is said to either demonstrate a significant correlation or not. A 0.05 alpha level means that in 5% of the data, a statistically significant difference would be observed between the two sets. Acceptable alpha values can be determined from a table.

Conclusion

The t-test is certainly a powerful tool for the average worker and the statistician alike. By comparing measurements we can learn more about our world by identifying trends and therefore natural laws. The t-test is truly one of man’s most powerful analytical tools.