This section covers some basic aspects of statistical analysis that students encounter very often in the normal course of their studies. Most of the time, the goal is to compare two or more groups to determine whether there is a difference or an association with respect to a specific parameter or variable. But it isn’t only about that. It is also important to understand a few things about the data you are going to analyse. What is your dataset made of? What do you expect? What do you want to know? What, and how much, do you already know about the data?
Descriptive statistics give you a meaningful, quantitative overview of your sample and help you summarize an overwhelming amount of data into something more comprehensible. They are generally the first step in a statistical analysis, even though they only describe what your data are made of.
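As a minimal sketch of what such an overview can look like in R, the block below summarizes one variable of the built-in `iris` dataset. The choice of dataset and of the variable names `x` and `species_means` is only for illustration.

```r
# Built-in dataset: sepal and petal measurements for three iris species
data(iris)

x <- iris$Sepal.Length

# Central tendency and spread of a single numeric variable
mean(x)
median(x)
sd(x)
range(x)
quantile(x)

# Minimum, quartiles, median and mean for every column at once
summary(iris)

# Mean sepal length computed separately for each species
species_means <- tapply(iris$Sepal.Length, iris$Species, mean)
species_means
```

None of these functions tests anything: they only describe the sample at hand.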
As surprising as it seems, one does not need more than a single sample to perform statistical analysis! One-sample analysis allows you to compare the known mean of a large population to the mean of a specific subpopulation or sample. This means that you may check whether a limited group of individuals exhibits the same “properties” as the population they originate from.
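One common way to make this comparison in R is the one-sample t-test, `t.test()` with the `mu` argument. The sketch below uses made-up values: the `weights` vector and the population mean of 20 are assumptions for illustration, not real measurements. The test also assumes that the sample is roughly normally distributed.

```r
# Hypothetical sample: body weights (g) of 10 individuals from a subpopulation
weights <- c(19.8, 21.4, 20.9, 22.3, 20.1, 21.7, 22.8, 20.5, 21.9, 22.0)

# Assumed known mean of the large population (made up for this example)
pop_mean <- 20

# One-sample t-test: does the sample mean differ from pop_mean?
one_sample <- t.test(weights, mu = pop_mean)
one_sample

# Individual elements of the result can be extracted by name
one_sample$estimate   # sample mean
one_sample$p.value    # p-value of the two-sided test
```

A small p-value suggests that the sample mean differs from the population mean. A large one only means that no difference was detected.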
It’s now time for things to get serious: we have more than one sample! Actually two! But what do we want to do with these two samples? Compare their variances? Their means? And what about correlating two variables? Here is a series of useful tests and functions in R to take care of these two groups of data.
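The three questions above map onto three base R functions: `var.test()` for variances, `t.test()` for means and `cor.test()` for correlation. Here is a short sketch using the built-in `mtcars` dataset, where the two groups are cars with automatic (`am = 0`) and manual (`am = 1`) transmission. The dataset and the object names are chosen only for illustration.

```r
data(mtcars)

# Compare the variances of fuel consumption (mpg) between the two groups
var_res <- var.test(mpg ~ am, data = mtcars)
var_res

# Compare the means of the two groups
# (by default t.test() runs Welch's test, which does not assume equal variances)
t_res <- t.test(mpg ~ am, data = mtcars)
t_res

# Correlate two numeric variables: fuel consumption and car weight
# (Pearson correlation is the default method)
cor_res <- cor.test(mtcars$mpg, mtcars$wt)
cor_res
```

If the two samples are paired measurements on the same individuals, `t.test()` accepts `paired = TRUE` when it is given two vectors rather than a formula.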
Analysis of variance (a.k.a. ANOVA) is used to compare the means of two or more groups. Unsurprisingly, the way ANOVA works is by comparing variances (hence the name Analysis of Variance…). The explanatory (grouping) variables must be categorical and are often called factors, while the response variable is numeric. There are several designs for ANOVA, depending on the number of factors involved and on whether samples are measured several times during the course of an experiment.
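The simplest design, a one-way ANOVA with a single factor, can be sketched with `aov()`. The example assumes the built-in `PlantGrowth` dataset, in which plant `weight` is measured under three levels of the factor `group`. The object names `fit`, `anova_table` and `p_value` are illustrative.

```r
data(PlantGrowth)

# The explanatory variable must be a factor
is.factor(PlantGrowth$group)
levels(PlantGrowth$group)

# One-way ANOVA: does mean weight differ between the groups?
fit <- aov(weight ~ group, data = PlantGrowth)

# ANOVA table with degrees of freedom, sums of squares, F value and p-value
anova_table <- summary(fit)
anova_table

# Extract the p-value of the F-test for the factor 'group'
p_value <- anova_table[[1]][["Pr(>F)"]][1]
p_value
```

A two-way design would add a second factor to the right-hand side of the formula, for example `response ~ factorA * factorB`, where the names are placeholders.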
After an F-test (one-way ANOVA, two-way ANOVA, …) has indicated that there is a difference between the group means, post hoc tests help determine which group means differ significantly from each other. Such tests are never to be used when the F-test does not show significant differences. Note that some parameters in post hoc tests must be chosen carefully, based on the experimental design, to avoid false positives.
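As an illustration, the sketch below runs two common post hoc procedures available in base R on the built-in `PlantGrowth` dataset: Tukey's HSD via `TukeyHSD()` and pairwise t-tests with a Bonferroni correction via `pairwise.t.test()`. These are only two of several possible choices. The dataset and the object names are assumptions made for the example.

```r
data(PlantGrowth)

# Step 1: the F-test must show a significant difference first
fit <- aov(weight ~ group, data = PlantGrowth)
summary(fit)

# Step 2a: Tukey's Honest Significant Difference on all pairs of groups
tukey <- TukeyHSD(fit)
tukey

# Adjusted p-values, one per pair of groups
tukey$group[, "p adj"]

# Step 2b: pairwise t-tests; p.adjust.method controls the correction
# for multiple comparisons and therefore the risk of false positives
pairwise <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                            p.adjust.method = "bonferroni")
pairwise
```

The `p.adjust.method` argument is an example of a parameter that has to match the design: leaving p-values unadjusted (`"none"`) inflates the number of false positives as the number of comparisons grows.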
Regression analysis is a useful statistical tool that helps you make sense of your data by finding and defining relationships between variables. Most often, it starts with a plot (a scatter plot of your dataset) onto which a line or a curve is overlaid. Let’s see a couple of techniques for drawing those lines and curves.
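A minimal sketch of that workflow, assuming the built-in `cars` dataset (stopping distance `dist` against `speed`): a scatter plot, a straight line fitted with `lm()`, and a quadratic curve as one possible non-linear alternative. The object names are illustrative.

```r
data(cars)

# Scatter plot of the raw data
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# Simple linear regression: dist = intercept + slope * speed
fit_line <- lm(dist ~ speed, data = cars)
coef(fit_line)

# Overlay the fitted straight line
abline(fit_line)

# Quadratic fit as an example of a curve
fit_quad <- lm(dist ~ poly(speed, 2), data = cars)

# Predict over a fine grid of speeds so the curve is drawn smoothly
speed_grid <- data.frame(speed = seq(min(cars$speed), max(cars$speed),
                                     length.out = 100))
lines(speed_grid$speed, predict(fit_quad, newdata = speed_grid), lty = 2)
```

Calling `summary(fit_line)` gives the standard errors, the R-squared and the p-values of the coefficients.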