PREPARATION GUIDE

Before You Start

Everything you need to know before choosing a statistical test — from understanding your research question to checking your data.

PART 1 — YOUR RESEARCH QUESTION

What are you trying to find out?

Before you can choose an appropriate statistical test, you need a clear sense of your research question and what you are trying to discover. The questions below will help you think this through.

What kind of question are you asking?

Different research questions call for different kinds of analysis. Are you trying to explore a relationship between variables — for example, whether age is related to blood pressure? Are you comparing groups — for example, whether men and women differ in their anxiety scores? Are you trying to predict an outcome — for example, which factors best predict academic performance? Or are you simply describing patterns in your data? Being clear about your question will point you toward the right type of analysis.

How many variables are involved?

Some analyses involve just two variables — one outcome and one predictor or group. Others involve multiple variables at once. Knowing how many variables are involved — and what role each one plays — is an important step before choosing a test. The Stats Selection Tool will ask you about this directly.

Is your study experimental or observational?

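In an experimental study, participants are randomly assigned to conditions or treatment groups. In an observational study, groups are defined by characteristics participants already have, such as gender, age group, or diagnosis. This distinction matters because it affects what conclusions you can draw from your results: random assignment supports conclusions about cause and effect, whereas differences between pre-existing groups show an association that other factors may explain. It can also affect which tests are most appropriate.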

PART 2 — TYPES OF VARIABLES

What kind of data do you have?

The type of data you are working with is one of the most important factors in choosing a statistical test. There are two broad categories to understand.

CATEGORICAL

Categorical variables

Categorical variables place people or cases into groups. They represent qualities or classifications rather than measurements. The numbers assigned to categories (if any) are simply labels — they have no mathematical meaning.

Examples: gender, treatment group, blood type, highest level of education, country of birth, yes/no responses

CONTINUOUS

Continuous variables

Continuous variables represent measured quantities where the numbers carry real mathematical meaning. Scores can be ranked, compared, and used in arithmetic. Many statistical tests assume this type of variable for the outcome measure.

Examples: age, height, weight, blood pressure, reaction time, questionnaire total scores, income

A note on ordinal data. Some variables fall between these two categories. Likert-scale items (for example, 1 = Strongly disagree to 5 = Strongly agree) are technically ordinal — the categories have a rank order, but the gaps between them may not be equal. In practice, when ordinal items are summed into a total score, researchers often treat the resulting scale as continuous.

PART 3 — VARIABLE ROLES

Independent and dependent variables

In many analyses, variables play different roles. Understanding these roles will help you answer the selector questions accurately.

INDEPENDENT VARIABLE

The predictor or grouping variable

This is the variable you use to define groups, make comparisons, or predict an outcome. It is sometimes called the predictor variable, grouping variable, or factor. It is the variable you expect may explain or influence the outcome.

DEPENDENT VARIABLE

The outcome variable

This is the variable you are measuring as your outcome of interest — what you are trying to understand, compare, or predict. It is sometimes called the outcome variable or response variable.

Example. A researcher wants to know whether a new training programme improves job performance. The training programme (whether or not someone received it) is the independent variable. Job performance scores are the dependent variable. The researcher is asking whether the independent variable predicts or influences the dependent variable.

In correlational research. When you are simply exploring whether two continuous variables are related to each other — rather than predicting one from the other — the distinction between independent and dependent variables is less critical.

PART 4 — CHECKING YOUR DATA

Five checks before you begin

Before choosing a statistical test, it is worth taking a closer look at your data. These five checks will help you avoid common problems and answer the selector questions with more confidence.

Check for errors and impossible values

Look for values that fall outside the expected range — for example, an age of 220, or a score of 8 on a 5-point scale. These may indicate data entry errors. Checking the minimum and maximum values for each variable using descriptive statistics is a quick and effective way to identify obvious problems before analysis.
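The range check described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the data, the helper name out_of_range, and the plausible ranges (0 to 120 for age, 1 to 5 for the scale item) are all assumptions made for the example.

```python
# Hypothetical data: ages in years and responses to one 5-point item.
ages = [34, 29, 220, 41, 38]
item_scores = [3, 5, 8, 1, 4]

def out_of_range(values, low, high):
    """Return (position, value) pairs that fall outside [low, high]."""
    return [(i, v) for i, v in enumerate(values) if not (low <= v <= high)]

# Minimum and maximum are the quickest first look at each variable.
print("age min/max:", min(ages), max(ages))
print("item min/max:", min(item_scores), max(item_scores))

# The limits below are assumptions chosen for this example only.
bad_ages = out_of_range(ages, 0, 120)
bad_items = out_of_range(item_scores, 1, 5)
print("suspect ages:", bad_ages)
print("suspect item scores:", bad_items)
```

Reporting the position alongside the value makes it easier to trace a suspect entry back to the original record.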

Check for missing values

Identify which variables have missing data and how much. A small amount of missing data is common and manageable. However, if a substantial proportion of cases are missing values on a key variable, this may affect which analyses are appropriate and whether your results can be trusted. Most statistical software can display the number of missing cases for each variable.
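If you are working in code rather than a statistics package, a count of missing values per variable might look like the sketch below. It assumes each case is stored as a dictionary and that missing values are recorded as None; the variable names and the helper missing_summary are hypothetical.

```python
# Hypothetical dataset: one dict per case; None marks a missing value.
cases = [
    {"age": 34, "anxiety": 12, "group": "A"},
    {"age": None, "anxiety": 15, "group": "B"},
    {"age": 41, "anxiety": None, "group": "A"},
    {"age": 29, "anxiety": None, "group": None},
]

def missing_summary(rows):
    """Return {variable: (number missing, proportion missing)}."""
    n = len(rows)
    summary = {}
    for var in rows[0].keys():
        n_missing = sum(1 for row in rows if row.get(var) is None)
        summary[var] = (n_missing, n_missing / n)
    return summary

summary = missing_summary(cases)
for var, (count, proportion) in summary.items():
    print(f"{var}: {count} missing ({proportion:.0%})")
```

In this made-up example, half of the anxiety scores are missing, which is the kind of pattern worth noticing before choosing an analysis.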

Look at the distribution of your scores

Many common statistical tests assume that scores on the dependent variable are approximately normally distributed, and this assumption matters most in small samples. Looking at a histogram or boxplot for your key variables will help you see whether scores are roughly symmetrical or whether there is notable skewness. If your data are severely skewed, especially in a small sample, a non-parametric test may be more appropriate.

Check for outliers

Outliers are values that are unusually high or low compared to the rest of your data. They can occur because of genuine variation in the sample, or because of data entry or measurement errors. Either way, outliers can have a strong influence on the results of some analyses, particularly those based on means. Identifying them before you begin will help you make an informed decision about how to handle them.
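One widely used convention for flagging possible outliers, and the one behind the points drawn on many boxplots, is to mark any value more than 1.5 times the interquartile range below the first quartile or above the third. The sketch below applies that rule to made-up scores. It is only one of several reasonable rules, and the helper name iqr_outliers is hypothetical.

```python
import statistics

def iqr_outliers(values, multiplier=1.5):
    """Return values lying beyond multiplier * IQR from the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - multiplier * iqr
    upper = q3 + multiplier * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical scores with one unusually high value.
scores = [12, 14, 15, 15, 16, 17, 18, 19, 45]
outliers = iqr_outliers(scores)
print("flagged as possible outliers:", outliers)
```

A flagged value is a prompt to look more closely, not an instruction to delete it: it may be a data entry error or a genuine extreme score.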

Check your group sizes

If your analysis involves comparing groups, note whether the groups are roughly similar in size. Very unequal group sizes — particularly in small samples — can affect the stability and reliability of some statistical tests. Knowing this in advance will help you interpret the selector questions and the resulting recommendation more accurately.
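Counting cases per group takes only a few lines. The sketch below uses hypothetical group labels and reports the ratio of the largest group to the smallest. No cut-off is applied here, because what counts as "very unequal" depends on the test and the sample size.

```python
from collections import Counter

# Hypothetical grouping variable: one label per participant.
groups = ["control", "treatment", "control", "control", "treatment",
          "control", "control", "control", "control", "control"]

sizes = Counter(groups)
for name, count in sizes.items():
    print(f"{name}: n = {count}")

# Ratio of largest to smallest group; 1.0 means perfectly balanced groups.
ratio = max(sizes.values()) / min(sizes.values())
print("largest-to-smallest ratio:", ratio)
```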

PART 5 — KEY TERMS

Terms you will see in the selector

The Stats Selection Tool uses a small number of statistical terms. You do not need to be an expert, but recognising these ideas will help you answer the questions with confidence.

Normal distribution

A bell-shaped distribution in which most scores cluster around the middle, with fewer scores at the extremes. Many statistical tests assume that scores are approximately normally distributed, especially in small samples.

Skewness

A measure of how asymmetrical a distribution is. A distribution is positively skewed if most scores are low with a few very high scores, and negatively skewed if most scores are high with a few very low scores. Marked skewness may affect the choice of test.

Outlier

A score that is unusually high or low compared to the rest of the data. Outliers can occur because of genuine variation in the sample or because of errors. They may have a large influence on the results of some analyses.

Statistical significance

A result is described as statistically significant when the probability of observing a result at least as extreme as yours, if there were no real effect in the population, falls below a pre-specified threshold (usually .05). This probability is the p value. Statistical significance does not tell you about the size or practical importance of an effect.
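To make the idea of "a result at least as extreme, if there were no real effect" concrete, the sketch below runs an exact permutation test on two tiny made-up groups. It is offered purely as an illustration of the definition, not as the method any particular test or the Stats Selection Tool uses; the data and the function name exact_permutation_p are assumptions.

```python
from itertools import combinations
from statistics import fmean

def exact_permutation_p(a, b):
    """Two-sided p value: share of all possible regroupings whose mean
    difference is at least as large as the one actually observed."""
    pooled = a + b
    observed = abs(fmean(a) - fmean(b))
    positions = range(len(pooled))
    extreme = 0
    total = 0
    for chosen in combinations(positions, len(a)):
        chosen_set = set(chosen)
        new_a = [pooled[i] for i in chosen]
        new_b = [pooled[i] for i in positions if i not in chosen_set]
        # Small tolerance guards against floating-point rounding.
        if abs(fmean(new_a) - fmean(new_b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total

# Hypothetical scores for two small groups.
group_a = [12, 15, 14, 16]
group_b = [9, 10, 11, 8]

p_value = exact_permutation_p(group_a, group_b)
print(f"p = {p_value:.4f}")
print("significant at .05:", p_value < 0.05)
```

If group membership made no difference, every way of splitting the eight scores into two groups of four would be equally likely. The p value is the share of those splits that separate the groups at least as much as the real one did.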

Sample size

The number of participants or cases in your dataset. Sample size affects the reliability of your results and the statistical power of your analysis — that is, the ability to detect a real effect if one exists. Larger samples generally produce more stable and trustworthy estimates.

Parametric vs non-parametric

Parametric tests make assumptions about the distribution of the data (e.g., that scores are approximately normally distributed). Non-parametric tests make fewer assumptions and may be more appropriate when data are clearly non-normal, when the dependent variable is ordinal, or when the sample is very small.

YOU ARE READY

Choose your statistical test

Now that you understand your research question, your variables, and your data, you are ready to use the Stats Selection Tool.

Go to Stats Selection Tool →