Preparation before you choose a test

Before you use the statistical test selector, it helps to understand some key ideas about your data and to do some basic checks. This page is designed to guide you through that preparation so you can use the selector more confidently and interpret the result more appropriately.


Part 1: Understand your research question and your data


Before you can choose an appropriate statistical test, you need to be clear about what you are trying to find out and what kind of data you have.

What are you trying to find out?

Your research question is the starting point for everything that follows. Are you trying to compare two groups, explore the relationship between variables, predict an outcome, or describe patterns in your data? Different questions require different kinds of analysis. Before choosing a test, it helps to write your question in simple language and be clear about what you are trying to discover.

What kind of variables do you have?

The kind of variables you have will affect which analyses are appropriate. Some variables represent categories or groups, such as gender, treatment group, or marital status. Others represent scores or quantities, such as age, income, blood pressure, or questionnaire totals. Before choosing a test, be clear about which variables you are comparing, predicting, or relating to one another.
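If your data are already in a file or spreadsheet, a quick script can give a first impression of which variables hold categories and which hold quantities. The sketch below is only a rough heuristic, and the variable names and values in it are made up for illustration. It assumes missing values are stored as None.

```python
def variable_kind(values):
    """Guess whether a variable holds categories or quantities.

    Missing values (None) are ignored. This is a rough heuristic only:
    numeric codes such as 1 = control, 2 = treatment are still categories,
    so always check the result against what the variable actually means.
    """
    present = [v for v in values if v is not None]
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    return "quantity" if present and is_numeric else "category"


# Hypothetical dataset, one list of values per variable.
data = {
    "treatment_group": ["control", "drug", "drug", "control"],
    "age": [34, 41, 29, 52],
    "blood_pressure": [120.5, 131.0, None, 118.2],
}

kinds = {name: variable_kind(values) for name, values in data.items()}
print(kinds)
```

Treat the output as a prompt for your own judgement rather than a final answer, because only you know what each variable represents.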

What is the difference between independent and dependent variables?

In many analyses, one variable is treated as the outcome you are interested in, and another is treated as the variable that may explain, predict, or influence it. The outcome variable is often called the dependent variable. The variable that is used to compare groups or predict the outcome is often called the independent variable. For example, if you compare blood pressure across treatment groups, blood pressure is the dependent variable and treatment group is the independent variable. Being clear about which variable plays which role will help you choose an appropriate analysis.

What else do you need to know before choosing a test?

It also helps to understand a few other ideas before choosing an analysis. These include whether your data are approximately normally distributed, whether there are unusual scores or outliers, whether your groups are similar in size, and whether there are missing values in your data. You do not need to be an expert before using the selector, but having some awareness of these issues will make the process easier.

Common terms you will see in the selector

The selector may refer to terms such as normal distribution, skewness, outliers, missing values, sample size, and statistical significance. You do not need to master all of these before you begin, but it helps to recognise that they refer to features of your data that can affect which analyses are appropriate. The more familiar you are with these ideas, the easier it will be to answer the selector questions with confidence.


Part 2: Check your data before analysis


Before choosing a statistical test, it is a good idea to look closely at your data. This does not need to be complicated, but some basic checks can help you understand what your data look like and whether there are any obvious problems that may affect your analysis.

Check for errors and missing values

Check your data for obvious entry errors, impossible values, or missing information.

Are there values that fall outside the expected range?

Are some participants missing large amounts of data?

Even simple checks like these can prevent problems later and help you understand whether your dataset is ready for analysis.
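The checks above can be sketched in a few lines of code. This is a minimal illustration, not a complete data-cleaning routine. The records, the variable names, and the plausible ranges are all hypothetical, and missing values are assumed to be stored as None.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "age": 34, "score": 12},
    {"id": 2, "age": None, "score": 15},
    {"id": 3, "age": 230, "score": None},
    {"id": 4, "age": 41, "score": 9},
]

# Assumed plausible range for each variable (adjust to your own study).
valid_ranges = {"age": (18, 100), "score": (0, 20)}


def count_missing(rows, variables):
    """Count missing (None) values for each variable."""
    return {v: sum(1 for r in rows if r[v] is None) for v in variables}


def missing_per_participant(rows, variables):
    """Return the fraction of variables missing for each participant id."""
    variables = list(variables)
    return {
        r["id"]: sum(1 for v in variables if r[v] is None) / len(variables)
        for r in rows
    }


def out_of_range(rows, ranges):
    """Return (id, variable, value) for each value outside its expected range."""
    problems = []
    for r in rows:
        for var, (low, high) in ranges.items():
            value = r[var]
            if value is not None and not (low <= value <= high):
                problems.append((r["id"], var, value))
    return problems


missing = count_missing(records, valid_ranges)
per_person = missing_per_participant(records, valid_ranges)
problems = out_of_range(records, valid_ranges)
print(missing)
print(per_person)
print(problems)
```

In this made-up example, the age of 230 is flagged as an impossible value, and participants 2 and 3 are each missing half of their data.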

Look at the distribution of your scores

It is useful to look at how your scores are distributed before choosing an analysis. Are most scores clustered around the middle, or are they heavily skewed to one side? Are there unusually high or low values? Looking at histograms, boxplots, or simple descriptive statistics can help you see whether your data are roughly normal or whether they may need a different analytical approach.
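As a rough companion to a histogram or boxplot, the sketch below computes a few descriptive statistics. The scores are invented for illustration, and the skewness shown is the simple moment-based version (statistical packages often report a slightly adjusted figure). A mean well above the median, or a clearly positive skewness, suggests a tail of high scores.

```python
import statistics


def describe(scores):
    """Return simple descriptive statistics for a list of numeric scores."""
    n = len(scores)
    mean = statistics.mean(scores)
    # Second and third central moments, used for moment-based skewness.
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(scores),
        "sd": statistics.stdev(scores),
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
    }


# Hypothetical scores with one high value pulling the tail to the right.
scores = [1, 1, 2, 2, 3, 10]
summary = describe(scores)
for name, value in summary.items():
    print(name, round(value, 2))
```

Numbers like these are a supplement to looking at a plot, not a replacement for it.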

Check for outliers and unusual values

Sometimes a dataset contains a few unusually high or unusually low values that are very different from the rest. These are often called outliers. Outliers may be genuine values, or they may be errors. Either way, they can affect the results of some analyses, so it is worth identifying them before you choose a test.
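One common convention for flagging possible outliers, and only one of several, is to mark values more than 1.5 times the interquartile range beyond the quartiles. This is the same rule many boxplots use. The sketch below applies it to invented scores. It is an assumption for illustration, not a rule the selector requires.

```python
import statistics


def find_outliers(scores, k=1.5):
    """Flag values beyond k * IQR from the quartiles (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in scores if x < low or x > high]


# Hypothetical scores; 45 sits far from the rest.
scores = [12, 14, 15, 15, 16, 17, 18, 19, 45]
flagged = find_outliers(scores)
print(flagged)
```

A flagged value is only a candidate for a closer look. Whether it is an error or a genuine score is something you need to decide from the data source.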

Look at a scatterplot when examining relationships

If you are interested in the relationship between two variables, look at a scatterplot first. It can show whether the relationship is roughly linear and whether unusual points may distort the result. This helps you judge whether a simple correlation is likely to be appropriate.
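To see why the plot matters, the sketch below computes Pearson's correlation by hand for two invented datasets. In the first, a single extreme point inflates a moderate correlation. In the second, a perfectly curved relationship produces a correlation of zero. In both cases a scatterplot would show the problem at once, while the correlation coefficient alone would mislead.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: a moderate relationship...
x = [1, 2, 3, 4, 5]
y = [2, 1, 3, 2, 3]
r_without = pearson_r(x, y)

# ...and the same data with one unusual point added.
r_with = pearson_r(x + [20], y + [20])

# A perfectly curved (U-shaped) relationship: y is x squared.
x_curve = [-3, -2, -1, 0, 1, 2, 3]
y_curve = [v ** 2 for v in x_curve]
r_curve = pearson_r(x_curve, y_curve)

print(round(r_without, 2), round(r_with, 2), round(r_curve, 2))
```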

Check whether your groups are similar in size

If your analysis involves comparing groups, it helps to see whether the groups are roughly similar in size. Very unequal group sizes, especially in small samples, can make some results less stable or reliable, and can make some tests more sensitive to differences in variability between groups. Knowing this in advance will help you answer the selector questions more accurately.
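Counting group sizes takes only a few lines. In this sketch the group labels are hypothetical, and the ratio of the largest group to the smallest is offered as a quick summary only. There is no single cut-off at which groups become too unequal.

```python
from collections import Counter

# Hypothetical group membership, one label per participant.
groups = ["control"] * 12 + ["drug"] * 11 + ["placebo"] * 4

sizes = Counter(groups)
ratio = max(sizes.values()) / min(sizes.values())

for name, n in sizes.items():
    print(name, n)
print("largest / smallest:", ratio)
```

Here the placebo group is a third the size of the control group, which is worth keeping in mind when you interpret any comparison involving it.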

Now you are ready to use the selector

Once you have clarified your research question, understood your variables, and done some basic checks on your data, you will be in a much better position to use the statistical test selector. You do not need perfect knowledge before you begin, but a little preparation will make the questions easier to answer and the results easier to interpret.