It’s that time of the year again. The time when undergraduate students in Psychology have collected their data, and are furiously trying to get it analyzed and written up in time for their dissertation deadline. It’s also the time of year when students tend to panic about the “right” way to analyze their data. But, as far as statistics go, there is no single right way to go about things. In fact, there is as much debate about doing statistics in Psychology as there is about psychological theories themselves, with whole journals dedicated to the topic. When it comes to dissertation stats, there is no single right way either. Let me explain with an example.
I have one manipulated variable (call it Experimental Condition) and two continuous variables that I measured, Measure A, and Measure B. I want to know how Measure A and Experimental Condition interplay to influence Measure B. I have met all the assumptions for parametric data analysis. Although that research question is clearly defined, there are still several ways I could go about this.
One way would be to perform a linear regression, entering Experimental Condition (as a dummy variable), Measure A (centered about the mean) and the product of the two (the interaction term) as predictors of my outcome, Measure B. If I found an interaction, I could analyse it using a simple slopes analysis. That would answer my research question.
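For anyone who wants to see the moving parts, here is a minimal sketch of that regression in plain Python. Everything in it is an assumption for illustration: the data are made up, the variable names are mine, and the small hand-rolled least-squares solver stands in for whatever stats package you actually use. It only estimates the coefficients and the two simple slopes; it does not test them for significance.

```python
from statistics import mean

# Hypothetical data: 0 = control, 1 = treatment (all values are made up).
condition = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
measure_a = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
measure_b = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 5.0, 5.2, 4.9, 5.1, 5.0, 4.8]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Centre Measure A about its mean, then build the design matrix:
# intercept, condition dummy, centred Measure A, and their product.
a_mean = mean(measure_a)
a_centred = [a - a_mean for a in measure_a]
design = [[1.0, d, a, d * a] for d, a in zip(condition, a_centred)]

b0, b_cond, b_a, b_inter = ols(design, measure_b)

# Simple slopes: the effect of Measure A on Measure B within each condition.
slope_condition0 = b_a            # dummy = 0
slope_condition1 = b_a + b_inter  # dummy = 1

print("interaction coefficient:", round(b_inter, 3))
print("simple slope, condition 0:", round(slope_condition0, 3))
print("simple slope, condition 1:", round(slope_condition1, 3))
```

Because Measure A is centred, the coefficient for the dummy variable is the difference between conditions at the average Measure A score, which is usually the comparison you want to talk about.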
Equally viable, however, would be to run this analysis using ANOVA – because the maths underlying ANOVA and regression analyses are essentially the same. You can check this for yourself by running the two analyses on the same variables: because both rely on what is called the General Linear Model, you will find that the R squared value is the same for each. The distinction between the two in teaching terms is really just an historical artefact, arising because ANOVA has traditionally been used for experimental designs and regression for correlational designs. It doesn’t have to be that way: whether the analysis you do makes any sense depends on what you were trying to find out, more than anything.
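If you would rather see that equivalence than take my word for it, here is a small sketch in plain Python for the simplest case: one two-level factor and one outcome. The data and names are made up for illustration. It computes the proportion of variance explained the ANOVA way (between-groups sum of squares over total) and the regression way (R squared from a dummy-coded predictor) and compares them.

```python
from statistics import mean

# Hypothetical data: 0 = control, 1 = treatment (all values are made up).
condition = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
measure_b = [4.0, 5.0, 6.0, 5.0, 5.0, 7.0, 8.0, 6.0, 9.0, 7.0]

grand_mean = mean(measure_b)
ss_total = sum((y - grand_mean) ** 2 for y in measure_b)

# ANOVA route: between-groups sum of squares divided by total sum of squares.
groups = {
    level: [y for d, y in zip(condition, measure_b) if d == level]
    for level in set(condition)
}
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
eta_squared = ss_between / ss_total

# Regression route: Measure B predicted from the dummy-coded condition.
d_mean = mean(condition)
sxy = sum((d - d_mean) * (y - grand_mean) for d, y in zip(condition, measure_b))
sxx = sum((d - d_mean) ** 2 for d in condition)
b1 = sxy / sxx               # slope = difference between the two group means
b0 = grand_mean - b1 * d_mean
ss_residual = sum((y - (b0 + b1 * d)) ** 2 for d, y in zip(condition, measure_b))
r_squared = 1 - ss_residual / ss_total

print("ANOVA eta squared:   ", round(eta_squared, 4))
print("Regression R squared:", round(r_squared, 4))
```

The two numbers agree because the regression's predicted values are simply the group means, which is exactly what the ANOVA compares.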
Anyway – if I ran this analysis using ANOVA, there are two ways I could go about it. I could continue to treat Measure A as a continuous variable and, in SPSS at least, tell the program via the syntax editor to enter it as a continuous predictor (a covariate) alongside the fixed factor, by listing it after the WITH keyword. Note that the interaction between the factor and the covariate then has to be requested explicitly in the design, because SPSS does not include it by default:
Measure_B BY Experimental_Condition WITH Measure_A_centred
I could, however, legitimately perform a median split on Measure A, creating a new variable where people are coded as either high A-scorers or low A-scorers. I would then enter Measure_A_split into the ANOVA as a second fixed factor alongside Experimental Condition.
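A median split is simple enough to sketch in a few lines of Python. The function name and the example scores are made up, and the rule that scores exactly at the median go into the "low" group is my assumption: you have to put ties somewhere, and whichever rule you pick should be reported.

```python
from statistics import median


def median_split(scores):
    """Code each score as 'high' (above the median) or 'low' (at or below it)."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


# Hypothetical Measure A scores.
measure_a = [12, 7, 9, 15, 4, 10]
measure_a_split = median_split(measure_a)
print(measure_a_split)
```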
In either case, if I found an interaction between Measure A and Experimental Condition, I would analyse it using a simple effects analysis (to look at the effect of Experimental Condition at differing scores on Measure A).
The Right Way?
So – either ANOVA or regression could be used for the above research question. Neither way is “wrong”, although statisticians will point out the advantages and disadvantages of each approach. The classic disadvantage of median splits, for example, is that I would lose some of the variance in the scores (because I have changed a continuous variable into a dichotomous one).
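To make that loss of variance concrete, here is a deliberately artificial sketch in plain Python. The data are made up so that Measure B is a perfect straight-line function of Measure A, which never happens in real life; the point is only that even in this best case the median-split version of Measure A explains less of Measure B than the continuous version does. Ties at the median are coded as low, as an assumption.

```python
from math import sqrt
from statistics import mean, median


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores: Measure B depends perfectly on Measure A.
measure_a = list(range(10))
measure_b = [2 * a + 1 for a in measure_a]

# Median split: 1 = high scorer, 0 = low scorer (ties coded as low).
cut = median(measure_a)
measure_a_split = [1 if a > cut else 0 for a in measure_a]

r2_continuous = pearson_r(measure_a, measure_b) ** 2
r2_split = pearson_r(measure_a_split, measure_b) ** 2

print("variance explained, continuous Measure A:", round(r2_continuous, 3))
print("variance explained, median-split Measure A:", round(r2_split, 3))
```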
Of course, that said, there are some things that we need to do before we run those tests for any of the above options to be “right”. Here is a checklist, courtesy of Tabachnick and Fidell (2007), with the health warning that the debate around statistics rages on and these are only guidelines: one high-profile journal in Psychology decided earlier this week that reporting p values is inappropriate, full stop.
(1) Before you do anything, check for missing values and cases where weird stuff seems to be happening. Work out what is weird, and consider deletion of these cases, or checking against the questionnaires for human error in data entry.
(2) Check you meet the assumptions for the tests you want to do. See Tabachnick and Fidell (2007) for myriad guidelines on what to do with the data if you fail to meet an assumption.
(4) If you do perform extra post-hoc tests, because something interesting has come up, don’t be afraid to admit to that. There are ways of statistically adjusting for the probability of finding significant results in such cases, and the important thing is being transparent about what we are doing as scientists, to allow effective evaluation of findings.
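As a rough illustration of the first and last points on that checklist, here is a minimal Python sketch. The function names, the example scores and the p values are all made up. The z-score cut-off of 3.29 is a commonly used rule of thumb that I am assuming here rather than prescribing, and Bonferroni is only one of several available corrections for post-hoc tests.

```python
from statistics import mean, stdev


def screen(scores, z_cutoff=3.29):
    """Return the positions of missing values and of extreme scores.

    Missing values are assumed to be stored as None. A score counts as
    extreme when its absolute z-score exceeds z_cutoff (an assumed rule
    of thumb). Flagged cases should be checked, not deleted automatically.
    """
    missing = [i for i, s in enumerate(scores) if s is None]
    observed = [s for s in scores if s is not None]
    m, sd = mean(observed), stdev(observed)
    extreme = [
        i for i, s in enumerate(scores)
        if s is not None and abs((s - m) / sd) > z_cutoff
    ]
    return missing, extreme


def bonferroni(p_values, alpha=0.05):
    """Flag which post-hoc tests survive a Bonferroni correction."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# Hypothetical questionnaire scores with one blank and one likely typo.
measure_a = [48, 52, 50, None, 47, 53, 49, 51, 50, 46, 54,
             50, 49, 51, 48, 52, 50, 47, 53, 50, 150]
missing_cases, extreme_cases = screen(measure_a)
print("missing at positions:", missing_cases)
print("check these positions against the questionnaires:", extreme_cases)

# Hypothetical p values from three unplanned post-hoc tests.
print(bonferroni([0.01, 0.02, 0.04]))
```

In small samples a z-score cut-off can fail to flag anything at all, so it is worth plotting the data as well as running a check like this.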
So – to sum this up, before you do anything with your data, look at it. Is it weird? Is it normal(ly distributed)? Can you use parametric statistics or not? Then, work out what research question you would like to answer, and what types of variable you now have. Based on this, choose among the options for answering that research question. All the time, remember to be transparent about the analyses and post-hoc tests that you are using. Just as you justify the inclusion of each variable in your study in the Introduction, the Results section should give a rationale for what you have done with each variable, why, and what was found. Statistics in Psychology is about having a rationale, rather than a “right” answer.