The 20% Statistician: September 2016

I think it was sometime towards the end of 2012 when my co-authors and I received an e-mail from Greg Francis pointing out that a study we published on the relationship between physical weight and importance was ‘too good to be true’. This was a stressful event. We were extremely uncertain about what this meant, but we realized it couldn’t be good. For me, it was the first article I had ever published. What did we do wrong? How serious was this allegation? What did it imply about the original effect? How would this affect our reputation?

As a researcher who gets such severe criticism, you have to go through the 5 stages of grief. Denial (‘This doesn’t make any sense at all’), anger (‘Who is this asshole?’), negotiation (‘If he would have taken into account this main effect which was non-significant, our results wouldn’t be improbable!’), depression (‘What a disaster’), until, finally, you reach acceptance (‘OK, he has somewhat of a point’).

In keeping with the times, we had indeed performed multiple comparisons without correcting, and didn’t report one study that had not revealed a significant effect (which we immediately uploaded to PsychFileDrawer).

Before Greg Francis e-mailed us, I probably had heard about statistical power, and knew about publication bias, but receiving this personal criticism forced me to kick my understanding of these issues to a new level. I started to read about the topic, and quickly understood that you can’t have exclusively significant sets of studies in scientific articles, even when there is a true effect (see Schimmack, 2012, for a good explanation). Oh, it felt unfair to be singled out, when everyone else had a file-drawer. We joked that we would from now on only submit one-study papers to avoid such criticism (the test for excessive significance can only be done on multiple-study papers). And we didn’t like the tone. “Too good to be true” sounds a lot like fraud, while publication bias sounds almost as inevitable as death and taxes.
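To make the arithmetic behind that point concrete, here is a minimal sketch. It assumes the studies are independent and all have the same power, which is a simplification: as far as I understand it, the actual test for excessive significance estimates power from the observed effect sizes. The function name and the power values are purely illustrative and are not taken from our paper or from Greg Francis's analysis.

```python
def prob_all_significant(power: float, n_studies: int) -> float:
    """Chance that every one of n independent studies is significant,
    assuming a true effect and the same power in each study."""
    if not 0.0 <= power <= 1.0:
        raise ValueError("power must be between 0 and 1")
    if n_studies < 1:
        raise ValueError("n_studies must be at least 1")
    # Independent studies: multiply the per-study success probabilities.
    return power ** n_studies


if __name__ == "__main__":
    # Hypothetical power levels, chosen only for illustration.
    for power in (0.5, 0.8, 0.95):
        for n_studies in (1, 3, 5):
            p = prob_all_significant(power, n_studies)
            print(f"power={power:.2f}, studies={n_studies}: "
                  f"P(all significant)={p:.3f}")
```

Under these assumptions, four studies with 50% power each would all come out significant only about 6% of the time, so a paper reporting nothing but significant results from modestly powered studies is itself an unlikely outcome, even when the effect is real.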

But now that some time has passed, I think about this event quite differently. I wonder where I would be without having had this criticism. I was already thinking about ‘Slow Science’ as we tended to call it in 2010, and had written about topics such as reward structures and the importance of replication research early in 2012. But if no-one had told me explicitly and directly that I was doing things wrong, would I have been equally motivated to change the way I do science? I don’t think so. There is a difference between knowing something is important, and feeling something is important. I had the opportunity to read about these topics for years, but all of a sudden, I actually was reading about these topics. Personal criticism was, at least for me, a strong motivating force.

I shouldn’t be surprised by this as a psychologist. I know there is the value-action gap (the difference between saying something is important, and acting based on those beliefs). It makes sense that it took slightly hurtful criticism for me to really be motivated to ignore current norms in my field, and take the time and effort to reflect on what I thought would be best practices.

I’m not saying that criticism has to be hurtful. Sometimes, people who criticize others can try to be a bit more nuanced when they tell the 2726th researcher who gets massive press attention based on a set of underpowered studies with all p-values between 0.03 and 0.05 that power is ‘pretty important’ and the observed results are ‘slightly unlikely’ (although I can understand they might sometimes be a bit too frustrated to use the most nuanced language possible). But I also don’t know how anyone could have brought the news that one of my most-cited papers was probably nothing more than a fluke in a way that I would not have felt stressed, angered, and depressed, as a young untenured researcher who didn’t really understand the statistical problems well enough.

This week, a large-scale replication of one of the studies on the weight-importance effect was published. There was no effect. When I look at how my co-authors and I responded, I am grateful for having received the criticism by Greg Francis years before this large-scale replication was performed. Had a failure to replicate our work been the very first time I had been forced to think about the strength of our original research, I fear I might have been one of those scholars who respond defensively to failures to replicate their work. We would likely have made it only to the ‘anger’ stage in the 5 steps towards acceptance. Without having had several years to improve our understanding of the statistical issues, we would likely have written a very different commentary. Instead, we simply responded by stating: “We have had to conclude that there is actually no reliable evidence for the effect.”

I wanted to share this for two reasons.

First, I understand the defensiveness in some researchers. Getting criticism is stressful, and reduces the pleasure in your work. You don’t want to spend time having to deal with these criticisms, or feel insecure about how well you are actually able to do good science. I’ve been there, and it sucks. Based on my pop-science understanding of the literature on grief processing, I’m willing to give you a month for every year that you have been in science to go through all 5 stages. After a forty-year career, be in denial for 8 months. Be angry for another 8. But after 3 years, I expect you’ll slowly start to accept things. Maybe you want to cooperate with a registered replication report about your own work. Or maybe, if you are still active as a researcher, you want to test some of the arguments you proposed while you were in denial or negotiating, in a pre-registered study.

The second reason I wanted to share this is much more important. As a scientific community, we are extremely ungrateful to people who express criticism. I think the way we treat people who criticize us is deeply shameful. I see people who suffer blatant social exclusion. I see people who don’t get the career options they deserve. I see people whose work is kept out of prestigious journals. Those who criticize us have nothing to gain, and everything to lose. If you can judge a society by how it treats its weakest members, psychologists don’t have a lot to be proud of in this area.

So here, I want to personally thank everyone who has taken the time to criticize my research or thoughts. I know for a fact that while it happened, I wasn’t even close to as grateful as I should have been. Even now, the eight weeks of meditation training I did two years ago will not be enough for me not to feel hurt when you criticize me. But in the long run, feel comforted that I am grateful for every criticism that forces me to have a better understanding of how to do the best science I can do.

Monday, September 19, 2016

Why scientific criticism sometimes needs to hurt