Statistics for journalists: Five Ws before reporting scientific research
Going to the beach is overrated, or so I am trying to believe this summer, because I have swapped the peaceful Mediterranean for learning statistics in the UK.
Well, at least I can sleep without a fan by my bed.
My adventure is part of the MA in Data Journalism at Birmingham City University, and I have started by understanding a scientific paper.
I had never read a medical research paper before. I got those stories from press releases and… from looking at other media (mostly…).
So, I decided to run a test. I found these two “curious” stories, which I might well have published:
I selected the first of these stories and looked for the original paper. But before analysing it, I read How to Read a Paper (Trisha Greenhalgh, 2001) and Bad Science (Ben Goldacre, 2009), took a couple of courses in statistics, and talked to some statisticians (exciting summer!).
After that, I came up with a sort of list.
Questions before reporting
1. Where? Check if it was published
I looked at Medline, where I found the research in a journal that publishes peer-reviewed studies (if you come across a systematic review, even better). I then searched the keywords to find out whether there were other papers on the same topic, and I went through their conclusions to avoid “picking out your favourite paper to back up your prejudices,” as Ben Goldacre says in Bad Science.
I found 34 studies: 44% saying there is no relationship, 26% concluding there may be, and 30% reaching no clear conclusion. This could add context to the story and enrich it.
Q1: Considering publication bias (research showing “positive” results is more likely to be published): could the proportion of studies concluding there is no relationship be even higher?
2. Who? Who paid and who did it
There is a section called “Funding” that gives you the details of who is paying: in my case, several US organisations, with additional support from the Philip L. Hubbell family. The institutions involved in the study can be found in the “Acknowledgements” section.
3. Which type? Observation or intervention
During the CIJ Summer School, I went to Professor Kevin McConway’s session, where he gave us some tips to read a paper. One of the first steps was knowing the type of study.
My research is a case-control, so observational. Why is that important?
“Be careful not to make causal claims from observational studies,” McConway said.
Some media did not know this (here, here, here, here), although the abstract of the study does not use the word cause but association and relationship. It also says that further investigation is needed. And the lead investigator told Reuters:
“A lot of people have asked me if I’m telling women not to dye their hair or not to use relaxers,” she said in a phone interview. “I’m not saying that. What I think is really important is we need to be more aware of the types of exposures in the products we use.”
There is also the Bradford Hill criteria for assessing causation. But do not expect a kind of “numeric” range against which you can compare and contrast the numbers you find…
Statistics are “subjective to a certain extent,” McConway concluded in our phone conversation when he noticed my frustration.
I have some concern about the plausibility and specificity of hair dyes causing cancer. (All products? A component, maybe?) But I might be biased.
Another important point is the risk of confounders in observational studies: a hidden influence that affects both the outcome (breast cancer) and the exposure (hair dyeing).
Q2: The researchers controlled for age and race. However, might there be some lifestyle habit, or the use of other cosmetics, affecting both the risk of breast cancer and the use of hair dyes?
4. How? The methods they used
“Studies which don’t report their methods fully do overstate the benefits of the treatments, by around 25%,” Goldacre says in Bad Science.
That is not my case. There are Bonferroni corrections, chi-square, logistic regressions… applied in a survey, and all the courses I have taken have much to say about this method (and FiveThirtyEight as well, especially if you have a nutritional paper).
The researchers interviewed 4,285 women, around half of them with and the other half without breast cancer. They asked about their lifestyle (diet, exercise, pregnancies and medical history) and their use of hair products, and they measured body size and carried out a saliva test.
There is a risk of recall bias (participants not remembering previous events), but the main concern is the sample: how big, random and representative it is (this is a story in the New York Times about research that had to be repeated because of problems with the sample).
All the women surveyed came from New York and some places in New Jersey. And, as it is a case-control study, the participants were chosen according to whether or not they had breast cancer.
So, “if you tried to calculate the risk of breast cancer from those data, it would be over 50% (because there are more cases than controls), but that doesn’t reflect the position in the population,” McConway explained to me.
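A toy example of McConway’s point, with a hypothetical case/control split (the paper reports roughly half and half, but the exact counts here are made up):

```python
# Hypothetical split of the 4,285 participants into cases and controls.
n_cases, n_controls = 2200, 2085

# Naively treating the sample as the population gives a "risk"
# of over 50%, simply because cases slightly outnumber controls:
naive_risk = n_cases / (n_cases + n_controls)
print(round(naive_risk, 3))  # 0.513

# That number only reflects the recruitment design, not how common
# breast cancer actually is in the population (~11% for AA women).
```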
Going through their methodology, I also found that they used a multivariable-adjusted model to control for confounders. And I came across this: “cases (cancer) were more likely than controls to be older, have a family history and be less educated.” Age and family history are “strongly related” to breast cancer, says Cancer Research UK.
Q3: Is there any missing data?
5. Why publish? The results
Commonly, researchers publish their results when they are statistically significant, that is, when the p-value is less than .05. But the higher the number of comparisons, the higher the chance of a false positive.
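A minimal sketch of this effect (the numbers of tests are illustrative, not taken from the paper):

```python
alpha = 0.05  # the usual significance threshold

# If every null hypothesis is actually true, the chance of at least one
# false positive across n independent tests grows quickly:
for n in (1, 10, 20):
    family_wise = 1 - (1 - alpha) ** n
    print(f"{n} tests -> {family_wise:.2f}")  # 0.05, 0.40, 0.64

# Corrections such as Bonferroni (used in this paper) tighten the
# per-test threshold to keep the overall error rate near alpha:
n_tests = 20
print(alpha / n_tests)  # each test must now clear p < 0.0025
```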
FiveThirtyEight explains here how the p-value works and why there is a risk of p-hacking. However, can we detect it?
“Uff, it isn’t easy at all,” said Julián Cárdenas, researcher and professor at Freie Universität Berlin.
“If the scientists say two variables are related, they have to specify the mechanism,” how and why they are related, he told me. “Multivariate analysis reduces spurious relations,” and publications are “increasingly asking for the database to check the results,” he added.
The p-value’s companion is the 95% confidence interval (CI), which is reported with the lower and upper bounds of the relative risk the researchers found. But what I came across was not the relative risk but the odds ratio (OR), a slightly different measure.
That is “the probability of something happening divided by the probability of something not happening,” said Cárdenas in his blog. (Strictly, that formula gives the odds; the OR compares the odds in the exposed group with the odds in the unexposed group.) An OR of 1 means the exposure makes no difference, and the bigger the number, the stronger the relationship.
“But,” McConway warned me, “among the problems of the odds ratio is whether an increase or decrease of the risk is medically or scientifically important, and that is not a statistical question. If the research is about cancer (like mine), a small increase might be important, but the same observation in (a paper) about common colds could be much less so,” he added.
Q4. What is the baseline risk?
Q5. What is the mechanism that relates hair dye to breast cancer?
6. BONUS: Read the ‘Discussion’
This part was useful for adding new details and context to the story, but mainly for deciding whether to publish it at all.
As an example of ‘new details,’ the sample of African American (AA) women they used is the “largest number to date.” However, the researchers admit:
“To date, there appears to be only fairly weak evidence to support a significant association between hair dye use and breast cancer risk.”
And when talking about the limitations, they highlight the recall bias, the fact that brands and ingredients were not considered, the role of “long-term use” in the association, and the fact that theirs is the only study associating hair relaxers with cancer risk in white women.
Don’t forget when writing
I would think twice about publishing it; not because I am in a position to judge the scientists, but because:
1. These stories have an alarming effect on the audience, and the researchers themselves are already warning about the limitations of extrapolating the conclusions.
2. The lack of baseline risk adds another complication (explained below).
Nevertheless, I have another list of things to include when reporting these stories.
1. The name of the research, who did it and link to the publication.
2. The sample characteristics and the methods used, adding details like the questions asked or the researchers’ caveats.
3. Be specific about what the study is about and to whom it applies.
In this case, AA women who regularly use dark shades of hair dyes.
4. Translate relative risk into absolute risk and give it as a natural frequency.
The BBC Trust Impartiality Review states: “never report simply that eating X doubles your risk of cancer. Always say it doubles the risk from 1 in 1,000 to 2 in 1,000, for example.”
That is pretty simple when we have the baseline risk and the relative risk. But the calculation is different with an OR. The baseline risk for the sample is needed, because the sample is not representative and the risk of breast cancer for the women in the study differs from that of the true population. And the study does not provide it.
“You could do some complicated calculations (…) to deduce it — but it’s still approximate, and I don’t think it is worth trying to do all that,” McConway patiently explained to me.
“It’s fine to use the 11% risk (of breast cancer for AA women) as long as you make it clear in some way that the calculation is only illustrative.”
Ok. This is illustrative.
Odds of breast cancer for AA women in the US: .11/.89 ≈ .124
Odds for AA women who regularly use dark shades of hair dye: 1.51 × .124 ≈ .187
Probability: .187/(1 + .187) ≈ .157
Natural frequency: the chance of having breast cancer for AA women who regularly dye their hair with dark shades increases by about 5 women in 100, from 11 to 16 in 100.
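The illustrative calculation above can be sketched as a small function (assuming, as McConway suggested, that the 11% population figure can stand in for the baseline risk, which the paper does not report):

```python
def or_to_risk(baseline_risk, odds_ratio):
    """Turn a baseline risk and an odds ratio into the exposed group's risk.

    Illustrative only: the baseline risk here is an approximation,
    not a figure from the study.
    """
    baseline_odds = baseline_risk / (1 - baseline_risk)  # .11/.89 ≈ .124
    exposed_odds = odds_ratio * baseline_odds            # 1.51 × .124 ≈ .187
    return exposed_odds / (1 + exposed_odds)             # back to a probability

risk = or_to_risk(0.11, 1.51)
print(round(risk * 100))  # ≈ 16 women in 100, up from 11
```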
5. Mention the caveats from the research.
For example, “further examinations are needed” or “to date there appears to be only fairly weak evidence.”
6. Give context and background from the research, but also about the conclusions of similar investigations.
7. Do not exaggerate in the headline
One of the debates during the Summer Conference in London was whether journalists should apply the same fact-checking techniques used for politicians to scientists.
In fact, Julián Cárdenas mentioned to me the “academic publications’ bubble.”
“The number and the effect of those (scientific) publications are being used to measure the researchers’ performance. That has an impact on their salaries, promotions, and possibilities to move to other universities; and it is useful for the universities to get higher positions in rankings or better financial resources.”
But among all the journalists in that class, I was probably the least experienced in scientific reporting, and there is a whole new language to learn first.
Fortunately, some professionals and organisations are willing to help us. The Science Media Centre has 10 best-practice guidelines for reporting science, Kevin McConway and David Spiegelhalter wrote a checklist of reasons “to ignore health stories,” and the NHS dedicated a “clarification” to the Daily Mail about the story on fertility and junk food, the second of the stories I highlighted at the beginning of the post.
Any mistake? Please, let me know. Comments are welcome, as well as pictures from the beach.