Much of what I think has already been said, but I think there is a need for examples. Correlation is a necessary but insufficient proof of causation. In other words, you have to have correlation, but by itself it is not enough. All sorts of correlations can be found in the "data", and there may be many reasons why they do not show causation.
Let's say we go to modern-day Israel, the Jewish State. We might find what looks like a correlation between serious crime, like murder or rape, and being Jewish. Before anyone accuses me of being antisemitic, the same apparent correlation will turn up between anything, like righteous living, and being Jewish. That is because most people in any sample of anything in Israel are going to be Jewish. Raw counts like these only reflect who makes up the population; a meaningful comparison has to look at the rate within each group.
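To put rough numbers on that point, here is a minimal sketch. The population size, the 2% rate and the generic "majority group" and "trait" labels are all invented for illustration; they are not real figures for any country. It shows that a group can account for most of the people with a trait while the correlation between group membership and the trait is exactly zero.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi: the Pearson correlation between two yes/no variables, from a 2x2 table.

    a: in group, has trait        b: in group, lacks trait
    c: not in group, has trait    d: not in group, lacks trait
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom

# Hypothetical population of 10,000: 7,500 in the majority group, 2,500 not.
# The trait occurs at the same 2% rate in both groups (invented numbers).
in_group_trait, in_group_no_trait = 150, 7350
out_group_trait, out_group_no_trait = 50, 2450

# Share of all trait-holders who belong to the majority group.
share = in_group_trait / (in_group_trait + out_group_trait)

# Actual correlation between group membership and the trait.
phi = phi_coefficient(in_group_trait, in_group_no_trait,
                      out_group_trait, out_group_no_trait)

print(f"Share of trait-holders in the majority group: {share:.0%}")
print(f"Phi correlation between group and trait: {phi:.3f}")
```

With these made-up counts the majority group supplies three quarters of the trait-holders, simply because it is three quarters of the population, and the phi coefficient comes out as zero.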
Let's take another example closer to home. One might assert that there is a correlation between watching pornography and sexual crime. However, most people who watch pornography do not commit sexual crimes.
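The gap between those two statements is the gap between two different conditional probabilities. The sketch below uses abstract "exposure" and "outcome" labels and counts that I have invented purely for illustration; they are not estimates of anything real. It shows how a strong association can sit alongside a tiny chance that any one exposed person has the outcome.

```python
# Hypothetical counts for a population of 100,000 (invented for illustration).
exposed_with_outcome = 90
exposed_without_outcome = 49_910
unexposed_with_outcome = 10
unexposed_without_outcome = 49_990

exposed_total = exposed_with_outcome + exposed_without_outcome
outcome_total = exposed_with_outcome + unexposed_with_outcome

# Of the people with the outcome, what fraction were exposed?
p_exposed_given_outcome = exposed_with_outcome / outcome_total

# Of the people who were exposed, what fraction have the outcome?
p_outcome_given_exposed = exposed_with_outcome / exposed_total

print(f"P(exposed | outcome) = {p_exposed_given_outcome:.1%}")
print(f"P(outcome | exposed) = {p_outcome_given_exposed:.2%}")
```

In this invented table, nine in ten of those with the outcome were exposed, yet fewer than one in five hundred of the exposed have the outcome. The first figure is what tends to make headlines; the second is what matters when judging an individual.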
Negotiating one's way through the tricky minefield of statistics is difficult, especially for those with no training in statistical analysis. One can even see examples of people cherry-picking results from the data to "prove" the opposite of what the data actually shows … and they get it published in a journal!
Some people are so committed to their theories that they believe their correlations actually prove them. They release the results to the mainstream media, and we see them announced as fact.
The implications for law and crime can be quite serious. Let us imagine a situation where one person accuses another of sexual assault. The accusation can cover all genders. It appears to be a case of he/she says versus he/she says. Some statistical work might show that, in the majority of cases where a woman accuses a man of rape, she tells the truth. So, we come to a particular case, and it is a matter of she says … he says. How can the jury decide? Should a statistical summary be used, or should the matter be decided on the merits of this particular case? If one chooses to go with the statistics, the matter is decided on the balance of probabilities. However, this is quite different from "beyond reasonable doubt". The problem is accentuated further when the matter is decided by statistical analysis, because if that happens, the case then adds itself to the calculation of the statistics. The "balance of probabilities" becomes a self-fulfilling prophecy.
Correlation can also be a marker for something else that is the actual cause. Let me say from the start that the following is a hypothetical case. Suppose a correlation is shown between young, single migrant men and sexually transmitted diseases. Which of the aspects of young, single migrant men is the causative factor? Is it their ethnicity, their race, their singleness, their morality or a combination of these factors? You can bet that politicians will exploit the unknown for their own political purposes.
Suppose surveys from other parts of the world begin to suggest that the crucial factors are being young, single and alone, and being a migrant. Is this an indication of their morality? Well, migrants who are young, single and alone may well be poor, and so can only afford street-level prostitutes with questionable hygiene standards, whereas young, single and alone migrant men who are wealthy avail themselves of high-class prostitutes whose hygiene practices are State regulated. In that case the underlying factor would be poverty, not ethnicity or morality.
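Here is a minimal sketch of how a marker can stand in for the real cause. The two groups, the two income levels and every count are hypothetical and chosen only to make the arithmetic clean. Within each income level the two groups have identical rates, but one group is mostly low-income, so its overall rate looks much worse.

```python
# Hypothetical counts: (cases, people) for each group and income level.
# All labels and numbers are invented for illustration.
counts = {
    ("group_a", "low_income"):  (80, 800),
    ("group_a", "high_income"): (4, 200),
    ("group_b", "low_income"):  (20, 200),
    ("group_b", "high_income"): (16, 800),
}

def rate(cases, people):
    """Fraction of people in a cell who are cases."""
    return cases / people

def overall_rate(group):
    """Rate for a whole group, ignoring income level."""
    cases = sum(c for (g, _), (c, _n) in counts.items() if g == group)
    people = sum(n for (g, _), (_c, n) in counts.items() if g == group)
    return cases / people

# Comparing whole groups makes group membership look like the cause.
for group in ("group_a", "group_b"):
    print(f"{group} overall: {overall_rate(group):.1%}")

# Comparing like with like shows income is doing the work.
for (group, income), (cases, people) in counts.items():
    print(f"{group}, {income}: {rate(cases, people):.1%}")
```

With these made-up counts, group_a's overall rate is 8.4% against 3.6% for group_b, yet within each income level the two groups are identical (10% for low income, 2% for high income). The group label is only a marker for how many of its members are poor.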
I hope that some of the above illustrates the difficulties in teasing out causation from correlation.