12th February 2025
I do a lot of work involving the genetics of natural populations. Natural populations are messy because genetics, environment and traits are confounded in complicated ways that can make it challenging to distinguish real biological signal from confounding.
One source of confounding that comes up again and again is what’s called linkage disequilibrium (LD) between genetic markers. However, I frequently find that this idea is not intuitive for my colleagues to grasp, and in fact many people really hate the term itself. In this post I want to illustrate the concept of LD by analogy to something that is more familiar to many people: speaking with an accent.
My aim here is only to build an intuitive explanation of what LD is and where it comes from, and not to go into the details of how it is calculated or used in practice. For more on the issues as they relate to GWAS, I recommend Veller & Coop's paper, which lays out the ways LD can arise and cause confounding in GWAS in articulate, if depressing, detail.
Vocabulary predicts behaviour
Now imagine that you had collected information on the words that people use, and also on their behaviour. I would be prepared to bet decent money that you would find a very strongly significant association between whether people say "cookie" or "tap" and, for example, how much tea they drink or how many guns they own (the challenge would be how many Canadians were included in the study, since they don't obviously fit the lazy stereotypes I am alluding to). There is obviously no causal relationship between these things, but they are correlated because they are confounded with where people come from.
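To make the confounding concrete, here is a toy simulation in Python. It is a minimal sketch under made-up assumptions: each person is American or British with equal probability, origin alone decides whether they say "cookie" (95% of Americans, 5% of Brits), and origin alone sets how many cups of tea they drink. The function names and all numbers are hypothetical. Vocabulary never affects tea drinking in this model, yet the two are strongly correlated across the whole sample.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_people(n=5000, seed=1):
    """Hypothetical population: origin drives both vocabulary and tea drinking.

    All probabilities and means are made-up illustration values.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        american = rng.random() < 0.5
        # Vocabulary depends only on origin: Americans mostly say "cookie".
        says_cookie = rng.random() < (0.95 if american else 0.05)
        # Tea drinking also depends only on origin, never on vocabulary.
        mean_cups = 1.0 if american else 3.0
        cups = max(0.0, rng.gauss(mean_cups, 0.5))
        people.append((american, int(says_cookie), cups))
    return people


people = simulate_people()
overall = pearson([p[1] for p in people], [p[2] for p in people])

# Correlation within each origin group separately.
within = {}
for group in (True, False):
    subset = [p for p in people if p[0] == group]
    within[group] = pearson([p[1] for p in subset], [p[2] for p in subset])

print(f"overall correlation: {overall:.2f}")
print(f"within Americans: {within[True]:.2f}, within Brits: {within[False]:.2f}")
```

Splitting the sample by origin removes the association, because in this toy model origin is the only thing linking the two variables.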
This is why LD is a challenge in genetics. Even in the absence of environmental correlations, LD between markers makes it hard to distinguish which genetic variants actually influence a trait, and which are merely correlated with those variants. Ignoring this issue can lead to some very dubious conclusions.
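The same logic can be sketched for markers on a chromosome. This is a deliberately simplified toy model, not how LD is estimated in practice: it assumes haploid individuals, a causal variant `a` that shifts the trait, a nearby non-causal marker `b` that carries the same allele as `a` on 90% of chromosomes (standing in for tight linkage), and an unlinked marker `c`. All names and numbers are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_markers(n=5000, seed=2):
    """Toy haploid population; all parameters are made-up illustration values."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = int(rng.random() < 0.3)  # causal variant
        # Linked marker: same allele as `a` 90% of the time, the other allele otherwise.
        b = a if rng.random() < 0.9 else 1 - a
        c = int(rng.random() < 0.3)  # unlinked marker, independent of `a`
        trait = 1.0 * a + rng.gauss(0.0, 1.0)  # only `a` affects the trait
        rows.append((a, b, c, trait))
    return rows


rows = simulate_markers()
trait = [r[3] for r in rows]
r_a = pearson([r[0] for r in rows], trait)
r_b = pearson([r[1] for r in rows], trait)
r_c = pearson([r[2] for r in rows], trait)

# Condition on the causal variant: among individuals carrying allele 0 at `a`,
# `b` still varies but no longer tracks the trait.
a0 = [r for r in rows if r[0] == 0]
r_b_given_a0 = pearson([r[1] for r in a0], [r[3] for r in a0])

print(f"causal a: {r_a:.2f}, linked b: {r_b:.2f}, unlinked c: {r_c:.2f}")
print(f"linked b within a = 0: {r_b_given_a0:.2f}")
```

Both `a` and `b` show a clear association with the trait even though only `a` has any effect, while `c` shows essentially none. Among individuals sharing the same allele at `a`, the association with `b` vanishes.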