When I was a kid, I’d joke that my brother must have been adopted—his eye color, hair, and height didn’t match mine or our parents’. Even without formal training in genetics, most of us intuitively expect children to resemble their parents. This expectation lies at the heart of heritability: the proportion of variation in a trait within a population that can be explained by genetic factors.

We can estimate heritability without knowing the exact genes involved. For example, twin studies, which compare how closely identical and fraternal twins resemble each other, suggest that height is about 80–90% heritable. Similarly, schizophrenia has an estimated heritability of ~80%. But here’s the puzzle: if the human genome was sequenced more than 20 years ago, why do we still struggle to pinpoint the specific genetic causes of these highly heritable traits and diseases?
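
To make the twin comparison concrete, here is a minimal sketch of Falconer's formula, one classical way to turn twin correlations into a heritability estimate: h² ≈ 2 × (r_MZ − r_DZ), where r_MZ and r_DZ are the trait correlations within identical and fraternal twin pairs. The function name and the correlation values are assumptions chosen for illustration, not data from any study.

```python
def falconer_heritability(r_mz, r_dz):
    """Falconer's estimate of heritability: h^2 = 2 * (r_MZ - r_DZ).

    r_mz: trait correlation within identical (monozygotic) twin pairs
    r_dz: trait correlation within fraternal (dizygotic) twin pairs
    """
    for r in (r_mz, r_dz):
        if not -1.0 <= r <= 1.0:
            raise ValueError("correlations must lie in [-1, 1]")
    return 2.0 * (r_mz - r_dz)


# Hypothetical correlations, chosen only to illustrate the arithmetic.
h2 = falconer_heritability(r_mz=0.90, r_dz=0.50)
print(f"estimated heritability: {h2:.2f}")
```

Notice that nothing in this calculation requires knowing which genes are involved; it only uses how similar relatives are.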


The search for genetic causes

To uncover genetic risk factors, researchers run genome-wide association studies (GWAS). These compare genetic variants across the genome in people with a disease and people without it, looking for variants that are more common in one group than the other. GWAS have identified thousands of trait-associated variants, but together they usually explain only a small fraction (often ~5%) of the expected heritability.
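
As a rough illustration of what a single GWAS test looks like, the sketch below runs a chi-square test on allele counts for one variant in cases versus controls. It is a simplified stand-in (real analyses typically use regression with covariates), the counts are made up, and the 5e-8 cutoff is the conventional genome-wide significance threshold rather than anything specific to this article.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional threshold for genome-wide significance


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Returns (chi2, p_value).
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_sums)
    if 0 in row_sums or 0 in col_sums:
        raise ValueError("every row and column needs at least one allele")

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up counts: 1,000 cases and 1,000 controls, two alleles per person.
chi2, p = allelic_chi_square(case_alt=600, case_ref=1400,
                             control_alt=500, control_ref=1500)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}, "
      f"genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```

In this toy example the variant is visibly more common in cases, yet it falls well short of the genome-wide threshold, which hints at why modest effects are easy to miss.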

This discrepancy is called the “missing heritability problem.” Where is the rest hiding?


Two key explanations

  1. Complexity of genetic networks
    • Most traits aren’t caused by one gene but by hundreds—or even thousands.
    • Different combinations of mutations can produce the same outcome.
    • This redundancy makes it statistically challenging to detect rare combinations, because studies would need enormous sample sizes.
  2. Limits of current sequencing
    • Even though the cost of sequencing has fallen faster than Moore’s law would predict, parts of the genome remain hard to read accurately.
    • Many important variants involve structural changes—large chunks of DNA that are inverted, duplicated, or moved. These are much harder to detect than single-nucleotide changes.
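
The sample-size point in the first explanation can be sketched numerically. Under a simple additive model, a variant with allele frequency p and effect β (in trait standard deviations) explains roughly 2p(1−p)β² of the trait variance, and the number of people needed to detect it grows as that share shrinks. The code below uses a standard normal approximation; it looks at one variant at a time rather than combinations, and the frequencies and effect sizes are hypothetical.

```python
from statistics import NormalDist


def variance_explained(allele_freq, effect_sd):
    """Share of trait variance from one additive variant: 2p(1-p) * beta^2."""
    return 2.0 * allele_freq * (1.0 - allele_freq) * effect_sd ** 2


def approx_sample_size(q2, alpha=5e-8, power=0.80):
    """Approximate N needed to detect a variant explaining fraction q2 of variance.

    Uses the normal approximation N = (z_alpha + z_power)^2 * (1 - q2) / q2,
    with a two-sided test at significance level alpha.
    """
    if not 0.0 < q2 < 1.0:
        raise ValueError("q2 must be strictly between 0 and 1")
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) ** 2 * (1.0 - q2) / q2


# Hypothetical variants with the same effect size but different frequencies.
for label, freq in [("common (p = 0.30)", 0.30), ("rare (p = 0.01)", 0.01)]:
    q2 = variance_explained(freq, effect_sd=0.05)
    print(f"{label}: explains {q2:.5%} of variance, "
          f"needs roughly {approx_sample_size(q2):,.0f} people")
```

With these assumed numbers, the rare variant needs a study more than twenty times larger than the common one, even though both have the same effect on the individuals who carry them.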

Why structural variation is so hard to capture

Several technical barriers limit our ability to map these large-scale rearrangements:

  1. Reference bias – Standard analysis aligns sequencing reads to a reference genome. Structural changes that differ from this reference often go undetected.
  2. Repetitive DNA – Many rearrangements sit within repetitive regions (“sequencing deserts”) where short reads match many locations equally well, making them nearly impossible to map correctly.
  3. Diploidy – Humans inherit two copies of each chromosome (maternal and paternal). Conventional sequencing often collapses them into one “consensus” sequence, obscuring important differences.
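
The repetitive-DNA problem is easy to see in a toy example. The sketch below searches a made-up reference sequence for exact matches of two reads: a short one that lies entirely inside a repeat, and a longer one that also covers the unique sequence on either side. Real aligners tolerate mismatches and use indexed search; the exact-match function and the sequences here are simplifications invented for illustration.

```python
def exact_match_positions(reference, read):
    """Return every 0-based position where the read matches the reference exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Step forward by one so overlapping matches are also found.
        start = reference.find(read, start + 1)
    return positions


# Toy reference: unique flanking sequence around six copies of "CAG".
reference = "ATGGCTTACG" + "CAG" * 6 + "TTGACCGTAA"

short_read = "CAGCAGCAG"                    # lies entirely inside the repeat
long_read = "TTACG" + "CAG" * 6 + "TTGAC"   # spans the repeat and both flanks

print("short read maps to:", exact_match_positions(reference, short_read))
print("long read maps to: ", exact_match_positions(reference, long_read))
```

The short read fits at four different offsets, so there is no way to tell where it truly came from, while the read that reaches into the unique flanks has a single placement.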

Why this matters

Until we can fully capture all forms of genetic variation, much of heritability will remain unexplained. Advances in sequencing technology—especially those that can resolve structural variants and distinguish between maternal and paternal genomes—are key to solving this puzzle.

Predicting height from DNA would be a fun parlor trick. But more importantly, achieving this level of precision would represent a true mastery of biology’s language—with profound implications for medicine, disease prediction, and our understanding of life itself.