Tuesday, May 3, 2011

Osama Bin Laden's DNA? 99.9% Accuracy and 0.1% Nonsense

This morning the New York Times reported that genetic analysis established "with 99.9 percent accuracy" that the man killed by U.S. soldiers in Pakistan and quickly buried at sea was Osama Bin Laden. "Officials said they collected multiple DNA samples from Bin Laden's relatives in the years since the Sept. 11 attacks. And they said the analysis, which was performed the day Bin Laden was killed but after his body was buried at sea, confirmed his identity with 99.9 percent accuracy." [1] The 99.9% figure is quoted in other stories and commentary as a precise statement of the probability that the body was Osama's.

But where would a number like 99.9% come from? In a typical criminal case, the issue is whether a trace of DNA left at a crime scene or on a victim and a sample from a suspect or defendant share a sufficient number of highly variable features to justify the inference that they originated from the same individual. One can compute the probability of the match under different hypotheses. One hypothesis (I shall call it H) is that the defendant is indeed the source. A rival hypothesis (U) is that an unrelated person is. Other alternatives are F, that the father of the suspect is the source, or S, that a full sibling is. Still other relationships between the suspect and the source of the trace DNA can be envisioned.

Empirically determined frequencies of the distinct DNA features (the alleles at each locus) can be combined according to a population genetics model to estimate the probability that an unrelated individual, a parent or child, a sibling, etc., would be born with the DNA profile in question. This is the probability of the DNA data if a given hypothesis is true. Suppose these probabilities are P(data | U) = 1/10^12, P(data | F) = 1/10^5, and P(data | S) = 1/10^7. Ignoring the chance of laboratory error, the probability of the data if H is true is P(data | H) = 1. These conditional probabilities (for data, given hypotheses) often are called likelihoods. [2]
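To make the phrase "combined according to a population genetics model" concrete, here is a minimal sketch of the simplest such model: Hardy-Weinberg proportions within each locus and independence across loci (the "product rule"). The allele frequencies and the three-locus profile are hypothetical values chosen for illustration. They are not from any real database, and this is not necessarily the model behind any figure reported in the press.

```python
from math import prod

# Hypothetical three-locus profile (illustrative values only).
# Each entry: (frequency of allele 1, frequency of allele 2, homozygous?)
profile = [
    (0.10, 0.20, False),
    (0.05, 0.05, True),
    (0.15, 0.08, False),
]

def genotype_frequency(p, q, homozygous):
    """Hardy-Weinberg genotype frequency: p^2 if homozygous, otherwise 2pq."""
    return p * p if homozygous else 2 * p * q

def random_match_probability(loci):
    """Multiply genotype frequencies across loci, assuming the loci are independent."""
    return prod(genotype_frequency(p, q, hom) for p, q, hom in loci)

# Probability that an unrelated person has this profile, i.e. P(data | U).
rmp = random_match_probability(profile)
print(f"P(data | U) under the product rule: {rmp:.3e}")
```

Each additional locus multiplies in another small factor, which is how a profile with many loci can reach probabilities as small as the 1/10^12 used in the text. The likelihoods for relatives such as F and S require different formulas, because relatives share alleles by descent.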

It is important to understand that none of these numbers is the probability, P(H | data), that the suspect is the source given the genetic data--the 99.9% figure. To find this probability, we would need to know the probability of each hypothesis before considering the genetic data. Bayes' rule then would permit us to combine these prior probabilities with the likelihoods. Using genetic data alone, however, it is not possible to state the probability of H. For that, we would need subjective probabilities based on nongenetic information. However, the data can produce likelihoods that swamp any reasonable choice for the prior probability, justifying assertions that the posterior probability exceeds a figure like 99.9%. The box gives an example.

BOX: Sample Computation with Bayes' Rule

With likelihoods like L(U) = P(data | U) = 1/10^12 and L(H) = P(data | H) = 1, the posterior probability will not be sensitive to the choice of the prior probabilities. Confining the analysis to the four hypotheses and assuming that the priors are P(H) = 0.7 and P(U) = P(F) = P(S) = 0.1, Bayes' rule tells us that

$$
P(H \mid \text{data})
= \frac{P(H)\,L(H)}{P(H)\,L(H) + P(U)\,L(U) + P(F)\,L(F) + P(S)\,L(S)}
= \frac{(0.7)(1)}{(0.7)(1) + (0.1)(10^{-12}) + (0.1)(10^{-5}) + (0.1)(10^{-7})}
$$

Because the likelihoods for all the rival hypotheses are orders of magnitude smaller than that for H, the terms for the rival hypotheses in the denominator (each a prior probability multiplied by a tiny likelihood) are negligible, and P(H | data) is close to 1. The "accuracy" of the identification is even greater than 99.9%.
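The arithmetic in the box can be checked with a few lines of Python. The sketch below uses the box's priors and likelihoods. The second, "skeptical" set of priors is a hypothetical added here only to illustrate how little the answer depends on the prior; it does not come from any reported analysis.

```python
def posterior(priors, likelihoods, target="H"):
    """Bayes' rule over a finite set of mutually exclusive, exhaustive hypotheses."""
    total = sum(priors[h] * likelihoods[h] for h in priors)
    return priors[target] * likelihoods[target] / total

# Likelihoods P(data | hypothesis) used in the box's computation.
likelihoods = {"H": 1.0, "U": 1e-12, "F": 1e-5, "S": 1e-7}

# Priors from the box.
box_priors = {"H": 0.7, "U": 0.1, "F": 0.1, "S": 0.1}
p_box = posterior(box_priors, likelihoods)
print(f"P(H | data) with the box priors: {p_box:.7f}")

# Hypothetical skeptical priors (an assumption for illustration): only 1% on H.
skeptical_priors = {"H": 0.01, "U": 0.33, "F": 0.33, "S": 0.33}
p_skeptical = posterior(skeptical_priors, likelihoods)
print(f"P(H | data) with skeptical priors: {p_skeptical:.7f}")
```

Even when H starts with a prior of only 1%, the posterior stays above 99.9%, because the largest rival likelihood (for F) is still 100,000 times smaller than the likelihood for H.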

The Bin Laden case probably is different. Although Bin Laden had contacts with journalists before 9/11, the CIA presumably had no sample of Osama's DNA to compare to the body. But it could have obtained DNA samples from at least one relative--Osama had a remarkable number of half-siblings and children. Kinship analysis is commonly used to produce likelihood ratios for a given relationship (such as paternity or siblingship). [3]

ABC News reported that the government used DNA from the brain of a half-sister who had died at Massachusetts General Hospital in Boston [1], but that level of relationship, standing alone, probably is too weak to give a large enough likelihood ratio to warrant assertions of "99.9 percent accuracy" (unless a huge number of loci were involved).

Could the government have had a sample from Osama's son, Omar? Why not? ABC News interviewed him in 2010. [4] CNN did an interview in 2008. He was deported from England. (For that matter, eight days after 9/11/2001, at least 13 relatives, along with bodyguards and associates, left Boston on a chartered Ryan Airlines flight.) In the U.S. and elsewhere, police and private individuals have followed people around to get DNA samples without their knowledge. [5, 6]

With Omar's DNA to compare against a sample from the body, Y-STRs combined with STRs from other chromosomes should have been enough to produce a very large likelihood ratio (relative to an unrelated man) for paternity if the body was indeed Osama's. But were all the men in the compound that was assaulted unrelated to Osama? The likelihood ratio would be smaller for a comparison to one of Osama's half-siblings (through Osama's father). Even with respect to the hypothesis that the body was a half-brother, however, the likelihoods could be quite convincing for a substantial number of loci. If the likelihoods for an uncle or for an unrelated man are many times smaller than that for paternity, the genetic evidence strongly favors paternity.

It also is reported that a different son was killed in the raid. Comparing samples from both bodies could help establish a father-son relationship between the two. Similar analyses helped demonstrate that bones found in a mass grave in Siberia belonged to members of the Romanov family, the Russian royal family. [2]

In short, the claim that kinship testing with DNA from relatives of Osama Bin Laden establishes his death is credible, but 99.9% seems like a metaphor rather than the result of a direct computation. You cannot get around Bayes' theorem. A posterior probability like 99.9% has to reflect some prior probability. That said, if we assume that the prior probability based on photographs and other information is substantial and that the likelihoods for unrelated men and half-brothers of Osama are small relative to the likelihood for Osama, the posterior probability could well equal or exceed 99.9%.
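The dependence of a figure like 99.9% on the prior can be sketched with the odds form of Bayes' rule: posterior odds = prior odds x likelihood ratio. The sketch below simplifies to two hypotheses (the body is Osama's versus a single alternative), and the likelihood ratios are hypothetical values chosen for illustration. No actual likelihood ratio was reported.

```python
def posterior_from_lr(prior, lr):
    """Posterior P(H | data) from the prior P(H) and the likelihood ratio
    for H against a single alternative hypothesis."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * lr
    return post_odds / (1 + post_odds)

def min_prior_for(target, lr):
    """Smallest prior P(H) for which the posterior reaches `target`."""
    target_odds = target / (1 - target)
    prior_odds = target_odds / lr
    return prior_odds / (1 + prior_odds)

# Hypothetical likelihood ratios (assumptions, not reported values).
for lr in (10, 1_000, 1_000_000):
    needed = min_prior_for(0.999, lr)
    print(f"LR = {lr:>9,}: prior must be at least {needed:.6f}")
```

With a likelihood ratio of 1,000, the prior has to be about one half before the posterior reaches 99.9%; with a ratio of a million, a prior of roughly one in a thousand suffices. A weak likelihood ratio cannot produce 99.9% unless the nongenetic evidence has already done most of the work.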


1. Donald G. McNeil Jr. & Pam Belluck, Experts Say DNA Match Is Likely a Parent or Child, N.Y. Times, May 3, 2011, at F2

2. David H. Kaye, The Double Helix and the Law of Evidence (2010)

3. Leslie G. Biesecker et al., DNA Identifications After the 9/11 World Trade Center Attack, 310 Science 1122 (2005)

4. Lara Setrakian, Bin Laden's Son: Worst Is Yet to Come, ABC News International, May 2, 2011, http://abcnews.go.com/International/osama-bin-ladens-son-death-unleash-violent-enemies/story?id=13509779

5. Amy Harmon, Stalking Strangers' DNA to Fill In the Family Tree, N.Y. Times, Apr. 2, 2007

6. Tracy Johnson, Police Ruse Case Argued Before State's Highest Court: Convicted Murderer Says Officers Broke Law with DNA Trick, Seattle Post-Intelligencer, Jan. 27, 2006

Cross-posted: Double Helix Law blog
