Friday, May 27, 2011

There but for the grace of God ... The “horrific false-positive DNA match”

I have been trying to make sense of a 2004 newspaper report [1] that has achieved a certain degree of fame in articles and briefs emphasizing the risks of false accusations or convictions resulting from cold hits in DNA databases. The article appeared in November 2004, in the Chicago Sun-Times, with no follow-up in that paper or, as far as I can see, in any other. In 2009, the Federal Public Defender in Sacramento cited it as an example of “horrific tales of false-positive DNA matches” [2, at 17].

In this tale, detectives investigating a string of burglaries “were informed of a ‘hit’ between blood recovered at the scene and the genetic profile of a woman named Diane Myers.” [1] Evidently, they were not informed that “the ‘hit’ was not based on a direct match.” It was some kind of “partial match” provided as an “investigative lead.” The article does not explain further. Responding to a recent email, one of the reporters said that she does not recall the details. The suspect promptly cleared herself by showing that “she was locked up in a Downstate prison [when someone] slipped into the Chicago apartment Dec. 12, 2002.” [1]

What might have happened had she not been so lucky as to have an airtight alibi? “Jack Rimland, a criminal defense attorney and former president of the Illinois Association of Criminal Defense Lawyers, said ... ‘But for the fact that this woman was in prison [sic] ... I absolutely believe she'd still be in custody.’” On the other hand, “Kathleen Zellner, a Naperville attorney who relied on DNA evidence to exonerate four men ... said it was ‘reassuring’ the error was in paperwork, and not in the scientific process, and that the mistake appears to have been addressed.” [1]

All told, this incident does not seem to me to merit the appellation of “horrific,” but it does illustrate the need for the police to understand the true significance of every cold hit. When only a few loci are involved, the power of the association obviously is reduced. If anyone knows more about the laboratory report in the case, the probability of a random match for the limited number of loci involved, and how reports of database hits have changed in Illinois, please consider posting a comment or emailing me.
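The arithmetic behind that point is simple. Below is a minimal sketch, with a hypothetical per-locus genotype frequency (the actual loci and frequencies in the Myers report are unknown to me), of how the random-match probability depends on the number of loci:

```python
# Illustrative only: the genotype frequency below is hypothetical, not a
# figure from any actual case report.

# Assume a genotype frequency of about 0.1 at each STR locus (a typical
# order of magnitude for a single locus).
genotype_freq = 0.1

def random_match_probability(n_loci, per_locus_freq=genotype_freq):
    """Product rule: probability that an unrelated person matches at all loci."""
    return per_locus_freq ** n_loci

for n in (3, 6, 13):
    print(f"{n:2d} loci: random match probability ~ 1 in {1/random_match_probability(n):,.0f}")
# 3 loci: ~1 in 1,000 -- coincidental matches are to be expected in a large database
# 13 loci: ~1 in 10,000,000,000,000 -- far more discriminating
```

With only a handful of loci, a database trawl can easily produce a coincidental "hit"; that is why an investigative lead of this kind needs to be reported together with its random-match probability.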

References

1. Annie Sweeney & Frank Main, Botched DNA Report Falsely Implicates Woman: Case Compels State to Change How It Reports Lab Findings, Chicago Sun-Times, Nov. 8, 2004, at 18.

2. United States v. Pool, Brief for Defendant-Appellant, No. 09-10303 (9th Cir. Oct. 5, 2009), at 17, available at http://edca.typepad.com/files/pool-opening-brief.pdf

(Cross-posted from the Double Helix Law blog.)

Thursday, May 26, 2011

An Odd Ruling on DUI Error Statistics?

An article in the Lansing State Journal [1] begins as follows:
Blood tests in drunken-driving cases statewide will face more scrutiny, experts say, after a Mason County judge ruled that the state crime lab's test results "are not reliable."

In a ruling signed Friday, 79th District Court Judge Peter Wadel refused to admit blood-alcohol results in a drunken-driving case. He said the crime lab — which conducts blood and other forensic tests in cases from around the state — does not report an error rate, or margin of error, along with blood-alcohol results.

Police routinely report a single number for blood-alcohol content in drunken-driving cases. But East Lansing attorney Mike Nichols, who is handling the case in Mason County — which includes the city of Ludington along Lake Michigan — said there are no absolutes in science.

"Everyone says a blood test is so accurate. Well, it's not," Nichols said. "That's what this judge has ruled."

Not including a range of possible results, Nichols said, ignores the uncertainties in the collection, handling, analysis and reporting process.

A blood-alcohol level of 0.08 percent is the threshold in Michigan for being charged with drunken driving. But Nichols said when someone's blood-alcohol is determined to be 0.10, for example, it could actually be higher - or lower - than 0.08.
Mr. Nichols is correct — in part. Because of measurement error in blood alcohol testing, there is some chance that a true concentration of just under 0.08% could give rise to a reading of 0.10%. The equipment can be calibrated and shown to have a standard error of measurement. This statistic should be included in laboratory reports, and judges should learn what it means. (The statistics chapter of the Federal Judicial Center's Reference Manual on Scientific Evidence [2] is one of many sources for an explanation.)
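For readers who want to see what reporting a margin of error might look like, here is a minimal sketch. It assumes, hypothetically, a normally distributed measurement error with a known standard error; both numbers are made up for illustration:

```python
from math import erf, sqrt

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

measured_bac = 0.10   # reported blood-alcohol concentration (percent)
std_error = 0.01      # hypothetical standard error of measurement

# An approximate 95% interval for the true concentration.
low, high = measured_bac - 1.96 * std_error, measured_bac + 1.96 * std_error
print(f"Reported: {measured_bac:.3f}%, 95% interval: {low:.3f}% to {high:.3f}%")

# Probability that the true concentration is below the 0.08% threshold,
# treating the reading as the mean of the error distribution.
p_below = normal_cdf(0.08, measured_bac, std_error)
print(f"P(true BAC < 0.08%) ~ {p_below:.3f}")  # ~0.023 with these numbers
```

With a standard error of this (hypothetical) size, a 0.10 reading leaves a small but real chance that the true concentration is under the legal threshold — exactly the kind of information a laboratory report could convey.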

But what about all the other possible sources of errors, from mislabeling samples to falsifying data? "Not including a range of possible results, Nichols said, ignores the uncertainties in the collection, handling, analysis and reporting process." Analysis I get. That was the subject of the preceding paragraph. But "collection, handling, and reporting"? These are matters traditionally handled by proof of chain of custody and cross-examination. It would be difficult to report error probabilities for these matters, and no appellate court, as far as I know, ever has required it.

References

1. Kevin Grasha, DUI Blood Tests Could Face Scrutiny after Judge's Ruling, Lansing State Journal, May 11, 2011

2. David H. Kaye & David A. Freedman, Reference Guide on Statistics, in Reference Manual on Scientific Evidence (Federal Judicial Center 3d ed. 2011)

Tuesday, May 3, 2011

Osama Bin Laden's DNA? 99.9% Accuracy and 0.1% Nonsense

This morning the New York Times reported that genetic analysis established "with 99.9 percent accuracy" that the man killed by U.S. soldiers in Pakistan and quickly buried at sea was Osama Bin Laden. "Officials said they collected multiple DNA samples from Bin Laden's relatives in the years since the Sept. 11 attacks. And they said the analysis, which was performed the day Bin Laden was killed but after his body was buried at sea, confirmed his identity with 99.9 percent accuracy." [1] The 99.9% figure is quoted in other stories and commentary as a precise statement of the probability that the body was Osama's.

But where would a number like 99.9% come from? In a typical criminal case, the issue is whether a trace of DNA left at a crime scene or on a victim and a sample from a suspect or defendant share a sufficient number of highly variable features to justify the inference that they originated from the same individual. One can compute the probability of the match under different hypotheses. One hypothesis (I shall call it H) is that the defendant is indeed the source. A rival hypothesis (U) is that an unrelated person is. Other alternatives are F, that a father of the suspect is the source, or S, that a full sibling is. Still other relationships between the trace DNA and its source can be envisioned.

Empirically determined frequencies of the distinct DNA features (the alleles at each locus) can be combined according to a population genetics model to estimate the probability that an unrelated individual, a parent or child, a sibling, etc., would be born with the DNA profile in question. This is the probability of the DNA data if a given hypothesis is true. Suppose these probabilities are P(data | U) = 1/10^12, P(data | F) = 1/10^5, and P(data | S) = 1/10^7. Ignoring the chance of laboratory error, the probability of the data if H is true is P(data | H) = 1. These conditional probabilities (for data, given hypotheses) often are called likelihoods. [2]

It is important to understand that none of these numbers is the probability, P(H | data), that the suspect is the source given the genetic data--the 99.9% figure. To find this probability, we would need to know the probability of each of the hypotheses before considering the genetic data. Bayes' rule then would permit us to combine these prior probabilities with the likelihoods. Using genetic data alone, however, it is not possible to state the probability of H. For that, we would need subjective probabilities based on nongenetic information. However, the data can produce likelihoods that swamp any reasonable choice for the prior probability, justifying assertions that the posterior probability exceeds a figure like 99.9%. The box gives an example.


BOX: Sample Computation with Bayes' Rule

With likelihoods like L(U) = P(data | U) = 1/10^12 and L(H) = P(data | H) = 1, the posterior probability will not be sensitive to the choice of the prior probabilities. Confining the analysis to the four hypotheses and assuming that the priors are P(H) = 0.7 and P(U) = P(F) = P(S) = 0.1, Bayes' rule tells us that


P(H | data) = P(H)L(H) / [P(H)L(H) + P(U)L(U) + P(F)L(F) + P(S)L(S)]
            = (0.7)(1) / [(0.7)(1) + (0.1)(10^-12) + (0.1)(10^-5) + (0.1)(10^-7)]
            > 0.999

Because the likelihoods for all the rival hypotheses are orders of magnitude smaller than that for H, the weighted prior probabilities in the denominator are negligible, and P(H | data) is close to 1. The "accuracy" of the identification is even greater than 99.9%.
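In code, the box's arithmetic looks like this — a minimal sketch using the same assumed priors and likelihoods, which are illustrative values, not figures from any actual DNA analysis:

```python
# A minimal sketch of the Bayes' rule computation in the box. The priors
# and likelihoods are the illustrative values assumed in the text.

priors = {"H": 0.7, "U": 0.1, "F": 0.1, "S": 0.1}           # prior probabilities
likelihoods = {"H": 1.0, "U": 1e-12, "F": 1e-5, "S": 1e-7}  # P(data | hypothesis)

# Bayes' rule: the posterior is the weighted likelihood for H divided by
# the sum of the weighted likelihoods over all four hypotheses.
numerator = priors["H"] * likelihoods["H"]
denominator = sum(priors[h] * likelihoods[h] for h in priors)
posterior = numerator / denominator

print(f"P(H | data) = {posterior:.10f}")  # ~0.9999985, comfortably above 0.999
```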


The Bin Laden case probably is different. Although Bin Laden had contacts with journalists before 9/11, presumably the CIA had no sample of Osama's DNA to compare to the body. But it could have obtained DNA samples from at least one relative--Osama had a remarkable number of half-siblings and children. Kinship analysis is commonly used to produce likelihood ratios for a given relationship (such as paternity or siblingship). [3]

ABC News reported that the government used DNA from the brain of a half-sister who had died at Massachusetts General Hospital in Boston [1], but that level of relationship, standing alone, probably is too weak to give a large enough likelihood ratio to warrant assertions of "99.9% accuracy" (unless a huge number of loci were involved).

Could the government have had a sample from Osama's son, Omar? Why not? ABC News interviewed him in 2010. [4] CNN did an interview in 2008. He was deported from England. (For that matter, eight days after 9/11/2001, at least 13 relatives, along with bodyguards and associates, left Boston on a chartered Ryan Airlines flight.) In the U.S. and elsewhere, police and private individuals have followed people around to get DNA samples without their knowledge. [5, 6]

With Omar's DNA to compare against a sample from the body, Y-STRs combined with STRs from other chromosomes should have been enough to produce a very large likelihood ratio (relative to an unrelated man) for paternity if the body was indeed that of Omar's father. But were all the men in the compound that was assaulted unrelated to Osama? The likelihood ratio would be smaller for a comparison to one of Osama's half-siblings (through Osama's father). Even with respect to the hypothesis that the body was a half-brother, however, the likelihoods could be quite convincing for a substantial number of loci. If the likelihood ratios for an uncle or for an unrelated man are many times smaller than that for paternity, the genetic evidence strongly favors paternity.
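Kinship likelihood ratios of this sort are computed locus by locus and multiplied. Here is a minimal sketch of the single-locus paternity index for a father-child duo, using the standard formulas while ignoring mutation and population substructure; the allele frequencies are hypothetical:

```python
# Likelihood ratio (paternity index) for one autosomal locus in a duo
# (alleged father and child, mother unavailable). Standard duo formulas,
# ignoring mutation and substructure. All frequencies are hypothetical.

def duo_paternity_index(child, father, freq):
    """LR for 'father is the biological father' vs. 'father is unrelated'."""
    a, b = child                                  # child's two alleles
    c, d = father                                 # alleged father's two alleles
    pa, pb = freq[a], freq[b]
    if a == b:                                    # child homozygous a/a
        shared = (c == a) + (d == a)              # copies of a carried by father
        return shared / (2 * pa)
    # child heterozygous a/b
    if {a, b} == {c, d}:                          # father is a/b as well
        return (pa + pb) / (4 * pa * pb)
    if a in (c, d):
        return ((c == a) + (d == a)) / (4 * pa)
    if b in (c, d):
        return ((c == b) + (d == b)) / (4 * pb)
    return 0.0                                    # no shared allele

freqs = {"A": 0.1, "B": 0.2, "C": 0.3}            # hypothetical allele frequencies
print(duo_paternity_index(("A", "B"), ("A", "C"), freqs))  # 1/(4 * 0.1) = 2.5
# Multiplying such single-locus LRs across many independent loci can yield
# a combined LR in the millions or more.
```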

It also is reported that a different son was killed in the raid. Comparing samples from both bodies could help establish a father-son relationship between the two bodies. Similar analyses helped demonstrate that bones found in a mass grave in Siberia were those of members of the Romanov royal family. [2]

In short, the claim that kinship testing with DNA from relatives of Osama Bin Laden establishes his death is credible, but 99.9% seems like a metaphor rather than the result of a direct computation. You cannot get around Bayes' theorem: a posterior probability like 99.9% has to reflect some prior probability. That said, if we assume that the prior probability based on photographs and other information is substantial and that the likelihoods for unrelated men and for half-brothers of Osama are small relative to the likelihood for Osama, the posterior probability could well equal or exceed 99.9%.

References

1. Donald G. McNeil Jr. & Pam Belluck, Experts Say DNA Match Is Likely a Parent or Child, N.Y. Times, May 3, 2011, at F2

2. David H. Kaye, The Double Helix and the Law of Evidence (2010)

3. Leslie G. Biesecker et al., DNA Identifications After the 9/11 World Trade Center Attack, 310 Science 1122 (2005)

4. Lara Setrakian, Bin Laden's Son: Worst Is Yet to Come, ABC News International, May 2, 2011, http://abcnews.go.com/International/osama-bin-ladens-son-death-unleash-violent-enemies/story?id=13509779

5. Amy Harmon, Stalking Strangers' DNA to Fill In the Family Tree, New York Times, April 2, 2007

6. Tracy Johnson, Police Ruse Case Argued Before State's Highest Court: Convicted Murderer Says Officers Broke Law with DNA Trick, Seattle Post-Intelligencer, Jan. 27, 2006

Cross-posted: Double Helix Law blog

Monday, May 2, 2011

“A Copulation of Many Years of Testifying”: Misconstruing Statistical Significance in Forensic Toxicology

No matter how many times statistics books caution readers not to transpose p-values, scientists do it anyway. My latest example comes from the Forensic Toxicology Expert Witness Handbook (2007), by James W. Jones.

The introduction explains that the handbook is "to help facilitate the training of FTDTL [Forensic Toxicology Drug Testing Laboratory] scientists in forensic toxicology expert testimony" and that "[t]he information presented is a copulation [sic] of many years of testifying as an expert witness reading and researching information and listening to many experts ... .” (Page 2).

What wisdom resides in this compilation? At page 11, we learn that
Statistically, is there a scientifically-accepted likelihood that an observed relationship is simply not due to chance? That is where the 95% confidence number comes from. A “p” value of 0.05, by convention the cutoff between statistically significant and not, is that 95% likelihood. But that percentage applies only to one of the quality criteria as to whether the science used to assess causality in a claim.

[I]f a study showed only a 51% likelihood of reflecting a true relationship rather than a chance relationship, then no scientist, no regulatory body, no one who reviews scientific data, would consider that study indicative of any causal relationship. A 51% outcome would not even merit a follow up or “validation” study by the scientific community. In the words of legalese: “The relevant scientific community would consider the use of such a study methodologically improper.”

This is gobbledygook. I think Dr. Jones is saying that p < 0.05 implies that the probability of the alternative hypothesis (given the data) exceeds 95%. But statistical significance at the 0.05 level means nothing of the kind. It only means that if the null hypothesis is true, then the probability of the data (or other data even farther from what would be expected under the null hypothesis) is less than 0.05. The p-value assumes that the null hypothesis is true: p = 0.05 means that if the null hypothesis is true, then the data are so far from what is expected that one would encounter such data only 5% of the time (or less). Because the probability 0.05 pertains to the data, and not to the hypothesis, it makes no sense to compare a p-value or its complement to the “likelihood of ... a true relationship.”

Testing for statistical significance is a way to decide whether the result merits further investigation or should be tentatively accepted as proving that the null hypothesis is false. But an alternative hypothesis could have a 51% probability of being true (if one is willing to assign probabilities to hypotheses rather than to random variables) when the p-value is, say, 0.001. According to many reviewers of scientific data, this would be a highly significant result, strongly “indicative of [a] causal relationship” (if the data came from a controlled experiment properly designed to investigate causation). In technical terms, Dr. Jones has confused a conditional probability of data given the hypothesis, P(extreme data | H0) -- the p-value -- with a posterior probability for a hypothesis, P(H0 | extreme data).
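A small numerical illustration makes the gap vivid. Under assumptions I am supplying for the sake of the example (a one-sided z-test of H0: mu = 0 against a point alternative H1: mu = 3, with a low prior probability for the alternative), the p-value can be 0.001 while the posterior probability of the alternative sits near 51%:

```python
from math import erf, exp, pi, sqrt

def norm_pdf(z):
    return exp(-z * z / 2) / sqrt(2 * pi)

def norm_sf(z):                       # one-sided p-value: P(Z >= z) under H0
    return 0.5 * (1 - erf(z / sqrt(2)))

z_obs = 3.09                          # observed test statistic
p_value = norm_sf(z_obs)              # ~0.001

# Likelihoods of the observed statistic under each hypothesis.
like_h0 = norm_pdf(z_obs)             # H0: mu = 0
like_h1 = norm_pdf(z_obs - 3.0)       # H1: mu = 3 (assumed point alternative)

prior_h1 = 0.009                      # assumed low prior probability for H1
posterior_odds = (prior_h1 / (1 - prior_h1)) * (like_h1 / like_h0)
posterior_h1 = posterior_odds / (1 + posterior_odds)

print(f"p-value = {p_value:.4f}")            # ~0.0010
print(f"P(H1 | data) = {posterior_h1:.2f}")  # ~0.52 -- nowhere near 99.9%
```

The p-value and the posterior probability answer different questions; only by smuggling in a prior can one convert the former into the latter.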

These observations about p-values and significance testing are compact. For a gentler explanation, see David H. Kaye, David E. Bernstein & Jennifer L. Mnookin, The New Wigmore on Evidence: Expert Evidence (2d ed. 2011). See also David H. Kaye, Statistical Significance and the Burden of Persuasion, 46 J. L. & Contemp. Probs. 13 (1983); David H. Kaye, Is Proof of Statistical Significance Relevant?, 61 Wash. L. Rev. 1333 (1986); David H. Kaye & David A. Freedman, Reference Guide on Statistics, in Reference Manual on Scientific Evidence (3d ed. 2011).

Sunday, May 1, 2011

New Doubts About Unscrambling Complex DNA Mixtures with SNPs

On September 5, 2008, I posted a report in the now defunct Science & Law Blog in the Law Professors Blog Network entitled Genetics Datasets Closed Due to Forensic DNA Discovery. It concerned a reported procedure for the equivalent of unscrambling a broken egg -- using a large number of SNPs to determine whether a known individual's DNA is part of a mixture of DNA from scores or even hundreds of DNA samples.

If Hollywood had been paying attention, we would have seen CSI techs checking a doorknob to find out whether the suspect ever touched it. And however the report might have been received in Hollywood, it scared the NIH and other scientific organizations that maintain research databases. After reproducing the posting about the international reaction, I shall quote an abstract from a study slated for publication in the journal Forensic Science International: Genetics. The latest work makes it even clearer that limiting access to the databases was unnecessary. It concludes that "it is not possible to reliably infer the presence of minor contributors to mixtures following the approach suggested in Homer et al. (2008)."

Posting of 5 Sept. 2008

Until last Friday, the National Institutes of Health (NIH) and other groups had posted large amounts of aggregate human DNA data for easy access to researchers around the world. On Aug. 25, however, NIH removed the aggregate files of individual Genome Wide Association Studies (GWAS).

The files, which include the Database of Genotypes and Phenotypes (dbGaP), run by the National Center for Biotechnology Information, and the Cancer Genetic Markers of Susceptibility database, run by the National Cancer Institute, remain available for use by researchers who apply for access and who agree to protect confidentiality using the same approach they do for individual-level study data. The Wellcome Trust Case Control Consortium and the Broad Institute of MIT and Harvard also withdrew aggregate data.

The reason? The data keepers fear that police or other curious organizations or individuals might deduce whose DNA is reflected in the aggregated data, and hence, who participated in a research study. These data consist of SNPs -- Single Nucleotide Polymorphisms. These are differences in the base-pair sequences from different people at particular points in their genomes. Many SNPs are neutral -- they do not have any impact on gene expression. Nonetheless, they can be helpful in determining the locations of nearby disease-related mutations.

The event that prompted the data keepers to act was the discovery at the Translational Genomics Research Institute (TGen) of a new way to check whether an individual's DNA is part of a complex mixture of DNA (possibly from hundreds of people). According to the TGen report, Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays, a statistic applied to intensity data from SNP microarrays (chips that detect tens of thousands of SNPs simultaneously) reveals whether the signals from an individual's many SNPs are consistent with the possibility that the individual is not in the mixture. (Sorry for the wordiness, but the article uses hypothesis testing, and "not in the mixture" is the null hypothesis.)
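As best I can reconstruct it from the paper, the statistic compares, SNP by SNP, the distance between a person's genotype and the reference population frequency with the distance between that genotype and the mixture frequency. Here is a simplified sketch using simulated genotypes (no array-intensity modeling); it is not TGen's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)
n_snps = 50_000

# Reference population allele frequencies for each SNP.
pop_freq = rng.uniform(0.05, 0.95, n_snps)

# Genotypes of 50 contributors, coded as within-person allele frequencies 0, 0.5, 1.
genotypes = rng.binomial(2, pop_freq, (50, n_snps)) / 2.0
mixture_freq = genotypes.mean(axis=0)   # allele frequencies of the 50-person mixture

def homer_like_statistic(person, pop, mix):
    """Standardized mean of D_j = |Y_j - pop_j| - |Y_j - mix_j| over SNPs.

    Positive values mean the person's genotypes sit closer to the mixture
    than to the reference population, suggesting the person contributed
    to the mixture.
    """
    d = np.abs(person - pop) - np.abs(person - mix)
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))  # t-like statistic

contributor = genotypes[0]                              # person in the mixture
outsider = rng.binomial(2, pop_freq, n_snps) / 2.0      # person not in it
print(homer_like_statistic(contributor, pop_freq, mixture_freq))  # large, positive
print(homer_like_statistic(outsider, pop_freq, mixture_freq))     # near zero
```

Aggregating tiny per-SNP signals over tens of thousands of markers is what lets the method detect a trace contributor at all — and, as the Egeland et al. critique quoted below shows, what it cannot do reliably once mixing proportions are taken into account.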

How could this compromise the research databases? As best I understand it, the scenario is that someone first would acquire a sample from somewhere. Your neighbor might check your garbage, isolate some of your DNA, get a SNP-chip readout, and check it against the public database to see if you were a research subject who donated DNA. Or, the police might have a crime-scene sample. Then they would use a SNP-chip to get a profile to compare to the record on the public database to see if the profile probably is part of the mixture data there. Finally, if they got a match, the police would approach the researchers to get the matching individual's name.

Kathy Hudson, a public policy analyst at Johns Hopkins University, stated in an email that “While a fairly remote concern, and there are some protections even against subpoena, NIH did the right thing in acting to protect research participants.” However, scientists such as David Balding in the U.K. are complaining that the restrictions on the databases are an overreaction. Indeed, an author of the TGen study is quoted as stating that the new policy is "a bit premature." See http://www.nature.com/news/2008/080904/full/news.2008.1083.html.

It seems doubtful that anonymity of the research databases has been breached, or will be in the immediate future, by this convoluted procedure. Of course, the longer-term implications remain to be seen, and the technique has obvious applications in forensic science. If the technique works as advertised, police will be able to take a given suspect and determine whether his DNA is part of a mixture from a large number of individuals that was recovered at a crime scene. Analyzing complex mixtures for identity is difficult to do with standard (STR-based) technology.

References

1. Homer N, Szelinger S, Redman M, Duggan D, Tembe W, et al., Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays, 4(8) PLoS Genetics e1000167 (2008), doi:10.1371/journal.pgen.1000167

2. DNA Databases Shut After Identities Compromised, 455 Nature 13 (Sept. 3, 2008)

3. Natasha Gilbert, Researchers Criticize Genetic Data Restrictions, Nature, Sept. 4, 2008, http://www.nature.com/news/2008/080904/full/news.2008.1083.html

The latest study is Egeland et al., Complex Mixtures: A Critical Examination of a Paper by Homer et al., Forensic Sci. Int'l: Genetics, 2011. It is in press. Corrected proofs are available online (for a price) at the journal's website.

Abstract: DNA evidence in criminal cases may be challenging to interpret if several individuals have contributed to a DNA-mixture. The genetic markers conventionally used for forensic applications may be insufficient to resolve cases where there is a small fraction of DNA (say less than 10%) from some contributors or where there are several (say more than 4) contributors. Recently methods have been proposed that claim to substantially improve on existing approaches [1]. The basic idea is to use high-density single nucleotide polymorphism (SNP) genotyping arrays including as many as 500,000 markers or more and explicitly exploit raw allele intensity measures. It is claimed that trace fractions of less than 0.1% can be reliably detected in mixtures with a large number of contributors. Specific forensic issues pertaining to the amount and quality of DNA are not discussed in the paper and will not be addressed here. Rather our paper critically examines the statistical methods and the validity of the conclusions drawn in Homer et al. (2008).

We provide a mathematical argument showing that the suggested statistical approach will give misleading results for important cases. For instance, for a two person mixture an individual contributing less than 33% is expected to be declared a non-contributor. The quoted threshold 33% applies when all relative allele frequencies are 0.5. Simulations confirmed the mathematical findings and also provide results for more complex cases. We specified several scenarios for the number of contributors, the mixing proportions and allele frequencies and simulated as many as 500,000 SNPs.

A controlled, blinded experiment was performed using the Illumina GoldenGate® 360 SNP test panel. Twenty-five mixtures were created from 2 to 5 contributors with proportions ranging from 0.01 to 0.99. The findings were consistent with the mathematical result and the simulations.

We conclude that it is not possible to reliably infer the presence of minor contributors to mixtures following the approach suggested in Homer et al. (2008). The basic problem is that the method fails to account for mixing proportions.

Cross-posted to the Double Helix Law blog, 1 May 2011.

Part IV of Fingerprinting Under the Microscope: Probative Value

Where do all these numbers leave us? Plainly, they demonstrate that the error rate for human examiners is not zero. They show that the examiners in the experimental conditions made false identifications at a rate of only 1 in 1,000, and they made false exclusions at a rate exceeding 75 in 1,000. These error rates indicate that fingerprint identifications and exclusions are (or can be) valid. That is the Daubert question. A human being's use of the features that latent print examiners (LPEs) rely on in fingerprint comparisons is a valid means of identifying fingerprints.

But these numbers do not give the probative value of an identification or an exclusion in a particular case. For one thing, these numbers apply to a large group of examiners who are not representative of all LPEs and to challenging pairings of prints. To permit an extrapolation to the particular case, however, let us assume that the examiner in that case performed identically to this group in examining a pair of prints that was comparable to the pairs in the study. Suppose this LPE then made an identification of the defendant. How much support would that provide for the prosecution’s case?

The positive predictive value of 99.8% does not apply to this case unless the prior probability that the latent and the exemplar are mates is the same as the prevalence of mates in the study. The prevalence of mates among the pairs found to be of value for individualization was 4083/5969 = 68.4%. The prior probability here could be lower (if the exemplar came from an AFIS search and there was no other evidence in the case, for example). Or it could be higher (if the other evidence was such that the probability of a mate without regard to the match exceeded 68.4%).

The expert witness is not in a position to choose prior probabilities for the jurors, but an expert could display the posterior probability as a function of the possible priors. A figure in the Noblis-FBI study graphs this function, and it might be presented to the jury. This method of explaining a match has been debated in the legal literature for some 40 years. See David H. Kaye et al., The New Wigmore on Evidence: Expert Evidence (2d ed. 2011).

Instead of talking in terms of the posterior probability, one can focus on the likelihood ratio (LR) in its own right. This ratio indicates how much the finding of a match shifts the odds (based on other evidence) that the defendant is the source of the latent print. The LR is given by
P(identification | mate & VID & conclusion) / P(identification | nonmate & VID & conclusion)
= sensitivity / (1 − specificity)
= 0.891 / 0.00152
≈ 587.

(Here "VID" means the pair was judged of value for individualization. The rounded specificity of 99.8% corresponds to an unrounded false-positive rate of about 0.152%, which is what produces the LR of 587; using the rounded 1 − 0.998 = 0.002 would give about 446.)
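To see how such an LR shifts opinion, here is a minimal sketch of the posterior-as-a-function-of-prior calculation — the kind of curve the Noblis-FBI figure displays — using the LR of 587 from above:

```python
# Posterior probability that the prints are mates, as a function of the
# prior probability, given the likelihood ratio of 587 from the text.

LR = 587.0

def posterior(prior, lr=LR):
    """Bayes' rule in odds form: posterior odds = prior odds * LR."""
    odds = (prior / (1 - prior)) * lr
    return odds / (1 + odds)

for prior in (0.01, 0.10, 0.50, 0.684):
    print(f"prior = {prior:.3f} -> posterior = {posterior(prior):.4f}")
# Even a skeptical prior of 1% yields a posterior over 85%; using the
# study's prevalence of 68.4% as the prior gives a posterior near the
# reported positive predictive value.
```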
An expert might testify that, based on this study, the identification is strong evidence for the prosecution because it is hundreds of times more likely to arise when a latent print and an exemplar originate from the same finger than when they come from different sources. The defense would have to argue that the examiner is below the average of the study group in skill, that he might have taken less care than those LPEs did, that he had more difficult prints to work with, or that even with this finding, the prosecution case leaves a reasonable doubt. If there were blind verification of the examiner's conclusion, the prosecutor could counter that the evidence is even more compelling than the LR of 587 suggests. The jury might regard the fingerprint evidence as powerful but not conclusive. With a careful examiner, that would appear to be a reasonable conclusion.