Tuesday, December 24, 2013

Breathalyzers and Beyond: The Unintuitive Meanings of "Measurement Error" and "True Values" in the 2009 NRC Report on Forensic Science

Five years ago, the National Research Council released its eagerly awaited and repeatedly postponed report on "Strengthening Forensic Science in the United States: A Path Forward." One theme of the report was that forensic experts must present their findings with due recognition of Rumsfeldian "known unknowns." For example, the report repeatedly referred to "the importance of ... a measurement with an interval that has a high probability of containing the true value" (NRC Committee 2009, p. 121), and it referred to "error rates" for categorical determinations (ibid., pp. 117-22). 

Earlier this year, UC-Davis law professor and evidence guru Edward Imwinkelried and I submitted a letter urging the Washington Supreme Court to review a case raising the issue of whether the state courts should admit point estimates of blood or breath alcohol concentration without an accompanying quantitative estimate of the uncertainty in each estimate. (The court denied review.) Since the NRC report uses breath-alcohol measurements to explain the meaning of its call for interval estimates, one would think that the report would have a good illustration of a suitable interval. But that is not what I found. The report's illustration reads as follows:
As with all other scientific investigations, laboratory analyses conducted by forensic scientists are subject to measurement error. Such error reflects the intrinsic strengths and limitations of the particular scientific technique. For example, methods for measuring the level of blood alcohol in an individual or methods for measuring the heroin content of a sample can do so only within a confidence interval of possible values. In addition to the inherent limitations of the measurement technique, a range of other factors may also be present and can affect the accuracy of laboratory analyses. Such factors may include deficiencies in the reference materials used in the analysis, equipment errors, environmental conditions that lie outside the range within which the method was validated, sample mix-ups and contamination, transcriptional errors, and more.

Consider, for example, a case in which an instrument (e.g., a breathalyzer such as Intoxilyzer) is used to measure the blood-alcohol level of an individual three times, and the three measurements are 0.08 percent, 0.09 percent, and 0.10 percent. The variability in the three measurements may arise from the internal components of the instrument, the different times and ways in which the measurements were taken, or a variety of other factors. These measured results need to be reported, along with a confidence interval that has a high probability of containing the true blood-alcohol level (e.g., the mean plus or minus two standard deviations). For this illustration, the average is 0.09 percent and the standard deviation is 0.01 percent; therefore, a two-standard-deviation confidence interval (0.07 percent, 0.11 percent) has a high probability of containing the person’s true blood-alcohol level. (Statistical models dictate the methods for generating such intervals in other circumstances so that they have a high probability of containing the true result.)
(Ibid., pp. 116-17.)
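(To make the committee's arithmetic concrete, here is a minimal Python sketch that simply reproduces the numbers in the example -- the mean and sample standard deviation of the three readings and the resulting two-standard-deviation interval. Nothing in it goes beyond the report's own calculation.)

```python
import statistics

# The three hypothetical breathalyzer readings in the NRC report's example
readings = [0.08, 0.09, 0.10]

mean = statistics.mean(readings)   # 0.09
sd = statistics.stdev(readings)    # sample standard deviation = 0.01

# The report's "two-standard-deviation" interval around the mean
lower, upper = mean - 2 * sd, mean + 2 * sd
print(f"mean = {mean:.2f}, sd = {sd:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```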

What is troublesome about this explanation? Let me count the ways.

1. "Measurement error" does not refer to all errors of measurement

"[D]eficiencies in the reference materials used in the analysis, equipment errors, environmental conditions that lie outside the range within which the method was validated, sample mix-ups and contamination, transcriptional errors, and more" all "can affect the accuracy of laboratory analyses." Nevertheless, they do no count as "measurement error" because they are "factors other than the inherent limitations of the measurement technique." Not being "intrinsic [to] the particular scientific technique," they fall outside the committee's definition of "measurement error."

That narrow definition calls to mind the claims of some fingerprint analysts that the ACE-V method has a "methodological" error rate of zero because the only possibility for error arises when a human being does not apply the method perfectly. The difference, however, is that one can measure the errors when the breathalyzer has no deficient reference materials, no extreme environmental conditions, no sample mix-ups or contamination, no transcriptional errors, and so on. The fingerprint analyst, in contrast, is the measuring instrument, and it is impossible to distinguish between instrument measurement error and human error in that context.

There is nothing illogical in quantifying some but not all measurement errors when some are more readily and validly quantifiable than others. Machines might not be tested periodically to ensure that they are operating as they are supposed to (e.g., DiFilipo 2011; Sovern 2012), but it is not clear that this possibility can usefully be built into a computed uncertainty suitable for courtroom testimony. Yet using the seemingly all-encompassing phrase "measurement error" in a narrow, technical sense -- to denote only the noise inherent in the apparatus when operated under certain conditions -- is potentially misleading.

2. "True values" are not true blood-alcohol levels.

Because the committee's example of "measurement error" quantifies only "intrinsic" error, its statement that "a two-standard-deviation confidence interval (0.07 percent, 0.11 percent) has a high probability of containing the person’s true blood-alcohol level" also is easily misunderstood. The confidence interval (CI) for "true values" does not pertain to the actual blood-alcohol level. That level can differ from the point estimate of 0.09 for other reasons, making the real uncertainty greater than ± 0.02.

In addition, a breathalyzer measures alcohol in the breath, not in the bloodstream. The concentrations are related, but the precise functional relationship varies across individuals (e.g., Martinez & Martinez 2002). This is another source of uncertainty not reflected in the committee's CI for blood-alcohol concentration (BAC), although the committee could have sidestepped this issue by referring to breath-alcohol concentration (BrAC).

3. The standard error of the breathalyzer would be determined differently.

The NRC committee imagines using a breathalyzer to make three measurements of the same breath sample. The parenthetical, concluding sentence about "statistical models" for "other circumstances" suggests that the committee realized that this approach is not one that anyone would use to estimate the noise in the apparatus. The breathalyzer should be tested on many samples with known concentrations to ensure that it is not biased and to quantify the extent of the random variations about those known values. Manufacturers perform such tests (e.g., Coyle et al. 2010).
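What might such testing look like? Here is a rough, purely illustrative sketch in Python: repeated readings of reference samples with known concentrations are used to estimate the instrument's bias and random error at each level. All of the numbers (and the choice of three concentration levels with five readings each) are invented for illustration; real validation studies are far more extensive.

```python
import statistics

# Hypothetical calibration data: repeated readings of reference samples with
# known alcohol concentrations (all values invented for illustration)
calibration = {
    0.040: [0.039, 0.041, 0.040, 0.042, 0.038],
    0.080: [0.079, 0.082, 0.081, 0.078, 0.080],
    0.150: [0.148, 0.152, 0.151, 0.149, 0.150],
}

for true_value, readings in calibration.items():
    bias = statistics.mean(readings) - true_value  # systematic error at this level
    sd = statistics.stdev(readings)                # random ("intrinsic") error at this level
    print(f"true {true_value:.3f}: bias = {bias:+.4f}, sd = {sd:.4f}")
```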

4. A CI of ±2 standard errors might not have "a high probability of containing the person’s true blood-alcohol level"

Let's put aside all the concerns raised so far. Suppose that the errors in the machine's measurement always are normally distributed about the true value in a breath sample; that the applicable standard deviation for this distribution is 0.01; and that the single measured value is 0.09. Is it now true that the interval 0.09 ± 0.02 "has a high probability of containing the person’s true blood-alcohol level"?

Maybe. Two standard errors give an interval with a confidence coefficient of approximately 95%. That is to say that this one interval comes from a procedure that generates intervals that cover the true value about 95% of the time. It is tempting to say that the probability that the interval in question covers the true value therefore is 95%.
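Here is a minimal simulation sketch of what that 95% figure does mean. The "true" value, the noise level, and the seed are arbitrary choices for illustration; the point is only that intervals of the form measurement ± 2σ, generated over and over, cover the fixed true value in roughly 95% of repetitions.

```python
import random

random.seed(1)
SIGMA = 0.01        # assumed measurement standard deviation (as in the NRC example)
TRUE_BRAC = 0.09    # an arbitrary fixed "true" value for the simulation
N = 100_000

# Count how often the interval (measurement - 2*sigma, measurement + 2*sigma)
# contains the true value
covered = sum(
    abs(random.gauss(TRUE_BRAC, SIGMA) - TRUE_BRAC) <= 2 * SIGMA
    for _ in range(N)
)
print(f"coverage = {covered / N:.3f}")   # about 0.95 for plus-or-minus 2 sigma
```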

But let's think about how the sample came to be tested. The arresting officer picks someone out of a population of motorists. The motorists have varying BrACs, and the officer has some level of skill in spotting the ones who might well be inebriated. Suppose that the drivers the officer stops and tests have BrACs that are normally distributed with mean 0.04 and standard deviation 0.01. The officer's breathalyzer is functioning according to the manufacturer's specifications, and the standard deviation in its measurements is 0.01, as in the NRC report. Having obtained a measurement of 0.09 on the one driver's breath sample, what is a high-probability interval for the true BrAC in this one breath sample? Is it 0.07 to 0.11?

It turns out that the probability that the true BrAC falls within the NRC's interval is only 24% (applying equations 2.9 and 2.10 in Gelman et al. 2004). If the officer stopped drivers whose mean BrAC was greater than 0.04, or whose BrACs were more variable, the probability that the NRC's interval covers the true value would be greater. If, for example, the standard deviation in this group were 0.02 instead of 0.01 (and the mean were still 0.04), then the probability for the NRC's interval would be 87%.
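For readers who want to check these numbers, here is a minimal Python sketch of the normal-prior, normal-likelihood (conjugate) updating that I take equations 2.9 and 2.10 in Gelman et al. (2004) to express. The prior and the measurement-error figures are the ones assumed in the text.

```python
from math import sqrt
from statistics import NormalDist

def posterior(prior_mean, prior_sd, measurement, meas_sd):
    """Normal prior + normal likelihood -> normal posterior (a precision-weighted average)."""
    w_prior, w_data = 1 / prior_sd**2, 1 / meas_sd**2
    post_var = 1 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * measurement)
    return post_mean, sqrt(post_var)

def prob_in_interval(mean, sd, lo, hi):
    d = NormalDist(mean, sd)
    return d.cdf(hi) - d.cdf(lo)

# Stopped drivers assumed ~ N(0.04, 0.01^2); breathalyzer noise sd = 0.01; reading = 0.09
m, s = posterior(0.04, 0.01, 0.09, 0.01)
print(round(prob_in_interval(m, s, 0.07, 0.11), 2))   # about 0.24

# Same calculation with a more variable population of stopped drivers (sd = 0.02)
m, s = posterior(0.04, 0.02, 0.09, 0.01)
print(round(prob_in_interval(m, s, 0.07, 0.11), 2))   # about 0.87
```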

Of course, we do not know much about the distribution of BrAC in the group that the officer stops. As indicated above, this distribution would depend on the drinking habits of drivers in the town and the officer's skill in pulling over drunken drivers. The choice of a normal distribution with the parameters mentioned above is not likely to be realistic. But whatever the distribution may be, it, along with the single measured value, bears on the true value of the tested driver's BrAC. This fact makes it tricky to quantify the probability that the NRC's CI includes the driver's BrAC.

* * *

The NRC Report was certainly correct to call on forensic scientists to develop better measures of the uncertainty in their findings and to apply them in their reports and testimony. But figuring out what these measures should be and how to use them is a formidable challenge. Meeting this challenge will be a lot harder than the simple example of a confidence interval in the report might suggest.

References

Sunday, December 22, 2013

Forensic Science’s Latest Proof of Uniqueness

A federally funded study on the "Determination of Unique Fracture Patterns in Glass and Glassy Polymers" affirms that fracture patterns are unique. The researchers believe their work permits experts to continue to provide their "usually conclusive" testimony about cracked glass and plastic.

"The purpose of the research" undertaken at the University of California at Davis's graduate program in forensic science was "to provide a first, objective scientific background that will illustrate that repetitive fractures, under controlled conditions on target materials such as glass window panes and glass bottles, are in fact different and unique. In this phase of our study, we fractured glass window panes, glass bottles (clear wine bottles), and polymer tail light lens covers. Each and every fracture was documented in detail for subsequent inter-comparison and to illustrate the uniqueness of the fracture pattern." (Tulleners et al. 2013, p. 7).

Not surprisingly, the researchers found that all their fractures were distinguishable. In all, they conducted 5,310 pairwise comparisons by examining the fracture patterns in all pairs formed within each of the three groups of 60 items. This finding, they concluded, "should aid the practitioner in any court testimony involving the significance of fracture matching of broken glass and polymers materials." (Ibid., p. 23).

What testimony might this be? "For the forensic community, the ability to piece together glass fragments in order to show a physical fit or a 'Physical Match' is the strongest evidentiary finding of an association." (Ibid., p. 6) "The usual statement is that 'the evidence glass fragment was physically matched to another glass establishing thus both share a common origin.'" (Ibid.) This testimony, the researchers suggest, is just fine: "we are substantiating the individuality of glass and polymer fractures under closely controlled conditions." (Ibid., p. 3, emphasis added). Thus, "[t]his research should enhance the capability of the analyst to testify in a court of law as to the uniqueness of a fracture." (Id., p. 61, emphasis added).

But why would the analyst want to claim universal uniqueness? Forensic science’s hoary division of its world into two parts -- "unique" feature sets and "class" characteristics -- is an article of faith. (E.g., Kaye 2009). The latest study certainly is of some use in confirming the intuition that fracture patterns are highly variable. The existence of varying patterns is one fact that makes "fractography" evidence, as it is called in the field, probative. But the study’s explanation of how it proves that every pattern is unique seems like a parody of scientific reasoning. The explanation is this:
In this research, it is hypothesized that every fracture forms a unique and nonreproducible fracture pattern. Alternately, it may be that some fracture patterns may be reproduced from time to time. If it is found that each fracture forms a unique and nonreproducible fracture pattern, then this finding will support the theory that coincidental duplication of fracture patterns cannot be attained. However, if duplicate fracture patterns are found, this would falsify the null hypothesis and show that some fracture patterns may be reproduced from time to time.
(Ibid., p. 27). Such is the power of unique-vs-class thinking. This impoverished dichotomy collapses a spectrum of possible states of nature into two discrete states. Combined with a cartoon-like version of Sir Karl Popper’s criterion of falsification, it leads the researchers to believe that their failure to find a class characteristic proves the "null hypothesis" of uniqueness.

True, the failure to find "duplicate fracture patterns" in a small sample "support[s] the theory that coincidental duplication of fracture patterns cannot be attained." (Or it would in a study in which the analyst deciding on whether two patterns were the same did not already know that all of them came from different objects.)

But it also supports the alternative theory that coincidental duplication can be attained. Instead of taking no-duplication-is-possible as the “null hypothesis,” we could postulate that, on average, 1 in every 10,000 fractures of the items tested would produce indistinguishable fracture patterns. Or, we could hypothesize that the mean duplication rate is 1/100,000. Since we are just spinning out hypotheses, we could pick still other rates.

A great many such hypotheses seem compatible with the finding of no duplicates among 180 fractures. Observing a unique set of patterns in the sample supports (to varying degrees) a wide range of hypotheses about the duplication probability. To indulge an overly simplistic model, if we were to assume that the probability of detecting a duplicated pattern in each of the 5,310 comparisons were some identical, albeit small, number, then the 95% confidence interval for this duplication probability would go from zero (uniqueness) all the way up to 1/1770. (See Eypasch et al. 1995). To testify that the experiment supports only “the theory that coincidental duplication of fracture patterns cannot be attained” would be foolish. A more accurate statement would be that it supports the theory that duplication occurs at an unknown, but not very large, rate.
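Under that admittedly simplistic model -- each of the 5,310 comparisons has the same small, independent chance of revealing a duplicate -- the cited bound of 1/1770 matches the familiar "rule of three" for zero observed events (Eypasch et al. 1995). A minimal sketch of the arithmetic:

```python
n = 5310  # pairwise comparisons, none of which revealed a duplicate pattern

# Exact one-sided 95% upper bound on the per-comparison duplication probability p,
# solving (1 - p)^n = 0.05 under the independent-Bernoulli assumption
upper_exact = 1 - 0.05 ** (1 / n)

# "Rule of three" approximation discussed by Eypasch et al. (1995)
upper_rule_of_three = 3 / n

print(f"exact 95% upper bound ~ 1/{1 / upper_exact:,.0f}")           # roughly 1/1,773
print(f"rule-of-three bound   = 1/{1 / upper_rule_of_three:,.0f}")   # 1/1,770
```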

To be sure, there is reason to believe that duplication is improbable, and the UC-Davis study adds to our knowledge of fracture patterns. However, fractographers should think twice (or more!) before they testify that the study demonstrates the utter uniqueness of all fractures. They gain little by embracing the claim of universal uniqueness (Cole 2009; Kaye et al. 2011), and this study does not deliver on the promise of "objective criteria to determine the uniqueness of a fit." (Tulleners et al., p. 7).

References
  • Simon A. Cole, 2009. Forensics Without Uniqueness, Conclusions Without Individualization: The New Epistemology of Forensic Identification. Law, Probability and Risk 8:233-255
  • Ernst Eypasch, Rolf Lefering, C. K. Kum & Hans Troidl, 1995. Probability of Adverse Events That Have Not Yet Occurred: A Statistical Reminder. Brit. Med. J. 311:619, available at http://www.bmj.com/content/311/7005/619
  • David H. Kaye, David E. Bernstein & Jennifer L. Mnookin, 2011. The New Wigmore, A Treatise on Evidence: Expert Evidence. New York: Aspen Pub. Co. (2d ed.)
  • David H. Kaye, 2009. Identification, Individuality, and Uniqueness: What's the Difference? Law, Probability & Risk 8:85-89, http://ssrn.com/abstract=1261970 (abstract)
  • Frederic A. Tulleners, John Thornton & Allison C. Baca, 2013. Determination of Unique Fracture Patterns in Glass and Glassy Polymers, available at https://www.ncjrs.gov/pdffiles1/nij/grants/241445.pdf

Sunday, December 8, 2013

Error on Error: The Washington 23

Frequently cited in warnings on the risks of errors in DNA typing is a 2004 article prepared by unnamed staff of the Seattle Post-Intelligencer. In one highly praised book, for instance, Sheldon Krimsky of Tufts University and Tania Simoncelli, then with the ACLU, wrote that the paper “reported that forensic scientists at the Washington State Patrol Laboratory had made mistakes while handling evidence in at least 23 major criminal cases over three years” [1, p. 280]. The article itself begins “[c]ontamination and other errors in DNA analysis have occurred at the Washington State Patrol crime labs, most of it the result of sloppy work” [2].

Laboratory documentation of “sloppy work” should be encouraged. It should be scrutinized inside and outside of the laboratory. Within the laboratory, it can be a path to improvements. Outside the laboratory world, reporting on problems, quotidian and catastrophic alike, can increase the level of public and professional understanding of how forensic science is practiced. However, it is important to be clear about the nature, severity, and implications of specific “mistakes,” “errors,” and “contamination.” These terms cover a variety of phenomena.

From the earliest days of PCR-based DNA typing, it has been known that “contamination” is an omnipresent possibility. It can result from extraneous DNA in materials from companies that supply reagents and equipment, from the introduction of the analyst’s DNA into the sample being analyzed (“for example, when the analyst talks while handling a sample, leaving an invisible deposit of saliva” [2]), from inadequate precautions against transferring DNA from one test with one sample over to another test with a different sample (a form of “cross-contamination”), and so on. Many forms of contamination are detectable, but they can complicate or interfere with the interpretation of an STR profile [3]. Cross-contamination of a crime-scene sample with a potential suspect’s DNA either before or after it reaches the laboratory is particularly serious because it could result in a false match.

As described in an appendix below, it appears that only one of the 23 cases (#22) involved a false report of a match, and the report was corrected before any charges were filed. However, Bill Thompson presented a different case as a premier example of "false cold hits" [4, p. 230]. In his latest publication on errors in DNA typing, he wrote that
[W]hile the Washington State Crime Patrol Laboratory [was conducting] a cold-case investigation of a long-unsolved rape, it found a DNA match to a reference sample in an offender database, but it was a sample from a juvenile offender who would have been a toddler at the time the rape occurred. This prompted an internal investigation at the laboratory that concluded that DNA from the offender's sample, which had been used in the laboratory for training purposes, had accidentally contaminated samples from the rape case, producing a false match. [4, p. 230].
Thompson noted that he "assisted the newspaper in the investigation" [4, p. 341 n.12]. Apparently, he was referring to case #5 in the article (although the article labels it a homicide case). In any event, it is the only case Thompson lists as an example of a false match in Washington.

My conclusion is that the Washington cases certainly establish that mistakes of many types can occur in DNA laboratories and that some types of mistakes can produce false matches, false accusations, and even false convictions. But none of the 23 are themselves instances of false charges or false convictions. This conclusion neither condones the mistakes nor excludes the possibility that DNA has produced such outcomes in Washington.  But it may help put the 23 cases and the writing about them in perspective.

References
  1. Sheldon Krimsky & Tania Simoncelli, Genetic Justice: DNA Data Banks, Criminal Investigations, and Civil Liberties (2011)
  2. DNA Testing Mistakes at the State Patrol Crime Labs, Seattle Post-Intelligencer, July 21, 2004, 10:00 pm, http://www.seattlepi.com/local/article/DNA-testing-mistakes-at-the-State-Patrol-crime-1149846.php
  3. Terri Sundquist & Joseph Bessetti, Identifying and Preventing DNA Contamination in a DNA-Typing Laboratory, Profiles in DNA, Sept. 2005, at 11-13, http://www.promega.com/~/media/Files/Resources/Profiles%20In%20DNA/802/Identifying%20and%20Preventing%20DNA%20Contamination%20in%20a%20DNA%20Typing%20Laboratory.ashx
  4. William C. Thompson, The Myth of Infallibility, in Genetic Explanations: Sense and Nonsense 227 (Sheldon Krimsky & Jeremy Gruber eds. 2013)
APPENDIX
23 and Me

This Appendix quotes the newspaper descriptions in full, then offers my own remarks.

EXAMPLE NO. 1
Problem: Cross-contamination
When and where: July 2002, Spokane lab
Forensic scientist: Lisa Turpen
Case: child rape
What happened: Turpen contaminated one of four vaginal swabs with semen from a positive control sample. Corrected report issued almost two years later in March 2004. ... Yakima prosecutors offered plea deal during the trial, with defendant pleading guilty to two gross misdemeanors. Turpen's mistake was a factor, according to defense.

REMARKS: I do not know what “semen from a positive control sample” means. When DNA from a cell line is used to ensure that PCR is amplifying those alleles, the cell-line DNA is known as a positive control sample. This example does not sound like a case of contamination involving that kind of a positive control. Adding semen to a vaginal swab obviously is unacceptable, but if the other three swabs produced a single male DNA profile and the fourth showed two male profiles in a case involving a single rapist, the anomalous profile would not be falsely matched to anyone.

EXAMPLE NO. 2
Problem: Erroneous lab report
When and where: August 2002, Seattle lab
Forensic scientist: William Stubbs
Case: Fatal police shooting of Robert Thomas
What happened: Two hours before testifying at inquest, Stubbs discovered his crime lab report was wrong and notified prosecutor. His report said test found brown stain on gun was likely blood, but his notes had no indication of blood. ... Corrected report issued in September 2002. ... Co-worker reviewing case did not catch mistake.

REMARK: Does not involve DNA typing.

EXAMPLE NO. 3
Problem: Self-contamination
When and where: April 2001, Spokane lab
Forensic scientists: Charles Solomon, Lisa Turpen
Case: rape/kidnapping/assault
What happened: In separate tests, Solomon and Turpen contaminated hair-root tests with their own DNA. Solomon also contaminated reference blood sample with his DNA. ...Three defendants were convicted.

REMARK: There is no suggestion of a false match here.

EXAMPLE NO. 4
Problem: Testing error
When and where: September 2002, Marysville lab
Forensic scientist: Mike Croteau
Case: robbery/assault
What happened: Rushing to meet deadlines, Croteau mixed up reference samples from victim and suspect. He reported incorrect findings verbally to prosecutor, then discovered his mistake. ... Defendant pleaded guilty.

REMARK: What is the mistake here? It must be something more than using the wrong names for the two samples that were compared to produce a false match.

EXAMPLE NO. 5
Problem: Cross-contamination
When and where: August 2003, Seattle lab
Forensic scientist: Robin Bussoletti
Case: homicide
What happened: Bussoletti likely contaminated work surface while testing a blood sample from a convicted felon during training. Next DNA analyst who used work station noticed contamination in chemical solution that is not supposed to contain DNA.

REMARKS: Definitely sloppy -- and potentially falsely incriminating if work surface was then used without a thorough cleaning for casework.

EXAMPLE NO. 6
Problem: Cross-contamination
When and where: January 2004, Tacoma lab
Forensic scientist: Jeremy Sanderson
Case: child rape
What happened: Sanderson failed to change gloves between handling evidence in two cases. He noticed contamination in chemical solution. ... Defendant convicted and sent to prison.

REMARK: Is this a case of cross-contamination of samples?

EXAMPLE NO. 7
Problem: Error during testing
When and where: June 2002, Seattle lab
Forensic scientist: Denise Olson
Case: aggravated murder
What happened: Olson did initial test to look for blood on shoes. She got weak positive result, then threw out swabs. She didn't document findings or notify police. Kirkland police complained because discarded swabs couldn't be tested for DNA. ... Shoes sent to private lab for retesting. ... Defendant Kim Mason convicted and sentenced to life without release.

REMARK: Not a false match

EXAMPLE NO. 8
Problem: Error in DNA test interpretation
When and where: October 1998, Seattle lab
Forensic scientist: George Chan
Case: rape
What happened: Chan misstated statistical likelihood of match with suspect. Co-worker reviewing case didn't catch error. ... Pierce County prosecutor noticed mistake at pretrial conference in September 2000. ... Defendant convicted.

REMARK: Not a false match

EXAMPLE NO. 9
Problem: Error in testing procedure
When and where: September 2002, Seattle lab
Forensic scientist: Denise Olson
Case: robbery/assault
What happened: Olson tested known DNA samples before evidence collected at crime scene -- a violation of lab procedure aimed at preventing cross-contamination. A co-worker caught the mistake while reviewing the case.... Tests were redone. ... Defendant pleaded guilty.

REMARK: This departure from protocol raises the risk of an incriminating case of cross-contamination, but there is no indication that any cross-contamination occurred.

EXAMPLE NO. 10
Problem: Self-contamination
When and where: November 2002, Tacoma lab
Forensic scientist: Mike Dornan
Case: rape
What happened: Dornan contaminated DNA test of victim's underwear with his own DNA. May have resulted from talking during testing process. ... Defendant pleaded guilty.

REMARK: No false match.

EXAMPLE NO. 11
Problem: Unknown source of contamination
When and where: January 2004, Tacoma lab
Forensic scientist: Christopher Sewell
Case: homicide
What happened: Sewell found low level of DNA from unknown source in blood sample from victim. May have come from blood transfusion of victim before death. ... Case pending.

REMARK: The “unknown source of contamination” does not seem to have produced a false match if the peak heights indicated a minor contributor and the major contributor was the defendant.

EXAMPLE NO. 12
Problem: Self-contamination
When and where: March 2004, Tacoma lab
Forensic scientist: William Dean
Case: rape
What happened: Dean contaminated control sample with his own DNA while testing police evidence. ... No suspect.

REMARK: No suspect, no contamination of a crime-scene or suspect sample, no false match.

EXAMPLE NO. 13
Problem: Unknown source of contamination
When and where: January 2003, Spokane lab
Forensic scientist: Lisa Turpen
Case: murder
What happened: Turpen found unidentified female DNA in control sample while testing evidence in Stevens County double-murder case.... Defendant convicted.

REMARK: No contamination of a crime-scene or suspect sample, no false match.

EXAMPLE NO. 14
Problem: Unknown source of contamination
When and where: January 2003, Spokane lab
Forensic scientist: Lisa Turpen
Case: robbery/kidnapping
What happened: Turpen found unidentified female DNA in control sample while testing evidence in Yakima County case. Evidence tested same day as evidence in Example No.13.... Case pending.

REMARK: No contamination of a crime-scene or suspect sample, no false match.

EXAMPLE NO. 15
Problem: Self-contamination
When and where: September 2003, Marysville lab
Forensic scientist: Greg Frank
Case: murder
What happened: Frank contaminated control samples with his own DNA during testing in Snohomish County case. ...Case pending.

REMARK: No contamination of a crime-scene or suspect sample, no false match.

EXAMPLE NO. 16
Problem: Self-contamination
When and where: September 2003, Marysville lab
Forensic scientist: Greg Frank
Case: child molestation/rape
What happened: Frank contaminated control samples with his own DNA during testing in Kitsap County case. ... Defendant pleaded guilty.

REMARK: No contamination of a crime-scene or suspect sample, no false match.

EXAMPLES NO. 17 & 18
Problem: Unknown source of contamination
When and where: October 2003, Seattle lab
Forensic scientists: Phil Hodge, Amy Jagman
Cases: unknown
What happened: Hodge and Jagman both discovered unknown source of contamination in chemical used during DNA testing. Chemical discarded and evidence retested.

REMARK: No contamination of a crime-scene or suspect sample, no false match.

EXAMPLE NO. 19
Problem: Self-contamination
When and where: October 2002, Spokane lab
Forensic scientists: Charles Solomon, Lisa Turpen
Case: murder
What happened: Solomon found Turpen's DNA on three bullet casings retrieved from scene of Richland double murder. ... Defense expert disputed this at trial, testifying that DNA profile belonged to unknown female. ... Defendant Keith Hilton convicted.

REMARK: No false match.

EXAMPLE NO. 20
Problem: Cross-contamination
When and where: February 2002, Tacoma
Forensic scientist: Mike Dornan
Case: child rape
What happened: Dornan contaminated evidence in King County rape case with DNA from a previous case, likely by failing to properly sterilize scissors. ... Defendant pleaded guilty to a reduced charge before contamination was discovered.

REMARK: I presume that if the previous case were the defendant’s and that is what led to the charge against the defendant, the newspaper would have so stated. That would have been a false match.

EXAMPLE NO. 21
Problem: Self-contamination
When and where: January 2001, Marysville lab
Forensic scientist: Brian Smelser
Case: rape
What happened: Smelser contaminated three tests with his own DNA in Kirkland rape case. Prosecutor had to send remaining half-sample to California lab for retesting.... Defendant pleaded guilty to reduced charge.

REMARK: No false match.

EXAMPLE NO. 22
Problem: Error in testing
When and where: December 2002, Seattle lab
Forensic scientist: Denise Olson
Case: rape/attempted murder
What happened: Olson misinterpreted DNA results, telling Seattle police their suspect was a match. Co-worker caught error 11 days later, just as charges were about to be filed.... Case unsolved.

REMARK: A false positive report (not resulting from contamination).

EXAMPLE NO. 23
Problem: Self-contamination
When and where: January 2004, Seattle lab
Forensic scientist: George Chan/William Stubbs
Case: child rape
What happened: Chan's DNA found in suspect's boxer shorts by Stubbs. Problem traced to Chan talking to Stubbs during testing.... Suspect pleaded guilty.

REMARK: No contamination of a crime-scene or suspect sample, no false match.

Saturday, December 7, 2013

Error on Error: Quashing Brian Kelly's Conviction

Are there any errors in DNA testing? Are there any errors that produce false positives? Do DNA databases generate any false leads? Do false leads produce any false arrests? Any false convictions? The answers to these questions are yes, yes, yes, yes, and yes. (See related postings below.)

But how large is the risk of a false positive match to an existing suspect? To an innocent individual culled from a database? By and large, we are limited to isolated reports in newspapers--reports that are newsworthy precisely because they are rare. The most complete compilation of the troubling cases, presented in a survey of the ways that errors can arise, is to be found in a book chapter by Bill Thompson of the University of California at Irvine. [1]

Professor Thompson is an unusually knowledgeable and astute commentator, consultant, and advocate in the field of DNA evidence, and it should be revealing to work through his examples. That is what I have started to do. So far, I have looked into only the very first case noted in the chapter. According to Professor Thompson, it exemplifies a "common problem" [1, p. 230] of "[a]ccidental transfer of cellular material or DNA from one sample to another" [1, p. 229] causing "false reports of a DNA match between samples that originated from different people" [1, p. 230].

The example is a 1988 DNA test in a rape case in Scotland that led to the conviction of Brian Kelly. Thompson simply reports that "Scotland's High Court of Justiciary quashed a conviction in one case in which the convicted man (with the help of sympathetic volunteer scientists) presented persuasive evidence that the DNA match that incriminated him arose from a laboratory accident" [1, 230]. The "accident" in question consisted of DNA leaking from one well to an adjacent one in an agarose gel used in VNTR typing or an analyst's misloading some of the same DNA sample into both wells instead of just the one she was aiming for.

But the evidence that Professor Thompson found "persuasive" did not persuade the court. Indeed, the experts did not even testify that leakage or misloading had occurred. Rather, they stated that it was a "low risk" event, that the possibility could not be excluded, and that a procedure that would have reduced the risk could have been followed (and was adopted two years later) [2, ¶¶ 15-17].

Thus, the Scottish Appeals Court, noting other evidence in Kelly's favor and weaknesses in the Crown's case, quashed the conviction--but not because it concluded that the match was false. The court quashed the conviction because the jury was not informed of the fact that the same DNA could end up in two adjacent lanes. The court wrote:
It was not suggested that there is evidence positively indicating that cross-contamination did, or may have, occurred. On the basis of the evidence tendered by the appellant, it is maintained, on the other hand, that there was a risk of cross-contamination arising from the practice at that time of using adjoining wells for DNA samples from the crime scene and the suspect, and of such cross-contamination being undetected. It was not in controversy that it was possible for there to be leakage between adjoining wells or for DNA material to fall accidentally into a well next to the one for which it was intended. Up to a point the evidence ... as to the procedures which were followed, and the special care which was taken, countered the risk that such a mishap would in practice occur or be undetected. However, such evidence does not in our view provide a complete answer. In particular there was, on the evidence, a risk that the leakage of DNA from the well for the suspect's reference sample to the adjoining well which already held the crime scene sample would not be detected. It was, of course, a low risk, but it was of sufficient importance to be recognised by experts ... .

... In our opinion there is evidence which is capable of being regarded as credible and reliable as to the existence of a risk of cross-contamination occurring without it being detected. The risk was a low risk. It may be that in other circumstances the fact that the jury did not hear such evidence would not lead to the conclusion that there had been a miscarriage of justice. However, in the present case it is otherwise since the DNA evidence was plainly of critical importance for the conviction of the appellant. If the jury had rejected that evidence there would, in our view, have been insufficient evidence to convict the appellant. Accordingly, while the evidence related to a low risk of cross-contamination, the magnitude of the implications for the case against the appellant were substantial. For these reasons we have come to the conclusion that the appellant has established the existence of evidence which is of such significance that the fact that it was not heard by the jury constituted a miscarriage of justice. [2, ¶ 21-22]

Based on this opinion, the 1988 DNA testing with a superseded technology is a far cry from a true example of an innocent man convicted because of "a laboratory accident." It is nothing more--or less--than a case in which the defendant did not present expert testimony at trial that the laboratory used a procedure that left open a preventable mode of cross-contamination. The case is an appropriate illustration of the importance of improving laboratory practices, but such cases are not proof of known "false reports" commonly resulting from cross-contamination.

References
  1. William C. Thompson, The Myth of Infallibility, in Genetic Explanations: Sense and Nonsense 227 (Sheldon Krimsky & Jeremy Gruber eds. 2013)
  2. Opinion in the Reference by the Scottish Criminal Cases Review Commission in the Case of Brian Kelly, Appeal Court, High Court of Justiciary, Appeal No. XC458/03, Aug. 6, 2004, http://www.scotcourts.gov.uk/opinions/XC458.html
Related postings

Friday, December 6, 2013

Get Serious: The US Department of Justice's Amicus Brief in Haskell v. Harris

As the U.S. Court of Appeals for the Ninth Circuit returns to the question of the constitutionality of California's DNA database law, the United States has weighed in with an amicus brief. It is worried (or should be) that the en banc panel will take too seriously the Supreme Court's references to “serious offenses” in Maryland v. King, the DNA-on-arrest case decided last June. The Maryland law that the Court narrowly upheld authorizes DNA collection for violent felonies and burglaries (and attempts to commit those crimes). The California law under attack in Haskell is broader, applying to all felony arrests, including those that would seem rather petty to the casual observer. (The federal law is broader still, encompassing every offense, no matter how trivial, for which a person is dragged into custody.)

Consequently, it comes as no surprise that the federal government wants the Ninth Circuit to read King expansively, whereas the ACLU, which represents the plaintiffs in Haskell, is pressing for the narrowest possible reading. Interestingly, opponents of all forms of DNA-BC (routine arrestee DNA sampling before conviction) tend to read the majority opinion in King broadly. Professor Erin Murphy, for example, concludes in her recent review of the case that the majority did not even "attempt[] to limit its holding to serious crimes." (Murphy 2013, p. 171).

The U.S. Department of Justice (DOJ) could not agree more. Here, I want to look critically at the DOJ’s arguments and statements. The brief essentially argues that (1) the King Court intended its opinion to settle the Fourth Amendment status of all existing DNA-BC laws, (2) by definition, the reasons to uphold the narrower Maryland law apply with equal force to the broader California law, and (3) the King Court's use of the word "booking" and its analogy between DNA profiling and fingerprinting settle the issue as a matter of logic and substance. None of these arguments is conclusive.

I. What Were the Justices Thinking?

The DOJ lawyers seem to think that because the Court was aware that DNA-BC is a national issue, its opinion was meant to settle the issue for all DNA-BC statutes. Their brief quotes the majority's observation that:
Noting that “[t]wenty-eight States and the Federal Government have adopted laws similar to the Maryland Act,” the Court explained that “[a]lthough those statutes vary in their particulars, such as what charges require a DNA sample, their similarity means that this case implicates more than the specific Maryland law.” King, 133 S. Ct. at 1968 (emphasis added).
Brief for the United States, at 3.

The Court’s phrasing cannot bear the weight the government places on it. Of course “the case” implicates other laws. That is one reason the Court decided to review the case. The Court could have effectively struck down a swath of federal and state laws in one fell swoop. It did not. Is the Court’s awareness of the fact that state laws vary in the offenses that trigger arrestee sampling an announcement that the Court thinks it is upholding the laws of 28 states and the federal government? The next sentence in the majority opinion summarizes the spread of DNA-BC across the country: “At issue is a standard, expanding technology already in widespread use throughout the Nation.” 133 S. Ct. at 1968. The “standard, expanding technology” is the “national project to standardize collection and storage of DNA profiles [known as] the Combined DNA Index System (CODIS) [that] connects DNA laboratories at the local, state, and national level [and that] collects DNA profiles provided by local laboratories taken from arrestees, convicted offenders, and forensic evidence found at crime scenes.” Id. Before the Court even agreed to hear Maryland v. King, the Chief Justice stayed the enforcement of the Maryland Court of Appeals decision partly on the ground that it affected the national system. Yes, the Court expected its opinion in King to affect what other states would do, but that expectation does not mean that its opinion addresses the constitutionality of matters not before it.

II. Is the Balance the Same in California?

The real issue in Haskell is not whether the Justices had the California law in mind when they wrote their opinions in King. It is whether their reasoning dictates the same outcome. Addressing that question, the government claims that “[e]ach of the interests that informed the Court’s holding that the Maryland law was reasonable under the Fourth Amendment similarly applies to California’s law. Consequently, it too is reasonable under the Fourth Amendment.” Amicus Brief of the United States, at 5.

Huh? I invest my money in the common stock of the ABC corporation in light of my assessment of the balance of risk and reward. Although my interests—financial security and possible gain—are the same in all my investments, it does not follow from the fact that my decision to purchase the ABC shares was reasonable that all my investment decisions are equally reasonable. Thus, the DOJ’s argument is incomplete. What matters is not whether the same interests nominally are at play, but whether there are differences that affect the balance of these interests in each situation.

Surely the case for DNA-BC is at least somewhat weaker when it comes to minor offenses. Although "[p]eople detained for minor offenses can turn out to be the most devious and dangerous criminals," Florence v. Bd. of Chosen Freeholders of County of Burlington, 132 S. Ct. 1510, 1520 (2012), on average, people arrested for minor traffic offenses are less likely to be hiding their true identities and less likely to have left incriminating DNA at crime scenes than are people arrested for far more serious matters. Whether differences like these are significant enough to change the outcome is debatable, of course, but the government’s theory in Haskell is superficial. That the list of generic interests is the same for the most serious and the least serious offenses is the beginning, not the end, of the analysis.

III. Are All Booking Procedures for Identification the Same?

A third argument of sorts emerges in the government brief. Its logical structure is this: (1) arrestee fingerprinting is a constitutionally reasonable booking procedure; (2) arrestee DNA profiling is an analogous booking procedure; therefore, (3) arrestee DNA profiling is constitutionally reasonable. The brief puts it this way:
If the term “serious offense” did carry any meaning in King, [it] includes any crime for which an individual is arrested and booked in police custody. This meaning is logical, not only because the Court analyzed DNA fingerprinting as a “booking procedure,” but also because it analogized DNA fingerprinting to traditional “fingerprinting and photographing.”
Brief for the United States, at 7-8.

This “logic” is specious. That DNA sampling is a permissible part of the booking process for an individual placed in custody for offense A does not imply that it also is permissible for offense B unless B = A in all relevant respects. There is no a priori logical reason to assume that all offenses are so fungible. Similarly, that DNA is like friction-ridge skin in that both can be used to differentiate among individuals does not necessarily mean that the two identifiers are equivalent in other respects. The real issue, as explained above, is whether the government’s interests in acquiring DNA profiles are so much less with respect to some offenses that the government’s demand for the DNA becomes unreasonable. That is a question of practical reason, not of deductive logic or word games.

Recognizing that the King opinion does not foreclose a distinction between serious and nonserious felonies, however, does not imply that the case should be confined to the qualifying offenses in the Maryland law.  The limited information content of a DNA identification profile was a very important factor on one side of the balance sheet in King.  It may not take a particularly puissant set of state interests to overcome the individual interest in shielding this limited information from discovery.  Inasmuch as the repeated references to "serious offenses" in King seem more descriptive than prescriptive (Murphy 2013, p. 170), little in that opinion supports the limitation that the Haskell plaintiffs now propose.

References
  • Brief for the United States as Amicus Curiae in Support of Appellees and Affirmance, Haskell v. Harris, No. 10-15152, Oct. 28, 2013
  • Haskell v. Harris, 686 F.3d 1121 (9th Cir. 2012) (granting rehearing en banc)
  • Maryland v. King, 133 S. Ct. 1958 (2013)
  • David H. Kaye, Why So Contrived? DNA Databases After Maryland v. King, Journal of Criminal Law and Criminology, Vol. 104, May 2014 (in press)
  • Erin Murphy, License, Registration, Cheek Swab: DNA Testing and the Divided Court, 127 Harv. L. Rev. 161 (2013)