Tuesday, December 24, 2013

Breathalyzers and Beyond: The Unintuitive Meanings of "Measurement Error" and "True Values" in the 2009 NRC Report on Forensic Science

Five years ago, the National Research Council released its eagerly awaited and repeatedly postponed report on "Strengthening Forensic Science in the United States: A Path Forward." One theme of the report was that forensic experts must present their findings with due recognition of Rumsfeldian "known unknowns." For example, the report repeatedly referred to "the importance of ... a measurement with an interval that has a high probability of containing the true value" (NRC Committee 2009, p. 121), and it referred to "error rates" for categorical determinations (ibid., pp. 117-22). 

Earlier this year, UC-Davis law professor and evidence guru Edward Imwinkelried and I submitted a letter urging the Washington Supreme Court to review a case raising the issue of whether the state courts should admit point estimates of blood or breath alcohol concentration without an accompanying quantitative estimate of the uncertainty in each estimate. (The court denied review.) Since the NRC report uses breath-alcohol measurements to explain the meaning of its call for interval estimates, one would think that the report would have a good illustration of a suitable interval. But that is not what I found. The report's illustration reads as follows:
As with all other scientific investigations, laboratory analyses conducted by forensic scientists are subject to measurement error. Such error reflects the intrinsic strengths and limitations of the particular scientific technique. For example, methods for measuring the level of blood alcohol in an individual or methods for measuring the heroin content of a sample can do so only within a confidence interval of possible values. In addition to the inherent limitations of the measurement technique, a range of other factors may also be present and can affect the accuracy of laboratory analyses. Such factors may include deficiencies in the reference materials used in the analysis, equipment errors, environmental conditions that lie outside the range within which the method was validated, sample mix-ups and contamination, transcriptional errors, and more.

Consider, for example, a case in which an instrument (e.g., a breathalyzer such as Intoxilyzer) is used to measure the blood-alcohol level of an individual three times, and the three measurements are 0.08 percent, 0.09 percent, and 0.10 percent. The variability in the three measurements may arise from the internal components of the instrument, the different times and ways in which the measurements were taken, or a variety of other factors. These measured results need to be reported, along with a confidence interval that has a high probability of containing the true blood-alcohol level (e.g., the mean plus or minus two standard deviations). For this illustration, the average is 0.09 percent and the standard deviation is 0.01 percent; therefore, a two-standard-deviation confidence interval (0.07 percent, 0.11 percent) has a high probability of containing the person’s true blood-alcohol level. (Statistical models dictate the methods for generating such intervals in other circumstances so that they have a high probability of containing the true result.)
(Ibid., pp. 116-17.)
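For readers who want to check the committee's arithmetic, a few lines of Python (my own sketch, not part of the report) reproduce the numbers in the illustration:

```python
import statistics

# The three breathalyzer readings in the NRC report's illustration
readings = [0.08, 0.09, 0.10]

mean = statistics.mean(readings)           # 0.09
sd = statistics.stdev(readings)            # sample standard deviation, 0.01
interval = (mean - 2 * sd, mean + 2 * sd)  # the report's two-SD interval

print(f"mean = {mean:.2f}, sd = {sd:.2f}")
print(f"two-SD interval: ({interval[0]:.2f}, {interval[1]:.2f})")  # (0.07, 0.11)
```

Note that `statistics.stdev` computes the sample standard deviation (dividing by n − 1), which is what yields the report's figure of 0.01.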

What is troublesome about this explanation? Let me count the ways.

1. "Measurement error" does not refer to all errors of measurement

"[D]eficiencies in the reference materials used in the analysis, equipment errors, environmental conditions that lie outside the range within which the method was validated, sample mix-ups and contamination, transcriptional errors, and more" all "can affect the accuracy of laboratory analyses." Nevertheless, they do not count as "measurement error" because they are "factors other than the inherent limitations of the measurement technique." Not being "intrinsic [to] the particular scientific technique," they fall outside the committee's definition of "measurement error."

That narrow definition calls to mind the claims of some fingerprint analysts that the ACE-V method has a "methodological" error rate of zero because the only possibility for error arises when a human being does not apply the method perfectly. The difference, however, is that one can measure the errors when the breathalyzer has no deficient reference materials, no extreme environmental conditions, no sample mix-ups and contamination, no transcriptional errors, and so on. The fingerprint analyst, in contrast, is the measuring instrument, and it is impossible to distinguish between instrument error and human error in that context.

There is nothing illogical in quantifying some but not all measurement errors when some are more readily and validly quantifiable than others. Machines might not be tested periodically to ensure that they are operating as they are supposed to (e.g., DiFilipo 2011; Sovern 2012), but whether one can usefully build that possibility into the computation of the uncertainty of a measurement that might be suitable for courtroom testimony is not clear. Yet, using the seemingly all-encompassing phrase "measurement error" in a narrow, technical sense -- to denote only the noise inherent in the apparatus when operated under certain conditions -- is potentially misleading.

2. "True values" are not true blood-alcohol levels.

Because the committee's example of "measurement error" quantifies only "intrinsic" error, its statement that "a two-standard-deviation confidence interval (0.07 percent, 0.11 percent) has a high probability of containing the person’s true blood-alcohol level" also is easily misunderstood. The confidence interval (CI) for "true values" does not pertain to the actual blood-alcohol level. That level can differ from the point estimate of 0.09 for other reasons, making the real uncertainty greater than ± 0.02.

In addition, a breathalyzer measures alcohol in the breath, not in the bloodstream. The concentrations are related, but the precise functional relationship varies across individuals (e.g., Martinez & Martinez 2002). This is another source of uncertainty not reflected in the committee's CI for blood-alcohol concentration (BAC), although the committee could have sidestepped this issue by referring to breath-alcohol concentration (BrAC).

3. The standard error of the breathalyzer would be determined differently.

The NRC committee imagines using a breathalyzer to make three measurements of the same breath sample. The parenthetical, concluding sentence about "statistical models" for "other circumstances" suggests that the committee realized that this approach is not one that anyone would use to estimate the noise in the apparatus. The breathalyzer should be tested on many samples with known concentrations to ensure that it is not biased and to quantify the extent of the random variations about those known values. Manufacturers perform such tests (e.g., Coyle et al. 2010).

4. A CI of ±2 standard errors might not have "a high probability of containing the person’s true blood-alcohol level"

Let's put aside all the concerns raised so far. Suppose that the errors in the machine's measurement always are normally distributed about the true value in a breath sample; that the applicable standard deviation for this distribution is 0.01; and that the single measured value is 0.09. Is it now true that the interval 0.09 ± 0.02 "has a high probability of containing the person’s true blood-alcohol level"?

Maybe. Two standard errors give an interval with a confidence coefficient of approximately 95%. That is to say that this one interval comes from a procedure that generates intervals that cover the true value about 95% of the time. It is tempting to say that the probability that the interval in question covers the true value therefore is 95%.
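The 95% coverage figure can be checked by simulation. The following sketch (my own construction, under the assumptions just stated) draws repeated single measurements around a fixed true value and counts how often the interval "measurement ± 2 standard deviations" covers that value:

```python
import random

# Simulate repeated single measurements of a fixed true value with
# normally distributed machine error (sd = 0.01), and count how often
# the interval "measurement +/- 2 sd" covers the true value.
random.seed(1)
TRUE_BRAC, SD, N = 0.09, 0.01, 100_000

covered = 0
for _ in range(N):
    y = random.gauss(TRUE_BRAC, SD)  # one noisy measurement
    if y - 2 * SD <= TRUE_BRAC <= y + 2 * SD:
        covered += 1

print(f"coverage: {covered / N:.3f}")  # close to 0.954
```

The simulated coverage hovers around 2Φ(2) − 1 ≈ 95.4%, the frequentist confidence coefficient. The question the next paragraphs raise is whether that number is the probability for the one interval actually reported.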

But let's think about how the sample came to be tested. The arresting officer picks someone out of a population of motorists. The motorists have varying BrAC levels, and the officer has some level of skill in spotting the ones who might well be inebriated. Suppose that the drivers the officer stops and tests have BrACs that are normally distributed with mean 0.04 and standard deviation 0.01. The officer's breathalyzer is functioning according to the manufacturer's specifications, and the standard deviation in its measurements is 0.01, as in the NRC report. Having obtained a measurement of 0.09 on the one driver's breath sample, what is a high-probability interval for the true BrAC in this one breath sample? Is it 0.07 to 0.11?

It turns out that the probability that the true BrAC falls within the NRC's interval is only 24% (applying equations 2.9 and 2.10 in Gelman et al. 2004). If the officer stopped drivers whose mean BrAC was greater than 0.04 or whose BrACs were more variable, the probability that the NRC's interval is correct would be greater. If, for example, the standard deviation in this group were 0.02 instead of 0.01 (and the mean were still 0.04), then the probability for the NRC's interval would be 87%.
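The 24% and 87% figures come from the standard conjugate-normal updating formulas (equations 2.9 and 2.10 in Gelman et al. 2004). Here is a Python sketch of the calculation (my own, taking the single measured value to be the NRC illustration's point estimate of 0.09):

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def posterior(prior_mean, prior_sd, meas, meas_sd):
    """Posterior mean and sd for a normal prior and one normal measurement
    (the conjugate formulas, eqs. 2.9-2.10 in Gelman et al. 2004)."""
    w_prior, w_meas = 1 / prior_sd**2, 1 / meas_sd**2
    var = 1 / (w_prior + w_meas)
    mean = var * (w_prior * prior_mean + w_meas * meas)
    return mean, sqrt(var)

def prob_in_interval(mean, sd, lo, hi):
    return phi((hi - mean) / sd) - phi((lo - mean) / sd)

# Prior: stopped drivers' BrAC ~ N(0.04, 0.01); machine-error sd 0.01;
# single reading 0.09; the NRC's interval is (0.07, 0.11).
m, s = posterior(0.04, 0.01, 0.09, 0.01)
print(round(prob_in_interval(m, s, 0.07, 0.11), 2))  # about 0.24

# Widen the prior sd to 0.02 and the probability rises to about 0.87.
m, s = posterior(0.04, 0.02, 0.09, 0.01)
print(round(prob_in_interval(m, s, 0.07, 0.11), 2))  # about 0.87
```

Intuitively, the prior centered at 0.04 pulls the posterior for the true BrAC below the reported interval; a more diffuse prior pulls less, so the interval fares better.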

Of course, we do not know much about the distribution of BrAC in the group that the officer stops. As indicated above, this distribution would depend on the drinking habits of drivers in the town and the officer's skill in pulling over drunken drivers. The choice of a normal distribution with the parameters mentioned above is not likely to be realistic. But whatever the distribution may be, it, along with the single measured value, bears on the true value of the tested driver's BrAC. This fact makes it tricky to quantify the probability that the NRC's CI includes the driver's BrAC.

* * *

The NRC Report was certainly correct to call on forensic scientists to develop better measures of the uncertainty in their findings and to apply them in their reports and testimony. But figuring out what these measures should be and how to use them is a formidable challenge. Meeting this challenge will be a lot harder than the simple example of a confidence interval in the report might suggest.


Sunday, December 22, 2013

Forensic Science’s Latest Proof of Uniqueness

A federally funded study on the "Determination of Unique Fracture Patterns in Glass and Glassy Polymers" affirms that fracture patterns are unique. The researchers believe their work permits experts to continue to provide their "usually conclusive" testimony about cracked glass and plastic.

"The purpose of the research" undertaken at the University of California at Davis's graduate program in forensic science was "to provide a first, objective scientific background that will illustrate that repetitive fractures, under controlled conditions on target materials such as glass window panes and glass bottles, are in fact different and unique. In this phase of our study, we fractured glass window panes, glass bottles (clear wine bottles), and polymer tail light lens covers. Each and every fracture was documented in detail for subsequent inter-comparison and to illustrate the uniqueness of the fracture pattern." (Tulleners et al. 2013, p. 7).

Not surprisingly, the researchers found that all their fractures were distinguishable. In all, they conducted 5,310 pairwise comparisons by examining the fracture patterns in all pairs formed within each of the three groups of 60 items. This finding, they concluded, "should aid the practitioner in any court testimony involving the significance of fracture matching of broken glass and polymers materials." (Ibid., p. 23).
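The figure of 5,310 is simple combinatorics, as a two-line check confirms:

```python
from math import comb

# Three groups of 60 items; every pair within a group is compared.
pairs_per_group = comb(60, 2)  # C(60, 2) = 1770
print(3 * pairs_per_group)     # 5310
```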

What testimony might this be? "For the forensic community, the ability to piece together glass fragments in order to show a physical fit or a 'Physical Match' is the strongest evidentiary finding of an association." (Ibid., p. 6) "The usual statement is that 'the evidence glass fragment was physically matched to another glass establishing thus both share a common origin.'" (Ibid.) This testimony, the researchers suggest, is just fine: "we are substantiating the individuality of glass and polymer fractures under closely controlled conditions." (Ibid., p. 3, emphasis added). Thus, "[t]his research should enhance the capability of the analyst to testify in a court of law as to the uniqueness of a fracture." (Ibid., p. 61, emphasis added).

But why would the analyst want to claim universal uniqueness? Forensic science’s hoary division of its world into two parts -- "unique" feature sets and "class" characteristics -- is an article of faith. (E.g., Kaye 2009). The latest study certainly is of some use in confirming the intuition that fracture patterns are highly variable. The existence of varying patterns is one fact that makes "fractography" evidence, as it is called in the field, probative. But the study’s explanation of how it proves that every pattern is unique seems like a parody of scientific reasoning. The explanation is this:
In this research, it is hypothesized that every fracture forms a unique and nonreproducible fracture pattern. Alternately, it may be that some fracture patterns may be reproduced from time to time. If it is found that each fracture forms a unique and nonreproducible fracture pattern, then this finding will support the theory that coincidental duplication of fracture patterns cannot be attained. However, if duplicate fracture patterns are found, this would falsify the null hypothesis and show that some fracture patterns may be reproduced from time to time.
(Ibid., p. 27). Such is the power of unique-vs-class thinking. This impoverished dichotomy collapses a spectrum of possible states of nature into two discrete states. Combined with a cartoon-like version of Sir Karl Popper’s criterion of falsification, it leads the researchers to believe that their failure to find a class characteristic proves the "null hypothesis" of uniqueness.

True, the failure to find "duplicate fracture patterns" in a small sample "support[s] the theory that coincidental duplication of fracture patterns cannot be attained." (Or it would in a study in which the analyst deciding whether two patterns were the same did not already know that all of them came from different objects.)

But it also supports the alternative theory that coincidental duplication can be attained. Instead of taking no-duplication-is-possible as the “null hypothesis,” we could postulate that, on average, 1 in every 10,000 fractures of the items tested would produce indistinguishable fracture patterns. Or, we could hypothesize that the mean duplication rate is 1/100,000. Since we are just spinning out hypotheses, we could pick still other rates.

A great many such hypotheses seem compatible with the finding of no duplicates among 180 fractures. Observing a unique set of patterns in the sample supports (to varying degrees) a wide range of hypotheses about the duplication probability. To indulge an overly simplistic model, if we were to assume that the probability of detecting a duplicated pattern in each of the 5,310 comparisons were some identical, albeit small, number, then the 95% confidence interval for this duplication probability would go from zero (uniqueness) all the way up to 1/1770. (See Eypasch et al. 1995). To testify that the experiment supports only “the theory that coincidental duplication of fracture patterns cannot be attained” would be foolish. A more accurate statement would be that it supports the theory that duplication occurs at an unknown, but not very large, rate.
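The upper end of that interval can be computed with the "rule of three" discussed by Eypasch et al., or exactly. A Python sketch (mine, under the simplistic constant-probability model just described):

```python
n = 5310  # pairwise comparisons, none of which yielded a duplicate

# Exact one-sided 95% upper bound for a binomial proportion with zero
# successes: the largest p satisfying (1 - p)^n >= 0.05.
exact_upper = 1 - 0.05 ** (1 / n)

# The "rule of three" approximation (Eypasch et al. 1995): roughly 3/n.
rule_of_three = 3 / n

print(f"exact upper bound about 1/{1 / exact_upper:.0f}")
print(f"rule of three = 1/{n / 3:.0f}")  # 1/1770
```

Either way, "no duplicates in 5,310 comparisons" is consistent with duplication probabilities anywhere from zero up to roughly 1 in 1,770.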

To be sure, there is reason to believe that duplication is improbable, and the UC-Davis study adds to our knowledge of fracture patterns. However, fractographers should think twice (or more!) before they testify that the study demonstrates the utter uniqueness of all fractures. They gain little by embracing the claim of universal uniqueness (Cole 2009; Kaye et al. 2011), and this study does not deliver on the promise of "objective criteria to determine the uniqueness of a fit." (Tulleners et al., p. 7).

  • Simon A. Cole, 2009. Forensics Without Uniqueness, Conclusions Without Individualization: The New Epistemology of Forensic Identification. Law, Probability and Risk 8:233-255
  • Ernst Eypasch, Rolf Lefering, C. K. Kum & Hans Troidl, 1995. Probability of Adverse Events That Have Not Yet Occurred: A Statistical Reminder. Brit. Med. J. 311:619, available at http://www.bmj.com/content/311/7005/619
  • David H. Kaye, David E. Bernstein & Jennifer L. Mnookin, 2011. The New Wigmore, A Treatise on Evidence: Expert Evidence. New York: Aspen Pub. Co. (2d ed.)
  • David H. Kaye, 2009. Identification, Individuality, and Uniqueness: What's the Difference? Law, Probability & Risk 8:85-89, http://ssrn.com/abstract=1261970 (abstract)
  • Frederic A. Tulleners, John Thornton & Allison C. Baca, 2013. Determination of Unique Fracture Patterns in Glass and Glassy Polymers, available at https://www.ncjrs.gov/pdffiles1/nij/grants/241445.pdf

Sunday, December 8, 2013

Error on Error: The Washington 23

Frequently cited in warnings on the risks of errors in DNA typing is a 2004 article prepared by unnamed staff of the Seattle Post-Intelligencer. In one highly praised book, for instance, Sheldon Krimsky of Tufts University and Tania Simoncelli, then with the ACLU, wrote that the paper “reported that forensic scientists at the Washington State Patrol Laboratory had made mistakes while handling evidence in at least 23 major criminal cases over three years” [1, p. 280]. The article itself begins “[c]ontamination and other errors in DNA analysis have occurred at the Washington State Patrol crime labs, most of it the result of sloppy work” [2].

Laboratory documentation of “sloppy work” should be encouraged. It should be scrutinized inside and outside of the laboratory. Within the laboratory, it can be a path to improvements. Outside the laboratory world, reporting on problems, quotidian and catastrophic alike, can increase the level of public and professional understanding of how forensic science is practiced. However, it is important to be clear about the nature, severity, and implications of specific “mistakes,” “errors,” and “contamination.” These terms cover a variety of phenomena.

Since before the earliest days of PCR-based DNA typing, it has been known that “contamination” is an omnipresent possibility. It can result from extraneous DNA in materials from companies that supply reagents and equipment, from the introduction of the analyst’s DNA into the sample being analyzed (“for example, when the analyst talks while handling a sample, leaving an invisible deposit of saliva” [2]), from inadequate precautions against transferring DNA from one test with one sample over to another test with a different sample (a form of “cross-contamination”), and so on. Many forms of contamination are detectable, but they can complicate or interfere with the interpretation of an STR profile [3]. Cross-contamination of a crime-scene sample with a potential suspect’s DNA either before or after it reaches the laboratory is particularly serious because it could result in a false match.

As described in an appendix below, it appears that only one of the 23 cases (#22) involved a false report of a match, and the report was corrected before any charges were filed. However, Bill Thompson presented a different case as a premier example of "false cold hits" [4, p. 230]. In his latest publication on errors in DNA typing, he wrote that
[W]hile the Washington State Crime Patrol Laboratory [was conducting] a cold-case investigation of a long-unsolved rape, it found a DNA match to a reference sample in an offender database, but it was a sample from a juvenile offender who would have been a toddler at the time the rape occurred. This prompted an internal investigation at the laboratory that concluded that DNA from the offender's sample, which had been used in the laboratory for training purposes, had accidentally contaminated samples from the rape case, producing a false match. [4, p. 230].
Thompson noted that he "assisted the newspaper in the investigation" [4, p. 341 n.12]. Apparently, he was referring to case #5 in the article (although the article labels it a homicide case). In any event, it is the only case Thompson lists as an example of a false match in Washington.

My conclusion is that the Washington cases certainly establish that mistakes of many types can occur in DNA laboratories and that some types of mistakes can produce false matches, false accusations, and even false convictions. But none of the 23 are themselves instances of false charges or false convictions. This conclusion neither condones the mistakes nor excludes the possibility that DNA has produced such outcomes in Washington.  But it may help put the 23 cases and the writing about them in perspective.

  1. Sheldon Krimsky & Tania Simoncelli, Genetic Justice: DNA Data Banks, Criminal Investigations, and Civil Liberties (2011)
  2. DNA Testing Mistakes at the State Patrol Crime Labs, Seattle Post-Intelligencer, July 21, 2004, 10:00 pm, http://www.seattlepi.com/local/article/DNA-testing-mistakes-at-the-State-Patrol-crime-1149846.php
  3. Terri Sundquist & Joseph Bessetti, Identifying and Preventing DNA Contamination in a DNA-Typing Laboratory, Profiles in DNA, Sept. 2005, at 11-13, http://www.promega.com/~/media/Files/Resources/Profiles%20In%20DNA/802/Identifying%20and%20Preventing%20DNA%20Contamination%20in%20a%20DNA%20Typing%20Laboratory.ashx
  4. William C. Thompson, The Myth of Infallibility, in Genetic Explanations: Sense and Nonsense 227 (Sheldon Krimsky & Jeremy Gruber eds. 2013)
Related postings
23 and Me

This Appendix quotes the newspaper descriptions in full, then offers my own remarks.

Problem: Cross-contamination
When and where: July 2002, Spokane lab
Forensic scientist: Lisa Turpen
Case: child rape
What happened: Turpen contaminated one of four vaginal swabs with semen from a positive control sample. Corrected report issued almost two years later in March 2004. ... Yakima prosecutors offered plea deal during the trial, with defendant pleading guilty to two gross misdemeanors. Turpen's mistake was a factor, according to defense.

REMARKS: I do not know what “semen from a positive control sample” means. When DNA from a cell line is used to ensure that PCR is amplifying those alleles, the cell-line DNA is known as a positive control sample. This example does not sound like a case of contamination involving that kind of a positive control. Adding semen to a vaginal swab obviously is unacceptable, but if the other three swabs produced a single male DNA profile and the fourth showed two male profiles in a case involving a single rapist, the anomalous profile would not be falsely matched to anyone.

Problem: Erroneous lab report
When and where: August 2002, Seattle lab
Forensic scientist: William Stubbs
Case: Fatal police shooting of Robert Thomas
What happened: Two hours before testifying at inquest, Stubbs discovered his crime lab report was wrong and notified prosecutor. His report said test found brown stain on gun was likely blood, but his notes had no indication of blood. ... Corrected report issued in September 2002. ... Co-worker reviewing case did not catch mistake.

REMARK: Does not involve DNA typing.

Problem: Self-contamination
When and where: April 2001, Spokane lab
Forensic scientists: Charles Solomon, Lisa Turpen
Case: rape/kidnapping/assault
What happened: In separate tests, Solomon and Turpen contaminated hair-root tests with their own DNA. Solomon also contaminated reference blood sample with his DNA. ...Three defendants were convicted.

REMARK: There is no suggestion of a false match here.

Problem: Testing error
When and where: September 2002, Marysville lab
Forensic scientist: Mike Croteau
Case: robbery/assault
What happened: Rushing to meet deadlines, Croteau mixed up reference samples from victim and suspect. He reported incorrect findings verbally to prosecutor, then discovered his mistake. ... Defendant pleaded guilty.

REMARK: What was the mistake here? To have produced a false match, it must have been something more than merely using the wrong names for the two samples that were compared.

Problem: Cross-contamination
When and where: August 2003, Seattle lab
Forensic scientist: Robin Bussoletti
Case: homicide
What happened: Bussoletti likely contaminated work surface while testing a blood sample from a convicted felon during training. Next DNA analyst who used work station noticed contamination in chemical solution that is not supposed to contain DNA.

REMARKS: Definitely sloppy -- and potentially falsely incriminating if work surface was then used without a thorough cleaning for casework.

Problem: Cross-contamination
When and where: January 2004, Tacoma lab
Forensic scientist: Jeremy Sanderson
Case: child rape
What happened: Sanderson failed to change gloves between handling evidence in two cases. He noticed contamination in chemical solution. ... Defendant convicted and sent to prison.

REMARK: Is this a case of cross-contamination of samples?

Problem: Error during testing
When and where: June 2002, Seattle lab
Forensic scientist: Denise Olson
Case: aggravated murder
What happened: Olson did initial test to look for blood on shoes. She got weak positive result, then threw out swabs. She didn't document findings or notify police. Kirkland police complained because discarded swabs couldn't be tested for DNA. ... Shoes sent to private lab for retesting. ... Defendant Kim Mason convicted and sentenced to life without release.

REMARK: Not a false match

Problem: Error in DNA test interpretation
When and where: October 1998, Seattle lab
Forensic scientist: George Chan
Case: rape
What happened: Chan misstated statistical likelihood of match with suspect. Co-worker reviewing case didn't catch error. ... Pierce County prosecutor noticed mistake at pretrial conference in September 2000. ... Defendant convicted.

REMARK: Not a false match

Problem: Error in testing procedure
When and where: September 2002, Seattle lab
Forensic scientist: Denise Olson
Case: robbery/assault
What happened: Olson tested known DNA samples before evidence collected at crime scene -- a violation of lab procedure aimed at preventing cross-contamination. A co-worker caught the mistake while reviewing the case.... Tests were redone. ... Defendant pleaded guilty.

REMARK: This departure from protocol raises the risk of an incriminating case of cross-contamination, but there is no indication that any cross-contamination occurred.

Problem: Self-contamination
When and where: November 2002, Tacoma lab
Forensic scientist: Mike Dornan
Case: rape

What happened: Dornan contaminated DNA test of victim's underwear with his own DNA. May have resulted from talking during testing process.... Defendant pleaded guilty.

REMARK: No false match.

Problem: Unknown source of contamination
When and where: January 2004, Tacoma lab
Forensic scientist: Christopher Sewell
Case: homicide
What happened: Sewell found low level of DNA from unknown source in blood sample from victim. May have come from blood transfusion of victim before death. ... Case pending.

REMARK: The “unknown source of contamination” does not seem to have produced a false match if the peak heights indicated a minor contributor and the major contributor was the defendant.

Problem: Self-contamination
When and where: March 2004, Tacoma lab
Forensic scientist: William Dean
Case: rape
What happened: Dean contaminated control sample with his own DNA while testing police evidence. ... No suspect.

REMARK: No suspect, no contamination of a crime-scene or suspect sample, no false match.

Problem: Unknown source of contamination
When and where: January 2003, Spokane lab
Forensic scientist: Lisa Turpen
Case: murder
What happened: Turpen found unidentified female DNA in control sample while testing evidence in Stevens County double-murder case.... Defendant convicted.

REMARK: No contamination of a crime-scene or suspect sample, no false match.

Problem: Unknown source of contamination
When and where: January 2003, Spokane lab
Forensic scientist: Lisa Turpen
Case: robbery/kidnapping
What happened: Turpen found unidentified female DNA in control sample while testing evidence in Yakima County case. Evidence tested same day as evidence in Example No.13.... Case pending.

REMARK: No contamination of a crime-scene or suspect sample, no false match.

Problem: Self-contamination
When and where: September 2003, Marysville lab
Forensic scientist: Greg Frank
Case: murder
What happened: Frank contaminated control samples with his own DNA during testing in Snohomish County case. ...Case pending.

REMARK: No contamination of a crime-scene or suspect sample, no false match.

Problem: Self-contamination
When and where: September 2003, Marysville lab
Forensic scientist: Greg Frank
Case: child molestation/rape
What happened: Frank contaminated control samples with his own DNA during testing in Kitsap County case. ... Defendant pleaded guilty.

REMARK: No contamination of a crime-scene or suspect sample, no false match.

EXAMPLES NO. 17 & 18
Problem: Unknown source of contamination
When and where: October 2003, Seattle lab
Forensic scientists: Phil Hodge, Amy Jagman
Cases: unknown
What happened: Hodge and Jagman both discovered unknown source of contamination in chemical used during DNA testing. Chemical discarded and evidence retested.

REMARK: No contamination of a crime-scene or suspect sample, no false match.

Problem: Self-contamination
When and where: October 2002, Spokane lab
Forensic scientists: Charles Solomon, Lisa Turpen
Case: murder
What happened: Solomon found Turpen's DNA on three bullet casings retrieved from scene of Richland double murder. ... Defense expert disputed this at trial, testifying that DNA profile belonged to unknown female. ... Defendant Keith Hilton convicted.

REMARK: No false match.

Problem: Cross-contamination
When and where: February 2002, Tacoma
Forensic scientist: Mike Dornan
Case: child rape
What happened: Dornan contaminated evidence in King County rape case with DNA from a previous case, likely by failing to properly sterilize scissors. ... Defendant pleaded guilty to a reduced charge before contamination was discovered.

REMARK: I presume that if the previous case were the defendant’s and that is what led to the charge against the defendant, the newspaper would have so stated. That would have been a false match.

Problem: Self-contamination
When and where: January 2001, Marysville lab
Forensic scientist: Brian Smelser
Case: rape
What happened: Smelser contaminated three tests with his own DNA in Kirkland rape case. Prosecutor had to send remaining half-sample to California lab for retesting.... Defendant pleaded guilty to reduced charge.

REMARK: No false match.

Problem: Error in testing
When and where: December 2002, Seattle lab
Forensic scientist: Denise Olson
Case: rape/attempted murder
What happened: Olson misinterpreted DNA results, telling Seattle police their suspect was a match. Co-worker caught error 11 days later, just as charges were about to be filed.... Case unsolved.

REMARK: A false positive report (not resulting from contamination).

Problem: Self-contamination
When and where: January 2004, Seattle lab
Forensic scientist: George Chan/William Stubbs
Case: child rape
What happened: Chan's DNA found in suspect's boxer shorts by Stubbs. Problem traced to Chan talking to Stubbs during testing.... Suspect pleaded guilty.

REMARK: No contamination of a crime-scene or suspect sample, no false match.

Saturday, December 7, 2013

Error on Error: Quashing Brian Kelly's Conviction

Are there any errors in DNA testing? Are there any errors that produce false positives? Do DNA databases generate any false leads? Do false leads produce any false arrests? Any false convictions? The answers to these questions are yes, yes, yes, yes, and yes. (See related postings below.)

But how large is the risk of a false positive match to an existing suspect? To an innocent individual culled from a database? By and large, we are limited to isolated reports in newspapers--reports that are newsworthy precisely because they are rare. The most complete compilation of the troubling cases, presented in a survey of the ways that errors can arise, is to be found in a book chapter by Bill Thompson of the University of California at Irvine. [1]

Professor Thompson is an unusually knowledgeable and astute commentator, consultant, and advocate in the field of DNA evidence, and it should be revealing to work through his examples. That is what I have started to do. So far, I have looked into only the very first case noted in the chapter. According to Professor Thompson, it exemplifies a "common problem" [1, p. 230] of "[a]ccidental transfer of cellular material or DNA from one sample to another" [1, p. 229] causing "false reports of a DNA match between samples that originated from different people" [1, p. 230].

The example is a 1988 DNA test in a rape case in Scotland that led to the conviction of Brian Kelly. Thompson simply reports that "Scotland's High Court of Justiciary quashed a conviction in one case in which the convicted man (with the help of sympathetic volunteer scientists) presented persuasive evidence that the DNA match that incriminated him arose from a laboratory accident" [1, p. 230]. The "accident" in question consisted of DNA leaking from one well to an adjacent one in an agarose gel used in VNTR typing or an analyst's misloading some of the same DNA sample into both wells instead of just the one she was aiming for.

But the evidence that Professor Thompson found "persuasive" did not persuade the court. Indeed, the experts did not even testify that leakage or misloading had occurred. Rather, they stated that it was a "low risk" event, that the possibility could not be excluded, and that a procedure that would have reduced the risk could have been followed (and was adopted two years later) [2, ¶¶ 15-17].

Thus, the Scottish Appeals Court, noting other evidence in Kelly's favor and weaknesses in the Crown's case, quashed the conviction--but not because it concluded that the match was false. The court quashed the conviction because the jury was not informed of the fact that the same DNA could end up in two adjacent lanes. The court wrote:
It was not suggested that there is evidence positively indicating that cross-contamination did, or may have, occurred. On the basis of the evidence tendered by the appellant, it is maintained, on the other hand, that there was a risk of cross-contamination arising from the practice at that time of using adjoining wells for DNA samples from the crime scene and the suspect, and of such cross-contamination being undetected. It was not in controversy that it was possible for there to be leakage between adjoining wells or for DNA material to fall accidentally into a well next to the one for which it was intended. Up to a point the evidence ... as to the procedures which were followed, and the special care which was taken, countered the risk that such a mishap would in practice occur or be undetected. However, such evidence does not in our view provide a complete answer. In particular there was, on the evidence, a risk that the leakage of DNA from the well for the suspect's reference sample to the adjoining well which already held the crime scene sample would not be detected. It was, of course, a low risk, but it was of sufficient importance to be recognised by experts ... .

... In our opinion there is evidence which is capable of being regarded as credible and reliable as to the existence of a risk of cross-contamination occurring without it being detected. The risk was a low risk. It may be that in other circumstances the fact that the jury did not hear such evidence would not lead to the conclusion that there had been a miscarriage of justice. However, in the present case it is otherwise since the DNA evidence was plainly of critical importance for the conviction of the appellant. If the jury had rejected that evidence there would, in our view, have been insufficient evidence to convict the appellant. Accordingly, while the evidence related to a low risk of cross-contamination, the magnitude of the implications for the case against the appellant were substantial. For these reasons we have come to the conclusion that the appellant has established the existence of evidence which is of such significance that the fact that it was not heard by the jury constituted a miscarriage of justice. [2, ¶¶ 21-22]

Based on this opinion, the 1988 DNA testing with a superseded technology is a far cry from a true example of an innocent man convicted because of "a laboratory accident." It is nothing more--or less--than a case in which the defendant did not present expert testimony at trial that the laboratory used a procedure that left open a preventable mode of cross-contamination. The case is an appropriate illustration of the importance of improving laboratory practices, but such cases are not proof of known "false reports" commonly resulting from cross-contamination.

  1. William C. Thompson, The Myth of Infallibility, in Genetic Explanations: Sense and Nonsense 227 (Sheldon Krimsky & Jeremy Gruber eds. 2013)
  2. Opinion in the Reference by the Scottish Criminal Cases Review Commission in the Case of Brian Kelly, Appeal Court, High Court of Justiciary, Appeal No. XC458/03, Aug. 6, 2004, http://www.scotcourts.gov.uk/opinions/XC458.html

Friday, December 6, 2013

Get Serious: The US Department of Justice's Amicus Brief in Haskell v. Harris

As the U.S. Court of Appeals for the Ninth Circuit returns to the question of the constitutionality of California's DNA database law, the United States has weighed in with an amicus brief. It is worried (or should be) that the en banc panel will take too seriously the Supreme Court's references to “serious offenses” in Maryland v. King, the DNA-on-arrest case decided last June. The Maryland law that the Court narrowly upheld authorizes DNA collection for violent felonies and burglaries (and attempts to commit those crimes). The California law under attack in Haskell is broader, applying to all felony arrests, including those that would seem rather petty to the casual observer. (The federal law is broader still, encompassing every offense, no matter how trivial, for which a person is dragged into custody.)

Consequently, it comes as no surprise that the federal government wants the Ninth Circuit to read King expansively, whereas the ACLU, which represents the plaintiffs in Haskell, is pressing for the narrowest possible reading. Interestingly, opponents of all forms of DNA-BC (routine arrestee DNA sampling before conviction) tend to read the majority opinion in King broadly. Professor Erin Murphy, for example, concludes in her recent review of the case that the majority did not even "attempt[] to limit its holding to serious crimes." (Murphy 2013, p. 171).

The U.S. Department of Justice (DOJ) could not agree more. Here, I want to look critically at the DOJ’s arguments and statements. The brief essentially argues that (1) the King Court intended its opinion to settle the Fourth Amendment status of all existing DNA-BC laws, (2) by definition, the reasons to uphold the narrower Maryland law apply with equal force to the broader California law, and (3) the King Court's use of the word "booking" and its analogy between DNA profiling and fingerprinting settle the issue as a matter of logic and substance. None of these arguments is conclusive.

I. What Were the Justices Thinking?

The DOJ lawyers seem to think that because the Court was aware that DNA-BC is a national issue, its opinion was meant to settle the issue for all DNA-BC statutes. Their brief quotes the majority's observation:
Noting that “[t]wenty-eight States and the Federal Government have adopted laws similar to the Maryland Act,” the Court explained that “[a]lthough those statutes vary in their particulars, such as what charges require a DNA sample, their similarity means that this case implicates more than the specific Maryland law.” King, 133 S. Ct. at 1968 (emphasis added).
Brief for the United States, at 3.

The Court’s phrasing cannot bear the weight the government places on it. Of course “the case” implicates other laws. That is one reason the Court decided to review the case. The Court could have effectively struck down a swath of federal and state laws in one fell swoop. It did not. Is the Court’s awareness of the fact that state laws vary in the offenses that trigger arrestee sampling an announcement that the Court thinks it is upholding the laws of 28 states and the federal government? The next sentence in the majority opinion summarizes the spread of DNA-BC across the country: “At issue is a standard, expanding technology already in widespread use throughout the Nation.” 133 S. Ct. at 1968. The “standard, expanding technology” is the “national project to standardize collection and storage of DNA profiles [known as] the Combined DNA Index System (CODIS) [that] connects DNA laboratories at the local, state, and national level [and that] collects DNA profiles provided by local laboratories taken from arrestees, convicted offenders, and forensic evidence found at crime scenes.” Id. Before the Court even agreed to hear Maryland v. King, the Chief Justice stayed the enforcement of the Maryland Court of Appeals decision partly on the ground that it affected the national system. Yes, the Court expected its opinion in King to affect what other states would do, but that expectation does not mean that its opinion addresses the constitutionality of matters not before it.

II. Is the Balance the Same in California?

The real issue in Haskell is not whether the Justices had the California law in mind when they wrote their opinions in King. It is whether their reasoning dictates the same outcome. Addressing that question, the government claims that “[e]ach of the interests that informed the Court’s holding that the Maryland law was reasonable under the Fourth Amendment similarly applies to California’s law. Consequently, it too is reasonable under the Fourth Amendment.” Amicus Brief of the United States, at 5.

Huh? I invest my money in the common stock of the ABC corporation in light of my assessment of the balance of risk and reward. Although my interests—financial security and possible gain—are the same in all my investments, it does not follow from the fact that my decision to purchase the ABC shares was reasonable that all my investment decisions are equally reasonable. Thus, the DOJ’s argument is incomplete. What matters is not whether the same interests nominally are at play, but whether there are differences that affect the balance of these interests in each situation.

Surely the case for DNA-BC is at least somewhat weaker when it comes to minor offenses. Although "[p]eople detained for minor offenses can turn out to be the most devious and dangerous criminals," Florence v. Bd. of Chosen Freeholders of County of Burlington, 132 S. Ct. 1510, 1520 (2012), on average, people arrested for minor traffic offenses are less likely to be hiding their true identities and to have left incriminating DNA at crime scenes than are people arrested for far more serious matters. Whether differences like these are significant enough to change the outcome is debatable, of course, but the government’s theory in Haskell is superficial. That the list of generic interests is the same for the most serious and the least serious offenses is the beginning, not the end, of the analysis.

III. Are All Booking Procedures for Identification the Same?

A third argument of sorts emerges in the government brief. Its logical structure is this: (1) arrestee fingerprinting is a constitutionally reasonable booking procedure; (2) arrestee DNA profiling is an analogous booking procedure; therefore, (3) arrestee DNA profiling is constitutionally reasonable. The brief puts it this way:
If the term “serious offense” did carry any meaning in King, [it] includes any crime for which an individual is arrested and booked in police custody. This meaning is logical, not only because the Court analyzed DNA fingerprinting as a “booking procedure,” but also because it analogized DNA fingerprinting to traditional “fingerprinting and photographing.”
Brief for the United States, at 7-8.

This “logic” is specious. That DNA sampling is a permissible part of the booking process for an individual placed in custody for offense A does not imply that it also is permissible for offense B unless B = A in all relevant respects. There is no a priori logical reason to assume that all offenses are so fungible. Similarly, that DNA is like friction-ridge skin in that both can be used to differentiate among individuals does not necessarily mean that the two identifiers are equivalent in other respects. The real issue, as explained above, is whether the government’s interests in acquiring DNA profiles are so much less with respect to some offenses that the government’s demand for the DNA becomes unreasonable. That is a question of practical reason, not of deductive logic or word games.

Recognizing that the King opinion does not foreclose a distinction between serious and nonserious felonies, however, does not imply that the case should be confined to the qualifying offenses in the Maryland law.  The limited information content of a DNA identification profile was a very important factor on one side of the balance sheet in King.  It may not take a particularly puissant set of state interests to overcome the individual interest in shielding this limited information from discovery.  Inasmuch as the repeated references to "serious offenses" in King seem more descriptive than prescriptive (Murphy 2013, p. 170), little in that opinion supports the limitation that the Haskell plaintiffs now propose.

  • Brief for the United States as Amicus Curiae in Support of Appellees and Affirmance, Haskell v. Harris, No. 10-15152, Oct. 28, 2013
  • Haskell v. Harris, 686 F.3d 1121 (9th Cir. 2012) (granting rehearing en banc)
  • Maryland v. King, 133 S. Ct. 1958 (2013)
  • David H. Kaye, Why So Contrived? DNA Databases After Maryland v. King, Journal of Criminal Law and Criminology, Vol. 104, May 2014 (in press)
  • Erin Murphy, License, Registration, Cheek Swab: DNA Testing and the Divided Court, 127 Harv. L. Rev. 161 (2013)

Friday, November 29, 2013

Maryland v. King: The Dissent's Ten Second Rule

The four dissenting Justices in Maryland v. King insisted that DNA databases and fingerprint databases are as different as night and day. As NYU Law Professor Erin Murphy put it:
Most powerfully, Justice Scalia explained (partially through the use of a chart) why fingerprinting differed dramatically from DNA typing. He observed that known fingerprints are not “systematically compared” with latent prints from unsolved crime scenes (in contrast to DNA), and even if so, courts have never approved such action. He also observed that while fingerprinting may not even be a “search,” analysis of genetic code certainly is.
(Murphy 2013, p. 166, note omitted). Relying solely on Justice Scalia's "powerful" assurances, she adds that
Police have never routinely collected or used photographs or prints for random crime-solving purposes; both were always mainly for identification of persons already suspected of a crime (i.e., individualized suspicion).114 We know this intuitively: how common are newspaper headlines about thirty-year-old cases solved through “cold hit” fingerprint or mug shot matches, or exonerations based on a hit to a fingerprint or photograph newly uploaded to the database?
114. See King, 133 S. Ct. at 1987-88 (Scalia, J., dissenting). Indeed, police could not have used photos or fingerprints for random crime-solving even if they had wanted to, since it was not until twenty or so years ago--when large biometric databases were developed--that it was even possible to conduct a random automated comparison between known files and crime scene samples.
Id. at 177-78. There are several problems with Justice Scalia's claims as well as this gloss on them.

I. What Does the Possibility that Fingerprinting Might Not Be a "Search" Prove?

To begin with, the claim that fingerprinting might not be a search as that word is used in the Fourth Amendment is not proof of any "dramatic" difference between fingerprinting and DNA typing.1/ Given Justice Scalia's emphasis on the invasion of physical space as the touchstone for defining a search in recent cases, e.g., United States v. Jones, 132 S.Ct. 945 (2012), how could he maintain that taking control of a person's fingers to rub them on an inked pad and then onto cardboard paper, or to press them against a scanner, is not a search? And even if he were to classify fingerprinting as something less than a search, would he not have to do the same for DNA collection if Maryland substituted pressing just one digit onto a sticky pad to recover cells (instead of rubbing a swab along the inside of a cheek)?2/

More fundamentally, the dissent's declaration that one invasion of personal security (fingerprinting) might not rise to the level of a search while another (DNA typing) clearly does assumes what must be proved--that the two are indeed "dramatically" different. Because slight differences can lead to one legal classification (a "search") as opposed to the other (not quite a search), the fact that the Supreme Court has never adjudicated whether fingerprinting is a search does little to demonstrate any stark contrast between that practice and DNA swabbing.

II. How Often Are Suspicionless Fingerprint Database Trawls Conducted?

Moving to the part of the dissenting opinion that does offer an actual distinction (as opposed to a legal label that might (or might not) flow from some unarticulated differences), let us consider Justice Scalia's basic distinction and Professor Murphy's remarks about it. According to the dissent, it is critical that “‘[l]atent prints’ recovered from crime scenes are not systematically compared against the database of known fingerprints, since that requires further forensic work.” 133 S. Ct. at 1987 (note omitted). And citing only Justice Scalia's opinion, Professor Murphy (p. 166) concurs that "[p]olice have never routinely collected or used photographs or prints for random crime-solving purposes ... ."

Justice Scalia provides no particular support for the proposition that arrestee fingerprints are not systematically compared to latent prints, and it is apparent that police often compare latent prints to those from arrestees to generate investigative leads—just as Maryland did with the DNA in King. Let me elaborate on each of these points.

A. Justice Scalia's Ten-second Rule

To support the claim that fingerprints are not used like DNA profiles--to forge previously unsuspected links between unsolved crimes and arrestees--the dissenting opinion cites only one publication. It is an FBI webpage entitled "Privacy Impact Assessment: Integrated Automated Fingerprint Identification System (IAFIS)/Next Generation Identification (NGI) Repository for Individuals of Special Concern (RISC)." To the extent that this webpage is on point, it contradicts the four dissenting Justices' claim. The page only discusses a special database for individuals who may or may not have been arrested. Specifically, RISC “consist[s] of records of known or appropriately suspected terrorists, wanted persons, registered sexual offenders, and (potentially) other categories of heightened interest warranting more rapid responses to inquiring criminal justice users.” Id. § 1.2. The purpose of RISC is “to support rapid biometric searches ... in time-critical situations.” Id. § 1.1. The idea is that “first responder law enforcement officials in the course of their interaction with potential suspects” will acquire the suspect’s fingerprints with a mobile scanner and send them to the FBI. Within ten seconds, the FBI’s computer will advise the submitting agency whether there is a probable match to one of the “individuals of special concern.” Id. At the same time, the FBI will query its “Unsolved Latent File (ULF).” Id. This “cascaded search of the ULF may take considerably more time than the RISC search,” but “if a RISC submission hits on a record in the ULF, ... the ULF record submitter will receive notification of a potential match.” Id.

The dissenting Justices seem to think that the FBI’s notice that “searches of the ‘Unsolved Latent File’ may ‘take considerably more time’” than ten seconds means that the FBI does not perform these slower searches. King, 133 S. Ct. at 1987 n. 4. I am reminded of the five-second rule for consuming food that falls on the floor. Folklore has it that if the period of contact is less than five seconds, contamination is not worth worrying about. Of course, the "rule" is silly. (See Dawson et al. 2007).

The dissent's ten-second rule is not much better. The FBI’s description of RISC is clear (once one cuts through the bureaucratic jargon). Every time the police submit a suspect’s prints for a RISC check, the prints also are checked against those of “unknown persons whose latent fingerprints have been retrieved from locations, property, or persons associated with criminal activity or related to criminal justice or authorized national security investigations.” Id. If this is not a systematic use of suspects’ prints to associate them with unsolved crimes, nothing is.

B. Better Indications of Fingerprint Database Practice

RISC is just one database, and it is not used for routine, station-house bookings. In those situations, whether and when arrestee fingerprints are checked against latent prints from unsolved crimes varies by jurisdiction. In California, “new incoming latent prints from unsolved crimes are routinely searched against arrestee booking prints, regardless of the arrest disposition (e.g., whether or not the arrestee ultimately was convicted of the offense) in the Automated Latent Print System (ALPS) database.” Cal. Dep’t of Just. (2013). Conversely, arrestee prints are checked against the ALPS database to generate investigative leads whenever they do not match known prints already on file. Id. These new arrestee-to-ALPS trawls occurred more than 25% of the time in recent years. Id.

A brief on behalf of California (and every other state) informed the Court of some of these facts. (Brief for the States 2013, pp. 17–18). The dissent’s citation to the largely inapposite RISC system, its misreading of how that system works, and its failure to consider the widespread use of automated-fingerprint-identification systems and to present any information about manual searches conducted before the 1980s or 1990s (see below) partake more of advocacy than accuracy.

But California is just one state. I have no data on whether other states are more or less systematic in using automated fingerprint identification systems (AFIS's) with their own databases or with the national one to solve crimes. I have requested information from the FBI on the use of the IAFIS database that it administers, but I have yet to receive a reply. Could Professor Murphy's intuition that latent prints rarely are searched against the fingerprints on file from arrestees--and that such searches never happened "until twenty or so years ago"--be correct?

It seems most unlikely. Even before the introduction of automated trawling of fingerprint databases in the 1970s, fingerprint analysts tediously compared latent prints to the databases. Systems for organizing fingerprints by features such as arches, whorls, and loops on each finger assisted in these searches. Automation, however, "made crime scene processing dramatically more productive. Local and county AFIS purchases were usually justified on the basis of their crime-solving potential." (Moses 2011, p. 6-9). Latent-print searches were not "routine" in the sense that most cases did not lend themselves to this investigative technique, but the use of "fingerprints for random crime-solving" predates electronic searching. One forensic science textbook refers to San Francisco’s hit rate "of 8 percent for manual latent print searches." (Saferstein 2004, p. 416).

Electronic searching, however, made trawls of arrestee databases more feasible, successful, and, yes, systematic. In San Francisco, in 1983, "a new crime scene unit was organized specifically with the new [AFIS] system as its centerpiece. ... All latents that met minimum criteria [were] searched in AFIS." Id. at 6-7. The result was "a dramatic 10-fold increase in latent print identifications in 1984." Id. at 6-8. As with DNA today, database hits made international news. (E.g., id. at 6-8; Elmer-Dewitt 1985). Today, they are less publicized, but no less real. (See Kaye 2013, p. 38 n. 38 (collecting statistics indicating that hundreds of thousands of fingerprint database trawls occur annually in active and cold cases combined)).

In short, claims that “‘[l]atent prints’ recovered from crime scenes are not systematically compared against the database of known fingerprints,” 133 S. Ct. at 1987 (Scalia, J., dissenting), and that "[p]olice have never routinely collected or used ... prints for random crime-solving purposes" (Murphy 2013, p. 166), understate a common use of fingerprint databases. The most that can--and should--be said is that, historically, the primary motivation for amassing large arrestee fingerprint databases was not to trawl them for matches to latent prints from crime scenes. It was to ascertain whether an arrestee's prints already were in the database as a result of a previous arrest.

To that extent, arrestee DNA profiling differs from arrestee fingerprinting. DNA profiling always had criminal-intelligence gathering as its primary purpose. Although today that also is a major purpose of arrestee fingerprinting, it was not always so. This historical fact returns us to a crucial question: Given that both types of biometric data now have the same two uses in the criminal justice system, and given the differences between fingerprints and DNA samples (as opposed to profiles), should the law treat them differently? The dissenting opinion offers no convincing answer.3/

  1. Justice Scalia is one of the least likely Justices to accept Professor Murphy's theory that the mere "analysis of genetic code" is, as she has suggested elsewhere, a "constitutional moment" separate from the acquisition of the sample. Erin Murphy, Relative Doubt: Familial Searches of DNA Databases, 109 Mich. L. Rev. 291, 334 (2010). In King, Justice Scalia did not state or imply that "analysis of genetic code" is a search in itself, and his remarks at oral argument suggest that it is the physical aspect of DNA acquisition--the trespass on the person--that was dispositive for him. Cf. Ferguson v. City of Charleston, 532 U.S. 67, 92 (2001) (Scalia, J., dissenting and maintaining that “only one act ... could conceivably be regarded as a search ... the taking of the urine sample”) (cited in Murphy, supra).
  2. If the dissent's critique of the analogy to fingerprinting is simply that fingerprinting is less invasive than buccal swabbing, then the collection method in the text eliminates the argument. If the dissenting Justices believe what Professor Murphy implies--that the "analysis of genetic code" is itself a Fourth Amendment "search"--then they are treading new ground without the benefit of any argument or analysis.
  3. The majority opinion reaches its conclusion rather summarily as well.
  • Brief for the States as Amici Curiae Supporting Petitioner, Maryland v. King, No. 12-302, Jan. 2, 2013.
  • Cal. Dep’t of Just., Office of the Attorney General, BFS DNA Frequently Asked Questions: Effects of the All Adult Arrestee Provision, http://oag.ca.gov/bfs/prop69/faqs, last accessed Nov. 27, 2013.
  • Paul Dawson et al., Residence Time and Food Contact Time Effects on Transfer of Salmonella Typhimurium from Tile, Wood and Carpet: Testing the Five-second Rule, 102 J. Applied Microbiology 945 (2007).
  • Philip Elmer-Dewitt, Computers: Taking a Byte Out of Crime, Police Hail Computer System that Cracked the Night Stalker Case, Time Mag., Oct. 14, 1985, available at http://content.time.com/time/magazine/article/0,9171,960128,00.html.
  • David H. Kaye, Maryland v. King: Per Se Unreasonableness, the Golden Rule, and the Future of DNA Databases, 127 Harv. L. Rev. Forum 39 (2013), available at http://ssrn.com/abstract=2340456.
  • Kenneth R. Moses, Automated Fingerprint Identification System (AFIS), in The Fingerprint Sourcebook ch. 6 (Alan McRoberts & Debbie McRoberts eds., 2011), available at http://www.ncjrs.gov/pdffiles1/nij/225326.pdf.
  • Erin Murphy, License, Registration, Cheek Swab: DNA Testing and the Divided Court, 127 Harv. L. Rev. 161 (2013).
  • Richard Saferstein, Criminalistics: An Introduction to Forensic Science (8th ed. 2004).
Previous Postings on the Opinions in Maryland v. King

Wednesday, November 27, 2013

Maryland v. King: When Being Smart and Witty Isn't Enough

Justice Scalia's dissenting opinion in Maryland v. King, the arrestee-DNA case, has been praised as "one of the best Fourth Amendment dissents, ever" and his "smartest, wittiest ruling of all time." [1] But one man's wit is another's vitriol, and the opinion, according to another law professor, is "dripping with contempt." [2] Stylistically, this opinion is more evidence that the art of writing with courtesy as well as conviction has been lost.

Substantively, what makes this dissent "one of the best"--other than one's feelings about which result is correct? It cannot be that the opinion sets forth some enduring principle for understanding and applying the Fourth Amendment. The opinion is less concerned with what the Maryland police did than with why they did it. Thus, the opinion begins with the following seemingly bright-line rule: "[w]henever this Court has allowed a suspicionless search, it has insisted upon a justifying motive apart from the investigation of crime" and then single-mindedly devotes itself to demonstrating that the "primary purpose [of the search] was to detect evidence of ordinary criminal wrongdoing." Id. at 1981-82. In the process, it overlooks the real possibility that something about the type of search and evidence in question makes the putative rule inapposite.

I say "putative rule" because the law is not as clear as the opinion suggests. The Court allowed a suspicionless search of a person on parole in Samson v. California, 547 U.S. 843 (2006). Although the majority led by Justice Kennedy in King relied prominently on Samson, Justice Scalia made no effort to disavow or distinguish the case.

More fundamentally, the dissent's desire for a rule that prohibits all suspicionless searches that have as their "primary purpose" the production of evidence of a crime leads to an odd result. The police may not collect a DNA sample by painlessly swabbing the inside of a cheek if they intend to see whether it matches one on file from an unsolved murder, rape, or other crime; however, they can if they want the same DNA profile, first and foremost, to verify the name and look up any previously recorded criminal history of the same person. Other than reciting the supposedly absolute rule about motives, the dissent offers no justification for this difference. It does not contest the majority's claim that the nature of the invasion of personal security and privacy in King is minor, akin to photographing or fingerprinting an arrested person. (I will look at what the dissent had to say about photography and fingerprinting in a separate posting.)

In criminal procedure cases, Justice Scalia favors absolute rules that require little inquiry into competing values. In King, this mode of analysis allowed him to vehemently insist that the Fourth Amendment does not allow forcing a prisoner to provide a small DNA sample. Yet, he did not dissent from Justice Kennedy's opinion in Florence v. Board of Chosen Freeholders, 132 S. Ct. 1510 (2012), which upheld a police practice of forcing all prisoners in jails to disrobe, open their mouths, wiggle their tongues, and move their genitalia so that their jailers could inspect their nude bodies--all without the slightest suspicion that the individual is concealing evidence or contraband. [3] In that case, why did Justice Scalia express no doubt that "the proud men who wrote the charter of our liberties would have been so eager to open their mouths for royal inspection"? King, 133 S.Ct. 1958, 1989 (2013) (Scalia, J., dissenting).

To avoid misunderstanding, I hasten to add that I too find the dissenting opinion powerful, at least when it comes to showing that Maryland's primary legislative purpose in authorizing DNA collection and analysis before conviction was to investigate crimes other than the ones for which the arrest is made. But that was hardly a blinding insight [4], and the opinion does not address the more basic question of why the intent to gather evidence should invalidate biometric data collection. Is it unconstitutional for police to collect fingerprints from arrestees with the sole intent to check them against a database of latent prints from unsolved crimes? With the database check as the "primary purpose"? Id. at 1981-82 (emphasis added). It is nearly impossible to tell from this, Justice Scalia's "smartest ... ruling."

  1. Jeffrey Rosen, A Damning Dissent: Scalia's Smartest, Wittiest Ruling of All Time, New Republic, June 4, 2013, available at http://www.newrepublic.com/article/113375/supreme-court-dna-case-antonin-scalias-dissent-ages.
  2. Noah Feldman, Grumpy Old Scalia v. Those Pesky Kids, Bloomberg View, June 30, 2013, http://www.bloomberg.com/news/2013-06-30/grumpy-old-scalia-v-those-pesky-kids.html.
  3. Sherry F. Colb, The Road to Justice Scalia Is Paved With (Some) Intentions, Verdict, June 12, 2013, http://verdict.justia.com/2013/06/12/the-road-to-justice-scalia-is-paved-with-some-intentions#sthash.v1VilcDr.dpuf.
  4. David H. Kaye, Who Needs Special Needs? On the Constitutionality of Collecting DNA and Other Biometric Data from Arrestees, 34 J. L., Med. & Ethics 188 (2006), available at http://ssrn.com/abstract=944359.
Previous Postings on the Opinions in Maryland v. King

Wednesday, November 20, 2013

The Significance of Significance

Today’s issue of Nature includes a cautionary essay entitled “Twenty Tips for Interpreting Scientific Claims.”  The essay is written by two conservation biologists (William J. Sutherland and Mark Burgman) and one statistician (David Spiegelhalter)—all eminent in their fields. The essay lists “20 concepts that should be part of the education of civil servants, politicians, policy advisers and journalists—and anyone else who may have to interact with science or scientists.”

Increasing statistical and scientific literacy is a laudable goal, but it is not at all easy to achieve. The late David Freedman and I struggled to describe some 16 of the 20 concepts in a reference guide for judges. By and large, our expositions are consistent with the short ones in Twenty Tips, but two tips seem less useful than the others.

First, how many policy makers, journalists, or other consumers of scientific information need to be told that smaller samples tend to be less representative (assuming that everything else is the same)? Twenty Tips seems to suggest that sample sizes usually should be on the order of "tens of thousands":
Bigger is usually better for sample size. The average taken from a large number of observations will usually be more informative than the average taken from a smaller number of observations. That is, as we accumulate evidence, our knowledge improves. This is especially important when studies are clouded by substantial amounts of natural variation and measurement error. Thus, the effectiveness of a drug treatment will vary naturally between subjects. Its average efficacy can be more reliably and accurately estimated from a trial with tens of thousands of participants than from one with hundreds.
Nobody can dispute the truth of the bolded heading if “better” means more likely to produce an estimate of a population parameter that is close to its true value. The problem I have seen with judges, however, is not that they fail to appreciate that large samples usually are preferable to small ones when accuracy is the only criterion of what is “better.” It is that they are overly impressed with the perceived need for very large samples when smaller ones would be quite satisfactory. They do not recognize that doubling the sample size rarely doubles the precision of an estimate. They think that a fixed percentage of a large population needs to be sampled to obtain a good estimate.
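The arithmetic behind this point is worth making concrete. As a minimal sketch (the standard deviation and sample sizes below are hypothetical, chosen only for illustration), the standard error of a sample mean is σ/√n, so the size of the population never enters the formula, and one must quadruple, not double, the sample to halve the error:

```python
import math

def standard_error(sigma, n):
    """Standard error of a sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 10.0  # hypothetical population standard deviation

# Doubling the sample size does not halve the standard error ...
se_100 = standard_error(sigma, 100)  # 1.0
se_200 = standard_error(sigma, 200)  # ~0.707 -- only a sqrt(2) improvement
# ... the sample must be quadrupled to halve the error.
se_400 = standard_error(sigma, 400)  # 0.5

print(se_100, se_200, se_400)
```

Note that nothing in the calculation depends on whether the population holds ten thousand members or ten million, which is why sampling "a fixed percentage" of a large population is unnecessary for a good estimate.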

Although this reaction merely concerns the understandable incompleteness of a short tip, the second tip I will mention contains more of an invitation to misunderstanding or misinterpretation. According to Twenty Tips:
Significance is significant. Expressed as P, statistical significance is a measure of how likely a result is to occur by chance. Thus P = 0.01 means there is a 1-in-100 probability that what looks like an effect of the treatment could have occurred randomly, and in truth there was no effect at all. Typically, scientists report results as significant when the P-value of the test is less than 0.05 (1 in 20).
The explanation of this call for “significant" results invites confusion. First, “statistical significance” is not “expressed as P.” Rather, a P-value is (arbitrarily) translated into a yes-no statement of “significance.” Second, “P = 0.01” does not mean “there is a 1-in-100 probability that . . . in truth there was no effect at all.” It means that if “in truth there was no effect at all,” differences at least as extreme as the one observed would be seen about 1 time in 100 in a long series of repeated experiments.

I am being picky, but that comes from being a lawyer who worries about the choice of words. The paragraph on the significance of significance certainly could be read more charitably, but I suspect that the policy-makers it is intended to educate easily could misunderstand it. Indeed, judicial opinions are replete with transpositions of the P-value into posterior probabilities, and Twenty Tips offers little immunity against this common mistake.
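The transposition mistake can be quantified with a bit of Bayes-rule arithmetic. In the sketch below, the prior probability of "no effect" and the power of the test are hypothetical figures chosen only for illustration; the point is that the posterior probability of the null hypothesis, given a significant result, need not be anywhere near the P-value:

```python
# Hypothetical inputs, for illustration only.
prior_null = 0.9  # suppose 90% of the hypotheses tested are truly null
alpha = 0.01      # significance level (false-positive rate under the null)
power = 0.8       # chance of a significant result when the effect is real

# Total probability of observing a "significant" result.
p_sig = alpha * prior_null + power * (1 - prior_null)

# Bayes' rule: probability the null is true GIVEN a significant result.
p_null_given_sig = (alpha * prior_null) / p_sig

print(f"P(no effect | significant) = {p_null_given_sig:.3f}")
# About 0.10 under these assumptions -- ten times the P-value of 0.01.
```

Change the prior and the gap widens or narrows, which is precisely why a P-value alone cannot be read as a posterior probability.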


David H. Kaye & David A. Freedman, Reference Guide on Statistics, in Reference Manual on Scientific Evidence, National Academy Press, 3d ed., 2011, pp. 211-302; Federal Judicial Center, 2d ed., 2000, pp. 83-178; Federal Judicial Center, 1st ed., 1994, pp. 331-414

William J. Sutherland, David Spiegelhalter & Mark Burgman, Policy: Twenty Tips for Interpreting Scientific Claims, Nature, Nov. 20, 2013, http://www.nature.com/news/policy-twenty-tips-for-interpreting-scientific-claims-1.14183?WT.ec_id=NATURE-20131121

Monday, August 19, 2013

Ninth Circuit Upholds Indefinite Retention of DNA Samples: But Why Retain Them?

Having considered the legal basis and justifications for allowing convicted offenders to demand that the DNA samples taken from them be removed from the FBI's repository [1, 2, 3], we should look on the other side of the coin. How strong are the justifications for retaining the DNA samples in the first place?

According to the Kriesel III majority, the "primary justification" for keeping samples is that "match confirmation" ensures "the continued accuracy and integrity of the CODIS system." That is:
Upon receiving a CODIS Match Report, the ... FDDU [Federal DNA Database Unit] locates the retained blood sample ... , re-extracts junk DNA from it, and runs a new analysis. If the newly generated profile is the same as the one in CODIS that formed a Candidate Match, the match is confirmed and the accuracy of the match between the CODIS profile and the identified offender is ensured. The confirmation of the CODIS match is thus achieved by comparing the profile generated from the retained sample with the Codis profile. Although CODIS has not yet encountered such a “mismatch” or “misidentification” error, in the event that the generated profile did not match the CODIS profile, the lab would then determine what caused the error and, presumably, prevent similar errors from occurring in the future.
Let's pause to note the infelicities here. First, a cavil about phrasing. The government does not just extract "junk DNA" (a term that is best avoided). It extracts the entirety of the physical genome (all the DNA) but types only a small number of loci that do not reveal much about the individual's health or fitness. Second, a match to the DNA sample that is on file (as a spot on a card) does not "ensure" that the "identified offender" matches the crime-scene sample. If the name associated with the card is incorrect (Mr. Jones's DNA has been recorded as Mr. Smith's), the profile is not the offender's even though both DNA profiles match. [4]

Still, the retyping step does protect against recording a profile in the database that is not the one on the card; moreover, as Judge Schroeder observed, "[i]t also enables pre-arrest confirmation of a match."

The dissent's response is that
The government's argument is severely undercut by the fact that the CODIS database has never led to a false identification of a suspect ... . Moreover, if by some remote chance, there was an unprecedented error in the CODIS system, that error would be swiftly discovered when the CODIS-identified suspect had a new blood sample drawn and the new sample was compared with the DNA found at the crime scene, as is the regular practice. For this reason alone, the government's rationale is wholly theoretical at best and, to put it bluntly, is entirely without merit.
Entirely without merit? Surely a DNA database can create an ordeal for a suspect who is arrested even if later testing of an entirely new sample from him leads to his release. [5] Judge Reinhardt seems to maintain that a mistyped DNA sample or a misrecorded profile could not "infringe the liberty of the misidentified suspect" because it merely "pertains to not taking a blood sample from the wrong person as a result of a CODIS misidentification." But police do not simply ask suspects who come to their attention because of a database hit to mail in a DNA sample at their convenience. Often, they arrest these suspects. Although the jailed suspects may have a get-out-of-jail-free card in the form of their DNA, it can take time to play it. [6]

The stronger argument is that the indignity and injury that comes from being an active suspect in a criminal investigation and, quite possibly, being detained and interrogated is, as the dissent ultimately recognizes, "an unfortunate occurrence, but avoiding this wholly theoretical and comparatively minor infringement on the suspect's rights cannot, by any measure, justify the retention of the entirety of that individual's, and millions of others', private genetic information for the rest of their lives."

But even this formulation is misconceived. The pertinent balance is either (1) between a single individual's risk of being falsely arrested as opposed to the same individual's risk of having truly private areas of his genome examined by the government, or (2) between these same risks summed over everyone in the database. To balance between a single individual's risk of being falsely identified and everyone's risk of improper acquisition or disclosure of genetic data is to place a very heavy thumb on the scales.

The dissent also dismisses the government's argument that avoiding public disclosure of laboratory or clerical errors is a good reason to hang on to samples. This point is well taken. Why should the government fear that the public might discover the obvious--that some mistakes are possible when an organization maintains millions of records? CODIS's efficacy (and public confidence in it) would not suffer if preliminary false matches are rare and invariably corrected by confirmatory testing of fresh samples, as the government claims.

Finally, the dissent suggests that the FBI's "quality assurance" strategy of "randomly re-testing 1% of samples that were received by the FBI laboratory in the previous six months" cannot justify retaining millions of samples for more than six months. Certainly, less drastic methods of quality assurance are available. For example, the FBI could keep only 1% of the samples for re-testing. But this procedure would make it more difficult to correct a large number of profiles if they turned out to be incorrect. Suppose that a discrepancy resulted from a problem that affected all the samples placed on a tray with many wells on a given day. (This may not be the current technology, but it illustrates the broader point.) It might be easier to retrieve these samples from storage and retype them than to locate and collect more DNA from the individuals who provided the affected samples. However, if errors are as improbable as the dissent maintains, this precaution would not be justified.

Interestingly, one traditional argument for sample retention is absent from both opinions (apparently because the government did not raise it). What if some of the old loci are retired and new ones are used as replacements? (Maybe the former will be found to have privacy-laden associations with phenotypes?) Sample retention obviates any possible need to obtain new samples--a task that could prove quite onerous given the size of modern databases. Then too, what if outer-directed database trawling [7] is undertaken? A recent report proposes that "familial searching" as implemented in California could classify crime-scene samples from not-so-close relatives as coming from first-degree ones, causing turmoil for the first-degree relatives and complicating matters for the police. [8] One solution (if this is a real problem) would be to analyze more loci in the samples to ascertain the relationship with greater confidence. [9, 10]

In sum, Judge Reinhardt's complaint that "tens if not hundreds of millions of dollars expended on maintaining a totally unnecessary and wholly pointless system of collecting and maintaining tens of millions of blood samples indefinitely in a national warehouse" seems overdrawn--but the justifications discussed in the Kriesel III opinions for indefinite and widespread sample retention also seem strained. The value--immediate and potential--of sample retention is more complex than the case suggests, but it is unlikely that the government would suffer greatly if it had to destroy samples from individuals like Kriesel, who have completed their sentences. Indeed, however one comes out on the issue of the individual's Fourth Amendment right to compel eventual destruction or disgorgement of databank samples, even initial sample retention may not be the best public policy. [11]


1. Ninth Circuit Upholds Indefinite Retention of DNA Samples: The Majority Opinion in Kriesel III, July 17, 2013, http://for-sci-law-now.blogspot.com/2013/07/ninth-circuit-upholds-indefinite.html

2. Ninth Circuit Upholds Indefinite Retention of DNA Samples: The Dissent’s Perception of the Loss of Privacy in Kriesel III, July 18, 2013, http://for-sci-law-now.blogspot.com/2013/07/ninth-circuit-upholds-indefinite_18.html

3. Ninth Circuit Upholds Indefinite Retention of DNA Samples: More Problems with Judge Reinhardt’s Dissenting Opinion, July 25, 2013

4. Police Finger Wrong Man after DNA Data Mix-up, Asahi Shimbun, Mar. 22, 2010, http://www.asahi.com/english/TKY201003210142.html

5. Linda Geddes, DNA Super-network Increases Risk of Mix-ups, New Scientist, Sept. 5, 2011, http://www.newscientist.com/article/mg21128285.500-dna-supernetwork-increases-risk-of-mixups.html#.UhJ-zH-wXdU

6. Jack Doyle, Innocent Man Spent Five Months in Prison After Forensics Mix-up Meant He Was Falsely Accused of Rape, Daily Mail, Oct. 1, 2012, http://www.dailymail.co.uk/news/article-2211365/Adam-Scott-Innocent-man-spent-FIVE-MONTHS-prison-forensics-mix-meant-falsely-accused-rape.html#ixzz2cSKkGvBI

7. David H. Kaye, The Genealogy Detectives: A Constitutional Analysis of “Familial Searching”, 51 Am. Crim. L. Rev. 109 (2013), available at http://ssrn.com/abstract=2043091

8. Rori V. Rohlfs, Erin Murphy, Yun S. Song, Montgomery Slatkin, The Influence of Relatives on the Efficiency and Error Rate of Familial Searching, 8 PLoS ONE e70495, http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0070495

9. Jianye Ge et al., Haplotype Block: A New Type of Forensic DNA Markers, 6 Forensic Sci. Int’l Genetics 322 (2011)

10. Chad Huff et al., Maximum-likelihood Estimation of Recent Shared Ancestry (ERSA), 21 Genome Research 768 (2011)

11. David H. Kaye, Behavioral Genetics Research and Criminal DNA Databanks, 69 Law & Contemporary Problems 259 (2006), available at http://ssrn.com/abstract=1411861