Saturday, December 29, 2018

Results of a Proficiency Test of Hair Examiners

Existing proficiency tests of forensic examiners who offer opinions on the origins of trace evidence are not designed to estimate the conditional probabilities for false negatives (exclusions) and false positives (inclusions). 1/ They rarely replicate the conditions of casework; indeed, proficiency exercises may be significantly easier than casework, and the examiners usually know they are being tested.

This is not to question the importance and value of basic proficiency testing. Even simple tests can inform managers of the capabilities of examiners and identify some analysts who could benefit from additional training. But it remains tempting to think that high scores on proficiency tests mean that examiners rarely make mistakes in practice. 2/ Conversely, it has been argued that poor scores on proficiency tests are danger signs for admitting or relying on expert opinions in court. 3/

To the extent that current proficiency test data are pertinent to accuracy in casework, a recent test by Forensic Testing Services may be of interest. 4/ FTS administered the test to hair examiners at an undisclosed number of laboratories; 45 out of 52 examiners completed it. The test asked whether examiners could tell that a small set of "questioned" hairs, which FTS sampled from one individual, did not come from either of two other individuals, as judged by comparing the questioned hairs to reference samples that FTS prepared from those individuals. The questioned sample, designated Item #3 (and said for test purposes to have been found clutched in the hand of a victim), was a set of five hairs from the scalp of a 56-year-old white woman. Item #1 was a set of ten hairs from the scalp of a deceased 87-year-old white male. Item #2 was ten hairs from the scalp of a deceased 65-year-old white male.

The examiners were asked to determine whether item #3 -- the five "questioned" hairs -- are "consistent in microscopic characteristics to the known hair sample sets" #1 and #2. They also considered macroscopic characteristics such as color. A "Statement Regarding Test Design" explained that
Different hairs from the same body region of a person exhibit variation in microscopical characteristics and features. It is difficult to prepare a microscopical examination of hair proficiency test due to this natural variation. Our approach to this test is to provide several questioned hairs (known to be from the same individual) to compare as a group to a sample of known hair. Although the test is not realistic from the standpoint that most analysts would characterize each hair individually, this approach is designed to ensure consistency between distributed tests. [5/]

We realize that most examiners would prefer larger sample sets of known hairs. The use of smaller known sample sets was also intended to ensure consistency [among] distributed tests. [6/]
In essence, the examiners responded C (consistent), X (excluded), or I (inconclusive) as shown in the following table:

       #3 from #1?   #3 from #2?
C           6             0      ← false inclusions
X          37            41      ← true exclusions
I           2             4      ← no conclusion

Putting aside the inconclusives (which should have no impact in court even though they are an important aspect of an examiner's proficiency at acquiring and evaluating information), the comparisons of #3 to #1 produced 6 / (6+37) = 14% false inclusions, and the comparisons of #3 to #2 produced none. Pooling these decisions, the examiners made (6+0) / (6+37+0+41) = 6/84 = 7.1% false inclusions.
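The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the post's calculations, not part of the FTS report; the counts come from the table above.

```python
# Recompute the false-inclusion proportions from the tabulated decisions,
# excluding inconclusives as the post does.

def false_inclusion_rate(false_inclusions, true_exclusions):
    """Proportion of conclusive comparisons that were false inclusions."""
    conclusive = false_inclusions + true_exclusions
    return false_inclusions / conclusive

# Comparisons of item #3 to item #1: 6 false inclusions, 37 true exclusions.
rate_1 = false_inclusion_rate(6, 37)
# Comparisons of item #3 to item #2: 0 false inclusions, 41 true exclusions.
rate_2 = false_inclusion_rate(0, 41)
# Pooled across both reference sets: 6 false inclusions out of 84 decisions.
pooled = false_inclusion_rate(6 + 0, 37 + 41)

print(f"{rate_1:.1%}")  # about 14.0%
print(f"{rate_2:.1%}")  # 0.0%
print(f"{pooled:.1%}")  # about 7.1%
```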

Because the examiners had no opportunity to compare the five questioned hairs to a representative sample of the woman's hairs, the proficiency test yields no true inclusions. Consequently, it is not possible to estimate the probative value of the hair examiners' conclusions of consistency. To see this, suppose that a reference set of #3 hairs had been provided and that, disappointingly, the examiners found consistency only 7.1% of the time in this situation. Then the examiners would be declaring consistency as often with different sources as with the same source! 7/ Findings of C would not help a judge or jury distinguish between sources and (these particular) nonsources of the questioned hairs.

Of course, it seems likely that examiners would achieve a higher proportion of true inclusions than a mere 7.1%. If the proportion were, say, 71%, judgments of C would offer ten times as much support to the hypothesis that the inclusion is correct as to the hypothesis that a nonsource is included. The highest probative value (point estimate) compatible with the reported data occurs when the examiners are perfect at responding to true sources with a judgment of C. In that case, the ratio would be 100% / 7.1% ≈ 14.
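The likelihood-ratio arithmetic in this paragraph can be illustrated with a short Python sketch. The sensitivities (71%, 100%) are hypothetical, as in the text; only the 7.1% pooled false-inclusion proportion comes from the test data.

```python
# Likelihood ratio for a judgment of C (consistent):
#   LR = Pr(C | same source) / Pr(C | different source)
#      = sensitivity / false-inclusion proportion

def likelihood_ratio(sensitivity, false_inclusion_proportion):
    return sensitivity / false_inclusion_proportion

fip = 6 / 84  # pooled false-inclusion proportion, about 7.1%

# Three scenarios: sensitivity no better than the false-inclusion
# proportion (LR = 1), a hypothetical 71% (LR about 10), and a
# perfect 100% (LR = 14, the ceiling compatible with the data).
for sens in (fip, 0.71, 1.00):
    print(f"sensitivity {sens:.1%} -> LR = {likelihood_ratio(sens, fip):.2f}")
```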

But figures like 1, 10, and 14 are speculative. This proficiency test provides no data on sensitivity (to use the technical term for Pr(C | #3)), which is an essential component of probative value. 8/ Proficiency test manufacturers might want to consider adding a measure of sensitivity to their tests.

  1. See, e.g., Jonathan J. Koehler, Proficiency Tests to Estimate Error Rates in the Forensic Sciences, 12 Law, Probability & Risk 89 (2013).
  2. E.g., United States v. Crisp, 324 F.3d 261, 270 (4th Cir. 2003) (handwriting expert "had passed numerous proficiency tests, consistently receiving perfect scores"); United States v. Otero, 849 F.Supp.2d 425, 434 (D.N.J. 2012) (because "proficiency testing is indicative of a low error rate ... for false identifications made by trained examiners ... this Daubert factor also weighs in favor of admitting the challenged expert testimony").
  3. E.g., Edward J. Imwinkelried, The Constitutionality of Introducing Evaluative Laboratory Reports Against Criminal Defendants, 30 Hastings L.J. 621, 636 (1979); Randolph N. Jonakait, Forensic Science: The Need for Regulation, 4 Harv. J. L. & Tech. 109 (1991).
  4. Forensic Testing Services, 2018 Hair Comparison Proficiency Test FTS‐18‐HAIR1 Summary Report.
  5. Because the examiners were told, in effect, that the questioned hairs all had the same source for a reason having nothing to do with their physical features, a report on the five questioned hairs individually or on the internal consistency of that set would not have tested their skill at distinguishing hairs on the basis of those features.
  6. According to R.A. Wickenheiser & D.G. Hepworth, Further Evaluation of Probabilities in Human Scalp Hair Comparisons, 35 J. Forensic Sci. 1323, 1329 (1990), "[m]acroscopic selection of 5-13 mutually dissimilar hairs was frequently unrepresentative of the microscopic range of features present in the known samples." The FTS summary does not state how the two sets of ten reference hairs were selected. Failing to capture the full range of variation in the head hairs of an individual would increase the chance of an exclusion. For this study, that could decrease the proportion of false inclusions and inflate the proportion of true inclusions.
  7. The likelihood ratio in the sample would be Pr(C | #3) / Pr(C | #1 or #2) = 7.1 / 7.1 = 1.
  8. See, e.g., David H. Kaye, Review-essay, Digging into the Foundations of Evidence Law, 116 Mich. L. Rev. 915 (2017).

Wednesday, December 26, 2018

"Our Worst Fears Have Been Realized" -- Forensic "Evidence, Science, and Reason in an Era of 'Post-truth' Politics" (Part 2)

This posting continues the previous summary, with some annotations in the form of footnotes, of an October 2017 panel discussion of forensic science. 1/ It does not include the audience question-and-answer period because that part of the recording, although posted for a time, is no longer available.

PROFESSOR CHARLES FRIED, who represented Merrell Dow Pharmaceuticals in Daubert v. Merrell Dow Pharmaceuticals, described his role as “easy." In the context of "civil trials ... having to do with whether a particular chemical, which was usually a therapeutic chemical, had ... a capacity to cause a particular untoward event," the issue "was studied regularly [thanks to] the Food and Drug Administration [which] usually required enormously rigorous, randomized, double-blind trials.” However, in contrast to the “easy domain ... of causation in areas where there were really quite regular methods for testing ... and institutions that did it, .... God help us when we get to fingerprints, bullet lead, bite marks, hair samples. So there is a real problem here, and I have no sympathy with the current Department of Justice.”

Coming from the man who was the Department’s Solicitor General during the Reagan administration, this sentiment is chilling. But it is mild in comparison to DR. ERIC LANDER’s suggestion that the Justice Department has yet to embrace the scientific revolution that began in the 15th or 16th century. In his view,
[W]hat Judge Edwards did [with the NAS committee that he co-chaired] was write a spectacular report that pointed out all the scientific problems. [T]he Department of Justice dismissed it because they said it was about how to make forensic science better. [I]f it wasn’t about admissibility, they didn't really care, because if you ... could talk to a jury, well, you didn’t have to make it better.

So PCAST took the next step. We wrote a report that was about Rule 702. [W]e really didn’t care about anything else. We weren’t writing about how to improve forensic evidence in general. ... We made a specific recommendation to [the] standing committee on the Federal Rules of Evidence that they revise the advisory note around Rule 702, essentially that 702 needs fixing. This morning, they met. ... I spent four and one-half hours with said committee that convened in response to the PCAST report. [2/] ... Ted Hunt was there, and we had a grand old time. So I’m still full of vim and vigor about this thing here. ...

[M]ost of these [feature-comparison] methods weren’t developed to be science. They were developed to be rough heuristics for an investigation. [T]he courts have accepted this kind of evidence despite the lack of any empirical testing. ...

Fingerprints. [In] 1984, the Department of Justice in an official document ... which it disavowed last year, said ... that fingerprints were infallible — papally infallible. [3/] In 2009, the former head of the FBI crime lab testified [that] the error rate was less than one in 11 million. Why? Because the FBI had done 11 million fingerprint cases and he was not aware of an error. ... This is true. It is cited in the PCAST report. [4/] Since the time of Judge Edwards’ 2009 report, the FBI, God bless them, did a real empirical test of fingerprints. And now we have a measurement of an error rate. [O]ne in 600 is their best guess. Could be, with error bars, as high as one in 300. That’s great. We now actually know that it ain’t perfect. It’s not terrible, and you can tell that to a jury. ...

Firearms. They did a whole bunch of fish-in-a-barrel tests. They gave you a bag of bullets. They gave you another bag of bullets. They said every bullet in here has a match in here. Figure out who matches. They make very few mistakes when they know that the right answer is there on the multiple-choice test. If the multiple-choice test includes “none of the above,” you might not do as well. ... They did multiple choices without “none of the above,” and they found an error rate of one in 5,000. Then in 2014, the Department of Defense commissioned a study, and they found, well, one in 50—kind of like one in 5,000—just a hundredfold less.

Hair analysis. They did an amazing study in 1974, which the Justice Department cited last year as the foundational proof of the validity of hair analysis in which they found the error rate was less than 1 in 40,000. That study involved giving people hairs and asking if they thought they matched, and by the way, telling the examiners each hair you’re considering comes from a different person. As a matter of fact, it’s shocking they made any errors at all. When the FBI actually used DNA analysis on hairs that had been said by examiners to match, they found one time in nine they got it wrong.

... Bite marks. The ... field said one in six trillion was the error rate. When you give them four choices of people, they still get it wrong one time in six in that closed set—off by a remarkable trillionfold. ...

Footwear matching is declared in the seminal textbook in the field to have an error rate [of] about one in 683 billion. I can’t tell you how far off that is because there has never, ever been an empirical test of footwear because they know they can calculate that it must be that accurate.

So our radical position — and I say “radical” because Mr. Hunt this morning described PCAST’s position as radical [5/] — was that a forensic feature-comparison method can be considered reliable only if its accuracy has been empirically tested under conditions appropriate to its intended use and found to have accuracy appropriate to the intended use. That’s our radical position, which I think is about sort of the foundation of the scientific revolution — that empirical evidence is necessary. This would have been controversial in ancient Greece and other places, but in the last four hundred years, this hasn’t been so controversial.

But in the forensic community they doubt it. They argue other things can substitute for it. It’s enough if the method is based on science, like based on a true story. The examiners [maintain that] "[w]e haven’t got reliability data, but [we] have good professional practices, training, certification, accreditation programs, professional organizations, best practices manuals, extensive experience using it, and published papers in peer-reviewed journals." And PCAST noted in passing that the same is true about psychics. If you go online, all of those indicia apply to the field of psychics. There are peer reviewed journals for psychics, accreditation standards, etc. There's even a subdiscipline of forensic psychics, by the way. And so we said, those are all good. I don’t want you to not have those things, but they can never establish reliability.

So it’s flamingly obvious, but some people disagree. And of 20 speakers this morning, only three quibbled with the need for empiricism. They all were employed by the Department of Justice. They were Ted Hunt and two colleagues. I asked this question, yes-no, and I would say it broke down 17-3 on "Is empirical evidence actually necessary?" And 17 people are post the scientific revolution, and three are, well, the jury is out on the scientific revolution.

In any case, I’ll just add that the Department of Justice, as you might imagine, hated this report. They hated it. We ... reported to the President, and this was done at the request of the President. We then took it to the Department of Justice, as we do with all agencies, and let them know what we were thinking, and they had a fit. They had a fit because, they said, “Do you realize this could jeopardize existing cases and past convictions?” And they said, “Could you grant us grace, like three or four years to fix all this before we have to live by these rules?” We ... concluded that as scientists, it was not within our purview to grant grace, that others might be able to do that. All we could do was speak the facts. And so we did. And they hated it, and they attempted very hard to kill the report. We did battle for about four months. The Justice Department sent over 300 comments, and we dutifully answered every one and made small changes in response to them. And at the end they still opposed the release of the report. And I will merely note that in the first inaugural President Obama said we will restore science to its rightful place. The White House was faced with a disagreement between its science advisors and the Department of Justice as to whether the report should be released. The White House called the Department of Justice and said “You’re going to have to wrap your head around the idea this report’s coming out.” And it came out.

One of our recommendations, as I said, was that the federal Judicial Conference should take on this question of, Does Rule 702 need a change, either as to the rule or to the advisory note. There was a robust discussion. There was no agreement as to whether the rule itself should be changed and how—there was a broad range of ideas about that—or whether the advisory note should be changed. We’d recommended just change the advisory note, but we were told if you don’t change the rule, you can’t change the advisory note. So I suggested put a comma in somewhere and change the advisory note. And they agreed that would trigger it, that would be fine. And we’ll see where it goes. [S]cience isn’t rolling over yet. [I]n the end, science does win out, and we’re just going to have to be very, very stubborn.
JUDGE EDWARDS added that
You should all be wondering why the courts haven’t been able to step in and turn us in the right direction, since we’re about justice, supposedly. ... First of all, the people who are testifying ... often don’t know what they don’t know ... . We often have a defense counsel who was not up to the task. We have judges [who] don’t want to move away from precedent unless there’s compelling reason, and there are a lot of cases out there saying that these disciplines are acceptable. And what the judges have done is to accept that precedent and not even allow Daubert hearings in the criminal arena, which is really very sad.

The other thing ... is ... we don’t know how to quantify variability because they haven’t been studied. ... In most of these areas they have not done the studies to quantify the variability, the error rates, et cetera. And the judges get this. So when the judges—and there are some judges who are willing to listen carefully—are told you should at least limit the testimony of the expert so they don’t overstate and say “match!” ... [w]hat do you tell their expert they can say and not say? ... If you say to the expert, “Don’t overstate, don’t claim ‘matched,’ claim something less,” the prosecutor is up in arms because ... if you show any uncertainty coming out of the mouth of your expert, you may not meet [the proof-beyond-a-reasonable-doubt] standard. So you have no support coming from the prosecution, and we don’t yet know ... what [to] tell the experts [about] the limits of [their] testimony. [W]e don’t have any good case law helping us. The Supreme Court has given us nothing. The Melendez[-Diaz] case was the best hope we had a number of years ago in 2009. [6/] They cited our report and said it was terrific—we need reform. And then nothing. And there’s been no other case, and that’s where I think we’re stuck. The judges are not moving because I think they don’t know how to limit the testimony of the experts in a way that would be effective and would achieve what we’re talking about.
PROFESSOR FRIED: Let me ask a question because not being a criminal lawyer, I find this puzzling. I am a constitutional lawyer, and ... you have got to prove guilt beyond a reasonable doubt, and there’s the Confrontation Clause — much misused by Justice Scalia, but here it could really do a job. All the judges would have to do — but you’re telling me they don’t do it, and they’re not doing their job, they’re acting unconstitutionally. [Suppose] you get one of these phony experts — and they are phony, some of them are. [I]n the civil area, they’re not only phony, but they’re crooks. I mean they are what [is] known as paid liars, but that’s a different thing. In the criminal area, in the prosecution, they are professional liars. [T]hey may not be paid; nevertheless, why are the defense lawyers not allowed to poke these holes under the Confrontation Clause and under cross-examination? It would fall apart in cross-examination, particularly if you had a contrary expert ... . Why doesn’t that happen? That would create reasonable doubt in an unreasonable number of cases. Why doesn’t that happen? You tell me, judge.

JUDGE EDWARDS: I’ve never understood the bitemark example. And I say this with great sadness .... The judges let it in. They’ve tried to do the cross-examination. It comes in, the judges let it in. If you can get someone who’s been identified as an expert, you’ve got the jury.

PROFESSOR FRIED: But what if you get an expert on the other side?

JUDGE EDWARDS: Here’s the problem. You don’t have scientists, serious scientists, like Eric, who have any interest in doing serious work in forensics.

To which DR. LANDER added that "[i]t's an unusual kind of science when the scientists work for one side. In criminal law, the scientists work almost exclusively for the prosecution." After elaborating, he concluded the panel's presentation with the following reaction to Professor Fried's question about why vigorous cross-examination and countervailing experts do not solve the problems of dubious science and overclaiming: 7/
What do you do when you don't really know what your accuracy is? You don't have a method. End of story, which means it's not admissible. If ... I have a scientific method that measured something [but] I have no clue how accurate it is, it's not a method. It doesn't come in. It doesn't go to weight. It goes to admissibility.
  1. "Our Worst Fears Have Been Realized" — Forensic "Evidence, Science, and Reason in an Era of 'Post-truth' Politics" (Part 1), Nov. 20, 2017.
  2. The committee's regularly scheduled meeting took place during the afternoon, while Dr. Lander was speaking at Harvard. The committee spent the morning at Boston College listening to short presentations from many invited speakers — among whom Dr. Lander was prominent. The transcript of the addresses and discussion — including back-and-forth between Dr. Lander and a few Justice Department employees — is reproduced in the Fordham Law Review, along with papers supplied by a few of the speakers. Symposium on Forensic Expert Testimony, Daubert, and Rule 702, 86 Ford. L. Rev. 1463 (2018).
  3. The body of the PCAST report does not provide the name, date, or author(s) of the "official document" declaring papal infallibility. Note 97 on page 45 refers only to a defunct URL. However, a separate list of references for fingerprinting includes the publication "Federal Bureau of Investigation. The Science of Fingerprints. U.S. Government Printing Office. (1984): p. iv." This booklet seems to be referring to a full set of fingerprints as a token of individual identity. It states at iv that
    Of all the methods of identification, fingerprinting alone has proved to be both infallible and feasible. Its superiority over the older methods, such as branding, tattooing, distinctive clothing, photography and body measurements (Bertillion system), has been demonstrated time after time. While many cases of mistaken identification have occurred through the use of these older systems, to date the fingerprints of no two individuals have been found to be identical.
  4. The witness in the case was not "the former head of the FBI crime lab." But he was the head of the FBI's latent fingerprint unit.
  5. The transcript of the advisory committee's symposium at Boston College does not reflect any use of the word "radical" by Ted Hunt. But he did take issue with the insistence in the PCAST report that for highly subjective feature-comparison methods,
    The sole way to establish foundational validity is through multiple independent black box studies that measure how often examiners reach accurate conclusions across many feature-comparison problems involving samples representative of the intended use. In the absence of such studies, the feature comparison method cannot be considered scientifically valid.
    The Department of Justice, he explained, regarded as "wrong and ill advised ... PCAST’s novel premise that the set of criteria that comprise its nonseverable six-part test collectively constitute the exclusive means by which scientific validity of a feature-comparison method can be established." Symposium on Forensic Expert Testimony, Daubert, and Rule 702, 86 Ford. L. Rev. 1463, 1520 (2018). The Department's position is that "mainstream scientific thought" looks to "all available information, evidence, and data." The real issue, of course, is what to do when "all available information" includes almost no well designed studies of the accuracy and reliability of subjective measurements and opinions from them.
  6. Justice Scalia's opinion for the Court in Melendez-Diaz v. Massachusetts, 557 U.S. 305 (2009), devoted but a single sentence (shown in italics) to the NRC report:
    Nor is it evident that what respondent calls "neutral scientific testing" is as neutral or as reliable as respondent suggests. Forensic evidence is not uniquely immune from the risk of manipulation. According to a recent study conducted under the auspices of the National Academy of Sciences, "[t]he majority of [laboratories producing forensic evidence] are administered by law enforcement agencies, such as police departments, where the laboratory administrator reports to the head of the agency." National Research Council of the National Academies, Strengthening Forensic Science in the United States: A Path Forward 6-1 (Prepublication Copy Feb. 2009) (hereinafter National Academy Report). And "[b]ecause forensic scientists often are driven in their work by a need to answer a particular question related to the issues of a particular case, they sometimes face pressure to sacrifice appropriate methodology for the sake of expediency." Id., at S-17. A forensic analyst responding to a request from a law enforcement official may feel pressure — or have an incentive — to alter the evidence in a manner favorable to the prosecution.
  7. The availability of cross-examination is part of the Justice Department's argument for leaving Rule 702 and the committee note alone. Andrew Goldsmith argued that
    PCAST and the changes predicated on PCAST’s suggestions ignore the basic nature of the criminal justice system. It ignores, as Justice Harry Blackmun wrote in Daubert, both the capabilities of the jury and of the adversary system generally. Vigorous cross-examination, presentation of contrary evidence, and careful instruction on the burden of proof are the traditional and appropriate means of attacking shaky but admissible evidence.
    Symposium on Forensic Expert Testimony, Daubert, and Rule 702, 86 Ford. L. Rev. 1463, 1527 (2018). The last sentence comes verbatim from Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579, 596 (1993). However, the Department is ignoring the rest of Justice Blackmun's paragraph, which concludes with these words: "These conventional devices, rather than wholesale exclusion under an uncompromising 'general acceptance' test, are the appropriate safeguards where the basis of scientific testimony meets the standards of Rule 702." PCAST's argument is that the testimony does not meet the standards of Rule 702 unless the highly subjective and largely standardless assessments of skill-and-experience based "scientific" experts are adequately tested. The real issue is what adequate testing requires in this context.

Monday, December 24, 2018

Mississippi Court of Appeals Sees No Problem with the Usual Bullet-mark Testimony

In an opinion devoid of serious analysis, the Mississippi Court of Appeals ruled that the trial court properly admitted a firearms examiner's testimony that a shell casing found at the scene of a murder was fired from a stolen 9-millimeter pistol found in the defendant's girlfriend's car. Despite the effort by the President's Council of Advisors on Science and Technology (PCAST) to read into the (federal) rules of evidence a requirement that the examiner supply an estimate of the false-positive probability for this type of toolmark matching, the court was satisfied with the witness's adoption of the prosecutor's characterization of the identification of the murder weapon to "a reasonable degree of scientific certainty." The opinion in Willie v. State, No. 2016-KA-01416-COA, 2018 WL 5810067 (Miss. Ct. App. Nov. 6, 2018), is as yet unpublished. Excerpts follow.
On appeal, Willie argues that the trial court erred in qualifying [Bryan] McIntire as an expert witness and in allowing him to testify that the casing found at the murder scene matched the gun recovered in Willie's possession. He claims McIntire's testimony was conclusory, and the scientific methods used by the expert were “questionable,” particularly noting McIntire's failure to provide a margin-of-error rate regarding the science of firearm identification and to take any photographs. ...

Willie ... claims the validity of McIntire's testimony has been called into question by “recent developments in the scientific community,” citing a 2008 NAS report ... , a 2009 NAS report on forensic science, as well as a 2016 report from the President's Council of Advisors on Science and Technology. The 2008 report cautioned:
Conclusions drawn in firearms identification should not be made to imply the presence of a firm statistical basis when none has been demonstrated. Specifically, ... examiners tend to cast their assessments in bold absolutes, commonly asserting that a match can be made “to the exclusion of all other firearms in the world.” Such comments cloak an inherently subjective assessment of a match with an extreme probability statement that has no firm grounding and unrealistically implies an error rate of zero.
The 2016 report stated:
Whether firearms analysis should be deemed admissible based on current evidence is a decision that belongs to the courts. If firearms analysis is allowed in courts, the scientific criteria for validity as applied should be understood to require clearly reporting the error rates seen in one appropriately designed black-box study. Claims of higher accuracy are not scientifically justified. ...
During direct examination of McIntire, the State asked:
Q. Okay. And what determination did you reach, to a reasonable degree of scientific certainty, as to the class characteristics of the particular gun as it relates to the casing?
A. The class characteristics that Mr. Williams described, they are the same. The firearm in State's Exhibit 7A is a 9-millimeter Ruger caliber, and the cartridge case that's in State's Exhibit 8 is a 9-millimeter Ruger caliber cartridge casing.
Q. And to a reasonable degree of scientific certainty, did you reach a conclusion as to the individual characteristics as it relates to the casing in State's Exhibit 8, as well as the gun in State's Exhibit 7A?
A. Yes, I did.
Q. And what was that conclusion, to a reasonable degree of scientific certainty, that you reached?
A. That the cartridge case that's in State's Exhibit 8 was fired in the gun that's in State's Exhibit 7A. ....
... When asked by defense counsel about the margin of error, McIntire replied: “I understand what you're talking about with ‘margin of error,’ but we do not have a reporting procedure for a margin of error.” But McIntire also clarified that he was not saying “there was not a margin of error in the field.”

We do not find McIntire's failure to cite any margin of error warrants reversible error as asserted by Willie. The United States Supreme Court noted in Daubert [that] "The inquiry envisioned by Rule 702 is, we emphasize, a flexible one." ... [C]ourts have ... upheld an examiner's determination that a bullet or casing came from the defendant's gun to within a “reasonable degree of scientific certainty.” ... United States v. Otero, 849 F.Supp.2d 425, 435 (D.N.J. 2012), ... United States v. Ashburn, 88 F.Supp.3d 239, 247 (E.D.N.Y. 2015) ... People v. Rodriguez, 413 Ill.Dec. 996, 79 N.E.3d 345, 356-57 (Ill. Ct. App. 2017)
The Louisiana Court of Appeals recently considered this issue in State v. Lee, 217 So.3d 1266 (La. Ct. App. 2017). Like McIntire, the expert witness had years of experience, was certified by the AFTE, and had routinely passed proficiency tests. Id. at 1273. The expert witness also testified that he “was not aware of any error rate with respect to the type of testing he performed.” Id. at 1274. Addressing Daubert and Rule 702, the appellate court found the testimony offered by the expert to be “relevant” and “reliable,” stating:
Based on the foregoing, it cannot be said that the jurisprudence supports Defendant's assertion that the scientific community has rejected the methodology and theory of firearms identification. To the contrary, even after publication of the [2009] NAS Report, courts have addressed, in detail, the reliability of such testimony and ruled it admissible, although to varying degrees of specificity.
Id. at 1275-78. We conclude that the trial court did not err in qualifying McIntire as an expert and allowing his testimony.
Despite the outcomes in Willie and Lee, and whatever the force of the arguments about PCAST's construction of Rule 702, firearms examiners would be well advised to have better responses to questions about error rates in their field than "not aware of any" and "we do not have a reporting procedure."

Friday, December 21, 2018

"But most of all, we have the bite mark"

Two days ago, Texas's highest court for criminal appeals granted habeas corpus relief and issued opinions in Ex parte Chaney, No. WR-84,091-01 (Tex. Crim. App. Dec. 19, 2018). The case was easy to decide, as the state conceded that the trial involved egregious misconduct and misinformation. The opinions address the meaning and standards for findings of "actual innocence," violations of the prosecution's duty to disclose exculpatory evidence (scientific and otherwise), the use of false evidence, and "new scientific evidence [that] contradicts bitemark-comparison evidence relied on by the State at trial." Excerpts from Judge Barbara Hervey's opinion for the court about the bitemark evidence follow. (Footnotes and most citations are omitted.)

On June 20, 1987, ... the bodies of John and Sally Sweek [were found] in their apartment. Their throats were slashed, and they had been stabbed multiple times. Police also found what they believed to be a human bitemark on John's left forearm. There were no eyewitnesses to the offense. ...

The final piece of the State's case was testimony from two forensic odontologists that a mark found on John's left forearm was a human bitemark made by Chaney at the time of the murders. ... Doctor James Hales said that there was only a "[o]ne to a million" chance that someone other than Chaney bit John because the mark was a "perfect match" with "no discrepancies" and "no inconsistencies." He claimed that the "one to a million" statistic was found in "the literature." He also testified that the injury was inflicted at the time of the murders. ... Doctor Homer Campbell testified that the mark was actually at least four separate human bitemarks and that, after comparing dentition models and examining photographs, he was certain to a "reasonable degree of dental certainty" that Chaney was the one who bit John. The bitemark evidence was the State's strongest evidence according to its own closing arguments. ...

The defense called two witnesses to testify about the mark on John's left forearm. ... Linda Norton testified that the mark was a human bitemark but that it was "virtually unsuitable for making a good dental comparison . . . ." because "almost anyone who has relatively even top and bottom teeth is going to be capable of leaving this bite mark." According to her, she would not have submitted the mark for comparison. ... Doctor John McDowell, a forensic odontologist, agreed with the State's experts that the mark on John's left forearm was a human bitemark, and he agreed that the photographs from Weiner's office were of good quality, but his comparisons were nonetheless inconclusive. ...

The rest of the defense's closing argument focused on discrediting the State's bitemark evidence. The defense argued that the bitemark should have been better preserved and that better equipment should have been used to examine the mark. It also asserted that bitemark comparisons are merely "interpretative," pointing to the conflicting testimony about whether the injury was even a bitemark. ...

The State spent almost all its second summation discussing the bitemark evidence. The prosecutor emphasized Hales's testimony that "only one in a million could have possibly made that bite mark" before asking the jury "[w]hat more do you need?" He then cited Campbell's testimony that he was sure, to a "reasonable degree of dental certainty," that Chaney bit John. The State also tried to discredit Norton, one of the defense experts, as a charlatan somewhere between "Quincy and Matt Dillon" and painted McDowell's testimony ... as helpful to the State even though he testified that his comparisons were inconclusive. The prosecutor concluded by arguing that the bitemark evidence was "better than eyewitness testimony. [Eyewitnesses] can make mistakes, as [defense counsel] said" and that,
But, most of all, we have the bite mark. I wouldn't ask you to convict just based on the testimony of the tennis shoes, of the statements Chaney made to Westphalen, or the statements he made to Curtis Hilton. But, by golly, I'm going to ask you to convict on that dental testimony.
... According to Chaney, "while much of th[e] [trial] testimony appeared to be in accord with the state of scientific knowledge in 1987 about what could and could not be concluded from a bite mark, in the intervening decades since [his] conviction, the ground on which Drs. Hales and Campbell based their assertions [about bitemark comparisons] has given way entirely." He contends that the "[s]cientific understanding about whether it is possible to 'match' a particular person to a bite mark in skin and whether random match probabilities can be given for a bite mark has now reversed course." He also argues that he is entitled to relief because Hales has changed his trial opinion that the bitemark was inflicted at the time of John's death, which was an opinion upon which the State heavily relied. Hales now believes that the wound was two to three days old when John and Sally were killed. ...

In 2013, the legislature enacted Article 11.073 of the Texas Code of Criminal Procedure, which allows a defendant to obtain post-conviction relief based on a change in science relied on by the State at trial. ...

In its agreed findings of facts and conclusions of law, the habeas court found that
no scientific evidence has been produced to support the basis of individualization of a bite mark to the exclusion of all other potential sources in an open population. [T]he reference manual published by the American Board of Forensic Odontology (ABFO) [in March 2015] ... prohibits ABFO Diplomates from testifying to individualization of bite marks in an open population, i.e., where the universe of potential suspects, or "biters," is unknown. ... Dr. Hales's use of the terms ["]match["] and ["]biter["] as it related to [Chaney] was appropriate under the ABFO guidelines and scientific field of forensic odontology at the time of trial. ... Dr. Hales's and Dr. Homer Campbell's testimony that it was their opinion, to a reasonable degree of dental certainty, that [Chaney] made the bite mark on John Sweek's arm was appropriate under the ABFO guidelines and scientific field of forensic odontology at that time. However, ... such testimony would not be justified, admissible, or accurate under today's guidelines because the scientific community and the ABFO guidelines have invalidated individualization of bite marks in an open population, as we have in this case. ... [T]he changes in science and the evolution of the field of forensic odontology as it relates to bite mark comparisons constitutes relevant scientific evidence that was not available to be offered by [Chaney] at the time of trial in 1987. As such, ... the current relevant scientific evidence related to bite marks was not available at the time of [Chaney]'s trial because the evidence was not ascertainable through the exercise of reasonable diligence by [Chaney] before the date of or during trial. ...  [H]ad the bite mark evidence been presented at trial under current scientific standards, on the preponderance of the evidence [Chaney] would not have been convicted. ... [T]he [ABFO] Manual was updated again in [March 2016]. 
The current Manual prohibits individualization testimony entirely, regardless of whether the population at issue is open or closed. Under the current Manual, the only permissible conclusions for ABFO Diplomates are: "Excluded as Having Made the Bitemark"; "Not Excluded as Having Made the Bitemark"; or "Inconclusive." ...
The record reasonably supports the findings of the habeas court, so we adopt those findings. [N]ot only has the body of scientific knowledge underlying the field of bitemark comparisons evolved in a way that discredits almost all the probabilistic bitemark evidence at trial, but also ... Hales's new opinion [is] that the bitemark was inflicted days before the murders based on his new scientific knowledge that was not available at Chaney's trial. ...

To support his "change in the body of the science" arguments, Chaney cites (1) excerpts from the 2009 National Academy of Sciences report, "Strengthening Forensic Science in the United States: A Path Forward" (NAS Report); (2) an affidavit from Drs. Mary Bush, DDS, and Peter Bush; (3) an affidavit from Hales, who testified at trial; (4) a supplemental affidavit from Hales; (5) an odontology report written by Dr. Alastair Pretty; (6) a supplemental odontology report written by Pretty; (7) an affidavit from Pretty; (8) an affidavit from Drs. Cynthia Brzozowski, James Wood, and Anthony Cardoza (Brzozowski et al.); (9) a supplemental affidavit from Brzozowski et al.; (10) an affidavit of Dr. Michael Baden, M.D.; and (11) an amicus curiae brief filed in the California Supreme Court, which was authored by 38 "scientists, statisticians, and law-and-science scholars and practitioners."

In response to Chaney's writ application, the State "acknowledges and concedes that the science behind forensic odontology, as it relates to bite mark comparison, has considerably evolved since the time of trial in 1987" and that "[u]nder today's scientific standards, Dr. Hales relayed that he 'would not, and could not' testify as he did at trial, nor could he testify that there was a 'one to a million' chance that anyone other than [Chaney] was the source of the bite mark." The State succinctly summarizes its position, when it states that "the bitemark evidence, which once appeared proof positive of . . . Chaney's guilt, no longer proves anything."

The dual principles underlying Hales's and Campbell's opinions were that a human dentition, like a fingerprint, is unique and that human skin is a medium capable of recording a person's biting surface with sufficient fidelity that a particular individual can be identified as the source of a particular bitemark. If either of those premises is invalid, then the comparisons by Hales and Campbell claiming that Chaney was a "match" have no probative value because they are based on principles now known to be unsupported by science. According to Chaney (and his experts), although those two assumptions were accepted by the scientific community at the time of Chaney's trial, that community now rejects them. He argues that experts in the field have developed a new body of science, mainly in response to the NAS Report. That report itself asserted that those principles were unproven and unreliable, concluding that:
(1) The uniqueness of the human dentition has not been scientifically established.
(2) The ability of the dentition, if unique, to transfer a unique pattern to human skin and the ability of the skin to maintain that uniqueness has not been scientifically established.
i. The ability to analyze and interpret the scope or extent of distortion of bite mark patterns on human skin has not been demonstrated.
ii. The effect of distortion on different comparison techniques is not fully understood and therefore has not been quantified.
(3) A standard for the type, quality, and number of individual characteristics required to indicate that a bite mark has reached a threshold evidentiary value has not been established.
NAS Report at 175-76. It also stated that "bite marks on the skin will change over time and can be distorted by the elasticity of the skin, the unevenness of the surface bite, and swelling and healing. These features may severely limit the validity of forensic odontology." Id. at 174.
The Bushes undertook a number of peer-reviewed studies to test the assumptions underlying Hales's and Campbell's testimony. The first group of studies tried to replicate the Rawson Study's conclusion—the literature relied on by Hales and Campbell at trial—that each human dentition is unique. The Rawson Study "examined tooth positions within dentitions and concluded that the very large number of possible positions meant that the human dentition is unique 'beyond any reasonable doubt.'" However, that study was based on two unproven assumptions, according to the Bushes. The first was that there was "no correlation of tooth position (i.e., that the position of one tooth did not affect the position of any other)," and second was that "there was a uniform or equal distribution over all possible tooth positions (i.e., that tooth locations did not gather into common patterns)." Using Rawson's methods, the Bushes plotted "landmark points on two sets of dentitions, resulting in x, y, and angle coordinates for each tooth." They then looked for matches one, two, three, four, five, and six teeth at a time. They ran two thousand simulated tests to verify their results and to determine whether the Rawson Study's results would remain accurate "if its assumptions about the lack of correlation and non-uniformity of dental arrangement were ignored." Their results were contrary to those of the Rawson Study—the Bushes observed "significant correlations and non-uniform distributions of tooth positions in [their] data sets." In other words, they found that the human dentition is not unique.
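The statistical flaw that the Bushes identified can be illustrated with a toy Monte Carlo simulation. Everything below is hypothetical (invented numbers of teeth, positions, and clustering behavior, not the Bushes' actual data or methods); it shows only the general point that correlated, non-uniformly distributed tooth positions make chance "matches" far more common than a calculation assuming independent, uniform positions would predict.

```python
import random

random.seed(1)

TEETH = 6        # number of teeth compared (toy value)
POSITIONS = 8    # discrete positions per tooth (toy value)
TRIALS = 20_000

def independent_dentition():
    # Rawson-style assumption: each tooth's position is uniform
    # over all possibilities and independent of the other teeth.
    return tuple(random.randrange(POSITIONS) for _ in range(TEETH))

def clustered_dentition():
    # Toy alternative: dentitions cluster around a few common archetypes,
    # so tooth positions are correlated and non-uniformly distributed.
    archetype = random.choice([0, 1, 2])  # a few common dental patterns
    return tuple((archetype + (1 if random.random() < 0.2 else 0)) % POSITIONS
                 for _ in range(TEETH))

def match_rate(maker):
    # Estimate how often two randomly drawn dentitions coincide exactly.
    hits = 0
    for _ in range(TRIALS):
        if maker() == maker():
            hits += 1
    return hits / TRIALS

indep = match_rate(independent_dentition)
clust = match_rate(clustered_dentition)
print(f"independent/uniform model: {indep:.4f}")   # near (1/8)**6, essentially zero
print(f"clustered/correlated model: {clust:.4f}")  # orders of magnitude larger
```

Under independence the chance of a full match is (1/8)^6, about four in a million, which is the kind of arithmetic behind "unique beyond any reasonable doubt." Once positions correlate and cluster, the same comparison produces matches at a rate thousands of times higher, which is the qualitative result the Bushes reported.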

In a second series of peer-reviewed studies, the Bushes devised another way to test the unique-dentition theory. They studied "dental shape in large populations using geometric morphometric analysis and mathematical modeling methods common in other scientific disciplines." They found that dental shape matches occurred in the populations that they studied, which was consistent with the results of their earlier studies and indicated that the human dentition is not unique.

The Bushes also tried to replicate the Rawson Study's conclusion that human skin can record the characteristics of a bitemark with sufficient resolution to trace the source of the bitemark to the "biter." The Bushes "began with a series of studies that used the same dentition impressed into cadavers to explore how skin might distort any marks." For example, they "examined how anisotropy might create distortion by examining bitemarks made both parallel and perpendicular to skin's tension lines (also known as Langer lines)." They also looked at the effect of tissue movement and found that "the same dentition did not produce identical marks across these conditions." Id. They found that some marks made by the same dentition were "dramatically distorted from others," and that "bitemarks created by the same dentition on the same individual appeared substantially different depending on the angle and movement of the body and whether the mark was made parallel or perpendicular to tension or Langer lines." A number of experts (and the NAS Report) agree that the human dentition is not unique and that, even if it was, skin is an inadequate medium to "match" a bitemark to a "biter."

In addition to those studies, Chaney also directs us to Hales's affidavits about his own testimony and the evolving standards of the ABFO. In his first affidavit, Hales explains that his testimony about "biters" and "matches" was acceptable at the time of trial under ABFO guidelines; however, the scientific body of knowledge about bitemark comparisons has changed since trial and that, under current guidelines, he would not, and could not, give the same opinions that he did at Chaney's 1987 trial. The habeas court reached the same conclusion, noting that the 2016 ABFO Manual has completely invalidated any population statistics, regardless of whether the population is open or closed, and that the Manual no longer allows examiners to give opinions to a "reasonable degree of dental certainty."

[W]e agree with Chaney. The body of scientific knowledge underlying the field of bitemark comparisons has evolved since his trial in a way that contradicts the scientific evidence relied on by the State at trial. New peer-reviewed studies discredit nearly all the testimony given by Hales and Campbell about the mark on John's left forearm and Chaney being a "match." The revised ABFO standards and affidavits attached to Chaney's writ application support that conclusion. ...

In his next complaint, Chaney argues that Hales's testimony that there was only a "one to a million" chance that someone other than Chaney was the source of the injury was false according to the literature at the time. He also argues that Weiner's and Hales's testimony that the bitemark was inflicted at the time of the murders was false and misleading and that the described testimony was material to his conviction. The habeas court agreed with Chaney, and we adopt the findings of fact and conclusions of law of the habeas court because they are supported by the record. ...

Even though the scientific principles at the time of Chaney's trial supported some level of individualization (although those principles are no longer credible), Hales confesses that he knew at the time of trial that the body of science did not support his "one to a million" testimony. Other record evidence supports Hales's assertions, including newly discovered notes from Hales's trial file, where he initially wrote that the odds were "thousands to one," with a nearby notation of "100,000 to 1," the Bushes's peer-reviewed studies disproving the tenets and conclusions of the Rawson Study dealing with population statistics, and the 2016 ABFO Manual forbidding the use of all population statistics. ...

Acknowledgments: Thanks to Ed Imwinkelried for calling the opinions to my attention.

Friday, December 14, 2018

Reprising the Idea of a Population-wide DNA Identification Database

Talk of creating a US population-wide DNA database to identify the sources of DNA found at crime scenes began in the last century. In 1997, then-Attorney General Janet Reno appointed a National Commission on the Future of DNA Evidence. 1/ At the first meeting of the Commission's legal issues working group, Commissioner Philip Reilly urged the group to study the issue. 2/ The working group submitted a report that included a discussion of "more inclusive databases" than those limited to convicted offenders. This report noted that under the law as it stood, medical research and other databases and tissue repositories could be subject to law enforcement inspection. Although it recognized that "this country would hesitate before demanding its citizens to surrender their DNA to a massive, centralized databank," the report concluded that "there is a strong case" for a national, population-wide database with rigorous privacy protections. 3/ The section of the report on "comprehensive databanking" provided the following argument (footnotes are omitted) in favor of a population-wide database:
     ...  First, the deterrent effect of DNA databanking is greatest for a population-wide database. Convicted-offender databases can deter only those offenders who have been caught and convicted for previous crimes. By increasing the probability of detection of first-time and repeat offenders alike, a comprehensive database can do much more to reduce the rate of certain crimes. And, making apprehension more certain permits the same level of deterrence with less Draconian (and costly) periods of imprisonment.
     Second, a comprehensive database avoids many problems or issues associated with offender or arrestee databanking. It obviates the need to draw some line between those offenses for which databanking is permitted and those for which it is not. It avoids any risk that police will make pretextual arrests merely to secure DNA samples. It makes it unnecessary to infer physical traits or racial or ethnic identity from trace evidence samples. Perhaps most important, it avoids stigmatizing any person or group. A comprehensive database imposes the same obligation on all racial and ethnic groups. There is a widespread perception that minorities are overrepresented in the criminal justice system in part because they are wrongfully arrested and convicted to a greater degree than whites. A universal database would help prevent wrongful convictions and arrests of minorities. When an eyewitness mistakenly concludes that the criminal was a minority member, a wrongful arrest (and conviction) can ensue. A comprehensive database would increase the probability that a minority citizen mistakenly arrested for a crime would be promptly exonerated. A readily accessible population-wide database thus would aid in preventing such arrests and subsequent miscarriages of justice.
     Third, a single national database would be more efficient than a system of over 50 separate databases of offenders or suspected offenders. From this perspective, the current system of multiple, overlapping databases represents unnecessary duplication and a waste of scarce resources. For all these reasons, a single, secure, national DNA identification database is attractive.
The report continued with a discussion of feasibility, constitutionality, and impact on personal privacy. The working group's chairman and reporter went on to publish an expanded analysis in a law review article 5/ and a book chapter. 6/ A condensed version appeared in an ABA journal 7/ and in an op-ed in USA Today. 8/ Again, the authors 9/ argued that such a database had several attractive features and speculated on the economy and constitutionality of adding identifying STR profiles to the panel of disease markers used in neonatal screening and sending only the STR data to a national law enforcement database (with no samples ending up in the hands of law enforcement).

In the early 2000s, other lawyers, scientists and politicians -- in Australia 10/ and England 11/ as well as America 12/ -- unequivocally advocated population-wide databases. This commentary appeared in leading newspapers and journals. A major theme (besides the obvious desire to maximize the crime-solving potential of DNA evidence) was that the existing, decentralized regime lacks adequate privacy protections and discriminates against those individuals with whom police have the most contact.

The issue of expanding DNA databases to include arrestee profiles, which was a more immediate topic at National Commission meetings, re-entered the national spotlight after the Supreme Court granted a writ of certiorari in Maryland v. King, 569 U.S. 435 (2013), to review the constitutionality of DNA sampling and profiling before conviction. Arrestee sampling prompted additional mention of a population-wide DNA database. 13/ In the same period, prominent successes with "familial searching" of convicted offender databases 14/ also led to comparisons to a more universal database. 15/

Most recently, the success of kinship searches in an open-to-the-public genealogy database has inspired yet another reprise of the idea. Writing in Science last month, four scholars at Vanderbilt University revived the argument that "if correctly implemented, a universal database would likely be more productive and less discriminatory than our current system, without compromising as much privacy." 16/ Given that Science affords very little space to its Policy Forum articles, it is not surprising that the legal analysis and references to previous writing in Is It Time for a Universal Genetic Forensic Database? are minimal, but some of the claims about the law cry out for more extended analysis.

To begin with, Is It Time? contends that "a subpoena is all that law enforcement needs to force those [direct-to-consumer] companies [such as 23andMe] to determine whether they have a match with crime scene data." But a subpoena duces tecum normally applies only to existing documents. The recreational genetics companies do not have a database of the STR profiles that law enforcement laboratories now produce. The power to subpoena information may not include the authority to force the companies to produce a new database of STR profiles for the benefit of law enforcement. In other words, a subpoena demanding all names of likely relatives of "John Doe, with the following STRs ..." could be met with the response that "we do not have any STR profiles in our records."

Presumably, Is It Time? contemplates crime laboratories' generating their own genome-wide SNP-array data on crime-scene samples. Then a subpoena could ask for the names of all customers with large haploblocks in common with a crime-scene sample. After the Supreme Court's decision on extended cellphone tracking in Carpenter v. United States, however, it is fair to ask whether a subpoena, as opposed to a search warrant based on probable cause, must be honored. After all, if the subpoena leads to an individual whose genome-wide-array data are in the commercial database, the police will have acquired information on that individual that is far more threatening to personal privacy than is the more strictly identifying information that comes from an STR-profile match.

Second, Is It Time? assumes that a subpoena is all it takes to acquire medical records from a health care provider, so that a population-wide database would enhance overall privacy by reducing the incentive police have for accessing medical records through subpoenas. Before Carpenter, this was a plausible assumption. 17/ After Carpenter, it is easier to argue that these records are the kind of information that cannot be compelled without a judicial warrant based on probable cause.

Third, Is It Time? assumes the constitutionality of a population-wide database with profiles retained for a great many years (perhaps starting from birth) for no other reason than the value of the database for criminal or missing-person investigations. Whether this premise is true is far from obvious. 18/ There is an argument for the constitutionality of such a system, 19/ but it has not been tested in court. It is not to be found in Maryland v. King, 569 U.S. 435 (2013), or other DNA database cases. 20/

Is It Time? goes beyond the previous proposals in some ways. For one, the DNA information that it proposes for the "universal database" is more extensive than the profiles of existing law-enforcement databases. The article suggests that
Profiles would consist of a few dozen short-tandem repeats, with perhaps a modest expansion of the 20 CODIS loci currently used to improve the identification of degraded samples or the addition of a limited subset of “forensic” single-nucleotide polymorphisms to enhance the identification of more distant relatives in the rare instances in which familial searches were still needed. 21/
The "limited subset" of SNPs for detecting distant relatives is somewhat mysterious. Haploblock matching for inferring distant relationships uses hundreds of thousands of SNPs. Maybe there is a way to get to distant relatives with a combination of a few dozen STRs and a small number of SNPs, but the article cited in Is It Time? does not directly address this possibility. Rather, it describes an investigation of the feasibility of using an STR profile to find a close relative in a direct-to-consumer database of genome-wide SNP-array data (by exploiting linkage disequilibrium between the STRs and nearby SNPs). 22/
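The linkage-disequilibrium idea behind the cited study can be sketched with a toy calculation. All frequencies below are invented for illustration; they come from no real locus and are not from the study. The point is only that when an STR allele and a nearby SNP tend to travel together on the same haplotypes, observing the SNP sharpens the probability distribution over the STR (and vice versa).

```python
# Invented haplotype frequencies for one STR locus and one flanking SNP.
# Key: (STR repeat count, SNP allele) -> population haplotype frequency.
joint = {
    (12, "A"): 0.30, (12, "G"): 0.05,  # STR=12 usually rides with SNP=A
    (13, "A"): 0.10, (13, "G"): 0.55,  # STR=13 usually rides with SNP=G
}

def p_str_given_snp(str_allele, snp_allele):
    """Bayes' rule: P(STR | SNP) = P(STR, SNP) / P(SNP)."""
    p_snp = sum(f for (_, a), f in joint.items() if a == snp_allele)
    return joint[(str_allele, snp_allele)] / p_snp

# Before seeing the SNP, STR=12 has marginal probability 0.35; after
# observing SNP allele "A", that probability rises to 0.75. The SNP is
# informative about the STR precisely because of linkage disequilibrium.
p12_marginal = sum(f for (s, _), f in joint.items() if s == 12)
p12_given_A = p_str_given_snp(12, "A")
print(f"P(STR=12)         = {p12_marginal:.2f}")
print(f"P(STR=12 | SNP=A) = {p12_given_A:.2f}")
```

In this toy example one SNP in strong LD shifts the probability of the STR allele from 0.35 to 0.75. Whether a "limited subset" of such SNPs added to a few dozen STRs could reach distant relatives, rather than the close relatives examined in the cited study, is exactly the open question noted above.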

Is It Time? concludes with this thought:
At the very least, putting the idea of a universal forensic database on the table would spur a long overdue debate about the deficiencies of the current system and, more broadly, our societal commitment to privacy, fairness, and equal protection under the law.
Surely this "long overdue debate" has been going on for decades. The issues of privacy, fairness, and equality that arise from law enforcement DNA databanks and databases have always been "on the table," as shown by law review articles, books, radio and television programs, newspaper articles, blogs, government reports, and conferences. This literature, like the latest incarnation in Science, repeatedly has alluded to a population-wide database as a counterpoint to the compromises of the present system. 23/ From this perspective, elaborating on the details of a "universal database" could be helpful even though the idea has no political legs.

  1. NIJ, National Commission on the Future of DNA Evidence (modified Apr. 3, 2013). The Commission first met in March 1998, and it is possible that not all the commissioners were in place in 1997.
  2. David H. Kaye, Legal Issues Working Group Meeting Summary, Sept. 14, 1998.
  3. David H. Kaye & Edward J. Imwinkelried, Forensic DNA Typing, Selected Legal Issues: A Report to the Working Group on Legal Issues, National Commission on the Future of DNA Evidence 73 (Nov. 30, 2001). The Working Group submitted the report to the Commission without explicitly endorsing it. See David H. Kaye, The Double Helix and the Law of Evidence 186 (2010).
  4. Id. at 71.
  5. D.H. Kaye & Michael E. Smith, DNA Identification Databases: Legality, Legitimacy, and the Case for Population-Wide Coverage, 2003 Wis. L. Rev. 414.
  6. D.H. Kaye & Michael E. Smith, DNA Databases: The Coverage Question and the Case for a Population-wide Database, in DNA and the Criminal Justice System: The Technology of Justice 247 (D. Lazer ed., Cambridge, Mass.: MIT Press 2004).
  7. D.H. Kaye et al., Is a DNA Identification Database in Your Future?, Crim. Just., Fall 2001, pp. 4-11.
  8. Michael E. Smith et al., DNA Data Would Combat Crime, Racism, USA Today, July 26, 2001, at 15A.
  9. Another working group member -- Professor Edward Imwinkelried -- joined in writing the previous two publications.
  10. Ben Ruse, MP Wants DNA Birth Records, West Australian, May 4, 2001, at 6, available at 2001 WL 20291651; Robert Williamson & Rony Duncan, Commentary, DNA Testing for All: There Are Two Fair Possibilities for Forensic DNA Testing: Everyone or No One, 418 Nature 585 (2002).
  11. Robin McKie, Inventor Warns over Abuse of DNA Data: Privacy in Peril from Genetic Fingerprint Technology, Guardian, Aug. 7, 2004. See also Judge Calls for UK DNA Database, BBC News, Nov. 24, 2004.
  12. Akhil Reed Amar, Foreword: The Document and the Doctrine, 114 Harv. L. Rev. 26, 125-26 (2000); Akhil Reed Amar, A Safe Intrusion, Am. Law., June 11, 2001 ("We could 'fingerprint' everyone's DNA and still protect privacy if doctrinal obstructionists would get out of the way"); Akhil Reed Amar, A Search for Justice in Our Genes, N.Y. Times, May 7, 2002; Alan Dershowitz, Identification Please, Boston Globe, Aug. 11, 2002, at 14.
  13. E.g., David H. Kaye, Why So Contrived? DNA Databases After Maryland v. King, 104 J. Crim. L. & Criminology 535, 580-82 (2014); Richard Lempert, Maryland v. King: An Unfortunate Supreme Court Decision on the Collection of DNA Samples, Brookings Institution Up Front, June 6, 2013; Eric Posner, The Mother of DNA Databases, Slate, Mar. 5, 2013.
  14. E.g., Greg Miller, Familial DNA Testing Scores A Win in Serial Killer Case, 329 Science 262 (2010).
  15. Compare Erin Murphy, Relative Doubt: Familial Searches of DNA Databases, 109 Mich. L. Rev. 291, 329 n.152 (2010) (“virtually impossible that a universal database could withstand constitutional scrutiny”), with David H. Kaye, The Genealogy Detectives: A Constitutional Analysis of “Familial Searching”, 51 Am. Crim. L. Rev. 109, 128-29 (2013).
  16. J. W. Hazel, E. W. Clayton, B. A. Malin & C. Slobogin, Is it Time for a Universal Genetic Forensic Database?, 362 Science 898 (2018), DOI: 10.1126/science.aav5475
  17. See authorities cited supra notes 5-6.
  18. See supra note 15.
  19. David H. Kaye, A Fourth Amendment Theory for Arrestee DNA and Other Biometric Databases, 15 U. Pa. J. Const. L. 1095 (2013).
  20. King (unconvincingly) relied on interests limited to the pretrial period to uphold compulsory sampling on arrest but not to keep the profile in a database without a conviction. See David H. Kaye, Why So Contrived? DNA Databases After Maryland v. King, 104 J. Crim. L. & Criminology 535 (2014).
  21. Hazel et al., supra note 16, at 899.
  22. Joyce Kim et al., Statistical Detection of Relatives Typed with Disjoint Forensic and Biomedical Loci, 175 Cell 848 (2018).
  23. The references collected here are not exhaustive. See, e.g., David H. Kaye, Maryland v. King: Per Se Unreasonableness, the Golden Rule, and the Future of DNA Databases, 127 Harv. L. Rev. Forum 39 (2013) (using a "universal database" as "a thought-experiment" for evaluating less inclusive database systems).

Saturday, November 24, 2018

Breaking the Promise of Confrontation in Stuart v. Alabama

Denials of cert usually are not worth mentioning, but the one in Stuart v. Alabama is notable. In that case, Alabama courts relied on a surrogate-witness theory to admit into evidence two laboratory reports of high blood-alcohol concentration without any testimony from the technician who wrote the reports or from anyone else involved in their preparation -- even though in Bullcoming v. New Mexico the Supreme Court rejected this theory as a justification for not affording a defendant the opportunity to confront the author of the report.

In response to a petition for a writ of certiorari, the state contended that the Bullcoming violation did not matter because the numbers in the reports were only the basis for an expert's extrapolation to the concentration at the time of the accident that gave rise to the negligent-homicide and drunken-driving prosecution. Stuart described this representation of the record as "not candid," but the state insisted that the admission of the reports could be upheld by piecing together votes from Williams v. Illinois. It dismissed as a mere technicality the fact that, unlike in Williams, the reports were introduced into evidence (and the reported concentrations described as far in excess of the legal limit).

The Supreme Court denied the petition. The state's tortured argument about Williams provoked Justice Gorsuch, together with Justice Sotomayor, to file a dissenting opinion maintaining that cross-examination is needed to expose bias and error in forensic science reports and expressing strong disagreement with the plurality and Justice Thomas's opinions in Williams v. Illinois. More details follow.

At around 11:00 p.m., April 1, 2015, police found Vanessa Stuart's vehicle off the steep shoulder of the road. Inside, Stuart was talking on the telephone. Another vehicle sat at the edge of the woods with Tiffany Howell's dead body inside. A traffic-homicide investigator determined that Stuart’s vehicle, traveling at 90 to 100 miles per hour, had struck Howell’s from behind, spinning it and causing it to roll several times before striking a tree.

At the hospital, Stuart refused a blood-alcohol test and tried to leave. Police arrested her and took her to jail. After acquiring a search warrant for her blood, they took her back to the hospital to secure vials of her blood. By that time, four hours had passed. A second sample was taken half an hour later. The vials went to the Alabama Department of Forensic Sciences, where Belicia Sutton wrote reports about the alcohol levels in the samples from "the suspect."

At Stuart's trial for negligent homicide and driving under the influence, the state did not call Sutton to the witness stand. Instead, it offered the reports themselves into evidence and then had Dr. James Hudson, the laboratory's toxicology section chief, extrapolate backwards from the already high level of 0.174 recorded in the first report to conclude that Stuart’s blood-alcohol level at the time of the wreck was a whopping 0.234.

Stuart appealed her resulting convictions, arguing in part that the state deprived her of her Sixth Amendment right to confront the witnesses against her. Dr. Hudson, she pointed out, was not involved in the testing and did not even work for the state at the time of the accident. In an unpublished opinion, the Alabama Court of Criminal Appeals rejected the argument on the theory that Hudson could stand in as a surrogate for Sutton. It wrote that:
Dr. Hudson gave extensive testimony regarding the policies and procedures of the DFS’s toxicology laboratory. This included controls in the analysis and the laboratory’s standard practice of having the results of the analysis independently reviewed. Dr. Hudson testified that "as the [toxicology] section chief, I’m fundamentally the toxicology supervisor so I’m responsible for the day-to-day workflow in the laboratory, testing assignments for cases, as well as personnel management.” (R. 630.) “This testimony provided [Stuart] with ample opportunity to cross-examine [Dr. Hudson] regarding the [blood]-analysis report.” Taylor v. State [Ms. CR-15-0354, Sept. 9, 2016] __ So. 3d __, __ (Ala. Crim. App. 2016). This Court holds that Stuart’s right to confront the witnesses against her was not violated by the circuit court’s allowing Dr. Hudson to testify to the results of her blood analysis. As such, this issue does not entitle Stuart to any relief.
The Alabama Supreme Court declined to review the case, and Stuart petitioned the U.S. Supreme Court for a writ of certiorari on the ground that Sutton's reports were received as evidence of Stuart's blood-alcohol level through Hudson's testimony in stark violation of the Confrontation Clause as applied in Bullcoming v. New Mexico, 564 U.S. 647 (2011).

She had a point. Bullcoming was another DUI case in which a suspect's blood was taken at a hospital and sent to the state forensic laboratory for analysis. As in Stuart, "the State called another analyst who was familiar with the laboratory's testing procedures, but had neither participated in nor observed the test on [the] blood sample." In an opinion joined in relevant part by four members of the Court, Justice Ginsburg rejected the surrogate theory in sweeping terms:
The question presented is whether the Confrontation Clause permits the prosecution to introduce a forensic laboratory report containing a testimonial certification — made for the purpose of proving a particular fact — through the in-court testimony of a scientist who did not sign the certification or perform or observe the test reported in the certification. We hold that surrogate testimony of that order does not meet the constitutional requirement. The accused's right is to be confronted with the analyst who made the certification, unless that analyst is unavailable at trial, and the accused had an opportunity, pretrial, to cross-examine that particular scientist.
Id. at 652. Justice Sotomayor concurred, highlighting the circumstances that made the surrogate witness's testimony an unacceptable substitute: "the person testifying [was not] a supervisor, reviewer, or someone else with a personal, albeit limited, connection to the scientific test at issue" and "an expert witness was [not] asked for his independent opinion about underlying testimonial reports that were not themselves admitted into evidence."

In response to the Bullcoming argument, the state abandoned the surrogacy theory of its trial and appellate courts. It argued that Hudson's testimony about the reports was not subject to the confrontation requirement because the blood-alcohol level of 0.174 (and a slightly lower reading from the later sample) was not introduced to prove that Stuart was driving with a blood-alcohol concentration above the legal limit, but rather served as a hypothetical assumption made solely to arrive at the extrapolated figure of 0.234. So characterized, Hudson's testimony did not offend the Confrontation Clause because "[t]he Clause ... does not bar the use of testimonial statements for purposes other than establishing the truth of the matter asserted," Crawford v. Washington, 541 U.S. 36, 53 n.4 (2004), and the state was not trying to prove that Stuart's blood-alcohol concentration was 0.174 four hours after the accident. At least, that is what the state claimed.

That a majority of the Justices of the Supreme Court rejected just such an argument (in separate opinions that conflicted in another respect) in Williams v. Illinois, 567 U.S. 50 (2012), did not faze Alabama's Attorney General. His brief contended that because four Justices propounded the hypothetical-assumption argument in Williams, and because one of the Justices who rejected it also maintained that most laboratory reports lack the formality necessary to be statements that trigger a right to confront their authors, Hudson's testimony was constitutionally admitted into evidence.

Aside from its inherent artificiality, this reasoning overlooks the fact that, as in Bullcoming (but not Williams), the laboratory reports were explicitly admitted into evidence. Their admission and publication to the jury without an opportunity to cross-examine their author violated the Confrontation Clause even if Hudson's reiteration of their content was permissible under the plurality's opinion in Williams. Apparently, the jury was not instructed that it was not to rely on the numbers in the reports as true, but only to use Dr. Hudson's opinion -- that is, his extrapolation from them -- as evidence against the accused. Indeed, the state had Dr. Hudson testify that the laboratory's findings of 0.174 and 0.158 greatly exceeded the legal limit of 0.08 (prompting Stuart to describe Alabama's argument as "not candid"). In contrast, the Williams plurality noted that the trier of fact there was a learned judge who could be expected (somehow) not to rely on the laboratory report for its truth but to consider it only as an explanation of how the testifying expert reached her "independent" conclusion. Alabama dismissed these differences as mere technicalities.

The Supreme Court denied the petition in Stuart. Of course, in itself a denial of such a petition has no precedential effect and is not even an expression of views on the merits of the case. The Court grants cert for but a small fraction of the many petitions it receives, rarely giving a reason for denying the petitions.

Nevertheless, the inaction in Stuart may seem disappointing. With its four inconclusive and conflicting opinions, Williams has licensed chaos in the lower courts. But Stuart may not have been a suitable vehicle for re-examining the not-for-the-truth reasoning of the Williams plurality. Had it granted certiorari, the Court might have written a two-sentence opinion remanding the case for a determination of whether the violation of Bullcoming was harmless error. (Well, maybe more than two, just to point out that the not-for-the-truth reasoning, already rejected by a majority of the Court in Williams, cannot possibly be extended to cases in which laboratory reports are admitted into evidence without limitation.) Or, the Court could have used Stuart to overrule the 5-4 decision in Bullcoming in order to affirm. But the case was not ideally suited to cleaning up the mess left by Williams.

Even so, two Justices dissented from the denial of certiorari and issued a substantial opinion on the merits -- an unusual action. Justice Gorsuch, who was not on the Court for its trilogy of opinions on the Confrontation Clause and laboratory reports (Melendez-Diaz, Bullcoming, and Williams), wrote this dissenting opinion. Justice Sotomayor joined it. The opinion begins with a paean to cross-examination:
More and more, forensic evidence plays a decisive role in criminal trials today. But it is hardly “immune from the risk of manipulation.” Melendez-Diaz v. Massachusetts, 557 U.S. 305, 318 (2009). A forensic analyst “may feel pressure—or have an incentive—to alter the evidence in a manner favorable to the prosecution.” Ibid. Even the most well-meaning analyst may lack essential training, contaminate a sample, or err during the testing process. ... To guard against such mischief and mistake and the risk of false convictions they invite, our criminal justice system depends on adversarial testing and cross-examination. Because cross-examination may be “the greatest legal engine ever invented for the discovery of truth,” ... the Constitution promises every person accused of a crime the right to confront his accusers. ... [¶] That promise was broken here.
Whether cross-examination is generally effective at exposing inadequate training, contamination, or error is open to question, but it certainly can complement the scientific engine for discovering truths about alcohol levels, trace evidence, and the like.

With this introduction in place, Justice Gorsuch observed that "the State of Alabama introduced in evidence the results of a blood-alcohol test conducted hours after [Stuart's] arrest [but] refused to bring to the stand the analyst who performed the test." But the opinion does not note that the state was seeking to extend the plurality's rule in Williams to a laboratory report actually admitted into evidence and presented to the jury as proof of what it asserts. Rather, Justice Gorsuch simply endorsed the position taken in Williams by the four dissenting Justices and Justice Thomas. They maintained that the not-for-the-truth theory is untenable because the testifying expert's opinion cannot be credited unless the missing witness's report is true. As Justice Gorsuch put it,
The whole point of the exercise was to establish—because of the report’s truth—a basis for the jury to credit the testifying expert’s estimation of Ms. Stuart’s blood-alcohol level hours earlier. As the four dissenting Justices in Williams explained, “when a witness . . . repeats an out-of-court statement as the basis for a conclusion, . . . the statement’s utility is then dependent on its truth.” 567 U. S., at 126 (opinion of KAGAN, J.). With this JUSTICE THOMAS fully agreed, observing that “[t]here is no meaningful distinction between disclosing an out-of-court statement so that the factfinder may evaluate the [testifying] expert’s opinion and disclosing that statement for its truth.”  Id., at 106 (opinion concurring in judgment).
Although this is the better understanding of the situation even when, as in Williams, the report is not introduced into evidence, in Stuart, the Williams plurality could adhere to its more contrived analysis while agreeing with Justice Gorsuch that no "prosecutor [would] bother to offer in evidence the nontestifying analyst’s report in this case except to prove the truth of its assertions about the level of alcohol in Ms. Stuart’s blood at the time of the test" (emphasis added).

The opinion concludes with a short analysis of Alabama's additional claim that the laboratory report was not "testimonial" because it lacked the formality of depositions, affidavits, certificates, or similar instruments. Here Justice Gorsuch joins the ranks of nearly every other Justice. Only Justice Thomas contends that police laboratory reports prepared for criminal investigations and possible prosecutions are not sufficiently formal to be testimonial unless they are sworn certificates.

The Stuart dissent is a clear and well warranted plea for a clarification of the Williams decision. Significantly, it places Justice Gorsuch on the side of those who oppose insulating the authors of a laboratory report from cross-examination simply by presenting those reports as the basis for some other expert's opinion. Laboratory reports raise special issues for Confrontation Clause jurisprudence, but they should be faced more directly. See Jennifer L. Mnookin & David H. Kaye, Confronting Science: Expert Evidence and the Confrontation Clause, 2012 Sup. Ct. Rev. 99 (2013).

Friday, November 23, 2018

Cheapskate's DNA Could Be His Undoing

Wisconsin was the first state to issue criminal complaints "naming" the suspect through a DNA profile so as to avoid the statute of limitations. The state court of appeals upheld the practice in State v. Dabney, 663 N.W.2d 366 (Wisc. Ct. App. 2003). Today, there are some 23 such DNA complaints pending in Wisconsin. Most are for burglaries. Some are for unsolved sexual assaults. One is for an armed robbery.

The most recent complaint addressed to an unknown defendant, however, is for threatening a county judge in 2012. It is captioned
State of Wisconsin, Plaintiff
Doe, John, Unknown Male, with Matching Deoxyribonucleic Acid (DNA) Profile at Genetic Locations D3S1358 (15, 18), TH01(6, 9.3), D21S11 (29, 31.2), D18S51 (13, 15), Penta E (12), D5S818 (11, 13), D13S17 (11, 14), D7S820 (10, 11), D16S539 (13, 14), CSF1PO (11, 12), Penta D (9, 12), Amelogenin (X, Y), vWA (17), D8S1179 (12, 13), TPOX (9, 11), and FGA (22, 22.2), Defendant
The list is not just the genetic locations. (That would be useless, since everyone has these genetic locations.) The identification of the individual comes from the DNA features -- the "alleles" -- at these "loci." The identifying alleles are designated by the numbers in parentheses.
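To make the structure of the caption concrete, the profile can be represented as a simple mapping from each genetic location (locus) to the identifying alleles found there. This is a hypothetical sketch for illustration only -- the dictionary and the `profiles_match` helper are not any laboratory's actual software -- but it shows why the allele numbers, not the loci themselves, do the identifying. A locus listed with a single number (e.g., Penta E, vWA) indicates the same allele on both chromosomes.

```python
# Hypothetical sketch: the DNA profile from the caption as a mapping from
# locus to the identifying alleles (the numbers in parentheses).
profile = {
    "D3S1358": (15, 18), "TH01": (6, 9.3), "D21S11": (29, 31.2),
    "D18S51": (13, 15), "Penta E": (12,), "D5S818": (11, 13),
    "D13S17": (11, 14), "D7S820": (10, 11), "D16S539": (13, 14),
    "CSF1PO": (11, 12), "Penta D": (9, 12), "Amelogenin": ("X", "Y"),
    "vWA": (17,), "D8S1179": (12, 13), "TPOX": (9, 11), "FGA": (22, 22.2),
}

def profiles_match(a, b):
    """Two profiles match only if the alleles agree at every shared locus."""
    shared = set(a) & set(b)
    return all(a[locus] == b[locus] for locus in shared)

# Everyone has these 16 loci; only the alleles distinguish individuals.
print(len(profile))                       # 16
print(profiles_match(profile, profile))   # True: a profile matches itself
```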

The DNA that produced this profile came from a nine-cent stamp affixed to the envelope containing the threatening letter. Presumably, the individual making the threat licked the stamp. Indeed, the same profile was found for DNA recovered from threatening letters mailed to three other public officials in Wisconsin. Whether the sender was able to get away with using nine-cent stamps in these other incidents has not been reported. If he is ever caught, postal fraud will be the least of his problems.

  1. Ed Treleven, With Clock Ticking, DOJ Charges Unidentified Suspect for Threatening Judge in 2012, Wisc. State J., Oct. 9, 2018.
  2. Meagan Flynn, The Culprit’s Name Remains Unknown. But He Licked a Stamp, and Now His DNA Stands Indicted, Wash. Post, Oct. 17, 2018.

Muddling Through the Measurement of IQ

IQ scores are a critical component in the diagnosis of intellectual disability. That measurements of IQ are subject to various sources of measurement error is widely appreciated, but by and large, lawyers and psychologists have supplied rather imprecise -- and sometimes incorrect -- explanations of the statistics involved. A recent example is Intellectual Disability and the Death Penalty: Current Issues and Controversies, a book intended as "a valuable resource for mental health experts, attorneys, investigators, mitigation specialists, and other members of legal teams, as well as judges." 1/ The authors explain the "standard scores" that put the mean IQ score in the population at 100 as follows:
A person's standard score on a test is calculated by transforming the individual's obtained raw score on Test A (e.g., the sum of the number of correct responses on a test) using the population's known mean and standard deviation on Test A, which transforms the individual's test performance onto a common metric allowing us to compare his or her score to anyone else tested with Test A. Standard scores are possible only for tests where, if administered to the entire population, the distribution of all test scores on said test would be normally distributed ... . A percentile score is one form of a standard score that permits the interpretation of a person's performance in relation to a reference group. Although not a requirement, in the case of many psychological tests the scale for standard scores is set to have a mean or average score of 100 and a standard deviation of 15. Thus, a test performance that results in a standard score of 70 is said to be "significantly" below average or approximately two standard deviations below the population mean. A standard deviation is a unit of measure that indicates the distance from the average. During the standardization phase of the development of a standardized test, the test and its items are administered to a large and representative sample of the reference group of interest or population. This is generally referred to as the standardization sample or norming group. From this norming group, the test developers compute the population's mean score and standard deviation on the test. The mean score and standard deviation are essential to transforming subsequently obtained raw scores (i.e., the sum of the number of correct items) on said test to a standard scale score (e.g., intelligence quotient, or IQ). 2/
Percentiles and Standard Scores

Standard scores have some value in "compar[ing one individual's] score to anyone else tested with Test A." Unlike raw scores, they incorporate the variance in the scores across different test-takers into the reported score. They are perhaps more useful for comparing scores from different tests (or different forms of the same test, or from tests administered to populations that are changing over time).

But whatever the motivation for a standardized reporting scale, it is strange to describe percentiles as standard scores. A standard score is just a particular linear transformation of a raw score that specifies "the number of standard deviations above (+) or below (-) the mean you are." 3/ As an example, suppose that the raw-score population mean for "Test A" is 60; that the population standard deviation is 12; and that a test taker has a raw score of 50. The standard score is five-sixths of a standard deviation below the mean: z = (50 - 60)/12 = -5/6 ≈ -0.83.
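The transformation in this example is a one-line calculation. The following sketch reproduces the arithmetic from the text (mean 60, standard deviation 12, raw score 50) and also shows how the same z-score would be re-expressed on an IQ-style scale with mean 100 and standard deviation 15:

```python
def standard_score(raw, mean, sd):
    """Linear transformation of a raw score into standard-deviation units."""
    return (raw - mean) / sd

# The example from the text: population mean 60, SD 12, raw score 50.
z = standard_score(50, mean=60, sd=12)
print(round(z, 2))  # -0.83, i.e., five-sixths of an SD below the mean

# The same performance re-expressed on an IQ-style scale (mean 100, SD 15):
iq_style = 100 + 15 * z
print(round(iq_style, 1))  # 87.5
```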

To translate the raw score (or the corresponding z-score of -0.83) into a percentile, we need to know how the raw scores are distributed. For example, if raw scores were uniformly distributed from about 39 to 81, then some 26% of them would be 50 or less. If the raw scores were normally distributed (with the same mean and standard deviation), then about 20% of the population would have a raw score of 50 or less. Other distributions would produce other percentiles. Consequently, the percentile is not "one form of a standard score." At best, the percentile can be deduced from the standard score and other information.
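The point that the same score yields different percentiles under different distributions can be checked directly. This sketch uses only the standard library (the normal CDF via the error function) and the distributions from the paragraph above:

```python
import math

def normal_cdf(x, mean, sd):
    """Normal CDF computed from the error function (no external libraries)."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

def uniform_cdf(x, lo, hi):
    """Uniform CDF on the interval [lo, hi]."""
    return (x - lo) / (hi - lo)

# Same raw score (50), same mean (60) and roughly the same SD (12), but
# different assumed distributions give different percentiles:
print(round(uniform_cdf(50, 39, 81) * 100))  # ~26th percentile (uniform model)
print(round(normal_cdf(50, 60, 12) * 100))   # ~20th percentile (normal model)
```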

A Standardized Scale Does Not Require Normality

Why are "[s]tandard scores ... possible only for tests where, if administered to the entire population, the distribution of all test scores on said test would be normally distributed"? Standard scores can be constructed for any distribution of test scores with a defined mean and standard deviation. Normality may be convenient or common, but it is not essential to a standardized score scale.
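A quick simulation makes the point: standardizing scores drawn from a markedly non-normal distribution still produces z-scores with mean 0 and standard deviation 1. (The exponential distribution and its parameter below are arbitrary choices for illustration.)

```python
import math
import random

random.seed(0)
# Scores drawn from a clearly non-normal (exponential, right-skewed) distribution.
scores = [random.expovariate(1 / 20) for _ in range(100_000)]

mean = sum(scores) / len(scores)
sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))

# Standardizing is still perfectly well defined: the z-scores have mean ~0
# and SD ~1, even though their distribution remains skewed, not normal.
z = [(s - mean) / sd for s in scores]
z_mean = sum(z) / len(z)
z_sd = math.sqrt(sum((v - z_mean) ** 2 for v in z) / len(z))
print(round(abs(z_mean), 6), round(z_sd, 6))  # ~0.0 and ~1.0
```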

So What?

Not much turns on these corrections to the explanation in Intellectual Disability and the Death Penalty. IQ scores are more or less normally distributed, and the use of IQ scores of 70 and below (z ≤ -2) as the range in which an individual can be diagnosed as intellectually disabled limits the diagnosis to no more than roughly 2.3% of the general population.
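The 2.3% figure follows from the normal model: it is the area under the normal curve at or below two standard deviations under the mean. A minimal check using the standard library:

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Fraction of a normal population at or below two SDs under the mean --
# i.e., IQ <= 70 on a scale with mean 100 and SD 15:
frac = normal_cdf((70 - 100) / 15)
print(f"{frac:.1%}")  # about 2.3%
```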

But why should "a standard score of 70 [be] said to be 'significantly' below average"? Why is not an IQ score of 71 -- or even 80 -- significantly below the mean of 100? There is no statistical reason to focus on 70 as a cutoff. In Hall v. Florida, 572 U.S. 5 (2014), a majority of the Supreme Court was content with categorically excluding from the zone of intellectual disability (for the purpose of deciding potential eligibility for capital punishment) all defendants with true IQ scores above 70. Yet, no one could explain the basis for this fundamental choice. It is a convention currently in vogue among experts who want to have some such threshold. 4/

Quantifying Measurement Error

At the same time that the Court limited eligibility for the constitutional exemption from capital punishment because of intellectual disability to a small fraction of the population by approving of the z ≤ -2 range for true scores, it held that a slightly higher cutoff for observed scores was constitutionally necessary to ensure that random error in measuring IQ does not preclude too many defendants with true scores of 70 or less from consideration. Intellectual Disability and the Death Penalty explained this refinement as follows:
The Supreme Court of the United States in Hall v. Florida ruled that states must consider the test's standard error of measurement when interpreting obtained IQ scores in cases where the defendant is making an intellectual disability claim. ...
The standard error of measurement (SEM) is a direct measure of the test's reliability and is computed by administering the test to a large and representative sample of the population to be assessed on the test and computing the test's reliability coefficient, which can then be translated into an average error of measurement for the population ... . Generally, the SEM is computed and then used to create confidence intervals around the obtained standard scores (e.g., 95% certainty). A confidence interval of 95% represents a statistical certainty that, based on the knowledge of this test's reliability coefficient, there is a 95% chance that the person's true score falls within a confidence interval that is +/-2 times the test's SEM. Thus, a professional reporting on an assessed individual's "obtained" full-scale IQ score of 70 on IQ Test A and knowing that Test A has a SEM of 2.5 around its full-scale IQ score, he would report that there is a 95% certainty that the assessed person's "true" full-scale IQ score falls within the range of 65-75 (i.e., 2x2.5= +/-5 points). 5/
This passage is garbled in two ways. To begin with, SEM is not "a direct measure of the test's reliability." It is a statistic derived from "the test's reliability coefficient." There are many ways to estimate reliability, and the logic behind the move from reliability to SEM is subtle. A better statistic for estimating the uncertainty in the observed score would be the standard error of estimate (SEE). The SEM is an average across all scores. The SEE takes into account the fact that uncertainty increases as one moves away from the population mean (IQ = 100). A description of the SEE can be found elsewhere. 6/
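Under classical test theory, the SEM is derived from the reliability coefficient r and the score scale's standard deviation as SD × √(1 − r). The sketch below uses an illustrative reliability of 0.97 (not a figure from any particular IQ test) together with the book's numbers (obtained score 70, SEM 2.5):

```python
import math

# SEM from the reliability coefficient: SEM = SD * sqrt(1 - r).
# The reliability r = 0.97 is an illustrative assumption; the scale SD of 15
# and the example (obtained score 70, SEM 2.5) come from the text.
sd, r = 15, 0.97
sem = sd * math.sqrt(1 - r)
print(round(sem, 2))  # about 2.6 points

# The +/- 2 x SEM interval from the quoted passage:
obtained, sem_book = 70, 2.5
lo, hi = obtained - 2 * sem_book, obtained + 2 * sem_book
print(lo, hi)  # 65.0 75.0
```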

Second, the 95% in a 95% confidence interval is neither a "statistical certainty" nor "a 95% chance that the person's true score falls within [the computed] confidence interval." This interpretation of "confidence" is ubiquitous -- and widely known (to statisticians) to be wrong. The misinterpretation was apparent in the dissenting opinion written for four Justices by Justice Alito. It probably was implicit in the majority opinion penned by Justice Kennedy. Although we would expect (in the long run) 95% of all 95% confidence intervals to contain the true value, the probability that a particular interval covers the true score cannot be computed with the machinery of confidence intervals. 7/ Interval estimates that can be said to provide such probabilities require Bayes' theorem. Again, discussion and examples for IQ scores are available elsewhere. 8/
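The long-run sense in which the 95% figure is correct can be demonstrated by simulation. The sketch below (with an arbitrary true score of 68 and the SEM of 2.5 from the book's example) repeatedly simulates observed scores as the true score plus normal measurement error and counts how often the +/- 2 × SEM interval covers the true score. Roughly 95% of the intervals do; that frequency is the frequentist guarantee, and it is a statement about the procedure, not about the probability that any one computed interval contains the true score.

```python
import random

random.seed(1)
TRUE_SCORE, SEM = 68, 2.5  # illustrative true score; SEM from the example
N = 100_000

# Simulate repeated testing of one person: each observed score is the true
# score plus normally distributed measurement error with SD equal to the SEM.
covered = 0
for _ in range(N):
    observed = random.gauss(TRUE_SCORE, SEM)
    lo, hi = observed - 2 * SEM, observed + 2 * SEM
    if lo <= TRUE_SCORE <= hi:
        covered += 1

# In the long run, ~95% of such intervals cover the true score (the exact
# 2-SD coverage is about 95.45%).
print(round(covered / N, 3))
```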

Clinical psychologists, lawyers, and judges are not statisticians. They do not have to compute means, standard deviations, standard errors, confidence intervals, or Bayesian credible regions. Nevertheless, to become more astute users of such statistics, they need a better understanding of the reasoning behind standard scores and expressions for measurement error.

  1. Marc L. Tassé & John H. Blume, Intellectual Disability and the Death Penalty: Current Issues and Controversies vii (Prager 2018).
  2. Id. at 87.
  3. Penn State University Eberly College of Science, STAT 100: Statistical Concepts and Reasoning § 5.2 (2018).
  4. David H. Kaye, Deadly Statistics: Quantifying an "Unacceptable Risk" in Capital Punishment, 16 Law, Probability & Risk 7-34 (2017).
  5. Tassé & Blume, supra note 1, at 90.
  6. Kaye, supra note 4.
  7. For an elaboration in legal settings, see David H. Kaye, Apples and Oranges: Confidence Coefficients and the Burden of Persuasion, 73 Cornell L. Rev. 54 (1987).
  8. Kaye, supra note 4.