Thursday, August 15, 2019

Post PCAST: Washington D.C. High Court Won't Tolerate No-doubt Testimony Matching a Bullet to a Single Gun

In an opinion relying in part on the PCAST Report on feature-comparison evidence, the District of Columbia's highest court discussed limits on the testimony a firearms-toolmark examiner can give. But it did not get very far. An earlier opinion in Williams v. United States, 130 A.3d 343 (D.C. 2016), determined that the admission of some extreme testimony (described below) was not plain error. In Williams v. United States, 210 A.3d 734 (D.C. 2019) (Williams II), the Court of Appeals revisited the plain-error question in light of later rulings decided before sentencing. It concluded that even though entertaining the opinion testimony was "error" and the error was "plain," the "plain error" exception to the rule against reversing a conviction on the basis of unobjected-to testimony did not justify reversal. (I know, that is a convoluted sentence, but the law on the plain-error exception to the need for a contemporaneous objection is convoluted.)

At trial,
[T]he examiner opined that “these three bullets were fired from this firearm.” On redirect, when asked whether there was “any doubt in [his] mind” that the bullets recovered from Mr. Kang's SUV were fired from the gun found in Mr. Williams's bedroom, the examiner responded, “[n]o, sir.” The examiner elaborated that “[t]hese three bullets were identified as being fired out of Exhibit No. 58. And it doesn't matter how many firearms Hi[-]Point made. Those markings are unique to that gun and that gun only.” The examiner then restated his unequivocal opinion: “Item Number 58 fired these three bullets.”
(Citations omitted). On the petition for rehearing that generated the Williams II opinion, the government relied on a footnote in one of these cases, Gardner v. United States, 140 A.3d 1172 (D.C. 2016). The note in Gardner stated that the holding was “limited in that it allows toolmark experts to offer an opinion that a bullet or shell casing was fired by a particular firearm, but it does not permit them to do so with absolute or 100% certainty.” 140 A.3d at 1184 n.19. The government argued in Williams II that "this footnote authorized opinion testimony identifying a specific bullet as having been fired by a specific gun."

Justice Catharine Easterly's opinion for the court found this interpretation of the footnote "difficult to square with the above-the-line holding that the trial court 'had erred' by admitting the examiner's 'unqualified opinion' that 'the silver gun was the murder weapon.'" Id. at 1184. The opinion added that
Moreover, the publication post Gardner of another federal government report—President's Council of Advisors on Science and Technology (“PCAST”), Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods (Sept. 2016), ... reiterates toolmark and firearms examiners do not currently have a basis to give opinion testimony that matches a specific bullet to a specific gun and that such testimony should not be admitted without a verifiable error rate does not support the government's argument that only express statements of certainty should be prohibited.
Nonetheless, the opinion did not "resolve the ambiguity of Gardner's footnote" because "in this case ... the firearms and toolmark examiner not only testified ... that a specific bullet could be matched to a specific gun, but also that he did not have 'any doubt' about his conclusion." (Footnote omitted.) In the end, after emphasizing that the no-doubt-specific-source testimony "was error," and "the error is plain," the court only held that the plain-error exception to the rule against reversing on the basis of evidence that was not the subject of a contemporaneous objection did not apply. It did not apply because, considering "the government['s] powerful circumstantial case" in other respects, "Mr. Williams ... cannot show a reasonable probability of a different result absent this error."

Reading between the lines, it appears that Justice Easterly was unable to convince the other two panel members to explicitly adopt (in dictum) the procedure PCAST recommended for source attributions -- a categorical conclusion accompanied by the upper bound of an estimated rate of Type I error as seen in so-called black-box experiments (or something similar). She wrote separately that the Gardner footnote "can only logically be understood in one way: as an acknowledgment that the government might be able to present expert opinion testimony that a specific bullet was fired by a specific gun if the examiner could reliably qualify his pattern-matching opinion—i.e., if he can provide a verifiable error rate." To which Senior Judge Frank Nebeker replied in his separate opinion: "This is not a case in which to resolve the knotty question of to what degree of certainty, or not, an expert's opinion is admissible as to a particular fact." 1/

  1. Whether any source-attribution opinion -- with or without some initial qualification as to the degree of certainty -- is necessary or desirable is a further question, even more removed from what Judge Nebeker loosely called a "harmless error judgment." (The harmless-error doctrine is a little different from the plain-error doctrine.)

Sunday, July 21, 2019

Confidence Intervals -- If Only It Were That Simple

Confidence Interval: Statistics such as means (or averages) and medians are often calculated from data from a portion—or sample—of a population rather than from data for an entire population. Statistics based on sample data are called “sample statistics,” whereas those based on an entire population are called “population parameters.” A confidence interval is the range of values of a sample statistic that is likely to contain a population parameter, and that likeliness is expressed with a specific probability. For example, if a study of a sample of 1,500 Americans finds their average weight to be 150 pounds with a 95 percent confidence interval of plus/minus 25 pounds, this means that there is a 95 percent probability that the average weight of the entire American population is between 125 and 175 pounds. --Wm. Nöel & Judy Wang, Is Cannabis a Gateway Drug? Key Findings and Literature Review: A Report Prepared by the Federal Research Division, Library of Congress, Under an Interagency Agreement with the Office of the Director, National Institute of Justice, Office of Justice Programs, U.S. Department of Justice, Nov. 2018, at 3.

[T]here is a 5 percent chance the true value [of a 95% one-sided confidence interval] exceeds the bound. --President’s Council of Advisors on Science and Technology, Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods, Sept. 2016, at 153.
[T]he confidence level does not give the probability that the unknown parameter lies within the confidence interval. ... According to the frequentist theory of statistics, probability statements cannot be made about population characteristics: Probability statements apply to the behavior of samples. That is why the different term ‘confidence’ is used. --David H. Kaye & David A. Freedman, Reference Guide on Statistics, in Reference Manual on Scientific Evidence 211, 247 (Federal Judicial Center & National Research Council Committee on the Development of the Third Edition of the Reference Manual on Scientific Evidence eds., 3d ed. 2011).

Warning! ... [T]he fact that a confidence interval is not a probability statement about [an unknown value] is confusing. --Larry Wasserman, All of Statistics: A Concise Course in Statistical Inference 93 (2004) (emphasis in original).
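The frequentist meaning that these last three quotations insist on can be demonstrated by simulation: repeat the sampling procedure many times, and roughly 95 percent of the computed intervals will cover the fixed population parameter, but any single realized interval either contains it or does not. The sketch below (with made-up numbers loosely echoing the weight example, not data from any real study) illustrates this coverage property.

```python
import random
import statistics

random.seed(42)

TRUE_MEAN = 150.0   # hypothetical population mean (pounds)
SIGMA = 30.0        # hypothetical population standard deviation
N = 100             # sample size for each simulated "study"
TRIALS = 10_000     # number of repeated studies

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / N ** 0.5
    lo, hi = m - 1.96 * se, m + 1.96 * se   # approximate 95% CI
    if lo <= TRUE_MEAN <= hi:
        covered += 1

# About 95% of the intervals cover the true mean over many repetitions;
# "confidence" describes the procedure, not any one interval.
print(f"coverage: {covered / TRIALS:.3f}")
```

The quantity that is random here is the interval, not the parameter; that is why the probability statement attaches to the long-run behavior of the procedure rather than to the single interval a study reports.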

Wednesday, July 17, 2019

No Tension Between Rule 704 and Best Principles for Interpreting Forensic-science Test Results

At a webinar on probabilistic genotyping organized by the FBI, the Department of Justice’s Senior Advisor on Forensic Science, Ted Hunt, summarized the rules of evidence that are most pertinent to scientific and expert testimony. In the course of a masterful survey, he suggested that Federal Rule of Evidence 704 somehow conflicts with the evidence-centric approach to evaluating laboratory results recommended by a subcommittee of the National Commission on Forensic Science, by the American Statistical Association, and by European forensic-science service providers. 1/ In this approach, the expert stops short of opining on whether the defendant is the source of the trace. Instead, the expert merely reports that the data are L times more probable when the source hypothesis is true than when some alternative source hypothesis is true. (Or, the expert gives some qualitative expression such as "strong support" when this likelihood ratio is large.)
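The structure of such an evidence-centric report can be sketched in a few lines of code. Both the probabilities and the verbal thresholds below are hypothetical illustrations; actual laboratories use scales and conventions of their own choosing.

```python
def likelihood_ratio(p_data_given_source: float,
                     p_data_given_alternative: float) -> float:
    """Ratio of the probability of the observed data under the source
    hypothesis to its probability under the alternative hypothesis."""
    return p_data_given_source / p_data_given_alternative

def verbal_scale(lr: float) -> str:
    # Illustrative thresholds only; real verbal scales vary by laboratory.
    if lr >= 10_000:
        return "very strong support"
    if lr >= 1_000:
        return "strong support"
    if lr >= 100:
        return "moderately strong support"
    if lr >= 10:
        return "moderate support"
    return "limited support"

# Hypothetical numbers: the data are 2,000 times more probable if the
# defendant is the source than if an unrelated person is.
lr = likelihood_ratio(0.8, 0.0004)
print(lr, "->", verbal_scale(lr))   # 2000.0 -> strong support
```

Note that nothing in this computation asserts that the defendant is the source; the report concerns only the relative probability of the data under competing hypotheses, which is what distinguishes the evidence-centric approach from conclusion-centric testimony.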

Whatever the merits of these proposals, Rule 704 does not stand in the way of implementing the recommended approach to reporting and testifying. First, the identity of the source of a trace is not necessarily an ultimate issue. To use the example of latent-print identification given in the webinar, the traditional opinion that a named individual is the source of a print is not an opinion on an ultimate issue. Courts have long allowed examiners to testify that the print lifted from a gun comes from a specific finger. But this conclusion is not an opinion on whether the murder defendant is the one who pulled the trigger. The examiner’s source attribution bears on the ultimate issue of causing the death of a human being, but the examiner who reports that the prints were defendant's is not opining that the defendant not only touched the gun (or had prints planted on it) but also pulled the trigger. Indeed, the latent print examiner would have no scientific basis for such an opinion on an element of the crime of murder.

Furthermore, even when an expert does want to express an opinion on an ultimate issue, Rule 704 does not counsel in favor of admitting it into evidence. Rule 704(a) consists of a single sentence: “An opinion is not objectionable just because it embraces an ultimate issue.” The sole function of these words is to repeal an outmoded, common-law rule categorically excluding these opinions. The advisory committee that drafted this repealing rule explained that “to allay any doubt on the subject, the so-called ‘ultimate issue’ rule is specifically abolished by the instant rule.” The committee expressed no positive preference for such opinions over evidence-centric expert testimony. It emphasized that Rules 701, 702, and 403 protect against unsuitable opinions on ultimate issues. Modern courts continue to exclude ultimate-opinion testimony when it is not sufficiently helpful to jurors. For example, conclusions of law remain highly objectionable.

Consequently, any suggestion that Rule 704 is an affirmative reason to admit one kind of testimony over another is misguided. “The effect of Rule 704 is merely to remove the proscription against opinions on ‘ultimate issues' and to shift the focus to whether the testimony is ‘otherwise admissible.’” 2/ If conclusion-centric testimony is admissible, then so is the evidence-centric evaluation that lies behind it--with or without the conclusion.

In sum, there is no tension between Rule 704(a) and the recommendation to follow the evidence-centric approach. Repealing a speed limit on a road does not imply that drivers should put the pedal to the floor.

  1. This is the impression I received. The recording of the webinar should be available at the website of the Forensic Technology Center of Excellence in a week or two.
  2. Torres v. County of Oakland, 758 F.2d 147, 150 (6th Cir.1985).
UPDATED: 18 July 2019 6:22 AM

Saturday, July 6, 2019

Distorting Daubert and Parting Ways with PCAST in Romero-Lobato

United States v. Romero-Lobato 1/ is another opinion applying the criteria for admissibility of scientific evidence articulated in Daubert v. Merrell Dow Pharmaceuticals 2/ to uphold the admissibility of a firearms examiner's conclusion that the microscopic marks on recovered bullets prove that they came from a particular gun. To do so, the U.S. District Court for the District of Nevada rejects the conclusions of the President's Council of Advisors on Science and Technology (PCAST) on validating a scientific procedure.

This is not to say that the result in the case is wrong. There is a principled argument for admitting suitably confined testimony about matching bullet or ammunition marks. But the opinion from U.S. District Court Judge Larry R. Hicks does not contain such an argument. The court does not reach the difficult question of how far a toolmark expert may go in forging a link between ammunition and a particular gun. It did not have to. In what seems to be a poorly developed challenge to firearms-toolmark expertise, the defense sought to exclude all testimony about such an association.

This posting describes the facts of the case, the court's description of the law on the admissibility of source attributions by firearms-toolmark examiners, and its review of the practice under the criteria for admitting scientific evidence set forth by the Supreme Court in Daubert.


A grand jury indicted Eric Romero-Lobato for seven felonies. On March 4, 2018, he allegedly tried to rob the Aguitas Bar and Grill and discharged a firearm (a Taurus PT111 G2) into the ceiling. On May 14, he allegedly stole a woman's car at gunpoint while she was cleaning it at a carwash. Later that night, he crashed the car in a high-speed chase. On the front passenger's seat was a Taurus PT111 G2 handgun.

Steven Johnson, a supervising criminalist in the Forensic Science Division of the Washoe County Sheriff's Office, 3/ was prepared to testify that the handgun had fired a round into the ceiling of the bar. Romero-Lobato moved "to preclude the testimony." The district court held a pretrial hearing at which Johnson testified to his background, training, and experience. He explained that he matched the bullet to the gun using the "AFTE method" advocated by the Association of Firearm and Tool Mark Examiners.

Defendant's challenge rested "on the critical NAS and PCAST Reports as evidence that 'firearms analysis' is not scientifically valid and fails to meet the requisite threshold for admission under Daubert and Federal Rule of Evidence 702." Apparently, the only expert at the hearing was the Sheriff Department's criminalist. Judge Hicks denied the motion to exclude Johnson's expert opinion testimony and issued a relatively detailed opinion.


Skipping over the early judicial resistance to "this is the gun" testimony, 4/ the court noted that despite misgivings about such testimony on the part of several federal district courts, only one reported case has barred all source opinion testimony 5/ and the trend among the more critical courts is to search for ways to admit the conclusion with qualifications on its certainty.

Judge Hicks did not pursue the possibility of admitting but constraining the testimony, apparently because the defendant did not ask for that. Instead, the court reasoned that to overcome the inertia of the current caselaw, a defendant must have extraordinarily strong evidence (although it also recognized that the burden is on the government to prove scientific validity under Daubert). The judge wrote:
[T]he defense has not cited to a single case where a federal court has completely prohibited firearms identification testimony on the basis that it fails the Daubert reliability analysis. The lack of such authority indicates to the Court that defendant's request to exclude Johnson's testimony wholesale is unprecedented, and when such a request is made, a defendant must make a remarkable argument supported by remarkable evidence. Defendant has not done so here.
Defendant's less-than-remarkable evidence was primarily two consensus reports of scientific and other experts who reviewed the literature on firearms-mark comparisons. 6/ Both are remarkable. The first document was the highly publicized National Academy of Sciences committee report on improving forensic science. The committee expressed concerns about the largely subjective comparison process and the absence of studies to adequately measure the uncertainty in the evaluations. The court deemed these concerns to be satisfied by a single research report submitted to the Department of Justice, which funded the study:
The NAS Report, released in 2009, concluded that “[s]ufficient studies have not been done to understand the reliability and repeatability” of firearm and toolmark examination methods. ... The Report's main issue with the AFTE method was that it did not provide a specific protocol for determining a match between a shell casing or bullet and a specific firearm. ... Instead, examiners were to rely on their training and experience to determine if there was a “sufficient agreement” (i.e. match) between the mark patterns on the casing or bullet and the firearm's barrel. ... During the Daubert hearing, Johnson testified about his field's response to the NAS Report, pointing to a 2013 study from Miami-Dade County (“Miami-Dade Study”). The Miami-Dade Study was conducted in direct response to the NAS Report and was designed as a blind study to test the potential error rate for matching fired bullets to specific guns. It examined ten consecutively manufactured barrels from the same manufacturer (Glock) and bullets fired from them to determine if firearm examiners (165 in total) could accurately match the bullets to the barrel. 150 blind test examination kits were sent to forensics laboratories across the United States. The Miami-Dade Study found a potential error rate of less than 1.2% and an error rate by the participants of approximately 0.007%. The Study concluded that “a trained firearm and tool mark examiner with two years of training, regardless of experience, will correctly identify same gun evidence.”
A more complete (and accurate) reading of the Miami-Dade Police Department's study shows that it was not designed to measure error rates as they are defined in the NAS report and that the "error rate" was much closer to 1%. That's still small, and, with truly independent verification of an examiner's conclusions, the error rate should be smaller than that for examiners whose findings are not duplicated. Nonetheless, as an earlier posting shows, the data are not as easily interpreted and applied to casework as the report from the crime laboratory suggests. The research study, which has yet to appear in any scientific journal, has severe limitations.

The second report, released late in 2016 by the President's Council of Advisors on Science and Technology (PCAST), flatly maintained that microscopic firearms-marks comparisons had not been scientifically validated. Essentially dismissing the Miami-Dade Police study and earlier research as not properly designed to measure the ability of examiners to infer whether the same gun fired test bullets and ones recovered from a crime scene, PCAST reasoned that (1) AFTE-type identification had yet to be shown to be "reliable" within the meaning of Rule 702 (as PCAST interpreted the rule); (2) if courts disagreed with PCAST's legal analysis of the rule's requirements, they should at least require examiners associating ammunition with a particular firearm to give an upper bound, as ascertained from controlled experiments, on false-positive associations. (These matters are discussed in previous postings.)
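The upper bound PCAST had in mind is a one-sided confidence bound on the false-positive rate estimated from black-box experiments. A minimal sketch of how such a bound can be computed (with hypothetical counts, not figures from any of the studies discussed here) follows; it finds the exact Clopper-Pearson upper limit by bisection on the binomial distribution.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def upper_bound_fpr(errors: int, trials: int, alpha: float = 0.05) -> float:
    """One-sided (default 95%) Clopper-Pearson upper confidence bound
    on a false-positive rate, located by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisect until the bracketing interval is tiny
        mid = (lo + hi) / 2
        if binom_cdf(errors, trials, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi

# Hypothetical example: even zero observed false positives in 100
# comparisons leaves an upper bound near 3% (the familiar "rule of
# three" approximation, 3/n).
print(round(upper_bound_fpr(0, 100), 4))
```

The point of reporting the bound rather than the raw observed rate is that a small study with few or no observed errors cannot rule out a substantially higher true error rate; the bound quantifies exactly how much higher remains consistent with the data.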

The court did not address the second conclusion and gave little or no weight to the first one. It wrote that the 2016 report
concluded that there was only one study done that “was appropriately designed to test foundational validity and estimate reliability,” the Ames Laboratory Study (“Ames Study”). The Ames Study ... reported a false-positive rate of 1.52%. ... The PCAST Report did not reach a conclusion as to whether the AFTE method was reliable or not because there was only one study available that met its criteria.
All true. PCAST certainly did not write that there is a large body of high quality research that proves toolmark examiners cannot associate expended ammunition with specific guns. PCAST's position is that a single study is not a body of evidence that establishes a scientific theory--replication is crucial. If the court believed that there is such a body of literature, it should have explained the basis for its disagreement with the Council's assessment of the literature. If it agreed with PCAST that the research base is thin, then it should have explained why forensic scientists should be able to testify--as scientists--that they know which gun fired which bullet. This opinion does neither. (I'll come to the court's discussion of Daubert below.)

Instead, the court repeats the old news that
the PCAST Report was criticized by a number of entities, including the DOJ, FBI, ATF, and AFTE. Some of their issues with the Report were its lack of transparency and consistency in determining which studies met its strict criteria and which did not and its failure to consult with any experts in the firearm and tool mark examination field.
Again, all true. And all so superficial. That prosecutors and criminal investigators did not like the presidential science advisors' criticism of their evidence is no surprise. But exactly what was unclear about PCAST's criteria for replicated, controlled, experimental proof? In fact, the DOJ later criticized PCAST for being too clear--for having a "nine-part" "litmus test" rather than more obscure "trade-offs" with which to judge what research is acceptable. 7/

And what was the inconsistency in PCAST's assessment of firearms-marks comparisons? Judge Hicks maintained that
The PCAST Report refused to consider any study that did not meet its strict criteria; to be considered, a study must be a “black box” study, meaning that it must be completely blind for the participants. The committee behind the report rejected studies that it did not consider to be blind, such as where the examiners knew that a bullet or spent casing matched one of the barrels included with the test kit. This is in contrast to studies where it is not possible for an examiner to correctly match a bullet to a barrel through process of elimination.
This explanation reveals no inconsistency. The complaint seems to be that PCAST's criteria for validating a predominantly subjective feature-comparison procedure are too demanding or restrictive, not that these criteria were applied inconsistently. Indeed, no inconsistency in applying the "litmus test" for an acceptable research design to firearms-mark examinations is apparent.

Moreover, the court's definition of "a 'black box' study" is wrong. All that PCAST meant by "black box" is that the researchers are not trying to unpack the process that examiners use and inspect its components. Instead, they say to the examiner, "Go ahead, do your thing. Just tell us your answer, and we'll see if you are right." The term is used by software engineers who test complex programs to verify that the outputs are what they should be for the inputs. The Turing test for the proposition that "machines can think" is a kind of black box test.

Nonetheless, this correction is academic. The court is right about the fact that PCAST gave no credence to "closed tests" like those in which an examiner sorts bullets into pairs knowing in advance that every bullet has a mate. Such black-box experiments are not worthless. They show a nonzero level of skill, but they are easier than "open tests" in which an examiner is presented with a single pair of bullets to decide whether they have a common source, then another pair, and another, and so on. In Romero-Lobato, the examiner had one bullet from the ceiling to compare to a test bullet he fired from one suspect gun. There is no "trade-off" that would make the closed-test design appropriate for establishing the examiner's skill at the task he performed.

All that remains of the court's initial efforts to avoid the PCAST report is the tired complaint about a "failure to consult with any experts in the firearm and tool mark examination field." But what consultation does the judge think was missing? The scientists and technologists who constitute the Council asked the forensic science community for statements and literature to support their practices. It shared a draft of its report with the Department of Justice before finalizing it. After releasing the report, it asked for more responses and issued an addendum. Forensic-services providers may complain that the Council did not use the correct criteria, that its members were closed-minded or biased, or that the repeated opportunities to affect the outcome were insufficient or even a sham. But a court needs more than a throw-away sentence about "failure to consult" to justify treating the PCAST report as suspect.


Having cited a single, partly probative police laboratory study as if it were a satisfactory response to the National Academy's concerns and having colored the President's Council report as controversial without addressing the limited substance of the prosecutors' and investigators' complaints, the court offered a "Daubert analysis." It marched through the five indicia that the Supreme Court enumerated as factors that courts might consider in assessing scientific validity and reliability.

A. It Has Been Tested

The Romero-Lobato opinion made much of the fact that "[t]he AFTE methodology has been repeatedly tested" 8/ through "numerous journals [sic] articles and studies exploring the AFTE method" 9/ and via Johnson's perfect record on proficiency tests as proved by his (hearsay and character evidence) testimony. Einstein once expressed impatience "with scientists who take a board of wood, look for its thinnest part and drill a great number of holes where drilling is easy." 10/ Going through the drill of proficiency testing does not prove much if the tests are simple and unrealistic. A score of trivial or poorly designed experiments should not engender great confidence. The relevant question under Daubert is not simply "how many tests so far?" It is how many challenging tests have been passed. The opinion makes no effort to answer that question. It evinces no awareness of the "10 percent error rate in ballistic evidence" noted in the NAS Report that prompted corrective action in the Detroit Police crime laboratory.

Instead of responding to PCAST's criticisms of the design of the AFTE Journal studies, the court wrote that "[a]lthough both the NAS and PCAST Reports were critical of the AFTE method because of its inherent subjectivity, their criticisms do not affect whether the technique they criticize has been repeatedly tested. The fact that numerous studies have been conducted testing the validity and accuracy of the AFTE method weighs in favor of admitting Johnson's testimony."

But surely the question under Daubert is not whether there have been "numerous studies." It is what these studies have shown about the accuracy of trained examiners to match a single unknown bullet with control bullets from a single gun. The court may have been correct in concluding that the testing prong of Daubert favors admissibility here, but its opinion fails to demonstrate that "[t]here is little doubt that the AFTE method of identifying firearms satisfies this Daubert element."

B. Publication and Peer Review

Daubert recognizes that, to facilitate the dissemination, criticism, and modification of theories, modern science relies on publication in refereed journals that members of the scientific community read. Romero-Lobato deems this factor to favor admission for two reasons. First, the AFTE Journal, in which virtually all the studies dismissed by PCAST appear, uses referees. That it is not generally regarded as a significant scientific journal -- it is not available through most academic libraries, for example -- went unnoticed.

Second, the court contended that "of course, the NAS and PCAST Reports themselves constitute peer review despite the unfavorable view the two reports have of the AFTE method. The peer review and publication factor therefore weighs in favor of admissibility." The idea that the rejection in consensus reports of a series of studies as truly validating a theory "weighs in favor of admissibility" is difficult to fathom. Some readers might find it preposterous.

C. Error Rates

Just as the court was content to rely on the absolute number of studies as establishing that the AFTE method has been adequately tested, it takes the error rates reported in the questioned studies at face value. Finding the numbers to be "very low," and implying (without explanation) that PCAST's criteria are too "strict," it concludes that Daubert's "error rate" factor too "weighs in favor of admissibility."

A more plausible conclusion is that a large body of studies that fail to measure the error rates (false positive and negative associations) appropriately but do not indicate very high error rates is no more than weakly favorable to admission. (For further discussion, see the previous postings on the court's discussion of the Miami Dade and Ames Laboratory technical reports.)

D. Controlling Standards

The court cited no controlling standards for the judgment of "'sufficient agreement' between the 'unique surface contours' of two toolmarks." After reciting the AFTE's definition of "sufficient agreement," Judge Hicks decided that "matching two tool marks essentially comes down to the examiner's subjective judgment based on his training, experience, and knowledge of firearms. This factor weighs against admissibility."

However, the opinion adds that "the consecutive matching striae ('CMS') method," which Johnson used after finding "sufficient agreement," is "an objective standard under Daubert." It is "objective" because an examiner cannot conclude that there is a match unless he "observes two or more sets of three or more consecutive matching markings on a bullet or shell casing." The opinion did not consider the possibility that this numerical rule does little to confine discretion if no standard guides the decision of whether a marking matches. Instead, the opinion debated whether the CMS method should be considered objective and confused that question with how widely the method is used.

The relevant inquiry is not whether a method is subjective or objective. For a predominantly subjective method, the question is whether standards for making subjective judgments will produce more accurate and more reliable (repeatable and reproducible) decisions and how much more accurate and reliable they will be.

E. General Acceptance

Finally, the court found "widespread acceptance in the scientific community." But the basis for this conclusion was flimsy. It consisted of statements from other courts like "the AFTE method ... is 'widely accepted among examiners as reliable'" and "[t]his Daubert factor is designed to prohibit techniques that have 'only minimal support' within the relevant community." Apparently, the court regarded the relevant community as confined to examiners. Judge Hicks wrote that
it is unclear if the PCAST Report would even constitute criticism from the “relevant community” because the committee behind the report did not include any members of the forensic ballistics community ... . The acceptance factor therefore weighs in favor of admitting Johnson's testimony.
If courts insulate forensic-science service providers from the critical scrutiny of outside scientists, how can they legitimately use the general-acceptance criterion to help ascertain whether examiners are presenting "scientific knowledge" à la Daubert or something else?

  1. No. 3:18-cr-00049-LRH-CBC, 2019 WL 2150938 (D. Nev. May 16, 2019).
  2. 509 U.S. 579 (1993).
  3. For a discussion of a case involving inaccurate testimony from the same laboratory that caught the attention of the Supreme Court, see David H. Kaye, The Interpretation of DNA Evidence: A Case Study in Probabilities, National Academies of Science, Engineering and Medicine, Science Policy Decision-making Educational Modules, 2016, available at; McDaniel v. Brown: Prosecutorial and Expert Misstatements of Probabilities Do Not Justify Postconviction Relief — At Least Not Here and Not Now, Forensic Sci., Stat. & L., July 7, 2014,
  4. See David H. Kaye, Firearm-Mark Evidence: Looking Back and Looking Ahead, 68 Case W. Res. L. Rev. 723, 724-25 (2018), available at The court relied on the article's explication of more modern case law.
  5. The U.S. District Court for the District of Colorado excluded toolmark conclusions in the prosecutions for the bombing of the federal office building in Oklahoma City. The toolmarks there came from a screwdriver. David H. Kaye et al., The New Wigmore, A Treatise on Evidence: Expert Evidence 686-87 (2d ed. 2011).
  6. The court was aware of an earlier report from a third national panel of experts raising doubts about the AFTE method, but it did not cite or discuss that report's remarks. Although the 2008 National Academies report on the feasibility of establishing a ballistic imaging database only considered the forensic toolmark analysis of firearms in passing, it gave the practice no compliments. Kaye, supra note 4, at 729-32.
  7. Ted Robert Hunt, Scientific Validity and Error Rates: A Short Response to the PCAST Report, 86 Fordham L. Rev. Online Art. 14 (2017),
  8. Quoting United States v. Ashburn, 88 F.Supp.3d 239, 245 (E.D.N.Y. 2015).
  9. Citing United States v. Otero, 849 F.Supp.2d 425, 432–33 (D.N.J. 2012), for "numerous journals [sic] articles and studies exploring the AFTE method."
  10. Philipp Frank, Einstein's Philosophy of Science, Reviews of Modern Physics (1949).
MODIFIED: 7 July 2019 9:10 EST

Sunday, June 23, 2019

The Miami Dade Bullet-matching Study Surfaces in United States v. Romero-Lobato

Last month, the U.S. District Court for the District of Nevada rejected another challenge to firearms toolmark comparisons. The opinion in United States v. Romero-Lobato, 1/ written by Judge Larry R. Hicks, relies in part on a six-year-old study that has yet to appear in any scientific journal. 2/ The National Institute of Justice (the research-and-development arm of the Department of Justice) funded the Miami-Dade Police Department Crime Laboratory "to evaluate the repeatability and uniqueness of striations imparted by consecutively manufactured EBIS barrels with the same EBIS pattern to spent bullets as well as to determine the error rate for the identification of same gun evidence." 3/ Judge Hicks describes the 2013 study as follows:
The Miami-Dade Study was conducted in direct response to the NAS Report and was designed as a blind study to test the potential error rate for matching fired bullets to specific guns. It examined ten consecutively manufactured barrels from the same manufacturer (Glock) and bullets fired from them to determine if firearm examiners (165 in total) could accurately match the bullets to the barrel. 150 blind test examination kits were sent to forensics laboratories across the United States. The Miami-Dade Study found a potential error rate of less than 1.2% and an error rate by the participants of approximately 0.007%. The Study concluded that “a trained firearm and tool mark examiner with two years of training, regardless of experience, will correctly identify same gun evidence.”
The "NAS Report" was the work of a large committee of scientists, forensic-science practitioners, lawyers, and others assembled by the National Academy of Sciences to recommend improvements in forensic science. A federal judge and a biostatistician co-chaired the committee. In 2009, four years after Congress funded the project, the report arrived. It emphasized the need to measure the error probabilities in pattern-matching tasks and discussed what statisticians call two-by-two contingency tables for estimating the sensitivity (true-positive probability) and specificity (true-negative probability) of the classifications. However, the Miami-Dade study was not designed to measure these quantities. To understand what it did measure, let's look at some of the details in the report to NIJ as well as what the court gleaned from the report (directly or indirectly).

A Blind Study?

The study was not blind in the sense of the subjects not realizing that they were being tested. They surely knew that they were not performing normal casework when they received the unusual samples and the special questionnaire with the heading "Answer Sheet: Consecutively Rifled EBIS-2 Test Set" asking such questions as "Is your Laboratory ASCLD/Lab Accredited?" That is not a fatal flaw, but it has some bearing -- not recognized in the report's sections on "external validity" -- on generalizing from the experimental findings to case work. 4/

Volunteer Subjects?

The "150 blind examination kits" somehow went to 201 examiners, not just in the United States, but also in "4 international countries." 5/ The researchers did not consider or reveal the performance of 36 "participants [who] did not meet the two year training requirement for this study." (P. 26). How well they did in comparison to their more experienced colleagues would have been worth knowng, although it would have been hard to draw a clear concolusions since there so few errors on the test. In any event, ignoring the responses from the trainees "resulted in a data-producing sample of 165 participants." (P. 26).

These research subjects came from emails sent to "the membership list for the Association of Firearm and Tool Mark Examiners (AFTE)." (Pp. 15-16). AFTE members all "derive[] a substantial portion of [their] livelihood from the examination, identification, and evaluation of firearms and related materials and/or tool marks." (P. 15). Only 35 of the 165 volunteers were certified by AFTE (P. 30), and 20 worked at unaccredited laboratories (P. 31).

What Error Rates?

Nine of the 165 fully trained subjects (5%) made errors (treating "inconclusive" as a correct response). The usual error rates (false positives and false negatives) are not reported because of the design of the "blind examination kits." The obvious way to obtain those error rates is to ask each subject to evaluate pairs of items -- some from the same source and some from different sources (with the examiners blinded to the true source information known to the researchers). Despite the desire to respond to the NAS report, the Miami Dade Police Department Laboratory did not make "kits" consisting of such a mixture of pairs of same-source and different-source bullets.

Instead, the researchers gave each subject a single collection of ten bullets produced by firing one manufacturer's ammunition in eight of the ten barrels. (Two of these "questioned bullets," as I will call them, came from barrel 3 and two from barrel 9; none came from barrel 4.) Along with the ten questioned bullets, they gave the subjects eight pairs of what we can call "exemplar bullets." Each pair of exemplar bullets came from two test fires of the same eight of the ten consecutively manufactured barrels (barrels 1-3 and 5-9). The task was to associate each questioned bullet with an exemplar pair or to decide that it could not be associated with any of the eight pairs. Or, the research subjects could circle "inconclusive" on the questionnaire. Notice that almost all the questioned bullets came from the barrels that produced the exemplar bullets -- only two such barrels were not the source of an unknown -- and only one barrel that produced a questioned bullet was not in the exemplar set.

This complicated and unbalanced design raises several questions. After associating an unknown bullet with an exemplar pair, will an examiner seriously consider the other exemplar pairs? After eliminating a questioned bullet as originating from, say, seven exemplar-pair barrels, would he be inclined to pick one of the remaining three? Because of the extreme overlap in the sets, on average, such strategies would pay off. Such interactions could make false eliminations less probable, and true associations more probable, than with the simpler design of a series of single questioned-to-source comparisons.
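A toy Monte Carlo simulation -- my own construction, not the study's actual protocol -- illustrates how much process-of-elimination can help in a nearly closed set. Here an examiner who can only rule out clearly wrong exemplars, and otherwise guesses, still far outperforms blind guessing:

```python
import random

# Toy model (not the study's design): each questioned bullet's true source
# is among 8 exemplar pairs. The examiner correctly eliminates 5 of the 7
# wrong exemplars and then guesses uniformly among the 3 that remain.

random.seed(7)

def trial(n_exemplars: int = 8, n_ruled_out: int = 5) -> bool:
    true_source = random.randrange(n_exemplars)
    wrong = [e for e in range(n_exemplars) if e != true_source]
    remaining = [true_source] + wrong[n_ruled_out:]  # candidates left after elimination
    return random.choice(remaining) == true_source

n = 100_000
rate = sum(trial() for _ in range(n)) / n
print(f"accuracy ≈ {rate:.3f}")   # about 1/3, versus 1/8 = 0.125 for blind guessing
```

The point is not the particular numbers but the structure: the more nearly closed the set, the more an examiner's correct eliminations mechanically raise the apparent rate of correct identifications.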

The report to NIJ does not indicate that the subjects received any instructions to prevent them from having an expectation that most of the questioned bullets would match some pair of exemplar bullets. The only instructions it mentions are on a questionnaire that reads:
Please microscopically compare the known test shots from each of the 8 barrels with the 10 questioned bullets submitted. Indicate your conclusion(s) by circling the appropriate known test fired set number designator on the same line as the alpha unknown bullet. You also have the option of Inconclusive and Elimination. ...
Yet, the report confidently asserts that "[t]he researchers utilized an 'open set' design where the participants had no expectation that all unknown tool marks should match one or more of the unknowns." (P. 28).

To be sure, the study has some value in demonstrating that this subset of subjects could perform a presumably difficult task of associating unknown bullets with exemplar ones. Moreover, whatever one thinks of this alleged proof of "uniqueness," the results imply that there are microscopic (or other) features of marks on bullets that vary with the barrel through which they traveled. But the study does not supply a good measure of examiner skill at making associations in fully "open" situations.

A 0.007% Error Rate?

As noted above, but not in the court's opinion, 5% of the examiners made some kind of error. That said, there were only 12 false-positive associations or false-negative ones (outright eliminations) out of 165 x 10 = 1,650 answers. (I am assuming that every subject completed the questionnaire for every unknown bullet.) That is an overall error proportion of 12/1650 = 0.007 = 0.7%.

The researchers computed the error rate slightly differently. They only reported the average error rate for the 165 experienced examiners. The vast majority (156) made no errors. Six made 1 error, and 3 made 2. So the average examiner's proportion of errors was [156(0) + 6(0.1) + 3(0.2)]/165 = 0.007. No difference at all.
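Both calculations can be verified from the counts quoted above (a quick check using only numbers in the report to NIJ):

```python
# Error-rate arithmetic from the report's counts: 156 examiners with no
# errors, 6 with one error, 3 with two; each evaluated 10 unknowns.

errors_per_examiner = [0] * 156 + [1] * 6 + [2] * 3
n_examiners = len(errors_per_examiner)            # 165
answers_each = 10

total_errors = sum(errors_per_examiner)           # 12
total_answers = n_examiners * answers_each        # 1,650

overall_proportion = total_errors / total_answers
avg_examiner_rate = sum(e / answers_each for e in errors_per_examiner) / n_examiners

print(round(overall_proportion, 4))  # 0.0073
print(round(avg_examiner_rate, 4))   # 0.0073 -- identical when all answer all 10
```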

This 0.007 figure is 100 times the number the court gave. Perhaps the opinion had a typographical error -- an adscititious percentage sign that the court missed when it reissued its opinion (to correct other typographical errors). The error rate is still small and would not affect the court's reasoning.

But the overall proportion of errors and the average-examiner error rate could diverge. The report gives the error proportions for the 9 examiners who made errors as 0.1 (6 of the examiners) and 0.2 (another 3 examiners). Apparently, all of the 9 erroneous examiners evaluated all 10 unknowns. What about the other 156 examiners? Did all of them evaluate all 10? The worst-case scenario is that every one of the 156 error-free examiners answered only one question. That supplies only 156 correct answers. Add this number to the 12 incorrect answers, and we have an error proportion of 12/168 ≈ 0.07 = 7% -- 1,000 times the court's number.

However, this worst-case scenario did not occur. The funding report states that "[t]here were 1,496 correct answers, 12 incorrect answers and 142 inconclusive answers." (P. 15). The sum of these numbers of answers is 1,650. Did every examiner answer every question? Apparently so. For this 100% completion rate, the report's emphasis on the examiner average (which is never larger and often smaller than the overall error proportion) is a distinction without a difference.

There is a further issue with the number itself. "Inconclusives" are not correct associations. If every examiner came back with "inconclusive" for every questioned bullet, the researchers could hardly have reported a zero error rate as validating bullet-matching. 6/ From the trial court's viewpoint, inconclusives just do not count. They do not produce testimony of false associations or of false eliminations. The sensible thing to do, in ascertaining error rates for Daubert purposes, is to toss out all "inconclusives."

Doing so here makes little difference. There were 142 inconclusive answers. (P. 15). If these were merely "not used to calculate the overall average error rates," as the report claims (p. 32), the overall error proportion was 12/(1650 - 142) = 12/1508 ≈ 0.008 -- still very small (but still difficult to interpret in terms of the parameters of accuracy for two-by-two tables).

The report to NIJ discussed another finding that, at first blush, could be relevant to the evidence in this case: "Three of these 35 AFTE certified participants reported a total of four errors, resulting in an error rate of 0.011 for AFTE Certified participants." (P. 30). Counter-intuitively, this 1% average is larger than the reported average error rate of 0.007 for all the examiners.

That the certified examiners did worse than the uncertified ones may be a fluke. The standard error in the estimate of the average-examiner error rate was 0.32 (p. 29), which indicates that, despite the observed difference in the sample data, the study does not reveal whether certified examiners generally do better or worse than uncertified ones. 7/

A Potential Error Rate?

Finally, the court's reference to "a potential error rate of less than 1.2%" deserves mention. The "potential error rate" is tricky. Potentially, the error rate of individual practitioners like the ones who volunteered for the study, with no verification step by another examiner, could be larger (or smaller). There is no sharp and certain line that can be drawn for the maximum possible error rate. (Except that it cannot exceed 100%.)

In this case, 1.2% is the upper limit of a two-sided confidence interval. The Miami Dade authors wrote that:
A 95% confidence interval for the average error rate, based on the large sample distribution of the sample average error rate, is between 0.002 and 0.012. Using a confidence interval of 95%, the error rate is no more than 0.012, or 1.2%.
A 95% confidence interval means that if there had been a large number of volunteer studies just like this one, making random draws from an unchanging population of volunteer-examiners and having these examiners perform the same task in the same way, about 95% of the many resulting confidence intervals would encompass the true value for the entire population. But the hypothetical confidence intervals would vary from one experiment to the next. We have a statistical process -- a sort of butterfly net -- that is broad enough to capture the unknown butterfly in about 95% of our swipes. The weird thing is that with each swipe, the size and center of the net change. On the Miami Dade swipe, one end of the net stretched out to the average error rate of 1.2%.

So the court was literally correct. There is "a potential error rate" of 1.2%. There is also a higher potential error rate that could be formulated -- just ask for 99% "confidence." Or lower -- try 90% confidence. And for every confidence interval that could be constructed by varying the confidence coefficient, there is the potential for the average error rate to exceed the upper limit. Such is the nature of a random variable. Randomness does not make the upper end of the estimate implausible. It just means that it is not "the potential error rate," but rather a clue to how large the actual rate of error for repeated experiments could be.
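For readers who want to see where the 0.002-to-0.012 interval comes from, a normal-approximation calculation from the per-examiner proportions implied by the report (156 at 0, six at 0.1, three at 0.2) reproduces it. This is a sketch of one standard method; the Miami-Dade authors' exact computation may differ:

```python
import math

# 95% normal-approximation confidence interval for the mean examiner
# error rate, built from the per-examiner proportions given in the text.

rates = [0.0] * 156 + [0.1] * 6 + [0.2] * 3
n = len(rates)                                        # 165 examiners
mean = sum(rates) / n                                 # ≈ 0.0073
var = sum((r - mean) ** 2 for r in rates) / (n - 1)   # sample variance
se = math.sqrt(var / n)                               # standard error of the mean
lower, upper = mean - 1.96 * se, mean + 1.96 * se
print(f"95% CI: ({lower:.3f}, {upper:.3f})")          # 95% CI: (0.002, 0.012)
```

The interval characterizes sampling error across hypothetical repetitions of the experiment with comparable volunteers; it says nothing about how the volunteers compare to the population of all examiners doing casework.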

Contrary to the suggestion in Romero-Lobato, that statistic is not the "potential rate of error" mentioned in the Supreme Court's opinion in Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993). The opinion advises judges to "ordinarily ... consider the known or potential rate of error, see, e.g., United States v. Smith, 869 F. 2d 348, 353-354 (CA7 1989) (surveying studies of the error rate of spectrographic voice identification technique)." The idea is that along with the validity of an underlying theory, how well "a particular scientific technique" works in practice affects the admissibility of evidence generated with that technique. When the technique consists of comparing things like voice spectrograms, the accuracy with which the process yields correct results in experiments like the ones noted in Smith are known error rates. That is, they are known for the sample of comparisons in the experiment. (The value for all possible examiners' comparisons is never known.)

These experimentally determined error rates are also a "potential rate of error" for the technique as practiced in case work. The sentence in Daubert that speaks to "rate of error" continues by adding, as part of the error-rate issue, "the existence and maintenance of standards controlling the technique's operation, see United States v. Williams, 583 F. 2d 1194, 1198 (CA2 1978) (noting professional organization's standard governing spectrographic analysis)." The experimental testing of the technique shows that it can work -- potentially; controlling standards ensure that it will be applied consistently and appropriately to achieve this known potential. Thus, Daubert's reference to "potential" rates does not translate into a command to regard the upper confidence limit (which merely accounts for sampling error in the experiment) as a potential error rate for practical use.

  1. No. 3:18-cr-00049-LRH-CBC, 2019 WL 2150938 (D. Nev. May 16, 2019).
  2. That is my impression anyway. The court cites the study as Thomas G. Fadul, Jr., et al., An Empirical Study to Improve the Scientific Foundation of Forensic Firearm and Tool Mark Identification Utilizing Consecutively Manufactured Glock EBIS Barrels with the Same EBIS Pattern (2013), available at The references in Ronald Nichols, Firearm and Toolmark Identification: The Scientific Reliability of the Forensic Science Discipline 133 (2018) (London: Academic Press), also do not indicate a subsequent publication.
  3. P. 3. The first of the two "research hypotheses" was that "[t]rained firearm and tool mark examiners will be able to correctly identify unknown bullets to the firearms that fired them when examining bullets fired through consecutively manufactured barrels with the same EBIS pattern utilizing individual, unique and repeatable striations." (P. 13). The phrase "individual, unique and repeatable striations" begs a question or two.
  4. The researchers were comforted by the thought that "[t]he external validity strength of this research project was that all testing was conducted in a crime laboratory setting." (P. 25). As secondary sources of external validity, they noted that "[p]articipants utilized a comparison microscope," "[t]he participants were trained firearm and tool mark examiners," "[t]he training and experience of the participants strengthened the external validity," and "[t]he number of participants exceeded the minimum sample size needed to be statistically significant." Id. Of course, it is not the "sample size" that is statistically significant, but only a statistic that summarizes an aspect of the data (other than the number of observations).
  5. P. 26 ("A total of 201 examiners representing 125 crime laboratories in 41 states, the District of Columbia, and 4 international countries completed the Consecutively Rifled EBIS-2 Test Set questionnaire/answer sheet.").
  6. Indeed, some observers might argue that an "inconclusive" when there is ample information to reach a conclusion is just wrong. In this context, however, that argument is not persuasive. Certainly, "inconclusives" can be missed opportunities that should be of concern to criminalists, but they are not outright false positives or false negatives.
  7. The opinion does not state whether the examiner in the case -- "Steven Johnson, a supervising criminalist in the Forensic Science Division of the Washoe County Sheriff's Office" -- is certified or not, but it holds that he is "competent to testify" as an expert.

Tuesday, June 11, 2019

Junk DNA (Literally) in Virginia

The Washington Post reported yesterday on a motion in Alexandria Circuit Court to suppress "all evidence flowing from the warrantless search of [Jesse Bjerke's] genetic profile." 1/ Mr. Bjerke is accused of raping a 24-year-old lifeguard at gunpoint at her home after following her from the Alexandria, Va., pool where she worked. She "could describe her attacker only as a thin man she believed was 35 to 40 years old and a little over 6 feet tall." 2/ Swabs taken by a nurse contained sperm from which the Virginia Department of Forensic Sciences obtained a standard STR profile.

Apparently, the STR profile was in neither the Virginia DNA database nor the national one (NDIS). So the police turned to the Virginia bioinformatics company, Parabon Labs, which has had success with genetic genealogy searches of the publicly available genealogy database, GEDmatch. Parabon reported that
[T]he subject DNA file shares DNA with cousins related to both sides of Jesse's family tree, and the ancestral origins of the subject are equivalent to those of Jesse. These genetic connections are very compelling evidence that the subject is Jesse. The fact that Jesse was residing in Alexandria, VA at the time of the crime in 2016 fits the eyewitness description and his traits are consistent with phenotype predictions, further strengthens the confidence of this conclusion.
Recognizing the inherent limitations in genetic genealogy, Parabon added that
Unfortunately, it is always possible that the subject is another male that is not identifiable through vital records or other research means and is potentially unknown to his biological family. This could be the result if an out-of-wedlock birth, a misattributed paternity, an adoption, or an anonymous abandonment.
The motion suggests that the latter paragraph, together with the firm's boiler-plate disclaimer of warranties and the fact that the report contains hearsay, means that police lacked even probable cause to believe that the sperm came from the defendant. This view of the information that the police received is implausible, but regardless of whether "the facts contained in the Parabon report do not support probable cause," 3/ the police did not use the information either to arrest Mr. Bjerke immediately or to seek a warrant to compel him to submit to DNA sampling. Instead,
Police began following Bjerke at his home and the hospital where he worked as a nurse. They took beer bottles, soda cans and an apple core from his trash. They tracked him to a Spanish restaurant ... and, after he left, bagged the straws he had used.

The DNA could not be eliminated as a match for the sperm from the rape scene, a forensic analysis found, leading to Bjerke’s indictment and arrest in February. With [a] warrant, law enforcement again compared his DNA with the semen at the crime scene. The result: a one in 7.2 billion chance it was not his. 4/
A more precise description of the "one in 7.2 billion chance" is that if Mr. Bjerke is not the source, then an arbitrarily selected unrelated man would have that tiny a chance of having the STR profile. The probability of the STR match given the hypothesis that another man is the source is not necessarily the same as the probability of the source given the match. But for a prior probability reflecting the other evidence so far revealed about Mr. Bjerke, there would not be much difference between the conditional probability the laboratory supplied and the article's transposed one.
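A short Bayes'-rule sketch shows why the transposition matters little here. The priors are purely illustrative (my numbers, not anything in the record), and I assume for simplicity that the true source would match with probability one:

```python
# P(suspect is source | match) via Bayes' rule, assuming
# P(match | suspect is source) = 1. Priors are illustrative only.

def posterior_source_prob(prior: float, match_prob_if_other: float) -> float:
    return prior / (prior + (1 - prior) * match_prob_if_other)

rmp = 1 / 7.2e9   # reported random-match probability for an unrelated man

for prior in (0.5, 0.01, 1e-6):
    print(prior, posterior_source_prob(prior, rmp))
# Even a prior of one in a million yields a posterior above 0.999,
# so the transposed phrasing misleads very little in this case.
```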

Faced with such compelling evidence, Mr. Bjerke wants it excluded at trial. The motion states that
For the purposes of this motion, there are three categories of DNA testing. (1) DNA testing conducted before Jesse Bjerke was a suspect in the case; (2) DNA testing conducted without a warrant after Jesse Bjerke became a suspect in the case; and (3) DNA testing conducted with a warrant after Jesse Bjerke's arrest. This motion seeks to suppress all DNA evidence in categories two and three that relate to Jesse Bjerke.
An obstacle is the many cases -- not mentioned in the motion -- holding that shed or "abandoned" DNA is subject to warrantless collection and analysis for identifying features on the theory that the procedure is not a "search" under the Fourth Amendment. The laboratory analysis is not an invasion of Mr. Bjerke's reasonable expectation of privacy -- at least, not if we focus solely on categories (2) and (3), as the motion urges. This standard STR typing was done after the genetic genealogy investigation was completed. The STR profile (which the motion calls a "genetic profile" even though it does not characterize any genes) provides limited information about an individual. For that reason, the conclusion of the majority of courts that testing shed DNA is not a search is supportable, though not ineluctable. ("Limited" does not mean "zero.")

Indeed, most laboratory tests on or for traces from crimes are not treated as searches covered by the warrant and probable cause protections. Is it a search to have the forensic lab analyze a fingerprint from a glass left at a restaurant? Suppose a defendant tosses a coat in a garbage bin on the street, and the police retrieve it, remove glass particles, and analyze the chemical composition to see whether they match the glass from a broken window in a burglary. Did they need a warrant to study the glass particles?

The underlying issue is how much the constitution constrains the police in using trace evidence that might associate a known suspect with a crime scene or victim. When the analysis reveals little or nothing more than the fact of the association, I do not see much of an argument for requiring a warrant. That said, there is a little additional information in the usual STR profile, so there is some room for debate here.

However, this case might be even more debatable (although the defense motion does not seem to recognize it) because of category (1) -- the genetic genealogy phase of the case. The police, or rather the firm they hired to derive a genome-wide scan for the genetic genealogy, have much more information about Mr. Bjerke at their disposal. They have on the order of a million SNPs. In theory, Parabon or the police could inspect the SNP data for medical or other sensitive information on Mr. Bjerke now that he has been identified as the probable source of those sperm.

Nevertheless, I do not know why the police or the lab would want to do this, and it has always been true that once a physical DNA sample is in the possession of the police, the possibility exists for medical genetic testing using completely different loci. Testing shed DNA in that way should be considered a search. Bjerke is a step in that direction, but are we there yet?

The Post's online story has 21 comments on it. Not one supported the idea that there was a significant invasion of privacy in the investigation. These comments are a decidedly small sample that does not represent any clear population, but the complete lack of support for the argument that genetic genealogy implicates important personal privacy was striking.

  1. Defendant's Motion to Suppress, Commonwealth v. Bjerke, No. CF19000031 (Cir. Ct., Alexandria, Va. May 20, 2019).
  2. Rachel Weiner, Alexandria Rape Suspect Challenging DNA Search Used to Crack Case, Wash. Post, June 10, 2019, at 1:16 PM.
  3. Defendant's Motion, supra note 1.
  4. Weiner, supra note 2.
  • Thanks to Rachel Weiner for alerting me to the case and providing a copy of the defendant's motion.

Friday, June 7, 2019

Aleatory and Epistemic Uncertainty

An article in the Royal Society's Open Science journal on "communicating uncertainty about facts, numbers and science" is noteworthy for the sheer breadth of the fields it surveys and its effort to devise a taxonomy of uncertainty for the purpose of communicating its nature or degree. The article distinguishes between "aleatory" and "epistemic" uncertainty:

[A] large literature has focused on what is frequently termed 'aleatory uncertainty' due to the fundamental indeterminacy or randomness in the world, often couched in terms of luck or chance. This generally relates to future events, which we can't know for certain. This form of uncertainty is an essential part of the assessment, communication and management of both quantifiable and unquantifiable future risks, and prominent examples include uncertain economic forecasts, climate change models and actuarial survival curves.

By contrast, our focus in this paper is uncertainties about facts, numbers and science due to limited knowledge or ignorance—so-called epistemic uncertainty. Epistemic uncertainty generally, but not always, concerns past or present phenomena that we currently don't know but could, at least in theory, know or establish.

The distinction is of interest to philosophers, psychologists, economists, and statisticians. But it is a little hard to pin down with the definition in the article. Aleatory uncertainty applies on the quantum mechanical level, but is it true that "in theory" predictions like weather and life span cannot be certain? Chaos theory shows that the lack of perfect knowledge about initial conditions of nonlinear systems makes long-term predictions very uncertain, but is it theoretically impossible to have perfect knowledge? The card drawn from a well-shuffled deck is a matter of luck, but if we knew enough about the shuffle, couldn't we know the card that is drawn? Thus, I am not so sure that the distinction is between (1) "fundamental ... randomness in the world" and (2) ignorance that could be remedied "in theory."

Could the distinction be between (1) instances of a phenomenon that has variable outcomes at the level of our existing knowledge of the world and (2) a single instance of a phenomenon that we do not regard as the outcome of a random process or that already has occurred, so that the randomness is gone? The next outcome of rolling a die (an alea in Latin) is always uncertain (unless I change the experimental setup to precisely fix the conditions of the roll), 1/ but whether the last roll produced a 1 is only uncertain to the extent that I cannot trust my vision or memory. I could reduce the latter, epistemic uncertainty by improving my system of making observations. For example, I could have several keen and truthful observers watch the toss, or I could film it and study the recording thoroughly. From this perspective, the frequency and propensity conceptions of probability concern aleatory uncertainty, and the subjective and logical conceptions traffic in both aleatory and epistemic uncertainty.
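The die example can be put in computational terms (a rough sketch; all numbers are simulated): aleatory uncertainty persists across repeated rolls no matter what we learn, while epistemic uncertainty about a roll that has already occurred can be driven down by better observation:

```python
import random

# Aleatory vs. epistemic uncertainty with a simulated die (illustrative only).

random.seed(42)

# Aleatory: the next roll varies no matter how much data we collect;
# only its long-run distribution is knowable.
rolls = [random.randint(1, 6) for _ in range(60_000)]
freq_of_six = sum(r == 6 for r in rolls) / len(rolls)
print(round(freq_of_six, 3))          # close to 1/6

# Epistemic: the last roll is a fixed fact; several keen, truthful
# observers (or a film of the toss) remove the uncertainty entirely.
last_roll = rolls[-1]
observer_reports = [last_roll] * 5    # all reliable observers agree
print(len(set(observer_reports)))     # 1 -- no residual uncertainty
```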

When it comes to the courtroom, epistemic uncertainty is usually in the forefront, and I may get to that example at a later date. For now, I'll just note that, regardless of whether the distinction offered above between aleatory and epistemic uncertainty is philosophically rigorous, people's attitudes toward aleatory and epistemic risk defined in this way do seem to be somewhat different. 2/

  1. Cf. P. Diaconis, S. Holmes & R. Montgomery, Dynamical Bias in the Coin Toss, 49(2) SIAM Rev. 211-235 (2007),
  2. Gülden Ülkümen, Craig R. Fox & B. F. Malle, Two Dimensions of Subjective Uncertainty: Clues from Natural Language, 145(10) Journal of Experimental Psychology: General 1280-1297; Craig R. Fox & Gülden Ülkümen, Distinguishing Two Dimensions of Uncertainty, in Perspectives on Thinking, Judging, and Decision Making (W. Brun, G. Keren, G. Kirkebøen & H. Montgomery eds. 2011).

Saturday, June 1, 2019

Frye-Daubert Flip Flops in Florida

For years, the Florida Supreme Court rebuffed suggestions that it adopt the standard for scientific evidence that the U.S. Supreme Court articulated for the federal judiciary in Daubert v. Merrell Dow Pharmaceuticals. 1/ Instead, it "repeatedly reaffirmed [its] adherence to the Frye standard for admissibility of evidence." 2/

In 2013, the Florida legislature passed a statute to replace Frye with the "reliability" wording of Federal Rule of Evidence 702 -- wording intended to codify Daubert and its progeny. Some Florida courts concluded that this brought Florida into the ranks of jurisdictions that use the Daubert standard of "evidentiary reliability" based on "scientific validity." 3/ However, the Florida Supreme Court has held that only it can implement "procedural" changes to the Florida Rules of Evidence (FLRE). 4/ The legislature has the power to promulgate "substantive" rules of evidence, but it may not force "procedural" ones down the judiciary's throat.

So the Florida Bar's Code and Rules of Evidence Committee reviewed the law. By a narrow margin, it recommended leaving Frye in place. The Florida Supreme Court agreed. It declined to adopt the Daubert amendment "due to the constitutional concerns raised" by certain Committee members and commenters. 5/ "Those concerns," the court explained, "include undermining the right to a jury trial and denying access to the courts." 6/ The next year, in DeLisle v. Crane Co., 7/ the court confirmed that the legislative switch to Daubert was purely procedural. Because the court did not bless it, the law was constitutionally ineffective.

Then last month, the court flip-flopped. It adopted the "Daubert amendments" under its "exclusive rule-making authority." 8/ Although the amendment to Rule 702 did not percolate through the rules committee a second time, the court decided that its earlier reservations about switching to Daubert "appear unfounded." 9/

Indeed, the arguments that the court considered "grave" two years ago are anything but. The two standards -- general scientific acceptance (Frye) and evidentiary reliability encompassing scientific validity (Daubert) -- each seek to screen out expert evidence that is insufficiently validated to warrant its use in court in light of the danger that it will be given too much weight. One standard (Daubert) asks judges to assess directly the validity of scientific theories. The other (Frye) has them do so indirectly, by looking only for a consensus in the scientific community. This difference in the mode of analysis does not make one approach constitutional and the other unconstitutional. Daubert does not create an inherently more demanding test than Frye. 10/ It describes more criteria for answering the same underlying question -- is the proposed evidence probative enough to come in as "science" (or some other form of expertise).

Certainly, there is room to debate the relative merits of the two approaches -- and room for different jurisdictions to go their own ways -- but the choice between Daubert and Frye (or other reasonable standards) does not pose a serious constitutional question.

  1. 509 U.S. 579 (1993).
  2. Marsh v. Valyou, 977 So.2d 543, 547 (Fla. 2007) (holding that the "general acceptance" standard fashioned in Frye v. United States, 293 F. 1013 (D.C.Cir.1923), and expressly adopted in Florida in Bundy v. State, 471 So.2d 9, 18 (Fla.1985), and Stokes v. State, 548 So.2d 188, 195 (Fla.1989), does not even apply to "pure opinion" testimony "causally linking trauma to fibromyalgia ... based on the experts' experience and training").
  3. Perez v. Bell So. Telecommunications, Inc., 138 So.3d 492, 497 (Fla. Dist. Ct. App. 2014). The phrases "evidentiary reliability" and "scientific validity" appear in the Daubert opinion.
  4. DeLisle v. Crane Co., 258 So.3d 1219 (Fla. 2018).
  5. In re Amendments to Florida Evidence Code, 210 So.3d 1231, 1239 (Fla. 2017).
  6. Id.
  7. 258 So.3d 1219 (Fla. 2018).
  8. In re Amendments to the Florida Evidence Code, No. SC19-107, 2019 WL 2219714 (Fla. May 23, 2019). Thanks are due to Ed Imwinkelried for calling the case to my attention.
  9. Id.
  10. The Florida Supreme Court had previously written that Frye imposed a "higher standard of reliability" than the "more lenient standard" in Daubert. Brim v. State, 695 So.2d 268, 271–72 (Fla. 1997). It is tempting to ask how Daubert's "more lenient" reliability requirement could be unconstitutional when Frye's more exacting standard is constitutionally sound. I suppose one could argue that because Frye (as construed in Florida) does not bar "pure opinion" testimony that has not been shown to be scientifically reliable, it has less of an impact on "access to the courts." However, as discussed in The New Wigmore on Evidence: Expert Evidence (2d ed. 2011), the "pure opinion" exception to either Frye or Daubert is untenable.

Sunday, May 19, 2019

Shoeprints in Indiana: Confronting a "Skilled Witness" with the PCAST Report

Last week, in Hughes v. State, 1/ the Indiana Court of Appeals wrote an opinion on the admissibility of shoeprint evidence and a defense attempt to present part of the 2016 PCAST report on feature-matching evidence. Mark Adrian Hughes was convicted of breaking into two newly constructed homes and stealing the appliances in them. "Sean Matusko, a forensic scientist with the ISP laboratory's latent-print unit" 2/ testified "that shoeprints found at both crime scenes were made by Hughes's shoes." The trial court overruled defendant's objection to this testimony and barred him from introducing into evidence a part of the PCAST report and from cross-examining Matusko about the content of the report. It reasoned that Matusko was a "skilled witness" but not an expert one (preventing cross-examination) and that the report was hearsay (preventing its use as evidence).

The unpublished court of appeals opinion, penned by Judge Robert R. Altice, Jr., reversed defendant's convictions, but not because of these rulings. The appellate court determined that the prosecutor improperly introduced evidence of earlier, similar crimes. In remanding the case for a new trial, the court of appeals also discussed the shoeprint rulings. Its analysis is puzzling. The court wrote that
Hughes challenges the trial court's treatment of the State's shoeprint examiner, Matusko, as a skilled witness. Here, Matusko did not simply testify based on his personal experience ... . Rather, ... Matusko identified himself as a forensic scientist assigned to the latent print identification unit of the Indiana State Police, set out his academic background, detailed his training with regard to shoeprint identification, and explained in detail the process he used to identify shoeprints at both crime scenes as being made by Hughes's shoes. [O]ur Supreme Court has indicated that it is not inclined to consider all testimony relating to shoeprint identification to be opinion testimony governed by Evid. R. 702. In light of such precedent and our standard of review, we cannot say that the trial court abused its discretion in admitting Matusko's testimony under Evid. R. 701.
It is inconceivable that a witness who represents himself as a scientist applying a process with which lay jurors are unfamiliar and thereby deducing that a specific pair of shoes left the impressions is not testifying as an expert under Rule 702. He was not there, and he did not see what happened. If he knows anything about the source of the shoeprints, it is because he possesses special knowledge and skill beyond the ken of ordinary witnesses. Indiana Rule of Evidence 702(a) governs all witnesses with "specialized knowledge" who rely on their unusual "knowledge, skill, experience, training, or education [to] help the trier of fact to understand the evidence or to determine a fact in issue." Rule 701, on the other hand, governs opinions from "lay witnesses." It limits them to inferences that would be difficult or tedious to present as more primitive statements of the details the witness perceived. 3/ The division these rules create reflects an ancient distinction in the common law between ordinary fact witnesses -- the Rule 701 category -- and expert witnesses -- the Rule 702 group.

To be sure, a witness sometimes can testify in both capacities. A physician can be an ordinary fact witness in part -- "I saw that the patient was having trouble breathing" -- and a skilled witness in part -- "My diagnosis was pneumonia." But the latter opinion must satisfy Rule 702 to be admissible, and the doctor is subject to cross-examination to suggest that his or her diagnosis is unfounded. If a medical journal states that the diagnosis is not warranted without additional symptoms, the doctor can be asked about that as long as "the publication is established as a reliable authority" under Rule 803(18)(c), for "learned treatises."

Similarly, Matusko could testify -- as an ordinary fact witness under Rule 701 -- that the shoes he was asked to compare to the shoeprints were "Nike Air Jordan athletic shoes with a Jumpman logo molded into the soles." But Rule 701 would not let him speak as a "skilled witness" giving an opinion as to the origin of the shoeprints based on his special skill as a shoeprint examiner. That task always has been reserved for expert witnesses. 4/

Now there might be a reason to present an expert (who does not appear as a "scientist") as merely a "skilled" witness. Indiana Rule of Evidence 702(b) codifies the rule of heightened scrutiny for scientific expert testimony articulated by the U.S. Supreme Court for the federal courts in Daubert v. Merrell Dow Pharmaceuticals. 5/ Like Daubert, Indiana Rule 702 specifies that "[e]xpert scientific testimony is admissible only if the court is satisfied that the expert testimony rests upon reliable scientific principles."

Whether Matusko's source attribution can pass that bar is doubtful. At least, the scientists and engineers on the President's Council of Advisors doubted it. They concluded that identifying a particular shoe as the source of a print has yet to be scientifically validated. Because "[t]he entire process—from choice of features to include (and ignore) and the determination of rarity—relies entirely on an examiner’s subjective judgment," PCAST wanted to see studies that tested the performance of criminalists with impressions from the same shoes and from different shoes. Because no such studies exist, PCAST reported that
there are no appropriate empirical studies to support the foundational validity of footwear analysis to associate shoeprints with particular shoes based on specific identifying marks (sometimes called “randomly acquired characteristics"). Such conclusions are unsupported by any meaningful evidence or estimates of their accuracy and thus are not scientifically valid.
The last sentence, if correct, means that the shoeprint testimony cannot be introduced against Hughes on retrial -- not if the witness claims to be acting as a scientist applying a scientific procedure. Furthermore, even if the police criminalist distances himself from scientific titles and airs, the court must decide whether footwear identification is reliable enough to qualify as a different form of expert testimony. The absence of scientific validation will not be determinative, but it is a relevant factor.

Let's assume, though, that the trial court decides that a highly "de-scientized" version of Mr. Matusko's opinion is admissible under Rule 702. Can he be impeached with a scientific report? The Indiana Court of Appeals thought so. A footnote states that
As set out above, Hughes sought but was denied the opportunity to cross-examine Matusko with findings set out in the PCAST publication concerning reliability of shoeprint identification. Although we did not reach the merits of the admissibility of the PCAST publication, cross-examination regarding the findings therein was permissible regardless of whether Matusko was an expert or a skilled witness.
If correct, this conclusion -- that a criminalist can be impeached by confronting him with the PCAST report -- would be a boon to defendants across the nation. Without the trouble and expense of calling an expert witness, defense counsel can wave the devastating critique in front of the witness (and the jury). But the rule on introducing "learned treatises" requires proof that the material is authoritative before it can be used for impeachment. 6/ This limitation makes sense because, unlike impeachment by self-contradiction, the out-of-court statement -- what the President's advisors had to say -- has value only to the extent that it is true. Thus, the document is hearsay.

Nevertheless, even without an expert to establish that PCAST is a reliable authority, the report could be admissible over a hearsay objection. It is, after all, a government report. The public records exception to the rule against hearsay extends to "factual findings from a legally authorized investigation," 7/ as long as "neither the source of information nor other circumstances indicate a lack of trustworthiness." 8/ Law enforcement groups have loudly proclaimed that this particular report is not trustworthy, but much of the criticism is more reflexive than reasoned. Very little of it focuses on footwear analysis. 9/

  1. No. 18A-CR-1007, 2019 WL 2094045 (Ind. Ct. App. May 14, 2019) (unreported, available at
  2. The witness is featured in an educational state police YouTube video.
  3. Indiana Rule of Evidence 701 applies to "Opinion Testimony by Lay Witnesses." It provides that "If a witness is not testifying as an expert, testimony in the form of an opinion is limited to one that is: (a) rationally based on the witness's perception; and (b) helpful to a clear understanding of the witness's testimony or to a determination of a fact in issue."
  4. Indeed, in Buchman v. State, 59 Ind. 1, 26 Am.Rep. 75 (Ind. 1877), the Indiana Supreme Court held that a physician could not be compelled to testify to a professional opinion without special compensation. Its opinion used the words "skilled witness" and "expert" as synonyms.
  5. 509 U.S. 579 (1993).
  6. David H. Kaye, David Bernstein & Jennifer L. Mnookin, The New Wigmore on Evidence: Expert Evidence ch. 5 (2d ed. 2011).
  7. Ind. R. Evid. 803(8)(A)(i)(c).
  8. Ind. R. Evid. 803(8)(A)(ii).
  9. The Department of Justice's more thoughtful disagreements with the report's general approach to ascertaining scientific validity are presented in Ted Robert Hunt, Scientific Validity and Error Rates: A Short Response to the PCAST Report, 86 Fordham L. Rev. Online 24 (2018). After the initial frosty reception of its report from prosecutors, police, and forensic practitioners, PCAST requested input from the forensic-science community for a second time. It then issued an addendum to its report. With regard to shoeprints, this document stated that
        In its report, PCAST considered feature-comparison methods for associating a shoeprint with a specific shoe based on randomly acquired characteristics (as opposed to with a class of shoes based on class characteristics). PCAST found no empirical studies whatsoever that establish the scientific validity or reliability of the method.
        The President of the International Association for Identification (IAI), Harold Ruslander, responded to PCAST’s request for further input. He kindly organized a very helpful telephonic meeting with IAI member Lesley Hammer. (Hammer has conducted some of the leading research in the field—including a 2013 paper, cited by PCAST, that studied whether footwear examiners reach similar conclusions when they are presented with evidence in which the identifying features have already been identified.)
        Hammer confirmed that no empirical studies have been published to date that test the ability of examiners to reach correct conclusions about the source of shoeprints based on randomly acquired characteristics. Encouragingly, however, she noted that the first such empirical study is currently being undertaken at the West Virginia University. When completed and published, this study should provide the first actual empirical evidence concerning the validity of footwear examination. The types of samples and comparisons used in the study will define the bounds within which the method can be considered reliable.
    An Addendum to the PCAST Report on Forensic Science in Criminal Courts, Jan. 2017, at 5-6.

Saturday, May 11, 2019

Likelihoods, Paternity Probabilities, and the Presumption of Innocence in People v. Gonis

In People v. Gonis, 1/ Illinois prosecuted Kenneth Gonis for sexual penetration with his daughter, T.G., when she was 16 years old. T.G. had two children. The first, J.G., was born when she was 17 years old; A.G. was born two years later. To investigate the sexual assault charge, the Illinois State Police Joliet laboratory conducted DNA tests of Gonis, T.G., and the two children. The lab sent the results to the Northeastern Illinois Regional Crime Laboratory for interpretation. That laboratory’s DNA technical leader, Kenneth Pfoser, “entered the DNA profiles into a computer containing a statistical calculator.” He learned that
  • “at least 99.9999% of the North American Caucasian/White men would be excluded as being the biological father of [J.G. and A.G.]”;
  • the “paternity index” with respect to J.G. was about 195,000,000 and with respect to A.G., it was 26,000,000; and
  • “the probability that defendant was the biological father of J.G. and A.G. was 99.9999%.”
In a bench trial before Judge Lance Peterson, the court admitted these findings and convicted Gonis. On appeal, Gonis argued that the trial court erred in denying a pretrial motion to exclude the DNA test results. In an opinion written by Justice Daniel Schmidt, Illinois’ intermediate court of appeals described the motion as asserting only that
[T]he tests were inconsistent with the presumption of innocence because a statistical formula used in the testing assumed a prior probability of paternity. Specifically, the motion alleged:
Assuming that the Northeastern Illinois Regional Crime Laboratory tested the DNA sample using widely accepted practices in the scientific community, said testing was conducted using a statistical mathematical formula. These formulae, as their basis, include a component to determine paternity which by its nature ‘assumes’ that sexual intercourse has in fact taken place.
In other words,
The motion alleged that to allow such paternity test results would violate the presumption of innocence because “the state would be allowed to introduce statistical evidence presuming sexual intercourse, in order to prove an act of sexual intercourse.”
The argument is fallacious for three reasons. First, the probability pertains to the chance that the child was conceived by the mother and the accused man. Conception—the fertilization of an ovum—can occur without penetration. In the hearing on the motion to exclude, the technical leader referred to artificial insemination, but insemination, and hence pregnancy, can occur without penetration through natural mechanisms as well.

Second, even if conception without penetration were not merely improbable but impossible, it would not follow that a probability of paternity presumes penetration. After all, a probability is not a certainty. To say that an electron has a probability Ψ*ΨdV of being located in a small volume dV is not to presume that the electron is actually located there. To say that the probability of an extended trade war between the U.S. and China is 0.5 (or some other number less than 1) does not presume that this event will occur. That the paternity probability for the defendant is 0.5 (or 0.99999, or any other number less than 1) also does not presume that the defendant truly is the source of the fertilizing spermatozoon.

Finally, the evidentiary aspect of the presumption of innocence merely directs the judge or jury not to use the fact of the indictment as evidence of guilt. The probabilities in question do not change depending on whether or not a man is indicted.

The opinion in Gonis seems to rely on the activity-level possibility of artificial insemination to reject the defendant's presumption-of-innocence objection. It also comes close to recognizing the second rejoinder, for it states that "Logically, since Bayes's Theorem allowed for the possibility that defendant may not be the father of T.G.'s children, it did not assume that defendant necessarily had sexual intercourse with T.G."

But the court thought that the details of Bayes' Theorem rather than the very definition of probability made the computation compatible with the presumption of innocence. The opinion states that
Pfoser testified that Bayes's Theorem was a likelihood ratio based on two competing hypotheses: (1) defendant was the father, or (2) a random, unrelated individual was the father. Pfoser stated that Bayes's Theorem took “the assumed probability that the person in question is the father of the child” and divided it “by the probability that some unrelated person within the same race group in the general population is the father of the child.” Thus, Pfoser's testimony indicated that Bayes's Theorem posited that either defendant or an individual other than defendant could have been the father of T.G.'s children. Logically, since Bayes's Theorem allowed for the possibility that defendant may not be the father of T.G.'s children, it did not assume that defendant necessarily had sexual intercourse with T.G.
Although a likelihood ratio appears in Bayes' rule, that is not all there is to it, and the description of how the rule works is garbled. The probability that the defendant is the father is not obtained by dividing a probability that he is the father by the probability that an unrelated man is the father. If the expert knew the probability that an unrelated man is the father (and no other alternatives to the defendant's paternity were worthy of consideration), Bayes' rule would be superfluous. The probability not assigned to the random man is the defendant's probability, so if we have the random-man probability, all we need to do is subtract it from 1. What remains is the defendant's probability.

The technical leader used Bayes' rule because he did not know the probability that a random man was the father. Let’s look at his explanation of the computation, as presented by the appellate court. The court starts by recounting that
Pfoser testified that DNA paternity testing had three components. The first component of the test involved an exclusion analysis where Pfoser entered the DNA profiles into a computer containing a statistical calculator. If there were any inconsistencies between the alleged father and the child, the computer would give a result of “0” for paternity index.
Apparently, each child shared at least one allele per locus with the defendant, so the computer program did not report an approximate probability of zero, 2/ and the opinion continued:
The next stage involved the calculation of the paternity index, which was a formula used to determine “the likelihood that the assumed alleged father in question is in fact a father as opposed to a random individual that's unrelated in the general population.”
If "likelihood" has the technical meaning of statistical "support" for a hypothesis, this statement could be literally true. But if "likelihood" means probability, as the court evidently and understandably thought, then the explanation is either meaningless or misleading. There is no probability that the defendant is the father "as opposed to a random individual that's unrelated." There is a probability that the defendant is the father (as opposed to everyone else in the population, given all the evidence in the case). And, the paternity index is not even a probability, let alone that one. It is a ratio of two different probabilities. As the court wrote, "the paternity index is the ratio of 'the probability of the alleged father transmitting the alleles and the probability of selecting these alleles at random from the gene pool.'" (quoting Ivey v. Commonwealth, 486 S.W.3d 846, 851 (Ky. 2016) (quoting D.H. Kaye, The Probability of an Ultimate Issue: The Strange Cases of Paternity Testing, 75 Iowa L. Rev. 75, 89 (1989))).

The important aspect of the paternity index is that it is a likelihood ratio that expresses the support that the reported DNA profiles of the mother-child-defendant trio provide (if correctly determined) for the hypothesis that the defendant is the biological father relative to the hypothesis that an unrelated man is the father. The idea is that if the profiles are some number L times more probable under one hypothesis than the other, then they support that hypothesis L times more than they support the alternative. This ratio does not assume that one hypothesis is true and the other false. Rather, it treats both hypotheses as equally worthy of consideration and addresses the probability of the evidence when each one is considered. Thus, the use of the ratio to describe the strength of the evidence for the better supported hypothesis does not conflict with the presumption of innocence. Had the expert simply given the paternity index and spoken of relative support, the defendant's objection would have had even less traction than it did.
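To make the likelihood-ratio reading concrete, here is a minimal sketch of a single-locus paternity index under textbook assumptions. This is my own hypothetical illustration, not the laboratory's actual software or the figures in Gonis: it assumes simple Mendelian transmission, so an alleged father who is heterozygous for the obligate paternal allele transmits it with probability 1/2, while a random unrelated man contributes it with probability equal to its population frequency.

```python
from math import prod

def single_locus_pi(allele_freq: float, father_heterozygous: bool = True) -> float:
    """Paternity index at one locus: the probability that the alleged father
    transmits the obligate paternal allele, divided by the probability that a
    random unrelated man would contribute it (its population frequency)."""
    transmit = 0.5 if father_heterozygous else 1.0  # Mendelian transmission
    return transmit / allele_freq

# A heterozygous alleged father and an obligate allele with frequency 0.10:
print(single_locus_pi(0.10))         # 5.0 at this one locus

# Assuming independence across loci, single-locus indices multiply:
print(prod([5.0, 12.5, 8.0, 20.0]))  # 10000.0 for four such loci
```

Because the indices multiply across a dozen or more independent STR loci, combined paternity indices in the tens or hundreds of millions, like those reported in Gonis, are unsurprising.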

But the technical leader did not describe the paternity index in this “likelihoodist” way. Instead, to quote from the opinion,
Pfoser testified that the third component of DNA paternity testing converted the paternity index into a probability of paternity percentage using a statistical, mathematical formula called “Bayes' Theorem.” Pfoser explained:
“Bayes' Theorem is essentially a basis for a likelihood ratio. Like I kind of described before, you're basing it on two conflicting hypotheses or two conflicting assumptions. One is that the individual in question is in fact the father as opposed to a completely random unrelated individual could be the father.”
Pfoser further explained:
“[S]o you're taking two, essentially two, calculations, one calculation is * * * the prior probability or the assumed probability that the person in question is the father of the child and that is divided by the probability that some unrelated person within the same race group in the general population is the father of the child.”
And "Pfoser testified that the prior probability of paternity was set at 50%."

As the earlier remarks on Bayes' rule indicate, this explanation of Bayes' rule cries out for corrections at every turn, but now I will just focus on the last sentence because the concept of a prior probability is what triggers worries about the presumption of innocence. The idea of a prior probability is intuitive but not easily mapped onto the legal setting. If I want to infer whether a furry animal that I glimpsed running outside my window is a squirrel (as opposed to a groundhog, a rabbit, a chipmunk, a possum, a cat, a skunk, a bear, or any other furry critter in this neck of the woods), I can start by asking how often various creatures go by. Based on my past observations, I would order the possibilities as squirrel, chipmunk, rabbit, groundhog, cat, and so on. If squirrels account for half of the past sightings, I might select 50% for the probability of a squirrel. This is my prior probability.

Now I think about the details of what I saw in the periphery of my vision. How small was it? What color? Did it seem to have short legs? Was it scurrying or hopping? To the extent that the set of characteristics I was able to discern are more probable for squirrels than for other creatures, I should adjust my probability upwards to arrive at my posterior probability.

Bayes’ rule is a prescription for making the adjustment. It instructs us to multiply the prior odds by the likelihood ratio. Then, voilà, the posterior odds emerge. Suppose my likelihood ratio is 3. I think the characteristics I perceived are 3 times as probable when a squirrel zips by than when the average non-squirrel does. 3/ If the prior probability is ½, the prior odds are 1 to 1, and the posterior odds are 3 × 1:1 = 3:1. Odds of 3:1 correspond to a posterior probability of 3/(3+1) = 3/4. Following Bayes’ rule, I moved from a prior probability of ½ to a posterior of 3/4.
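The squirrel arithmetic can be sketched in a few lines. This is a generic illustration of the odds form of Bayes' rule, with the numbers taken from the example above:

```python
def posterior_probability(prior_prob: float, likelihood_ratio: float) -> float:
    """Odds form of Bayes' rule: posterior odds = likelihood ratio x prior odds,
    then convert the posterior odds back to a probability."""
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1 + posterior_odds)

# Prior probability 1/2 (prior odds 1:1) and a likelihood ratio of 3
# yield posterior odds 3:1, i.e., a posterior probability of 3/4:
print(posterior_probability(0.5, 3))  # 0.75
```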

The expert in Gonis arrived at his posterior probability of paternity by making up a set of prior odds — he chose 1:1 — for defendant’s paternity and multiplying them by the paternity index. This looks like a Bayesian calculation. 4/ But in the squirrel-sighting case, there was an empirical basis for the prior odds. I know something about the animals in my neighborhood. The DNA technical leader apparently offered no such justification for his choice of the same number. And how could he? His expertise does not extend to the sexual and criminal conduct of the defendant and everyone else in the male population. The judge or jury, not the DNA profiling expert, is supposed to consider the nongenetic evidence in the case and to rely on its general background information in processing the totality of the evidence in the case to reach its best verdict.

In Gonis, the trial judge, who was the factfinder in the case, was explicit about why he found the prior probability of ½ to be acceptable:
The court noted that the cases cited by the State explained why “the .5 number presumption that they start off with is actually just a truly neutral number. It assumes the same likelihood that the defendant was not the father of the child as it does that he would be the father of the child.”
This rationale is specious. For a Bayesian, starting with a probability of ½ amounts to believing, before learning about the DNA profiles, that the defendant owns half the probability and that the other half is distributed across everyone else in the population. Maybe the other evidence in the case would justify that belief, but it hardly seems “neutral” toward the defendant. It treats him very differently from every other man in the population. The more “neutral” position might be to assign the same per capita probability to everyone, including the defendant, and then make adjustments according to the specifics of the case.

The appellate court took no stand on whether the trial court’s conception of neutrality was scientifically or legally tenable. Construing the defendant’s objection narrowly, the court did "not reach the issue of whether a 50% prior probability is a neutral number."

A bona fide Bayesian procedure would be to display the posterior probability for many values of the prior probability. This “variable prior odds approach” avoids the need for the expert to tell the judge or jury which prior probability is correct. 5/

That said, the uncontested likelihood ratios in Gonis, as Justice Schmidt observed, would swamp most prior probabilities. Even if we regarded all the men in the Chicago metropolitan area as equally likely, a priori, to have fathered the two children, the posterior odds of paternity still would be substantial. There are fewer than five million men (of all ages) living in the metropolitan area. So the per capita prior odds are 1:5 million. For the likelihood ratios of 195 million and 26 million, the posterior odds would be more than 39:1 for the paternity of J.G. and 5:1 for the paternity of A.G.
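Those figures are easy to check. In the sketch below, the five-million figure is the rough population bound used above, not a number from the opinion:

```python
def posterior_odds(prior_odds: float, likelihood_ratio: float) -> float:
    """Posterior odds under the odds form of Bayes' rule."""
    return likelihood_ratio * prior_odds

prior = 1 / 5_000_000  # per capita prior odds of roughly 1 to 5 million
for child, lr in [("J.G.", 195_000_000), ("A.G.", 26_000_000)]:
    odds = posterior_odds(prior, lr)
    print(f"{child}: posterior odds {odds:.1f}:1, probability {odds/(1+odds):.3f}")
# J.G.: posterior odds 39.0:1, probability 0.975
# A.G.: posterior odds 5.2:1, probability 0.839
```

Even this deliberately skeptical prior leaves the probability of paternity at 97.5% for J.G. and about 84% for A.G.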

  1. 2018 IL App (3d) 160166, No. 3-16-0166, 2018 WL 6582850 (Ill. App. Ct. Dec. 13, 2018).
  2. Particularly at a single locus, an exclusion does not mean that the probability of paternity is strictly zero. Mutations at some of the STR loci are known to occur at nonzero rates.
  3. The phrasing about an "average non-squirrel" is imprecise. There are n+1 mutually exclusive hypotheses H0, H1, H2, ..., Hn, about the animal. Each Hj has a prior probability Pr(Hj) and a likelihood Pr(E|Hj). Let H0 be the squirrel hypothesis. The appropriate factor for the multiplication of the prior odds is the squirrel likelihood Pr(E|H0) divided by a weighted average of the other likelihoods. The weight for each non-squirrel hypothesis Hj (j = 1, ..., n) is my prior probability on that hypothesis renormalized to reflect that it is conditional on ~H0. In other words, the Bayes factor is Pr(E|H0) × [1−Pr(H0)] divided by [Pr(H1) × Pr(E|H1) + ... + Pr(Hn) × Pr(E|Hn)].
  4. By limiting attention to an unrelated man as the only possible alternative, the technical leader was ignoring the terms in the denominator of the Bayes factor for possible related men. See supra note 3. As a result, the Bayesian interpretation he provided was not strictly correct.
  5. For discussions of such proposals and their reception in court and in the scholarly literature, see David H. Kaye, David E. Bernstein & Jennifer Mnookin, The New Wigmore on Evidence: Expert Evidence ch. 15 (2d ed. 2011) (updated annually).
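The multi-hypothesis Bayes factor described in note 3 can be sketched as follows; the animal priors and likelihoods here are invented purely for illustration:

```python
def bayes_factor(priors, likelihoods):
    """Bayes factor for hypothesis 0 against the prior-weighted average of the
    alternatives, per note 3:
    Pr(E|H0) * (1 - Pr(H0)) / [Pr(H1)*Pr(E|H1) + ... + Pr(Hn)*Pr(E|Hn)]."""
    numerator = likelihoods[0] * (1 - priors[0])
    denominator = sum(p * l for p, l in zip(priors[1:], likelihoods[1:]))
    return numerator / denominator

# Squirrel prior 0.5; three other critters split the remainder. The evidence
# is three times as probable under "squirrel" as under each alternative:
priors      = [0.5, 0.3, 0.15, 0.05]
likelihoods = [0.6, 0.2, 0.2, 0.2]
print(bayes_factor(priors, likelihoods))  # approximately 3.0
```

Multiplying this factor by the 1:1 prior odds reproduces the 3:1 posterior odds of the squirrel example in the text.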
Last updated: 16 May 2019, 1:20 PM