Forensic Science, Statistics & the Law
Commentary on news and publications at the intersections of scientific evidence, forensic science, and statistics. By DH Kaye.

<b>What's Uniqueness Got to Do with It?</b> (January 12, 2024)

<p>Columbia University has announced that "<b>AI Discovers That Not Every Fingerprint Is Unique</b>"! The subtitle of the <a href="https://www.engineering.columbia.edu/news/ai-discovers-not-every-fingerprint-unique" target="_blank">press release</a> of January 10, 2024, boldly claims that</p>
<blockquote>
<div style="background-color: #ffe9ec; border: 1px solid black; padding-bottom: 4px; padding-left: 10px; padding-right: 5px; padding-top: 4px; padding: 4px 5px 4px 10px;">
Columbia engineers have built a new AI that shatters a long-held belief in forensics–that fingerprints from different fingers of the same person are unique. It turns out they are similar, only we’ve been comparing fingerprints the wrong way!
</div>
</blockquote>
<p><i>Forensic Magazine</i> immediately and uncritically rebroadcast (quoting verbatim without acknowledgment from the press release) the confused statements about uniqueness. According to the Columbia release and <i>Forensic Magazine</i>, "It’s a well-accepted fact in the forensics community that fingerprints of different fingers of the same person—or intra-person fingerprints—are unique and therefore unmatchable." <i>Forensic Magazine</i> adds that "Now, a new study shows an AI-based system has learned to correlate a person’s unique fingerprints with a high degree of accuracy."
</p>
<p>Does this mean that the "well-accepted fact" and "long-held belief" in uniqueness have been shattered or not? Clearly not. The study is about similarity, not uniqueness. In fact, uniqueness has essentially nothing to do with it. I can classify equilateral triangles drawn on a flat surface as triangles rather than as other regular polygons whether or not the triangles are each different enough from one another (uniqueness within the set of triangles) that I notice these differences. To say that objects "are unique and therefore unmatchable" is a non sequitur. A person's complete genome is probably unique to that individual, but forensic geneticists know that six-locus STR profiles are "matchable" to those of other individuals in the population. A cold hit in the U.K. database to a person who could not have been the source of the six-locus profile occurred long ago (as was to be expected given the random-match probabilities of the genotypes).
</p>
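<p>A rough, purely illustrative calculation shows why genome-level uniqueness does not prevent "matches" of partial profiles. Suppose, hypothetically, that a six-locus profile has a random-match probability of 1 in 40 million. A trawl through a database of 1 million unrelated profiles then yields an expected number of coincidental hits of roughly 1,000,000 × (1/40,000,000) = 0.025 per search, and after about 40 such searches the expected number of adventitious cold hits reaches one, even though every contributor's complete genome may well be unique.
</p>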
<p>Perhaps the myth that <a href="https://www.science.org/doi/10.1126/sciadv.adi0329" target="_blank">the study</a> shatters is that it is impossible to distinguish fingerprints left by <i>different fingers of the same individual</i> X from fingerprints left by <i>fingers of different individuals</i> (not-X). But there is no obvious reason why this would be impossible even if every print is distinguishable from every other print (uniqueness).
</p>
<p>The Columbia press release describes the study design this way:</p>
<blockquote>
[U]ndergraduate senior Gabe Guo ... who had no prior knowledge of forensics, found a public U.S. government database of some 60,000 fingerprints and fed them in pairs into an artificial intelligence-based system known as a deep contrastive network. Sometimes the pairs belonged to the same person (but different fingers), and sometimes they belonged to different people.<br />
<br />Over time, the AI system, which the team designed by modifying a state-of-the-art framework, got better at telling when seemingly unique fingerprints belonged to the same person and when they didn’t. The accuracy for a single pair reached 77%. When multiple pairs were presented, the accuracy shot significantly higher, potentially increasing current forensic efficiency by more than tenfold. </blockquote>
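<p>The press release does not say how the multiple-pair figures were computed, but a simple, hypothetical illustration shows why accuracy should climb as more pairs are used. If each pair independently yields a correct same-person-or-not call with probability 0.77, then the majority vote of three pairs is correct with probability 0.77<sup>3</sup> + 3(0.77<sup>2</sup>)(0.23) ≈ 0.87, and the majority vote of five pairs is correct with probability of roughly 0.92. The actual system presumably combines the pairwise scores in a more sophisticated way, but the qualitative point is the same: modest evidence from several nearly independent comparisons can add up to much stronger evidence.
</p>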
<p>The press release reported the following odd facts about the authors' attempts to publish their study in a scientific journal:</p>
<blockquote>
Once the team verified their results, they quickly sent the findings to a well-established forensics journal, only to receive a rejection a few months later. The anonymous expert reviewer and editor concluded that “It is well known that every fingerprint is unique,” and therefore it would not be possible to detect similarities even if the fingerprints came from the same person.<br />
<br />
The team ... fed their AI system even more data, and the system kept improving. Aware of the forensics community's skepticism, the team opted to submit their manuscript to a more general audience. The paper was rejected again, but [Professor Hod] Lipson ... appealed. “I don’t normally argue editorial decisions, but this finding was too important to ignore,” he said. “If this information tips the balance, then I imagine that cold cases could be revived, and even that innocent people could be acquitted.” ...<br />
<br />
After more back and forth, the paper was finally accepted for publication by <i>Science Advances</i>. ... One of the sticking points was the following question: What alternative information was the AI actually using that has evaded decades of forensic analysis? ... “The AI was not using ... the patterns used in traditional fingerprint comparison,” said Guo ... . “Instead, it was using something else, related to the angles and curvatures of the swirls and loops in the center of the fingerprint.”
</blockquote>
<p>Proprietary fingerprint matching algorithms also do not arrive at matches the way human examiners do. They "see" different features in the patterns and tend to rank the top candidates for true matches in a database trawl differently than the human experts. Again, however, these facts about automated systems neither prove nor disprove claims of uniqueness. And, theoretical uniqueness has little or nothing to do with the actual probative value of assertions of matches by humans, automated systems, or both.
</p>
<p>Although not directly applicable, the following report on "Limitations of AI-based predictive models," which I came across in a <a href="https://www.science.org/doi/10.1126/science.adn9412" target="_blank">weekly survey</a> of papers in <i>Science</i> the day after the publicity on the Guo et al. paper, is worth noting:</p>
<blockquote>
A central promise of artificial intelligence (AI) in health care is that large datasets can be mined to predict and identify the best course of care for future patients. Unfortunately, we do not know how these models would perform on new patients because they are rarely tested prospectively on truly independent patient samples. Chekroud et al. showed that machine learning models routinely achieve perfect performance in one dataset even when that dataset is a large international multisite clinical trial (see the Perspective by Petzschner). However, when that exact model was tested in truly independent clinical trials, performance fell to chance levels. Even when building what should be a more robust model by aggregating across a group of similar multisite trials, subsequent predictive performance remained poor. -- Science p. 164, 10.1126/science.adg8538; see also p. 149, 10.1126/science.adm9218
</blockquote>
<p><span style="font-size: x-small;">Note: This posting was last modified on 1/12/24 2:45 PM </span>
</p>

<b>SWGDE's Best Practices for Remote Collection of Digital Evidence from a Networked Computing Environment</b> (November 18, 2023)

<p><a href="https://www.swgde.org/documents/published-by-committee/forensics" target="_blank">SWGDE 22-F-003-1.0</a>, Best Practices for Remote Collection of Digital Evidence from a Networked Computing Environment, is a forensic-science standard proposed for inclusion on the Organization of Scientific Area Committees for Forensic Science (OSAC) <a href="https://www.nist.gov/organization-scientific-area-committees-forensic-science/osac-registry" target="_blank">Registry</a>—"a repository of selected published and proposed standards … to promote valid, reliable, and reproducible forensic results." <br /></p>
<p>The best practices “may not be applicable in all circumstances.” In fact, “[w]hen warranted, an examiner may deviate from these best practices and still obtain reliable, defensible results.” I guess that is why they are called best practices rather than required practices. But what circumstances would justify using anything but the best practices? On this question, the standard is silent. It merely says that “[i]f examiners encounter situations warranting deviation from best practices, they should thoroughly document the specifics of the situation and actions taken.” </p><p>Likewise, the best practices for “preparation” seem rather rudimentary. “Examiners should ascertain the appropriate means of acquiring data from identified networked sources.” No doubt, but how could they ever prepare to collect digital information without ascertaining how to acquire data? What makes a means “appropriate”? All that a digital evidence expert can glean from this document is that he or she “should be aware of the limitations of each acquisition method and consider actions to mitigate these limitations if appropriate” and should consider “methods and limitation variables as they relate to various operating systems.” How does such advice regularize or improve anything?</p>
<p>Same thing with a recommendation that “[p]rior to the acquisition process, examiners should prepare their destination media”? What steps for preparing the destination media are best? Well, “[s]terilization of destination media [whatever the process of “sterilization” is in this context] is not generally required.” But it is required “when needed to satisfy administrative or organizational requirements or when a specific analysis process makes it a prudent practice.”
When would sterilization be prudent? The drafters do not seem to be very sure. “[E]xaminers may need to sanitize destination media provided to an external recipient to ensure extraneous data is not disclosed.” Or maybe they don’t? “Examiners may also be required to destroy copies of existing data to comply with legal or regulatory requirements.” Few people would dispute that the best practice is to follow the law, but examiners hardly need best practices documents from standards developing organizations to know that.</p>
<p>The standard is indeterminate when it comes to what it calls “triage”—“preview[ing] the contents of potential data sources prior to acquisition.” We learn that “[e]xaminers may need to preview the contents of potential data sources prior to acquisition” to “reduce the amount of data acquired, avoid acquiring irrelevant information, or comply with restrictions on search authority.” What amount of data makes "triage" a best practice? How does the examiner know that irrelevant information may be present? Why can "triage" sometimes be skipped? When is it desirable, and how should it be done? The standard merely observes that “[t]here may be multiple iterations of triage … .” When are multiple iterations advisable? Well, it “depend[s] on the complexity of the investigation.” Equally vague is the truism that “[e]xaminers should use forensically sound processes to conduct triage to the extent possible.” </p><p>Finally, designating steps like “perform acquisition” and “validate collected data” as “best practices” does little to inform examiners of how to collect digital evidence from a network. To be fair, a few parts of the standard are more concrete, and, possibly, other SWGDE standards fill in the blanks. But, on its face, much of this remote acquisition standard simply gestures toward possible best practices. It does not expound them. In this respect, it resembles other forensic-science standards that emerge from forensic-science standards developing organizations only to be criticized as vague at critical points.<br /></p>

<b>"Conditions Regarding the Use of SWGDE Documents"</b> (November 18, 2023)

<p>SWGDE is the Scientific Working Group on Digital Evidence. Its website describes it as a meta-organization—a group that “brings together organizations actively engaged in the field of digital and multimedia evidence to foster
communication and cooperation as well as to ensure quality and consistency within the forensic community.” Structured as a non-profit corporation, it solicits "your donations or
sponsorship." \1/ Its 70 “member organizations” consist of (by a quick and possibly error-prone categorization and count):
</p>
<ul>
<li>16 local, state, and federal police agencies; \2/</li>
<li>4 digital forensics software companies; \3/</li>
<li>18 training and consulting organizations; \4/ </li>
<li>6 prosecutors' offices; \5/</li>
<li>8 crime laboratories and coroners' or medical examiners' offices; \6/</li>
<li>3 major corporations; \7/</li>
<li>3 universities; \8/</li>
<li>A swath of federal executive agencies (or parts of them), including NASA, NIST, and the Departments of Defense, Homeland Security, Interior, Justice, Labor, and Treasury. \9/</li>
</ul>
<p>SWGDE has produced “countless academic papers,” although none are listed on its website. SWGDE "encourages the use and redistribution of our documents," but it regards them as private property. It states that "The Disclaimer and Redistribution policies (also included in the cover pages to each document) also establish what is considered SWGDE's Intellectual Property."</p>
<p>These policies are unusual, if not unique, among standards developing organizations. An IP lawyer would find it odd, I think, to read that admonitions such as the following are part of an author's copyright:</p>
<blockquote>Individuals may not misstate and/or over represent [sic] duties and responsibilities of SWGDE work. This includes claiming oneself as a contributing member without actively participating in SWGDE meetings; claiming oneself as an officer of SWGDE without serving as such ... .</blockquote>
<p>With respect to actual IP rights, SWGDE purports to control not only the specific expression of ideas—as allowed by copyright law—but all "information" contained in its documents—a claim that far exceeds the scope of copyright. It imposes the following "condition to the use of this document (and the information contained herein) in any judicial, administrative, legislative, or other adjudicatory proceeding in the United States or elsewhere":</p>
<blockquote>notification by e-mail before or contemporaneous to the introduction of this document, or any portion thereof, as a marked exhibit offered for or moved into evidence in such proceeding. The notification should include: 1) The formal name of the proceeding, including docket number or similar identifier; 2) the name and location of the body conducting the hearing or proceeding; and 3) the name, mailing address (if available) and contact information of the party offering or moving the document into evidence. Subsequent to the use of this document in the proceeding please notify SWGDE as to the outcome of the matter.</blockquote>
<p>As author (or otherwise), an SDO certainly can ask readers to do anything it would like them to do with its publications—and the SWGDE "conditions regarding use" do contain the phrase "the SWGDE requests." Even reformulating the paragraph as a polite request rather than a demand supposedly supported by copyright law, however, one might ask what legislative proceeding with a "formal name" would have a forensic-science standard "offered or moved into evidence." Impeachment and subsequent trial, I guess.</p>
<p><b>Notes</b></p>
<ol>
<li><span style="font-size: x-small;">Neither its full name nor its acronym turned up in a search of the IRS list of tax-exempt 501(c)(3) organizations, so donors seeking a charitable deduction on their taxes might need to inquire further.</span></li>
<li><span style="font-size: x-small;">As listed on the website, they are the Columbus, Ohio Police Department; Eugene Police Department; Florida Department of Law Enforcement (FDLE); Lawrence, KS Police Department; Johnson County, KS Sheriff's Office; Los Angeles County, CA Sheriff's Department; Louisville, KY Metro Police Department; Massachusetts State Police; Oklahoma State Bureau of Investigation; New York State Police; New York City Police Department (NYPD); Plano, TX Police Department; Seattle Police Department; Weld County, CO Sheriff's Office; US Department of Justice - Federal Bureau of Investigation (FBI); US Department of Homeland Security - US Secret Service (USSS); and the US Postal Inspection Service (USPIS).</span></li>
<li><span style="font-size: x-small;">Amped Software USA Inc.; AVPreserve; BlackRainbow; SecurCube.</span></li>
<li><span style="font-size: x-small;">National White Collar Crime Center (NW3C); Digital Forensics.US LLC / Veritek Cyber Solutions; MetrTech Consultancy; Midwest Forensic Consultants LLC; Hexordia; Forensic Data Corp; Forensic Video & Audio Associates, Inc; Laggui And Associates, Inc.; Loehrs Forensics; N1 Discovery; Precision Digital Forensics, Inc. (PDFI); Premier Cellular Mapping & Analytics; Primeau Forensics, Recorded Evidence Solutions, LLC; AVPreserve; LTD; BEK TEK; TransPerfect Legal Solutions; VTO Labs; Unique Wire, Inc</span></li>
<li><span style="font-size: x-small;">Adams County, CO District Attorney's Office; Burlington County, NJ Prosecutor's Office; Dallas County, TX District Attorneys Office; Middlesex County, NJ Prosecutor's Office; State of
Wisconsin Department of Justice; US Department of Justice - Executive Office
United States Attorney Generals Office.</span></li>
<li><span style="font-size: x-small;">City of Phoenix, AZ Crime Lab; Houston Forensic Science Center; Boulder County Coroner's Office; Miami-Dade County, FL, Medical Examiner Department; Virginia Department of Forensic Science; Westchester County, NY Forensic Lab; North Carolina State Crime Laboratory; and the US Department of Defense - Army Criminal Investigation Laboratory (Army CID).</span></li>
<li><span style="font-size: x-small;">Carrier Corporation; Target Corporation; and Walmart Stores Inc.</span></li>
<li><span style="font-size: x-small;">San Jose State University; University of Colorado Denver - National Center for Media Forensics (NCMF); University of Wisconsin Stevens Point.</span></li>
<li><span style="font-size: x-small;">NASA Office of Inspector General - Computer Crimes Division; National Institute of Standards and Technology; Treasury Inspector General for Tax Administration; US Department of Defense - Defense Cyber Crimes Center (DC3); US Department of Homeland Security - Homeland Security Investigations (HSI); US Department of Justice - Office of the Inspector General (DOJ OIG); US Department of Labor - Office of Inspector General (DOL OIG); US Department of the Interior - Office of the Inspector General (DOI OIG); US Department of Treasury - Internal Revenue Service (IRS); US Postal Service - Office of Inspector General (Postal OIG). Yet another organizational member is the Puerto Rico Office of the Comptroller, Division of Database Analysis, Digital Forensic and Technological Development.</span></li>
</ol>
<b>How Accurate Is Mass Spectrometry in Forensic Toxicology?</b> (September 27, 2023)

<p>Mass spectrometry (MS) is the "[s]tudy of matter through the formation of gas-phase ions
that are characterized using mass spectrometers by their mass, charge,
structure, and/or physicochemical properties." ANSI-ASB Standard 098 for Mass Spectral Analysis in Forensic Toxicology § 3.11 (2023). MS has become "the preferred technique for the confirmation of drugs, drug metabolites, relevant xenobiotics, and endogenous analytes in forensic toxicology." Id. at Foreword.
</p>
<p>
But no "criteria for the acceptance of mass spectrometry data have been ... universally applied by practicing forensic toxicologists." Id. Therefore, the American Academy of Forensic Sciences' Academy Standards Board (ASB) promulgated a "consensus based forensic standard[] within a framework accredited by the American National Standards Institute (ANSI)," id., that provides "minimum requirements." Id. § 1.
</p>
<p>
To a nonexpert reader (like me), the minimum criteria for the accuracy of MS "confirmation" are not apparent. Consider Section 4.2.1 on "Full-Scan Acquisition using a Single-Stage Low-Resolution Mass Analyzer." It begins with the formal requirement that
</p>
<blockquote>
[T]he following shall be met when using a single-stage low-resolution mass analyzer in full-scan mode.
<br />a) A minimum of a single diagnostic ion shall be monitored.
</blockquote>
<p>
It is hard to imagine an MS test method that would not meet the single-ion minimum. Perhaps what makes this requirement meaningful is that the one or more ions must be "diagnostic." However, this adjective begs the question of what the minimum requirement for diagnosticity should be. A "diagnostic ion" is a "molecular ion or fragment ion whose presence and relative abundance are characteristic of the targeted analyte." Id. § 3.4. So what makes an ion "characteristic"? Must it always be present (in some relative abundance) when the "targeted analyte" is in the specimen (at or above some limit of detection)? That would make the ion a marker for the analyte with perfect sensitivity: Pr(ion|analyte) = 1. Even so, it would not be characteristic of the analyte unless its presence is highly specific, that is, unless Pr(no-such-ion|something-else) ≅ 1. But the standard contains no minimum values for sensitivity, specificity, or the likelihood ratio Pr(ion|analyte) / Pr(ion|something-else), which quantifies the positive diagnostic value of a binary test. \1/
</p>
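<p>To make the point concrete, consider a hypothetical ion (the numbers are invented for illustration) that appears in 99% of spectra when the targeted analyte is present and in 2% of spectra of other compounds that might be encountered. Then the likelihood ratio for observing the ion is Pr(ion|analyte) / Pr(ion|something-else) = 0.99 / 0.02 ≈ 50. Whether a likelihood ratio of 50, 500, or 5,000 should be the floor for calling an ion "diagnostic" is exactly the sort of minimum the standard does not supply.
</p>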
<p>This is not to say that there are no minimum requirements in the standard. There certainly are. For example, Section 4.2.1 continues:
</p>
<blockquote>
b) When monitoring more than one diagnostic ion:<br />
1. ratios of diagnostic ions shall agree with those calculated from a concurrently analyzed
reference material given the tolerances shown in Table 1; OR<br />
2. the spectrum shall be compared using an appropriate library search and be above a pre-defined match factor as demonstrated through method validation.
</blockquote>
<p>
But the standard does not explain how the tolerances in Table 1 were determined. What are the conditional error probabilities that they produce?
</p>
<p>
Likewise, establishing a critical value for the "match factor" \2/ before using it is essential to a frequentist decision rule, but what are the operating characteristics of the rule? "Method validation" is governed (to the extent that voluntary standards govern anything) by ANSI-ASB 036, <a href="https://www.aafs.org/sites/default/files/media/documents/036_Std_e1.pdf" target="_blank">Standard Practices for Method Validation in Forensic Toxicology</a> (2019). This standard requires testing to establish that a method is "fit for purpose," but it gives no accuracy rates that would fulfill this vague directive.
</p>
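<p>To see what "operating characteristics" means here, consider a toy calculation (the score distributions below are invented; they do not come from ASB 098, ASB 036, or any validated library-search method). Given assumed distributions of match factors for spectra that do and do not contain the analyte, a pre-defined cutoff implies a sensitivity, a false-positive probability, and hence a likelihood ratio:
</p>
<pre>
# Illustrative sketch only: hypothetical match-factor distributions,
# not data from any validated MS method.
from statistics import NormalDist

same      = NormalDist(mu=850, sigma=40)   # assumed scores when the analyte is present
different = NormalDist(mu=700, sigma=60)   # assumed scores for other compounds
threshold = 800                            # a "pre-defined match factor"

sensitivity    = 1 - same.cdf(threshold)       # Pr(score >= cutoff | analyte)
false_positive = 1 - different.cdf(threshold)  # Pr(score >= cutoff | something else)

print(round(sensitivity, 3), round(false_positive, 3),
      round(sensitivity / false_positive, 1))  # roughly 0.894, 0.048, and an LR near 19
</pre>
<p>A validation study could, in principle, report such numbers for the cutoff it adopts; the standard does not ask for them.
</p>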
<p>
Firms that sell antibody test kits for detecting Covid-19 infections can no longer sell whatever they deem fit for purpose. In May 2020, the FDA stopped issuing emergency use permits for these diagnostic tests without validation showing that they "are 90% 'sensitive,' or able to detect coronavirus antibodies, and 95% 'specific,' or able to avoid false positive results." \3/ Forensic toxicologists do not seem to have proposed such minimum requirements for MS tests.
</p>
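<p>Such floors matter because sensitivity and specificity alone do not determine how often a positive result is correct. In a hypothetical population in which 5% of specimens truly contain the analyte (the prevalence is invented for illustration), a test that is 90% sensitive and 95% specific has a positive predictive value of (0.90 × 0.05) / (0.90 × 0.05 + 0.05 × 0.95) ≈ 0.49. In other words, even a test meeting the FDA's antibody-test floor would be wrong about half the time it returned a positive result in that setting, which is one reason minimum performance requirements are only a starting point.
</p>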
<p>
NOTES
</p>
<ol>
<li>Other toxicology standards refer to ASB 098 as if it indicates what is required to apply the label "diagnostic." ANSI/ASB 113, Standard for Identification Criteria in Forensic Toxicology, § 4.5.2 (2023) ("All precursor and product ions are required to be diagnostic per ASB Standard 098, Standard for Mass Spectral Data Acceptance in Forensic Toxicology (2022).").
</li>
<li>Section 3.13 defines "match factor" as a "mathematical value [a scalar?] that indicates the degree of similarity between an unknown spectrum and a reference spectrum."
</li>
<li><i>See</i> <a href="http://for-sci-law.blogspot.com/2020/05/how-do-forensic-science-tests-compare.html" target="_blank">How Do Forensic-science Tests Compare to Emergency COVID-19 Tests?</a>, Forensic Sci., Stat. & L., May 5, 2020 (quoting Thomas M. Burton, FDA Sets Standards for Coronavirus Antibody Tests in Crackdown on Fraud, Wall Street J., Updated May 4, 2020 8:24 pm ET, https://www.wsj.com/articles/fda-sets-standards-for-coronavirus-antibody-tests-in-crackdown-on-fraud-11588605373).
</li>
</ol>

<b>Use with Caution: NIJ's Training Course in Population Genetics and Statistics for Forensic Analysts</b> (September 18, 2023)

<p>The National Institute of Justice (<a href="https://nij.ojp.gov/about-nij" target="_blank">NIJ</a>) "is the research, development and evaluation agency of the U.S. Department of Justice . . . dedicated to improving knowledge and understanding of crime and justice issues through science." It offers a series of webpages and video recordings (a "training course") on <a href="https://nij.ojp.gov/events/population-genetics-and-statistics" target="_blank">Population Genetics and Statistics for Forensic Analysts.</a> The course should be approached with caution. I have not worked through all the pages and videos, but here are a few things that rang alarm bells:</p>
<table>
<tbody>
<tr>
<td colspan="2"><hr /></td>
</tr>
<tr>
<th>
NIJ's Training
</th>
<th>
Comment
</th>
</tr>
<tr>
<td colspan="2"><hr /></td>
</tr>
<tr>
<td style="padding: 8px; vertical-align: top;">
Many statisticians have employed what is known as Bayesian probability ... which is based on probability as a measure of one's degree of belief. This type of probability is conditional in that the outcome is based on knowing information about other circumstances and is derived from Bayes Theorem.
</td>
<td style="padding: 8px; vertical-align: top;">
Bayes' rule applies to both objective and subjective probabilities. Both types of probability include conditional probabilities. The "type of probability" is not derived from Bayes' Theorem.
</td>
</tr>
<tr>
<td colspan="2"><hr /></td>
</tr>
<tr>
<td style="padding: 8px; vertical-align: top;">
Conditional probability, by definition, is the probability P of an event A given that an event B has occurred. ... Take the example of a die with six sides. If one was to throw the die, the probability of it landing on any one side would be 1/6. This probability, however, assumes that the die is not weighted or rigged in any way, and that all of the sides contain a different number. If this were not true, then the probability would be conditional and dependent on these other factors.
</td>
<td style="padding: 8px; vertical-align: top;">
The "other factors" are nothing more than part of the description of the experiment whose outcomes are the events that are observed. They are not conditioning events in a sample space.
</td>
</tr>
<tr>
<td colspan="2"><hr /></td>
</tr>
<tr>
<td style="padding: 8px; vertical-align: top;">
The following equation can be used to determine the probability of the evidence given that a presumed individual is the contributor rather than a random individual in the population: LR = P(E/H<sub>1</sub>) / P(E/H<sub>0</sub>) ... . In the case of a single source sample, the hypothesis for the numerator (the suspect is the source of the DNA) is a given, and thus reduces to 1. This reduces to: LR = 1/ P(E/H<sub>0</sub>) which is simply 1/P, where P is the genotype frequency.
</td>
<td style="padding: 8px; vertical-align: top;">
The hypothesis for the numerator of a likelihood ratio is always "a given"--that is, it goes on the right-hand side of the expression for a conditional probability. So is the hypothesis in the denominator. Neither probability "reduces to 1" for that reason. Only if the "evidence" is the true genotype in both the recovered sample and the sample from the defendant can it be said that P(E|H<sub>1</sub>) = 1. In other words, to say that the probability of a reported match is 1 if the defendant is the source treats the probability of laboratory error as zero. That may be acceptable as a simplifying assumption, but the assumption should be made visible in a training course. (A numerical illustration appears below the table.)
</td>
</tr>
<tr>
<td colspan="2"><hr /></td>
</tr>
<tr>
<td style="padding: 8px; vertical-align: top;">
Although likelihood ratios can be used for determining the significance of single source crime stains, they are more commonly used in mixture interpretation. ... The use of any formula for mixture interpretation should only be applied to cases in which the analyst can reasonably assume "that all contributors to the mixed profile are unrelated to each other, and that allelic dropout has no practical impact."
</td>
<td style="padding: 8px; vertical-align: top;">
This limitation does not apply to modern probabilistic genotyping software!
</td>
</tr>
<tr>
<td colspan="2"><hr /></td>
</tr>
</tbody></table>
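<p>To put numbers on the laboratory-error point made in the table (the figures are invented for illustration): suppose the genotype frequency is P = 1 in 1 billion, but the chance that the laboratory erroneously reports a match (through sample mix-up, contamination, or mislabeling) is about 1 in 10,000. Then P(reported match | defendant is the source) is close to 1, while P(reported match | someone else is the source) is roughly 1/1,000,000,000 + 1/10,000, which is about 1/10,000. The likelihood ratio for the <i>reported</i> match is then closer to 10,000 than to 1 billion, so the simplifying assumption of error-free testing is far from innocuous.
</p>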
<b>Is "Match Form" Testimony Poor Form?</b> (September 18, 2023)

<p>
The likelihood ratio (LR) is essentially a number that expresses how many times more probable the <i>data</i> from an experiment are if one <i>hypothesis</i> is true than if another hypothesis is true. For example, suppose we make a single measurement of the height of a known individual. Then we do the same for an individual who is covered from head to foot by a sheet. We want to know if we have measured the same individual twice or two different individuals once. The closer the two measured heights are to one another, the more the measurements support the same-source hypothesis as opposed to the different-source hypothesis.
</p>
<p>
Why? Because closer measurements are more probable for same-source pairs than for different-source pairs. This implies that in repeated experiments with some proportion of same-source and different-source pairs, the closer measurements will tend to filter out the different-source pairs (which tend to have more distance between the two measurements) and to include more same-source pairs (which tend to be marked by the more similar measurements).
</p>
<p>
By quantifying the relative probability for the data given each hypothesis, the LR indicates how well a given degree of similarity discriminates between the hypotheses. Its value is
</p>
<p style="text-align: center;">LR = Probability(data | H<sub>1</sub>) / Probability(data | H<sub>2</sub>),
</p>
<p>where H<sub>1</sub> is the same-source hypothesis and H<sub>2</sub> is the different-source hypothesis.
</p>
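<p>A toy calculation can make the height example concrete. Every number below is invented for illustration: assume a single measurement has a normally distributed error with a standard deviation of 2 cm, and that true heights in the relevant population have a standard deviation of 7 cm. Then the difference between two measurements of the same person is spread much more tightly than the difference between measurements of two different people, and the likelihood ratio follows from the two density functions:
</p>
<pre>
# Minimal sketch of the height example; all numbers are hypothetical.
from statistics import NormalDist

measurement_sd = 2.0   # assumed sd of one height measurement (cm)
population_sd  = 7.0   # assumed sd of true heights in the population (cm)

# sd of the difference between the two measured heights under each hypothesis
same_sd = (2 * measurement_sd**2) ** 0.5                         # same person measured twice
diff_sd = (2 * measurement_sd**2 + 2 * population_sd**2) ** 0.5  # two different people

def likelihood_ratio(observed_difference_cm):
    numerator   = NormalDist(0, same_sd).pdf(observed_difference_cm)
    denominator = NormalDist(0, diff_sd).pdf(observed_difference_cm)
    return numerator / denominator

print(likelihood_ratio(1))    # close measurements: LR of about 3, favors same source
print(likelihood_ratio(15))   # far-apart measurements: LR near zero, favors different sources
</pre>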
<p>Likelihood ratios are routinely reported in cases with samples from crime scenes or victims that contain DNA from several individuals. A DNA analyst might testify that the electropherograms are ten thousand times more probable if the defendant's DNA is present than if an unrelated person's DNA is there. \1/ We may call such statements "relative-probability-of-the-data" testimony.
</p>
<p>But some DNA experts prefer what they call a "match form" for the presentation. \2/ An example of a "match form" statement is that “[a] match between the shoes … and [the defendant] is 9.67 thousand times more probable than a coincidental match to an unrelated African-American person.” \3/ More generally, a match-form presentation states that “a match between the evidence and reference [samples] is (some number) times more probable than coincidence.” \4/
</p>
<p>This formulation has been criticized as highly misleading. According to William Thompson, it is
</p>
<p style="margin-left: 40px; text-align: left;">likely to mislead lay people and foster misunderstandings that are detrimental to people accused of a crime. I recommend that Cybergenetics immediately cease using this misleading language and find a better way to explain its findings. Standards development organizations such as OSAC should consider developing standards that address the appropriateness, or inappropriateness, of such presentations. Courts
should refuse to admit PG [probabilistic genotyping] evidence when it is mischaracterized in this manner. Lawyers involved in cases in which defendants were convicted based on this misleading language should consider the appropriateness of appellate remedies. \5/
</p>
<p>
The main concern is that juxtaposing “match” and “coincidence” will lead judges and jurors to think that the "match statistic" pertains to the probabilities of hypotheses (H<sub>1</sub> and H<sub>2</sub>) about the source of the DNA rather than probabilities about the laboratory’s data. In simpler terms, the concern is that most people will understand "coincidence" and "coincidental match" as an assertion that the observed match is the result of coincidence; moreover, they will think that "match" is an assertion that the defendant is the matcher. If that happens, then the assertion that a match is 10,000 times more likely than coincidence would be (mis)understood as a statement that the odds against a coincidence having occurred are 10,000 to 1.
</p>
<p>
Instead, LR = 10,000 should be understood (according to Bayes' rule) as a statement about <i>the change</i> in the odds that defendant, as opposed to some unknown, unrelated person, is the matcher. For example, if defendant has a strong alibi—strong enough, in conjunction with other evidence, to establish that the prior odds of H<sub>1</sub> as opposed to H<sub>2</sub> are only 1 to 5,000—then this LR raises the odds to 10,000 x 1:5,000 = 2:1. Such final odds are far from overwhelming.
</p>
<p>
Cybergenetics does not seem disposed to abandon "match form" testimony. Dr. Thompson claims that for fingerprint comparisons, "'[m]atch' is shorthand for source identification, [s]o, it is predictable that many lay people will interpret the term 'match,' when used to describe DNA evidence, to mean that the person of interest has been identified either definitively or with a high degree of certainty as a contributor." Pointing to a dictionary, Cybergenetics angrily responds that this is just "Thompson’s private language." \6/ But a tradition in forensic science is to equate a "match" with an identification, as shown by the title of articles such as "Is a Match Really a Match? A Primer on the Procedures and Validity of Firearm and Toolmark Identification." \7/ In popular culture, the term may have a similar connotation. Perhaps <a href="https://www.youtube.com/watch?v=ScmJvmzDcG0" target="_blank">Youtube</a> trumps Merriam-Webster. \8/
</p>
<p>
As far as I know, no studies compare the comprehensibility of relative-probability-of-the-data testimony to match-form testimony. Therefore, the law and the practice have to be guided by intuition. My sense is that avoiding the transposition of the probabilities in a likelihood ratio requires special care if the match-versus-coincidence approach is used. The witness must explain not only that a "DNA match" is merely a degree of similarity between the electropherograms being compared, but also that "coincidence" or "coincidental match" is shorthand for the proposition that the "match" is a match to an unrelated person (or other specified source)—<i>and that it is not a conclusion that a coincidence has occurred</i>. The phrase "coincidental match" is too ambiguous to be left undefined.
</p>
<p>
In short, I am not sure that an absolute rule against match-form testimony is necessary, but I see no clear benefit to the phraseology. Relative-probability-of-the-data testimony seems to be a more straightforward description of a DNA likelihood ratio. However, it too needs explanation to reduce the risk of blindly transposing the conditional probabilities for the data into conditional probabilities for the hypotheses. Cases announcing that a likelihood ratio is a ratio of source-hypothesis probabilities are legion. \9/
</p>
<p>
<b>Notes</b>
</p>
<ol>
<li>
<i>Cf</i>. Commonwealth v. McClellan, 178 A.3d 874 (Pa. Super. Ct. 2018) ("[I]t was determined that the DNA sample taken from the gun's grip was at least 384 times more probable if the sample originated from Appellant and two unknown, unrelated individuals than if it originated from a relative to Appellant and two unknown, unrelated individuals").
</li>
<li>
Mark Perlin, Explaining the Likelihood Ratio in DNA Mixture Interpretation, <i>in</i> Proceedings of Promega's Twenty First International Symposium on Human Identification at 7 (Dec. 29, 2010); <i>cf</i>. Mark W. Perlin, Joseph B. Kadane & Robin W. Cotton, Match Likelihood Ratio for Uncertain Genotypes, 8 Law, Probability & Risk 289 (2009), https://doi.org/10.1093.
</li>
<li>
United States v. Anderson, No. 4:21-CR-00204, 2023 WL 3510823, at *3 (M.D. Pa. Apr. 26, 2023). For additional instances of “match form” testimony or reporting, see Howell v. Schweitzer, No. 1:20-cv-2853, 2023 WL 1785530 (N.D. Ohio Jan. 11, 2023); Sanford v. Russell, No. 17-13062, 2021 WL 1186495 (E.D. Mich. Mar. 30, 2021); State v. Anthony, 266 So.3d 415 (La. Ct. App. 2019).
</li>
<li>
Mark W. Perlin et al., TrueAllele Casework on Virginia DNA Mixture Evidence: Computer and Manual Interpretation in 72 Reported Criminal Cases, 9 PLOS ONE e92837, at 8 (2014).
</li>
<li>
William C. Thompson, Uncertainty in Probabilistic Genotyping of Low Template DNA: A Case Study Comparing STRMix™ and TrueAllele™, 68 J. Forensic Sci. 1049, 1059 (2023), doi:10.1111/1556-4029.15225.
</li>
<li>
Mark W. Perlin et al., Reporting Exclusionary Results on Complex DNA Evidence, A Case Report Response to 'Uncertainty in Probabilistic Genotyping of Low Template DNA: A Case Study Comparing Strmix™ and Trueallele®' Software 31 (May 18, 2023), available at SSRN: <a href="https://ssrn.com/abstract=4449313">https://ssrn.com/abstract=4449313</a> or <a href="http://dx.doi.org/10.2139/ssrn.4449313" target="_blank">http://dx.doi.org/10.2139/ssrn.4449313</a>.
</li>
<li>
Stephen G. Bunch et al., Is a Match Really a Match? A Primer on the Procedures and Validity of Firearm and Toolmark Identification, 11 Forensic Science Communications, No. 3 (2009), <a href="https://archives.fbi.gov/archives/about-us/lab/forensic-science-communications/fsc/july2009/review/2009_07_review01.htm" target="_blank">https://archives.fbi.gov/archives/about-us/lab/forensic-science-communications/fsc/july2009/review/2009_07_review01.htm</a>.
</li>
<li>
In addition, a dictionary definition of "match" (<a href="https://www.merriam-webster.com/dictionary/match">https://www.merriam-webster.com/dictionary/match</a>) is "a pair suitably associated." Suitable association suggests that a hypothesis about the nature of the association is true.
</li>
<li>
<i>E.g.</i>, State v. Pickett, 246 A.3d 279 (N.J. App. 2021) (The "likelihood ratio [is] a statistic measuring the probability that a given individual was a contributor to the sample against the probability that another, unrelated individual was the contributor.") (citing Justice Ming W. Chin et al., Forensic DNA Evidence § 5.5 (2020)).
</li>
</ol>

<b>No "Daubert Hearing" on Latent Fingerprint Matching in US v. Ware</b> (July 7, 2023)

<p>Last month, in <i>United States v. Ware</i>, 69 F.4th 830 (11th Cir. 2023), the U.S. Court of Appeals for the <a href="https://www.ca11.uscourts.gov/about-court" target="_blank">Eleventh Circuit</a> "carefully review[ed]" the convictions of Dravion Sanchez Ware arising out of a month-long crime spree near Atlanta in 2017. He was found to have participated "in robbing ... three spas, four massage parlors, a nail salon, and a restaurant." The opinion recounts the nine brutal robberies in luxuriant detail. It also discusses Mr. Ware's argument that the district court erred "by not holding a formal <i>Daubert</i> hearing before admitting expert fingerprint evidence."
</p>
<p>
In a word, the Eleventh Circuit rejected the argument as "unpersuasive." No surprise there. More surprising is the opinion's incoherent discussion of the 2009 NRC report on forensic science and the 2016 PCAST follow-up report. \1/ On the one hand, we are told that "[t]he science could not possibly have been so unreliable as to be inadmissible." On the other hand, "[t]he District Court here could have held a <i>Daubert</i> hearing to assess the relatively new reports Ware presented." So which is it? If a type of evidence cannot possibly be excluded as scientifically invalid under <i>Daubert</i>, how can it be proper to hold a pretrial testimonial hearing on admissibility under <i>Daubert</i>? And, was the court of appeals correct in concluding that the two reports do not impeach, to the point of requiring a hearing, the traditional practice of admitting latent fingerprint comparisons?
</p>
<p>
During <i>Ware</i>'s trial, an unnamed "crime lab scientist with the Georgia Bureau of Investigation Division of Forensic Sciences" "outlined the science behind fingerprints themselves, including their uniqueness" and explained the four-step process the lab follows ... : 'Analysis, Comparison, Evaluation, and Verification,' or ACEV.” The last step "involves another examiner completing the whole process a second time." The opinion does not indicate whether the verifying analyst is blinded to the knowledge of the main examiner's finding. Interestingly as well (think Confrontation Clause), the opinion implies that the testifying expert in <i>Ware</i> was not the main examiner. "[S]he was the verifying examiner," and "she testified that the lab concluded the latent print ... led to an identification conclusion matched to Ware's left middle finger." After that,
</p>
<blockquote>
Defense counsel specifically asked about the PCAST report [and] vigorously cross-examined ... discussing the possibility of a latent fingerprint not being usable ... , the subjectiveness of every step ... , and the bias that may creep into the verification process ... . The expert and defense counsel discussed ... the potential for false positives and negatives. On cross, the defense also attacked the expert's claim that she did not know of the Georgia Bureau of Investigation ever misidentifying someone with a fingerprint comparison, and that she did not know the rate at which a verifier disagrees with the original assessment.
</blockquote>
<p>
To preclude such testimony about his unique fingerprint on an item stolen in one of the robberies, Ware had moved before the trial for an order excluding fingerprint-comparison evidence. Of course, such a ruling would have been extraordinary, but the defense contended that the 2009 and the 2016 reports required nothing less \2/ and asked for a full-fledged pretrial hearing on the matter. In response, "[t]he District Court conditionally denied the motion ... unless Ware's counsel could produce before trial a case from this Court or a district court in this Circuit that favors excluding fingerprint expert evidence under <i>Daubert</i>." \3/</p>
<p>
The court of appeals correctly observed that "[f]ingerprint comparison has long been accepted as a field worthy of expert opinions in this Circuit, as well as in almost every one of our sister circuits." The only problem is that all the opinions cited to show this solid wall of precedent predate the NRC or the PCAST reports. A more complete analysis has to establish that the scientists' reviews of friction-ridge pattern matching do not raise enough of a doubt to expect that a hearing would let the defense breach the wall. </p><p>Along these lines, the court of appeals wrote that
</p>
<blockquote>
<div style="background-color: #ffe9ec; border-radius: 10px; border: 1px solid black; padding-bottom: 4px; padding-left: 10px; padding-right: 5px; padding-top: 4px; padding: 4px 5px 4px 10px;">
The [District] Court considered the reports and arguments presented and found that fingerprint evidence was reliable enough as a general matter to be presented to the jury. Many of the critiques of fingerprint evidence found in the PCAST report go to the weight that ought to be given fingerprint analysis, not to the legitimacy of the practice as a whole. Appellant Br. at 25 (“The studies collectively demonstrate that many examiners can, under <i>some</i> circumstances, produce correct answers at <i>some</i> level of accuracy.” (emphasis in original)).
</div>
</blockquote>
<p>This quotation from the PCAST report is faint praise. Although the court of appeals was sure that "Ware's contrary authority even says that fingerprint evidence can be reliable," the depth of its knowledge about the PCAST (and the earlier NRC committee) reports is open to question. The circuit court had trouble keeping track of the names of the groups. It transformed the National Research Council (the operating arm of the National Academies of Sciences, Engineering, and Medicine) into a "United States National Resource Council" (69 F.4th at 840) and then imagined an "NCAST report[]" (id. at 848). \4/
</p>
<p>
Deeper inspection of "Ware's contrary authority" is in order. The 2009 NRC committee report quoted with approval the searing conclusion of Haber & Haber that “[w]e have reviewed available scientific evidence of the validity of the ACE-V method and found none.” It reiterated the Habers' extreme claim that because "the standards upon which the method’s conclusions rest have not been specified quantitatively ... the validity of the ACE-V method cannot be tested." To be sure, the committee agreed that fingerprint examiners had something going for them. It wrote that "more research is needed regarding the discriminating value of the various ridge formations [to] provide examiners with a solid basis for the intuitive knowledge they have gained through experience." But does "intuitive knowledge" qualify as "scientific knowledge" under <i>Daubert</i>? Is a suggestion that friction-ridge comparisons need a more solid basis equal to a statement that the comparisons are "reliable" within the meaning of that opinion? The response to "NCAST" was underwhelming.
</p>
<p>
But research has progressed since 2009. The second "contrary authority," the PCAST report, reviewed this research. At first glance, this report supports the court's conclusion that no hearing was necessary. It assures courts that "latent fingerprint analysis is a foundationally valid subjective methodology." In doing so, it rejects the NRC committee's notion that the absence of quantitative match rules precludes testing whether examiners can reach valid conclusions. It discusses two so-called black-box studies of the work of examiners operating in the "intuitive" mode. Yet, the <i>Ware</i> court does not cite or quote the boxed and highlighted finding (Number 5).
</p>
<p>
Perhaps the omission reflects the fact that the PCAST finding is so guarded. PCAST added that "additional black-box studies are needed to clarify the reliability of the method," undercutting the initial assurance, which was "[b]ased largely on two ... studies." Furthermore, according to PCAST, to be "scientifically valid," latent-print identifications must be accompanied by admissions that "false positive rates" could be very high (as high as 1 in 18). \5/</p><p>The <i>Ware</i> court transforms all of this into a blanket and bland assertion that the report establishes reliability even though it "may cast doubt on the error rate of fingerprint analysis and comparison." The latter concern, it says, goes not to admissibility, but only to "weight" or "credibility." </p><p>Can it really be this simple? Are not "error rates" an explicit factor affecting admissibility (as well as weight) under <i>Daubert</i>? Certainly, the Eleventh Circuit's view that the problems with fingerprint comparisons articulated in the two scientific reports are not profound enough to force a wave of pretrial hearings is defensible, but the court's explanation of its position in <i>Ware</i> is sketchy.
</p>
<p>
At bottom, the problem with the fingerprint evidence introduced against Ware (as best as one can tell from the opinion) is not that it is speculative or valueless. The difficulty is that the judgments are presented as if they were <i>scientific</i> truths. The <i>Ware</i> court is satisfied because "Defense counsel put the Government's expert through his paces during cross-examination, and counsel specifically asked the expert about the findings in the PCAST report." But would it be better to moderate the presentations to avoid overclaiming in the first place? </p><p>The impending amendment to Rule 702 of the Federal Rules of Evidence is supposed to encourage this kind of "gatekeeping." Defense counsel might be more successful in constraining overreaching experts than in excluding them altogether. That too should be part of the "considerable leeway" granted to district courts seeking to reconcile expert testimony with modern scientific knowledge.
</p>
<p>
Notes
</p>
<ol>
<li>President's Council of Advisors on Sci. & Tech., Exec. Office of the President, <a href="https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/PCAST/pcast_forensic_science_report_final.pdf" target="_blank">Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods</a> (2016), [https://perma.cc/R76Y-7VU]
</li>
<li>
In <i>Ware</i>
<blockquote>
The pretrial motion to exclude the fingerprint identification "relied on the 2009 United States National Resource Counsel (“NRC”) report and subsequent 2016 President's Counsel of Advisors on Science and Technology (“PCAST”) report, which supposedly revealed a dearth of "proper scientific studies of fingerprint comparison evidence" and claimed that "there is no scientific basis for concluding a fingerprint was left by a specific person," positing that "because fingerprint analysis involves individual human judgement, the resulting [fingerprint comparison] conclusion can be influenced by cognitive bias."
</blockquote>
</li>
<li>
Why insist on a pre-existing determination in one particular geographic region that scientific validity is lacking in order to grant a hearing on whether scientific validity is present? Is the "science" underlying fingerprint comparisons in Georgia and the other southeastern states comprising the 11th Circuit different from that in the rest of the country?
</li>
<li>OK, these peccadillos are not substantive, but one would have thought that three circuit court judges, after "carefully reviewing the record," could have gotten the names and acronyms straight. <a href="https://en.wikipedia.org/wiki/Gerald_Bard_Tjoflat" target="_blank">Senior Judge Gerald Tjoflat</a> wrote the panel opinion. At one point, he was a serious contender for the Supreme Court seat filled by Justice Anthony Kennedy. After Judge Tjoflat announced that he would retire to senior status on the bench in 2019, President Donald Trump nominated <a href="https://en.wikipedia.org/wiki/Robert_J._Luck" target="_blank">Robert J. Luck</a> to the court. In addition to Judge Luck, Judge <a href="https://en.wikipedia.org/wiki/Kevin_Newsom" target="_blank">Kevin C. Newsom</a>, a 2017 appointee of President Trump, was on the panel. Judicial politics being what it is, over 30 senators voted against the confirmation of Judges Newsom and Luck. </li>
<li>PCAST suggested that if a court agreed that what it called "foundational validity" were present, then to achieve "validity as applied" some very specific statements about "error rates" would be required:
<blockquote>
Overall, it would be appropriate to inform jurors that (1) only two properly designed studies of the accuracy of latent fingerprint analysis have been conducted and (2) these studies found false positive rates that could be as high as 1 in 306 in one study and 1 in 18 in the other study. This would appropriately inform jurors that errors occur at detectable frequencies, allowing them to weigh the probative value of the evidence.
</blockquote>
The studies actually found conditional false-positive proportions of 6/3628 (0.17%) and 42/995 (4.2%, or 7/960 = 1.4% if one discards "clerical errors.") (P. 98, tbl. 1). Earlier postings discuss these FBI-Noblis and Miami-Dade Police Department numbers. A sketch of how the "1 in 306" and "1 in 18" figures can be reproduced from these counts appears after these notes.</li>
</ol>
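<p>How did PCAST get from the observed proportions in note 5 to "as high as 1 in 306" and "1 in 18"? Those figures appear to be upper confidence bounds rather than the observed proportions themselves. Assuming, as a guess about the method rather than a quotation from the report, that they are one-sided 95% Clopper-Pearson bounds, they can be reproduced approximately as follows:
</p>
<pre>
# Sketch: reproducing PCAST-style "could be as high as" figures from the published counts.
# The use of one-sided 95% Clopper-Pearson bounds is an assumption; consult the report
# for its exact method.
from scipy.stats import beta

def upper95(false_positives, comparisons):
    # one-sided 95% upper confidence bound for a binomial proportion
    return beta.ppf(0.95, false_positives + 1, comparisons - false_positives)

print(1 / upper95(6, 3628))   # FBI-Noblis fingerprint study: roughly 1 in 306
print(1 / upper95(42, 995))   # Miami-Dade study: roughly 1 in 18
</pre>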
<b>Maryland Supreme Court Resists "Unqualified" Firearms-toolmark Testimony</b> (June 24, 2023)

<p>
This week, the Maryland Supreme Court became the first state supreme court to hold, unequivocally, that a firearms-toolmark examiner may not testify that a bullet was fired from a particular gun without a disclaimer indicating that source attribution is not a scientific or practical certainty. \1/ The opinion followed two trials, \2/ two evidentiary hearings (one on general scientific acceptance \3/ and one on the scientific validity of firearms-toolmark identifications \4/) and affidavits from experts in research methods or statistics. The Maryland court did not discuss the content of the required disclaimer. It merely demanded that the qualified expert's opinion not be "unqualified." In addition, the opinions are limited to source attributions via the traditional procedure of judging presumed "individual" microscopic features with no standardized rules for concluding that the markings match.
</p>
<p>
The state contended that Kobina Ebo Abruquah murdered a roommate by shooting him five times, including once in the back of the head. A significant part of the state's case came from a firearms examiner for the Prince George’s County Police Department. The examiner "opined that four bullets and one bullet fragment ... 'at some point had been fired from [a Taurus .38 Special revolver belonging to Mr. Abruquah].'" A bare majority of four justices agreed that admission of the opinion was an abuse of the trial court's discretion. Three justices strongly disputed this conclusion. Two of the three opinions in the case included tables displaying counts or percentages from experiments in which analysts compared sets of either bullets or cartridge casings fired from a few types of handguns to ascertain how frequently their source attributions and exclusions were correct and how often they were wrong.
</p>
<p>
There is a lot one might say about these opinions, but here I attend only to the statistical parts. \5/ As noted below (endnotes 3 and 4), neither party produced any statisticians or research scientists with training or extensive experience in applying statistical methods. The court did not refer to the recent, burgeoning literature on "error rates" in examiner-performance studies. Instead, the opinions drew on (or questioned) the analysis in the 2016 report of the President's Council of Advisors on Science and Technology (PCAST). The report essentially dismissed the vast majority of the research on which one expert for the state (James Hamby, a towering figure in the firearms-toolmark examiner community) relied. These studies, PCAST explained, usually asked examiners to match a set of questioned bullets to a set of guns that fired them. </p><p>A dissenting opinion of <a href="https://en.wikipedia.org/wiki/Steven_B._Gould" target="_blank">Justice Steven Gould</a> argued—with the aid of a couple of probability calculations—that the extremely small number of false matches in these "closed set" studies demonstrated that examiners were able to perform much better than would be expected if they were just guessing when they lined up the bullets with the guns. \6/
</p>
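<p>The intuition behind that argument is easy to quantify with a made-up example (the figures are not Justice Gould's). In a closed-set design that asks an examiner to pair, say, 10 questioned bullets with the 10 guns that fired them, pure guessing would assign every bullet correctly with probability 1/10!, which is about 1 in 3.6 million, and would produce on average only one correct pairing out of the ten. Near-perfect scores on such tests therefore do show that examiners are far better than chance at sorting within the set.
</p>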
<p>That is a fair point. "Closed set" studies can show that examiners are extracting some discriminating information. But they do not lend themselves to good estimates of the probability of false identifications and false exclusions. For answering the "error rate" question in <i>Daubert</i>, they indicate lower bounds—the conditional error probabilities for examiners under the test conditions could be close to zero, but they could be considerably larger.
</p>
<p>
More useful experiments simulate what examiners do in cases like <i>Abruquah</i>—where they must decide whether a specific gun fired a questioned bullet that could have come from a gun outside any enumerated set of known guns. To accomplish this, the experiment can pair test bullets fired from a known gun with a "questioned" bullet and have the examiner report whether the questioned bullet did or did not travel through the barrel of the tested gun.
</p>
<p>The majority opinion, written by <a href="https://en.wikipedia.org/wiki/Matthew_J._Fader" target="_blank">Chief Justice Matthew Fader</a>, discussed two such experiments known as "Ames I" and "Ames II" (because they were done at the <a href="https://www.ameslab.gov/about-ames-laboratory" target="_blank">Ames National Laboratory</a>, "a government-owned, contractor-operated national laboratory of the U.S. Department of Energy, operated by and located on the campus of Iowa State University in Ames, Iowa"). The first experiment, funded by the Department of Defense and completed in 2014, "was designed to provide a better understanding of the error rates associated with the forensic comparison of fired cartridge cases." The experiment did not investigate performance with regard to toolmarks on the projectiles (the <a href="https://en.wikipedia.org/wiki/Cartridge_(firearms)" target="_blank">bullets</a> themselves) propelled from the cases, through the barrel of a gun, and beyond. Apparently referring to the closed-set kind of studies, the researchers observed that "[five] previous studies have been carried out to examine this and related issues of individualization and durability of marks ... , but the design of these previous studies, whether intended to measure error rates or not, did not include truly independent sample sets that would allow the unbiased determination of false-positive or false-negative error rates from the data in those studies." \7/
</p>
<p>However, their self-published technical report does not present the results in the kind of classification table that statisticians would expect. Part of such a table is <a href="http://for-sci-law.blogspot.com/2016/11/pcast-and-ames-study-will-real-error.html" target="_blank">on this blog</a>: \8/
</p>
<blockquote>
<div style="border: 1px solid black; padding-bottom: 4px; padding-left: 10px; padding-right: 5px; padding-top: 4px; padding: 4px 5px 4px 10px;">
The researchers enrolled 284 volunteer examiners in the study, and 218 submitted answers (raising an issue of selection bias). The 218 subjects (who obviously knew they were being tested) “made ... 15 comparisons of 3 knowns to 1 questioned cartridge case. For all participants, 5 of the sets were from known same-source firearms [known to the researchers but not the firearms examiners], and 10 of the sets were from known different-source firearms.” <u>3</u>/ Ignoring “inconclusive” comparisons, the performance of the examiners is shown in Table 1.<br />
<br />
<center>
<table border="1">
<tbody>
<tr>
<td align="center" colspan="4">Table 1. Outcomes of comparisons<br />
(derived from pp. 15-16 of Baldwin et al.)</td>
</tr>
<tr align="center">
<th><br /></th>
<th><i>~S</i></th>
<th><i>S</i></th>
<th><br /></th>
</tr>
<tr align="center">
<td><b>–<i>E</i></b></td>
<td>1421</td>
<td>4</td>
<td>1425</td>
</tr>
<tr align="center">
<td><b>+<i>E</i></b></td>
<td>22</td>
<td>1075</td>
<td>1097</td>
</tr>
<tr align="center">
<td><br /></td>
<td>1443</td>
<td>1079</td>
<td><br /></td>
</tr>
<tr>
<td colspan="4" style="padding-left: 10px; padding-right: 10px;">–<i>E</i> is a negative finding (the examiner decided there was no association).<br />
+<i>E</i> is a positive finding (the examiner decided there was an association).<br />
<i>S</i> indicates that the cartridge cases came from rounds fired by the same gun.<br />
~<i>S</i> indicates that the cartridge cases came from rounds fired by a different gun.</td>
</tr>
</tbody></table>
</center>
<br />
<i>False negatives</i>. Of the 4 + 1075 = 1079 judgments in which the gun was the same, 4 were negative. This false negative rate is <i>Prop</i>(–<i>E</i> |<i>S</i>) = 4/1079 = 0.37%. ("Prop" is short for "proportion," and "|" can be read as "given" or "out of all.") Treating the examiners tested as a random sample of all examiners of interest, and viewing the performance in the experiment as representative of the examiners' behavior in casework with materials comparable to those in the experiment, we can estimate the proportion of false negatives for all examiners. The point estimate is 0.37%. A 95% confidence interval is 0.10% to 0.95%. These numbers provide an estimate of how frequently all examiners would declare a negative association in all similar cases in which the association actually is positive. Instead of false negatives, we also can describe true positives, or sensitivity. The observed sensitivity is <i>Prop</i>(+<i>E</i> |<i>S</i>) = 1075/1079 = 99.63%. The 95% confidence interval around this estimate is 99.05% to 99.90%.<br />
<br />
<i>False positives</i>. The observed false-positive rate is <i>Prop</i>(+<i>E </i>|~<i>S</i>) = 22/1443 = 1.52%, and the 95% confidence interval is 0.96% to 2.30%. The observed true-negative rate, or specificity, is <i>Prop</i>(–<i>E</i> |~<i>S</i>) = 1421/1443 = 98.48%, and its 95% confidence interval is 97.70% to 99.04%.<br />
<br />
Taken at face value, these results seem rather encouraging. On average, examiners displayed high levels of accuracy, both for cartridge cases from the same gun (better than 99% sensitivity) and for those from different guns (better than 98% specificity).<br />
</div>
</blockquote>
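<p>For readers who want to check the figures in the quoted passage, the error proportions and 95% intervals can be recomputed from the four cell counts in Table 1. The Python sketch below is mine, not the court's or the Ames researchers'; it assumes the quoted intervals are exact (Clopper-Pearson) binomial intervals, an assumption that is consistent with the numbers reported above, and the function name is mine as well.</p>
<pre>
# A minimal check of the error proportions and 95% intervals quoted above,
# assuming the intervals are exact (Clopper-Pearson) binomial intervals.
from scipy.stats import beta

def clopper_pearson(x, n, conf=0.95):
    """Exact two-sided binomial interval for x 'successes' in n trials."""
    alpha = 1 - conf
    lo = 0.0 if x == 0 else beta.ppf(alpha / 2, x, n - x + 1)
    hi = 1.0 if x == n else beta.ppf(1 - alpha / 2, x + 1, n - x)
    return lo, hi

fn, same_gun = 4, 1079    # false negatives among same-gun comparisons (Table 1)
fp, diff_gun = 22, 1443   # false positives among different-gun comparisons (Table 1)

print(fn / same_gun, clopper_pearson(fn, same_gun))   # about 0.0037 and (0.0010, 0.0095)
print(fp / diff_gun, clopper_pearson(fp, diff_gun))   # about 0.0152 and (0.0096, 0.0230)
</pre>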
<p>I did not comment on the implications of the fact that analysts often opted out of the binary classifications by declaring that an examination was inconclusive. This reporting category has generated a small explosion of literature and argumentation. There are two extreme views. Some organizations and individuals maintain that because the specimens either did or did not come from the same source, the failure to discern which of these two states of nature applies is an error. A more apt term would be "missed signal"; \9/ it is hardly obvious that <i>Daubert</i>'s fleeting reference to "error rates" was meant to encompass not only false positives and negatives but also test results that are neither positive nor negative. At the other pole are claims that all inconclusive outcomes should be counted as correct in computing the false-positive and false-negative error proportions seen in an experiment. </p><p>Incredibly, the latter is the only way in which the Ames laboratory computed the error proportions. I would like to think that had the report been subject to editorial review at a respected journal, a correction would have been made. Unchastened, the Ames laboratory again counted inconclusive responses only as if they were correct when it wrote up the results of its second study.
</p>
<p>
This lopsided treatment of inconclusives was an issue in <i>Abruquah</i>. The majority opinion described the two studies as follows (citations and footnotes omitted):
</p>
<blockquote>
<div style="background-color: #d9ead3; border-radius: 10px; border: 1px solid black; padding-bottom: 4px; padding-left: 10px; padding-right: 5px; padding-top: 4px; padding: 4px 5px 4px 10px;">
Of the 1,090 comparisons where the “known” and “unknown” cartridge cases were fired from the same source firearm, the examiners [in the Ames I study] incorrectly excluded only four cartridge cases, yielding a false-negative rate of 0.367%. Of the 2,180 comparisons where the “known” and “unknown” cartridge cases were fired from different firearms, the examiners incorrectly matched 22 cartridge cases, yielding a false-positive rate of 1.01%. However, of the non-matching comparison sets, 735, or 33.7%, were classified as inconclusive, id., a significantly higher percentage than in any closed-set study.
<br /><br />
The Ames Laboratory later conducted a second open-set, black-box study that was completed in 2020 ... The Ames II Study ... enrolled 173 examiners for a three-phase study to test for ... foundational validity: accuracy (in Phase I), repeatability (in Phase II), and reproducibility (in Phase III). In each of three phases, each participating examiner received 15 comparison sets of known and unknown cartridge cases and 15 comparison sets of known and unknown bullets. The firearms used for the bullet comparisons were either Beretta or Ruger handguns and the firearms used for the cartridge case comparisons were either Beretta or Jimenez handguns. ... As with the Ames I Study, although there was a “ground truth” correct answer for each sample set, examiners were permitted to pick from among the full array of the AFTE Range of Conclusions—identification, elimination, or one of the three levels of “inconclusive.”
<br /><br />
The first phase of testing was designed to assess accuracy of identification, “defined as the ability of an examiner to correctly identify a known match or eliminate a known nonmatch.” In the second phase, each examiner was given the same test set examined in phase one, without being told it was the same, to test repeatability, “defined as the ability of an examiner, when confronted with the exact same comparison once again, to reach the same conclusion as when first examined.” In the third phase, each examiner was given a test set that had previously been examined by one of the other examiners, to test reproducibility, “defined as the ability of a second examiner to evaluate a comparison set previously viewed by a different examiner and reach the same conclusion.”
<br /><br />
In the first phase, ... [t]reating inconclusive results as appropriate answers, the authors identified a false negative rate for bullets and cartridge cases of 2.92% and 1.76%, respectively, and a false positive rate for each of 0.7% and 0.92%, respectively. Examiners selected one of the three categories of inconclusive for 20.5% of matching bullet sets and 65.3% of nonmatching bullet sets. [T]he results overall varied based on the type of handgun that produced the bullet/cartridge, with examiners’ results reflecting much greater certainty and correctness in classifying bullets and cartridge cases fired from the Beretta handguns than from the Ruger (for bullets) and Jimenez (for cartridge cases) handguns.
</div>
</blockquote>
<p>
The opinion continues with a description of some statistics for the level of intra- and inter-examiner reliability observed in the Ames II study, but I won't pursue those here. The question of accuracy is enough for today. \10/ To some extent, the majority's confidence in the reported low error proportions (all under 3%) was shaken by the presence of inconclusives: "if at least some inconclusives should be treated as incorrect responses, then the rates of error in open-set studies performed to date are unreliable. Notably, if just the 'Inconclusive-A' responses—those for which the examiner thought there was almost enough agreement to identify a match—for non-matching bullets in the Ames II Study were counted as incorrect matches, the 'false positive' rate would balloon from 0.7% to 10.13%."
</p>
<p>
But <i>should</i> any of the inconclusives "be treated as incorrect," and if so, how many? Doesn't it depend on the purpose of the studies and the computation? If the purpose is to probe what the PCAST Report neoterically called "foundational validity"—whether a procedure is at least capable of giving accurate source conclusions when properly employed by a skilled examiner—then inconclusives are not such a problem. They represent lost opportunities to extract useful information from the specimens, but they do not change the finding that, within the experiment itself, in those instances in which the examiner is willing to come down on one side or the other, the conclusion is usually correct.
</p>
<p>
One justice stressed this fact. Justice Gould insisted that
</p>
<blockquote>
<div style="background-color: #ffe9ec; border-radius: 10px; border: 1px solid black; padding-bottom: 4px; padding-left: 10px; padding-right: 5px; padding-top: 4px; padding: 4px 5px 4px 10px;">
[T]he focus of our inquiry should not be the reliability of the AFTE Theory in general, but rather the reliability of conclusive determinations produced when the AFTE Theory is applied. Of course, an examiner applying the AFTE Theory might be unable to declare a match (“identification”) or a non-match (“elimination”), resulting in an inconclusive determination. But that's not our concern. Rather, our concern is this: when the examiner does declare an identification or elimination, we want to know how reliable that determination is.
</div>
</blockquote>
<p>
He was unimpressed with the extreme view that every failure to recognize "ground truth" is an "error" for the purpose of evaluating an identification method under <i>Daubert</i>. \11/ He argued for error proportions like those used by PCAST: \12/
</p>
<blockquote>
<div style="background-color: #ffe9ec; border-radius: 10px; border: 1px solid black; padding-bottom: 4px; padding-left: 10px; padding-right: 5px; padding-top: 4px; padding: 4px 5px 4px 10px;">
This brings us to a different way of looking at error rates, one that received no consideration by the Majority ... I am referring to calculating error by excluding inconclusives from both the numerator and the denominator. .... [C]ontrary to Mr. Faigman's unsupported criticism, excluding inconclusives from the numerator and denominator accords with both common sense and accepted statistical methodologies. ... PCAST ... contended that ... false positive rates should be based only on conclusive examinations “because evidence used against a defendant will typically be based on conclusive, rather than inconclusive, determinations.” ... So, far from being "crazy" ... , excluding inconclusives from error rate calculations when assessing the reliability of a positive identification is not only an acceptable approach, but the preferred one, at least according to PCAST. Moreover, from a mathematical standpoint, excluding inconclusives from the denominator actually penalizes the examiner because errors accounted for in the numerator are measured against a smaller denominator, i.e., a smaller sample size.
</div>
</blockquote>
So what happens when the error proportions for the subset of positive and negative conclusions are computed with the Ames data? The reports' denominators, which include the inconclusives, are too large, but the resulting bias is not so great in this particular case (a short arithmetic check appears after the quoted passages below). For Ames I, Justice Gould's opinion tracks Table 1 above:
<blockquote>
<div style="background-color: #ffe9ec; border-radius: 10px; border: 1px solid black; padding-bottom: 4px; padding-left: 10px; padding-right: 5px; padding-top: 4px; padding: 4px 5px 4px 10px;">
With respect to matching [cartridge] sets, the number of inconclusives was so low that whether inconclusives are included in the denominator makes little difference to error rates. Of the 1,090 matching sets, only 11, or 1.01 percent, were inconclusives. Of the conclusive determinations, 1,075 were correctly identified as a match (“identifications”) and four were incorrectly eliminated (“eliminations”). ... Measured against the total number of matching sets (1,090), the false elimination rate was 0.36 percent. Against only the conclusive determinations (1,079), the false elimination rate was 0.37 percent. ...
<br /><br />
Of 2,178 non-matching sets, examiners reported 735 inconclusives for an inconclusive rate of 33.7 percent, 1,421 sets as correct eliminations, and 22 sets as incorrect identifications (false positives). ... As a percentage of the total 2,178 non-matching sets, the false positive rate was 1.01 percent. As a percentage of the 1,443 conclusive determinations, however, the false positive rate was 1.52 percent. Either way, the results show that the risk of a false positive is very low
</div>
</blockquote>
For Ames II,
<blockquote>
<div style="background-color: #ffe9ec; border-radius: 10px; border: 1px solid black; padding-bottom: 4px; padding-left: 10px; padding-right: 5px; padding-top: 4px; padding: 4px 5px 4px 10px;">
There were 41 false eliminations. As a percentage of the 1,405 recorded results, the false elimination rate was 2.9 percent. As a percentage of only the conclusive results, the false elimination rate increased to 3.7 percent ... .
<br /><br />
... There were 20 false positives. Measured against the total number of recorded results (2,842), the false positive rate was 0.7 percent. Measured against only the conclusive determinations, however, the false positive rate increases to 2.04 percent.
</div>
</blockquote>
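<p>Justice Gould's Ames I percentages can be verified directly from the counts quoted in his opinion. The Python sketch below is mine, not the court's; because the excerpts above do not reproduce the Ames II inconclusive counts, it covers only Ames I.</p>
<pre>
# Recomputing the Ames I error proportions two ways: with inconclusives in
# the denominator (as in the report) and with conclusive calls only (as
# Justice Gould and PCAST prefer). Counts are those quoted in the opinion.
matching    = {"total": 1090, "inconclusive": 11,  "identification": 1075, "elimination": 4}
nonmatching = {"total": 2178, "inconclusive": 735, "elimination": 1421,    "identification": 22}

def error_rates(counts, error_key):
    conclusive = counts["total"] - counts["inconclusive"]
    return counts[error_key] / counts["total"], counts[error_key] / conclusive

print(error_rates(matching, "elimination"))        # about 0.0037 either way (0.36% vs 0.37%)
print(error_rates(nonmatching, "identification"))  # about 0.0101 vs 0.0152 (1.01% vs 1.52%)
</pre>
<p>The denominator choice matters little for the matching sets, where inconclusives were rare, but it matters more for the nonmatching sets, where about a third of the responses were inconclusive.</p>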
<p>
In sum, on the issue of whether a substantial number of firearms-toolmark examiners <i>can</i> generally avoid erroneous source attributions and exclusions when tested as in Ames I and Ames II, the answer seems to be that, yes, they can. Perhaps this helps explain the Chief Justice's concession that "[t]he relatively low rate of 'false positive' responses in studies conducted to date is by far the most persuasive piece of evidence in favor of admissibility of firearms identification evidence." But the court was quick to add that "[o]n balance, however, the record does not demonstrate that that rate is reliable, especially when it comes to actual casework."
</p>
<p>Extrapolating from the error proportions in experiments to those in casework is difficult indeed. Is the largely self-selected sample of examiners who enroll in and complete the study representative of the general population of examiners doing casework? Does the fact that the enrolled examiners know they are being tested make them more (or less) careful or cautious? Do examiners have expectations about the prevalence of true sources in the experiment that differ from those they have in casework? Are the specimens in the experiment comparable to those in casework? \13/ Do error probabilities for comparing marks on cartridge cases apply to the marks on the bullets they house? Does it matter if the type of gun used in the experiment is different from the type in the case?
</p>
<p>Most of the questions are matters of external validity. Some of them are the subject of explicit discussion in the opinions in <i>Abruquah</i>. For example, Justice Gould rejects, as a conjecture unsupported by the record, the concern that examiners might be more prone to avoid a classification by announcing an "inconclusive" outcome in an experiment than in practice.
</p>
<p>To different degrees, the generalizability questions interact with the legal question being posed. As I have indicated, whether the scientific literature reveals that a method practiced by skilled analysts <i>can</i> produce conclusions that are generally correct for evidence like that in a given case is one important issue under <i>Daubert</i>. Whether the same studies permit accurate estimates of error probabilities in general casework is a distinct, albeit related, scientific question. How to count or adjust for inconclusives in experiments is but a subpart of the latter question.
</p>
<p>And, how to present source attributions in the absence of reasonable error-probability estimates for casework is a question that <i>Abruquah</i> barely begins to answer. No opinion embraced the defendant's argument that only a limp statement like "unable to exclude as a possible source" is allowed. But neither does the case follow other courts that allow statements such as the awkward and probably ineffectual "reasonable degree of ballistic certainty" for expressing the difficult-to-quantify uncertainty in toolmark source attributions. After <i>Abruquah</i>, if an expert makes a source attribution in Maryland, some kind of qualification or caveat is necessary. \14/ But what will that be?
</p>
<p>Toolmark examiners are trained to believe that their job is to provide source conclusions for investigators and courts to use, but neither law nor science compels this job description. Perhaps it would be better to replace conclusion-centered testimony about the (probable) truth of source conclusions with evidence-centered statements about the degree to which the evidence supports a source conclusion. The <i>Abruquah</i> court wrote that
</p>
<blockquote>
<div style="background-color: #d9ead3; border-radius: 10px; border: 1px solid black; padding-bottom: 4px; padding-left: 10px; padding-right: 5px; padding-top: 4px; padding: 4px 5px 4px 10px;">
The reports, studies, and testimony presented to the circuit court demonstrate that the firearms identification methodology employed in this case can support reliable conclusions that patterns and markings on bullets are consistent or inconsistent with those on bullets fired from a particular firearm. Those reports, studies, and testimony do not, however, demonstrate that that methodology can reliably support an unqualified conclusion that such bullets were fired from a particular firearm.</div>
</blockquote>
<p>The expert witness's methodology provides "support" for a conclusion, and the witness could simply testify about the direction and magnitude of the support without opining on the truth of the conclusion itself. \15/ "Consistent with" testimony is a statement about the evidence, but it is a minimal, if not opaque, description of the data. Is it all that the record in <i>Abruquah</i>—not to mention the record in the next case—should allow? Only one thing is clear—fights over the legally permissible modes for presenting the outcomes of toolmark examinations will continue.</p>
<p>
Notes
</p>
<ol>
<li>In <i>Commonwealth v. Pytou Heang</i>, 942 N.E.2d 927 (Mass. 2011), the Massachusetts Supreme Judicial Court upheld source-attribution testimony "to a reasonable degree of scientific certainty" but added "that [the examiner] could not exclude the possibility that the projectiles were fired by another nine millimeter firearm." The Court proposed "guidelines" to allow source attribution to no more than "a reasonable degree of ballistic certainty."
</li>
<li>The defendant was convicted at a first trial in 2013, then retried in 2018. In 2020, the Maryland Supreme Court changed its standard for admitting scientific evidence from a requirement of general acceptance of the method in the relevant scientific communities (denominated the "<i>Frye-Reed</i> standard" in Maryland) to a more direct showing of scientific validity described in <i>Daubert v. Merrell Dow Pharmaceuticals</i>, 509 U.S. 579 (1993), and an advisory committee note accompanying amendments in 2000 to Federal Rule of Evidence 702 (called the "<i>Daubert-Rochkind</i> standard" in <i>Abruquah</i>).
</li>
<li>The experts at the "<i>Frye-Reed</i> hearing" were William Tobin (a "Principal of Forensic Engineering International," <a href="https://forensicengineersintl.com/about/william-tobin/">https://forensicengineersintl.com/about/william-tobin/</a>, and "former head of forensic metallurgy operations for the FBI Laboratory" (Unsurpassed Experience, <a href="https://forensicengineersintl.com/">https://forensicengineersintl.com/</a>)); James Hamby ("a laboratory director who has specialized in firearm and tool mark identification for the past 49 years" and who is "a past president of AFTE [the Association of Firearm and Tool Mark Examiners] ... and has trained firearms examiners from over 15 countries worldwide," Speakers, International Symposium on Forensic Science, Lahore, Pakistan, Mar. 17-19, 2020, <a href="https://isfs2020.pfsa.punjab.gov.pk/james-edward">https://isfs2020.pfsa.punjab.gov.pk/james-edward</a>); Torin Suber ("a forensic scientist manager with the Maryland State Police"); and Scott McVeigh (the firearms examiner in the case).
</li>
<li>The experts at the supplemental "<i>Daubert-Rochkind</i> hearing" were James Hamby (a repeat performance), and <a href="https://www.uchastings.edu/people/david-faigman/" target="_blank">David Faigman</a>, "Chancellor & Dean, William B. Lockhart Professor of Law and the John F. Digardi Distinguished Professor of Law" at the University of California College of the Law, San Francisco. </li>
<li>Remarks on the legal analysis will appear in the 2024 cumulative supplement to <a href="https://law-store.wolterskluwer.com/s/product/new-wigmore-expert-evidence-3e/01t4R00000OUTuJQAX" target="_blank">The New Wigmore, A Treatise on Evidence: Expert Evidence</a>.</li>
<li>The opinion gives a simplified example:
<blockquote>
The test administrator fires two bullets from each of 10 consecutively manufactured handguns. The administrator then gives you two sets of 10 bullets each. One set consists of 10 “unknown” bullets—where the source of the bullet is unknown to the examiner—and the other set consists of 10 “known” bullets—where the source of the bullet is known. You are given unfettered access to a sophisticated crime lab, with the tools, supplies, and equipment necessary to conduct a forensic examination. And, like the vocabulary tests from grade school requiring you to match words with pictures, you must match each of the 10 unknown bullets to the 10 known bullets.
<br /><br />
Even though you know that each of the unknowns can be matched with exactly one of the knowns, you probably wouldn't know where to begin. If you had to resort to guessing, your odds of correctly matching the 10 unknown bullets to the 10 knowns would be one out of 3,628,800. [An accompanying note 11 explains that: "[w]ith 10 unknown bullets and 10 known bullets, the odds of guessing the first pair correctly are one out of 10. And if you get the first right, the odds of getting the second right are one out of nine. If you get the first two right, the odds of getting the third right are one out of eight, and so on. Thus, the odds of matching each unknown bullet to the correct known is represented by the following calculation: (1/10) x (1/9) x (1/8) x (1/7) x (1/6) x (1/5) x (1/4) x (1/3) x (1/2) x (1/1)."] Even if you correctly matched five unknown bullets to five known bullets and guessed on the remaining five unknowns, your odds of matching the remaining unknowns correctly would be one out of 120. [Note 12: "(1/5) x (1/4) x (1/3) x (1/2) x (1/1)."] Not very promising.
<br /><br />
The closed-set and semi-closed-set studies before the trial court—the studies which PCAST discounted—show that if you were to properly apply the AFTE Theory, you would be very likely to match correctly each of the 10 unknowns to the corresponding knowns. See Validation Study; Worldwide Study; Bullet Validation Study. ... Your odds would thus improve from virtually zero (one in 3,628,800) to 100 percent. Yet according to PCAST, those studies provide no support for the scientific validity of the AFTE Theory. ...
</blockquote>
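<p>The factorial arithmetic in the accompanying notes 11 and 12 is easy to check; here is a two-line Python sketch (mine, not the court's):</p>
<pre>
import math

# 10 unknowns matched one-to-one with 10 knowns: 10! equally likely orderings
print(math.factorial(10))   # 3628800, so guessing succeeds with probability 1/3,628,800
print(math.factorial(5))    # 120, the figure in note 12 for the remaining five bullets
</pre>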
</li>
<li>David P. Baldwin, Stanley J. Bajic, Max Morris & Daniel Zamzow, A Study of False-positive and False-negative Error Rates in Cartridge Case Comparisons, Ames Laboratory, USDOE, Tech. Rep. #IS-5207 (2014), at <a href="https://afte.org/uploads/documents/swggun-false-postive-false-negative-usdoe.pdf">https://afte.org/uploads/documents/swggun-false-postive-false-negative-usdoe.pdf</a> [https://perma.cc/4VWZ-CPHK].
</li>
<li>David H. Kaye, PCAST and the Ames Bullet Cartridge Study: Will the Real Error Rates Please Stand Up?, Forensic Sci., Stat. & L., Nov. 1, 2016, <a href="http://for-sci-law.blogspot.com/2016/11/pcast-and-ames-study-will-real-error.html">http://for-sci-law.blogspot.com/2016/11/pcast-and-ames-study-will-real-error.html</a>
</li>
<li>David H. Kaye et al., Toolmark-comparison Testimony: A Report to the Texas Forensic Science Commission, May 2, 2022, available at <a href="http://ssrn.com/abstract=4108012">http://ssrn.com/abstract=4108012</a>.
</li>
<li>I will note, however, that the report apparently strains to make the attained levels for reliability seem high. Alan H. Dorfman & Richard Valliant, A Re-Analysis of Repeatability and Reproducibility in the Ames-USDOE-FBI Study, 9 Stat. & Pub. Pol'y 175 (2022).
</li>
<li>The opinion attributes this view to
<blockquote>
Mr. Abruquah's expert, Professor David Faigman, [who declared] that "in the annals of scientific research or of proficiency testing, it would be difficult to find a more risible manner of measuring error." To Mr. Faigman, the issue was simple: in Ames I and II, the ground truth was known, thus "there are really only two answers to the test, like a true or false exam[ple]." Mr. Faigman explained that "the common sense of it is if you know the answer is either A or B and the person says I don't know, in any testing that I've ever seen that's a wrong answer." He argued, therefore, that inconclusives should be counted as errors.
</blockquote>
</li>
<li>See also NIST Expert Working Group on Human Factors in Latent Print Analysis, Latent Print Examination and Human Factors: Improving the Practice Through a Systems Approach (David H. Kaye ed. 2012), available at <a href="http://ssrn.com/abstract=2050067">ssrn.com/abstract=2050067</a> (arguing against counting inconclusives in error proportions that are supposed to indicate the probative value of actual conclusions).
</li>
<li>Testing examiner performance in the actual flow of cases would help address the last three questions. A somewhat confusing analysis of results in such an experiment is described in a posting last year. David H. Kaye, Preliminary Results from a Blind Quality Control Program, Forensic Sci., Stat. & L., July 9, 2022, <a href="http://for-sci-law.blogspot.com/2022/07/preliminary-results-from-blind-quality.html">http://for-sci-law.blogspot.com/2022/07/preliminary-results-from-blind-quality.html</a>.
</li>
<li>The court wrote that:
<blockquote>
It is also possible that experts who are asked the right questions or have the benefit of additional studies and data may be able to offer opinions that drill down further on the level of consistency exhibited by samples or the likelihood that two bullets or cartridges fired from different firearms might exhibit such consistency. However, based on the record here, and particularly the lack of evidence that study results are reflective of actual casework, firearms identification has not been shown to reach reliable results linking a particular unknown bullet to a particular known firearm.
</blockquote>
</li>
<li>See, e.g., David H. Kaye, <a href="https://papers.ssrn.com/abstract_id=3177752" target="_blank">The Nikumaroro Bones: How Can Forensic Scientists Assist Factfinders?</a>, 6 Va. J. Crim. L. 101 (2018).
</li>
</ol>
<p>
LAST UPDATED 29 June 2023
</p>DH Kayehttp://www.blogger.com/profile/09329862957840849989noreply@blogger.com0tag:blogger.com,1999:blog-5354567765897135804.post-67861573282576915852023-06-11T15:58:00.002-04:002023-06-11T16:34:25.283-04:00Is "Bitemark Analysis" Better than "Bitemark Comparisons"? <p>
In October 2022, NIST released a <a href="https://www.nist.gov/spo/forensic-science-program/bitemark-analysis-nist-scientific-foundation-review" target="_blank">draft report</a> entitled "Bitemark Analysis: A NIST Scientific Foundation Review." A <a href="https://www.nist.gov/news-events/news/2022/10/forensic-bitemark-analysis-not-supported-sufficient-data-nist-draft-review" target="_blank">press release</a> announced "Forensic Bitemark Analysis Not Supported by Sufficient Data, NIST Draft Review Finds." In March 2023, the <a href="https://www.nist.gov/spo/forensic-science-program/bitemark-analysis-nist-scientific-foundation-review" target="_blank">final version</a> reaching the same conclusions was released. Soon afterward, the NIST-supported Organization of Scientific Area Committees for Forensic Science (<a href="https://www.nist.gov/organization-scientific-area-committees-forensic-science" target="_blank">OSAC</a>) revised the scope of the work that its forensic odontology subcommittee can undertake. The <a href="https://www.nist.gov/organization-scientific-area-committees-forensic-science/forensic-odontology-subcommittee" target="_blank">description</a> now specifies, in italics no less, that "<i>The Forensic Odontology Subcommittee does not develop standards on bitemark recognition, comparison, and identification</i>."
</p>
<p>
Yet, some medical examiners believe that "analysis" of marks on the skin "frequently yields valuable information that forensic odontologists testify to in courts of law, just as forensic pathologists do with respect to their objective findings and their interpretations of those findings based on experience, training and the circumstances of the event." Richard Souviron & Leslie Haller, Bite Mark Evidence: Bite Mark Analysis Is Not the Same as Bite Mark Comparison or Matching or Identification, 4 J. L. & Biosci. 617, 618 (2017). They distinguish between "analysis" and "comparison," recognizing that the latter is not scientifically well founded, and seeking to preserve the former as a legitimate expert endeavor. They propose that
</p>
<blockquote>
The analysis process involves answering basic, crucial, questions such as whether or not the pattern injury is a human bite mark. This question can be the most difficult part of the entire process. After establishing whether a patterned injury is, indeed, a bite mark, other questions must be asked. Is it a human bite mark? Was it made by an adult or a child? Was it swabbed for DNA? Was it made through clothing? If so, was the clothing swabbed for DNA? Where is it located on the victim and in what position was the victim when it happened? Could it have been self-inflicted? What was the position of the biter? Was it offensive or defensive? Was it affectionate or does it demonstrate violence? Will it produce a permanent injury? If so, simple battery may become aggravated battery. When was the bite inflicted in relation to the time of death? Is it fresh, a scar or somewhere in between? Was the person bitten alive or dead at the time? Are there any unique dental characteristics that could be used to exclude possible suspects? In cases of multiple bites, did the same biter make them all? Were they all made at the same time or do they establish a pattern of long-term abuse?
<br /><br />
These questions, and more, are the essential core of the analysis of every bite mark, and produce a large amount of information that can be of considerable value to an investigation before any suspects are identified or charged.
</blockquote>
<p>Id. So where are the experiments or other studies to show that most of these "essential" parts of bitemark analysis can be done validly and reliably? Can medical examiners correctly classify "pattern injuries" as bitemarks? As human bitemarks? As the mark of a child or an adult? As affectionate? As unique? As coming from the same biter? </p>
<p>
Bite (and other) marks will be encountered in autopsies. They need to be photographed and examined along with other injuries or characteristics. But odontologists and medical examiners should think hard before they claim the ability to do all these things "and more" as part of "analysis."<br /></p>DH Kayehttp://www.blogger.com/profile/09329862957840849989noreply@blogger.com0tag:blogger.com,1999:blog-5354567765897135804.post-49844211372161270762023-01-19T17:17:00.000-05:002023-01-19T17:17:11.101-05:00"Double Blind Peer Review" of Research on Hair Fibers<p>Yesterday I heard about some new publications on forensic hair microscopy published in, of all places, a journal on pharmaceuticals. My first thought was that the journal might be a predatory one with deceptive advertising designed to con scholars into paying for publication in what appears to be a reputable scientific journal. But that was too cynical.
</p>
<p>The papers are
</p>
<ul style="text-align: left;">
<li>S. Sneha Harshini, Vishnu Priya Veeraraghavan, Abirami Arthanari, R. Gayathri, S. Kavitha, J. Selvaraj, P. K. Reshma & Y. Dinesh, <a href="https://www.japtr.org/article.asp?issn=2231-4040;year=2022;volume=13;issue=5;spage=297;epage=301;aulast=Harshini" target="_blank">Comparative Study of Male and Female Human Hair: A Microscopic Analysis</a>, 13 J. Advanced Pharm. Tech. & Rsch. S297–S301 (2022);</li>
<li>S. Nehal Safiya, Vishnu Priya Veeraraghavan, Abirami Arthanari, R. Gayathri, J. Selvaraj, S. Kavitha & Y. Dinesh, <a href="https://www.japtr.org/article.asp?issn=2231-4040;year=2022;volume=13;issue=5;spage=112;epage=116;aulast=Safiya" target="_blank">Comparison of Human and Animal Hair – A Microscopical Analysis</a>, 13 J. Advanced Pharm. Tech. & Rsch. S112–S116 (2022); and</li>
<li>S. Nehal Safiya, Vishnu Priya Veeraraghavan, Abirami Arthanari, R. Gayathri, J. Selvaraj, S. Kavitha & Y. Dinesh, <a href="https://www.japtr.org/article.asp?issn=2231-4040;year=2022;volume=13;issue=5;spage=117;epage=120;aulast=Rajaselin" target="_blank">A Comparative Study of Different Animal Hairs: A Microscopic Analysis</a>, 13 J. Advanced Pharm. Tech. & Rsch. S117–S120 (2022).<br /></li>
</ul>
<p>The <i>Journal of Advanced Pharmaceutical Technology & Research</i> has a scientific society and a respectable publisher behind it. The former is the “Society of Pharmaceutical Education & Research (<a href="https://www.sperpharma.org/" target="_blank">SPER</a>),” which is “one of the leading pharmaceutical association [sic] in the country [of India] ... with a member base of around 3,500, it spread [sic] across the country and have [sic] 13 state branches.”
</p>
<p>The latter is Wolters Kluwer’s <a href="https://www.medknow.com/" target="_blank">Medknow</a>. Located in Mumbai, Medknow “provides publishing services for peer-reviewed, online and print-plus-online journals in medicine on behalf of learned societies and associations with a focus on emerging markets.” <a href="https://www.medknow.com/EthicalGuidelines.asp" target="_blank">Wolters Kluwer</a> insists that Medknow “journals employ a double-blind review process, in which the author identities are concealed from the reviewers, and vice versa, throughout the review process.”
</p>
<p>Although the journal is not indexed in Medline, the research comes from the <a href="https://saveethadental.com/" target="_blank">Saveetha Dental College</a>, Chennai, Tamil Nadu, India—“one of the finest institutions in the world with a unique curriculum that is a spectacular fusion of the best practices of the east and west.”
</p>
<p>So I read the papers. Unbelievable.
</p>
<p>The abstract and the conclusion of "A Comparative Study of Male and Female Human Hair" announce that “[t]his study can be concluded that the structural comparison between male and female hair specimens can be used as evidence for forensic analysis at crime scenes.” How so? Well, for one thing, “[i]n this study, it is observed that the color of human male hair is completely black, while it is black on the proximal end and brown at the distal end of human female hair.”
</p>
<p>Astonishingly, the sample of hairs is never described. How many men and women provided hair? Where did they come from? How many hairs were taken from each subject and compared? Were the examinations blind? Without this elementary information, no one can understand or assess the reported results.
</p>
<p>The “Comparison of Human and Animal Hair – A Microscopical Analysis” is similarly devoid of any meaningful description of the research.
</p>
<p>The “Comparative Study of Different Animal Hairs: A Microscopic Analysis” appears to be a description of four hairs – one each from a dog, a cat, a horse, and a rat. The researchers found some differences among them. This they found encouraging: "The present study might be used in forensic investigations."
</p>
<p>So much for "double blind peer review."
</p>DH Kayehttp://www.blogger.com/profile/09329862957840849989noreply@blogger.com0tag:blogger.com,1999:blog-5354567765897135804.post-83676069676190587402022-08-02T15:22:00.000-04:002022-08-02T15:22:53.468-04:00Grand Jury Subpoenas for Newborn Screening Blood Spots<p>On July 10, the New Jersey Office of the Public Defender and the <i>New Jersey Monitor</i> sued the state department of health "to obtain redacted copies of [grand jury] subpoenas ... so that they can learn more about how the State Newborn Screening Laboratory has effectively turned into a warrantless DNA collection facility for State criminal prosecutions." \1/
</p>
<p>New Jersey's neonatal screening program, like that in other states, uses a few drops of blood from the newborn’s heel to test "for certain genetic, endocrine, and metabolic disorders ... prior to discharge from a hospital or birthing center." \2/ The Department of Health explains that "[e]arly detection and treatment of the disorders on the newborn screening panel can prevent lifelong disabilities, including intellectual and developmental disabilities, and life threatening infections." \3/ As in many other states, New Jersey health officials retain a "Guthrie card" (named after Dr. Robert Guthrie, who, in the 1960s, successfully championed mandatory screening laws for a metabolic disease that causes preventable intellectual disability). \4/
</p>
<p>The complaint alleges that the Office of the Public Defender (OPD) "became alarmed" that State Police "are utilizing the residual blood spot samples" and that the health department rebuffed requests to provide information on subpoenas the department may have received from grand juries. The cause of the alarm is described as follows:
</p>
<blockquote>
The State Police had re-opened an investigation into a “cold case” of sexual assault that had occurred in 1996 and had genetically narrowed the suspects to one of three brothers and their male offspring. ... [They] served a subpoena upon the Newborn Screening Laboratory in or about August 2021 to obtain residual dried blood spot samples that had been collected from a male newborn in or about June 2012.<br /><br />
To ascertain which family member was the suspect, the State Police sought the blood spot sample that was taken from an approximately nine-year-old child when he was a newborn to compare it to the DNA it had collected at the crime scene in 1996. The State Police successfully obtained the child’s blood spot sample, sequenced the DNA, and then ran further analysis utilizing a technique known as investigative genetic genealogy. The State Police alleges those results showed the newborn blood spot sample belonged to the genetic child of the suspect. From there, the State Police used those results to form the basis of an affidavit of probable cause to acquire a warrant to obtain a buccal swab from OPD’s client, who is the child’s father. OPD’s client was then criminally charged.
</blockquote>
<p>OPD further asserted "a significant interest in knowing how expansive this law enforcement practice is so that it may better represent its clients who may be subject to such warrantless searches." It did not explain how learning the number of subpoenas would improve its ability to defend any particular client.</p><p>The other plaintiff, the <i>New Jersey Monitor</i>, described itself as "the eyes and ears of the public [with] an interest in reporting to the public about this practice that violates basic concepts of genetic privacy."
</p>
<p>The pleading claims that "law enforcement agencies are flouting search warrant requirements" and that "[b]ecause the Supreme Court of the United States and the New Jersey Supreme Court recognize that people have a right of privacy in their DNA and that the collection and analysis of that DNA is a search, a search warrant is generally required for such invasive actions."
</p>
<p>I have not researched New Jersey jurisprudence, but I strongly doubt that the U.S. Supreme Court's opinions constitutionalize any free-floating "basic concepts of genetic privacy." \5/ The allegation of "subversion of the warrant requirement" of the Fourth Amendment presupposes that a warrant is required. That could be, but this question is not directly covered by Supreme Court precedent. It is the conclusion of what has to be a more complex legal argument. How might that argument go?<br /></p><p>The Fourth Amendment declares that "[t]he right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no Warrants shall issue, but upon probable cause ... ." How do subpoenas for Guthrie cards come within this proscription? They are not quite seizures of any person or any person's papers or effects.
</p>
<p>Are they searches of the person? Certainly, a physical intrusion into the body to extract blood would be, and the state has done that with a warrantless heel prick. But that search is constitutional because of an exception to the warrant-preference rule. The "special needs" exception allows the government to conduct searches and seizures to advance important government interests other than collecting information for criminal cases. Compulsory neonatal screening is an important public health program for providing early treatment or prevention of suffering and impairment. It predates DNA testing for identification (and DNA testing for disease, for that matter). New Jersey's legislation dates back to 1964. That grand jury subpoenas can be issued today to investigate a crime does not transform the original interference with bodily integrity into one that required probable cause. \6/
</p>
<p>There is, however, a second search. The subpoena itself triggers Fourth Amendment protections -- but not to the extent of a physical entry to acquire information. The privacy and security interests are quite different, and the Supreme Court has held that the government may use an administrative subpoena to acquire documents so long as “the documents sought are relevant to the [investigation]” and the document request is “adequate, but not excessive,” for those purposes. \7/ Unlike the warrant process, a subpoena does not require probable cause.
</p>
<p>At least, not normally. A Guthrie-card subpoena might be different. In <i>Carpenter v. United States</i>, \8/ the Supreme Court held that probable cause was required for the government to compel wireless carriers to produce time-stamped records of cell-site location information (CSLI) on a robbery suspect that had 12,898 location points cataloging his cell phone's movements over 127 days. Courts had issued orders for these business records in an FBI investigation into a series of robberies, under the Stored Communications Act, which merely requires "specific and articulable facts showing that there are reasonable grounds to believe that ... the records ... [sought] are relevant and material to an ongoing criminal investigation." \9/ Cause to believe that a record is relevant to an investigation is not probable cause to believe that the record is evidence of a suspect's criminal conduct. The majority opinion in <i>Carpenter</i> emphasized that CSLI records added up to (or will, in the near future, amount to) "a detailed chronicle of a person's physical presence compiled every day, every moment, over several years." \10/ As such, it held the relevance-based orders in question were unreasonable searches.
</p>
<p>One can argue that the information that can be extracted from a DNA sample "implicates privacy concerns" at least as much as CSLI data. \11/ But the analogy requires attention to the kind of DNA information the government obtains (and the precautions it takes against other personal information being acquired from the DNA).</p><p>Until the blood is analyzed, no informational privacy is compromised. \12/ In the case mentioned in the complaint, the police "had genetically narrowed the suspects to one of three brothers and their male offspring." I would guess that they accomplished this by means of Y-STR typing combined with other leads. The police then obtained the Guthrie card for "an approximately nine-year-old child," "sequenced the DNA, and then ran further analysis utilizing a technique known as investigative genetic genealogy" to conclude that the child's "blood spot sample belonged to the genetic child of the suspect." </p><p>It is difficult to discern what DNA testing was done. "Investigative genetic genealogy" normally involves comparisons of haploblocks from crime-scene DNA and DNA in genetic genealogy databases that are open to the public in order to pick possible relatives to the unknown person whose DNA was at the crime-scene. With those findings, ordinary genealogical research may produce a list of suspects. In the case mention in the complaint, police already had the list of suspects. Why perform the extensive haploblock analysis of "investigative genetic genealogy" if the three siblings and the child of one of them already are known? Would not comparing a number of autosomal STR loci not known to be medically informative have been able to show whether the child had a substantial probability of being the child of the man whose DNA was associated with the 1996 sexual assault that the police were investigating? That might be enough for probable cause for a court order compelling the implicated brother to provide a DNA sample for comparison to the one from the 1996 sexual assault. \13/
</p>
<p>Of course, it can be argued that the particular loci the police <i>actually</i> used for the investigation hardly matter -- that the very fact that the sample contains medically relevant information that the police <i>could</i> acquire from the Guthrie card makes the case similar enough to the location tracking in <i>Carpenter</i> to require probable cause. In <i>Carpenter</i>, the FBI was only interested in associating the defendant's cell phone with towers near the robberies that were under investigation. Did they assemble detailed itineraries of Carpenter's movements at all other locations that he (or, more precisely, his phone) visited? Perhaps the mere fact that the many cell-site records were in their possession was enough. </p><p>Yet, this argument resembles the one rejected in most cases on the constitutionality of forcing convicted offenders (or even arrestees) to surrender DNA for law-enforcement databases. Most judges, and the Supreme Court, rejected the argument that the potential to type all kinds of loci in itself required probable cause for collecting and profiling the DNA for identification only. \14/
</p>
<p>None of this means that New Jersey's Guthrie-card subpoenas are clearly or even probably constitutional. I merely suggest that there could be more to the issue than the complaint alleges. Also, it seems worth noting that the exact connection between the public records request and the constitutional issue is not entirely apparent. \15/
</p>
<p><b>NOTES</b>
</p>
<p><span style="font-size: x-small;"> Thanks to Fred Bieber for news of the complaint.
</span></p><span style="font-size: x-small;">
</span><ol><span style="font-size: x-small;">
</span><li><span style="font-size: x-small;">N.J. Office of the Public Defender v. N.J. Dep't of Health, Civ. No. ___ (Complaint, July 10, 2022), available at <a href="https://www.theverge.com/2022/7/29/23283837/nj-police-baby-dna-crimes-lawsuit-public-defender">https://www.theverge.com/2022/7/29/23283837/nj-police-baby-dna-crimes-lawsuit-public-defender</a>.</span></li><span style="font-size: x-small;">
</span><li><span style="font-size: x-small;">Centers for Disease Control and Prevention, Newborn Screening Portal, Nov. 29, 2021, <a href="https://www.cdc.gov/newbornscreening/index.html">https://www.cdc.gov/newbornscreening/index.html</a>.</span></li><span style="font-size: x-small;">
</span><li><span style="font-size: x-small;">N.J. Dep't of Health, Newborn Screening and Genetic Services, Feb. 10, 2022, <a href="https://www.nj.gov/health/fhs/nbs/">https://www.nj.gov/health/fhs/nbs/</a>.</span></li><span style="font-size: x-small;">
</span><li><span style="font-size: x-small;">Harvey L. Levy, Robert Guthrie and the Trials and Tribulations of Newborn Screening, 7(1) Int’l J. Neonatal Screening 5 (2021), available at <a href="https://doi.org/10.3390/ijns7010005">https://doi.org/10.3390/ijns7010005</a>.</span></li><li><span style="font-size: x-small;">Cf. Dobbs v. Jackson Women's Health Organization, No. 19–1392 (U.S. June 24, 2022), available at <a href="https://www.supremecourt.gov/opinions/21pdf/19-1392_6j37.pdf">https://www.supremecourt.gov/opinions/21pdf/19-1392_6j37.pdf</a>.</span></li><span style="font-size: x-small;">
</span><li><span style="font-size: x-small;">Cf. Ferguson v. Charleston, 532 U.S. 67 (2001), available at <a href="https://scholar.google.com/scholar_case?case=12447804856380641716">https://scholar.google.com/scholar_case?case=12447804856380641716</a>. Another exception is consent. Although consent for Fourth Amendment purposes is far less onerous than medical informed consent, the only grounds for refusal in New Jersey are religious. 26 N.J. Stat. Ann. § 26:2-111. So the consent exception does not apply.</span></li><span style="font-size: x-small;">
</span><li><span style="font-size: x-small;">Okla. Press Publ’g Co. v. Walling, 327 U.S. 186, 209 (1946) (upholding an FTC order for the production of a newspaper publishing corporation’s books and records as request was made pursuant to statute and was reasonably relevant). The Fifth Amendment privilege against self-incrimination offers protection when the act of production itself would be incriminating as an admission. E.g., United States v. Hubbell, 530 U.S. 27 (2000).</span></li><span style="font-size: x-small;">
</span><li><span style="font-size: x-small;">138 S.Ct. 2206 (2018), available at <a href="https://scholar.google.com/scholar_case?case=14655974745807704559">https://scholar.google.com/scholar_case?case=14655974745807704559</a>.</span></li><span style="font-size: x-small;">
</span><li><span style="font-size: x-small;">18 U.S.C. § 2703(d).</span></li><span style="font-size: x-small;">
</span><li><span style="font-size: x-small;">Id. at 2220.</span></li><span style="font-size: x-small;">
</span><li><span style="font-size: x-small;">Id.</span></li><span style="font-size: x-small;">
</span><li><span style="font-size: x-small;">Cf. id. at 2266-67 (Gorsuch, J., dissenting and asking "Why is the relevant fact the seven days of information the government asked for instead of the two days of information the government actually saw? ... And in what possible sense did the government 'search' five days' worth of location information it was never even sent?").</span></li><span style="font-size: x-small;">
</span><li><span style="font-size: x-small;">See Maryland v. Pringle, 540 U.S. 366, 371-72 (2003) (finding probable cause for arresting three men in a car after finding $763 of rolled-up cash
in the glove compartment and five plastic glassine baggies of cocaine were behind the back-seat armrest).</span></li><li><span style="font-size: x-small;">See David H. Kaye, <a href="https://ssrn.com/abstract=2376467" target="_blank">Why So Contrived? DNA Databases After <i>Maryland v. King</i></a>, 104 J. Crim. L. & Criminology 535 (2014); David H. Kaye, <a href="http://ssrn.com/abstract=2043259" target="_blank">A Fourth Amendment Theory for Arrestee DNA and Other Biometric Databases</a>, 15 U. Pa. J. Const. L. 1095 (2013).</span></li><span style="font-size: x-small;">
</span><li><span style="font-size: x-small;">Whether accessing the Guthrie cards for criminal investigations is common or rare in New Jersey would not seem to affect the legality of the subpoenas. Of course, the extent of the access should be a matter of public concern, and widespread law enforcement use of the cards could prompt legislation to curtail the practice. But that is so whether or not the alleged invasions of "genetic privacy" are constitutional. Still, uncovering a widespread practice that is not only of general public interest, but also illegal, might add weight to the case for public disclosure under a balancing test for such disclosure. In that event, the allegations of unconstitutionality would not be superfluous to the complaint. Nonetheless, if the opinions on the state and federal law of search and seizure are overly rhetorical, one might wonder whether they go beyond a simple "statement of the facts on which the claim is based." Rules Governing the Courts of the State of New Jersey, Rule 4:5-2, available at <a href="https://www.njcourts.gov/attorneys/assets/rules/r4-5.pdf">https://www.njcourts.gov/attorneys/assets/rules/r4-5.pdf</a>.</span></li>
</ol>
DH Kayehttp://www.blogger.com/profile/09329862957840849989noreply@blogger.com0tag:blogger.com,1999:blog-5354567765897135804.post-42291886369325086262022-07-09T18:54:00.003-04:002022-07-09T20:55:42.836-04:00Preliminary Results from a Blind Quality Control Program<p>The Houston Forensic Science Center recently reported the results of realistic, blind tests of its firearms examiners. Realism comes from disguising materials to look like actual casework and injecting these "mock evidence items" into the regular flow of business. The judgments of the examiners for the mock cases can be evaluated with respect to the true state of affairs (ammunition components from the same firearm as opposed to components from different firearms). Eagerly, I looked for a report of how often the examiners declared an association for pairs of items that were not associated with one another (false "identifications") and how often they declared that there was no association for pairs that were in fact associated (false "eliminations").
</p>
<p>These kinds of conditional "error rates" are by no means all there is to quality control and to improving examiner performance, which is the salutary objective of the Houston lab, but they are prominent in judicial opinions on the admissibility of firearms-toolmark evidence. So too, they (along with the cognate statistics of specificity and sensitivity) are established measures of the validity of tests for the presence or absence of a condition. Yet, I searched in vain for clear statements of these standard measures of examiner performance in the article by Maddisen Neuman, Callan Hundl, Aimee Grimaldi, Donna Eudaley, Darrell Stein and Peter Stout on "Blind Testing in Firearms: Preliminary Results from a Blind Quality Control Program," 67(3) J. Forensic Sci. 964-974 (2022).
</p>
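<p>For readers who have not met these statistics, here is a minimal sketch (my own illustration, not anything in the article) of how the two conditional error rates and their complements, sensitivity and specificity, are computed once inconclusive results are set aside. The counts and the function name are hypothetical.</p>
<pre>
# Illustration only: conditional error rates from hypothetical counts of
# conclusive decisions (inconclusives set aside). Not the Houston data.

def error_rates(true_id, true_elim, false_id, false_elim):
    """true_id/true_elim: decisions on same-source pairs;
    false_id/false_elim: decisions on different-source pairs."""
    sensitivity = true_id / (true_id + true_elim)        # P(identification given same source)
    specificity = false_elim / (false_id + false_elim)   # P(elimination given different source)
    false_id_rate = false_id / (false_id + false_elim)   # false-"identification" rate
    false_elim_rate = true_elim / (true_id + true_elim)  # false-"elimination" rate
    return sensitivity, specificity, false_id_rate, false_elim_rate

# Hypothetical: 95 correct IDs, 5 false eliminations, 2 false IDs, 98 correct eliminations
print(error_rates(95, 5, 2, 98))   # (0.95, 0.98, 0.02, 0.05)
</pre>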
<p>Instead, tables use a definition of "ground truth" that includes materials being intentionally "insufficient" or "unsuitable" for analysis, and they focus on whether "[t]he reported results either matched the ground truth or resulted in an inconclusive decision." (Here, "inconclusive" is different from "insufficient" and "unsuitable." For the sake of readers who are unfamiliar with firearms argot, Table 1 defines--or tries to--the terminology for describing the outcomes of the mock cases.)</p>
<blockquote>
<div style="border: 1px solid black; padding-bottom: 4px; padding-left: 10px; padding-right: 5px; padding-top: 4px; padding: 4px 5px 4px 10px;">
<center><b>TABLE 1. Statements for the Outcome of an Examination</b>
<br />(adapted from p. 966 tbl. 1)</center>
<hr />
<center>Binary (Yes/No) Source Conclusions</center>
<br /><i>Identification</i>: A sufficient correspondence of individual characteristics will lead the examiner to the conclusion that both items (evidence and tests) originated from the same source.
<br /><i>Elimination</i>: A disagreement of class characteristics will lead the examiner to the conclusion that the items did not originate from the same source. In some instances, it may be possible to support a finding of elimination even though the class characteristics are similar when there is marked disagreement of individual characteristics.
<hr />
<center>Statements of No Source Conclusion</center>
<br /><i>Unsuitable</i>: A lack of suitable microscopic characteristics will lead the examiner to the conclusion that the items are unsuitable for identification.
<br /><i>Insufficient</i>: Examiners may render an opinion that markings on an item are insufficient when:
<br /><span style="margin-left: 5%;">• an item has discernible class characteristics but no individual characteristics</span>
<br /><span style="margin-left: 5%;">• an item does not exhibit class characteristics and has few individual characteristics of such poor quality that precludes an examiner from rendering an opinion;</span>
<br /><span style="margin-left: 5%;">• the examiner cannot determine if markings on an item were made by a firearm during the firing process; or</span>
<br /><span style="margin-left: 5%;">• the examiner cannot determine if markings are individual or subclass.</span>
<br /><i>Inconclusive</i>: An insufficient correspondence of individual and/or class characteristics will lead the examiner to the conclusion that no identification or elimination could be made with respect to the items examined.
<hr />
<u>Note on "identification"</u>: The identification of cartridge case/bullet toolmarks is made to the practical, not absolute, exclusion of all other firearms. This is because it is not possible to examine all firearms in the world, a prerequisite for absolute certainty. The conclusion that sufficient agreement for identification exists between toolmarks means that the likelihood that another firearm could have made the questioned toolmarks is so remote as to be considered a practical impossibility.
</div>
</blockquote>
<p>There were 51 mock cases containing anywhere from 2 to 41 items (median = 9). In the course of the five-and-a-half year study, 460 items were examined for a total of 570 judgments by only 11 firearms examiners, with experience ranging from 5.5 to 23 years. The mock evidence varied greatly in its informativeness, and the article suggests that the lab sought to use a greater proportion of challenging cases than might be typical.
</p>
<p>Whether or not the study is generalizable to other examiners, laboratories, and cases, the authors write that "no hard errors were observed; that is, no identifications were declared for true nonmatching pairs, and no eliminations were declared for true matching pairs." This sounds great, but how probative is the observation of "no hard errors"?</p><p>Table 3 of the article states that there were 143 false pairs, of which 106 were designated inconclusive. It looks like the examiners were hesitant to make an elimination, even for a false pair. They made only 37 eliminations. Since there were no "hard errors," none of the false pairs were misclassified as identifications. Ignoring inconclusives, which are not presented as evidence for or against an association, the observed false-identification rate therefore was 0/37. Using the <a href="https://en.wikipedia.org/wiki/Rule_of_three_(statistics)" target="_blank">rule of three</a> for a quick approximation, we can estimate the 95% confidence interval as going from 0 to 3/37 (about 0.08). To use phrasing like that in the 2016 PCAST Report, the false-positive rate could be as large as about 1 in 12.
</p>
<p>Applying the same reasoning to the 386 true pairs, of which 119 were designated inconclusive, the observed false-elimination rate must have been 0/267. The 95% confidence interval for the false-elimination rate thus extends to about 3/267, or 1/89.
</p>
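<p>For those who want to see the arithmetic, here is a minimal sketch of the rule-of-three approximation applied to the two counts just described. It treats the conclusive comparisons as independent trials, which, as explained next, is a questionable assumption.</p>
<pre>
# Rule of three: with 0 events in n independent trials, an approximate
# upper 95% confidence limit for the event probability is 3/n.

def rule_of_three_upper(n):
    return 3 / n

for label, n in [("false identifications", 37), ("false eliminations", 267)]:
    upper = rule_of_three_upper(n)
    print(f"{label}: 0/{n}; upper 95% limit = {upper:.3f} (about 1 in {round(n / 3)})")

# false identifications: 0/37; upper 95% limit = 0.081 (about 1 in 12)
# false eliminations: 0/267; upper 95% limit = 0.011 (about 1 in 89)
</pre>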
<p>These confidence intervals should not be taken too seriously. The simple binomial probability model implicit in the calculations does not hold for dependent comparisons. To quote the authors (p. 968), "Because the data were examined at the comparison level, an item of evidence can appear in the data set in multiple comparisons and be represented by multiple comparison conclusions. For example, Item 1 may have been compared to Item 2 and Item 3 with comparison conclusions of elimination and identification, respectively." Moreover, I could be misconstruing the tables. Finally, even if the numbers are all on target, they should not be taken as proof that error rates are as high as the upper confidence limits. The intervals are merely indications of the uncertainty in using particular numbers as estimates of long-term error rates.
</p>
<p>In short, the "blind quality control" program is a valuable supplement to minimal-competency proficiency testing. The absence of false identifications and false eliminations is encouraging, but the power of this study to pin down the probability of errors at the Houston laboratory is limited.</p>
DH Kayehttp://www.blogger.com/profile/09329862957840849989noreply@blogger.com0tag:blogger.com,1999:blog-5354567765897135804.post-82363665019493592852022-07-06T17:05:00.001-04:002022-07-12T10:21:02.874-04:00Why Did the Proposed Amendment to Rule 702 Scuttle the "Preponderance of the Evidence"?<p>After posting a <a href="http://for-sci-law.blogspot.com/2022/07/proposed-amendment-to-federal-rule-of.html" target="_blank">description of the changes</a> to the proposed amendment to Federal Rule of Evidence 702, I received the following inquiry:
</p>
<blockquote>
Which one is actually the proposal? "More likely than not" or "by a preponderance of the evidence"? The former seems to be a weakening, the latter (even if it is redundant for lawyers) puts forensic scientists on notice. Use of the word "evidence" in the latter is, however, potentially confusing. "Evidential reliability" is about the "reliability" [sic]
of the "evidence", i.e., the "scientific validity" of the methods applied to arrive at the "opinion". The proposed change (if it is the proposed change) seems to refer to "evidence" about the "reliability" of the "evidence" (in which the first and second instance of the word "evidence" do not refer to the same thing).
</blockquote>
<p>The first iteration of the amendment used "preponderance." It read, "[a]n [expert] witness ... may testify ... <span style="background-color: #f4cccc;">if the proponent has demonstrated by a preponderance of the evidence</span> that" the proposed evidence satisfies various requirements regarding what the Supreme Court in <i>Daubert v. Merrell Dow Pharmaceuticals</i>, <a href="https://supreme.justia.com/cases/federal/us/509/579/" target="_blank">509 U.S. 579</a> (1993), called "evidentiary reliability." Now the proposed text is, "An [expert] witness ... may testify ... <span style="background-color: #d9ead3;">if the proponent demonstrates to the court that it is more likely than not that</span>" the proposed evidence satisfies these requirements.
</p>
<p>Why the change? Partly because of the elliptical nature of the original formulation and partly because of the awkwardness of the construction "evidence that the evidence." As the rest of this posting explains, the new (green) version is better drafted, but the idea was never in doubt. <br /></p><p>The governing principle comes from Federal Rule of Evidence 104(a) as interpreted in <i>Bourjaily v. United States</i>, <a href="https://supreme.justia.com/cases/federal/us/483/171/" target="_blank">483 U.S. 171</a> (1987). The rule begins with a general observation that
</p>
<blockquote>The court must decide any preliminary question about whether a witness is qualified, a privilege exists, or evidence is admissible. In so deciding, the court is not bound by evidence rules, except those on privilege.</blockquote>
<p>Fed. R. Evid. 104(a). So to decide whether proffered evidence is admissible at trial, the court can consider all pertinent, non-privileged information presented to it, whether or not the information about admissibility would be admissible in a trial.
</p>
<p>But Rule 104 is silent on how confident the judge should be that the proposed evidence satisfies the requirements for admissibility. That is where <i>Bourjaily</i> comes in. In that case, the government wanted to introduce out-of-court statements of a coconspirator as evidence against the defendant. To avoid the rule against hearsay, it sought to persuade the court to apply the rule that certain statements of conspirators are admissible against everyone in the conspiracy. Defendant's membership in the conspiracy was thus a preliminary question for the court, and the <i>Bourjaily</i> Court explained that
</p>
<blockquote>
We are ... guided by our prior decisions regarding admissibility determinations that hinge on preliminary factual questions. We have traditionally required that these matters be established by a preponderance of proof. Evidence is placed before the jury when it satisfies the technical requirements of the evidentiary Rules, which embody certain legal and policy determinations. The inquiry made by a court concerned with these matters is not whether the proponent of the evidence wins or loses his case on the merits, but whether the evidentiary Rules have been satisfied. Thus, the evidentiary standard is unrelated to the burden of proof on the substantive issues, be it a criminal case ... or a civil case. ... The preponderance standard ensures that, before admitting evidence, the court will have found it more likely than not that the technical issues and policy concerns addressed by the Federal Rules of Evidence have been afforded due consideration. ... Therefore, we hold that, when the preliminary facts relevant to Rule 801(d)(2)(E) are disputed, the offering party must prove them by a preponderance of the evidence.
</blockquote>
<p>483 U.S. at 175-76 (note omitted).</p>
<p>Applying <i>Bourjaily</i> to the preliminary questions in Rule 702, it is quite clear that the trial court has to find that "evidentiary reliability" under Rule 702 is more probable than not. To foreclose any debate about it, in <i>Daubert</i> itself, the Court pointed to the Rule 104(a) preponderance standard, writing that "[f]aced with a proffer of expert scientific testimony, then, the trial judge must determine at the outset, pursuant to Rule 104(a), whether the expert is proposing to testify to (1) scientific knowledge that (2) will assist the trier of fact to understand or determine a fact in issue." 509 U.S. at 592.
</p>
<p>Yet, many public commenters did not see this. Some comments claimed that the word "evidence" in "preponderance of the evidence" would constrain the court to considering only such evidence as would be admissible at trial in deciding whether the proposed expert testimony is admissible. Other comments claimed that the phrase would keep previously admissible evidence from juries. Indeed, "almost all of the fire was directed toward the term 'preponderance of the evidence.'” Advisory Comm. on Evid. Rules, Report to the Standing Committee, May 15, 2022, at 7.
</p>
<p>The Advisory Committee unabashedly rejected both these claims. In its report to the Standing Committee, it wrote that:</p>
<blockquote>
The Committee does not agree that the preponderance of the evidence standard would limit the court to considering only admissible evidence; the plain language of Rule 104(a) allows the court deciding admissibility to consider inadmissible evidence. Nor did the Committee believe that the use of the term preponderance of the evidence would shift the factfinding role from the jury to the judge, for the simple reason that, when it comes to making preliminary determinations about admissibility, the judge <i>is</i> and <i>always has been</i> a factfinder.
</blockquote>
<p>Id. Nevertheless,</p>
<blockquote>
[T]he Committee recognized that it would be possible to replace the term “preponderance of the evidence” with a term that would achieve the same purpose while not raising the concerns (valid or not) mentioned by many commentators. The Committee unanimously agreed to change the proposal as issued for public comment to provide that the proponent must establish that it is “<i>more likely than not</i>” that the reliability requirements are met. This standard is substantively identical to “preponderance of the evidence” but it avoids any reference to “evidence” and thus addresses the concern that the term “evidence” means only admissible evidence.
</blockquote>
<p>Id. Finally,
</p>
<blockquote>
The Committee was also convinced by the suggestion in the public comment that the rule should clarify that it is the court and not the jury that must decide whether it is more likely than not that the reliability requirements of the rule have been met. Therefore, the Committee unanimously agreed with a change requiring that the proponent establish “<i>to the court</i>” that it is more likely than not that the reliability requirements have been met. The proposed Committee Note was amended to clarify that nothing in amended Rule 702 requires a court to make any findings about reliability in the absence of a proper objection.
</blockquote>
<p>Id. Overlooked in this debate over the niceties of the phrase "preponderance of the evidence" is a different drafting point. The proposed amendment makes it explicit that the standard pertains to the court's role in considering scientific validity, but it does not do the same for the other requirements of Rule 702--namely, that the witness be "qualified as an expert by knowledge, skill, experience, training, or education." That a witness is qualified to testify also must be established as more probable than not. For a rare case excluding testimony from a latent fingerprint examiner because she ran into problems in demonstrating proficiency, see United States v. Cloud, No. 1:19-cr-02032-SMJ-1, 2021 WL 7184484 (E.D. Wash. Dec. 17, 2021) (false exclusion in casework, a false exclusion on a proficiency test, and receiving help from her supervisor on a follow-up proficiency test).
</p>
DH Kayehttp://www.blogger.com/profile/09329862957840849989noreply@blogger.com0tag:blogger.com,1999:blog-5354567765897135804.post-90345784321722051272022-07-01T15:20:00.000-04:002022-07-01T15:20:19.857-04:00Proposed Amendment to Federal Rule of Evidence 702 Clears More Hurdles<div style="background-color: #d9ead3; border-radius: 10px; border: 1px solid black; padding-bottom: 4px; padding-left: 10px; padding-right: 5px; padding-top: 4px; padding: 4px 5px 4px 10px;">The following report appeared in the OSAC newsletter <i>OSAC In Brief</i>, June 2022, at 4-6 with the title "Proposed Amendment to Federal Rule of Evidence 702 Clears More Hurdles." It updates a report in the July 2021 issue (posted earlier today on this blog). Both reports are meant to be boringly factual. More opinionated remarks may appear later.
</div>
<p>After five years of discussion, a proposed amendment to Federal Rule of Evidence 702 on testimony by expert witnesses has progressed to the <a href="https://www.uscourts.gov/about-federal-courts/governance-judicial-conference/about-judicial-conference" target="_blank">Judicial Conference</a> of the United States—the policy-making arm of the federal judiciary. If the Judicial Conference accepts the unanimous recommendations of both its Advisory Committee on Evidence Rules, which drafted the amendment, and its standing Committee on Rules of Practice and Procedure, which endorsed it this month, the amendment will be delivered to the Supreme Court for transmittal to Congress. Then, unless Congress intervenes, it will become effective by the end of next year.
</p>
<p>But what effect would it have? According to the Advisory Committee chair, U.S. District Court Judge <a href="https://en.wikipedia.org/wiki/Patrick_J._Schiltz" target="_blank">Patrick Schiltz</a>, the amendment does not alter the meaning of the rule in the slightest. “It simply makes it clearer, makes it easier for people to understand, so that fewer mistakes will be made” (as reported June 7, in <i>Bloomberg Law</i>). Box 1 shows the proposed changes, which differ slightly from those discussed in the <i>OSAC In Brief</i> article of July 2021.
</p>
<div style="background-color: antiquewhite; border-radius: 5px; border: 1px solid black; padding-bottom: 4px; padding-left: 10px; padding-right: 5px; padding-top: 4px; padding: 4px 5px 4px 10px;">
BOX 1. Proposed Changes to Federal Rule of Evidence 702
<hr />
A witness who is qualified as an expert by knowledge, skill, experience, training, or education may testify in the form of an opinion or otherwise if <ins>the proponent demonstrates to the court that it is more likely than not that</ins>:<br />
(a) the expert's scientific, technical, or other specialized knowledge will help the trier of fact to understand the evidence or to determine a fact in issue;<br />
(b) the testimony is based on sufficient facts or data;<br />
(c) the testimony is the product of reliable principles and methods; and<br />
(d) the <del>expert has reliably applied</del> <ins>expert's opinion reflects a reliable application of</ins> the principles and methods to the facts of the case.
</div>
<p>On the face of it, the amendment does little, if anything, to alter the substance of the existing rule. It adds the words “if the proponent demonstrates to the court that it is more likely than not” in front of the criteria for admitting expert testimony, but the Supreme Court had already noted that in exercising a longstanding “gatekeeping” role, the district court needs to determine whether the conditions for admitting expert testimony are “established by a preponderance of proof.” <i>Daubert v. Merrell Dow Pharmaceuticals</i>, 509 U.S. 579, 592 n.10 (1993) (citing Fed. R. Evid. 104(a)). (As a result of public comments, the Advisory Committee substituted “more likely than not” for “preponderance of the evidence” to describe the proponent’s burden of persuasion on the issue of admissibility.)
</p>
<p>The other wording change concerns the well-entrenched reliability-as-applied requirement (“the expert has reliably applied” in part (d)). The amendment uses an alternative phrase—“the expert's opinion reflects a reliable application.” Although one could argue that the specific reference to “opinion” limits the requirement to personal opinions, that is not the intent. An explanatory note that will accompany the revised rule (if and when it is adopted) makes it plain that it still must appear that the expert has applied a valid and reliable method proficiently and appropriately in making any and all findings and inferences. The only purpose of the change is “to emphasize that each expert opinion must stay within the bounds of what can be concluded from a reliable application” of a reliable method to the facts of the case. And, this Advisory Committee Note (ACN) adds that this directive “is especially pertinent to the testimony of forensic experts,” for which “the judge should (where possible) receive an estimate of the known or potential rate of error of the methodology employed, based (where appropriate) on studies that reflect how often the method produces accurate results” rather than “assertions of absolute or one hundred percent certainty—or to a reasonable degree of scientific certainty ... .”
</p>
<p>During the six-month comment period that ended in February, the draft received well over 500 comments. The Reporter to the Advisory Committee found the public reaction “somewhat surprising, because the proposed amendment essentially seeks only to clarify the application of Rule 702 as it was amended in 2000—and that amendment received [only] 179 comments.” Lawyers from the plaintiffs’ side of the civil bar opposed the latest amendment, while defendants’ lawyers supported it.
</p>
<p>There were relatively few comments about the implications of the additional words and the accompanying note for the areas of forensic science covered by OSAC. These too were (predictably) divided. The National District Attorneys Association (NDAA) objected to the ACN’s singling out forensic-science testimony as a problem and saw the amendments as “a solution in search of a problem.” But the New York City Bar Association expressed “particular concern [with] criminal prosecutions” and “the scientific validity of many types of ‘feature-comparison’ methods of identification, such as those involving fingerprints, footwear and hair.” The New York State Crime Laboratory Advisory Committee (NYSCLAC) objected to “changes limiting forensic science testimony” but then maintained that its laboratories already complied with the guidance in the ACN. The Union of Concerned Scientists questioned parts of the NDAA and NYSCLAC statements and insisted that “forensic evidence should be required to present courts with estimates of error rates relevant to their methodologies.” The Innocence Project and other organizations and individuals submitted a joint statement praising the changes and pressing for more. They wanted the text of the rule to contain a requirement that testimony is not only “the product of reliable principles and methods” (the current wording), but also to specify that it “includes the limitations and uncertainty of those principles and methods.”
</p>
<p>The conflicting comments regarding forensic science produced no modifications. If the amendment is adopted, it will implement, to some extent, the 2016 recommendation of the President’s Council of Advisors on Science and Technology that “the Judicial Conference of the United States ... should prepare ... an Advisory Committee note, providing guidance to Federal judges concerning the admissibility under Rule 702 of expert testimony based on forensic feature-comparison methods.”
</p>
<p><i>Author’s disclaimer</i>: This report presents the views of the author. Their publication in <i>In Brief</i> is not an endorsement by NIST or OSAC, and they are not intended to represent the views of any OSAC unit. No estimate of the known or potential rate of error is available.
</p>DH Kayehttp://www.blogger.com/profile/09329862957840849989noreply@blogger.com0tag:blogger.com,1999:blog-5354567765897135804.post-48924023542546052142022-07-01T11:36:00.001-04:002022-07-01T11:36:25.842-04:00Proposed Changes to Federal Rule of Evidence 702<div style="background-color: #d9ead3; border-radius: 10px; border: 1px solid black; padding-bottom: 4px; padding-left: 10px; padding-right: 5px; padding-top: 4px; padding: 4px 5px 4px 10px;">The following report appeared in the OSAC newsletter <i>OSAC In Brief</i>, July 2021, at 3-7 with the uninspired title "Proposed Changes to Federal Rule of Evidence 702." It was followed by an update in the June 2022 issue (about to be reproduced on this blog). Both are meant to be boringly factual. More opinionated remarks may appear later.
</div>
<p>On April 30, the federal Advisory Committee on Evidence Rules unanimously proposed two changes to the wording of Federal Rule of Evidence 702. The rule, which many states have adopted in one form or another, provides for testimony by expert witnesses. The changes do not alter the meaning of the rule, but they can be seen as a course-correction signal telling courts to be more vigorous in ensuring that “forensic expert testimony is valid, reliable, and not overstated in court.”
</p>
<p>The quoted words come from a report of the Advisory Committee. Facilitating such testimony also is part of OSAC’s <i>raison d’être</i>. This article for <i>In Brief</i> therefore describes the proposed amendment, a little bit of its history, the steps required for it to be enacted into law, and its significance for OSAC’s work.
</p>
<p><b>The Proposer: An Advisory Committee to the Standing Committee of the Judicial Conference</b>
</p>
<p>The <a href="https://www.uscourts.gov/about-federal-courts/governance-judicial-conference/about-judicial-conference" target="_blank">Judicial Conference</a> of the United States is the policymaking organ of the judicial branch of the federal government. Composed of the Chief Justice of the U.S. Supreme Court, the chief judges of the 13 federal judicial circuits, and select federal district judges, it also is required by statute “to carry on a continuous study of the operation and effect of the general rules of practice and procedure" that apply in the federal courts (and, with some variations, in many state court systems as well). The Conference relies on a “Committee on Rules of Practice and Procedure, commonly referred to as the ‘Standing Committee.’" The Standing Committee, in turn, relies on advisory committees on appellate, bankruptcy, civil, criminal, and evidence rules. These advisory committees are comprised of “federal judges, practicing lawyers, law professors, state chief justices, and representatives of the Department of Justice.” (Quotations are from the <a href="https://www.uscourts.gov/rules-policies/about-rulemaking-process/how-rulemaking-process-works/overview-bench-bar-and-public" target="_blank">Administrative Office</a> of the U.S. Courts.) The Advisory Committee on Evidence Rules (which we can abbreviate as ACER) is one of these committees.
</p>
<p><b>The Proposed Text: Two Wording Changes</b>
</p>
<p>Rule 702 went into effect in federal courts in 1975. It was one sentence long. The Supreme Court famously interpreted it in <i>Daubert v. Merrell Dow Pharmaceuticals</i>, 509 U.S. 579 (1993), a somewhat ambivalent and abstract opinion. The Court expounded further in cases in 1997 and 1999. The rule was rewritten to incorporate the teachings in these cases in 2000, leading to the version with the longer sentence in the right-hand side of Box 1.
</p>
<table bgcolor="antiquewhite" border="1" cellpadding="10">
<tbody>
<tr>
<td align="center" colspan="2"><b>BOX 1. FEDERAL RULE OF EVIDENCE 702 THEN AND NOW</b></td>
</tr>
<tr>
<td>The Rule in 1975</td>
<td>The Rule in 2021</td>
</tr>
<tr>
<td valign="top">If scientific, technical, or other specialized knowledge will assist the trier of fact to understand the evidence or to determine a fact in issue, a witness qualified as an expert by knowledge, skill, experience, training, or education, may testify thereto in the form of an opinion or otherwise.</td>
<td valign="top">A witness who is qualified as an expert by knowledge, skill, experience, training, or education may testify in the form of an opinion or otherwise if:<br />
(a) the expert’s scientific, technical, or other specialized knowledge will help the trier of fact to understand the evidence or to determine a fact in issue;<br />
(b) the testimony is based on sufficient facts or data;<br />
(c) the testimony is the product of reliable principles and methods; and<br />
(d) the expert has reliably applied the principles and methods to the facts of the case.</td>
</tr>
</tbody>
</table>
<p>The proposed amendment makes two seemingly minor changes, shown in Box 2:
</p>
<div style="background-color: antiquewhite; border: 1px solid black; padding-bottom: 4px; padding-left: 10px; padding-right: 5px; padding-top: 4px; padding: 4px 5px 4px 10px;">
<center><b>BOX 2. THE ADVISORY COMMITTEE’S PROPOSED AMENDMENT TO RULE 702</b></center>
<hr />
A witness who is qualified as an expert by knowledge, skill, experience, training, or education may testify in the form of an opinion or otherwise if <ins>the proponent has demonstrated by a preponderance of the evidence that</ins>:<br />
(a) the expert’s scientific, technical, or other specialized knowledge will help the trier of fact to understand the evidence or to determine a fact in issue;<br />
(b) the testimony is based on sufficient facts or data;<br />
(c) the testimony is the product of reliable principles and methods; and<br />
(d) the <del>expert has reliably applied</del> <ins>expert’s opinion reflects a reliable application of</ins> the principles and methods to the facts of the case.
</div>
<p>Reading these words, one might well ask what is going on. The first change seems to state the obvious (to lawyers, anyway). A footnote in <i>Daubert</i> already indicates that in the “preliminary assessment of whether the reasoning or methodology” possesses “evidentiary reliability,” the trial court must be satisfied by “a preponderance of proof” because that is the threshold for all “[p]reliminary questions concerning the qualification of a person to be a witness, the existence of a privilege, or the admissibility of evidence.” It may not hurt to state this standard in the text of the rule (although including it after the opening clause about qualifications awkwardly fails to modify the qualifications part of the rule). But why bother?
</p>
<p>Similarly, the change to Part (d) is potentially confusing because it limits the “reliable application” prong of the rule to expert “opinion” even though, as the Advisory Committee that drafted the original rule <a href="https://www.law.cornell.edu/rules/fre/rule_702" target="_blank">noted</a>, it is “logically unfounded” to “assume[] that experts testify only in the form of opinions.” Instead, “[t]he rule … recognizes that an expert on the stand may give a dissertation or exposition of scientific or other principles relevant to the case, leaving the trier of fact to apply them to the facts.” But aside from the probably unintended limitation of the as-applied prong to opinions, why bother? What is the difference between testimony when an expert has “reliably applied the principles and methods” and testimony that “reflects a reliable application of the principles and methods”?
</p>
<p>The answers lie in ACER’s official note prepared to accompany the rule, the minutes of its meetings, and its periodic reports to the Standing Committee on its progress in revising the rule.
</p>
<p><b>The Purpose of the New Text</b>
</p>
<p>For OSAC, the most salient parts of the note of the Advisory Committee are in Boxes 3 and 4. As to the first change, regarding “preponderance,” ACER believed that
</p>
<div style="background-color: whitesmoke; border: 1px solid black; padding-bottom: 4px; padding-left: 10px; padding-right: 5px; padding-top: 4px; padding: 4px 5px 4px 10px;">
<center>BOX 3. Part of ACER’s Proposed Note Explaining Its First Proposed Change</center>
<hr />
[M]any courts have held that the critical questions of the sufficiency of an expert’s basis, and the application of the expert’s methodology, are questions of weight and not admissibility. These rulings are an incorrect application of Rules 702 and 104(a). … The Committee concluded that emphasizing the preponderance standard in Rule 702 specifically was made necessary by the courts that have failed to apply correctly the reliability requirements of that rule. … [Explicitly incorporating the standard] means that once the court has found the admissibility requirement to be met by a preponderance of the evidence, any attack by the opponent will go only to the weight of the evidence.
</div>
<p>A major push for this change came from individuals and organizations concerned with civil litigation in which, they believed, courts have admitted expert opinions that a drug or chemical is harmful without adequately verifying that there is a body of scientific literature sufficient to let a reasonable expert conclude that the substance can cause the kind of harm claimed to have occurred under the conditions of the case. However, it also will remind judges in criminal cases that they must have proof that the scientific literature is sufficient to support the findings of forensic-science experts.
</p>
<p>As Box 4 shows, the second part of the “amendment is especially pertinent to the testimony of forensic [science] experts in both criminal and civil cases”:
</p>
<div style="background-color: whitesmoke; border: 1px solid black; padding-bottom: 4px; padding-left: 10px; padding-right: 5px; padding-top: 4px; padding: 4px 5px 4px 10px;">
<center>BOX 4. Part of ACER’s Proposed Note Explaining Its Second Proposed Change</center>
<hr />
Rule 702(d) has also been amended to emphasize that a trial judge must exercise gatekeeping authority with respect to the opinion ultimately expressed by a testifying expert. … The amendment is especially pertinent to the testimony of forensic experts in both criminal and civil cases. Forensic experts should avoid assertions of absolute or one hundred percent certainty—or to a reasonable degree of scientific certainty—if the methodology is subjective and thus potentially subject to error. In deciding whether to admit forensic expert testimony, the judge should (where possible) receive an estimate of the known or potential rate of error of the methodology employed, based (where appropriate) on studies that reflect how often the method produces accurate results. Expert opinion testimony regarding the weight of feature comparison evidence (i.e., evidence that a set of features corresponds between two examined items) must be limited to those inferences that can reasonably be drawn from a reliable application of the principles and methods. This amendment does not, however, bar testimony that comports with substantive law requiring opinions to a particular degree of certainty. … [N]othing in the amendment requires the court to nitpick an expert’s opinion in order to reach a perfect expression of what the basis and methodology can support. The … standard does not require perfection. On the other hand, it does not permit the expert to make extravagant claims that are unsupported by the expert’s basis and methodology.
</div>
<p>It is the ACER note, much more than the revisions to the text of the rule, that has implications for forensic-science evidence. As the note indicates, the committee was especially concerned with forensic-science testimony. Its briefing materials included summaries of federal cases from across the spectrum of forensic sciences that raised the issue of “overstatement.” Furthermore, the idea of a new Advisory Committee Note came from the 2016 report of the President’s Council of Advisors on Science and Technology. PCAST called on “the Judicial Conference [to] prepare, with advice from the scientific community, a best practices manual and an Advisory Committee note, providing guidance to Federal judges concerning the admissibility under Rule 702 of expert testimony based on forensic feature-comparison methods.”
</p>
<p>Apparently, PCAST did not realize that ACER is not empowered to write new notes to old rules. At a symposium convened by ACER in 2017, PCAST co-chair and newly appointed Presidential science advisor Eric Lander advised the committee as follows: “If an advisory note is a possibility, I’d favor it. If it’s not, change a comma in the rule and then write a new advisory note. Change one word, any word and write an advisory note.” Advisory Comm. on Evid. Rules Symposium on Forensic Expert Testimony, Daubert, and Rule 702, 86 Fordham L. Rev. 1463, 1523 (2018). This change-a-word artifice is more or less what is happening.
</p>
<p><b>What Is Next in the Rulemaking Process?</b>
</p>
<p>The proposed amendment is just that—proposed. To become law, the ACER amendment and accompanying note must be approved by the Standing Committee after a six-month period for public comment and testimony (after which ACER reviews and can revise the proposed amendment and seek more comment). The Standing Committee then reviews the final drafts. It can revise and return the draft to ACER, or it can submit the amendment and note to the full Judicial Conference for its review. If the Judicial Conference approves, the drafts go to the Supreme Court, which normally transmits them to Congress with no substantive review. Congress then can adopt, reject, modify, or defer the rule change, but if Congress is silent for seven months, the amendment becomes effective at the end of the year.
</p>
<p>Plainly, the proposal, which was four years in the making, still has a long way to go, but the very fact that ACER deliberated at length and expressed concern about forensic-science testimony, overstatement, and error probabilities could have more immediate impact in litigation.
</p>
<p><b>Implications for OSAC</b>
</p>
<p>To help satisfy the proof requirements of Rule 702 (both as it stands and as it might be amended), subcommittees drafting standards for making findings and for reporting or testifying should specifically cite the scientific literature that supports each part of the standard. Valid estimates of potential error rates (or related statistics on the accuracy of results), or procedures to arrive at these estimates, should be part of such standards. Scientific and Technical Review Panels (STRPs) already are instructed to look for this content or for an explanation in the standard of why methods for ascertaining and expressing uncertainty in measurements, observations, or inferences are not present in the standards they review.
</p>
<p>The repeated references to “overstatement” in ACER’s deliberations and materials should reinforce the desire of OSAC units to address the admittedly difficult problem of prescribing standards for testimony—and to use phrases in all standards that involve results that will satisfy the insistence on “those inferences that can reasonably be drawn from a reliable application of the principles and methods.” Cases on firearms-toolmark identifications (called “ballistics” cases in the ACER materials) suggest that judicial efforts are unlikely to produce the best solution. The Department of Justice has attempted to confront this issue with its Uniform Language for Testimony and Reports standards (ULTRs). It argued to ACER that these ULTRs help solve the problem of overclaiming, but one response was that because there are no such standards in laboratories generally, a new Advisory Committee Note is necessary. OSAC units still can help fill this gap if they act quickly.
</p>
<p><i>Disclaimer</i>: This report presents the views of the author. Their publication in In Brief is not an endorsement by NIST or OSAC, and they are not intended to represent the views of any OSAC unit. The error rate associated with them is not known.
</p>DH Kayehttp://www.blogger.com/profile/09329862957840849989noreply@blogger.com0tag:blogger.com,1999:blog-5354567765897135804.post-59759679429672606682022-06-11T15:05:00.002-04:002022-06-13T22:06:08.143-04:00State v. Ghigliotti, Computer-assisted Bullet Matching, and the ASB Standards<p>In <a href="https://scholar.google.com/scholar_case?case=11816405878614997214" target="_blank"><i>State v. Ghigliotti</i></a>, 232 A.3d 468, 471 (N.J. App. Div. 2020), a firearms examiner concluded that a particular gun did not fire the bullet (or, more precisely, a bullet jacket) removed from the body of a man found shot to death by the side of a road in Union County, New Jersey. That was 2005, and the case went nowhere.<br /></p><p>Ten years later, a detective prevailed on a second firearms examiner to see what he thought of the toolmark evidence. After considerable effort, this examiner reported that the microscopic comparisons with many test bullets from the gun in question were inconclusive.
</p>
<p>However, at a training seminar in New Orleans he learned of two tools developed and marketed by <a href="https://www.ultra-forensictechnology.com/en/about-us" target="_blank">Ultra Electronics Forensic Technology</a>, the creator of the Integrated Ballistics Identification System (IBIS) that "can find the 'needle in the haystack', suggesting possible matches between pairs of spent bullets and cartridge cases, at speeds well beyond human capacity." The <a href="https://www.ultra-forensictechnology.com/en/our-products/ballistic-identification/bullettrax" target="_blank">Bullettrax</a> system “digitally captures the surface of a bullet in 2D and 3D, providing a topographic model of the marks around its circumference.” As “[t]he world’s most advanced bullet acquisition station” it uses “intelligent surface tracking that automatically adapts to deformations of damaged and fragmented bullets.”
</p>
<p>The complementary <a href="https://www.ultra-forensictechnology.com/en/our-products/ballistic-identification/matchpoint" target="_blank">Matchpoint</a> is an “analysis station” with “[p]owerful visualization tools [that] go beyond conventional comparison microscopes to ease the recognition of high-confidence matches. Indeed, Matchpoint increases identification success rates while reducing efforts required for ultimate confirmations.” It features multiple side-by-side views of images from the Bullettrax data and score analysis. The court explained that “the Matchpoint software ... included tools for flattening and manipulating the images, adjusting the brightness, zooming in, and ‘different overlays of ... color scaling.’”
</p>
<p>But the examiner did not make the comparisons based on the digitally generated and enhanced images, and he did not rely on any similarity-score analysis. Rather, he “looked at the images side-by-side on a computer screen using Matchpoint [only] ‘to try and target areas of interest to determine ... if (he) was going to go back and continue with further [conventional] microscopic comparisons or not.’” He found four such areas of agreement. Conducting a new microscopic analysis of these and other areas a few weeks later, he “‘came to an opinion of an identification or a positive identification’ ... grounded in his ‘training and experience and education as a practitioner in firearms identification’ and his handling of over 2300 cases.” 232 A.3d at 478–79.
</p>
<p>The trial court “determined that a <i>Frye</i> hearing was necessary to demonstrate the reliability of the computer images of the bullets produced by BULLETTRAX before the expert would be permitted to testify at trial.” Id. at 471. The state filed an interlocutory appeal, arguing that the positive identification did not depend on Ultra’s products. The Appellate Division affirmed, holding that the hearing should proceed.
</p>
<p>I do not know where the case stands, but its facts provide the basis for a thought experiment. At about the same time as the <i>Ghigliotti</i> court affirmed the order for a hearing, the American Academy of Forensic Sciences Standards Board (ASB) published a package of standards on toolmark comparisons. Created in 2015, ASB describes itself as “an ANSI [American National Standards Institute]-accredited Standards Developing Organization with the purpose of providing accessible, high quality science-based consensus forensic standards.” Academy Standards Board, <a href="https://www.aafs.org/academy-standards-board/about-asb" target="_blank">Who We Are</a>, 2022. Two of its standards concern three-dimensional (3D) data and inferences in toolmark comparisons, while the third is specific to software for comparing 2D or 3D data.
</p>
<p>We can put the third to the side, for it is limited to software that "seeks to assess both the level of geometric similarity
(similarity of toolmarks) and the degree of certainty that the observed similarity results from a common origin." ANSI/ASB Standard 062, Standard for Topography Comparison Software for Toolmark Analysis § 3.1 (2021). The data collection and visualization software here does neither, and the scoring feature of Matchpoint was not used.
</p>
<p>ANSI/ASB Standard 061, Firearms and Toolmarks 3D Measurement Systems and Measurement Quality Control (2021), is more apposite although it is only intended “to ensure the instrument’s accuracy, to conduct instrument calibration, and to estimate measurement uncertainty for each axis (X, Y, and Z).” It promises “procedures for validation of 3D system hardware” but not software. It “does not apply to legacy 2D type systems,” leaving one to wonder whether there are any standards for validating them.
</p>
<p>Even for "3D system hardware," the procedure for “developmental validity” (§ 4.1) is nonexistent. There are no criteria in this standard for recognizing when a measurement system is valid and no steps that a researcher must follow to study validity. Instead, the section on “Developmental Validation (Mandatory)” states that an “organization with appropriate knowledge and/or [sic] expertise” shall complete “a developmental validation”; that this validation “typically” (but not necessarily) consists of library research (“identifying and citing previously published scientific literature”); and that “ample”—but entirely uncited— literature exists “to establish the underlying imaging technology” for seven enumerated technologies. In full, the three sentences on “developmental validation” are
</p><blockquote>
<div style="background-color: #ffe9ec; border-radius: 10px; border: 1px solid black; padding-bottom: 4px; padding-left: 10px; padding-right: 5px; padding-top: 4px; padding: 4px 5px 4px 10px;">
As per ANSI/ASB Standard 063, Implementation of 3D Technologies in Forensic Firearm and Toolmark Comparison Laboratories, a developmental validation shall be completed by at least one organization with appropriate knowledge and/or expertise. The developmental validation of imaging hardware typically consists of identifying and citing previously published scientific literature establishing the underlying imaging technology. The methods defined above of coherence scanning interferometry, confocal microscopy, confocal chromatic microscopy, focus variation microscopy, phase-shifting interferometric microscopy, photometric stereo, and structured light projection all have ample published scientific literature which can be cited to establish an underlying imaging technology.
</div>
</blockquote>
<p>Perhaps the section is merely there to point the reader to the different standard, ASB 063, on implementation of 3D technologies. \1/ But that standard seems to conceive of “developmental validation” as a process that occurs in a forensic laboratory or other organization according to a predefined plan, with a “technical reviewer” to sign off on the resulting document, which then becomes the object of further review through “[p]eer-reviewed publication (or other means of dissemination to the scientific community, such as a peer-reviewed presentation at a scientific meeting).” § 4.1.3.4. The data and the statistics needed to assess measurement validity are left to the readers' imaginations (or statistical acumen). \2/
</p>
<p>ASB 061 devotes more attention to what it calls “deployment validation” on the part of every laboratory that chooses to use a 3D measuring instrument. This part of the standard describes some procedures for checking the X, Y, and Z “scales” that should reveal whether measurements of the coordinates of points on the surface of the material are close to what they should be. For example, § 4.2.5.4.1 specifies that</p>
<blockquote>
Using calibrated geometric standards (e.g., sine wave, pitch, step heights), measurements shall be conducted to check the X and Y lateral scales as well as the vertical Z scale. Ten measurements shall be performed consecutively ... . The measurement uncertainty of the repeatability measurements shall overlap with the certified value and uncertainty of the geometric standard used.
</blockquote>
<p>The phrasing is confusing (to me, at least). I assume that a “geometric standard” is the equivalent of a ruler of known length (a “certified value” of, say, 1 ± 0.01 microns). But what does the edict that “[t]he measurement uncertainty of the repeatability measurements shall overlap with the certified value and uncertainty of the geometric standard used” mean operationally?
</p>
<p>The best answer I can think of is that the standard contemplates comparing two intervals. One is the scale value (along, say, the X-axis). Imagine that the “geometric standard” that is taken to be the truth is certified as having a length of 1± 0.01 microns. Let’s call this the “certified standard interval.”
</p>
<p>Now the laboratory makes ten measurements for its “deployment validation” to produce what we can call a “sample interval” from the ten measurements. The ASB standard does not contain any directions on how this is to be done. One approach would be to compute a confidence interval on the assumption that the sample measurements are normally distributed. Suppose the observed sample mean for them is 0.80, and the standard error computed from the ten sample measurements is <i>s</i> = 0.10 microns. The confidence interval is then 0.80 ± <i>k</i>(0.10), where <i>k</i> is some constant. If the confidence interval includes any part of the certified interval, this part of the deployment-validation requirement is met.
</p>
<p>What values of <i>k</i> would be suitable for the instrument to be regarded as “deploymentally valid”? The standard is devoid of any insight into this critical value and its relationship to confidence. It does not explain what the interval-overlap requirement is supposed to accomplish, but if the confidence interval is part of it, it is an ad hoc form of hypothesis testing with an unstated significance level.
</p>
<p>Is it all that important here whether the hypothesis of no difference between the standard reference value of 1 and the true sample mean can be rejected at some preset significance level? Should not the question be how much the disparities between the sample of ten measured values and the geometric-standard value would affect the efficacy of the measurements? An observed sample mean that is 20% too low does not lead to the rejection of the hypothesis that the instrument’s measurements are, in the long run, exactly correct, but with only ten measurements in the sample, that may tell us more about the lack of statistical power of the test than about the ability of the instrumentation to measure what it seeks to measure with suitable accuracy for the applications to which it is put.
</p>
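<p>To make the overlap test concrete, here is a minimal sketch using the hypothetical numbers above. It assumes a t-based 95% confidence interval (the standard does not say how the “sample interval” is to be constructed) and uses scipy only to supply the critical value.</p>
<pre>
# Sketch of the interval-overlap reading of ASB 061 § 4.2.5.4.1, using the
# hypothetical figures in the text. The t-based 95% confidence interval is an
# assumption; the standard does not specify how the "sample interval" is formed.
from scipy import stats

certified = (1.0 - 0.01, 1.0 + 0.01)     # certified value: 1 ± 0.01 microns
n, mean, se = 10, 0.80, 0.10             # ten measurements; mean and standard error from the text

k = stats.t.ppf(0.975, n - 1)            # two-sided 95% critical value, about 2.26
sample = (mean - k * se, mean + k * se)  # roughly (0.57, 1.03)

overlaps = min(sample[1], certified[1]) >= max(sample[0], certified[0])
print(f"sample interval ({sample[0]:.2f}, {sample[1]:.2f}); overlaps certified interval? {overlaps}")
# Even a mean 20% below the certified value "passes": the interval from only
# ten measurements is so wide that it still overlaps (0.99, 1.01).
</pre>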
<p>In sum, the standard’s section on “Developmental Validation (Mandatory)” mandates nothing that is not trivially obvious—the court already knows that it should look for support for the 3D scanning and image-manipulation methods in the scientific literature, and the standard does not reveal what the substance of this validation should be. “Deployment Validation (Mandatory)” is supposed to ensure that the laboratory is properly prepared to use a previously validated system for casework. It is of little use in a hearing on the general acceptance of the scanning system and the theories behind it. (One could argue that scientists would accept a system that a laboratory has rigorously pretested and shown to perform accurately, even with no other validation, but it is not clear that the standard describes an appropriate, rigorous pretesting procedure.)
</p>
<p>Moreover, the standard explicitly excludes software from its reach, making it inapplicable to the Matchpoint image-manipulation tools that helped the examiner in <i>Ghigliotti</i> zero in on the regions that altered his opinion. The companion standard on software does not fill this gap, for it deals only with software that produces similarity scores or random-match probabilities. Finally, ASB 063's substantive requirements for "deployment validation" prior to laboratory implementation might well prohibit an examiner from going to the developer of hardware and software not yet adopted by his or her employer for help with locating features for further visual analysis, as occurred in <i>Ghigliotti</i>. But that is not responsive to the legal question of whether the developer's system is generally accepted as valid in the scientific community.
</p>
<center>NOTES</center>
<ol>
<li><span style="font-size: x-small;">ANSI/ASB 063 is even more devoid of references. The entire bibliography consists of <a href="https://asq.org/quality-resources/control-chart" target="_blank">a webpage</a> entitled “control chart.” There, attorneys, courts, or experts seeking to use the standard will discover that a “control chart is a graph used to study how a process changes over time.” That is great for quality control of instrumentation, but it is irrelevant to validation.</span>
</li>
<li><span style="font-size: x-small;">Under § 4.1.2.4,
"The plan for developmental validation study shall include the following:<br />
"a) the limitations of the procedure;<br />
"b) the conditions under which reliable results can be obtained;<br />
"c) critical aspects of the procedure that shall be controlled and monitored;<br />
"d) the ability of the resulting procedure to meet the needs of the given application."</span>
</li>
</ol>
<p><span style="font-size: x-small;">Last updated: 12 June 2022</span>
</p>DH Kayehttp://www.blogger.com/profile/09329862957840849989noreply@blogger.com0tag:blogger.com,1999:blog-5354567765897135804.post-81731898855344443322022-05-24T15:23:00.003-04:002022-05-24T20:58:24.441-04:00The New York Court of Appeals Returns to Probabilistic Genotyping Software (Part III—Six Empirical Studies)<p>New York’s Court of Appeals returned to the contentious issue of “probabilistic genotyping software” (PGS) in <i>People v. Wakefield</i>, 2022 N.Y. Slip Op. 02771, 2022 WL 1217463 (N.Y. Apr. 26, 2022). <a href="http://for-sci-law.blogspot.com/2022/05/the-new-york-court-of-appeals-returns.html" target="_blank">As previously discussed</a>, in <i>People v. Williams</i>, 147 N.E.3d 1131 (N.Y. 2020), a slim majority of the court had reasoned that the output of a computer program should not have been admitted without a full evidentiary hearing on the program's general acceptance within the scientific community.</p><p>In <i>Wakefield</i>, the Court of Appeals faced a different question for a more complex computer program. This time, the question was whether, after holding such a hearing, the trial court erred in finding that the more sophisticated program was generally accepted as a scientifically valid and reliable means of estimating “likelihood ratios” for DNA mixtures like the ones recovered in the case. The program, known as <a href="https://www.cybgen.com/solutions/products.shtml," target="_blank">TrueAllele</a>, is marketed by <a href="https://www.cybgen.com/company/" target="_blank">Cybergenetics</a>, “a Pittsburgh-based bioinformation company [whose] computers translate DNA data into useful information.”
</p>
<p><a href="http://for-sci-law.blogspot.com/2022/05/the-new-york-court-of-appeals-returns_24.html" target="_blank">As discussed separately</a>, the <i>Wakefield</i> court held that, in the circumstances of the case, the output of TrueAllele was admissible to associate the defendant with a murder. It emphasized “multiple validation studies ... demonstrat[ing] TrueAllele's reliability, by deriving reproducible and accurate results from the interpretation of known DNA samples.” 2022 WL 1217463 at *7. But the court did not describe the level of accuracy attained in any of the validation studies. That is surely something lawyers would want to know about, so I decided to read the “peer-reviewed publications in scientific journals” (id.) to which the court must have been referring.
</p>
<p>The state introduced 31 exhibits at the evidentiary hearing in 2015. Nine were journal publications of some kind. Six of those described data collected to establish (or indirectly suggest) that TrueAllele was accurate. Only three of them relied on “known DNA samples” as opposed to samples from casework. The synopses that follow do not describe every part of these studies, let alone all of their findings. I merely pick out the parts that I found most interesting and most pertinent to the question of accuracy or error (two sides of the same coin).
</p>
<p><b>The 2009 Cybergenetics Known-samples Study</b>
</p>
<p>The first study is M.W. Perlin & A. Sinelnikov, <a href="https://doi.org/10.1371/journal.pone.0008327" target="_blank">An Information Gap in DNA Evidence Interpretation</a>, 4 PLoS ONE e8327 (2009). This experiment used 40 laboratory-constructed two-contributor mixture samples (from two pairs of unrelated individuals) with varying mixture proportions and total DNA amounts (0.125 ng to 1 ng) to show that TrueAllele was much better at classifying a sample as containing a contributor’s DNA than was the cumulative probability of inclusion method (CPI) that employed peak-height thresholds for binary determinations of the presence of alleles. TrueAllele’s likelihood ratios (LRs) supported the hypothesis of inclusion in nearly every instance (LR>1).
</p>
<p>However, the data could not reveal whether the level of positive support (log-LR) was accurate. Does a computed LR of 1,000,000 “really” indicate evidence that is five orders of magnitude more probative than a computed LR of 10? The “empirical evidence” from the study cannot answer this question. The best we can do is to verify that the computed LR increases as the quantity of DNA does. The uncertainty inherent in the PCR process is smaller for larger starting quantities, and this should be reflected in the magnitude of the LR.
</p>
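<p>To make the point concrete, here is a minimal sketch (my own, in Python, with invented numbers, not anything from the study or from Cybergenetics) of what an empirical calibration check would look like: bin the reported log-LRs for known same-source and known different-source comparisons, and ask whether the ratio of the two empirical frequencies in each bin comes close to the LR that the bin represents. When the two distributions do not overlap, as in the made-up data below, the check simply cannot be run for the large values, which is the problem just described.</p>
<pre>
# A rough calibration check for reported likelihood ratios. My own sketch with
# made-up numbers, not anything from the study or from Cybergenetics.
# Idea: if the computed LRs are well calibrated, then among comparisons whose
# reported log10(LR) falls in a given bin, the relative frequency of true-source
# pairs to non-source pairs should roughly equal the LR for that bin.
import math
from collections import defaultdict

def calibration_table(same_source_log10_lrs, diff_source_log10_lrs, bin_width=1.0):
    bins = defaultdict(lambda: [0, 0])        # bin index -> [same count, diff count]
    for x in same_source_log10_lrs:
        bins[math.floor(x / bin_width)][0] += 1
    for x in diff_source_log10_lrs:
        bins[math.floor(x / bin_width)][1] += 1
    n_same, n_diff = len(same_source_log10_lrs), len(diff_source_log10_lrs)
    rows = []
    for b in sorted(bins):
        same, diff = bins[b]
        # Observed LR for the bin: P(bin | same source) / P(bin | different source).
        # If only one class ever lands in the bin, the data cannot confirm the magnitude.
        observed = (same / n_same) / (diff / n_diff) if same and diff else None
        rows.append((b * bin_width, same, diff, observed))
    return rows

# Hypothetical log10(LR) values for known same-source and different-source pairs:
same = [5.8, 6.1, 4.9, 6.3, 5.5]
diff = [-3.2, -4.8, -2.9, -6.1, -5.0]
for low_edge, s, d, obs in calibration_table(same, diff):
    print(f"bin starting at log10(LR) = {low_edge:+.0f}: same={s}, diff={d}, observed LR = {obs}")
</pre>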
<p><b>The 2011 Cybergenetics–New York State Police Casework Study</b>
</p>
<p>The second study also used two-contributor mixtures, but these came from casework in which the alleles, as ascertained by conventional methods, did not exclude the defendant as a possible contributor. In Mark W. Perlin et al., <a href="https://www.cybgen.com/information/publication/2011/JFS/Perlin-Duceman-Validating-TrueAllele-DNA-mixture-interpretation/paper.pdf" target="_blank">Validating TrueAllele DNA Mixture Interpretation</a>, 56 J. Forensic Sci. 1430 (2011), researchers from Cybergenetics and the New York State Police laboratory selected “16 two-person mixture samples” that met certain criteria “from 40 adjudicated cases and one proficiency test conducted in” the New York laboratory. TrueAllele generated larger LRs than those from the manual analyses. That TrueAllele did not produce LRs < 1 (indicative of exclusions) for any defendant included by conventional analysis is evidence of a low false-exclusion probability. The computed LRs are greater than 1 when they should be. But this empirical evidence does not directly address the question of whether the magnitudes of the LRs themselves are as close to or as far from 1 as they should be if they are to be understood as Bayes' factors.
</p>
<p><b>The 2013 Cybergenetics–New York State Police Casework Study</b>
</p>
<p>The third study is more extensive. In Mark W. Perlin et al., New York State TrueAllele® Casework Validation Study, 58 J. Forensic Sci. 1458 (2013), Cybergenetics worked with the New York laboratory to reanalyze DNA mixtures with up to three contributors from 39 adjudicated cases and two proficiency tests. “Whenever there was a human result, the computer’s genotype was concordant,” and TrueAllele “produced a match statistic on 81 mixture items ... , while human review reported a statistic on [only] 25 of these items.” </p><p>This time Cybergenetics also tried to answer the question of how often TrueAllele produces false “matches” (LR>1) when it compares a known noncontributor’s sample to a mixed sample. It accomplished this by simulating false pairs of samples for TrueAllele to process. As the authors explained,
</p>
<blockquote>
We compared each of the 87 matched mixture evidence genotypes with the (<87) reference genotypes from the other 40 cases. Each of these 7298 comparisons should generate a mismatch between the unrelated genotypes from different cases and hence a negative log(LR) value. A genotype inference method having good specificity should exhibit mismatch information values [log-LRs] that are negative in the same way that true matches are positive.
</blockquote>
<p>Id. at 1461. Thus, they derived two empirical distributions for likelihood ratios—one for the nonexcluded defendants in the cases (whom we would expect to be actual sources) and one for the unrelated individuals (whom we would expect to be non-sources). The empirical distributions were well separated, and the log(LR) was always less than zero for the presumed non-sources. </p><p>So TrueAllele seems to work well as a classifier (for distinguishing true-source pairs from false-source pairs) in these small-scale studies. But again, the question of whether the magnitudes of its LRs are highly accurate remains. With astronomically large LRs, it is hard to know the answer. Cf. David H. Kaye, Theona M. Vyvial & Dennis L. Young, <a href="https://papers.ssrn.com/abstract_id=2705941" target="_blank">Validating the Probability of Paternity</a>, 31 Transfusion 823 (1991). \1/
</p>
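<p>The bookkeeping behind this specificity check is easy to describe. The sketch below is my own reconstruction in Python, not Cybergenetics' code; the <code>log_lr</code> function is a stand-in for whatever the interpretation software computes. It pairs every evidence genotype with the reference genotypes from the <i>other</i> cases and tallies how often the result misleadingly points toward a common source.</p>
<pre>
# My reconstruction of the cross-case specificity check described in the 2013
# study; it is not Cybergenetics' code. log_lr(evidence, reference) is a
# placeholder for the software's computation and must be supplied.

def false_pair_log_lrs(cases, log_lr):
    """cases: a list of (evidence_genotypes, reference_genotypes), one per case.
    Pairing each evidence genotype with the references from *other* cases yields
    presumptive non-source pairs, for which log(LR) should come out negative."""
    results = []
    for i, (evidence_i, _) in enumerate(cases):
        for j, (_, references_j) in enumerate(cases):
            if i == j:
                continue        # same case: the reference may be a true contributor
            for ev in evidence_i:
                for ref in references_j:
                    results.append(log_lr(ev, ref))
    return results

# Hypothetical use: with 87 evidence genotypes spread over the cases and the
# references drawn from the other cases, this loop yields the several thousand
# non-source comparisons of the sort reported in the paper.
# misleading = sum(1 for x in false_pair_log_lrs(cases, log_lr) if x > 0)
</pre>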
<p><b>The 2013 UCF–Cybergenetics Known-samples Study</b>
</p>
<p>The fourth study is J. Ballantyne, E.K. Hanson & M.W. Perlin, DNA Mixture Genotyping by Probabilistic Computer Interpretation of Binomially-sampled Laser Captured Cell Populations: Combining Quantitative Data for Greater Identification Information, 53 Sci. & Justice 103–114 (2013). It is not a validation study, but researchers from the University of Central Florida and Cybergenetics made two different two-person mixtures with equal quantities of DNA from each person. In such 50:50 mixtures, peak heights are expected to be similar, making it harder to fit the pattern of alleles into the pairs (single-locus genotypes) from each contributor than if there had been a major and a minor contributor. So the team created ten small (20-cell) subsamples of each of the two mixed DNA samples by selecting cells at random. They analyzed these subsamples separately. They used TrueAllele to estimate the relative contributions (“mixture weights”) in the 20-cell samples, and found that when TrueAllele combined data from multiple subsamples, it assigned a 99% probability to the two contributor genotypes. The point of the study was to demonstrate the possibility of subdividing even small balanced samples to take advantage of peak height differences arising from imbalances in the even smaller subsamples.
</p>
<p><b>The 2013 Cybergenetics–Virginia Department of Forensic Services Casework Study</b></p><p>The fifth study is more on point. In Mark W. Perlin et al., TrueAllele Casework on Virginia DNA Mixture Evidence: Computer and Manual Interpretation in 72 Reported Criminal Cases, 9 PLOS ONE e92837 (2014), researchers from Cybergenetics and the Virginia Department of Forensic Services compared TrueAllele with manual analysis on 111 selected casework samples. The set of criminal case mixtures paired with a nonexcluded defendant’s profile should produce large LRs. For ten pairs, TrueAllele failed to return “a reproducible positive match statistic.” Among the 101 remaining, presumably same-source pairs, the smallest LR was 18. Since the LR must be less than 1 to be deemed indicative of a noncontributor, in no instance did TrueAllele generate a falsely exonerating result.</p><p>But what about falsely incriminating LRs? This time, the researchers did not reassign the defendants’ profiles to other cases to produce false pairs. Rather, they generated 10,000 random STR genotypes (from population databases of alleles in Virginia) to simulate the STR profiles of non-sources of the mixtures from the criminal cases. They paired each of these non-source profiles with the 101 genotypes that emerged from the unknown mixtures and calculated LR values. Fewer than 1 in 20,000 of these mixture/non-source pairs produced an LR suggesting an association (LR > 1); fewer than 1 in 1,000,000 produced an LR > 1,000; and there were no false positives at all for LR > 6,054. In other words, TrueAllele produced an empirical distribution for false pairs that consisted almost entirely of LRs < 1 and that never had very large LRs. Again, it seems to be an excellent classifier.</p><p><b>The 2015 Cybergenetics–Kern Regional Crime Laboratory Known-samples Study</b>
</p>
<p>Finally, in M.W. Perlin et al., TrueAllele Genotype Identification on DNA Mixtures Containing up to Five Unknown Contributors, 60 J. Forensic Sci. 857 (2015), researchers from Cybergenetics and the Kern Regional Crime Laboratory in California obtained DNA samples from five known individuals. They constructed ten two-person mixtures by randomly selecting two of the five contributors and mixing their DNA in proportions picked at random. The researchers constructed ten 3-person, ten 4-person, and ten 5-person mixtures in the same manner. From each of these 4 × 10 mixtures, they created a 1 nanogram and a 200 picogram sample for STR analysis. TrueAllele computed an LR for each of the genotypes that went into each analyzed sample (the alternative hypothesis being a random genotype).</p><p>Defining an exclusion as an LR < 1, TrueAllele rarely excluded true contributors to the 1 ng 2- or 3-contributor mixtures (no exclusions in 20 comparisons and 1 in 30, respectively), but with 4 and 5 contributors involved, the false-exclusion rates were 9/40 and 9/50, respectively. The false exclusions came from the more extreme mixtures. As long as the lesser contributor supplied at least 10% of a nanogram mixture, there were no false exclusions. The false-exclusion rates for the 200 pg samples were larger: 2/20, 4/30, 13/40, and 19/50. For these low-template mixtures, a greater proportion of the lesser contributor’s DNA (25%) had to be present to avoid false exclusions.
</p>
<p>To assess false inclusions, 10,000 genotypes were randomly generated from each of three ethnic population allele databases. These noncontributor profiles were compared with the eight sets of mixture samples (the 2-, 3-, 4-, and 5-person mixtures at each of the two DNA amounts). For each ethnic group and DNA mixture sample, the LRs generally fell well below LR = 1, meaning that there were few false inclusions. For the high DNA levels (1 ng), the proportions of comparisons with misleading LRs (LR > 1 for the simulated noncontributors) were 0/600,000, 25/900,000, 186/1,200,000, and 1,301/1,500,000 for the 2-, 3-, 4-, and 5-person mixtures, respectively. The worst case (the most misleadingly high LR) occurred for the five-person mixture, where one LR was 1,592. For the low-template DNA mixtures, the corresponding false-inclusion proportions were 2/600,000, 53/900,000, 177/1,200,000, and 145/1,500,000. The worst outcome was an LR of 101 for a four-person mixture.</p><p>Apparently using “reliable” in its legal or nonstatistical sense (as in <i>Daubert</i> and Federal Rule of Evidence 702), the researchers concluded that “[t]his in-depth experimental study and statistical analysis establish the reliability of TrueAllele for the interpretation of DNA mixture evidence over a broad range of forensic casework conditions.” \2/ My sense of the studies as of the time of the hearing in <i>Wakefield</i> is that they show that within certain ranges (with regard to the quantity of DNA, the number of contributors, and the fractions from the multiple contributors), TrueAllele’s likelihood ratios discriminate quite well between samples paired with true contributors and the same samples paired with unrelated noncontributors. \3/ Moreover, the program’s output behaves qualitatively as it should, generally producing smaller likelihood ratios for electrophoretic data that are more complex or more bedeviled by stochastic effects on peak heights and locations.
</p>
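<p>Before leaving these studies, it is worth noting how the false-inclusion simulations in the Virginia and Kern studies are structured: draw random STR genotypes from population allele frequencies, pair each one with the genotypes inferred from the mixtures, and count how often the LR misleadingly exceeds 1. The sketch below shows that structure. It is my own illustration in Python; the allele-frequency table and the <code>lr</code> function are placeholders, not data or code from the studies.</p>
<pre>
# Sketch of a false-inclusion simulation of the kind used in the Virginia and
# Kern studies. The allele frequencies and the lr() function are placeholders.
import random

def random_genotype(allele_freqs):
    """allele_freqs: dict mapping locus to a dict of allele frequencies.
    Draw two alleles per locus, independently (Hardy-Weinberg fashion)."""
    genotype = {}
    for locus, freqs in allele_freqs.items():
        alleles, weights = list(freqs), list(freqs.values())
        genotype[locus] = tuple(sorted(random.choices(alleles, weights=weights, k=2)))
    return genotype

def false_inclusion_rate(mixture_genotypes, allele_freqs, lr, n=10_000):
    """Pair n simulated noncontributors with each mixture genotype and count
    how often the computed LR exceeds 1 (a misleading 'inclusion')."""
    misleading = total = 0
    for _ in range(n):
        noncontributor = random_genotype(allele_freqs)
        for mix in mixture_genotypes:
            total += 1
            if lr(mix, noncontributor) > 1:
                misleading += 1
    return misleading / total

# Hypothetical two-locus table, just to show the input format (not real data):
# allele_freqs = {"D3S1358": {14: 0.14, 15: 0.26, 16: 0.23, 17: 0.21, 18: 0.16},
#                 "vWA":     {16: 0.20, 17: 0.28, 18: 0.22, 19: 0.19, 20: 0.11}}
</pre>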
<p><b>NOTES</b>
</p>
<ol style="text-align: left;">
<li><span style="font-size: x-small;">In this early study, we compared the empirical LR distribution for parentage using presumably true and false mother-child-father trios derived from a set of civil paternity cases to the “paternity index,” a likelihood ratio computed with software applying simple genetic principles to the inheritance of HLA types. We found that the theoretical PI diverged from the empirical LR for PI > 80 or so.</span></li><span style="font-size: x-small;">
</span><li><span style="font-size: x-small;">Cf. David W. Bauer, Nasir Butt, Jennifer M. Hornyak & Mark W. Perlin, Validating TrueAllele Interpretation of DNA Mixtures Containing up to Ten Unknown Contributors, 65 J. Forensic Sci. 380, 380 (2020), doi: 10.1111/1556-4029.14204 (abstract concluding that “[t]he study found that TrueAllele is a reliable method for analyzing DNA mixtures containing up to ten unknown contributors”).</span></li><span style="font-size: x-small;">
</span><li><span style="font-size: x-small;">One might argue that the number of mixed samples collectively studied is too small. PCAST indicated that “there is relatively little published evidence” because “[i]n human molecular genetics, an experimental validation of an important diagnostic method would typically involve hundreds of distinct samples.” President's Council of Advisors on Sci. & Tech., Exec. Office of the President, Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods 81 (2016) (notes omitted), https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/PCAST/pcast_forensic_science_report_final.pdf [<a href="https://perma.cc/R76Y-7VU">https://perma.cc/R76Y-7VU</a>]. The number of distinct samples (mixtures from different contributors) combining all the studies listed here seems closer to 100.</span></li>
</ol>DH Kayehttp://www.blogger.com/profile/09329862957840849989noreply@blogger.com1tag:blogger.com,1999:blog-5354567765897135804.post-52572039769501856752022-05-24T15:08:00.010-04:002022-08-31T23:02:11.423-04:00The New York Court of Appeals Returns to Probabilistic Genotyping Software (Part II—General Acceptance)<blockquote>The New York Court of Appeals returned to the contentious issue of “probabilistic genotyping software” (PGS) in <i>People v. Wakefield</i>, 2022 N.Y. Slip Op. 02771, 2022 WL 1217463 (N.Y. Apr. 26, 2022). As <a href="http://for-sci-law.blogspot.com/2022/05/the-new-york-court-of-appeals-returns.html" target="_blank">previously discussed</a>, in <i>People v. Williams</i>, 147 N.E.3d 1131 (N.Y. 2020), a slim majority of the court held that the output of a computer program should not have been admitted without a full evidentiary hearing on its general acceptance within the scientific community. The majority opinion described a confluence of considerations:
</blockquote>
<ol>
<li>The program had only been tested in the laboratory that developed it (“an invitation to bias,” id. at 1141);</li>
<li>The only evidentiary hearing ever conducted on the program had only shown “internal validation” and formal approval by a subcommittee of a state forensic science commission that was a “narrow class of reviewers, some of whom were employed by the very agency that developed the technology,” id. at 1142;</li>
<li>Given “the ‘black box’ nature of that program,” the developer's “secretive approach ... was inconsistent with quality assurance standards” id.; and</li>
<li>Submissions for hearings in other cases “suggested that the accuracy calculations of that program may be flawed,” id.</li>
</ol>
<p>But which of these four factors were dispositive? Was it the combination of all four, or something in between, that rendered the evidence inadmissible? If the developer were to change its “secretive approach” so as to allow defense experts to study the program’s source code, would that, plus the “internal validation,” be enough to establish general scientific acceptance? Would it be sufficient for the state to refute the suggestions of flawed “accuracy calculations of the program” through testimony from its experts? Just what did the court mean when it summarized its analysis with the statement that “[i]n short, the [PGS] should be supported by those with no professional interest in its acceptance. <i>Frye</i> demands an objective, unbiased review”?
</p>
<p>The opinion did not reveal how the majority might answer these questions. Of course, in holding that a hearing was necessary, the <i>Williams</i> majority implied that <i>some</i> information outside of the normal scientific literature could fill the gap created by the absence of replicated developmental validation studies from external (“objective, unbiased”) researchers. But what might that information be?
</p>
<p>The court’s encounter with PGS last month did not answer this open question, for the court in <i>Wakefield</i> found that there were replicated studies from the developer of a more sophisticated computer program <i>and</i> other researchers. In addition, it pointed to other evaluations or uses of the program. The totality of the evidence, it reasoned, was stronger than the developer-only record in <i>Williams</i> and demonstrated the requisite general acceptance. But the opinion provoked one member of the court to complain of a "jarring turnabout" from "the same view unsuccessfully advocated by a minority in <i>Williams</i> two years ago."
</p>
<p>This posting describes the case, the DNA evidence, and aspects of the discussions of general acceptance that struck me as interesting or puzzling.
</p>
<p><b>The Crime, the Samples, and Some Misunderstood Probabilities of Exclusion</b>
</p>
<p>In 2010, John Wakefield strangled the occupant of an apartment with a guitar amplifier cord and made off with various items. The New York State Police laboratory analyzed samples from four areas: the front part of the collar of the victim's shirt; the rear part of the collar; the victim's forearm; and the amplifier cord. The laboratory concluded that the DNA on the collar was “consistent with at least two donors, one of which was the victim, and defendant could not be excluded as the other contributor”; that the DNA from the forearm “was consistent with DNA from the victim, as the major contributor, mixed with at least two additional donors”; and that the DNA on the cord was “a mixture of at least two donors, from which the victim could not be excluded as a possible contributor.” 2022 WL 1217463, at *1.</p>
<p>At this point, the court’s description of the State Police laboratory’s work becomes hard to follow. The court wrote that:
</p>
<blockquote>
[T]he analyst did not call any alleles based on peaks on the electropherogram below [the pre-established stochastic] threshold. As a result, there was insufficient data to allow the Lab to calculate probabilities for the unknown contributors to the DNA mixtures found on the amplifier cord and the front of the shirt collar.
</blockquote>
<p>No alleles at all? It takes only one allele to compute a probability of exclusion, although with such a limited profile, the exclusion probability might be close to zero, meaning that the data are uninformative. In any event, for the other two samples, “[t]he Lab was able to call ... 4 ... STR loci” that enabled “the analyst, using the combined probability of inclusion method, [to opine that] the probability an unrelated individual contributed DNA to the outside rear shirt collar was 1 in 1,088" and “that the probability an unrelated individual contributed DNA ... was 1 in 422" for “the profile obtained from the victim's forearm.”<br />
</p>
<p>Or so the court said. As explained in Box 1, these numbers are not “the probability an unrelated individual contributed DNA.” They are estimates of the probability that a randomly selected, unrelated individual could not be excluded as a possible source. Given a large number of unrelated individuals in the region, there easily could be more than a hundred people with STR profiles compatible with the mixtures.
</p>
<div style="background-color: antiquewhite; border-radius: 10px; border: 1px solid black; box-sizing: border-box; padding: 10px;">
<center>BOX 1: TRANSPOSITION</center>
<p>The probability of inclusion is not the probability that an included individual is the contributor. It is the probability of not excluding an individual as a possible contributor. That probability is not necessarily equal to the probability that an included individual actually contributed to the sample from which he or she could not be excluded. If <i>C</i> stands for contributor and <i>I</i> for included, the probability of inclusion for a randomly selected individual who is not a contributor can be written P(<i>I</i> given not-<i>C</i>). The source probability for an included individual is different. It is P(<i>C</i> given <i>I</i>). Treating the one conditional probability as if it gave the other is known as the transposition fallacy (or the “prosecutor’s fallacy,” though it could be called the “judges’” fallacy as well).
</p>
<p>We do not need any symbols to see that the two conditional probabilities are not necessarily equal. The population of Schenectady County, where the crime occurred, was about 155,000 in 2010. Let’s round down to 150,000. That ought to remove all of Wakefield’s relatives. Excluding all but 1 in 1,088 individuals would leave 138 people as possible perpetrators. Of course, some would be far more plausible suspects than others, but based on the DNA evidence alone, how can the court claim that “the probability an unrelated individual contributed DNA to the outside rear shirt collar was 1 in 1,088”? That probability cannot be determined from the DNA evidence alone. It can be computed only if we are willing to assign a “prior probability” of being the murderer to each of the unrelated individuals in Schenectady (or anywhere else).
</p>
<p>Suppose we assume that, <i>ab initio</i>, everyone in the county has an equal probability of being a source of the DNA on the collar. At that point, Wakefield’s probability is quite small. It is 1/150,000. Since the DNA testing would have excluded all but some 138 people, and because Wakefield is one of them, the probability attached to him is larger. Now the probability is 1/138. But that still leaves the vast bulk of the probability with the 137 unrelated individuals. Instead of transposing, we should say that “the probability an unrelated individual contributed DNA to the outside rear shirt collar was 137 out of 138” rather than the court’s “1 in 1,088.” Of course, our assumption of equal probabilities for every unrelated individual is unrealistic, but that does not impeach the broader point that the mathematics does not make the probability of an unrelated individual the number that the court supplied. (The short calculation sketched just after this box spells out the arithmetic.)
</p>
</div>
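<p>For readers who like to see the arithmetic laid out, here is the calculation in Box 1 written as a few lines of Python. The equal-prior assumption is the same unrealistic but illustrative one made in the box; the numbers are not findings about the case.</p>
<pre>
# The arithmetic behind Box 1, under the simplifying (and unrealistic)
# assumption that everyone in the county starts with the same prior
# probability of being the source. Illustrative numbers only.
population = 150_000          # rounded-down county population
p_inclusion = 1 / 1_088       # chance a random unrelated person is not excluded

not_excluded = round(population * p_inclusion)      # about 138 people
posterior_for_one_person = 1 / not_excluded          # about 1/138 under equal priors
prob_source_is_someone_else = (not_excluded - 1) / not_excluded   # about 137/138

print(f"Expected number not excluded: {not_excluded}")
print(f"Posterior probability for any one non-excluded person: 1/{not_excluded}")
print(f"Probability the source is an unrelated non-excluded person: "
      f"{prob_source_is_someone_else:.3f}")
</pre>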
<p><b>Cybergenetics to the Rescue</b>
</p>
<p>To secure a better and more complete analysis, “the electronic data from the DNA testing of the four samples at issue was then sent to Cybergenetics [for] calculating a likelihood ratio—using all of the information generated on the electropherogram, including peaks that fall below a laboratory's stochastic threshold.” <a href="https://www.cybgen.com/company/" target="_blank">Cybergenetics</a> is a private company whose “flagship TrueAllele® technology resolves complex forensic evidence, providing accurate and objective DNA match statistics.” TrueAllele's calculations of the likelihood ratios, using the hypothesis that the four samples contained DNA from an unrelated black individual as the alternative to the hypothesis that Wakefield’s DNA was present, were 5.88 billion for the cord, 170 quintillion for the outside rear shirt collar, 303 billion for the outside front shirt collar, and 56.1 million for the forearm.
</p>
<p>Wakefield moved to exclude these findings. The Schenectady County Supreme Court held a pretrial evidentiary hearing “over numerous days.” <a href="https://scholar.google.com/scholar_case?case=1931026626803516354" target="_blank">People v. Wakefield</a>, 47 Misc.3d 850, 851, 9 N.Y.S.3d 540 (2015). (New York calls its trial courts supreme courts.) Finding “that Cybergenetics TrueAllele Casework is not novel but instead is ‘generally accepted’ under the <i>Frye</i> standard,” \1/ <a href="https://ballotpedia.org/Michael_V._Coccoma" target="_blank">Justice Michael V. Coccoma</a> (New York calls its trial judges justices) denied the motion. 47 Misc.3d at 859. A jury convicted Wakefield of first degree murder and robbery. The Appellate Division affirmed, and seven years after the trial, so did the Court of Appeals (New York calls its most supreme court the Court of Appeals).
</p>
<p><b>Changes in New York’s Highest Court</b>
</p>
<p><a href="https://for-sci-law.blogspot.com/2022/05/the-new-york-court-of-appeals-returns.html" target="_blank">Back in <i>Williams</i></a>, the Court of Appeals judges had split 4-3 on whether New York City's home-grown PGS had attained general acceptance. The three judges led by <a href="https://en.wikipedia.org/wiki/Janet_DiFiore" target="_blank">Chief Judge Janet M. DiFiore</a> * objected to the majority’s negative comments about PGS and propounded a narrower rationale for requiring a <i>Frye</i> hearing. But even if one could have confidently applied the majority reasoning in <i>Williams</i> to the scientific status of TrueAllele in <i>Wakefield</i>, the exercise in legal logic might have been futile. In the two short years since <i>Williams</i>, the composition of the court had changed. One concurring judge died, and the majority-opinion bloc lost half its members, including the opinion’s author, to retirements. The reconstituted court gave Chief Judge DiFiore the opportunity to write a more laudatory opinion for a new and larger majority.</p><p>Only one judge stood apart from this new majority. Having been in the majority in Williams, <a href="https://en.wikipedia.org/wiki/Jenny_Rivera_(judge)" target="_blank">Judge Jenny Rivera</a> now found herself in the Chief Judge’s situation in <i>Williams</i>, composing a dissenting opinion with respect to the reasoning on general acceptance but concurring in the result. Drawing on <i>Williams</i>, Judge Rivera maintained that “the court erred in admitting the TrueAllele results but the error ... was harmless” in view of the other evidence of guilt.
</p>
<p><b>The Court’s Understanding of TrueAllele</b>
</p>
<p>The opinions are vague about the inner workings of TrueAllele. The majority opinion suggests that what is distinctive about PGS is that it cranks out a likelihood ratio. \2/ But “likelihood ratio,” for present purposes, simply denotes the probability of data given one hypothesis divided by the probability of the same data given a (simple) alternative hypothesis. It has nothing to do with the probabilistic part of TrueAllele. Indeed, TrueAllele only computes a likelihood ratio after the probability analysis is completed. It does this by dividing (i) the final posterior odds that favor one source hypothesis as compared to another by (ii) the initial prior odds. This division gives a “Bayes' factor” that states how much the data have changed the odds.
</p>
<p>Let me try saying this another way. In effect, TrueAllele starts with prior odds based solely on the frequencies of various DNA alleles (and hence genotypes) in some population, performs successive approximations to converge on a better estimate of the odds, and divides the adjusted odds by the prior odds to yield what Cybergenetics calls “the match statistic.” If all goes well, this quotient (call it a likelihood ratio, a Bayes' factor, a match statistic, or whatever you want) reveals how powerful the DNA evidence is (which is not necessarily the same as the odds that any hypothesis is true). At least, that is what I think goes on. The court contents itself with warm and fuzzy statements such as “a probability model to assess the values of a genotype objectively,” “based on mathematical computations from all the data in the electropherograms,” and “separates the genotypes using the mathematical probability principle of the Markov chain Monte Carlo (MCMC) search to calculate the probability for what the different genotypes could be.” (This last clause may not be so warm and fuzzy; it begins to unpack what I simplistically called successive approximations.)</p>
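<p>The division just described is simple enough to write down. In the toy calculation below, with numbers invented purely to illustrate the relationship (they are not TrueAllele output, and the “prior” is not a claim about what the program uses), the reported “match statistic” is the posterior odds divided by the prior odds, and, once reported, that factor can be applied to whatever prior odds a factfinder thinks appropriate.</p>
<pre>
# Bayes' factor as the quotient of posterior odds and prior odds. These numbers
# are invented solely to illustrate the arithmetic; they are not TrueAllele's
# output, and the "prior" here is not a claim about what the program uses.
prior_odds = 1 / 1_000_000        # odds on the source hypothesis before the data
posterior_odds = 5.88e3           # odds after analyzing the data (hypothetical)

bayes_factor = posterior_odds / prior_odds
print(f"Bayes' factor (the reported match statistic): {bayes_factor:.3g}")   # 5.88e+09

# The factor tells any user how to update prior odds of their own choosing:
juror_prior_odds = 1 / 150_000    # say, equal priors over a county of 150,000
juror_posterior_odds = bayes_factor * juror_prior_odds
print(f"Posterior odds under that prior: {juror_posterior_odds:.3g}")        # about 3.92e+04
</pre>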
<p><b>The Timing for General Acceptance</b>
</p>
<p><i>Wakefield</i> is a backwards-looking case. The main question before the Court of Appeals was whether, in 2015, TrueAllele reasonably could have been deemed to have been generally accepted in the scientific community. That is what New York law requires. \3/ The Chief Judge’s analysis of the general acceptance of TrueAllele starts with the observation that “[t]he well-known <i>Frye</i> test applied to the admissibility of novel scientific evidence (Frye v. United States, 293 F. 1013 [D.C. Cir.1923]) is 'whether the accepted techniques, when properly performed, generate results accepted as reliable within the scientific community generally' (People v. Wesley, 83 N.Y.2d 417, 422, 611 N.Y.S.2d 97, 633 N.E.2d 451 [1994]).”
</p>
<p><i>Wesley</i> is an interesting case to cite here. One would not know from the citation or the analysis in <i>Wakefield</i> that in <i>Wesley </i>there was no opinion for a majority of the seven judges on the court. There was one opinion for three judges and another opinion for two judges concurring only in the result. The remaining two judges did not participate. The concurring opinion was written by the late <a href="https://en.wikipedia.org/wiki/Judith_Kaye" target="_blank">Chief Judge Judith S. Kaye</a>, the longest-serving chief judge in New York history.
</p>
<div style="display: flex; flex-flow: row nowrap; justify-content: center;">
<div style="box-sizing: border-box;">
<p>Chief Judge Kaye’s concurrence is memorable for its skepticism about finding general acceptance on the basis of studies from the developer of a method. Current Chief Judge Janet DiFiore briefly summarized that discussion (as did the majority in <i>Williams</i>). A more complete exposition is in Box 2. Chief Judge DiFiore then suggests that the <i>Wesley</i> concurrence was satisfied because “[n]otwithstanding these concerns, Chief Judge Kaye ultimately agreed that, at the time the appeal was decided, "RFLP-based forensic analysis [was] generally accepted as reliable" and those testing procedures were accepted as the standard methodology used in the scientific community until the advent of the PCR STR method used today.”
</p>
<p>This presentation places an odd spin on the <i>Wesley</i> concurrence. The sole basis for the concurrence was that “it can fairly be said that use of DNA evidence was harmless beyond a reasonable doubt” because the DNA evidence “added nothing to the People's case.” 83 N.Y.2d at 444–45. The observations that five years after the hearing in <i>Wesley</i>, it had become clear that “in principle” RFLP-VNTR testing was “fundamentally sound” and was generally accepted were clearly dicta. Chief Judge Kaye was not suggesting that because a method had become generally accepted later, its earlier admission was vindicated. The dicta on later general acceptance were intended to inform trial courts that while they were at liberty to admit RFLP-VNTR evidence without pretrial hearings on general acceptance, they still needed to probe “the adequacy of the methods used to acquire and analyze samples ... case by case.” Id. at 445.</p>
<p>In contrast to <i>Wesley</i>, which emphasized the state of the science “at the time of the <i>Frye</i> hearing in 1988,” 83 N.Y.2d at 425 (plurality opinion), and whether “in 1988, ... there was consensus,” id. at 439 (concurring opinion), Chief Judge DiFiore’s opinion is less precise on when general acceptance came into existence:
</p>
</div>
<div style="background: antiquewhite none repeat scroll 0% 0%; padding: 10px;">
<center>BOX 2. PEOPLE v. WESLEY</center>
83 N.Y.2d 417, 439–41, 611 N.Y.S.2d 97, 633 N.E.2d 451 (N.Y. 1994) (Chief Judge Kaye, concurring) (citations and footnote omitted)
<p>The inquiry into forensic analysis of DNA in this case also demonstrates the "pitfalls of self-validation by a small group." Before bringing novel evidence to court, proponents of new techniques must subject their methods to the scrutiny of fellow scientists, unimpeded by commercial concerns.
</p>
<p>A <i>Frye</i> court should be particularly cautious when — as here — "the supporting research is conducted by someone with a professional or commercial interest in the technique." DNA forensic analysis was developed in commercial laboratories under conditions of secrecy, preventing emergence of independent views. No independent academic or governmental laboratories were publishing studies concerning forensic use of DNA profiling. The Federal Bureau of Investigation did not consider use of the technique until 1989. Because no other facilities were apparently conducting research in the field, the commercial laboratory's unchallenged endorsement of the reliability of its own techniques was accepted by the hearing court as sufficient to represent acceptance of the technique by scientists generally. The sole forensic witness at the hearing in this case was Dr. Michael Baird, Director of Forensics at Lifecodes laboratory, where the samples were to be analyzed. While he assured the court of the reliability of the forensic application of DNA, virtually the sole publications on forensic use of DNA were his own or those of Dr. Jeffreys, the founder of Cellmark, one of Lifecodes' competitors. Nor had the forensic procedure been subjected to thorough peer review. ***
</p>
<p>The opinions of two scientists, both with commercial interests in the work under consideration and both the primary developers and proponents of the technique, were insufficient to establish "general acceptance" in the scientific field. The People's effort to gain a consensus by having their own witnesses "peer review" the relevant studies in time to return to court with supporting testimony was hardly an appropriate substitute for the thoughtful exchange of ideas in an unbiased scientific community envisioned by <i>Frye</i>. Our colleagues' characterization of a dearth of publications on this novel technique as the equivalent of unanimous endorsement of its reliability ignores the plain reality that this technique was not yet being discussed and tested in the scientific community.
</p>
</div>
</div>
<blockquote>
"Although the continuous probabilistic approach was not used in the majority of forensic crime laboratories at the time of the hearing, the methodology has been generally accepted in the relevant scientific community based on the empirical evidence of its validity, as demonstrated by multiple validation studies, including collaborative studies, peer-reviewed publications in scientific journals and its use in other jurisdictions. The empirical studies demonstrated TrueAllele's reliability, by deriving reproducible and accurate results from the interpretation of known DNA samples."
</blockquote>
<p>Presumably, and notwithstanding citations to materials appearing after 2015, \4/ she meant to write that the methodology <i>had</i> been generally accepted in 2015 because the indications listed were present then. (Whether the decisive time for general acceptance <i>should be</i> that of the trial rather than the appeal is not completely obvious. If a technique becomes generally accepted later, why should the defendant be entitled to a new trial in which the evidence that should have been excluded has become admissible anyway? The defendant's interest in the time-of-trial rule is only the interest in not being convicted with the help of evidence that is scientifically sound (as judged under the general-acceptance standard on the best current knowledge) but had not yet achieved that acceptance at trial. A counter-argument is that a large pool of potential defense experts to question the application of the generally accepted method in the particular case did not exist at the time of trial because the evidence was too novel.)
</p>
<p><b>Quantifying the Accuracy of PGS</b>
</p>
<p>
Turning to the question of the state of acceptance as of 2015, the majority opinion maintains that
</p><blockquote>
[T]he methodology has been generally accepted in the relevant scientific community based on the empirical evidence of its validity, as demonstrated by multiple validation studies, including collaborative studies, peer-reviewed publications in scientific journals and its use in other jurisdictions. The empirical studies demonstrated TrueAllele's reliability, by deriving reproducible and accurate results from the interpretation of known DNA samples.
</blockquote>
<p>Both the fact that the software was written to implement uncontroversial mathematical ideas and the published empirical evidence are important. If the software were designed to implement a mathematically <i>invalid</i> procedure, the game would be over before it began. But techniques such as Bayes’ rule and sampling methods for getting a representative picture of the posterior distribution only work when they are developed appropriately for a particular application. Acknowledging that these tools have been used to solve problems in many fields of science is a bit like saying that the mathematics of probability theory is undisputed. The validity of the mathematical ideas is a necessary but hardly a sufficient condition for a finding that software intended to apply the ideas functions as intended. Using a particular mathematical formula or method to describe or predict real-world phenomena is an endeavor that is subject to and in need of empirical confirmation. Because PGS models the variability in the empirical data that emerge from chemical reactions and electronic detectors, “empirical evidence ... of its accuracy” is indispensable to establishing its accuracy.
</p>
<p>Unfortunately, <i>Wakefield</i> is short on details from the “multiple validation studies” and “peer-reviewed publications.” What do the studies and publications reveal about the accuracy of output such as “5.88 billion times more probable” and “170 quintillion times more probable”? The Supreme Court opinion is devoid of any quantitative statement of how well the deconvoluted individual profiles and their Bayes’ factors reported by TrueAllele correspond to the presence or absence of those profiles in samples constructed with or otherwise known to contain DNA from given individuals. So is the Appellate Division opinion. So too with the Court of Appeals’ opinions. The court is persuaded that “[t]he empirical studies demonstrated TrueAllele's reliability, by deriving reproducible and accurate results from the interpretation of known DNA samples.” But how well did TrueAllele perform in the “many published and peer reviewed” validity studies?
</p>
<p>A <a href="http://for-sci-law.blogspot.com/2022/05/the-new-york-court-of-appeals-returns_26.html" target="_blank">separate posting</a> summarizes parts of the six studies circa 2015 that are both published and peer reviewed. The numbers in these studies suggest that within certain ranges (with regard to the quantity of DNA, the number of contributors, and the fractions from the multiple contributors), TrueAllele’s likelihood ratios discriminate quite well between samples paired with true contributors and the same samples paired with noncontributors. For example, in one experiment, LR was never greater than 1 for 600,000 simulations of false contributors to 10 two-person mixtures containing 1 nanogram of DNA—no observed false positives! Conversely, LR was never less than 1 for any true contributor to the same ten mixtures—no observed false negatives in 20 comparisons. Moreover, the program’s output behaves qualitatively as it should, generally producing smaller likelihood ratios for electrophoretic data that are more complex or more bedeviled by stochastic effects on peak heights and locations.
</p>
<p>Such results suggest that TrueAllele’s LRs are in the ballpark. Yet, it is hard to gauge the size of the ballpark. Is a computed LR of 5.88 billion truly a probability ratio of 5.88 billion? Could the ratio be a lot less or a lot more? The validity studies do not give quantitative answers to these questions about “accuracy.” \5/
</p>
<p><b>The Developer’s Involvement</b>
</p>
<p>On appeal, Wakefield had to convince the court that the unchallenged studies and other indicia of general acceptance were too weak to permit a finding of general acceptance. To do so, he pointed to “the dearth of independent validation as a result of Dr. Perlin's involvement in the large majority of studies produced at the hearing.” (Indeed, Dr. Perlin is the lead author of every one of the five published validity studies and a co-author of a sixth published study that also helps show validity.)
</p>
<p>The majority acknowledged “legitimate concern” but decided that it was overcome “by the import of the empirical evidence of reliability demonstrated here and the acceptance of the methodology by the relevant scientific community.” However, the discussion of “the import of the empirical evidence” seems somewhat garbled.
</p>
<p style="text-align: center;">1
</p>
<p>First, the court notes that “the FBI Quality Assurance Standards requires ‘a developmental validation for a particular technology’ be published.” That the FBI might be satisfied with a single publication from the developer of a method does not speak to what the broader scientific community regards as essential to the validation. Along with the QAS, the court cites "NIST, DNA Mixture Interpretation: A NIST Scientific Foundation Review, at 64 (June 2021 <a href="https://nvlpubs.nist.gov/nistpubs/ir/2021/NIST.IR.8351-draft.pdf" target="_blank">Draft report</a>)." The page merely reports that the NIST staff were able to examine “[p]ublicly available data on DNA mixture interpretation performance ... from five sources [including] published PGS studies” and that “conducting mixture studies may be viewed as a necessity to meet published guidelines or QAS requirements ... .” That scientists and other NIST personnel who choose to review a technology will read the scientific reports of the developers of the technology does not tell us much about defendant’s claim that Cybergenetics’ involvement in the published validation studies gravely diminishes “the import of the empirical evidence.”
</p>
<p style="text-align: center;">2
</p>
<p>Second, the Court of Appeals maintained that “the interest of the developer was addressed at the <i>Frye</i> hearing in this case.” As the court described the hearing, the response to this concern was that “[a]lthough Dr. Perlin was involved in and coauthored most of the validation studies, his interest in TrueAllele was disclosed as required by the journals who published the studies and the empirical evidence of the reliability of TrueAllele was not disputed.”
</p>
<p>These responses seem rather flaccid. Some of the articles contain conflict-of-interest statements; most do not. \7/ But the presence or absence of obvious disclaimers does not come to grips with the complaint. Defendant’s argument is not that there are hidden funding sources or financial relationships. It is that interests in the outcomes of the studies somehow may affect the results. The claim is not that validation data were fabricated or that the data analysis was faulty. As with the <a href="https://en.wikipedia.org/wiki/Replication_crisis" target="_blank">movement for replication</a> and “open science,” it is a response to more subtle threats.
</p>
<p style="text-align: center;">3
</p>
<p>Third, the opinion asserts that “the scientific method” is “entirely consistent with” proof of validity coming from the inventors, discoverers, or commercializers (citing President's Council of Advisors on Sci. and Tech., Exec. Office of the President, Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods, at 46 (2016)). Again, however, the argument is not that only disinterested parties do and should participate in scientific dialog. It is that "[w]hile it is completely appropriate for method developers to evaluate their own methods, establishing scientific validity also requires scientific evaluation by other scientific groups that did not develop the method.” Id. at 80 (https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/PCAST/pcast_forensic_science_report_final.pdf [<a href="https://perma.cc/R76Y-7VU" target="_blank">https://perma.cc/R76Y-7VU</a>]).
</p>
<p style="text-align: center;">4
</p>
<p>That precept leads to the court’s last and most telling response to the “legitimate concern” over “the dearth of independent validation.” The Chief Judge finally wrote that “there were [not only] developer [but also] independent validation studies and laboratory internal validation studies, many published and peer-reviewed.”</p><p>But is this a fair characterization of the scientific literature as of 2015? <a href="http://for-sci-law.blogspot.com/2022/05/the-new-york-court-of-appeals-returns_26.html" target="_blank">From what I can tell</a>, no more than five or six studies appear in peer-reviewed journals, and none are completely “independent validation studies.” The NIST report cited in <i>Wakefield</i> lists but a single “internal validation” <a href="https://archive.epic.org/state-policy/foia/dna-software/EPIC-15-10-13-VA-FOIA-20151104-Production-Pt2.pdf" target="_blank">study</a>, from Virginia in 2013, apparently released in response to a Freedom of Information Act request. Although the NIST reviewers limited themselves to laboratory studies or data posted to the Internet, they concluded that “[c]urrently, there is not enough publicly available data to enable an external and independent assessment of the degree of reliability of DNA mixture interpretation practices, including the use of probabilistic genotyping software (PGS) systems.” </p><p>Of course, this “Key Takeaway #4.3” is merely part of a draft report and is not a judgment as to what conclusions on validity should be reached on the basis of the published studies and the internal ones. Nevertheless, the court overlooks this prominent “takeaway” (and others). Instead, the Chief Judge asserts that “[t]he technology was approved for use by NIST”—even though NIST is not a regulatory agency that approves technologies—and that “NIST's use of the TrueAllele system for its standard reference materials likewise demonstrates confidence within the relevant community that the system generates accurate results.”
</p><p style="text-align: center;">~~~
</p>
<p>This is not to say that the scientific literature was patently insufficient to support the court’s assessment of the general scientific acceptance of TrueAllele for interpreting the DNA data in the case. But it does raise the question of whether the court’s assertions about the large number of “independent validity studies” and internal ones that have been “published and peer-reviewed” are exaggerated.
</p>
<p><b>Source Code and General Acceptance</b>
</p>
<p>The defense also contended that the state’s testimony and exhibits from “the <i>Frye</i> hearing [were] insufficient because, absent disclosure of the TrueAllele source code for examination by the scientific community, its ‘proprietary black box technology’ cannot be generally accepted as a matter of law.” This argument bears two possible interpretations. On the one hand, it could be a claim that scientists demand open-source programs—those with every line of code deposited somewhere for everyone to see—before they will consider a program suitable for data analysis or other purposes. We can call this position the open-source theory.
</p>
<p>On the other hand, the claim might be “that disclosure of the TrueAllele source code [to the defense, perhaps with an order to protect against more widespread dissemination of trade secrets] was required to properly conduct the <i>Frye</i> hearing” and that without at least that much discovery of the code, scientists would not regard TrueAllele as valid. We can call this position the discovery-based theory. It implies that, in establishing general scientific acceptance in a <i>Frye</i> hearing, pretrial discovery of secret code is an adequate substitute for exposing the code to the possible scrutiny of the entire scientific community. \8/
</p>
<p>The <i>Wakefield</i> opinions are not entirely clear about which theory they embrace or reject. Judge Rivera’s concurrence may have endorsed both theories. In addition to accentuating “the need to provide defendant with access to the source code,” she decried the absence of “objective, expert third-party access,” writing that
</p>
<blockquote>
The court's decision was an abuse of discretion as a matter of law because it relied on validation studies by interested parties and evaluations founded on incomplete information about TrueAllele's computer-based methodology. Without defense counsel and objective, expert third-party access to and evaluation of the underlying algorithms and source code, the court could not conclude that TrueAllele's brand of probabilistic genotyping was generally accepted within the forensic science community.
</blockquote>
<p>The “evaluations founded on incomplete information” were from a standards developing organization, a state forensic science commission, and NIST. They were incomplete because, according to Judge Rivera, “without the source code, the agencies could not adequately evaluate the use of TrueAllele for this type of DNA mixture analysis ... .”
</p>
<p>Focusing on the discovery-based theory, the rest of the court determined that “[d]isclosure ... was not needed in order to establish at the <i>Frye</i> hearing the acceptance of the methodology by the relevant scientific community.” The Chief Judge gave two, somewhat confusingly stated, reasons. The first was that Wakefield sought the source code under a rule for discovery that did not apply and then “made no further attempt to demonstrate a particularized need for the source code by motion to the court.” But it is not clear how the failure “to demonstrate a particularized need” overcomes (or even responds to) the argument that the scientific community will not accept software as validly implementing algorithms unless the source code is either open source or given only to the defendant.
</p>
<p>The Chief Judge continued:
</p>
<blockquote>
Moreover, defendant's arguments as to why the source code had to be disclosed pay no heed to the empirical evidence in the validation studies of the reliability of the instrument or to the general acceptance of the methodology in the scientific community—the issue for the <i>Frye</i> hearing—and are directed more toward the foundational concern of whether the source code performed accurately and as intended (see <i>Wesley</i>, 83 N.Y.2d at 429, 611 N.Y.S.2d 97, 633 N.E.2d 451).
</blockquote>
<p>The meaning of the sentence may not be immediately apparent. The defense argument is that giving a defendant (or perhaps the scientific community generally) access to source code is a prerequisite to general acceptance of the proposition that the software correctly implements theoretically sound algorithms. If this broad proposition is false dogma, the court should simply say so. It should announce that source code need not be disclosed because there is an alternative, reasonably effective means for establishing that the software performs as it should. The first part of the first sentence starts out that way, but the sentence then states that “whether the source code performed accurately and as intended” is not a matter of general acceptance at all. It is only “foundational” in the sense identified by Chief Judge Kaye in <i>Wesley</i>, who, as we saw (Box 2), wrote that even though RFLP-VNTR testing was generally accepted, the complete “foundation” for admitting DNA evidence entails proof that the generally accepted procedure was performed properly in the case at bar.
</p>
<p>But regarding the argument about source code as falling outside of the <i>Frye</i> inquiry misapprehends the defense argument. Neither the open-source theory nor the discovery-based theory pertains to the execution of valid software. Both question the premise that validity can be generally accepted without disclosure of the program’s source code. Yet, the majority elaborates on its non-<i>Frye</i> "foundational" classification for the source-code argument as follows:</p>
<blockquote>
To the extent the testimony at the hearing reflected that the TrueAllele Casework System may generate less reliable results when analyzing more complex mixtures (see also President's Council of Advisors on Sci. and Tech., Exec. Office of the President, Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods, at 80 [2016] https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/PCAST/pcast_forensic_science_report_final.pdf [published after the <i>Frye</i> hearing was held]), defendant did not refine his challenge to address the general acceptance of TrueAllele on such complex mixtures or how that hypothesis would have been applicable
to the particular facts of this case. As a result, it is unclear that any such objection would have been relevant to defendant's case, where the samples consisted largely of simple (two-contributor) mixtures with the victim as a known contributor (see also NIST, DNA Mixture Interpretation: A NIST Scientific Foundation Review, at 3 [June 2021 Draft report] https://nvlpubs.nist.gov/nistpubs/ir/2021/NIST.IR.8351-draft.pdf).
</blockquote>
<p>These citations to the PCAST and NIST reports actually undercut any suggestion that source-code secrecy does not implicate <i>Frye</i>. The NIST draft repeatedly states that
</p>
<blockquote>
Forensic scientists interpret DNA mixtures with the assistance of statistical models and expert judgment. Interpretation becomes more complicated when contributors to the mixture share common alleles. Complications can also arise when random variations, also known as stochastic effects, make it more difficult to confidently interpret the resulting DNA profile. <br />
<br />
Not all DNA mixtures present these types of challenges. We agree with the President’s Council of Advisors on Science and Technology (PCAST) that “DNA analysis of single-source samples or simple mixtures of two individuals, such as from many rape kits, is an
objective method that has been established to be foundationally valid” (PCAST 2016).
</blockquote>
<p>NIST, DNA Mixture Interpretation: A NIST Scientific Foundation Review,
at 2-3 & 11-12 (June 2021 draft) (citations omitted). To demand that “defendant ... refine his challenge to address the general acceptance of TrueAllele on ... complex mixtures or ... the particular facts of this case” is to hold that TrueAllele <i>is generally accepted</i> for use with “single-source samples or simple mixtures of two individuals”—<i>even though the source code is hidden</i>. But if science does not demand the disclosure of source code for general acceptance inside the "single" or simple "zone," then why would it demand disclosure for general acceptance outside that zone?
</p>
<p>The court's remarks make more sense as a response to Wakefield’s different discovery argument about the need for source code for trial purposes. This argument does not claim that disclosure of the source code is essential for general acceptance to exist. It looks to the trial rather than the pretrial <i>Frye</i> hearing. The thought may be that if the accuracy of the program for the “simple” cases is assured, then the need for discovery of the code to prepare for trial testimony is less compelling. The court appears to be responding that because “the samples consisted largely of simple (two-contributor) mixtures with the victim as a known contributor,” there was little need for discovery of the code in this case.
</p>
<p>Although this rejoinder departs from the topic of what <i>Wakefield</i> teaches us about general acceptance, I would note that it is difficult to reconcile this characterization of the case with Chief Judge DiFiore’s own description of the samples. The court mentioned four samples. Its initial description of them indicates that the New York laboratory deemed the sample on the amplifier cord to be “at least” a three-person mixture and stated that “because of the complexity of the mixture,” the laboratory could not even compare “results generated from the amplifier cord ... to defendant's DNA profile.” 2022 WL 1217463, at *1. Because of the “stochastic threshold,” the laboratory could discern peaks at only 4 out of 15 loci for “the outside rear shirt collar” and “for the profile obtained from the victim's forearm.” Id. Presumably, the “insufficient data” on “the unknown contributors to the DNA mixtures found on the amplifier cord and the front of the shirt collar” is what led the state to call Cybergenetics for help. These samples are not instances of what PCAST called “DNA analysis of single-source samples or simple mixtures of two individuals, such as from many rape kits” or what the NIST group called “two-person mixtures involving significant quantities of DNA.” They are “more complicated” situations that arise “when contributors to the mixture share common alleles [and] when random variations, also known as stochastic effects” are present.
</p>
<p>In sum, the deeper one looks into the <i>Wakefield</i> opinions, the more there is to wonder about. But whatever quirks and quiddities reside in the writing, the nearly unanimous opinion of the Court of Appeals signals that a trial court can choose to dispense with the general-acceptance inquiry for at least one PGS program—TrueAllele—for unchallenging single-source samples or two-person mixtures and for samples of somewhat greater complexity as well.
</p>
<p><b>NOTES</b>
</p>
<blockquote>
* <span style="font-size: x-small;">UPDATE: On July 12, 2022, Chief Judge DiFiore announced that she will resign on August 31. See, e.g., Jimmy Vielkind & Corinne Ramey, New York’s Top Judge Resigns Amid Misconduct Proceeding: Attorney for Court of Appeals Judge Janet DiFiore Said Her Resignation Wasn’t Related to a Claim that She Improperly Attempted to Influence a Disciplinary Hearing, Wall St. J., July 12, 2022 8:31 am ET, https://www.wsj.com/articles/new-yorks-top-judge-resigns-amid-misconduct-proceeding-11657629111.</span>
</blockquote>
<ol>
<li><span style="font-size: x-small;">This formulation conflates the issue of novelty with the issue of general acceptance, which can change over time. See <i>Williams</i>, 35 N.Y.3d at 43, 147 N.E.3d at 1143.</span></li>
<li><span style="font-size: x-small;">The description begins with the remark that “The likelihood ratio in its modern form was developed by Alan Turing during World War II as a code-breaking method.” That is a possibly defective bit of intellectual history, inasmuch as Turing did not develop the likelihood ratio. To decipher messages, Turing relied on a logarithmic scale for the Bayes’ factor in two ways—as indicating the strength of evidence, and as a tool for sequential analysis. Sir Harold Jeffreys had done the former in his 1939 Theory of Probability book. The sequential analysis problem is not clearly connected to PGS. It arises when the sample size is not fixed in advance and the data are evaluated continuously as they are collected. PGS processes all the data at once.</span></li>
<li><span style="font-size: x-small;">As the court wrote in <i>People v. Williams</i>, 35 N.Y.3d 24, 147 N.E.3d 1131, 1139–40, 124 N.Y.S.3d 593 (N.Y. 2020), “[r]eview of a <i>Frye</i> determination must be based on the state of scientific knowledge and opinion at the time of the ruling (see <i>Cornell</i>, 22 N.Y.3d at 784-785, 986 N.Y.S.2d 389, 9 N.E.3d 884 (‘a <i>Frye</i> ruling on lack of general causation hinges on the scientific literature in the record before the trial court in the particular case’)).”</span></li>
<li><span style="font-size: x-small;">E.g., 2022 WL 1217463 at *7 n.10 (“TrueAllele is not an outlier in the use of the continuous probabilistic genotyping method. Other types of probabilistic genotyping software, such as STRMix, have likewise been found to be generally accepted (see e.g. United States v. Gissantaner, 990 F.3d 457, 467 (6th Cir. 2021)).”)</span></li>
<li><span style="font-size: x-small;">Cf. David H. Kaye, Theona M. Vyvial & Dennis L. Young, <a href="https://papers.ssrn.com/abstract_id=2705941" target="_blank">Validating the Probability of Paternity</a>, 31 Transfusion 823 (1991) (comparing the empirical LR distribution for parentage using presumably true and false mother-child-father trios derived from a set of civil paternity cases to the “paternity index” (PI), a likelihood ratio computed with software applying simple genetic principles to the inheritance of HLA types, and reporting that the theoretical PI diverged from the empirical LR for PI > 80 or so).</span></li>
<li><span style="font-size: x-small;">At trial, “Gary Skuse, Ph.D., a professor of biological sciences at the Rochester Institute of Technology, testified at trial as a defense witness [and] agreed ... that defendant's DNA was present in the mixtures found on the shirt collar and amplifier cord and that it was ‘most likely’ present on the victim's forearm.”</span></li>
<li><span style="font-size: x-small;">The articles in the <i>Journal of Forensic Sciences</i> and <i>Science and Justice</i> have no such statements. The “Competing Interests” paragraph in a <i>PloS One</i> article advises that “I have read the journal’s policy and have the following conflicts. Mark Perlin is a shareholder, officer and employee of Cybergenetics in Pittsburgh, PA, a company that develops genetic technology for computer interpretation of DNA evidence. Cybergenetics manufactures the patented TrueAllele Casework system, and provides expert testimony about DNA case results. Kiersten Dormer and Jennifer Hornyak are current or former employees of Cybergenetics. Lisa Schiermeier-Wood and Dr. Susan Greenspoon are current employees of the Virginia Department of Forensic Science, a government laboratory that provides expert DNA testimony in criminal cases and is adopting the TrueAllele Casework system. This does not alter our adherence to all the PLOS ONE policies on sharing data and materials.”</span></li>
<li><span style="font-size: x-small;">The defense advanced another different discovery theory in arguing that it could not adequately cross-examine and confront Dr. Perlin at trial unless it could access the source code. The court rejected this theory too.</span></li>
</ol>DH Kayehttp://www.blogger.com/profile/09329862957840849989noreply@blogger.com0tag:blogger.com,1999:blog-5354567765897135804.post-78451524668089894122022-05-07T16:10:00.003-04:002022-08-31T11:07:20.630-04:00The New York Court of Appeals Returns to Probabilistic Genotyping Software (Part I—Williams)<p>For the past ten years or so, motions to exclude testimony of “probabilistic genotyping” results have been commonly lodged<span style="font-family: "Times New Roman",serif; font-size: 12pt; line-height: 107%; mso-ansi-language: EN-US; mso-bidi-language: AR-SA; mso-fareast-font-family: DengXian; mso-fareast-language: ZH-CN; mso-fareast-theme-font: minor-fareast;">—</span>and almost always denied. With rare exceptions, appellate courts have held these rulings to be proper (or at least within the trial judge’s discretion). Then came <i>People v. Williams</i>, 35 N.Y.3d 24, 147 N.E.3d 1131, 124 N.Y.S.3d 593 (N.Y. 2020).</p><p>In this murder case, New York’s highest court held that the output of one form of probabilistic genotyping software (PGS) was being admitted prematurely, before the scientific community had an adequate chance to evaluate it. But that was two years ago. Two weeks ago, in <i>People v. Wakefield</i>, 2022 N.Y. Slip Op. 02771, 2022 WL 1217463 (N.Y. Apr. 26, 2022), the court returned to the issue of PGS evidence. As in <i>Williams</i>, PGS produced "likelihood ratios" associating the defendant with a murder weapon, but this time the court held that the PGS in question had achieved the general scientific acceptance required to admit scientific evidence in New York.
</p>
<p>This posting discusses <i>Williams</i>. A <a href="http://for-sci-law.blogspot.com/2022/05/the-new-york-court-of-appeals-returns_24.html" target="_blank">separate posting</a> will consider how <i>Wakefield</i> differs from <i>Williams</i>, and why.
</p>
<p>Cadman Williams was accused of a fatal shooting in the Bronx in 2008. The DNA in the case came from a gun hidden in Williams’s former girlfriend’s apartment. At trial, an expert from the New York City Office of the Chief Medical Examiner (OCME) testified “that it was millions of times more likely that the DNA mixture found on the gun contained contributions from defendant and one unknown, unrelated person, rather than from two unknown, unrelated people.” 35 N.Y.3d at 31. At least, that is how the court understood the testimony. It is not, however, an accurate statement of a likelihood ratio involving the identity of the individual (or individuals) whose DNA is in the sample. Such a likelihood ratio involves only the probability of the DNA data conditional on source hypotheses, not the other way around. (With large enough likelihood ratios, however, the distinction can be academic.) A further issue is the choice of hypotheses. Why not the former girlfriend's DNA rather than that of a random person?
</p>
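<p>To make the court's slip concrete, here is a minimal sketch of the two quantities in generic notation. (The hypothesis labels are mine, introduced only for illustration; they are not the notation of the OCME or of any PGS program.)</p>
<pre>
% Likelihood ratio -- what the software computes
\mathrm{LR} = \frac{P(\text{DNA data} \mid H_p)}{P(\text{DNA data} \mid H_d)}

% Posterior odds -- what the quoted testimony appears to describe
\frac{P(H_p \mid \text{DNA data})}{P(H_d \mid \text{DNA data})}
  = \mathrm{LR} \times \frac{P(H_p)}{P(H_d)}

% H_p: the mixture contains DNA from the defendant and one unknown, unrelated person
% H_d: the mixture contains DNA from two unknown, unrelated people
</pre>
<p>The likelihood ratio compares the probability of the data under the two hypotheses; turning it into a statement about how probable the hypotheses themselves are requires prior odds that the software does not supply.</p>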
<p>Nonetheless, the <i>Williams</i> opinion did not need to consider the proper interpretation of the likelihood ratio for a pair of hypotheses or the selection of those hypotheses. The appeal only concerned the general scientific acceptance of the method that the OCME had devised for generating likelihood ratios for minute quantities of DNA. The Court of Appeals held that the trial court erred “in admitting expert testimony with respect to LCN and FST results in the absence of a <i>Frye</i> hearing.” 35 N.Y.3d at 42. LCN stands for “low copy number” and refers to the fact that the OCME laboratory tweaked the usual method for producing data on the personally identifying features of the DNA in order to obtain results from such minute samples. FST stands for “Forensic Statistical Tool,” a computer program developed within the OCME to calculate likelihood ratios. <i>Frye</i> refers to <i>Frye v. United States</i>, 293 F. 1013 (D.C. Cir. 1923), a famous case from the District of Columbia that announced the rule that a scientific method had to achieve acceptance within the scientific community before its results could be received as evidence in court.
</p>
<p>Although the Court of Appeals referred to one New York trial court opinion finding general acceptance of the OCME method as “questionable,” the <i>Williams</i> court did not hold that the computer output from the FST was necessarily inadmissible. The precise legal error lay in the trial court’s allowing the testimony without first conducting an evidentiary hearing on whether OCME’s methods were generally accepted within the scientific community. The majority opinion, written by <a href="https://en.wikipedia.org/wiki/Eugene_M._Fahey" target="_blank">Judge Eugene M. Fahey</a>, distinguished between the LCN and FST parts of the OCME method and determined that neither could be said to be generally accepted based on the information presented to the trial judge (namely, prior opinions, mostly without evidentiary hearings; scientific publications; internal studies from the OCME laboratory; and a review conducted by a subcommittee of New York’s forensic science commission).
</p>
<p>“With respect to the FST issue,” the prosecution had “maintained that such evidence should be admitted without a <i>Frye</i> hearing because ‘numerous articles published in peer-reviewed scientific journals’ supported the point that ‘the analytical software employs well-established principles such as Bayesian statistics and likelihood ratios which are used in many areas of science including forensics, medicine and social sciences.’” 35 N.Y.3d at 35 (note omitted). The prosecution added that “given both the thorough review of the FST by DNA Subcommittee of the New York State Forensic Science Committee [sic] and the exhaustive validation of that tool by OCME, the relevant scientific community had accepted the FST as reliable.” 35 N.Y.3d at 31.
</p>
<p>The Court of Appeals was unmoved. It wrote that:
</p>
<blockquote>
If the analysis was as simple as determining whether FST is comprised of existing mathematical formulas that are individually accepted as generally reliable within the relevant scientific community, then FST evidence probably would be admissible even in the absence of a <i>Frye</i> hearing. [¶] The point remains, however, that FST is a proprietary program exclusively developed and controlled by OCME. The sole developer and the sole user are the same. That is not “an appropriate substitute for the thoughtful exchange of ideas ... envisioned by <i>Frye</i>” (Wesley, 83 N.Y.2d at 441, 611 N.Y.S.2d 97, 633 N.E.2d 451 [Kaye, Ch. J., concurring] ). It is an invitation to bias. .... [That the] tool has ... been vetted and approved by “the distinguished scientists making up the DNA Subcommittee of the New York State Forensic Science Committee” is certainly relevant [but] that insular endorsement is no substitute for the scrutiny of the relevant scientific community. [¶] Indeed, here, defendant was hamstrung in demonstrating the existence of conflicting scientific opinions in order to show the need for <i>Frye</i> review of the FST based on the “black box” nature of that program, but his papers adequately showed that OCME's secretive approach to the FST was inconsistent with quality assurance standards within the relevant scientific community. Those papers also showed that facts adduced in challenges to the FST made in <i>Frye</i> applications in other proceedings suggested that the accuracy calculations of that program may be flawed. .... In short, the FST should be supported by those with no professional interest in its acceptance. <i>Frye</i> demands an objective, unbiased review.
</blockquote>
<p>147 N.E.3d at 1141–42.
</p>
<p>
This language was too much for three of the seven judges. Their concurring opinion, written by <a href="https://en.wikipedia.org/wiki/Janet_DiFiore" target="_blank">Chief Judge Janet M. DiFiore</a>, balked at the “pejorative view of the ... OCME's ... LCN DNA typing technique and its ... probabilistic genotyping software program ... .” 147 N.E.3d at 1147. To be sure, the concurrence agreed that “the issues ... in this 2014 motion [were] ripe for a <i>Frye</i> hearing”—but only because the internal studies did not appear to encompass the small quantity of DNA that was analyzed in this case. According to the concurrence,
</p><p></p>
<blockquote>
The LCN DNA profiles drive the FST analysis, and FST results are only as reliable as the predicate assumptions integrated into the FST software program. The People did not meet their burden of establishing the validity of the empirical data used to fuel the calculations performed by this statistical model, including the manner of accounting for the occurrence of the stochastic effect and allelic dropout in a multiple contributor sample of less than 25 picograms, in a manner sufficient to bypass a <i>Frye</i> hearing. Fundamentally, the combined use of that statistical tool with DNA typing on samples that fell beneath validated thresholds may have impacted the reliability of the results, raising a valid challenge to the admissibility of that evidence in a criminal prosecution.
</blockquote>
<p>35 N.Y.3d at 52–53. In other words, the concurring judges seemed to buy the state’s broad argument that there was no need for a <i>Frye</i> hearing to find general acceptance of the OCME system in many cases. But not in this case, where the laboratory pushed beyond what it had validated to its own satisfaction. These judges evinced no concern with the limited outside testing of the FST and expressed no doubts about the conclusions of general acceptance reached in the clear preponderance of trial court rulings on motions to exclude FST likelihood ratios. And the entire court agreed that admitting the evidence in this case was harmless error because the other evidence against Williams was overwhelming. </p><p>In sum, <i>Williams</i> was a case with broad reasoning (by the majority) on a narrow topic—the OCME’s home-grown Forensic Statistical Tool as applied to data from an especially challenging DNA sample (as stressed by the concurrence). What about other FST likelihood ratios with data from larger DNA samples? Other brands of PGS?<br /></p>
<p>One thing was clear. <i>Williams</i> would not be the last word on probabilistic genotyping. Another case, involving an older and more established computer program known as TrueAllele, was on its way to the Court of Appeals. Stay tuned for thoughts on this case.
</p>DH Kayehttp://www.blogger.com/profile/09329862957840849989noreply@blogger.com0tag:blogger.com,1999:blog-5354567765897135804.post-43984194881735422442022-01-15T21:36:00.008-05:002022-01-27T07:52:23.873-05:00Bones of Contention: A Standard for Analyzing Skeletal Trauma in Forensic Anthropology<p>The Academy Standards Board (ASB) of the American Academy of Forensic Sciences (<a href="https://www.aafs.org/" target="_blank">AAFS</a>) posted the second proposed draft of a "<a href="https://www.aafs.org/asb-standard/standard-analyzing-skeletal-trauma-forensic-anthropology" target="_blank">Standard for Analyzing Skeletal Trauma in Forensic Anthropology</a>" for public comment. The standard does not go far toward standardizing procedures or showing that the procedures to which it applies have been scientifically tested. Of course, it could well be that ample, well-designed studies have demonstrated that forensic anthropologists can consistently and accurately classify skeletal defects in human remains according to the categories the standard mentions. But the standard contains no bibliography and no citations to show that this is the case. </p><p>It contains some negative injunctions and a few positive suggestions about reporting -- for example:</p>
<blockquote>
<div style="border: 1px solid black;">
<ul style="text-align: left;">
<li>Forensic anthropologists shall not determine cause or manner of death.</li><li>Practitioners shall not estimate the temperature or duration of heat exposure based on thermal defects to bone.</li>
<li>Practitioners may report the minimum number of traumatic events (e.g., blunt impacts, projectile entry defects, or sharp defects) observed skeletally, but shall not report a definitive maximum number of impacts, as skeletal trauma evidence may not reflect all impacts to the body.</li>
<li>When a suspect tool is submitted for analysis, similarities between the tool and defect may be reported; conclusions shall be reported in terms of an exclusion or failure to exclude.</li>
</ul>
</div>
</blockquote>
<p>Thus, the proposed standard, designated ASB 147-21, is not without redeeming legal value. Nevertheless, it does not articulate any analytical process by which the classifications it calls for should be made (cf. "<a href="https://www.sciencedirect.com/science/article/pii/S2589871X21000176" target="_blank">vacuous standards</a>"); it requires no reporting of the uncertainty in this process; it does not contemplate the possibility of evidence-based rather than conclusion-based statements of the implications of the data; and it refers to an all-inclusive list of methods as "acceptable." If I may elaborate:<br />
</p>
<div style="text-align: center;"><img alt="File:Human skeleton remains.jpg - Wikimedia Commons" class="n3VNCb" data-noaft="1" src="https://upload.wikimedia.org/wikipedia/commons/6/6d/Human_skeleton_remains.jpg" style="height: 150px; margin: 0px; width: 210px;" /></div>
<p><b>Is "Interpretation" Limited to an Opinion on the Inference (Conclusion) from the Data?</b>
</p>
<p>The revision defines "trauma interpretation" as "Opinion regarding the mechanism of, timing, direction of impact(s) or minimum number of impacts associated with skeletal defect(s) based on quantitative and/or qualitative observations." The phrase "based on ... observations" indicates that the opinion expresses a belief in the truth, falsity, or probability of an inference being drawn from the data. Interpretation should include the possibility of describing the strength of the evidence in favor of the inference rather than opining on the truth, falsity, or probability of the conclusion itself. In addition, if the opinion-statement is an assertion that the hypothesis about what happened is true or false (either categorically or to some probability), it is not just based on the data but on a prior probability for the hypothesis as well.</p><p>Despite this definition, the standard sanctions "interpretation" in the form of rudimentary statements about the extent to which the data prove the hypothesis in question. Section 6 notes that "Trauma interpretation shall be clearly identified in the report using terms such as ‘indicative of’ and ‘consistent with’ or by using a subheading titled ‘Interpretation.’" These phrases have their <a href="http://for-sci-law.blogspot.com/2016/06/proposed-uniform-language-for-forensic.html" target="_blank">problems</a>, but they are one manner of referring to the probability of the evidence given the truth of certain hypotheses rather than vice versa.<br /></p>
<p><b>Is Interpretation Based on Non-scientific Evidence and Inference?</b>
</p>
<p>The revision introduces the following (non)criteria for deciding that blasts or explosions caused skeletal trauma: "Blasts/explosive events often cause blunt (including concussive) and projectile trauma to the body. When the trauma pattern and circumstantial information support a blast event, the trauma mechanism should be classified as 'blast trauma'”. The undefined notion of "support" is too vague to give any guidance. Is "consistent with" considered "support"? Let's hope not -- patterns can be "consistent with" one hypothesis (they could occur when the hypothesis is true) but much more probable under the opposite hypothesis.
</p>
<p>And then there is the green light this recommendation gives to presenting a conclusion based on nonscientific "circumstantial evidence" as if it were based on expertise involving the skeletal evidence. Knowing that a blast occurred can drive the conclusion that the damage to the skeleton is "blast trauma." Should there also be a report on the skeletal evidence from an analyst blinded to the other information uncovered in the investigation?
</p>
<p><b>Is Everything Acceptable?</b>
</p>
<p>ASB 147-21 states that "Skeletal trauma shall be examined. Acceptable methods to examine trauma include gross, microscopic, radiographic, and other analytical methods." This formulation deems every conceivable analytical method as "acceptable" no matter how poorly conceived it may be. Labeling everything as "acceptable" is troublesome in a standard that does not include criteria and procedures for performing the analysis and that does not lead the reader to any evidence of the reliability and validity of the undefined "analytical procedures."
</p>
<p>Of course, forensic anthropologists know that some procedures do not work well, and only an outlier would use them. The drafters of ASB 147-21 undoubtedly appreciate the need for suitable methods (and hence prohibit certain conclusions that cannot be drawn with any existing method). Well-motivated and informed forensic anthropologists will not be led astray if they consult the standard. But outliers do appear in court. Remember <a href="https://books.google.com/books?id=4AQdowREOQsC&pg=PA64&lpg=PA64#v=onepage&q&f=false" target="_blank">Louise Robbins</a>. Unless the dubious method yields one of the explicitly prohibited statements in this standard, the outlier witnesses could maintain that they have proceeded exactly as the standard requires. Standards with this potential for abuse should be reformed. They should strive to standardize the methods they govern, and they should state what is known about the accuracy and reliability of these methods.<br /></p><p></p>DH Kayehttp://www.blogger.com/profile/09329862957840849989noreply@blogger.com0tag:blogger.com,1999:blog-5354567765897135804.post-30552189333947839422022-01-03T10:23:00.001-05:002022-01-03T16:39:59.965-05:00Fitting "Physical Fit" into the Courtroom<p>The logic of piecing together fragments of broken glass, torn tape, cut paper, and the like seems simple enough. \1/ If the pieces fit in all their details at the edges, and if all surface marks or impressions that would cross an edge also align nicely, one has circumstantial evidence that they were once part of the same object.
</p>
<p>The strength of this evidence for a single source depends on the extent and detail of the concordance between the recovered pieces. A physical fit between two halves of a broken plank of wood is powerful evidence for the hypothesis that the two pieces resulted from breaking this one plank. But if the pieces are weathered and the splintered edges dulled, the physical fit will be less precise and less supportive of the claim that they came from the same original plank.
</p>
<p>At the other extreme, if two pieces are plainly discordant, they might have come from different places on the same object, with the intermediate pieces being missing. Or they might have come from different objects entirely. Consider tearing off five pieces of duct tape from the same roll of tape and comparing the edges of the first and the last segments. The detailed structure of the edges should not be complementary. Likewise, tearing segments of tape from five different rolls should result in a mismatch between the first and the fifth segment.
</p>
<p>Criminalists or materials experts can be extremely helpful in examining the recovered pieces of objects to determine the degree of physical fit -- that is, in elucidating how well the edges fit together and the extent to which a mark on the surface of one piece lines up with a mark on the other when the pieces are aligned. But how they should describe their findings seems to be muddled in forensic-science standards. This posting describes the current vocabulary and argues that it is artificial and a departure from the ordinary meaning of the term "fit." It then outlines better alternatives for reporting the results of an investigation into physical fit.
</p>
<p align="center">I. The Standard Approach
</p>
<p>Let’s look at a couple of ASTM standards. E2225-19a (Standard Guide for Forensic Examination of Fabrics and Cordage) instructs that “[i]f a physical match is found, it should be reported in a manner that will demonstrate that the two or more pieces of material were at one time a continuous piece of fabric or cordage” (§ 7.2.2). This standard treats the “physical match” as an observable property of the specimens (concordant edges and surface marks) that is conclusive of the hypothesis of a single source (the inference from the data).
</p>
<p>ASTM E3260−21 (Standard Guide for Forensic Examination and Comparison of Pressure Sensitive Tapes), on the other hand, characterizes “physical fit” not as a property of the materials, but as a “type of examination that can be performed” (§ 10.5.1). This “conclusive type of examination ... is a physical end match.” Id. It “involves the comparison of edges, fabric (if present), surface striae, and other surface irregularities between samples in which corresponding features provide distinct characteristics that indicate the samples were once joined at the respective separated edges.” Of course, "distinct characteristics that indicate the samples were once joined at the respective separated edges” are not necessarily "conclusive," making this definition of "physical fit" as a "type of examination" puzzling. The intent, it seems, is to define a physical fit examination (rather than a physical fit) as one that is capable of conclusively proving that the pieces were once joined together.
</p>
<p>A Proposed New Standard Guide for the Collection, Analysis and Comparison of Forensic Glass Samples, ASTM WK72932, released for public comment late last year, states that “broken objects can be reassembled to their original configuration ... called a ‘physical fit’” (§ 11.1). But a physical fit is the original configuration of a broken object only if the pieces come from that original object, and this origin story is not true just because a standard defines "physical fit" that way. The evidence from the examination may be that the separate pieces fit together extremely well. If so, the conclusion is that they were once together within or as a unitary object. This conclusion may well be true, but one cannot decide, by the fiat of a definition, that the pieces that are observed to fit together well have been <i>re</i>aligned as they once were. Yet, a later section similarly asserts that “[a] glass physical fit is a determination that two or more pieces of glass were once part of the same broken glass object” (§ 11.2.8). This effort to define "physical fit" as inherently conclusive prompted eleven lawyers (including me) \2/ to caution ASTM that “[t]he hypothesis or conclusion that fragments come from the same object is not a physical fit. It is an inference drawn from the observations that produce the designation of a physical fit.”
</p>
<p>Still more recently, an <a href="https://www.nist.gov/osac" target="_blank">OSAC</a> subcommittee released a Standard Guide for Forensic Physical Fit Examination (<a href="https://www.nist.gov/system/files/documents/2021/12/06/OSAC_2022-S-0015_Standard_Guide_for_Forensic_Physical_Fit_Examination_DRAFT_OSAC_PROPOSED.pdf" target="_blank">OSAC 2022-S-0015</a>) for public comment before it is delivered to ASTM for consideration there. This proposed standard goes off in another direction. It equates a “physical fit” with the examiner’s state of mind about a hypothetical ensemble of experiments:
</p>
<blockquote>
<div style="background-color: antiquewhite; border: 1px solid black; padding-bottom: 4px; padding-left: 10px; padding-right: 5px; padding-top: 4px; padding: 4px 5px 4px 10px;">
13.1 <b>Physical Fit</b><br />
13.1.1 The items that have been broken, torn, separated, or cut exhibit physical features that realign in a manner that is not expected to be replicated.<br />
13.1.1.1 Physical Fit is the highest degree of association between items. It is the opinion that the observations provide the strongest support for the proposition that the items originated from the same source as opposed to the proposition they originated from different sources.<br />
<br />
13.2 <b>No Physical Fit</b><br />
13.2.1 The items correspond in observed class characteristics, but exhibit physical features that do not realign, or they realign in a manner that could be replicated.<br />
13.2.2 Alternatively, the items can exhibit physical features that partially realign, display simultaneous similarities and differences, show areas of discrepancy (e.g., warped areas, burned areas, missing pieces), or have insufficient individual characteristics that hinder the ability to determine the presence or absence of a physical fit.
</div>
</blockquote>
<p>Statisticians will notice the shift from (1) the incompletely expressed frequentist idea of an infinite sequence of trials in which different objects <i>A</i> and <i>B</i> are broken and the pieces from <i>A</i> never align with those from <i>B</i> to (2) the likelihoodist conception of support for the same-source hypothesis. But that implicit change in the theory of inference is hardly a cardinal sin in this context. If the probability of a fit at least as good as the one observed is practically zero for different sources, and if the probability of such a fit for the same source is much higher, then the support (the log-likelihood ratio) is very high.
</p>
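<p>For readers who like numbers, here is a minimal sketch of that last point. The two probabilities are invented solely for illustration; they are not estimates from any study or standard.</p>
<pre>
import math

# Hypothetical probabilities of observing a fit at least as good as the one seen
p_fit_given_same_source = 0.95        # expected if the pieces were once joined
p_fit_given_different_sources = 1e-6  # essentially never expected otherwise

likelihood_ratio = p_fit_given_same_source / p_fit_given_different_sources
log10_support = math.log10(likelihood_ratio)

print(f"LR = {likelihood_ratio:,.0f}")         # 950,000
print(f"log10 support = {log10_support:.2f}")  # about 5.98
</pre>
<p>On a log scale like this, the support would be overwhelming; the hard part, of course, is coming up with defensible values for the two probabilities.</p>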
<p>Nevertheless, defining physical fit as a categorical opinion rather than a more variable degree of congruency that generates the opinion — and dumping everything short of a perceived fit into the category of ”no physical fit” — deviates from the common understanding that physical fit comes in degrees. There can be a remarkably great fit, a pretty good fit, and so on, down to a blatant misfit. The question the examiner must answer, at least intuitively, before the fit/no-fit classification can be made is just <i>how well</i> the pieces fit together. Fit is not a uniform degree of association that springs into existence exactly when a particular examiner is convinced that no other source could account for the complexity and extent of the fit. There is no such thing as “<i>the</i> strongest support.” One can always conceive of a situation with still stronger support (because a fracture or other separation of the pieces could generate an even richer set of irregularities in the edges).
</p>
<p>The current approach of <i>defining</i> a physical fit as a single source for the pieces and calling everything else “no fit” does not create a vocabulary that judges or jurors will easily understand. A vocabulary in which physical congruency (fit) lies on a continuum — and that then addresses the inference that should be drawn from the observations — is more transparent. The definitions in the standards collapse the two steps of data acquisition and inference into one.<br /></p>
<p align="center">II. Inference: From Data to Conclusions
</p>
<p>So how should examiners answer the question of how well the pieces fit together? An examination for fit yields multidimensional, spatial data. An examiner could present photographs of the aligned edges and surfaces and highlight the concordant and discordant features. Although the highlighting involves some interpretative thinking, I have called a courtroom presentation that stops at this point "features-only testimony." \3/ It is appropriate when examiners have no special expertise at interpreting how strongly their results support the same-source hypothesis. If they are no better than lay judges and jurors at discerning how improbable the features are in the hypothetical cases of repeatedly breaking the same object, it could be argued that these witnesses should not try to interpret the results any further. Such interpretation would not actually assist the trier of fact, as required by Federal Rule of Evidence 702.</p><p>For example, a few days ago, a forensic scientist told me of a case in which a criminalist was able to reassemble pieces of glass recovered at the site of a hit-and-run accident so that they fit neatly into the metal holder of a side rear mirror on the suspect’s car that was missing its glass. That’s good detective work, but did the criminalist have any special insights to offer into the obvious implications of this solution to the jigsaw puzzle? (The work was not presented in court because the crime laboratory’s management was concerned that there was no written protocol for pasting mirror fragments back in place. As the scientist observed, that's silly. The evidence practically speaks for itself, and its message is the same with or without a written protocol.)
</p>
<p>Nevertheless, let’s assume that examiners do have specialized skill at interpreting the findings about the alignment of the features. The ASTM and OSAC-proposed standards ignore the possibility of a qualitative expression of relative support — for example, “It is far more likely to get the detailed alignment of the features I just showed you if the pieces were broken parts of the same object than if they were from different objects.” Or, similarly, “The detailed alignment gives very strong support to the idea that the pieces broke off of the same object as opposed to two different objects.”
</p>
<p>As Part I showed, the standards advocate a fit/no-fit classification in which “fit” is either a statement about the probability of the same-source hypothesis (that the pieces had to have come from the same object) or a statement of belief in the hypothesis (“my opinion is that they were together in the same object — that’s what makes it a physical fit”). No-fit does not have a comparably sharp meaning. It could mean anything from no realistic possibility that the pieces were once contiguous parts of the same object to “partial fit features [that] increase the significance of the finding” (OSAC 2022-S-0015 § 13.2.4).
</p>
<p>A more straightforward and comprehensible approach would be to have a three-tiered reporting scale for the support the data give to the same-source hypothesis. What is now called a physical fit would be designated <i>a highly probative physical fit</i> (that is, a physical fit that strongly supports the same-source hypothesis). “Partial fit features” would be described as <i>a limited fit</i> (that gives <i>some</i> support to the same-source hypothesis). Finally, an obvious mismatch could be called <i>a misfit</i> (which strongly supports the conclusion that the pieces were never adjacently located on the same object).
</p>
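<p>To see how such a verbal scale could be understood as a coarse summary of an underlying likelihood ratio, consider the following sketch. The numerical cutoffs are hypothetical choices made only for illustration; neither the standards nor I propose them as the right values.</p>
<pre>
def report_category(likelihood_ratio):
    """Map an examiner's (necessarily subjective) likelihood ratio to a verbal tier.

    The cutoffs are hypothetical; a real reporting scale would have to choose
    and justify its own thresholds.
    """
    if likelihood_ratio >= 10_000:
        return "highly probative physical fit (strong support for same source)"
    if likelihood_ratio >= 1:
        return "limited fit (some support for same source)"
    return "misfit (the observations favor different sources)"

for lr in (1_000_000, 50, 0.001):
    print(lr, "->", report_category(lr))
</pre>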
<p>This tripartite classification is an imperfect way to express an underlying likelihood ratio formed from subjective probabilities. Whether better results would be achieved if analysts were forced to articulate their probabilities, either quantitatively or in the qualitative way mentioned earlier, is an interesting question. But the three-tiered reporting scale is closer to the current practice and seems feasible. \4/ It offers a framework for a better standard on reporting the results of a physical fit examination. Or so it seems to me — those who disagree are encouraged to hit the comment button.<br /></p>
<p><b>NOTES</b>
</p>
<ol>
<li><span style="font-size: x-small;">But see Forensic Science’s Latest Proof of Uniqueness, Dec. 22, 2013, <a href="http://for-sci-law.blogspot.com/2013/12/forensic-sciences-latest-proof-of.html">http://for-sci-law.blogspot.com/2013/12/forensic-sciences-latest-proof-of.html</a>.</span></li><span style="font-size: x-small;">
</span><li><span style="font-size: x-small;">The other commenters were Alyse Bertenthal, Amanda Black, Jennifer Friedman, Julia Leighton, Kate Philpott, Emily Prokesch, Matt Redle, Andrea Roth, Maneka Sinha, and Pate Skene.</span></li><span style="font-size: x-small;">
</span><li><span style="font-size: x-small;">David H. Kaye et al., The New Wigmore on Evidence: Expert Evidence (2d ed. 2011).</span></li><span style="font-size: x-small;">
</span><li><span style="font-size: x-small;">When there is a mismatch, testimony about a physical match has little value. Other features than the alignment of edges and surface markings will need to be studied if the expert is to shed light on whether the pieces came from a single object. The current and proposed standards are clear on this point.</span></li>
</ol>DH Kayehttp://www.blogger.com/profile/09329862957840849989noreply@blogger.com0tag:blogger.com,1999:blog-5354567765897135804.post-2445628787202898212021-12-25T16:46:00.006-05:002022-01-03T18:50:25.274-05:00The FBI's Misinformation Campaign on Firearms-toolmark Testimony<p>On Tuesday (21 December 2021), the Texas Forensic Science Commission issued a <a href="http://www.txcourts.gov/fsc/publications-reports/other-reports/" target="_blank">Statement Regarding 'Alternate Firearms Opinion Terminology'</a>. It is a forceful correction to misinformation from the <a href="https://www.fbi.gov/services/laboratory" target="_blank">FBI Laboratory</a>'s Assistant General Counsel, Jim Agar II. \1/ The email that attracted the Commission's critical attention tells forensic analysts what they are supposed to say in opposition to motions to limit their testimony about firearms-toolmark comparisons. As previous postings show, there has been no shortage of defense motions seeking to forbid eliciting opinions that ammunition components associated with a crime came from a particular gun. </p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEgnmAx3vCb_-kNBh4mM6ck_uZS6ZsFK1N24A162W3b62twNSUSZ6n48PC1aMR37Jn5aB7D36M__aGcOHntQaL7Eh7VIM9JF2TOBLSXvducYF_kIqO3HiOITfeEYjP7SQ95siR3JMP1SYwmVrmNZY7XLLxvfRqbCMai8EgFua9N-72nFJUQMMo2BQwE=s269" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="202" data-original-width="269" height="101" src="https://blogger.googleusercontent.com/img/a/AVvXsEgnmAx3vCb_-kNBh4mM6ck_uZS6ZsFK1N24A162W3b62twNSUSZ6n48PC1aMR37Jn5aB7D36M__aGcOHntQaL7Eh7VIM9JF2TOBLSXvducYF_kIqO3HiOITfeEYjP7SQ95siR3JMP1SYwmVrmNZY7XLLxvfRqbCMai8EgFua9N-72nFJUQMMo2BQwE" width="119" /></a></div>
<p>The FBI advice to firearms examiners is entitled "Dealing with Alternate Firearms Opinion Terminology" (hereinafter <i>Dealing</i>). It begins by dismissing the best efforts of federal and state judges to respond to weaknesses in traditional "This is the gun!" testimony as "wholesale attempts to rewrite the firearm expert's testimony by a layman with no experience in forensic science." \2/ The fact that eminent scientists and respected jurists have questioned source-attribution testimony in general and in this field in particular does not seem to matter. According to <i>Dealing</i>, the limitations are "not supported by either science or the law." Despite the government's annoyance with lay judges' rulings, however, courts have a duty to review the scientific and scholarly literature to decide whether strong claims of source attributions are sufficiently warranted. \3/</p>
<p><i>Dealing</i> continues, more reasonably, with the strategic recommendation that "firearms examiners and prosecutors should address the terminology issue head-on during their direct examination at the admissibility hearing. Preempt this issue early. Don't wait for the judge or the defense counsel to bring it up." But the tactics for bringing it up are over the top. <i>Dealing</i> imagines the following colloquy:</p>
<blockquote>
<div style="background-color: #ffe9ec; border-radius: 10px; border: 1px solid black; padding-bottom: 4px; padding-left: 10px; padding-right: 5px; padding-top: 4px; padding: 4px 5px 4px 10px;"><i>
Prosecutor</i>: Can you testify truthfully that your opinion is that the cartridge cases and/or bullets in this case <br /> • "Could or may have been fired by this gun?"<br /> • "Are consistent with having been fired by this gun?"<br /> • "Are more likely than not having been fired by this gun?"<br /> • "Cannot be excluded as having been fired by this gun?"<br />
<i>Examiner</i>: No, I cannot testify truthfully to any of those statements or just the class characteristics alone.<br />
<i>Prosecutor</i>: Why not?<br />
<i>Examiner</i>: For three reasons: First, there are no empirical studies or science to backup any of those statements or terminology. Second, those statements are not endorsed nor approved by my laboratory, any nationally recognized forensic science organization, law enforcement, or the Department of Justice. Third, those statements are false as they do not reflect my true opinion of identification. Such statements would mislead the jury about my opinion in this case. It would also constitute a substantive and material change to my opinion from one of Identification to Inconclusive. This would constitute perjury on my part for I would not be telling the jury the whole truth.
</div>
</blockquote>
<p>The "three reasons" border on the absurd (if they do not cross the border). First, the empirical studies that prosecutors cite to support the ability of firearms experts to match ammunition components to specific guns also support the bulleted statements. This is because the alternatives are lesser included statements, so to speak. If a categorical source attribution is correct, then a weaker included statement such as "cannot be excluded" also is true. If "empirical studies or science" do not adequately support these weaker statements, then, <i>a fortiori</i>, they do not support the much stronger claims that <i>Dealing</i> advocates.</p><p>Second, that law enforcement organizations and crime laboratories do not approve of the policy of replacing traditional "This is the gun!" testimony with a less telling alternative proves nothing about whether the bulleted statements are true or false. It merely means that a laboratory is unwilling to change its standard operating procedure and that "law enforcement" opposes losing the opinions that prosecutors love their experts to provide. No self-respecting expert can say that the desire of "law enforcement" and crime laboratories for the strongest possible testimony makes less compelling testimony "untruthful."<br /></p><p>Finally, that any lawyer -- let alone one representing the FBI -- would ask a forensic examiner to tell a judge that it would be perjurious to testify in the bulleted ways is shocking. A federal perjury prosecution would be laughed out of court. Under federal law, statements that are known to be incomplete, or, worse, fully intended to distract or mislead, do not constitute perjury if they are literally true. The leading case is <i>Bronston v. United States</i>. \4/ There, the defendant testified as follows:</p>
<blockquote>
Q. Do you have any bank accounts in Swiss banks, Mr. Bronston?<br />
A. No, sir.<br />
Q. Have you ever?<br />
A. The company had an account there for about six months, in Zurich.<br />
Q. Have you any nominees who have bank accounts in Swiss banks?<br />
A. No, sir.<br />
Q. Have you ever?<br />
A. No, sir.
</blockquote>
<p>In reality, the witness had previously maintained and had made deposits to and withdrawals from a personal bank account in Geneva, Switzerland. Clearly, his answers were calculated to avoid revealing this fact. However, the Supreme Court unanimously reversed a conviction for perjury, concluding that the federal statute did not criminalize lying by omission and misdirection.</p>
<p>To be sure, some state statutes define the crime to encompass wilful omissions, but the core idea remains that perjury occurs when the witness intends to give the questioner false information or a false impression so as to obstruct the ascertainment of the truth. \5/ An expert witness who testifies sincerely to true statements such as "the defendant's gun cannot be excluded as the one that fired the recovered bullet" or "measurements of the bullet and the pistol showed them both to be 9 mm, so the bullet could have been fired from the gun," is not intending to lead anyone to a false conclusion. That the FBI would like firearms examiners to give more incriminating opinions does not make the lesser included testimony false or misleading. A prosecutor who truly is worried that "[t]estimony about class characteristics alone may falsely imply an examiner was unable to reach a conclusion of identification" can ask the court to instruct the jurors that the rules of evidence no longer allow an expert witness to testify that a bullet came from a particular gun and that they may not draw any inference from the absence of such inadmissible testimony. Instead, they are to use only the testimony that the expert gave in coming to a conclusion about which gun fired the recovered bullet.</p>
<p>After maintaining that "laymen" (courts) are asking toolmark examiners to commit perjury, <i>Dealing</i> gives another specious argument to persuade toolmark experts to stick to their guns (sorry about that) and refuse to "agree to testify to the terms of 'Could or may have fired,' or 'Consistent with,' 'More likely than not,' or 'Cannot be excluded.'" FBI counsel believes that examiners who testify this way when they feel that a traditional source attribution is justified "are ratifying these bogus statements and adopting this as their testimony, giving the judge a pass on the difficult decision to admit or exclude their testimony. They are also acquiescing to the judge's faulty terminology."</p><p>This is nonsense. The law has a spectrum of options ranging from excluding every bit of information a firearms expert might provide (which is unjustified given what is known about the performance of these experts) to unfettered admission of "This is the gun!" testimony (which is traditional). The only "fault" in the intermediate testimony is that it is not as strong as a prosecutor might want it to be. It is conservative in the sense of understating probative value (as FBI counsel understands the science), but testifying conservatively at trial when that is what a court requires does not "ratify" anything about the court's ruling. It simply presents a permissible opinion. DNA experts who testified to "ceiling" probabilities of random matches because that was the best the prosecution could get some courts to accept circa 1995 were not perceived as "ratifying these bogus statements." \6/<br /></p>
<p><i>Dealing</i> disagrees. FBI counsel insists that "acquiescing" in court rulings is "fatal" to an examiner's career as a witness:</p>
<blockquote>
<div style="background-color: #ffe9ec; border-radius: 10px; border: 1px solid black; padding-bottom: 4px; padding-left: 10px; padding-right: 5px; padding-top: 4px; padding: 4px 5px 4px 10px;">
This is fatal. Why? Once you testify to these bogus terms, you are wedded to them for life. At subsequent trials, defense counsel will pull out the verbatim transcript of the examiner's previous testimony where they used these court-induced terms. On cross examination, they will confront the examiner with their previous testimony and contrast their opinion of "Identification" with those in previous cases, then claim the expert is merely making this stuff up. The examiner no longer has any credibility in the jury's eyes.
</div>
</blockquote>
<p>This fear of cross-examination is fanciful. If the expert testifies at the admissibility stage (as <i>Dealing</i> contemplates) that "This is the gun!" testimony is scientifically justified, then that is what the expert is on record as stating. Later, more circumscribed testimony pursuant to court order is not an inconsistent statement useful for impeachment. Any competent expert witness will have no trouble explaining that "in the earlier case, I reached the conclusion of 'identification' (just read my case notes), and I used other terminology only because the prosecutor asking the question (or the judge) said I had to use the lesser included language because of a legal rule rather than a scientific principle."</p><p>In contrast, the witness who follows FBI counsel's advice <i>will</i> lose all credibility. The truth is that the lesser included testimony, while less powerful, is no less truthful than "This is the gun!" testimony. It is somewhat like choosing a wider confidence interval to increase the coverage probability; the statement becomes less precise, but it is more likely to be true. Talk of perjury and being asked to lie suggests either that (1) the witness does not understand a statement such as "the recovered bullet could have come from/is consistent with coming from/is not excluded as coming from/is more likely to have come from the firearm in question" or that (2) the witness has chosen to lobby for the prosecution rather than to educate the judge impartially.</p>
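<p>The confidence-interval analogy can be illustrated with a short simulation. Everything in it -- the true mean, the standard deviation, the sample size -- is made up solely to show the trade-off between precision and coverage.</p>
<pre>
import numpy as np

rng = np.random.default_rng(0)
true_mean, sd, n, trials = 10.0, 2.0, 25, 10_000

# Normal-theory critical values for two confidence levels
critical_values = {"90% interval": 1.645, "99% interval": 2.576}

for label, z in critical_values.items():
    half_width = z * sd / np.sqrt(n)   # the 99% interval is wider (less precise)
    hits = 0
    for _ in range(trials):
        sample_mean = rng.normal(true_mean, sd, n).mean()
        hits += half_width >= abs(sample_mean - true_mean)   # does the interval cover the truth?
    print(label, "half-width =", round(half_width, 2),
          "coverage =", round(hits / trials, 3))

# The wider interval says less, but it covers the true value more often.
</pre>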
<p><b>NOTES</b></p>
<ol>
<li><span style="font-size: x-small;">Mr. Agar is a decorated, retired Colonel with "31 years of successful experience leading complex legal organizations as a general counsel, attorney, leader, mentor and trainer of FBI legal offices and senior-level Army staffs" and "hands-on experience in advising senior FBI and Army leaders in all legal matters." His work as Assistant General Counsel for the "FBI Forensic Laboratory" began in October 2016. On <a href="https://www.linkedin.com/in/jim-agar-87918481" target="_blank">Linkedin</a>, from which these quotations are taken, he summarizes his current position as
</span><blockquote><span style="font-size: x-small;">
Legal advisor to the largest and best forensic laboratory in the world with a staff of over 700 scientists and a budget of $110 million. Responsible for training and qualifying the FBI’s forensic examiners to testify in any and all courts nationwide and internationally, consisting of over 120 examiners in 37 different disciplines. Coordinate all discovery for the Laboratory. Provide ethics advice to Laboratory personnel.
</span></blockquote></li><span style="font-size: x-small;">
</span><li><span style="font-size: x-small;">Discussion of this line of cases can be found in David H. Kaye et al., Wigmore on Evidence: Expert Evidence (3d ed. 2021).</span></li><li><span style="font-size: x-small;">The track record of the courts in translating this literature and the growing research on firearms-toolmark comparisons into appropriate constraints on proposed expert testimony is not perfect. Indeed, most of the judicial palliatives for perceived expert overclaiming (such as the supposed limitation of "a reasonable degree of ballistic certainty" and the alternatives listed in <i>Dealing</i>) are far from optimal. Id. (and other postings in this blog). But these failures hardly mean that, as "laymen," judges are disqualified from trying to improve the presentation of expert knowledge by excluding certain forms of testimony.</span></li><span style="font-size: x-small;">
</span><li><span style="font-size: x-small;">Bronston v. United States, 409 U.S. 352 (1973).</span></li><span style="font-size: x-small;">
</span><li><span style="font-size: x-small;">See Ira P. Robbins, Perjury by Omission, 97 Wash. U. L. Rev. 265 (2019).</span></li><li><span style="font-size: x-small;">See, e.g., David H. Kaye, The Double Helix and the Law of Evidence (2010).</span><br /></li>
</ol><p></p>DH Kayehttp://www.blogger.com/profile/09329862957840849989noreply@blogger.com0tag:blogger.com,1999:blog-5354567765897135804.post-2625616553156832682021-09-03T13:46:00.003-04:002022-09-17T18:01:04.342-04:00Does Qualitative Measurement Uncertainty Exist?<p>I have heard it said that forensic-science standards for interpreting the results of chemical or other tests need not discuss uncertainty in measurements of qualitative properties. For instance, ASTM International appropriately requires standards for test methods to include a section reporting on precision and bias as manifested in interlaboratory tests. Yet, it applies this requirement exclusively to quantitative measurements. Its 2021 style manual is unequivocal:</p>
<blockquote>
<div style="background-color: #ffe9ec; border-radius: 10px; border: 1px solid black; padding-bottom: 4px; padding-left: 10px; padding-right: 5px; padding-top: 4px; padding: 4px 5px 4px 10px;">
When a test method specifies that a test result is a nonnumerical report of success or failure or other categorization or classification based on criteria specified in the procedure, use a statement on precision and bias such as the following: “Precision and Bias—No information is presented about either the precision or bias of Test Method X0000 for measuring (insert here the name of the property) since the test result is nonquantitative" (ASTM 2020, § A21.5.4, pp. A3-A14).
</div>
</blockquote>
<p>Qualitative measurements are observation-statements such as the ink is blue, the friction ridge skin pattern includes loops, the bloodstain displays a cessation pattern, the blood group is type A, the glass fragments fit together perfectly, or the material contains cocaine. Likewise, the statements could be comparative: the recording of an unknown bell ringing sounds like it has a higher pitch than the ringing of a known bell; the hairs are microscopically indistinguishable; or the striations on the recovered bullet and the test bullet line up when viewed in the comparison microscope.</p><p>“Precision” is defined as “the closeness of agreement between test results obtained under prescribed conditions” (ibid. § A21.2.1, at A12). “A statement on precision allows potential users of the test method to assess in general terms its usefulness in proposed applications” and is mandatory (ibid. § A21.2, at A12). So how can it be that statements of precision and bias are not allowed for qualitative as opposed to quantitative findings? In both situations, the system that generates the findings could be noisy or skewed in its outcomes.</p>
<p>The only answer I have heard is that measurements cannot be qualitative because the word "measurement" is reserved for determining the magnitude of <i>quantities</i> such as length or mass. The values of these quantitative variables are basically isomorphic to the nonnegative real numbers. Counts, such as the number of alpha particles emitted in a given interval of time by radium atoms, also qualify as measurements because there is a quantitative, additive structure to them. The values of the variable are basically isomorphic to the natural numbers. Properties that only have names are described by nominal variables. Although numbers can be assigned (1 for a match and 0 for a nonmatch, for example), these numbers are no more a measurement than a social security number is. In short, the argument is that because “measurements” do not include qualitative judgments, classifications, decisions, identifications, or whatever one might call them, no statement of measurement uncertainty or error is possible, let alone required.</p>
<p>This argument is incredibly weak. To begin with, the definition of “measurement” is highly contested. As one guide from NIST explains, a “much wider” conception of measurement than the one “contemplated in the current version of the International vocabulary of metrology (VIM)” has been developed in the metrology literature, and the measurand “may be ... qualitative (for example, the provenance of a glass fragment determined in a forensic investigation" (Possolo 2015). Broader conceptions of measurement have been the subject of many decades of writing in psychology and psychometrics (see, e.g., Humphry 2017; Michell 1990). Philosophers have been struggling to describe the scope and meaning of "measurement" at least since Aristotle (see, e.g., Tal 2015).</p>
<p>Second, even if one agrees with the definition in one NIST publication that “[m]easurement is [confined to] an experimental process that produces a value that can reasonably be attributed to a quantitative property of a phenomenon, body, or substance” (NIST 2019), some qualitative observations fit this definition. The color of a strip of litmus paper, for instance, can be understood as a value “that can reasonably be attributed to a quantitative property.” It is simply a crude measurement of pH.</p>
<p>Finally, the argument that there can be no measurement error for qualitative properties because those properties are not really “measured” is a semantic ploy that misses the point. The observations or estimates of nonquantitative properties as well as the individual measurements of quantitative properties are all subject to possible random and systematic error, and statements expressing the range of probable error for all measurements, observations, estimates, and classifications are essential. The need for these statements cannot be avoided for qualitative properties or judgments by the fiat of the VIM or some other dictionary. Even if “measurement” must be read in one particular, narrow, technical sense, “evaluation uncertainty” or “examination uncertainty” still must be reckoned with (Mari et al. 2020).</p>
<p>In sum, there is no excuse for ASTM and other organizations promulgating standards for forensic-science test methods to exempt any reported findings from required statements of uncertainty. Many statistics can be used to indicate how reliable (repeatable and reproducible) and valid (accurate) the test results may be (ibid.; Ellison & Gregory 1998; Pendrill & Petersson 2016). The qualitative-quantitative distinction affects the choice of the statistical method or expression but not the need to have one.</p>
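<p>To make the point concrete, here is a minimal sketch (in Python, with entirely hypothetical counts, and with the Wilson score interval chosen only for illustration) of the sort of uncertainty statement that could accompany a qualitative "drug present / drug absent" finding. Nothing in it comes from any ASTM method; it simply shows that false-positive and false-negative rates, with interval estimates, are ordinary statistics for nonquantitative results.</p>
<pre>
# Hypothetical illustration: uncertainty statistics for a qualitative (binary) test method.
# The validation-study counts below are invented for the example.
from math import sqrt

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (roughly 95% coverage by default)."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half_width = (z / denom) * sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return center - half_width, center + half_width

true_pos, false_neg = 188, 12   # calls on 200 specimens known to contain the drug
true_neg, false_pos = 195, 5    # calls on 200 specimens known not to contain it

sensitivity = true_pos / (true_pos + false_neg)
specificity = true_neg / (true_neg + false_pos)
sens_lo, sens_hi = wilson_interval(true_pos, true_pos + false_neg)
spec_lo, spec_hi = wilson_interval(true_neg, true_neg + false_pos)

print(f"Sensitivity: {sensitivity:.3f} (95% CI {sens_lo:.3f} to {sens_hi:.3f})")
print(f"Specificity: {specificity:.3f} (95% CI {spec_lo:.3f} to {spec_hi:.3f})")
</pre>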
<p><b>REFERENCES</b></p>
<ul>
<li>ASTM Int’l, Form and Style for ASTM Standards (2020), <a href="https://www.astm.org/FormStyle_for_ASTM_STDS.html">https://www.astm.org/FormStyle_for_ASTM_STDS.html</a>.</li>
<li>Stephen L. R. Ellison & Soumi Gregory, Perspective: Quantifying Uncertainty in Qualitative Analysis, 123 Analyst 1155–1161 (1998),
<a href="https://doi.org/10.1039/A707970B">https://doi.org/10.1039/A707970B</a></li>
<li>Stephen M. Humphry, Psychological Measurement: Theory, Paradoxes, and Prototypes, 27(3) Theory & Psychology 407–418 (2017)</li>
<li>L. Mari, C. Narduzzi, S. Trapmann, Foundations of Uncertainty in Evaluation of Nominal Properties, 152 Measurement 107397 (2020), DOI:10.1016/j.measurement.2019.107397</li>
<li>Joel Michell, An Introduction to the Logic of Psychological Measurement (1990)</li>
<li>NIST, Statistical Engineering Division, Measurement Uncertainty, updated Nov. 15, 2019,
<a href="https://www.nist.gov/itl/sed/topic-areas/measurement-uncertainty">https://www.nist.gov/itl/sed/topic-areas/measurement-uncertainty</a></li>
<li>Leslie Pendrill & Niclas Petersson, Metrology of Human-Based and Other Qualitative Measurements, 27(9) Measurement Sci. & Technol. 094003 (2016)</li>
<li>A. Possolo, Simple Guide for Evaluating and Expressing the Uncertainty of NIST<br />Measurement Results (NIST Technical Note 1900), 2015, doi: 10.6028/NIST.TN.1900</li>
<li>Eran Tal, Measurement in Science, in Stanford Encyclopedia of Philosophy (Edward N. Zalta ed. 2015),
<a href="https://plato.stanford.edu/archives/fall2017/entries/measurement-science/">https://plato.stanford.edu/archives/fall2017/entries/measurement-science/</a></li>
</ul>
<p><b>APPENDIX</b>: ADDITIONAL PUBLICATIONS ON "QUALITATIVE MEASUREMENT"</p>
<ol>
<li>Mary J. Allen & Wendy M. Yen, Introduction to Measurement Theory 2 (1979) ("In measurement, numbers are assigned systematically and can be of various forms. For example, labeling people with red hair "1" and people with brown hair "2" is a measurement. Since numbers are assigned to individuals in a systematic way and differences between scores represent differences in the property being measured (hair color).")</li>
<li>Peter-Th. Wilrich, The determination of precision of qualitative measurement methods by interlaboratory experiments, Accreditation and quality assurance, 15: 439-444 (2010)</li>
<li>Boris L. Milman, Identification of chemical compounds, Trends in Analytical Chemistry, 24:6, 2005 ("identification itself is considered as measurement on a qualitative scale")</li>
<li>NIST Expert Working Group on Human Factors in Latent Print Analysis, Latent Print Examination and Human Factors: Improving the Practice Through a Systems Approach, Gaithersburg: National Institute of Standards and Technology, David H. Kaye ed., 2012 (defining "measurement" broadly, to encompass categorical variables, including the examiner's judgment about the source of a print).</li>
<li>Lim, Yong Kwan, Kweon, Oh Joo, Lee, Mi-Kyung and Kim, Hye Ryoun. Assessing the measurement uncertainty of qualitative analysis in the clinical laboratory. Journal of Laboratory Medicine, vol. 44, no. 1, 2020, pp. 3-10. https://doi.org/10.1515/labmed-2019-0155 ("Measurement uncertainty is a parameter that is associated with the dispersion of measurements. Assessment of the measurement uncertainty is recommended in qualitative analyses in clinical laboratories; however, the measurement uncertainty of qualitative tests has been neglected despite the introduction of many adequate methods.")</li>
<li>Donald Richards, Simultaneous Quantitative and Qualitative Measurements in Drug-Metabolism Investigations, Pharmaceutical Technology 2013</li>
<li>Kadri Orro, Olga Smirnova, Jelena Arshavskaja, Kristiina Salk, Anne Meikas, Susan Pihelgas, Reet Rumvolt, Külli Kingo, Aram Kazarjan, Toomas Neuman & Pieter Spee, Development of TAP, a non-invasive test for qualitative and quantitative measurements of biomarkers from the skin surface, Biomarker Research 2: 20 (2014)</li>
<li>J M Conly & K Stein, Quantitative and qualitative measurements of K vitamins in human intestinal contents, Am J Gastroenterol. 1992 Mar;87(3):311-316</li>
<li>Wenjia Meng, Qian Zheng, Gang Pan, Qualitative Measurements of Policy Discrepancy for Return-Based Deep Q-Network, IEEE Transactions on Neural Networks and Learning Systems 2020</li>
<li>Rudolf M. Verdaasdonk, Jovanie Razafindrakoto, Philip Green, Real time large scale air flow imaging for qualitative measurements in view of infection control in the OR (Conference Presentation) Proceedings Volume 10870, Design and Quality for Biomedical Technologies XII; 1087002 (2019) <a href="https://doi.org/10.1117/12.2511185">https://doi.org/10.1117/12.2511185</a></li>
<li>Rashis, Bernard, Witte, William G. & Hopko, Russell N., Qualitative Measurements of the Effective Heats of Ablation of Several Materials in Supersonic Air Jets at Stagnation Temperatures Up to 11,000 Degrees F, National Advisory Committee for Aeronautics, July 7, 1958</li>
<li>Lawrence F Cunningham and Clifford E Young, Quantitative and Qualitative Approaches, Journal of Public Transportation 1(4) (1997) ("The study also contrasts the results of quantitative and qualitative measurements and methodologies for assessing transportation service quality")</li>
<li>JM Conly, K Stein, Quantitative and qualitative measurements of K vitamins in human intestinal contents, American Journal of Gastroenterology, 1992</li>
<li>P Sinha, Workshop on Biologically Motivated Computer Vision, 2002 - Springer ("Our emphasis on the use of qualitative measurements renders the representations stable in the presence of sensor noise and significant changes in object appearance. We develop our ideas in the context of the task of face-detection under varying illumination")</li>
<li>D Michalski, S Liebig, E Thomae & A Hinz, Pain in Patients with Multiple Sclerosis: a Complex Assessment Including Quantitative and Qualitative Measurements, 40 J. Pain 219–225 (2011), <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3160835/">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3160835/</a></li>
<li>Cécilia Merlen, Marie Verriele, Sabine Crunaire, Vincent Ricard, Pascal Kaluzny, Nadine Locoge, Quantitative or Only Qualitative Measurements of Sulfur Compounds in Ambient Air at Ppb Level? Uncertainties Assessment for Active Sampling with Tenax TA®, 132 Microchemical J. 143-153 (2017)</li>
<li>Tomomichi Suzuki, Jun Ichi Takeshita, Mayu Ogawa, Xiao-Nan Lu, Yoshikazu Ojima, Analysis of Measurement Precision Experiment with Categorical Variables, 13th International Workshop on Intelligent Statistical Quality Control 2019, Hong Kong ("Evaluating performance of a measurement method is essential in metrology. Concepts of repeatability and reproducibility are introduced in ISO5725-1 (1994) including how to run and analyse experiments (usually collaborative studies) to obtain these precision measures. ISO5725-2 (1994) describe precision evaluation in quantitative measurements but not in qualitative measurements. Some methods have been proposed for qualitative measurements cases such as Wilrich (2010), de Mast & van Wieringen (2010), Bashkansky, Gadrich & Kuselman (2012). Item response theory (Muraki, 1992) is another methodology that can be used to analyse qualitative data.").</li>
</ol>DH Kayehttp://www.blogger.com/profile/09329862957840849989noreply@blogger.com0tag:blogger.com,1999:blog-5354567765897135804.post-19378742220104719682021-06-14T10:23:00.002-04:002021-07-04T19:35:53.639-04:00Tibbs, Shipp, and Harris on "Meaningful" Peer Review of Studies on Firearms-toolmark Matching<p>
The Supreme Court's celebrated (but ambiguous) opinion in <a href="https://scholar.google.com/scholar_case?case=827109112258472814&" target="_blank"><i>Daubert v. Merrell Dow Pharmaceuticals</i></a>, \1/ was a direct response to a seemingly simple rule--results that are not published in the peer-reviewed scientific literature are inadmissible to prove that a scientific theory or method is generally accepted in the scientific community. The Court unanimously rejected this strict rule--and more broadly, the very requirement of general acceptance--in favor of a multifaceted examination guided by four or five criteria that have come to be known as "the <i>Daubert</i> factors."<br /></p><p>But "peer review and publication" lives on--not as a formal requirement, but as one of these factors. Thus, courts routinely ask whether the peer-reviewed scientific literature supports the reasoning or data that an expert is prepared to present at trial. All too often, however, the examination of the literature is cursory or superficial. The temptation, especially for overburdened judges not skilled in sorting through biomedical and other journals, is to check that there are articles on point, and if the theory has been discussed (critically or otherwise) in the literature, to write that the "peer review and publication" factor supports admission of the testimony.
</p>
<p>
One area in which this dynamic is apparent is traditional testimony of firearms examiners matching marks from guns to bullets or shell casings. \2/ Defendants have strenuously objected that traditional associations of particular guns to ammunition components is an inscrutable judgment call that does not pass muster under <i>Daubert</i>. Perhaps the most meticulous analysis of this issue comes from an <a href="https://context-cdn.washingtonpost.com/notes/prod/default/documents/cc85da89-f6a1-4172-bf1c-b6b759669687/note/2faab6e6-85da-4abe-a669-b9f48db2498e.pdf" target="_blank">unpublished opinion</a> of Judge <a href="https://en.wikipedia.org/wiki/Todd_E._Edelman" target="_blank">Todd Edelman</a> in <i>United States v. Tibbs</i>. \3/ Judge Edelman's discussion of peer review and publication is unusually thorough and may have been penned as an antidote to the strategy in which the government gives the court a laundry list of articles
that have discussed the procedure and the court checks off the "peer
review and publication" box.</p>
<p>
Being an opinion for a trial court (the District of Columbia Superior Court), <i>Tibbs</i> is not binding precedent for that court or any other, but it has not gone unnoticed. Two federal district courts recently reached mutually opposing conclusions about Judge Edelman's analysis of one large segment of the literature cited in support of admitting match determinations--namely, the extensive research reported in the <i>AFTE Journal</i>. ("AFTE" stands for the Association of Firearms and Toolmark Examiners. The organization was formed in 1969 in "recognition of the need for the interchange of information, methods, development of standards, and the furtherance of research, [by] a group of skilled and ethical firearm and/or toolmark examiners" who "stand prepared to give voice to this otherwise mute evidence." \4/)
</p>
<p align="center">
<i>Tibbs</i>' Analysis of the <i>AFTE Journal</i>
</p>
<p>
Because of the <i>AFTE Journal</i>'s orientation and editorial process, <i>Tibbs</i> did not give "the sheer number of studies conducted and published" there much weight. \5/ Judge Edelman made essentially four points about the journal:</p>
<ul>
<li>Contrary to the testimony of the government’s experts, post-publication comments or later articles are not normally considered to be “peer review”;
</li><li>
The AFTE pre-publication peer review process is “open,” meaning that “both the author and reviewer know the other's identity and may contact each other during the review process”;
</li>
<li>The reviewers who form the editorial board are all “members of AFTE” who may well “be trained and experienced in the field of firearms and toolmark examination, but do not necessarily have any ... training in research design and methodology” and who “have a vested, career-based interest in publishing studies that validate their own field and methodologies”; and
</li>
<li>
“AFTE does not make this publication generally available to the public or to ... reviewers and commentators outside of the organization's membership [and] unlike other scientific journals, the AFTE Journal ... cannot even be obtained in university libraries.” \6/
</li>
</ul>
<p>
The court contrasted these aspects of the journal’s peer review to a "double-blind" process and observed that the AFTE “open” process was “highly unusual for the publication of empirical scientific research.” \7/ The full opinion, which develops these ideas more completely, can be <a href="https://context-cdn.washingtonpost.com/notes/prod/default/documents/cc85da89-f6a1-4172-bf1c-b6b759669687/note/2faab6e6-85da-4abe-a669-b9f48db2498e.pdf" target="_blank">found online</a>.</p>
<p align="center">
<i>Shipp</i>
</p>
<p>
Senior Judge <a href="https://en.wikipedia.org/wiki/Nicholas_Garaufis" target="_blank">Nicholas Garaufis</a> of the Eastern District of New York was impressed with "this thorough opinion." \8/ His opinion in <i>United States v. Shipp</i> referred to the "several pages analyzing the AFTE Journal's peer review process [that] highlight[] several reasons for assigning less weight to articles published in the AFTE Journal than in other publications" and added that
</p>
<blockquote>
The court shares these concerns about the AFTE Journal's peer review process. In particular, the court is concerned that the reviewers, who are all members of the AFTE, have a vested, career-based interest in publishing studies that validate their own field and methodologies. Also concerning is the possibility that the reviewers may be trained and experienced in the field of firearms and toolmark identification, but [may] not necessarily have any specialized or even relevant training in research design and methodology. \9/
</blockquote>
<p style="text-align: center;"><i>Harris</i></p><p>In contrast, Judge <a href="https://en.wikipedia.org/wiki/Rudolph_Contreras" target="_blank">Rudolph Contreras</a> of the U.S. District Court for the District of Columbia, writing in <i>United States v. Harris</i>, \10/ had nothing complimentary to say about <i>Tibbs</i>. This court defended the <i>AFTE Journal</i> research articles said to demonstrate the validity of firearms-toolmark identification with two rejoinders to <i>Tibbs</i>. First, Judge Contreras maintained that “there is far from consensus in the scientific community that double-blind peer review is the only meaningful kind of peer review.” \11/ This is true enough, but the issue raised by the criticism of “open” review is not whether double-blind review is better than single-blind review (in which the author does not know the identity of the referees) or some other system. It is whether “open” review conducted exclusively by AFTE members is the kind of peer review envisioned as a strong indicator of scientific soundness in <i>Daubert</i>. The factors enumerated in <i>Tibbs</i> make that a serious question.<br /></p>
<p>
Second, Judge Contreras observed that the <i>Journal of Forensic Sciences</i>, which uses double-blind review, republished one AFTE study. This solitary event, the <i>Harris</i> opinion suggests, is a “compelling” rebuttal of “the allegation by Judge Edelman in <i>Tibbs</i> that the <i>AFTE Journal</i> does not provide 'meaningful' review." \12/ But Judge Edelman never proposed that every article in the AFTE journal was without scientific merit. Rather, his point was far less extreme. It was merely that courts should not “accept at face value the assertions regarding the adequacy of the journal's peer review process.” \13/ That one article—or even dozens—published in the <i>AFTE Journal</i> could have been published in other journals reveals very little about the level and quality of AFTE review. After all, even a completely fraudulent review process that accepted articles for publication by flipping a coin would result in the publication of <i>some</i> excellent articles—but not because the review process was meaningful or trustworthy. In addition, one might ask whether the very fact that an article had to be republished in a more widely read journal fortifies the fourth point in <i>Tibbs</i>, that the journal’s circulation is too restricted to make its publications part of the mainstream scientific literature. The discussion of peer review and publication in <i>Harris</i> ignores this concern.
</p>
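<p>A toy simulation (in Python, with invented numbers) makes the coin-flip point vivid: a review process that conveys no information about quality still publishes excellent articles in rough proportion to their share of the submission pool, so the later republication of one such article in a more selective journal says little about the quality of the original review.</p>
<pre>
# Toy simulation with invented numbers: acceptance by coin flip still "publishes"
# excellent manuscripts at roughly their base rate in the submission pool.
import random

random.seed(1)
N_SUBMISSIONS = 1000
P_EXCELLENT = 0.2   # assumed share of excellent manuscripts among submissions

# True means the manuscript is excellent; acceptance ignores quality entirely.
submissions = [P_EXCELLENT > random.random() for _ in range(N_SUBMISSIONS)]
accepted = [is_excellent for is_excellent in submissions if random.random() > 0.5]

print(f"Accepted {len(accepted)} of {N_SUBMISSIONS} submissions by coin flip.")
print(f"Excellent articles among those accepted: {sum(accepted)} "
      f"({sum(accepted) / len(accepted):.0%} -- about the base rate).")
</pre>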
<p align="center">Beyond the <i>AFTE Journal</i><br /></p>
<p>
The significant concerns exposed in <i>Tibbs</i> do not prove that the peer-reviewed scientific literature, taken as a whole, undermines firearms identification as commonly practiced. They simply mean that the list of publications over the years in the <i>AFTE Journal</i> may not be entitled to great weight in evaluating whether the scientific literature supports the claim of firearms and toolmark examiners to be able to supply generally accurate and reliable "opinions relative to evidence which otherwise stands mute before the bar of justice." \14/
</p>
<p>
Fortunately, newer peer-reviewed studies exist, and not <i>all</i> the older research appears in the <i>AFTE Journal</i>. \15/ Thus, the <i>Harris</i> court asserted that
</p><blockquote>
[E]ven if the Court were to discount the numerous peer-reviewed studies published in the AFTE Journal, Mr. Weller's affidavit also cites to forty-seven other scientific studies in the field of firearm and toolmark identification that have been published in eleven other peer-reviewed scientific journals. This alone would fulfill the required publication and peer review requirement. \16/
</blockquote>
<p>The last sentence could be misunderstood. As a statement that the 47 studies could be the basis of a scientifically informed judgment about the validity of firearms-toolmark matching, the conclusion is correct. As a statement that checking the "peer review and publication" box on the basis of a large number of studies published in the right places "alone" is a reason to admit the challenged testimony, it would be more problematic. The "required ... requirement" (to the extent <i>Daubert</i> imposes one) is for a substantial body of peer-reviewed papers that form a solid foundation for a scientific assessment of a method. Unless this research literature is actually supportive of the method, however, satisfying "the required publication and peer review requirement" is not a reason to admit the evidence.</p><p>Do the 47 studies (old and new) in widely accessible, quality journals all show that examiners' opinions derived from comparing toolmarks are consistently correct and stable for the kinds of comparisons made in practice? If so, then it is high time to stop the arguments over scientific validity. If not, if the 47 studies are of varying quality, scope, and relevance to ascertaining how repeatable, reproducible, and accurate the opinions rendered by firearms-toolmark examiners are, then there is room for further analysis of whether and how these experts can provide valuable information for the legal factfinders.<br /></p>
<p>
<b>NOTES</b>
</p>
<ol>
<li>
509 U.S. 579 (1993).
</li>
<li>
"[T]he process that most firearms examiners use when analyzing evidence" is described in graphic detail in "[t]he Firearms Process Map, which captures the ‘as-is’ state of firearms examination, provides details about the procedures, methods and decision points most frequently encountered in firearms examination." NIST, OSAC's Firearms & Toolmarks Subcommittee Develops Firearms Process Map, Jan. 19, 2021, <a href="https://www.nist.gov/news-events/news/2021/01/osacs-firearms-toolmarks-subcommittee-develops-firearms-process-map">https://www.nist.gov/news-events/news/2021/01/osacs-firearms-toolmarks-subcommittee-develops-firearms-process-map</a>.
</li>
<li>
"[T]he process that most firearms examiners use when analyzing evidence" is desctibed in graphic detail in "[t]he Firearms Process Map, which captures the ‘as-is’ state of firearms examination, provides details about the procedures, methods and decision points most frequently encountered in firearms examination." NIST, OSAC's Firearms & Toolmarks Subcommittee Develops Firearms Process Map
Jan. 19, 2021, <a href="https://www.nist.gov/news-events/news/2021/01/osacs-firearms-toolmarks-subcommittee-develops-firearms-process-map">https://www.nist.gov/news-events/news/2021/01/osacs-firearms-toolmarks-subcommittee-develops-firearms-process-map</a>.
</li>
<li>
AFTE Bylaws, Preamble, https://afte.org/about-us/bylaws.
</li>
<li>2019 D.C. Super. LEXIS 9, at *35. For a decade or so, both legal academics and forensic scientists had pointed to the <i>AFTE Journal</i> as an example of a practitioner-oriented outlet for publications that did not follow the peer review and publication practices of other scientific journals. See, e.g., David H. Kaye, <a href="https://papers.ssrn.com/abstract_id=3117674" target="_blank">Firearm-Mark Evidence: Looking Back and Looking Ahead</a>, 68 Case W. Res. L. Rev. 723 (2018); Jennifer L. Mnookin et al., <a href="http://ssrn.com/abstract=1755722" target="_blank">The Need for a Research Culture in the Forensic Sciences</a>, 58 UCLA L. Rev. 725 (2011).
</li>
<li>
2019 D.C. Super. LEXIS 9, at *32-*33.
</li>
<li>
Id. at *33.
</li>
<li>
United States v. Shipp, 422 F.Supp.3d 762, 776 (E.D.N.Y. 2019).
</li>
<li>
Id. (citations and internal quotation marks omitted). Nevertheless, the court found "sufficient peer review." It wrote that "even assigning limited weight to the substantial fraction of the literature that is published in the AFTE Journal, this factor still weighs in favor of admissibility. <i>Daubert</i> found the existence of peer-reviewed literature important because “submission to the scrutiny of the scientific community ... increases the likelihood that substantive flaws in the methodology will be detected.” <i>Daubert</i>, 509 U.S. at 593. Despite AFTE Journal’s open peer-review process, the AFTE Theory has still been subjected to significant scrutiny. ... Therefore, the court finds that the AFTE Theory has been sufficiently subjected to 'peer review and publication' [outside of the AFTE Journal]. <i>Daubert</i>, 509 U.S. at 594."
</li>
<li>
502 F.Supp.3d 28 (D.D.C. 2020).
</li>
<li>
Id. at 40.
</li>
<li>
Id.
</li>
<li>
<i>Tibbs</i>, 2019 D.C. Super. LEXIS 9, at *29.
</li>
<li>
AFTE Bylaws, Preamble, https://afte.org/about-us/bylaws.
</li>
<li>
AFTE has sought to remedy at least one complained-of feature of its peer review process. In 2020, it instituted the double-blind peer review that the <i>Harris</i> court found unnecessary. AFTE Peer Review Process – January 2020, <a href="https://afte.org/afte-journal/afte-journal-peer-review-process">https://afte.org/afte-journal/afte-journal-peer-review-process</a>. Whether the qualifications and backgrounds of the journal's referees have been changed is not apparent from the AFTE website.
</li>
<li>
<i>Harris</i>, 502 F.Supp.3d at 40.
</li>
</ol>DH Kayehttp://www.blogger.com/profile/09329862957840849989noreply@blogger.com0tag:blogger.com,1999:blog-5354567765897135804.post-54813139464489388022021-04-19T22:17:00.005-04:002021-09-27T14:31:03.561-04:00What is Accuracy?<p>The Organization of Scientific Area Committees for Forensic Science (<a href="https://www.nist.gov/osac" target="_blank">OSAC</a>) has an <a href="https://lexicon.forensicosac.org/" target="_blank">online "lexicon"</a> that collects definitions of terms as they appear in published standards. <u>1</u>/ These may or may not be the same as definitions in textbooks or other authoritative sources. <u>2</u>/ They may or may not be accurate. (Yet, the drafters of OSAC standards sometimes point to the existence of a definition in the compendium as if it were a conclusive reason to perpetuate it. <u>3</u>/)
</p>
<p>Speaking of "accurate," the word "accuracy" has five overlapping definitions in OSAC's lexicon:
</p>
<ul style="text-align: left;">
<li>Closeness of agreement between a measured quantitiy [sic] value and a true quantity vlaue [sic] of a measurement.
</li>
<li>The degree of agreement between a test result or measurement and the accepted reference value.
</li>
<li>Closeness of agreement between a test result or measurement result and the true value. 1) In practice, the accepted reference value is
substituted for the true value. 2) The term “accuracy,” when applied to a set of test or measurement results, involves a combination of random
components and a common systematic error or bias component. 3) Accuracy refers to a combination of trueness and precision. [ISO 3534-2:2006].
</li>
<li>The closeness of agreement between a test result and the accepted reference value. 1) In practice, the accepted reference value is
substituted for the true value. 2) The term "accuracy," when applied to a set of test or measurement results, involves a combination of random
components and a common systematic error or bias component. 3) Accuracy refers to a combination of trueness and precision.
</li>
<li>Degree of conformity of a measure to a standard or true value.</li>
</ul>
<p>Some of the definitions in the "lexicon" are designated "preferred terms." <u>4</u>/ None of the five definitions of "accuracy," however, is marked as preferred.
</p>
<p>The main difficulty with the forensic scientists' set of definitions is that "accuracy" can refer to single measurements or estimates or to a process for making measurements or estimates. The longer definitions are confusing because they do not make it plain that "a combination of trueness and precision" applies to the <i>accuracy of the process</i> (or a large set of measurements from the process) and not so much to the <i>accuracy of particular measurements</i>.
</p>
<p>"Precision" refers to the dispersion of repeated measurements under the same conditions. A precise estimate comes from a process that generates measurements that are typically tightly clustered around some value -- without regard to whether that value is the true one. A set of precise measurements -- ones that come from a process that tends to generate similar measurements when repeated -- may be far from the true value. Such measurements.(and the system that generates them) is statistically biased; these measurements have a systematic error component.
</p>
<p>Conversely, an imprecise estimate -- one coming from a system that tends to produce widely divergent measurements -- may be essentially identical to the true value. Most other estimates from the same system would tend to stray farther from the true value, but to say that an estimate that is spot on is not accurate sounds odd. The estimate may be <i>unreliable</i> (in the statistical sense of coming from a process that is highly variable), but it is practically 100% accurate (in this case). Even a generally inaccurate system may produce some accurate results.</p><p>The epistemological problem is that we should not rely on an unreliable system to ascertain the true value. For extremely imprecise point estimates, accuracy (in the sense of the absence of error and correspondence to the truth) becomes a matter of luck. It is unwise to act as if a particular measurement (or a small number of them) from an unreliable system adds much to our knowledge.</p><p>But the fact that the individual estimates provide little information is not well expressed by describing a result that is (luckily) correct as lacking accuracy. The investment analyst who said that bitcoin would increase in value by 50% the next day was accurate if bitcoin's price did spike by approximately 50%. Nevertheless, this accurate prediction probably was unwarranted. Unless the analyst had a remarkable history of consistently predicting the ups and downs of bitcoin and an articulable and plausible basis for making the predictions, giving much credence to the prediction before the fact would have been unjustified.<br /></p>
<p>Let's apply these elementary ideas to some forensic measurements. Suppose that analysts in a laboratory use an appropriate instrument to measure the refractive index of glass fragments. Most analysts are extremely proficient. Their measurements are both reliable (repeatability is high) and generally close to the true values. A smaller number of analysts are less proficient. Indeed, they are downright sloppy. They are not biased -- they err in both directions -- but the values they come up with are highly variable. An analyst from the proficient group obtains the value <i>x</i> for a particular fragment, and so does an analyst in the sloppy group.
</p>
<p>Should we say that <i>x</i> is an accurate value when it comes from one of the former analysts and inaccurate when it comes from one of the latter? Some of the definitions from the standards suggest (or could be read as giving) one answer, whereas others suggest the opposite. It is far more straightforward to say that <i>x</i> is accurate (if it is close to the truth) in both cases.
</p>
<p>To be sure, precision is a component of accuracy in the long run -- the imprecise analysts will tend to have lower accuracy (and higher error) rates. Their reports do not provide a sound basis for action. They are neither trustworthy nor statistically reliable. But it invites confusion to characterize every such report -- even ones that provide perfectly or approximately true values -- as inaccurate. When speaking of particular measurements, we simply need to distinguish between those that are wrong because they are far from the truth -- inaccurate -- and those that are accurate -- close to the truth either by good fortune or because of true knowledge. Systems that use luck to get the right answers are systematically inaccurate; properly functioning systems grounded on true knowledge are systematically accurate.
</p>
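<p>A small simulation (in Python; the numbers are invented, and "proficient" and "sloppy" simply mean small and large measurement variance, with no bias in either group) illustrates the distinction. The process-level statistics separate the two groups cleanly, yet a single measurement from the sloppy group can still land next to the true value -- accurate in the particular case, however untrustworthy the process.</p>
<pre>
# Invented numbers only: unbiased "proficient" and "sloppy" analysts measuring
# the refractive index of the same glass fragment.
import random

random.seed(42)
TRUE_RI = 1.5180   # hypothetical true refractive index

def measure(sd):
    """One unbiased measurement with standard deviation sd."""
    return random.gauss(TRUE_RI, sd)

proficient = [measure(0.0001) for _ in range(1000)]   # tight spread
sloppy = [measure(0.0020) for _ in range(1000)]       # wide spread

def mean_and_rmse(values):
    mean = sum(values) / len(values)
    rmse = (sum((v - TRUE_RI) ** 2 for v in values) / len(values)) ** 0.5
    return mean, rmse

for label, values in [("proficient", proficient), ("sloppy", sloppy)]:
    mean, rmse = mean_and_rmse(values)
    print(f"{label:>10}: mean = {mean:.5f}, RMSE = {rmse:.5f}")

# Any one sloppy measurement may nonetheless sit right on the true value.
closest = min(sloppy, key=lambda v: abs(v - TRUE_RI))
print(f"Closest single sloppy measurement: {closest:.5f} "
      f"(off by {abs(closest - TRUE_RI):.6f})")
</pre>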
<p>NOTES</p>
<ol style="text-align: left;">
<li>"The OSAC Forensic Lexicon should be the primary resource for terminology and used when drafting and editing forensic science standards and other OSAC work products. It is continually updated with the latest work from OSAC units, as well as terms from newly published documentary standards and standards elevated to the OSAC Registry." OSAC Registry, https://lexicon.forensicosac.org/ (undated).
</li>
<li>Cf. id. ("The terms and definitions in the OSAC Lexicon come from the published literature, including documentary standards, specialized dictionaries, Scientific Working Group (SWG) documents, books, journal articles, and technical reports. When a suitable definition can’t be located in any of these sources, an OSAC unit generates new or modifies existing definitions. Gradually terms are evaluated and harmonized by the OSAC to a single term. This process results in an OSAC Preferred Term."). </li><li>E.g., Comment Adjudication, OSAC 2021-N-0001, Wildlife Forensics Method-Collection of Known DNA Samples from Domestic Mammals, Feb. 11, 2021, at cells L25 & L27 (OSAC Proposed Standard added to the Registry Apr. 6, 2021) (link to Excel spreadsheet at https://www.nist.gov/osac/public-documents).
</li>
<li>Id. They should be called "preferred definitions" for terms, and terms that are not supposed to be used in standards anymore should be called
"deprected terms," but I digress.
</li>
</ol>
Last modified: 9/27/21 14:30 ETDH Kayehttp://www.blogger.com/profile/09329862957840849989noreply@blogger.com0