Friday, November 27, 2020

Mysteries of the Department of Justice's ULTR for Firearm-toolmark Pattern Examinations

The Department of Justice's "Uniform Language for Testimony and Reports (ULTR) for the Forensic Firearms/Toolmarks Discipline – Pattern Examination" offers a ready response to motions to limit overclaiming or, to use the pedantic term, ultracrepidarianism, in expert testimony. Citing the DoJ policy, several federal district courts have indicated that they expect the government's expert witnesses to follow this directive (or something like it). \1/

Parts of the current version (with changes from the original) are reproduced in Box 1. \2/ This posting poses three questions about this guidance. Although the ULTR is a step in the right direction, it has a long way to go in articulating a clear and optimal policy.

Box 1. The ULTR

DEPARTMENT OF JUSTICE
UNIFORM LANGUAGE FOR TESTIMONY AND REPORTS
FOR THE FORENSIC FIREARMS/TOOLMARKS DISCIPLINE –
PATTERN MATCH EXAMINATION
...
III. Conclusions Regarding Forensic Pattern Examination of Firearms/Toolmarks Evidence for a Pattern Match

The An examiner may offer provide any of the following conclusions:
1. Source identification (i.e., identified)
2. Source exclusion (i.e., excluded)
3. Inconclusive
Source identification
‘Source identification’ is an examiner’s conclusion that two toolmarks originated from the same source. This conclusion is an examiner’s decision opinion that all observed class characteristics are in agreement and the quality and quantity of corresponding individual characteristics is such that the examiner would not expect to find that same combination of individual characteristics repeated in another source and has found insufficient disagreement of individual characteristics to conclude they originated from different sources.

The basis for a ‘source identification’ conclusion is an examiner’s decision opinion that the observed class characteristics and corresponding individual characteristics provide extremely strong support for the proposition that the two toolmarks came originated from the same source and extremely weak support for the proposition that the two toolmarks came originated from different sources.

A ‘source identification’ is the statement of an examiner’s opinion (an inductive inference2) that the probability that the two toolmarks were made by different sources is so small that it is negligible. A ‘source identification’ is not based upon a statistically-derived or verified measurement or an actual comparison to all firearms or toolmarks in the world.

Source exclusion
‘Source exclusion’ is an examiner’s conclusion that two toolmarks did not originate from the same source.

The basis for a ‘source exclusion’ conclusion is an examiner’s decision opinion that the observed class and/or individual characteristics provide extremely strong support for the proposition that the two toolmarks came from different sources and extremely weak or no support for the proposition that the two toolmarks came from the same source two toolmarks can be differentiated by their class characteristics and/or individual characteristics.

Inconclusive
‘Inconclusive’ is an examiner’s conclusion that all observed class characteristics are in agreement but there is insufficient quality and/or quantity of corresponding individual characteristics such that the examiner is unable to identify or exclude the two toolmarks as having originated from the same source.

The basis for an ‘inconclusive’ conclusion is an examiner’s decision opinion that there is an insufficient quality and/or quantity of individual characteristics to identify or exclude. Reasons for an ‘inconclusive’ conclusion include the presence of microscopic similarity that is insufficient to form the conclusion of ‘source identification;’ a lack of any observed microscopic similarity; or microscopic dissimilarity that is insufficient to form the conclusion of ‘source exclusion.’

IV. Qualifications and Limitations of Forensic Firearms/Toolmarks Discipline Examinations
A conclusion provided during testimony or in a report is ultimately an examiner’s decision and is not based on a statistically-derived or verified measurement or comparison to all other firearms or toolmarks. Therefore, an An examiner shall not assert that two toolmarks originated from the same source to the exclusion of all other sources. This may wrongly imply that a ‘source identification’ conclusion is based upon a statistically-derived or verified measurement or an actual comparison to all other toolmarks in the world, rather than an examiner’s expert opinion.
○ assert that a ‘source identification’ or a ‘source exclusion’ conclusion is based on the ‘uniqueness’3 of an item of evidence.

○ use the terms ‘individualize’ or ‘individualization’ when describing a source conclusion.

○ assert that two toolmarks originated from the same source to the exclusion of all other sources.
• An examiner shall not assert that examinations conducted in the forensic firearms/toolmarks discipline are infallible or have a zero error rate.

• An examiner shall not provide a conclusion that includes a statistic or numerical degree of probability except when based on relevant and appropriate data.

• An examiner shall not cite the number of examinations conducted in the forensic firearms/toolmarks discipline performed in his or her career as a direct measure for the accuracy of a proffered conclusion provided. An examiner may cite the number of examinations conducted in the forensic firearms/toolmarks discipline performed in his or her career for the purpose of establishing, defending, or describing his or her qualifications or experience.

• An examiner shall not assert that two toolmarks originated from the same source with absolute or 100% certainty, or use the expressions ‘reasonable degree of scientific certainty,’ ‘reasonable scientific certainty,’ or similar assertions of reasonable certainty in either reports or testimony unless required to do so by a judge or applicable law.34


2 Inductive reasoning (inferential reasoning):
A mode or process of thinking that is part of the scientific method and complements deductive reasoning and logic. Inductive reasoning starts with a large body of evidence or data obtained by experiment or observation and extrapolates it to new situations. By the process of induction or inference, predictions about new situations are inferred or induced from the existing body of knowledge. In other words, an inference is a generalization, but one that is made in a logical and scientifically defensible manner. Oxford Dictionary of Forensic Science 130 (Oxford Univ. Press 2012).
3 As used in this document, the term ‘uniqueness’ means ‘having the quality of being the only one of its kind.’ Oxford English Dictionary 804 (Oxford Univ. Press 2012).
34 See Memorandum from the Attorney General to Heads of Department Components (Sept. 9, 2016), https://www.justice.gov/opa/file/891366/download.

1

Are the two or three conclusions -- identification, exclusion, and inconclusive -- the only way the examiners are allowed to report their results?

In much of the world, examiners are discouraged from reaching only two conclusions -- included vs. excluded (with the additional option of denominating the data as too limited to permit such a classification). They are urged to articulate how strongly the data support one classification over the other. Instead of pigeonholing, they might say, for example, that the data strongly support the same-source classification (because those data are far more probable for ammunition fired from the same gun than for ammunition discharged from different guns).

The ULTR studiously avoids mentioning this mode of reporting. It states that examiners "may provide any of the following ... ." It does not state whether they also may choose not to -- and instead report only the degree of support for the same-source (or the different-source) hypothesis. Does the maxim of expressio unius est exclusio alterius apply? Department of Justice personnel are well aware of this widely favored alternative. They have attended meetings of statisticians at which straw polls overwhelmingly endorsed it over the Department's permitted conclusions. Yet, the ULTR does not list statements of support (essentially, likelihood ratios) as permissible. But neither are they found in the list of thou-shalt-nots in Part IV. \3/ Is the idea that if the examiners have a conclusion to offer, they must state it as one of the two or three categorical ones -- and that they may give a qualitative likelihood ratio if they want to?

2

Is the stated logic of a "source identification" internally coherent and intellectually defensible?

The ULTR explains that "[t]he basis for a 'source identification' conclusion" is

an examiner’s opinion that the observed class characteristics and corresponding individual characteristics provide extremely strong support for the proposition that the two toolmarks originated from the same source and extremely weak support for the proposition that the two toolmarks originated from different sources.

Translated into likelihood language, the DoJ's "basis for a source identification" is the belief that the likelihood ratio is very large -- the numerator of L is close to one, and the denominator is close to zero (see Box 2).

On this understanding, a "source identification" is a statement about the strength of the evidence rather than a conclusion (in the sense of a decision about the source hypothesis). However, the next paragraph of the ULTR states that "[a] ‘source identification’ is the statement of an examiner’s opinion (an inductive inference²) that the probability that the two toolmarks were made by different sources is so small that it is negligible."

Box 2. A Technical Definition of Support
The questioned toolmarks and the known ones have some degree of observed similarity X with respect to relevant characteristics. Let Lik(S) be the examiner's judgment of the likelihood of the same-source hypothesis S. This likelihood is proportional to Prob(X | S), the probability of the observed degree of similarity X given the hypothesis (S). For simplicity, we may as well let the constant be 1. Let Lik(D) be the examiner's judgment of the likelihood of the different-source hypothesis (D). This likelihood is Prob(X | D). The support for S is the logarithm of the likelihood ratio L = Lik(S) / Lik(D) = Prob(X | S) / Prob(X | D). \4/

In this way, the ULTR jumps from a likelihood to a posterior probability. To assert that "the probability that the two toolmarks were made by different sources ... is negligible" is to say that Prob(D|X) is close to 0, and hence that Prob(S|X) is nearly 1. However, the likelihood ratio L = Lik(S) / Lik(D) is only one factor that affects Prob(D|X). Bayes' theorem establishes that

Odds(D|X) = Odds(D) / L.

Consequently, a very large L (great support for S) shrinks the odds in favor of D, but whether we end up with a "negligible" probability for D depends on the prior odds on D, that is, the odds assessed without considering the toolmark evidence. Because the expertise of toolmark analysts only extends to evaluating the toolmark evidence, it seems that the ULTR is asking them to step outside their legitimate sphere of expertise by assessing, either explicitly or implicitly, the strength of the particular non-scientific evidence in the case.
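
The dependence on the prior odds can be made concrete with a minimal Python sketch of the odds form of Bayes' theorem, Odds(D|X) = Odds(D) / L. The function name, the value L = 10,000, and the three prior probabilities are hypothetical choices made only for illustration; none of them comes from the ULTR or from any validation study.

```python
from fractions import Fraction

def posterior_prob_different_source(prior_prob_same, likelihood_ratio):
    """Return Prob(D|X) given Prob(S) and L = Prob(X|S) / Prob(X|D).

    Uses the odds form of Bayes' theorem: Odds(D|X) = Odds(D) / L.
    """
    prior_prob_same = Fraction(prior_prob_same)
    prior_odds_d = (1 - prior_prob_same) / prior_prob_same
    posterior_odds_d = prior_odds_d / Fraction(likelihood_ratio)
    # Convert the posterior odds on D back to a probability.
    return posterior_odds_d / (1 + posterior_odds_d)

# Hypothetical inputs: one fixed L combined with three different priors.
L = 10_000
for prior in (Fraction(1, 2), Fraction(1, 1000), Fraction(1, 1_000_000)):
    post = posterior_prob_different_source(prior, L)
    print(f"Prob(S) = {float(prior):.6g}  ->  Prob(D|X) = {float(post):.4g}")
```

With these made-up inputs, the same likelihood ratio leaves a different-source probability of roughly 0.0001, 0.09, or 0.99, depending on the prior. Whether that counts as "negligible" is therefore not something the toolmarks alone can settle.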

There is a way to circumvent this objection. To defend a "source identification" as a judgment that Prob(D|X) is negligible, the examiner could contend that the likelihood ratio L is not just very large, as the ULTR's first definition required, but that it is so large that it swamps every probability that a judge or juror reasonably might entertain in any possible case before learning about the toolmarks. A nearly infinite L would permit an analyst to dismiss the posterior odds on D as negligible without attempting to estimate the odds on the basis of other evidence in the particular case (see Box 3).

Box 3. How large must L be to swamp all plausible prior odds?

Suppose that the smallest prior same-source probability in any conceivable case were p = 1/1,000,000. The prior odds on the different-source hypothesis would be approximately 1/p = 1,000,000. According to Bayes' rule, the posterior odds on D then would be about (1/p)/L = 1,000,000/L.

How large would the support L for S have to be to make D a "negligible" possibility? If "negligible" means a probability below, say, 1/100,000, then the threshold value of L (call it L*) would be such that 1,000,000 / L* < 1/100,000 (approximately). Hence L* > 10^11. Are examiners able to reliably tell whether the toolmarks are such that L > 100 billion?

One can use different numbers, of course, but whether the swamping defense of the ULTR really works to justify actual testimony as to "source identification" defined according to the ULTR is none too clear.
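
For readers who want to try other numbers, the Box 3 arithmetic can be sketched in a few lines of Python. This version does the calculation exactly rather than approximately: it converts the "negligible" probability into odds before solving for L*. The function name is hypothetical, and the two inputs are simply the illustrative figures from Box 3, not empirically grounded values.

```python
from fractions import Fraction

def min_likelihood_ratio(smallest_prior_same, negligible_prob):
    """Return the L at which Prob(D|X) equals negligible_prob.

    Exact form of the Box 3 approximation:
      Odds(D|X) = Odds(D) / L, and
      Prob(D|X) <= eps  exactly when  Odds(D|X) <= eps / (1 - eps).
    Any L above the returned value pushes Prob(D|X) below negligible_prob.
    """
    p = Fraction(smallest_prior_same)
    eps = Fraction(negligible_prob)
    prior_odds_d = (1 - p) / p          # prior odds on different sources
    negligible_odds = eps / (1 - eps)   # the threshold expressed as odds
    return prior_odds_d / negligible_odds

# Illustrative figures from Box 3: p = 1/1,000,000 and "negligible" = 1/100,000.
l_star = min_likelihood_ratio(Fraction(1, 1_000_000), Fraction(1, 100_000))
print(f"L* is about {float(l_star):.4g}")
```

The exact threshold comes out just under 10^11, which confirms that the approximation in Box 3 loses nothing of consequence. Raising the smallest plausible prior to 1/1,000 would lower L* by three orders of magnitude, but the resulting figure is still far beyond anything an examiner could claim to estimate by eye.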

The ULTR seems slightly embarrassed by the characterization of a "source identification" as an "opinion" on the small size of a probability. Parenthetically it calls the opinion "an inductive inference," which sounds more impressive. But the footnote that is supposed to explain the more elegant phrase only muddies the waters. It reads as follows:

Inductive reasoning (inferential reasoning): A mode or process of thinking that is part of the scientific method and complements deductive reasoning and logic. Inductive reasoning starts with a large body of evidence or data obtained by experiment or observation and extrapolates it to new situations. By the process of induction or inference, predictions about new situations are inferred or induced from the existing body of knowledge. In other words, an inference is a generalization, but one that is made in a logical and scientifically defensible manner. Oxford Dictionary of Forensic Science 130 (Oxford Univ. Press 2012) [sic]. \5/

The flaws in this definition are many. First, "inferential reasoning" is not equivalent to "inductive reasoning." Inference is reaching a conclusion from stated premises. The argument from the premises to the conclusion can be deductive or inductive. Deductive arguments are valid when the conclusion must be true given that the premises are true. Inductive arguments are sound when the conclusion is sufficiently probable given that the premises are true. In other words, deduction produces logical certainty, whereas induction can yield no more than probable truth. Second, inductive reasoning can be based on a small body of evidence as well as on a large body of evidence. Third, an induction -- that is, the conclusion of an inductive argument -- need not be particularly scientific or "scientifically defensible." Fourth, an inductive conclusion is not necessarily "a generalization." An inductive argument, no less than a deductive one, can go from the general to the specific -- as is the case for an inference that two toolmarks were made by the same source. Presenting an experience-based opinion as the product of "the scientific method" by the fiat of a flawed definition of "inductive reasoning" is puffery.

3

If the examiner has correctly discerned matching "individual characteristics" (as the ULTR calls them), why cannot the examiner "assert that a ‘source identification’ ... is based on ... ‘uniqueness’" or that there has been an "individualization"?

The ULTR states that a "source identification" is based on an examination of "class characteristics" and "individual characteristics." Presumably, "individual characteristics" are ones that differ in every source and thus permit "individualization." The dictionary on which the ULTR relies defines "individualization" as "assigning a unique source for a given piece of physical evidence" (which it distinguishes from "identification"). But the ULTR enjoins an examiner from using "the terms ‘individualize’ or ‘individualization’ when describing a source conclusion," from asserting "that a ‘source identification’ or a ‘source exclusion’ conclusion is based on the ‘uniqueness’ of an item of evidence," and from stating "that two toolmarks originated from the same source to the exclusion of all other sources."

The stated reason to avoid these terms is that a source attribution "is not based on a statistically-derived or verified measurement or comparison to all other firearms or toolmarks." But who would think that an examiner who "assert[s] that two toolmarks originated from the same source to the exclusion of all other sources" is announcing "an actual comparison to all other toolmarks in the world"? The examiner apparently is allowed to report a plethora of matching "individual characteristics" and to opine (or "inductively infer") that there is virtually no chance that the marks came from a different source. Allowing such testimony cuts the heart out of the rules against asserting "uniqueness" and claiming "individualization."

NOTES

  1. E.g., United States v. Hunt, 464 F.Supp.3d 1252 (W.D. Okla. 2020) (discussed on this blog Aug. 10, 2020).
  2. The original version was adopted on 7/24/2018. It was revised on 6/8/2020.
  3. Are numerical versions of subjective likelihood ratios prohibited by the injunction in Part IV that "[a]n examiner shall not provide a conclusion that includes a statistic or numerical degree of probability except when based on relevant and appropriate data"? Technically, a likelihood ratio is not a "degree of probability" or (arguably) a statistic, but it seems doubtful that the drafters of the ULTR chose their terminology with the niceties of statistical terminology in mind.
  4. A.W.F. Edwards, Likelihood 31 (rev. ed. 1992) (citing H. Jeffreys, Further Significance Tests, 32 Proc. Cambridge Phil. Soc'y 416 (1936)).
  5. The correct name of the dictionary is A Dictionary of Forensic Science, and its author is Suzanne Bell. The quotation in the ULTR omits the following part of the definition of "inductive inference": "A forensic example is fingerprints. Every person's fingerprints are unique, but this is an inference based on existing knowledge since the only way to prove it would be to take and study the fingerprints of every human being ever born."
