Friday, November 27, 2020

Mysteries of the Department of Justice's ULTR for Firearm-toolmark Pattern Examinations

The Department of Justice's "Uniform Language for Testimony and Reports for the Forensic Firearms/toolmarks Discipline – Pattern Examination" (ULTR) offers a ready response to motions to limit overclaiming or, to use the pedantic term, ultracrepidarianism, in expert testimony. Citing the DoJ policy, several federal district courts have indicated that they expect the government's expert witnesses to follow this directive (or something like it). \1/

Parts of the current version (with changes from the original) are reproduced in Box 1. \2/ This posting poses three questions about this guidance. Although the ULTR is a step in the right direction, it has a ways to go in articulating a clear and optimal policy.

Box 1. The ULTR

DEPARTMENT OF JUSTICE
UNIFORM LANGUAGE FOR TESTIMONY AND REPORTS
FOR THE FORENSIC FIREARMS/TOOLMARKS DISCIPLINE –
PATTERN MATCH EXAMINATION
...
III. Conclusions Regarding Forensic Pattern Examination of Firearms/Toolmarks Evidence for a Pattern Match

The An examiner may offer provide any of the following conclusions:
1. Source identification (i.e., identified)
2. Source exclusion (i.e., excluded)
3. Inconclusive
Source identification
‘Source identification’ is an examiner’s conclusion that two toolmarks originated from the same source. This conclusion is an examiner’s decision opinion that all observed class characteristics are in agreement and the quality and quantity of corresponding individual characteristics is such that the examiner would not expect to find that same combination of individual characteristics repeated in another source and has found insufficient disagreement of individual characteristics to conclude they originated from different sources.

The basis for a ‘source identification’ conclusion is an examiner’s decision opinion that the observed class characteristics and corresponding individual characteristics provide extremely strong support for the proposition that the two toolmarks came originated from the same source and extremely weak support for the proposition that the two toolmarks came originated from different sources.

A ‘source identification’ is the statement of an examiner’s opinion (an inductive inference2) that the probability that the two toolmarks were made by different sources is so small that it is negligible. A ‘source identification’ is not based upon a statistically-derived or verified measurement or an actual comparison to all firearms or toolmarks in the world.

Source exclusion
‘Source exclusion’ is an examiner’s conclusion that two toolmarks did not originate from the same source.

The basis for a ‘source exclusion’ conclusion is an examiner’s decision opinion that the observed class and/or individual characteristics provide extremely strong support for the proposition that the two toolmarks came from different sources and extremely weak or no support for the proposition that the two toolmarks came from the same source two toolmarks can be differentiated by their class characteristics and/or individual characteristics.

Inconclusive
‘Inconclusive’ is an examiner’s conclusion that all observed class characteristics are in agreement but there is insufficient quality and/or quantity of corresponding individual characteristics such that the examiner is unable to identify or exclude the two toolmarks as having originated from the same source.

The basis for an ‘inconclusive’ conclusion is an examiner’s decision opinion that there is an insufficient quality and/or quantity of individual characteristics to identify or exclude. Reasons for an ‘inconclusive’ conclusion include the presence of microscopic similarity that is insufficient to form the conclusion of ‘source identification;’ a lack of any observed microscopic similarity; or microscopic dissimilarity that is insufficient to form the conclusion of ‘source exclusion.’

IV. Qualifications and Limitations of Forensic Firearms/Toolmarks Discipline Examinations
A conclusion provided during testimony or in a report is ultimately an examiner’s decision and is not based on a statistically-derived or verified measurement or comparison to all other firearms or toolmarks. Therefore, an An examiner shall not assert that two toolmarks originated from the same source to the exclusion of all other sources. This may wrongly imply that a ‘source identification’ conclusion is based upon a statistically-derived or verified measurement or an actual comparison to all other toolmarks in the world, rather than an examiner’s expert opinion.
○ assert that a ‘source identification’ or a ‘source exclusion’ conclusion is based on the ‘uniqueness’3 of an item of evidence.

○ use the terms ‘individualize’ or ‘individualization’ when describing a source conclusion.

○ assert that two toolmarks originated from the same source to the exclusion of all other sources.
• An examiner shall not assert that examinations conducted in the forensic firearms/toolmarks discipline are infallible or have a zero error rate.

• An examiner shall not provide a conclusion that includes a statistic or numerical degree of probability except when based on relevant and appropriate data.

• An examiner shall not cite the number of examinations conducted in the forensic firearms/toolmarks discipline performed in his or her career as a direct measure for the accuracy of a proffered conclusion provided. An examiner may cite the number of examinations conducted in the forensic firearms/toolmarks discipline performed in his or her career for the purpose of establishing, defending, or describing his or her qualifications or experience.

• An examiner shall not assert that two toolmarks originated from the same source with absolute or 100% certainty, or use the expressions ‘reasonable degree of scientific certainty,’ ‘reasonable scientific certainty,’ or similar assertions of reasonable certainty in either reports or testimony unless required to do so by a judge or applicable law.34


2 Inductive reasoning (inferential reasoning):
A mode or process of thinking that is part of the scientific method and complements deductive reasoning and logic. Inductive reasoning starts with a large body of evidence or data obtained by experiment or observation and extrapolates it to new situations. By the process of induction or inference, predictions about new situations are inferred or induced from the existing body of knowledge. In other words, an inference is a generalization, but one that is made in a logical and scientifically defensible manner. Oxford Dictionary of Forensic Science 130 (Oxford Univ. Press 2012).
3 As used in this document, the term ‘uniqueness’ means having the quality of being the only one of its kind.’ Oxford English Dictionary 804 (Oxford Univ. Press 2012).
34 See Memorandum from the Attorney General to Heads of Department Components (Sept. 9. 2016), https://www.justice.gov/opa/file/891366/download.

1

Are the two or three conclusions -- identification, exclusion, and inconclusive -- the only ways that examiners may report their results?

In much of the world, examiners are discouraged from confining themselves to two conclusions -- included vs. excluded (with the additional option of denominating the data as too limited to permit such a classification). Instead, they are urged to articulate how strongly the data support one classification over the other. Rather than pigeonholing, they might say, for example, that the data strongly support the same-source classification (because those data are far more probable for ammunition fired from the same gun than for ammunition discharged from different guns).

The ULTR studiously avoids mentioning this mode of reporting. It states that examiners "may provide any of the following ... ." It does not state whether they also may choose not to -- and instead report only the degree of support for the same-source (or the different-source) hypothesis. Does the maxim of expressio unius est exclusio alterius apply? Department of Justice personnel are well aware of this widely favored alternative. They have attended meetings of statisticians at which straw polls overwhelmingly endorsed it over the Department's permitted conclusions. Yet, the ULTR does not list statements of support (essentially, likelihood ratios) as permissible. But neither are they found in the list of thou-shalt-nots in Part IV. \3/ Is the idea that if the examiners have a conclusion to offer, they must state it as one of the two or three categorical ones -- and that they may give a qualitative likelihood ratio if they want to?
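To make the alternative concrete, here is a minimal Python sketch of reporting the degree of support instead of a category. The verbal scale, its cutoffs, and the name `describe_support` are hypothetical, loosely patterned on the kind of verbal scales used in evaluative reporting; they are not taken from the ULTR or from any particular guideline.

```python
# Hypothetical verbal scale: (smallest likelihood ratio, phrase).
# The cutoffs are illustrative assumptions, not an official standard.
VERBAL_SCALE = [
    (1_000_000, "extremely strong support"),
    (10_000, "very strong support"),
    (1_000, "strong support"),
    (100, "moderately strong support"),
    (10, "moderate support"),
    (1, "weak support"),
]


def describe_support(lr: float) -> str:
    """Say how strongly the observations X favor one source hypothesis.

    lr is the likelihood ratio L = Prob(X | S) / Prob(X | D).
    No categorical "identified" or "excluded" decision is made.
    """
    if lr <= 0:
        raise ValueError("the likelihood ratio must be positive")
    if lr == 1:
        return "The observations support neither proposition over the other."
    # Describe support for whichever hypothesis L favors, using a ratio above 1.
    if lr > 1:
        favored, other, ratio = "same-source", "different-source", lr
    else:
        favored, other, ratio = "different-source", "same-source", 1 / lr
    for cutoff, phrase in VERBAL_SCALE:
        if ratio >= cutoff:
            return (f"The observations provide {phrase} for the {favored} "
                    f"proposition over the {other} proposition.")
    raise AssertionError("unreachable: ratio > 1 always meets the last cutoff")


print(describe_support(50))
print(describe_support(0.0005))
```

Unlike the ULTR's categories, the output grades the evidence and leaves the decision about the source to the factfinder.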

2

Is the stated logic of a "source identification" internally coherent and intellectually defensible?

The ULTR explains that the basis for a "source identification" is

an examiner’s opinion that the observed class characteristics and corresponding individual characteristics provide extremely strong support for the proposition that the two toolmarks originated from the same source and extremely weak support for the proposition that the two toolmarks originated from different sources.

Translated into likelihood language, the DoJ's "basis for a source identification" is the belief that the likelihood ratio is very large -- the numerator of L is close to one, and the denominator is close to zero (see Box 2).

On this understanding, a "source identification" is a statement about the strength of the evidence rather than a conclusion (in the sense of a decision about the source hypothesis). However, the next paragraph of the ULTR states that "[a] ‘source identification’ is the statement of an examiner’s opinion (an inductive inference2) that the probability that the two toolmarks were made by different sources is so small that it is negligible."

Box 2. A Technical Definition of Support
The questioned toolmarks and the known ones have some degree of observed similarity X with respect to relevant characteristics. Let Lik(S) be the examiner's judgment of the likelihood of the same-source hypothesis S. This likelihood is proportional to Prob(X | S), the probability of the observed degree of similarity X given the hypothesis S. For simplicity, we may as well let the constant of proportionality be 1. Let Lik(D) be the examiner's judgment of the likelihood of the different-source hypothesis D. This likelihood is Prob(X | D). The support for S (relative to D) is the logarithm of the likelihood ratio L = Lik(S) / Lik(D) = Prob(X | S) / Prob(X | D). \4/
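The definitions in Box 2 reduce to simple arithmetic. In this sketch the function names and the input probabilities are hypothetical; support is computed with the natural logarithm (the choice of base only rescales it), and the proportionality constant is set to 1 as in the box.

```python
import math


def likelihood_ratio(p_x_given_s: float, p_x_given_d: float) -> float:
    """L = Lik(S) / Lik(D) = Prob(X | S) / Prob(X | D), constants set to 1."""
    if not (0 < p_x_given_s <= 1 and 0 < p_x_given_d <= 1):
        raise ValueError("probabilities must lie in (0, 1]")
    return p_x_given_s / p_x_given_d


def support_for_s(p_x_given_s: float, p_x_given_d: float) -> float:
    """Support for S relative to D: log L (positive favors S, negative favors D)."""
    return math.log(likelihood_ratio(p_x_given_s, p_x_given_d))


# Hypothetical judgments: the observed similarity X is 1,000 times more
# probable if the marks share a source than if they do not.
L = likelihood_ratio(0.5, 0.0005)
print(f"L = {L:.0f}, support for S = {support_for_s(0.5, 0.0005):.2f}")
```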

By equating a "source identification" with a negligible probability, the ULTR jumps from a likelihood to a posterior probability. To assert that "the probability that the two toolmarks were made by different sources is so small that it is negligible" is to say that Prob(D|X) is close to 0, and hence that Prob(S|X) is nearly 1. However, the likelihood ratio L = Lik(S) / Lik(D) is only one factor that affects Prob(D|X). Bayes' theorem establishes that

Odds(D|X) = Odds(D) / L.

Consequently, a very large L (great support for S) shrinks the odds on D, but whether we end up with a "negligible" probability for D also depends on the prior odds on D, that is, the odds before considering the toolmark evidence. Because the expertise of toolmark analysts extends only to evaluating the toolmark evidence, the ULTR seems to be asking them to step outside their legitimate sphere of expertise by assessing, either explicitly or implicitly, the strength of the particular non-scientific evidence in the case.
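A short sketch shows why the same likelihood ratio can leave the different-source hypothesis either negligible or highly probable. It applies Odds(D|X) = Odds(D) / L; the function name and the two prior probabilities are hypothetical inputs chosen only for contrast.

```python
def posterior_prob_d(prior_prob_d: float, lr: float) -> float:
    """Posterior Prob(D | X) from the prior Prob(D) and L = Prob(X|S) / Prob(X|D).

    Uses Bayes' rule in odds form: Odds(D | X) = Odds(D) / L.
    """
    if not 0 < prior_prob_d < 1:
        raise ValueError("prior probability must lie strictly between 0 and 1")
    if lr <= 0:
        raise ValueError("likelihood ratio must be positive")
    prior_odds_d = prior_prob_d / (1 - prior_prob_d)
    posterior_odds_d = prior_odds_d / lr
    return posterior_odds_d / (1 + posterior_odds_d)  # convert odds back to a probability


# The same hypothetical L = 10,000 combined with two hypothetical priors on D.
for prior in (0.5, 0.999999):
    print(prior, posterior_prob_d(prior, 10_000))
```

With the first prior, D ends up at about 1 in 10,000; with the second, D remains about 99% probable. The toolmark evidence alone does not settle which situation the case is in.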

There is a way to circumvent this objection. To defend a "source identification" as a judgment that Prob(D|X) is negligible, the examiner could contend that the likelihood ratio L is not just very large, as the ULTR's first definition required, but that it is so large that it swamps every probability that a judge or juror reasonably might entertain in any possible case before learning about the toolmarks. A nearly infinite L would permit an analyst to dismiss the posterior odds on D as negligible without attempting to estimate the odds on the basis of other evidence in the particular case (see Box 3).

Box 3. How large must L be to swamp all plausible prior odds?

Suppose that the smallest prior same-source probability in any conceivable case were p = 1/1,000,000. The prior odds on the different-source hypothesis would be approximately 1/p = 1,000,000. According to Bayes' rule, the posterior odds on D then would be about (1/p)/L = 1,000,000/L.

How large would the likelihood ratio L have to be to make D a "negligible" possibility? If "negligible" means a probability below, say, 1/100,000, then the threshold value of L (call it L*) would satisfy 1,000,000 / L* < 1/100,000 (approximately). Hence L* > 10^11. Are examiners able to tell reliably whether the toolmarks are such that L > 100 billion?

One can use different numbers, of course, but it is far from clear that this swamping defense can justify actual testimony to a "source identification" as the ULTR defines it.
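The arithmetic in Box 3 can be checked without its approximations. This sketch, with hypothetical function and parameter names, computes the smallest L that pushes Prob(D|X) below a chosen "negligible" level even in the case with the smallest prior same-source probability. It converts probabilities to odds exactly rather than using 1/p.

```python
def odds(p: float) -> float:
    """Convert a probability in (0, 1) to odds p / (1 - p)."""
    if not 0 < p < 1:
        raise ValueError("probability must lie strictly between 0 and 1")
    return p / (1 - p)


def threshold_lr(min_prior_prob_s: float, negligible_prob_d: float) -> float:
    """Smallest L* such that any L > L* makes Prob(D | X) < negligible_prob_d
    whenever the prior Prob(S) is at least min_prior_prob_s.

    From Odds(D | X) = Odds(D) / L, require Odds(D) / L < odds(negligible_prob_d).
    The smallest prior on S gives the largest prior odds on D (the worst case).
    """
    prior_odds_d = odds(1 - min_prior_prob_s)
    return prior_odds_d / odds(negligible_prob_d)


# Box 3's numbers: p = 1/1,000,000 and "negligible" = 1/100,000.
print(f"{threshold_lr(1e-6, 1e-5):.3e}")  # roughly 1e11, as in Box 3
```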

The ULTR seems slightly embarrassed by the characterization of a "source identification" as an "opinion" on the small size of a probability. Parenthetically, it calls the opinion "an inductive inference," which sounds more impressive. But the footnote that is supposed to explain the more elegant phrase only muddies the waters. It reads as follows:

Inductive reasoning (inferential reasoning): A mode or process of thinking that is part of the scientific method and complements deductive reasoning and logic. Inductive reasoning starts with a large body of evidence or data obtained by experiment or observation and extrapolates it to new situations. By the process of induction or inference, predictions about new situations are inferred or induced from the existing body of knowledge. In other words, an inference is a generalization, but one that is made in a logical and scientifically defensible manner. Oxford Dictionary of Forensic Science 130 (Oxford Univ. Press 2012) [sic]. \5/

The flaws in this definition are many. First, "inferential reasoning" is not equivalent to "inductive reasoning." Inference is reaching a conclusion from stated premises. The argument from the premises to the conclusion can be deductive or inductive. A deductive argument is valid when the conclusion must be true if the premises are true. An inductive argument is strong when the conclusion is sufficiently probable given that the premises are true. In other words, deduction produces logical certainty, whereas induction can yield no more than probable truth. Second, inductive reasoning can be based on a small body of evidence as well as on a large one. Third, an induction -- that is, the conclusion of an inductive argument -- need not be particularly scientific or "scientifically defensible." Fourth, an inductive conclusion is not necessarily "a generalization." An inductive argument, no less than a deductive one, can go from the general to the specific -- as is the case for an inference that two toolmarks were made by the same source. Presenting an experience-based opinion as the product of "the scientific method" by the fiat of a flawed definition of "inductive reasoning" is puffery.

3

If the examiner has correctly discerned matching "individual characteristics" (as the ULTR calls them), why cannot the examiner "assert that a ‘source identification’ ... is based on ... ‘uniqueness’" or that there has been an "individualization"?

The ULTR states that a "source identification" is based on an examination of "class characteristics" and "individual characteristics." Presumably, "individual characteristics" are ones that differ in every source and thus permit "individualization." The dictionary on which the ULTR relies defines "individualization" as "assigning a unique source for a given piece of physical evidence" (which it distinguishes from "identification"). But the ULTR enjoins an examiner from using "the terms ‘individualize’ or ‘individualization’ when describing a source conclusion," from asserting "that a ‘source identification’ or a ‘source exclusion’ conclusion is based on the ‘uniqueness’ of an item of evidence," and from stating "that two toolmarks originated from the same source to the exclusion of all other sources."

The stated reason to avoid these terms is that a source attribution "is not based on a statistically-derived or verified measurement or comparison to all other firearms or toolmarks." But who would think that an examiner who "assert[s] that two toolmarks originated from the same source to the exclusion of all other sources" is announcing "an actual comparison to all other toolmarks in the world"? The examiner apparently is allowed to report a plethora of matching "individual characteristics" and to opine (or "inductively infer") that there is virtually no chance that the marks came from a different source. Allowing such testimony cuts the heart out of the rules against asserting "uniqueness" and claiming "individualization."

NOTES

  1. E.g., United States v. Hunt, 464 F.Supp.3d 1252 (W.D. Okla. 2020) (discussed on this blog Aug. 10, 2020).
  2. The original version was adopted on 7/24/2018. It was revised on 6/8/2020.
  3. Are numerical versions of subjective likelihood ratios prohibited by the injunction in Part IV that "[a]n examiner shall not provide a conclusion that includes a statistic or numerical degree of probability except when based on relevant and appropriate data"? Technically, a likelihood ratio is not a "degree of probability" or (arguably) a statistic, but it seems doubtful that the drafters of the ULTR chose their words with the niceties of statistical terminology in mind.
  4. A.W.F. Edwards, Likelihood 31 (rev. ed. 1992) (citing H. Jeffreys, Further Significance Tests, 32 Proc. Cambridge Phil. Soc'y 416 (1936)).
  5. The correct name of the dictionary is A Dictionary of Forensic Science, and its author is Suzanne Bell. The quotation in the ULTR omits the following part of the definition of "inductive reasoning": "A forensic example is fingerprints. Every person's fingerprints are unique, but this is an inference based on existing knowledge since the only way to prove it would be to take and study the fingerprints of every human being ever born."

Tuesday, November 24, 2020

Wikimedia v. NSA: It's Classified!

The National Security Agency (NSA) engages in systematic, warrantless "upstream" surveillance of Internet communications that travel in and out of the United States along a "backbone" of fiber optic cables. The ACLU and other organizations maintain that Upstream surveillance is manifestly unconstitutional. Whether or not that is correct, the government has stymied one Fourth Amendment challenge after another on the ground that plaintiffs lack standing because they cannot prove that the surveillance entails intercepting, copying, and reviewing any of their communications. Of course, the reason plaintiffs have no direct evidence is that the government won't admit or deny it. Instead, the government has asserted that the surveillance program is a privileged state secret, classified its details, and resisted even in camera hearings in ordinary courts.

In Wikimedia Foundation v. National Security Agency, 857 F.3d 193 (4th Cir. 2017), however, the Court of Appeals for the Fourth Circuit held that the Wikimedia Foundation, which operates Wikipedia, made "allegations sufficient to survive a facial challenge to standing." Id. at 193. The court concluded that Wikimedia's allegations were plausible enough to defeat a motion to dismiss the complaint because

Wikimedia alleges three key facts that are entitled to the presumption of truth. First, “[g]iven the relatively small number of international chokepoints,” the volume of Wikimedia's communications, and the geographical diversity of the people with whom it communicates, Wikimedia's “communications almost certainly traverse every international backbone link connecting the United States with the rest of the world.”

Second, “in order for the NSA to reliably obtain communications to, from, or about its targets in the way it has described, the government,” for technical reasons that Wikimedia goes into at length, “must be copying and reviewing all the international text-based communications that travel across a given link” upon which it has installed surveillance equipment. Because details about the collection process remain classified, Wikimedia can't precisely describe the technical means that the NSA employs. Instead, it spells out the technical rules of how the Internet works and concludes that, given that the NSA is conducting Upstream surveillance on a backbone link, the rules require that the NSA do so in a certain way. ...

Third, per the PCLOB [Privacy and Civil Liberties Oversight Board] Report and a purported NSA slide, “the NSA has confirmed that it conducts Upstream surveillance at more than one point along the [I]nternet backbone.” Together, these allegations are sufficient to make plausible the conclusion that the NSA is intercepting, copying, and reviewing at least some of Wikimedia's communications. To put it simply, Wikimedia has plausibly alleged that its communications travel all of the roads that a communication can take, and that the NSA seizes all of the communications along at least one of those roads. 

Id. at 210-11 (citations omitted).

The Fourth Circuit therefore vacated an order dismissing Wikimedia's complaint issued by Senior District Judge Thomas Selby Ellis III, the self-described "impatient" jurist who later achieved notoriety, and collected ethics complaints (which were rejected last year), for his management of the trial of former Trump campaign manager Paul Manafort.

On remand, the government moved for summary judgment. Wikimedia Found. v. Nat'l Sec. Agency/Cent. Sec. Serv., 427 F.Supp.3d 582 (D. Md. 2019). Once more, the government argued that Wikimedia lacked standing to complain that the Upstream surveillance violated its Fourth Amendment rights. It suggested that the “plausible” inference that the NSA must be “intercepting, copying, and reviewing at least some of Wikimedia's communications” recognized by the Fourth Circuit was not so plausible after all. To support this conclusion, it submitted a declaration of Henning Schulzrinne, a Professor of Computer Science and Electrical Engineering at Columbia University. Dr. Schulzrinne described how companies carrying Internet traffic might filter transmissions before copying them by “mirroring” with “routers” or “switches” that could perform “blacklisting” or “whitelisting” if the NSA chose to give the companies information on its targets with which to create “access control lists.”
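As a rough illustration of the kind of filtering the declaration described, the following sketch copies ("mirrors") a packet only if one of its addresses is on a whitelist and none is on a blacklist. Everything here is hypothetical: the function names, the rule order, and the addresses (drawn from the ranges reserved for documentation by RFC 5737). It models only the general idea of an access control list, not any actual carrier or NSA configuration. Note that such a filter works only if whoever runs the switch holds the list of target addresses.

```python
import ipaddress

# Hypothetical access control lists (RFC 5737 documentation addresses).
WHITELIST = [ipaddress.ip_network("203.0.113.0/24")]    # addresses of interest
BLACKLIST = [ipaddress.ip_network("203.0.113.128/25")]  # addresses to skip


def matches(addr: str, networks) -> bool:
    """True if the address falls inside any listed network."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in networks)


def should_mirror(src: str, dst: str) -> bool:
    """Copy a packet only if an endpoint is whitelisted and no endpoint is blacklisted."""
    endpoints = (src, dst)
    if any(matches(a, BLACKLIST) for a in endpoints):
        return False
    return any(matches(a, WHITELIST) for a in endpoints)


packets = [
    ("198.51.100.7", "203.0.113.25"),   # whitelisted destination: copied
    ("198.51.100.7", "203.0.113.200"),  # blacklisted destination: not copied
    ("192.0.2.10", "198.51.100.7"),     # on neither list: not copied
]
copied = [p for p in packets if should_mirror(*p)]
print(copied)
```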

But Dr. Schulzrinne supplied no information and formed no opinion on whether it was at all likely that the NSA used the mirroring methods that he envisioned, and Wikimedia produced a series of expert reports from Scott Bradner, who had served as Harvard University’s Technology Security Officer and taught at that university. Bradner contended that the NSA could hardly be expected to give away the information on its targets and concluded that it is all but certain that the agency intercepted and opened at least one of Wikimedia's trillions of Internet communications.
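The intuition behind "at least one of ... trillions" can be made concrete with elementary arithmetic, under a strong simplifying assumption: each communication independently has some small probability q of being copied and reviewed. This illustrates the arithmetic only; it is not Bradner's actual analysis, and both q and the number of communications below are hypothetical.

```python
import math


def prob_at_least_one(per_item_prob: float, n_items: int) -> float:
    """P(at least one of n items is selected) = 1 - (1 - q)^n,
    assuming each item is selected independently with probability q."""
    if not 0 <= per_item_prob <= 1:
        raise ValueError("probability must lie in [0, 1]")
    if per_item_prob == 1:
        return 1.0 if n_items > 0 else 0.0
    # log1p and expm1 preserve precision when q is tiny and n is huge.
    return -math.expm1(n_items * math.log1p(-per_item_prob))


# Hypothetical inputs: a one-in-a-billion chance per communication,
# applied to a trillion communications.
print(prob_at_least_one(1e-9, 10**12))
```

Under these assumptions the result is indistinguishable from 1. The conclusion is only as good as the independence assumption and the chosen q.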

The district court refused to conduct an evidentiary hearing on the factual issue. Instead, it disregarded the expert's opinion as inadmissible scientific evidence under Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993), because no one without access to classified information could "know what the NSA prioritizes in the Upstream surveillance program ... and therefore Mr. Bradner has no knowledge or information about it." Wikimedia, 427 F. Supp. 3d at 604–05 (footnotes omitted).

This reasoning resembles that of Judge Ellis's first opinion in this long-running case. In Wikimedia Found. v. Nat'l Sec. Agency, 143 F. Supp. 3d 344, 356 (D. Md. 2015), the judge characterized Wikimedia’s allegations as mere “suppositions and speculation, with no basis in fact, about how the NSA” operates and maintained that it was impossible for Wikimedia to prove its allegations “because the scope and scale of Upstream surveillance remain classified . . . .” Id. Rather than allow full consideration of the strength of the evidence that makes Wikimedia’s claim plausible, the district court restated its position that “Mr. Bradner has no [direct] knowledge or information” because that information is classified. Wikimedia, 427 F. Supp. 3d at 604–605.

In a pending appeal to the Fourth Circuit, Edward Imwinkelried, Michael Risinger, Rebecca Wexler, and I prepared a brief as amici curiae in support of Wikimedia. The brief expresses surprise at “the district court’s highly abbreviated analysis of Rule 702 and Daubert, as well as the court’s consequent decision to rule inadmissible opinions of the type that Wikimedia’s expert offered in this case.” It describes the applicable standard for excluding expert testimony. It then argues that the expert’s method of reasoning was sound and that its factual bases regarding the nature of Internet communications and surveillance technology, together with public information on the goals and needs of the NSA program, were sufficient to justify the receipt of the proposed testimony.

UPDATE (9/27/21): On 9/15/21, the Fourth Circuit affirmed the summary judgment order -- but not on the basis of Judge Ellis's theories about expert testimony. A divided panel reasoned that the suit had to be dismissed because the government had properly invoked the state secrets privilege and that, because the government would have to disclose those secrets to defend itself, “further litigation would present an unjustifiable risk of disclosure.” Wikimedia Found. v. Nat'l Sec. Agency/Cent. Sec. Serv., 14 F.4th 276 (4th Cir. 2021).