Saturday, June 11, 2022

State v. Ghigliotti, Computer-assisted Bullet Matching, and the ASB Standards

In State v. Ghigliotti, 232 A.3d 468, 471 (N.J. App. Div. 2020), a firearms examiner concluded that a particular gun did not fire the bullet (or, more precisely, a bullet jacket) removed from the body of a man found shot to death by the side of a road in Union County, New Jersey. That was 2005, and the case went nowhere.

Ten years later, a detective prevailed on a second firearms examiner to see what he thought of the toolmark evidence. After considerable effort, this examiner reported that the microscopic comparisons with many test bullets from the gun in question were inconclusive.

However, at a training seminar in New Orleans, he learned of two tools developed and marketed by Ultra Electronics Forensic Technology, the creator of the Integrated Ballistics Identification System (IBIS), which “can find the ‘needle in the haystack,’ suggesting possible matches between pairs of spent bullets and cartridge cases, at speeds well beyond human capacity.” The Bullettrax system “digitally captures the surface of a bullet in 2D and 3D, providing a topographic model of the marks around its circumference.” As “[t]he world’s most advanced bullet acquisition station,” it uses “intelligent surface tracking that automatically adapts to deformations of damaged and fragmented bullets.”

The complementary Matchpoint is an “analysis station” with “[p]owerful visualization tools [that] go beyond conventional comparison microscopes to ease the recognition of high-confidence matches. Indeed, Matchpoint increases identification success rates while reducing efforts required for ultimate confirmations.” It features multiple side-by-side views of images from the Bullettrax data as well as a score analysis. The court explained that “the Matchpoint software ... included tools for flattening and manipulating the images, adjusting the brightness, zooming in, and ‘different overlays of ... color scaling.’”

But the examiner did not make the comparisons based on the digitally generated and enhanced images, and he did not rely on any similarity-score analysis. Rather, he “looked at the images side-by-side on a computer screen using Matchpoint [only] ‘to try and target areas of interest to determine ... if (he) was going to go back and continue with further [conventional] microscopic comparisons or not.’” He found four such areas of agreement. Conducting a new microscopic analysis of these and other areas a few weeks later, he “‘came to an opinion of an identification or a positive identification’ ... grounded in his ‘training and experience and education as a practitioner in firearms identification’ and his handling of over 2300 cases.” 232 A.3d at 478–79.

The trial court “determined that a Frye hearing was necessary to demonstrate the reliability of the computer images of the bullets produced by BULLETTRAX before the expert would be permitted to testify at trial.” Id. at 471. The state filed an interlocutory appeal, arguing that the positive identification did not depend on Ultra’s products. The Appellate Division affirmed, holding that the hearing should proceed.

I do not know where the case stands, but its facts provide the basis for a thought experiment. At about the same time as the Ghigliotti court affirmed the order for a hearing, the American Academy of Forensic Sciences Standards Board (ASB) published a package of standards on toolmark comparisons. Created in 2015, ASB describes itself as “an ANSI [American National Standards Institute]-accredited Standards Developing Organization with the purpose of providing accessible, high quality science-based consensus forensic standards.” Academy Standards Board, Who We Are, 2022. Two of its standards concern three-dimensional (3D) data and inferences in toolmark comparisons, while the third is specific to software for comparing 2D or 3D data.

We can put the third to the side, for it is limited to software that "seeks to assess both the level of geometric similarity (similarity of toolmarks) and the degree of certainty that the observed similarity results from a common origin." ANSI/ASB Standard 062, Standard for Topography Comparison Software for Toolmark Analysis § 3.1 (2021). The data collection and visualization software here does neither, and the scoring feature of Matchpoint was not used.

ANSI/ASB Standard 061, Firearms and Toolmarks 3D Measurement Systems and Measurement Quality Control (2021), is more apposite, although it is only intended “to ensure the instrument’s accuracy, to conduct instrument calibration, and to estimate measurement uncertainty for each axis (X, Y, and Z).” It promises “procedures for validation of 3D system hardware” but not software. It “does not apply to legacy 2D type systems,” leaving one to wonder whether there are any standards for validating them.

Even for “3D system hardware,” the procedure for “developmental validation” (§ 4.1) is nonexistent. There are no criteria in this standard for recognizing when a measurement system is valid and no steps that a researcher must follow to study validity. Instead, the section on “Developmental Validation (Mandatory)” states that an “organization with appropriate knowledge and/or [sic] expertise” shall complete “a developmental validation”; that this validation “typically” (but not necessarily) consists of library research (“identifying and citing previously published scientific literature”); and that “ample”—but entirely uncited—literature exists “to establish the underlying imaging technology” for seven enumerated technologies. In full, the three sentences on “developmental validation” are:

As per ANSI/ASB Standard 063, Implementation of 3D Technologies in Forensic Firearm and Toolmark Comparison Laboratories, a developmental validation shall be completed by at least one organization with appropriate knowledge and/or expertise. The developmental validation of imaging hardware typically consists of identifying and citing previously published scientific literature establishing the underlying imaging technology. The methods defined above of coherence scanning interferometry, confocal microscopy, confocal chromatic microscopy, focus variation microscopy, phase-shifting interferometric microscopy, photometric stereo, and structured light projection all have ample published scientific literature which can be cited to establish an underlying imaging technology.

Perhaps the section is merely there to point the reader to the different standard, ASB 063, on implementation of 3D technologies. \1/ But that standard seems to conceive of “developmental validation” as a process that occurs in a forensic laboratory or other organization by a predefined process with a “technical reviewer” to sign off on the resulting document that becomes the object of further review through “[p]eer-reviewed publication (or other means of dissemination to the scientific community, such as a peer-reviewed presentation at a scientific meeting).” § 4.1.3.4. The data and the statistics needed to assess measurement validity are left to the readers' imaginations (or statistical acumen). \2/

ASB 061 devotes more attention to what it calls “deployment validation” on the part of every laboratory that chooses to use a 3D measuring instrument. This part of the standard describes some procedures for checking the X, Y, and Z “scales,” checks that should reveal whether measured coordinates of points on the surface of the material are close to what they should be. For example, § 4.2.5.4.1 specifies that

Using calibrated geometric standards (e.g., sine wave, pitch, step heights), measurements shall be conducted to check the X and Y lateral scales as well as the vertical Z scale. Ten measurements shall be performed consecutively ... . The measurement uncertainty of the repeatability measurements shall overlap with the certified value and uncertainty of the geometric standard used.

The phrasing is confusing (to me, at least). I assume that a “geometric standard” is the equivalent of a ruler of known length (a “certified value” of, say, 1 ± 0.01 microns). But what does the edict that “[t]he measurement uncertainty of the repeatability measurements shall overlap with the certified value and uncertainty of the geometric standard used” mean operationally?

The best answer I can think of is that the standard contemplates comparing two intervals. One is the interval for the certified scale value (along, say, the X-axis). Imagine that the “geometric standard” that is taken to be the truth is certified as having a length of 1 ± 0.01 microns. Let’s call this the “certified standard interval.”

Now the laboratory makes ten measurements for its “deployment validation,” producing what we can call a “sample interval.” The ASB standard does not contain any directions on how this is to be done. One approach would be to compute a confidence interval on the assumption that the sample measurements are normally distributed. Suppose the observed sample mean for them is 0.80, and the standard error computed from the ten sample measurements is s = 0.10 microns. The confidence interval is then 0.80 ± k(0.10), where k is some constant. If the confidence interval includes any part of the certified interval, this part of the deployment-validation requirement is met.
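
To make the comparison concrete, here is a small computational sketch of this reading of the overlap requirement. It is mine, not the standard’s, and it assumes that k is a two-sided Student’s t critical value at a 95% confidence level, a choice ASB 061 nowhere makes:

    # A sketch (my reading, not the standard's text) of the interval-overlap
    # check, using the numbers in the example: a certified value of 1 +/- 0.01
    # microns and ten measurements with mean 0.80 and standard error 0.10.
    from scipy import stats

    cert_lo, cert_hi = 1.0 - 0.01, 1.0 + 0.01   # certified standard interval
    mean, se, n = 0.80, 0.10, 10                # summary of the ten measurements

    conf_level = 0.95                           # assumed; the standard is silent
    k = stats.t.ppf(1 - (1 - conf_level) / 2, df=n - 1)   # about 2.26
    ci_lo, ci_hi = mean - k * se, mean + k * se           # 0.80 +/- k(0.10)

    # "Overlap" read as: the two intervals share at least one point.
    overlap = (ci_lo <= cert_hi) and (cert_lo <= ci_hi)
    print(f"CI = ({ci_lo:.2f}, {ci_hi:.2f}); overlaps certified interval: {overlap}")

With these numbers, the confidence interval runs from about 0.57 to 1.03 microns, so it overlaps the certified interval of 0.99 to 1.01 even though the sample mean falls 20% below the certified value.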

What values of k would be suitable for the instrument to be regarded as “deploymentally valid”? The standard is devoid of any insight into this critical value and its relationship to confidence. It does not explain what the interval-overlap requirement is supposed to accomplish, but if the confidence interval is part of it, it is an ad hoc form of hypothesis testing with an unstated significance level.
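
If the interval is read as a two-sided Student’s t confidence interval for ten measurements (again, my assumption rather than anything in the standard), the relationship between k and the confidence level is at least easy to tabulate:

    # How the unstated constant k maps onto conventional confidence levels for
    # n = 10 measurements (9 degrees of freedom), assuming a Student's t interval.
    from scipy import stats

    for conf in (0.90, 0.95, 0.99):
        k = stats.t.ppf(1 - (1 - conf) / 2, df=9)
        print(f"{conf:.0%} confidence -> k = {k:.2f}")
    # 90% -> 1.83, 95% -> 2.26, 99% -> 3.25

Choosing any of these values of k amounts to choosing a significance level, which is exactly what the standard declines to do.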

Is it all that important here whether the hypothesis of no difference between the standard reference value of 1 and the true mean of the instrument’s measurements can be rejected at some preset significance level? Should not the question be how much the disparities between the sample of ten measured values and the geometric-standard value would affect the efficacy of the measurements? An observed sample mean that is 20% too low does not lead to the rejection of the hypothesis that the instrument’s measurements are, in the long run, exactly correct. But with only ten measurements in the sample, that may tell us more about the lack of statistical power of the test than about the ability of the instrumentation to measure what it seeks to measure with suitable accuracy for the applications to which it is put.
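
A rough power calculation, again mine rather than the standard’s, illustrates the point. If the measurements are normally distributed and the standard error of the mean is the 0.10 microns used above, a two-sided, one-sample t test at the 5% level with n = 10 has well under an even chance of detecting a true bias of 0.20 microns:

    # Power of a two-sided, one-sample t test (alpha = 0.05, n = 10) to detect
    # a true bias of 0.20 microns when the standard error of the mean is 0.10.
    from scipy import stats

    n, alpha = 10, 0.05
    bias, se = 0.20, 0.10
    ncp = bias / se                                  # noncentrality parameter = 2.0
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)    # about 2.26

    # Probability of rejecting "no bias" when the instrument really reads 20% low
    power = stats.nct.sf(t_crit, n - 1, ncp) + stats.nct.cdf(-t_crit, n - 1, ncp)
    print(f"power to detect the 20% bias: {power:.2f}")

On these assumptions, the computed power falls far short of the conventional 0.8 benchmark, so a failure to reject the no-bias hypothesis says little about whether the instrument measures accurately enough for its intended applications.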

In sum, the standard’s section on “Developmental Validation (Mandatory)” mandates nothing that is not trivially obvious—the court already knows that it should look for support for the 3D scanning and image-manipulation methods in the scientific literature, and the standard does not reveal what the substance of this validation should be. “Deployment Validation (Mandatory)” is supposed to ensure that the laboratory is properly prepared to use a previously validated system for casework. It is of little use in a hearing on the general acceptance of the scanning system and the theories behind it. (One could argue that scientists would accept a system that a laboratory has rigorously pretested and shown to perform accurately, even with no other validation, but it is not clear that the standard describes an appropriate, rigorous pretesting procedure.)

Moreover, the standard explicitly excludes software from its reach, making it inapplicable to the Matchpoint image-manipulation tools that helped the examiner in Ghigliotti zero in on the regions that altered his opinion. The companion standard on software does not fill this gap, for it deals only with software that produces similarity scores or random-match probabilities. Finally, ASB 063's substantive requirements for "deployment validation" prior to laboratory implementation might well prohibit an examiner from going to the developer of hardware and software not yet adopted by his or her employer for help with locating features for further visual analysis, as occurred in Ghigliotti. But that is not responsive to the legal question of whether the developer's system is generally accepted as valid in the scientific community.

NOTES
  1. ANSI/ASB 063 is even more devoid of references. The entire bibliography consists of a webpage entitled “control chart.” There, attorneys, courts, or experts seeking to use the standard will discover that a “control chart is a graph used to study how a process changes over time.” That is great for quality control of instrumentation, but it is irrelevant to validation.
  2. Under § 4.1.2.4, "The plan for developmental validation study shall include the following:
    "a) the limitations of the procedure;
    "b) the conditions under which reliable results can be obtained;
    "c) critical aspects of the procedure that shall be controlled and monitored;
    "d) the ability of the resulting procedure to meet the needs of the given application."

Last updated: 12 June 2022