Saturday, February 13, 2016

Broken Glass: What Do the Data Show?

In Broken Glass, Mangled Statistics, I noted "a plethora of statistical issues" in ASTM E2926-13, a Standard Test Method for Forensic Comparison of Glass Using Micro X-ray Fluorescence (μ-XRF) Spectrometry, which is working its way through the process for inclusion on the OSAC Registry of Approved Standards. The questions I raised about the Standard's procedures and criteria for declaring matches between glass specimens rested on elementary statistical theory, not on data. Even if the ASTM's hypothesis-testing procedures are idiosyncratic or conceptually flawed, they could still have desirable properties in practice.

Some collections of glass have been used to test how well the matching rules perform for some of the variables measured in forensic testing. An FBI publication from 2009 offers the following summary:
Databases of refractive indices and/or chemical compositions of glass received in casework have been established by a number of crime laboratories (Koons et al. 1991). Although these glass databases are undeniably valuable, it should be noted that they may not be representative of the actual population of glass, and the distribution of glass properties may not be normal. Although these are not direct indicators of the rarity in any specific case, they can be used to show that the probability of a coincidental match is rare.

Koons and Buscaglia (1999) used the data from a chemical composition database and refractive index database to calculate the probability of a coincidental match. They estimated that ... the chance of finding a coincidental match in forensic glass casework using refractive index and chemical composition alone is 1 in 100,000 to 1 in 10 trillion, which strongly supports the supposition that glass fragments recovered from an item of evidence and a broken object with indistinguishable [refractive index] and chemical composition are unlikely to be from another source and can be used reliably to assist in reconstructing the events of a crime.

Range overlap on glass analytical data that include chemical composition data is considered a conservative standard. In one study, on a data set consisting of three replicate measurements each for 209 specimens, the range-overlap test discriminated all specimens, and all other statistical analysis-based tests performed worse (Koons and Buscaglia 2002).

Range-overlap tests, however, may achieve their high discrimination by indicating that two specimens from the same source are differentiable. Another study showed that when using a range-overlap test, the number of specimens differentiated that were actually from the same source may have been as high as seven percent (Bottrell et al. 2007).

The range-overlap approach, however, seems prudent given that other tests with higher thresholds for differentiation, such as t-tests with Welch modification (Curran et al. 2000) or Bayesian analysis (Walsh 1996), lower the number of specimens differentiated that were actually from the same source by worsening the ability to differentiate specimens that are genuinely different, a result that is unacceptable.

If I understand the argument, the author contends that high sensitivity is more important than high specificity. That makes sense for a screening test that will be followed by a more specific test, but in general, is it better to avoid falsely associating a defendant with crime-scene glass or to avoid falsely differentiating glass that actually comes from the same source? Any decision rule as to what is "indistinguishable" will generate a mix of false positives and false negatives. Should not the ASTM standards provide estimates of these risks, drawn from data (that might be representative of some relevant population), for each decision rule that the standards endorse or mandate?
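
To make the trade-off concrete, here is a minimal sketch in Python of how both error rates could be estimated for one decision rule. It is a toy simulation, not the procedure in ASTM E2926-13 or in any of the studies quoted above. Everything in it is an assumption for illustration: a simplified range-overlap rule (two specimens are differentiated if their replicate ranges fail to overlap on any one variable), independent and normally distributed measurements, and the hypothetical names `range_overlap_match`, `simulate_specimen`, and `estimate_error_rates`. The rates it produces depend entirely on those assumptions and say nothing about real glass; the point is only that such estimates can be tabulated for any rule once suitable data are in hand.

```python
import random


def range_overlap_match(known, questioned):
    """Simplified range-overlap rule (an assumption, not the ASTM text).

    `known` and `questioned` are lists of replicate measurements, one inner
    list per variable. The specimens are called indistinguishable only if
    the two ranges overlap on every variable.
    """
    for k_vals, q_vals in zip(known, questioned):
        if max(q_vals) < min(k_vals) or min(q_vals) > max(k_vals):
            return False  # ranges are disjoint on this variable
    return True


def simulate_specimen(rng, means, sd, replicates):
    """Draw synthetic replicate measurements for each variable."""
    return [[rng.gauss(m, sd) for _ in range(replicates)] for m in means]


def estimate_error_rates(trials=10000, variables=5, replicates=3,
                         shift=2.0, sd=1.0, seed=0):
    """Return (false_exclusion_rate, false_match_rate) for synthetic data.

    Same-source pairs share the same means; different-source pairs differ
    by `shift` on every variable. All values are hypothetical.
    """
    rng = random.Random(seed)
    same_means = [0.0] * variables
    other_means = [shift] * variables
    false_exclusions = 0
    false_matches = 0
    for _ in range(trials):
        known = simulate_specimen(rng, same_means, sd, replicates)
        same_source = simulate_specimen(rng, same_means, sd, replicates)
        other_source = simulate_specimen(rng, other_means, sd, replicates)
        if not range_overlap_match(known, same_source):
            false_exclusions += 1  # same source, yet differentiated
        if range_overlap_match(known, other_source):
            false_matches += 1  # different source, yet indistinguishable
    return false_exclusions / trials, false_matches / trials


if __name__ == "__main__":
    fe, fm = estimate_error_rates()
    print(f"false exclusion rate: {fe:.3f}, false match rate: {fm:.3f}")
```

Replacing the synthetic draws with measurements from a reference collection, and swapping in each rule a standard endorses, would yield the kind of paired estimates the question above asks for.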

