Saturday, January 15, 2022

Bones of Contention: A Standard for Analyzing Skeletal Trauma in Forensic Anthropology

The Academy Standards Board (ASB) of the American Academy of Forensic Sciences (AAFS) posted the second proposed draft of a "Standard for Analyzing Skeletal Trauma in Forensic Anthropology" for public comment. The standard does not go far toward standardizing procedures or showing that the procedures to which it applies have been scientifically tested. Of course, it could well be that ample, well designed studies have demonstrated that forensic anthropologists can consistently and accurately classify skeletal defects in human remains according to the categories the standard mentions. But the standard contains no bibliography and no citations to show that this is the case. 

It contains some negative injunctions and a few positive suggestions about reporting -- for example:

  • Forensic anthropologists shall not determine cause or manner of death.
  • Practitioners shall not estimate the temperature or duration of heat exposure based on thermal defects to bone.
  • Practitioners may report the minimum number of traumatic events (e.g., blunt impacts, projectile entry defects, or sharp defects) observed skeletally, but shall not report a definitive maximum number of impacts, as skeletal trauma evidence may not reflect all impacts to the body.
  • When a suspect tool is submitted for analysis, similarities between the tool and defect may be reported; conclusions shall be reported in terms of an exclusion or failure to exclude.

As such, ASB 147-21 is not without any redeeming legal value. Nevertheless, it does not articulate any analytical process by which the classifications it calls for should be made (cf. "vacuous standards"); it requires no reporting of the uncertainty in this process; it does not contemplate the possibility of evidence-based rather than conclusion-based statements of the implications of the data; and it refers to an all-inclusive list of methods as "acceptable." If I may elaborate:


Is "Interpretation" Limited to an Opinion on the Inference (Conclusion) from the Data?

The revision defines "trauma interpretation" as "Opinion regarding the mechanism of, timing, direction of impact(s) or minimum number of impacts associated with skeletal defect(s) based on quantitative and/or qualitative observations." The phrase "based on ... observations" indicates that the opinion expresses a belief in the truth, falsity, or probability of an inference being drawn from the data. Interpretation should include the possibility of describing the strength of the evidence in favor of the inference rather than opining on the truth, falsity, or probability of the conclusion itself. In addition, if the opinion-statement is an assertion that the hypothesis about what happened is true or false (either categorically or to some probability), it is not just based on the data but on a prior probability for the hypothesis as well.
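
The point about priors can be made concrete with Bayes' rule in odds form. The sketch below is mine, not the standard's, and every number in it is hypothetical; it simply shows that the same data (the same likelihood ratio) yield very different opinions about the hypothesis depending on the prior probability the analyst brings to the case.

```python
# A toy illustration (all numbers hypothetical): an opinion that a
# hypothesis is probably true depends on a prior, not just on the data.

def posterior_odds(prior_odds: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds x LR."""
    return prior_odds * likelihood_ratio

def odds_to_prob(odds: float) -> float:
    """Convert odds to a probability."""
    return odds / (1 + odds)

# Suppose the skeletal observations are 20 times more probable if the
# hypothesized mechanism is true than if it is false: LR = 20.
lr = 20.0

for prior in (0.5, 0.05):  # two different prior probabilities
    prior_odds = prior / (1 - prior)
    post = odds_to_prob(posterior_odds(prior_odds, lr))
    print(f"prior={prior:.2f} -> posterior={post:.3f}")
    # prints posterior 0.952 for prior 0.50, but only 0.513 for prior 0.05
```

The identical evidence supports a confident conclusion in one case and a coin flip in the other, which is why an opinion about the truth of the hypothesis cannot rest on the observations alone.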

Despite this definition, the standard sanctions "interpretation" in the form of rudimentary statements about the extent to which the data prove the hypothesis in question. Section 6 notes that "Trauma interpretation shall be clearly identified in the report using terms such as ‘indicative of’ and ‘consistent with’ or by using a subheading titled ‘Interpretation.’" These phrases have their problems, but they are one way of referring to the probability of the evidence given the truth of certain hypotheses rather than vice versa.

Is Interpretation Based on Non-scientific Evidence and Inference?

The revision introduces the following (non)criteria for deciding that blasts or explosions caused skeletal trauma: "Blasts/explosive events often cause blunt (including concussive) and projectile trauma to the body. When the trauma pattern and circumstantial information support a blast event, the trauma mechanism should be classified as 'blast trauma'." The undefined notion of "support" is too vague to give any guidance. Is "consistent with" considered "support"? Let's hope not -- a pattern can be "consistent with" a hypothesis (it could occur when the hypothesis is true) but much more probable under the opposite hypothesis.
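
A toy calculation (with purely hypothetical numbers of my own invention) makes the distinction concrete: consistency is a statement about one conditional probability, while support compares two of them.

```python
# Hypothetical numbers for illustration only: a trauma pattern can be
# "consistent with" a blast yet count as evidence against one.

p_pattern_given_blast = 0.10     # pattern occurs in 10% of blast cases
p_pattern_given_no_blast = 0.60  # but in 60% of non-blast cases

# The pattern is "consistent with" a blast -- it can occur when the
# blast hypothesis is true ...
assert p_pattern_given_blast > 0

# ... yet the likelihood ratio favors the non-blast hypothesis.
lr_blast_vs_no_blast = p_pattern_given_blast / p_pattern_given_no_blast
print(f"LR(blast : no blast) = {lr_blast_vs_no_blast:.2f}")  # 0.17
```

An LR below 1 means the pattern, though "consistent with" a blast, actually supports the rival mechanism -- exactly the trap that an undefined notion of "support" invites.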

And then there is the green light this recommendation gives to presenting a conclusion based on nonscientific "circumstantial evidence" as if it were based on expertise involving the skeletal evidence. Knowing that a blast occurred can drive the conclusion that the damage to the skeleton is "blast trauma." Should there also be a report on the skeletal evidence from an analyst blinded to the other information uncovered in the investigation?

Is Everything Acceptable?

ASB 147-21 states that "Skeletal trauma shall be examined. Acceptable methods to examine trauma include gross, microscopic, radiographic, and other analytical methods." This formulation deems every conceivable analytical method "acceptable," no matter how poorly conceived it may be. Labeling everything as "acceptable" is troublesome in a standard that does not include criteria and procedures for performing the analysis and that does not lead the reader to any evidence of the reliability and validity of the undefined "analytical procedures."

Of course, forensic anthropologists know that some procedures do not work well, and only an outlier would use them. The drafters of ASB 147-21 undoubtedly appreciate the need for suitable methods (and hence prohibit certain conclusions that cannot be drawn with any existing method). Well-motivated and informed forensic anthropologists will not be led astray if they consult the standard. But outliers do appear in court. Remember Louise Robbins. Unless the dubious method yields one of the explicitly prohibited statements in this standard, the outlier witnesses could maintain that they have proceeded exactly as the standard requires. Standards with this potential for abuse should be reformed. They should strive to standardize the methods they govern, and they should state what is known about the accuracy and reliability of these methods.

Monday, January 3, 2022

Fitting "Physical Fit" into the Courtroom

The logic of piecing together fragments of broken glass, torn tape, cut paper, and the like seems simple enough. \1/ If the pieces fit in all their details at the edges, and if all surface marks or impressions that would cross an edge also align nicely, one has circumstantial evidence that they were once part of the same object.

The strength of this evidence for a single source depends on the extent and detail of the concordance between the recovered pieces. A physical fit between two halves of a broken plank of wood is powerful evidence for the hypothesis that the two pieces resulted from breaking this one plank. But if the pieces are weathered and the splintered edges dulled, the physical fit will be less precise and less supportive of the claim that they came from the same original plank.

At the other extreme, if two pieces are plainly discordant, they might have come from different places on the same object, with the intermediate pieces being missing. Or they might have come from different objects entirely. Consider tearing off five pieces of duct tape from the same roll of tape and comparing the edges of the first and the last segments. The detailed structure of the edges should not be complementary. Likewise, tearing segments of tape from five different rolls should result in a mismatch between the first and the fifth segment.

Criminalists or materials experts can be extremely helpful in examining the recovered pieces of objects to determine the degree of physical fit -- that is, in elucidating how well the edges fit together and the extent to which a mark on the surface of one piece lines up with a mark on the other when the pieces are aligned. But how they should describe their findings seems to be muddled in forensic-science standards. This posting describes the current vocabulary and argues that it is artificial and a departure from the ordinary meaning of the term "fit." It then outlines better alternatives for reporting the results of an investigation into physical fit.

I. The Standard Approach

Let’s look at a couple of ASTM standards. E2225-19a (Standard Guide for Forensic Examination of Fabrics and Cordage) instructs that “[i]f a physical match is found, it should be reported in a manner that will demonstrate that the two or more pieces of material were at one time a continuous piece of fabric or cordage” (§ 7.2.2). This standard treats the “physical match” as an observable property of the specimens (concordant edges and surface marks) that is conclusive of the hypothesis of a single source (the inference from the data).

ASTM E3260-21 (Standard Guide for Forensic Examination and Comparison of Pressure Sensitive Tapes), on the other hand, characterizes “physical fit” not as a property of the materials, but as a “type of examination that can be performed” (§ 10.5.1). This “conclusive type of examination ... is a physical end match.” Id. It “involves the comparison of edges, fabric (if present), surface striae, and other surface irregularities between samples in which corresponding features provide distinct characteristics that indicate the samples were once joined at the respective separated edges.” Of course, "distinct characteristics that indicate the samples were once joined at the respective separated edges” are not necessarily "conclusive," making this definition of "physical fit" as a "type of examination" puzzling. The intent, it seems, is to define a physical fit examination (rather than a physical fit) as one that is capable of conclusively proving that the pieces were once joined together.

A Proposed New Standard Guide for the Collection, Analysis and Comparison of Forensic Glass Samples, ASTM WK72932, released for public comment late last year, states that “broken objects can be reassembled to their original configuration ... called a ‘physical fit’” (§ 11.1). But a physical fit is the original configuration of a broken object only if the pieces come from that original object, and this origin story is not true just because a standard defines "physical fit" that way. The evidence from the examination may be that the separate pieces fit together extremely well. If so, the conclusion is that they were once together within or as a unitary object. This conclusion may well be true, but one cannot decide, by the fiat of a definition, that the pieces that are observed to fit together well have been realigned as they once were. Yet, a later section similarly asserts that “[a] glass physical fit is a determination that two or more pieces of glass were once part of the same broken glass object” (§ 11.2.8). This effort to define "physical fit" as inherently conclusive prompted eleven lawyers (including me) \2/ to caution ASTM that “[t]he hypothesis or conclusion that fragments come from the same object is not a physical fit. It is an inference drawn from the observations that produce the designation of a physical fit.”

Still more recently, an OSAC subcommittee released a Standard Guide for Forensic Physical Fit Examination (OSAC 2022-S-0015) for public comment before it is delivered to ASTM for consideration there. This proposed standard goes off in another direction. It equates a “physical fit” with the examiner’s state of mind about a hypothetical ensemble of experiments:

13.1 Physical Fit
13.1.1 The items that have been broken, torn, separated, or cut exhibit physical features that realign in a manner that is not expected to be replicated.
13.1.1.1 Physical Fit is the highest degree of association between items. It is the opinion that the observations provide the strongest support for the proposition that the items originated from the same source as opposed to the proposition they originated from different sources.

13.2 No Physical Fit
13.2.1 The items correspond in observed class characteristics, but exhibit physical features that do not realign, or they realign in a manner that could be replicated.
13.2.2 Alternatively, the items can exhibit physical features that partially realign, display simultaneous similarities and differences, show areas of discrepancy (e.g., warped areas, burned areas, missing pieces), or have insufficient individual characteristics that hinder the ability to determine the presence or absence of a physical fit.

Statisticians will notice the shift from (1) the incompletely expressed frequentist idea of an infinite sequence of trials in which different objects A and B are broken and the pieces from A never align with those from B to (2) the likelihoodist conception of support for the same-source hypothesis. But that implicit change in the theory of inference is hardly a cardinal sin in this context. If the probability of a fit at least as good as the one observed is practically zero for different sources, and if the probability of such a fit for the same source is much higher, then the support (the log-likelihood ratio) is very high.
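
The arithmetic of that last sentence is easy to exhibit. The numbers below are hypothetical placeholders of my own choosing, not figures from any study or standard; they just show how a near-zero probability of such a fit under the different-sources hypothesis translates into large support on a log-likelihood scale.

```python
import math

# Hypothetical numbers: if a fit this good is nearly impossible for
# pieces of different objects but quite probable for pieces of the same
# object, the support (log-likelihood ratio) for same-source is large.

p_fit_given_same = 0.8        # assumed, for illustration
p_fit_given_different = 1e-6  # "practically zero"

support = math.log10(p_fit_given_same / p_fit_given_different)
print(f"support (log10 LR) = {support:.1f}")  # about 5.9
```

On this scale, each unit is a factor of ten, so a support of roughly 5.9 corresponds to observations nearly a million times more probable under the same-source hypothesis than under the different-sources hypothesis.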

Nevertheless, defining physical fit as a categorical opinion rather than a more variable degree of congruency that generates the opinion — and dumping everything short of a perceived fit into the category of “no physical fit” — deviates from the common understanding that physical fit comes in degrees. There can be a remarkably great fit, a pretty good fit, and so on, down to a blatant misfit. The question the examiner must answer, at least intuitively, before the fit/no-fit classification can be made is just how well the pieces fit together. Fit is not a uniform degree of association that springs into existence exactly when a particular examiner is convinced that no other source could account for the complexity and extent of the fit. There is no such thing as “the strongest support.” One can always conceive of a situation with still stronger support (because a fracture or other separation of the pieces could generate an even richer set of irregularities in the edges).

The current approach of defining a physical fit as a single source for the pieces and calling everything else “no fit” does not create a vocabulary that judges or jurors will easily understand. A vocabulary in which physical congruency (fit) lies on a continuum — and that then addresses the inference that should be drawn from the observations — is more transparent. The definitions in the standards collapse the two steps of data acquisition and inference into one.

II. Inference: From Data to Conclusions

So how should examiners answer the question of how well the pieces fit together? An examination for fit yields multidimensional, spatial data. An examiner could present photographs of the aligned edges and surfaces and highlight the concordant and discordant features. Although the highlighting involves some interpretative thinking, I have called a courtroom presentation that stops at this point "features-only testimony." \3/ It is appropriate when examiners have no special expertise at interpreting how strongly their results support the same-source hypothesis. If they are no better than lay judges and jurors at discerning how improbable the features are in the hypothetical cases of repeatedly breaking the same object, it could be argued that these witnesses should not try to interpret the results any further. Such interpretation would not actually assist the trier of fact, as required by Federal Rule of Evidence 702.

For example, a few days ago, a forensic scientist told me of a case in which a criminalist was able to reassemble pieces of glass recovered at the site of a hit-and-run accident so that they fit neatly into the metal holder of a side rear mirror on the suspect’s car that was missing its glass. That’s good detective work, but did the criminalist have any special insights to offer into the obvious implications of this solution to the jigsaw puzzle? (The work was not presented in court because the crime laboratory’s management was concerned that there was no written protocol for pasting mirror fragments back in place. As the scientist observed, that's silly. The evidence practically speaks for itself, and its message is the same with or without a written protocol.)

Nevertheless, let’s assume that examiners do have specialized skill at interpreting the findings about the alignment of the features. The ASTM and OSAC-proposed standards ignore the possibility of a qualitative expression of relative support — for example, “It is far more likely to get the detailed alignment of the features I just showed you if the pieces were broken parts of the same object than if they were from different objects.” Or, similarly, “The detailed alignment gives very strong support to the idea that the pieces broke off of the same object as opposed to two different objects.”

As Part I showed, the standards advocate a fit/no-fit classification in which “fit” is either a statement about the probability of the same-source hypothesis (that the pieces had to have come from the same object) or a statement of belief in the hypothesis (“my opinion is that they were together in the same object — that’s what makes it a physical fit”). No-fit does not have a comparably sharp meaning. It could mean anything from no realistic possibility that the pieces were once contiguous parts of the same object to “partial fit features [that] increase the significance of the finding” (OSAC 2022-S-0015 § 13.2.4).

A more straightforward and comprehensible approach would be to have a three-tiered reporting scale for the support the data give to the same-source hypothesis. What is now called a physical fit would be designated a highly probative physical fit (that is, a physical fit that strongly supports the same-source hypothesis). “Partial fit features” would be described as a limited fit (that gives some support to the same-source hypothesis). Finally, an obvious mismatch could be called a misfit (which strongly supports the conclusion that the pieces were never adjacently located on the same object).
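
One way to picture the three-tiered scale is as cutoffs on the examiner's (subjective) likelihood ratio for same source versus different sources. The sketch below is only a schematic of my own; the threshold values are invented placeholders, not numbers drawn from any standard or study.

```python
def report_category(lr: float) -> str:
    """Map a subjective likelihood ratio (same source vs. different
    sources) onto the proposed three-tier reporting scale.
    The cutoffs (1000 and 1/1000) are invented placeholders."""
    if lr >= 1000:
        return "highly probative physical fit"
    if lr <= 1 / 1000:
        return "misfit"
    return "limited fit"

print(report_category(1e6))   # highly probative physical fit
print(report_category(50))    # limited fit
print(report_category(1e-5))  # misfit
```

However the cutoffs are set, the virtue of the scheme is that each verbal category openly reports a degree of support for the same-source hypothesis instead of smuggling the conclusion into the definition of "fit."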

This tripartite classification is an imperfect way to express an underlying likelihood ratio formed from subjective probabilities. Whether better results would be achieved if analysts were forced to articulate their probabilities, either quantitatively or in the qualitative way mentioned earlier, is an interesting question. But the three-tiered reporting scale is closer to the current practice and seems feasible. \4/ It offers a framework for a better standard on reporting the results of a physical fit examination. Or so it seems to me — those who disagree are encouraged to hit the comment button.

NOTES

  1. But see Forensic Science’s Latest Proof of Uniqueness, Dec. 22, 2013, http://for-sci-law.blogspot.com/2013/12/forensic-sciences-latest-proof-of.html.
  2. The other commenters were Alyse Bertenthal, Amanda Black, Jennifer Friedman, Julia Leighton, Kate Philpott, Emily Prokesch, Matt Redle, Andrea Roth, Maneka Sinha, and Pate Skene.
  3. David H. Kaye et al., The New Wigmore on Evidence: Expert Evidence (2d ed. 2011).
  4. When there is a mismatch, testimony about a physical match has little value. Other features than the alignment of edges and surface markings will need to be studied if the expert is to shed light on whether the pieces came from a single object. The current and proposed standards are clear on this point.