Monday, August 14, 2017

PCAST's Review of Firearms Identification as Reported in the Press

According to the Washington Post,
The President’s Council of Advisors on Science and Technology [PCAST] said that only one valid study, funded by the Pentagon in 2014, established a likely error rate in firearms testing at no more than 1 in 46. Two less rigorous recent studies found a 1 in 20 error rate, the White House panel said. 1/
The impression that one might receive from such reporting is that errors (false positives? false negatives?) occur in about one case in every 20, or maybe one in 46.

Previous postings have discussed the fact that a false-positive probability is not generally the probability that an examiner who reports an association is wrong. Here, I will indicate how well the numbers in the Washington Post correspond to statements from PCAST. Not all of the Post's figures can be found in the section on "Firearms Analysis" (§ 5.5) of the September 2016 PCAST report, and that section supplies other figures that went unreported.
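To see why those two probabilities can differ sharply, here is a minimal sketch of the Bayes'-rule arithmetic (in Python). Every number in it -- the false-positive rate, the sensitivity, and the fraction of comparisons that truly involve the same source -- is a hypothetical chosen for illustration, not a figure from PCAST or from any study:

    # Hypothetical illustration: how a false-positive rate translates into the
    # chance that a *reported* association is wrong. All inputs are made up.
    def prob_reported_association_is_wrong(false_positive_rate, sensitivity, prior_same_source):
        """P(different source | association reported), by Bayes' rule."""
        p_report_same = sensitivity * prior_same_source
        p_report_diff = false_positive_rate * (1 - prior_same_source)
        return p_report_diff / (p_report_same + p_report_diff)

    # Example: a 1-in-46 false-positive rate, 90% sensitivity (assumed), and
    # same-source pairs making up 10% of the comparisons an examiner performs.
    print(prob_reported_association_is_wrong(1/46, 0.90, 0.10))  # roughly 0.18

With these made-up inputs, nearly one reported association in five would be mistaken even though the false-positive rate is only about 2 percent; different inputs would shrink or widen that gap.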

But First, Some Background

By way of background, the 2016 report observes that
AFTE’s “Theory of Identification as it Relates to Toolmarks”—which defines the criteria for making an identification—is circular. The “theory” states that an examiner may conclude that two items have a common origin if their marks are in “sufficient agreement,” where “sufficient agreement” is defined as the examiner being convinced that the items are extremely unlikely to have a different origin. In addition, the “theory” explicitly states that conclusions are subjective. 2/
A number of thoughtful forensic scientists agree that such criteria are opaque or circular. 3/ Despite its skepticism of the Association of Firearm and Tool Mark Examiners' criteria for deciding that components of ammunition come from a particular, known gun, PCAST acknowledged that
relatively recently ... its validity [has] been subjected to meaningful empirical testing. Over the past 15 years, the field has undertaken a number of studies that have sought to estimate the accuracy of examiners’ conclusions.
Unfortunately, PCAST finds almost all these studies inadequate. "While the results demonstrate that examiners can under some circumstances identify the source of fired ammunition, many of the studies were not appropriate for assessing scientific validity and estimating the reliability because they employed artificial designs that differ in important ways from the problems faced in casework." 4/ "Specifically, many of the studies employ 'set-based' analyses, in which examiners are asked to perform all pairwise comparisons within or between small sample sets." Some of these studies -- namely, those with "closed-set" designs -- "may substantially underestimate the false positive rate." The only valid way to study validity and reliability, the report insists, is with experiments that require examiners to examine pairs of items in which the existence of a true association in each pair is independent of an association in every other pair.

The False-positive Error Rate in the One Valid Study

According to the Post, the "one valid study ... established a likely error rate in firearms testing at no more than 1 in 46." This sentence is correct. PCAST reported a "bound on rate" of "1 in 46." 5/ This figure is the upper bound of a one-sided 95% confidence interval. Of course, the "true" error rate -- the one that would exist if there were no random sampling error in the selection of examiners -- could be much larger than this upper bound. Or, it could be much smaller. 6/ The Post omits the statistically unbiased "estimated rate" of "1 in 66" given in the PCAST report.
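For readers curious about where numbers like "1 in 66" and "1 in 46" come from, the sketch below shows one standard way to turn a study's raw counts into a point estimate and a one-sided 95% upper confidence bound. The counts here are placeholders, not the study's data, and the Clopper-Pearson (exact binomial) bound is an assumption on my part; the report's table does not spell out the interval method:

    # Sketch: point estimate and one-sided 95% upper confidence bound on a
    # false-positive rate. The counts are placeholders, not any study's data,
    # and Clopper-Pearson is an assumed choice of interval method.
    from scipy.stats import beta

    def upper_bound_95(errors, comparisons):
        """One-sided 95% Clopper-Pearson upper bound on the error rate."""
        return beta.ppf(0.95, errors + 1, comparisons - errors)

    errors, comparisons = 10, 500    # hypothetical counts
    estimate = errors / comparisons
    bound = upper_bound_95(errors, comparisons)
    print(f"estimated rate: 1 in {1/estimate:.0f}; 95% upper bound: 1 in {1/bound:.0f}")
    # For these made-up counts: estimated rate 1 in 50, upper bound roughly 1 in 31.

The upper bound always corresponds to a higher error rate than the point estimate; how much higher depends mainly on how many comparisons the study included.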

The 1 in 20 False-positive Error Rate for "Less Rigorous Recent Studies"

The statement that "[t]wo less rigorous recent studies found a 1 in 20 error rate" seems even less complete. The report mentioned five other studies. Four "set-to-set/closed" studies suggested error rates of 1 in 5103 (1 in 1612 for the 95% upper bound). Presumably, the Post did not see fit to mention all the "less rigorous" studies because these closed-set studies were methodologically hopeless -- at least, that is the view of them expressed in the PCAST report.

The Post's "1 in 20" figure apparently came from PCAST's follow-up report of 2017. 7/ The addendum refers to a re-analysis of a 14-year-old study of eight FBI examiners co-authored by Stephen Bunch, who "offered an estimate of the number of truly independent comparisons in the study and concluded that the 95% upper confidence bound on the false-positive rate in his study was 4.3%." 8/ This must be one of the Post's "two less rigorous recent studies." In the 2016 report, PCAST identified it as a "set-to-set/partly open" study with an "estimated rate" of 1 in 49 (1 in 21 for the 95% upper bound). 9/

The second "less rigorous" study is indeed more recent (2014). The 2016 report summarizes its findings as follows:
The study found 42 false positives among 995 conclusive examinations. The false positive rate was 4.2 percent (upper 95 percent confidence bound of 5.4 percent). The estimated rate corresponds to 1 error in 24 cases, with the upper bound indicating that the rate could be as high as 1 error in 18 cases. (Note: The paper observes that “in 35 of the erroneous identifications the participants appeared to have made a clerical error, but the authors could not determine this with certainty.” In validation studies, it is inappropriate to exclude errors in a post hoc manner (see Box 4). However, if these 35 errors were to be excluded, the false positive rate would be 0.7 percent (confidence interval 1.4 percent), with the upper bound corresponding to 1 error in 73 cases.) 10/
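The arithmetic behind the quoted figures can be checked from the reported counts of 42 false positives in 995 conclusive examinations. The short sketch below assumes a one-sided 95% Clopper-Pearson bound, since the excerpt does not say exactly which interval method PCAST used:

    # Checking the quoted figures: 42 false positives in 995 conclusive examinations.
    # The Clopper-Pearson bound is an assumption; PCAST's exact method is not stated
    # in the passage quoted above.
    from scipy.stats import beta

    errors, n = 42, 995
    rate = errors / n                                # about 0.042, i.e., 1 in 24
    bound = beta.ppf(0.95, errors + 1, n - errors)   # about 0.054, i.e., 1 in 18
    print(f"rate {rate:.3f} (1 in {1/rate:.0f}); upper bound {bound:.3f} (1 in {1/bound:.0f})")

At least under that assumption, the results line up with the 4.2 percent and 5.4 percent figures in the excerpt.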
Another Summary

Questions of which studies count, how much they count, and what to make of their limitations are intrinsic to scientific literature reviews. Journalists limited to a few sentences hardly can be expected to capture all the nuances. Even so, a slightly more complete summary of the PCAST review might read as follows:
The President’s Council of Advisors on Science and Technology said that an adequate body of scientific studies does not yet exist to show that toolmark examiners can associate discharged ammunition with a specific firearm with very high accuracy. Only one rigorous study, with one type of gun, funded by the Defense Department, has been conducted. It found that examiners who reached firm conclusions made positive associations about 1 time in 66 when examining cartridge cases fired from different guns. Less rigorous studies have found both higher and lower false-positive error rates for conclusions of individual examiners, the White House panel said.
NOTES
  1. Spencer S. Hsu & Keith L. Alexander, Forensic Errors Trigger Reviews of D.C. Crime Lab Ballistics Unit, Prosecutors Say, Wash. Post, Mar. 24, 2017.
  2. PCAST, at 104 (footnote omitted).
  3. See, e.g., Christophe Champod, Chris Lennard, Pierre Margot & Milutin Stoilovic, Fingerprints and Other Ridge Skin Impressions 71 (2016) (quoted in David H. Kaye, "The Mask Is Down": Fingerprints and Other Ridge Skin Impressions, Forensic Sci., Stat. & L., Aug. 11, 2017, http://for-sci-law.blogspot.com/2017/08/the-mask-is-down-fingerprints-and-other.html)
  4. PCAST, at 105.
  5. Id. at 111, tbl. 2.
  6. The authors of the study had this to say about the false-positive errors:
    [F]or the pool of participants used in this study the fraction of false positives was approximately 1%. The study was specifically designed to allow us to measure not simply a single number from a large number of comparisons, but also to provide statistical insight into the distribution and variability in false-positive error rates. The result is that we can tell that the overall fraction is not necessarily representative of a rate for each examiner in the pool. Instead, examination of the data shows that the rate is a highly heterogeneous mixture of a few examiners with higher rates and most examiners with much lower error rates. This finding does not mean that 1% of the time each examiner will make a false-positive error. Nor does it mean that 1% of the time laboratories or agencies would report false positives, since this study did not include standard or existing quality assurance procedures, such as peer review or blind reanalysis. What this result does suggest is that quality assurance is extremely important in firearms analysis and that an effective QA system must include the means to identify and correct issues with sufficient monitoring, proficiency testing, and checking in order to find false-positive errors that may be occurring at or below the rates observed in this study.
    David P. Baldwin, Stanley J. Bajic, Max Morris, and Daniel Zamzow, A Study of False-Positive and False-Negative Error Rates in Cartridge Case Comparisons, May 2016, at 18, available at https://www.ncjrs.gov/pdffiles1/nij/249874.pdf.
  7. PCAST, An Addendum to the PCAST Report on Forensic Science in Criminal Courts, Jan. 6, 2017.
  8. Id. at 7.
  9. PCAST, at 111, tbl. 2.
  10. Id. at 95 (footnote omitted).

Friday, August 11, 2017

"The Mask Is Down": Fingerprints and Other Ridge Skin Impressions

The mask is down, and this should lead to heated debates in the near future as many practitioners have not yet realized the earth-shattering nature of the changes. (Preface, at xi).
If you thought that fingerprint identification is a moribund and musty field, you should read the second edition of Fingerprints and Other Ridge Skin Impressions (FORSI for short), by Christophe Champod, Chris Lennard, Pierre Margot, and Milutin Stoilovic.

The first edition "observed a field that is in rapid progress on both detection and identification issues." (Preface 2003). In the ensuing 13 years, "the scientific literature in this area has exploded (over 1,000 publications) and the related professions have been shaken by errors, challenges by courts and other scientists, and changes of a fundamental nature related to previous claims of infallibility and absolute individualization." (Preface 2016, at xi).

The Scientific Method

From the outset, the authors -- all leading researchers in forensic science -- express dissatisfaction with "standard, shallow statements such as 'nature never repeats itself'" and "the tautological argument that every entity in nature is unique." (P. 1). They also dispute the claim, popular among latent print examiners, that the "ACE-V protocol" is a deeply "scientific method":
ACE-V is a useful mnemonic acronym that stands for analysis, comparison, evaluation, and verification ... . Although [ACE-V was] not originally named that way, pioneers in forensic science were already applying such a protocol (Heindl 1927; Locard 1931). ... It is a protocol that does not, in itself, give details as to how the inference is conducted. Most authors stay at this descriptive stage and leave the inferential or decision component of the process to "training and experience" without giving any more guidance as to how examiners arrive at their decisions. As rightly highlighted in the NRC report (National Research Council 2009, pp. 5-12): "ACE-V provides a broadly stated framework for conducting friction ridge analyses. However, this framework is not specific enough to qualify as a validated method for this type of analysis." Some have compared the steps of ACE-V to the steps of standard hypothesis testing, described generally as the "scientific method" (Wertheim 2000; Triplett and Cooney 2006; Reznicek et al. 2010; Brewer 2014). We agree that ACE-V reflects good forensic practice and that there is an element of peer review in the verification stage ... ; however, draping ACE-V with the term "scientific method" runs the risk of giving this acronym more weight than it deserves. (Pp. 34-35).
Indeed, it is hard to know what to make of claims that "standard hypothesis testing" is the "scientific method." Scientific thinking takes many forms, and the source of its spectacular successes is a set of norms and practices for inquiry and acceptance of theories that go beyond some general steps for qualitatively assessing how similar two objects are and what the degree of similarity implies about a possible association between the objects.

Exclusions as Probabilities

Many criminalists think of exclusions as logical deductions. They think, for example, that deductively valid reasoning shows that the same finger could not possibly be the source of two prints that are so radically different in some feature or features. I have always thought that exclusions are part of an inductive logical argument -- not, strictly speaking, a deductive one. 1/ However, FORSI points out that if the probability is zero that "the features in the mark and in the submitted print [are] in correspondence, meaning within tolerances, if these have come from the same source," then "an exclusion of common source is the obvious deductive conclusion ... ." (P. 71). This is correct. Within a Boolean logic (one in which the truth values of all propositions are 1 or 0), exclusions are deductions, and deductive arguments are certainly valid or invalid.

But the usual articulation of what constitutes an exclusion (with probability 1) does not withstand analysis. Every pair of images has some difference in every feature (even when the images come from the same source). How does the examiner know (with probability 1) that a difference "cannot be explained other than by the hypothesis of different sources"? (P. 70). In some forensic identification fields, the answer is that the difference must be "significant." 2/ But this is an evasion. As FORSI explains,
In practice, the difficulty lies in defining what a "significant difference" actually is (Thornton 1977). We could define "significant" as being a clear difference that cannot be readily explained other than by a conclusion that the print and mark are from different sources. But it is a circular definition: Is it "significant" if one cannot resolve it by another explanation than a different source, or do we conclude to an exclusion because of the "significant" difference? (P. 71).
Fingerprint examiners have their own specialized vocabulary for characterizing differences in a pair of prints. FORSI defines the terms "exclusion" and "significant" by invoking a concept familiar (albeit unnecessary) in forensic DNA analysis -- the match window within which two measurements of what might be the same allele are said to match. In the fingerprint world, the analog seems to be "tolerance":
The terms used to discuss differences have varied over the years and can cause confusion (Leo 1998). The terminology is now more or less settled (SWGFAST 2013b). Dissimilarities are differences in appearance between two compared friction ridge areas from the same source, whereas discrepancy is the observation of friction ridge detail in one impression that does not exist in the corresponding area of another impression. In the United Kingdom, the term disagreement is also used for discrepancy and the term explainable difference for dissimilarity (Forensic Science Regulator 2015a).

A discrepancy is then a "significant" difference and arises when the compared features are declared to be "out of tolerance" for the examiner, tolerances as defined during the analysis. This ability to distinguish between dissimilarity (compatible to some degree with a common source) and discrepancy (meaning almost de facto different sources) is essential and relies mainly on the examiner's experience. ... The first key question ... then becomes ... :
Q1. How probable is it to observe the features in the mark and in the submitted print in correspondence, meaning within tolerances, if these have come from the same source? (P. 71).
The phrase "almost de facto different sources" is puzzling. "De facto" means in fact as opposed to in law. Whether a print that is just barely out of tolerance originated from the same finger always is a question of fact. I presume "almost de facto different sources" means the smallest point at which probability of being out of tolerance is so close to zero that we may as well round it off to exactly zero. An exclusion is thus a claim that it is practically impossible for the compared features to be out of tolerance when they are in an image from the same source.

But to insist that this probability is zero is to violate "Cromwell's Rule," as the late Dennis Lindley called the admonition to avoid probabilities of 0 or 1 for empirical claims. As long as there is a non-zero probability that the perceived "discrepancy" could somehow arise -- as there always is if only because every rule of biology could have a hitherto unknown exception -- deductive logic does not make an exclusion a logical certainty. Exclusions are probabilistic. So are "identifications" or "individualizations."

Inclusions as Probabilities

At the opposite pole from an exclusion is a categorical "identification" or "source attribution." Categorical exclusions are statements of probability -- the examiner is reporting "I don't see how these differences could exist for a common source" -- from which it follows that the hypothesis of a different source has a high probability (not that it is deductively certain to be true). Likewise, categorical "identifications" are statements of probability -- now the examiner is reporting "I don't see how all these features could be as similar as they are for different sources" -- from which it follows that the hypothesis of a common source has a high probability (not that it is certain to be true). This leaves a middle zone of inclusions in which the examiner is not confident enough to declare an identification or an exclusion and the examiner makes no effort to describe its probative value -- beyond saying "It is not conclusive proof of anything."

The idea that examiners report all-but-certain exclusions and all-but-certain inclusions ("identifications") has three problems. First, how should examiners get to these states of subjective near-certainty? Second, each report seemed to involve the probability of the observed features under only a single hypothesis -- different source for exclusions and same source for inclusions. Third, everything between the zones of near-certainty gets tossed in the dust bin.

I won't get into the first issue here, but I will note FORSI's treatment of the latter two. FORSI seems to accept exclusions (in the sense of near-zero probabilities for the observations given the same-source hypothesis) as satisfactory; nevertheless, for inclusions, it urges examiners to consider the probability of the observations under both hypotheses. In doing so, it adopts a mixed perspective, using a match-window p-value for the exclusion step and a likelihood ratio for an inclusion. Some relevant excerpts follow:
The above discussion has considered the main factors driving toward an exclusion (associated with question Q1); we should now move to the critical factor that will drive toward an identification, with this being the specificity of the corresponding features. ...

Considerable confusion exists among laymen, indeed also among fingerprint examiners, on the use of words such as match, unique, identical, same, and identity. Although the phrase "all fingerprints are unique" has been used to justify fingerprint identification opinions, it is no more than a statement of the obvious. Every entity is unique, because an entity can only be identical to itself. Thus, to say that "this mark and this print are identical to each other" is to invoke a profound misconception; the two might be indistinguishable, but they cannot be identical. In turn, the notion of "indistinguishability" is intimately related to the quantity and quality of detail that has been observed. This leads to distinguishing between the source variability derived from good-quality prints and the expressed variability in the mark, which can be partial, distorted, or blurred (Stoney 1989). Hence, once the examiner is confident that they cannot exclude, the only question that needs to be addressed is simply:
Q2. What is the probability of observing the features in the mark (given their tolerances) if the mark originates from an unknown individual?
If the ratio is calculated between the two probabilities associated with Q1 and Q2, we obtain what is called a likelihood ratio (LR). Q1 becomes the numerator question and Q2 becomes the denominator question. ...

In a nutshell, the numerator is the probability of the observed features if the mark is from the POI, while the denominator is the probability of the observed features if the mark is from a different source. When viewed as a ratio, the strength of the observations is conveyed not only by the response to one or the other of the key questions, but by a balanced assessment of both. ... The LR is especially ... applies regardless of the type of forensic evidence considered and has been put at the core of evaluative reporting in forensic science (Willis 2015). The range of values for the LR is between 0 and infinity. A value of 1 indicates that the forensic findings are equally likely under either proposition and they do not help the case in one direction or the other. A value of 10,000, as an example, means that the forensic finding provides very strong support for the prosecution proposition (same source) as opposed to its alternative (the defense proposition—different sources). A value below 1 will strengthen the case in favor of the view that the mark is from a different source than the POI. The special case of exclusion is when the numerator of the LR is equal to 0, making the LR also equal to 0. Hence, the value of forensic findings is essentially a relative and conditional measure that helps move a case in one direction or the other depending on the magnitude of the LR. The explicit formalization of the problem in the form of a LR is not new in the area of fingerprinting and can be traced back to Stoney (1985). (P. 75)
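To make the LR arithmetic concrete, here is a minimal sketch with made-up answers to Q1 and Q2; the probabilities are purely illustrative and are not values that FORSI reports:

    # Illustrative only: a likelihood ratio computed from made-up answers to Q1 and Q2.
    p_obs_if_same_source = 0.8        # Q1: probability of the observed correspondence if the mark is from the POI
    p_obs_if_diff_source = 0.00008    # Q2: probability of the correspondence if the mark is from an unknown individual

    lr = p_obs_if_same_source / p_obs_if_diff_source
    print(round(lr))   # 10000 -- in FORSI's terms, very strong support for the same-source proposition
    # An LR of 1 would leave matters where they stood; an LR below 1 would support a
    # different source; a numerator of 0 (a true exclusion) would make the LR 0.

The magnitude of the ratio, not either probability alone, conveys the strength of the findings.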
In advocating a likelihood ratio (albeit one for an initial "exclusion" with poorly defined statistical properties), FORSI is at odds with historical practice. That practice, as we saw, demands near certainty if an inclusion is to be labelled an "identification" or an "exclusion." In the middle range, examiners "report 'inconclusive' without any other qualifiers of the weight to be assigned to the comparison." (P. 98). FORSI disapproves of this "peculiar state of affairs." (P. 99). It notes that
Examiners could, at times, resort to terms such as "consistent with," "points consistent with," or "the investigated person cannot be excluded as the donor of the mark," but without offering any guidance as to the weight of evidence [see, for example, Maceo (2011a)]. In our view, these expressions are misleading. We object to information formulated in such broad terms that may be given more weight than is justified. These terms have been recently discouraged in the NRC report (National Research Council 2009) and by some courts (e.g., in England and Wales, R v. Puaca [2005] EWCA Crim 3001). And this is not a new debate. As early as 1987, Brown and Cropp (1987) suggested to avoid using the expressions "match," "identical" and "consistent with."

There is a need to find appropriate ways to express the value of findings. The assignment of a likelihood ratio is appropriate. Resorting to the term "inconclusive" deprives the court of information that may be essential. (P. 99).
The Death of "Individualization" and the Sickness of "Individual Characteristics"

The leaders of the latent print community have all but abandoned the notion of "individualization" as a claim that one and only one finger that ever existed could have left the particular print. (Judging from public comments to the National Commission on Forensic Science, however, individual examiners are still comfortable with such testimony.) FORSI explains:
In the fingerprint field, the term identification is often used synonymously with individualization. It represents a statement akin to certainty that a particular mark was made by the friction ridge skin of a particular person. ... Technically, identification refers to the assignment of an entity to a specific group or label, whereas individualization represents the special case of identification when the group is of size 1. ... [Individualization] has been called the Earth population paradigm (Champod 2009b). ... Kaye (2009) refers to "universal individualization" relative to the entire world. But identification could also be made without referring to the Earth's population, referring instead to a smaller subset, for example, the members of a country, a city, or a community. In that context, Kaye talks about "local individualization" (relative to a proper subset). This distinction between "local" and "global" was used in two cases ... [W]e would recommend avoiding using the term "individualization." (P. 78).
The whole earth definition of "individualization" also underlies the hoary distinction in forensic science between "class" and "individual" characteristics. But a concatenation of class characteristics can be extremely rare and hence of similar probative value as putatively individual characteristics, and one cannot know a priori that "individual" characteristics are limited to a class of size 1. In the fingerprinting context, FORSI explains that
In the literature, specificity was often treated by distinguishing "class" characteristics from "individual" characteristics. Level 1 features would normally be referred to as class characteristics, whereas levels 2 and 3 deal with "individual" characteristics. That classification had a direct correlation with the subsequent decisions: only comparisons involving "individual" characteristics could lead to an identification conclusion. Unfortunately, the problem of specificity is more complex than this simple dichotomy. This distinction between "class" and "individual" characteristics is just a convenient, oversimplified way of describing specificity. Specificity is a measure on a continuum (probabilities range from 0 to 1, without steps) that can hardly be reduced to two categories without more nuances. The term individual characteristic is particularly misleading, as a concordance of one minutia (leaving aside any consideration of level 3 features) would hardly be considered as enough to identify. The problem with this binary categorization is that it encourages the examiner to disregard the complete spectrum of feature specificity that ranges from low to high. It is proposed that specificity at each feature level be studied without any preconceived classification of its identification capability by itself. Indeed, nothing should prevent a specific general pattern—such as, for example, an arch with continuous ridges from one side to the other (without any minutiae)—from being considered as extremely selective, since no such pattern has been observed to date. (P. 74).
FORSI addresses many other topics -- errors, fraud, automated matching systems, probabilistic systems, chemical methods for detection of prints, and much more. Anyone concerned with latent-fingerprint evidence should read it. Those who do will see why the authors express views like these:
Over the years, the fingerprint community has fostered a state of laissez-faire that left most of the debate to the personal informed decisions of the examiner. This state manifests itself in the dubious terminology and semantics that are used by the profession at large ... . (P. 344).
We would recommend, however, a much more humble way of reporting this type of evidence to the decision maker. Fingerprint examiners should be encouraged to report all their associations by indicating the degree of support the mark provides in favor of an association. In that situation, the terms "identification" or "individualization" may disappear from reporting practices as we have suggested in this book. (P. 345).

Notes
  1. David H. Kaye, Are "Exclusions" Deductive and "Identifications" Merely Probabilistic?, Forensic Sci., Stat. & L., Apr. 28, 2017, http://for-sci-law.blogspot.com/2017/04/
  2. E.g., SWGMAT, Forensic Paint Analysis and Comparison Guidelines 3.2.9 (2000), available at https://drive.google.com/file/d/0B1RLIs_mYm7eaE5zOV8zQ2x5YmM/view

Saturday, August 5, 2017

Questions on a Bell krater and Certainty in Forensic Archaeology

IN THE NAME OF THE PEOPLE OF THE STATE OF NEW YORK
TO ANY LAW ENFORCEMENT OFFICER OR POLICE OFFICER OF NEW YORK

YOU ARE THEREFORE COMMANDED, between 6:00 a.m. and 9:00 p.m., to enter and to search the Metropolitan Museum of Art, 1000 Fifth Avenue, New York, NY 10028 (“the target premises”), for the above described property, and if you find such property or any part thereof, to bring it before the Court without unnecessary delay.
So reads a search warrant for
A Paestan Red-Figure Bell-Krater (a wide, round, open container used for holding wine at social events), attributed to Python, from 360 to 350 B.C., approximately 14 1/2 inches in diameter, and depicting the Greek god Dionysos in his youth with a woman on a cart being drawn by Papposilenos on one side and two youths standing between palmettes on the reverse side.
A New York court issued the warrant on July 24 to the District Attorney for New York County. The warrant seems to have been based on “photos and other evidence sent to them in May by a forensic archaeologist in Europe who has been tracking looted artifacts for more than a decade. The museum said that it hand-delivered the object to prosecutors the next day and anticipates that the vase, used in antiquity for mixing water and wine, will ultimately return to Italy.” 1/

The archaeologist, Christos Tsirogiannis, lists himself on LinkedIn as a research assistant at the Scottish Centre for Crime and Justice Research, University of Glasgow, and a forensic archaeologist and illicit antiquities researcher at the University of Cambridge. He contacted the New York district attorney’s office after the museum previously had notified Italian authorities, with no apparent effect, of the evidence that the Bell krater, as this type of container is called, had been looted from a grave in Southern Italy. Dr. Tsirogiannis compared photos on the museum’s website to “Polaroid photos shot between 1972 and 1995 that he said were seized ... in 1995” from storehouses of an Italian antiquities dealer convicted of conspiring to traffic in ancient treasures, to conclude “that the item was disinterred from a grave site in southern Italy by looters.” 2/

Dr. Tsirogiannis was asked about how he could be certain of his photographic identification in an interview on NPR's Morning Edition. His answer was “that’s my job” -- I've done it over a thousand times.
Transcript (excerpt), Morning Edition, Aug. 4, 2017, 5:07 AM ET

AILSA CHANG, HOST: So how did you first discover that this vase in the Met was an artifact looted from a grave in Italy in the 1970s?
CHRISTOS TSIROGIANNIS: I have [been] granted official access to a confiscated archive of a convicted Italian dealer convicted for antiquities trafficking. And the archive is full of photographs, among which I discovered five depicting this particular object. And by comparing these images with the image that was at the Metropolitan Museum of Art website, I identified that it is the same object.
CHANG: How can you know for certain?
TSIROGIANNIS: That's my job. ... I'm a forensic archaeologist, and I am doing this for more than 10 years now, identifying 1,100 of antiquities in the same actual way.
CHANG: Eleven hundred stolen antiquities you have identified?
TSIROGIANNIS: So far.
The response is reminiscent of testimony heard over the years from forensic analysts of many types of trace evidence -- things like fingerprints, toolmarks, hair, shoeprints, and bitemarks. In those fields (which should not be regarded as equivalent), such assurances are much less acceptable today. The identification here could well be correct (although the previously convicted antiquities dealer staunchly denies it), but is it objectionable that the procedure for comparing the photographs is subject to cognitive bias, lacks well-defined standards, has not been validated in studies of the accuracy with which forensic archaeologists match photographs of similar vases, and so on?

The vase surrendered by the museum certainly "vividly ... depicts Dionysus, god of the grape harvest, riding in a cart pulled by a satyr" and is attributed "to the Greek artist Python, considered one of the two greatest vase painters of his day." 3/ Are there statistics on the distinctiveness of the designs on the various Bell kraters in use over 2,000 years ago, or is each assumed to be visibly unique? How should the photographic evidence in such a case be presented in court?

Notes
  1. Tom Mashberg, Ancient Vase Seized From Met Museum on Suspicion It Was Looted, N.Y. Times, July 31, 2017 (printed as Vase, Thought to Be Looted, Is Seized From Met., N.Y. Times, Aug. 1, 2017, at A1).
  2. Id.
  3. Id.