Thursday, April 13, 2017

Fact Check: The National Commission on Forensic Science Vote That Wasn't

Forensic Magazine continues to report that a majority of the National Commission on Forensic Science voted in favor of its own dissolution. In a mostly recycled paragraph from an earlier article, 1/ its senior science writer, Seth Augenstein, wrote today that “the commission itself had voted against its own renewal at its January meeting, by a 16-15 vote.” 2/

The Commission never took any vote on whether it would be a good idea to extend the Commission's life. The question put to a vote was whether to include a statement to this effect in an historical document summarizing the activities of the Commission. 3/ The subject of the vote could not have been much clearer. 4/ The meeting synopsis states:
[A] vote was taken to determine whether this summary report should include a statement that the Commission should continue in its current form. As a business document a simple majority of 50% “yes” votes was required to approve inclusion of this statement. A total of 42% “yes” votes were received, and therefore no statement would be included regarding the continuation of the Commission. 5/
The precise question posed and the complete vote on it were as follows: 6/
Document or Vote Question Asked | Total Votes | # Yes | # No | # Abstain
Does the NCFS Summary Report include a sentence that NCFS continues in its current form? | 38 | 16 | 15 | 7
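
The arithmetic behind the synopsis's 42% figure can be checked directly. The short Python sketch below uses only the counts reported in the table and the 50% threshold stated in the synopsis; the variable names are mine.

```python
# Vote counts as reported in the meeting synopsis table.
yes, no, abstain = 16, 15, 7
total = yes + no + abstain  # 38 votes, abstentions included

# Share of "yes" votes among all votes (the measure the synopsis used).
yes_share = yes / total

# Threshold from the synopsis: 50% "yes" votes needed to include the statement.
statement_included = yes_share >= 0.50

print(f"yes share: {yes_share:.1%}")  # 42.1%
print(f"statement included: {statement_included}")
```

The 16 "yes" votes outnumbered the 15 "no" votes but fell short of half of the 38 votes, so the sentence was left out of the summary report.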
  1. Seth Augenstein, Final Meeting of National Commission on Forensic Science ‘Reflects Back,’ Apr. 10, 2017, 11:59am, The paragraph stated that
    The NCFS produced 45 documents and recommendations in three years of work, which encompassed 600 public comments. But the commission itself had voted against its own renewal at its January meeting, by a 16-15 vote.
  2. Seth Augenstein, Even Without Forensic Commission, Forensic Science Overhaul Proceeds at OSAC, Apr. 13, 2017, 12:12pm, The latest paragraph states that
    The NCFS, by the end of its last meeting on Tuesday, produced 45 documents and recommendations in three years of work—many of which directed OSAC’s explorations into forensic disciplines. But the commission itself had voted against its own renewal at its January meeting, by a 16-15 vote. Sessions announced that it would not be renewed on Monday.
    The additions are also inaccurate. Very few of the NCFS Views documents and Recommendations documents seem to have "directed OSAC's explorations."
  3. Reflecting Back—Looking Toward the Future, Dec. 16, 2016 (draft),
  4. The discussion as recorded on the meeting webcast includes the following (with intervening speaker statements omitted without ellipses):
    HON. PAM KING: This is a business record ... of this particular Commission. ... This is a document that does not take any real position as to whether something should or should not be done. ...I did get some comments from Commissioners before this meetings ... One of the ones that I really would like to get some discussion on is [the] strong feelings among some Commissioners that maybe we do want to make a statement about whether or not this Commission should continue. ...
    JULIA LEIGHTON: I would not shy away from a recommendation ... I think to scrap it altogether ... is to give up on the work we’ve done.
    GERALD LAPORTE: So I don’t agree — disagree — with anything Julia has said. ... but I don’t know if we really are in a position to make a recommendation ... .
    ARTURO CASADEVALL: I want to support what Julia said. Commissions like this develop an institutional memory. ... I strongly think we should make a recommendation that something like this continue.
    S. JAMES GATES: [A]bsent a committee like this, I don’t see a consistent driver for making progress. ...
    MATTHEW REDLE: Whether it is this form or not, ... there ought to be more work done to continue the progress that we have made ...
    JULIA LEIGHTON: [W]e need a national body [with] the gravitas of being a nonpartisan federal advisory commission. ...
    HON. JED RAKOFF: ... I do think it is important that we say some something [to] indicate that we believe the Commission should continue. ...[J]udges do pay some attention to what this Commission says and does. So I think it plays a role there that is not played by other very wonderful groups ... and some very wonderful reports. I would very much strongly encourage that we have something in there ...
    WILLIAM THOMPSON: This Commission is uniquely well situated to address those [human factors] issues ... so I hope the Commission continues to address those kinds of questions ... .
    JULES EPSTEIN: So ... for this concluding portion ... yeah, we should keep going in some shape or form. ... [M]ore needs to be done. More constituencies will look to us than to other segregated constituencies. [T]he federal advisory commission should continue.
    WILLIE MAY: Certainly, I think that the Commission’s work is not completed. [I]t would serve the country very well to continue this ... .
  5. National Commission on Forensic Science Meeting #12, Jan. 9-10, 2017, at 6,
  6. Id. at 10.

Wednesday, April 12, 2017

Whither OSAC? NIST's Plans for Forensic Science Standards and Research As Told to the NCFS

The National Commission on Forensic Science held its thirteenth and final meeting on Monday. The second speaker to discuss some of the administration's plans for improving forensic science was the Acting Director of the National Institute of Standards and Technology (NIST), Dr. Kent Rochford. I edited and abridged the computer-generated transcript slightly. It includes a question from Commissioner Peter Neufeld. Finally, there is a question from Commissioner Jules Epstein to a Justice Department official about future funding for OSAC. I cannot promise riveting reading, but for anyone who wants to know what was said, here is most of it:
KENT ROCHFORD: I'd like to address the future of OSAC. OSAC was conceived under the 2013 MOU [Memorandum of Understanding] between NIST and the Department of Justice that established the Commission. The Department of Justice provides funding for the OSAC, which NIST cannot sustain on its own. The OSAC organization does not have term limits but does require funding to continue.

From the introduction of OSAC, NIST addressed the need to evolve and eventually spin off OSAC. We termed this “OSAC 2.0.” We have learned a lot from OSAC 1.0. Over the past years of operation, the organization has continued to mature as members of the group have come to a better appreciation of the standards development process. One example was seen in interested key researchers and scientists joining the FSSB [Forensic Science Standards Board]. Thank you for your assistance in supporting and strengthening the OSAC.

NIST is committed to improving OSAC, including the establishment of a clear model that will support these important goals. We are working to create a stable, sustainable operational model that provides independence from NIST. Internally a small group led by Rich Cavanaugh, who runs our special programs office, has been exploring model concepts for OSAC 2.0.

Each model is distinct yet consistent with the following goals: The new OSAC has to have a defined structure and authority. It needs to engage key stakeholders. We need to provide free access to our products. There has to be a smooth transition from the current OSAC that would create the potential for long-term sustainability. Currently, Rich's group has been looking at three models, exploring further. These involve creating federal and state partnerships that develop codes, standards, and model laws. Restructuring the OSAC so subcommittee functions are dispersed to standards development organizations, and the roles at the FSSB and SAC levels are changed to focus on quality of science and utility, respectively. And establishing a development and testing — a process we are starting, and we intend to engage the broader community to better understand the strengths and weaknesses of these possible approaches. So if you have questions about the OSAC 2.0, please reach out to Rich Cavanaugh.

I'd like to talk about NIST research efforts in forensic science. NIST remains committed to [applying] its measurement and standards expertise to challenges in forensics. We have played a role in strengthening forensic science since at least the 1920s. You may have seen the recent National Geographic article about Wilmer Souder, a physicist from NIST who played a role in numerous forensic cases during the 1930s, including the famous Lindbergh baby kidnapping case. The current forensic research focuses include DNA, digital fingerprint evidence, ballistics, statistics, toxins, and trace evidence. We plan to continue working these research areas as funding is available to do so. You will see an example of how research expertise provides benefit to the forensic science community when [Dr. Elham Tabassi] talks to you about development of an ISO standard on method validation.

Let me turn to technical merit review. This past September, the President's Council of Advisors on Science and Technology recommended an expanded role for NIST in assessing the scientific foundations and maturity of various forensic disciplines. We recognize the need for and the value of such studies and are exploring ways to conduct work in this area. Without additional funding recommended by PCAST, NIST cannot make large-scale commitments to technical merit review. We are planning an exploratory study to address concerns raised by PCAST regarding DNA mixtures. This will likely involve assessing the scientific literature, developing a detailed plan for evaluating scientific validity that would include probabilistic genotyping, and assigning interlaboratory studies to measure forensic laboratory performance of DNA interpretation. These laboratory studies would build upon DNA mixture studies conducted in 2003, 2005, and 2013. NIST has a history of involving external partners in its research and standards efforts and anticipates external and internal and international collaboration.

In closing, I want to personally thank you for your efforts on this Commission and your commitment to strengthening forensic science through your participation in the activities of this group. Your work has made a difference, and we are grateful for your service to the nation. Thank you.
PETER NEUFELD: The second question for Kent, when you talked about things NIST was doing, you mentioned your current evaluation of DNA mixtures. Your predecessor stated in response to this Commission making a recommendation that NIST take on the task of making an evaluation of foundational validity and reliability of different forensic methods that they intended to do a trial. They were going to start a trial in three different areas, and the other two areas in addition to the DNA were ballistics and bitemarks. We have been told at each meeting leading up to this meeting NIST was going ahead with those trials. I noticed you only mentioned DNA. Is it still the position of NIST that they will go ahead with the trial of some ballistics and bitemarks?

KENT ROCHFORD: We still continue to do the work on ballistics and bitemarks. Given the resources we have, we're going to do the trials of the interlaboratory studies with the DNA mixtures first. Right now, the PCAST report provided a number of trials we should take on [and] it is also recommending the funding to do this. Given our current funding, we intend to start with the DNA programs. As funding may become available, we can ramp up these other areas to include trials. Currently we are doing the internal work but do not right now have the bandwidth to do the ballistics trials.
JULES EPSTEIN: Good morning. *** The other substantive question is, can I get clarification on OSAC? Is it now the status there is currently no further funding for OSAC?


JULES EPSTEIN: Can we understand what is in the pipeline or the projected longevity at this moment or sustainability?

ANSWER: Right now we don't have a budget and we are in a continuing resolution. We just don't know the status so I really can't predict what it will look like.

Two Misconceptions About the End of the National Commission on Forensic Science

Several days ago, the Justice Department (DOJ) announced the end of the National Commission on Forensic Science (NCFS). Initially established to advise the Department of Justice for a two-year period, the NCFS had had its charter extended once before, in April 2015. Attorney General Sessions declined to renew it a second time.

Several explanations for this decision could be offered: (1) the current administration is unreceptive to scientific knowledge and advice — alternate facts and alternate science are more appealing; (2) DOJ does not want outside advice on producing and presenting scientific evidence; (3) DOJ is tired of spending millions of dollars for advice from this particular group of lawyers, judges, administrators, forensic scientists, and others; or (4) DOJ believes that NCFS has outlived its usefulness and there are better ways to obtain advice. And of course, some combination of these things might have been at work. I have no inside information, but I thought it might be helpful to collect some of what is publicly known if only to correct misconceptions about the Commission and its role vis-a-vis the Justice Department. I'll begin by noting two such misconceptions.

Misconception 1
NCFS Was Evaluating the Validity of Forensic Science Tests and Methods

One misconception is that the Commission was conducting independent evaluations of accepted methods in forensic science. Thus, the Associated Press reported that “National Association of Criminal Defense Lawyers ... President Barry Pollack said the commission was important because it allowed ‘unbiased expert evaluation of which techniques are scientifically valid and which are not.’” 1/ But not one of the 44 documents identified as “work products” on the NCFS website examines the validity of any technique.

The two documents that directly address “technical merit” are views and then recommendations 2/ about the need to study validity and reliability of techniques. No surprise there. More importantly, the documents underscore the importance of bringing what we might call "outsider" scientific expertise to bear in these efforts, and one of them contains pointed advice to other organizations. Specifically, a recommendation calls on NIST and its creation, the Organization of Scientific Area Committees for Forensic Science (OSAC), to reform the procedure OSAC uses to review and endorse standards for test methods. It states:
The Organization of Scientific Area Committees for Forensic Science (OSAC) leadership, the Forensic Science Standards Board (FSSB), should commit to placing consensus documentary standards on the OSAC Registry of Approved Standards for only those forensic science test methods and practices where technical merit has been established by NIST, or in the interim, established by an independent scientific body. An example of an interim independent scientific body could be an OSAC-created Technical Merit Resource Committee composed of measurement scientists and statisticians appointed by NIST and tasked with the evaluation of technical merit. 3/
This recommendation, by the way, has had limited impact. Yes, NIST has announced that it will do further research in a few areas such as DNA-mixture analysis. No, OSAC has not established a Resource Committee to check the technical merit of the documents that filter up from its subject-area committees and subcommittees. 4/

Rather than performing literature reviews (or promulgating scientific standards for forensic laboratories to follow), NCFS focused on broader issues of policy, needs, and legal reforms for generating or evaluating scientific evidence. This role for the Commission relates to a second misconception.

Misconception 2
NCFS Was a Worthless “Think Tank”

According to the Washington Post,
[T]he National District Attorneys Association, which represents prosecutors, applauded the end of the commission and called for it to be replaced by an Office of Forensic Science inside the Justice Department. Disagreements between crime lab practitioners and defense community representatives on the commission had reduced it to “a think tank,” yielding few accomplishments and wasted tax dollars, the association said. 5/
A press release from the NDAA does “applaud” the DOJ’s decision not to renew the Commission, but not because the NCFS was a “think tank.” The group representing “2,500 elected and appointed District Attorneys across the United States, as well as 40,000 Assistant District Attorneys” complained that
The Commission lacked adequate representation from the state and local practitioner community, was dominated by the defense community, and failed to produce work products of significance for the forensic science community. Very few of the recommendations from the Commission were adopted and signed by the previous Attorney General during its existence. Those that were signed, such as universal accreditation, had already begun to develop organically within the forensic science community as accepted best practices, thus replicating ongoing work and wasting taxpayer dollars. 6/
I have not checked the percentage of recommendations “signed” by the Attorney General, but the Commission’s views documents never were intended to be signed by anyone, and the notion that only recommendations for specific action by the Attorney General benefit “the forensic science community” is shortsighted. Among the Commission’s documents of lasting value are the following:
  • Recommendation on Transparency of Quality Management System
  • Recommendation on Model Legislation for Medicolegal Death Investigation Systems
  • Views Document on Recognizing the Autonomy and Neutrality of Forensic Pathologists
  • Recommendations on Use of the Term “Reasonable Scientific Certainty”
  • Recommendation on Pretrial Discovery
  • Views Document on Judicial Vouching
  • Views Document on Ensuring that Forensic Analysis is Based Upon Task-Relevant Information
  • Views Document on Facilitating Research on Laboratory Performance
  • Views Document on Identifying and Evaluating Literature that Supports the Basic Principles of a Forensic Science Method or Forensic Science Discipline
It is instructive to compare the NDAA's dismissal of the "universal accreditation" recommendation with the assessment of it by Associate Deputy Attorney General Andrew Goldsmith, who stated in his remarks at the final NCFS meeting that
[T]here is no single commission recommendation more important for the practice of forensic science than the recommendation regarding universal accreditation. I have been told the Department's decision to publicly announce the policy on accreditation and to mandate our prosecutors to rely on accredited labs when practicable has made a difference in laboratories and in moving to accreditation. These recommendations and the Department's review and implementation are a demonstration of the measurable impact of the work of this Commission ... .
Naturally, many of the ideas or actions that the Commission endorsed were not original. The idea of accreditation was prominent in the 2009 National Research Council (NRC) report on forensic science as well as NRC reports on DNA evidence in 1992 and 1996. NCFS was not a think tank, but a mixed bag of administrators, prosecutors, defenders, judges, law professors, police officials, laboratory scientists, medical examiners and coroners, research scientists, and other individuals. It could be criticized as wasteful — 13 meetings of 41 members (including the ex officio ones) plus an unlisted number of nonmembers appointed to subcommittees at a cost of millions of dollars for taxpayers (not to mention the opportunity costs to the volunteers). Consequently, it certainly is fair to ask how much additional benefit would have come from another two years of Commission life. 7/ But the Justice Department does not plan to halt all study of in-house forensic science reform. It has announced that some of it will continue via a newly created -- and surely not costless -- task force run by the incoming Deputy Attorney General. Given that plan, is the restructuring really an effort to save taxpayer money because of a perception that NCFS had reached the point of diminishing returns? Or is it a move to control the agenda and to modify the list of people who provide input? More on that later.

  1. Sadie Gurman, Sessions' Justice Dep't Will End Forensic Science Commission, AP News, Apr. 11, 2017,
  2. NCFS often prepared two “work products” per topic for its recommendations — a preliminary “views” document followed by a final, more concrete “recommendations” document. Consequently, the total number of its "work products" is a poor quantitative measure of its accomplishments.
  3. Recommendation to the Attorney General: Technical Merit Evaluation of Forensic Science Methods and Practices, Dec. 9, 2016, at 3,
  4. A more cheerful description of the response from NIST and OSAC can be found in a letter from the six research scientists (not forensic scientists) on the Commission pleading for a renewal of the charter.
  5. Spencer Hsu, Sessions Orders Justice Dept. To End Forensic Science Commission, Suspend Review Policy, Wash. Post, Apr. 10, 2017,
  6. NDAA, Press Release, National District Attorneys Association Applauds Expiration of National Commission on Forensic Science, Apr. 10, 2017,
  7. Erin Murphy, Op-ed, Sessions Is Wrong to Take Science Out of Forensic Science, N.Y. Times, Apr. 11, 2017, (asserting that NCFS "was even poised to issue a raft of best practices for the wild west of digital forensics, which has exploded without supervision over the years.")

Wednesday, March 29, 2017

After Moore v. Texas Is a Single IQ Score Really Determinative?

Bobby J. Moore has been on death row for the last 37 years. On Monday, the Supreme Court ruled that the Texas Court of Criminal Appeals (the state’s highest court for criminal cases) erred in finding that Moore is not intellectually disabled. Justice Ginsburg wrote for the five-member majority. The Chief Justice wrote a strong dissent for the other three justices. Neither opinion (on my quick reading at least) comes to grips with an obvious statistical principle—that combining information reduces uncertainty.

Moore v. Texas is the third case to try to clarify the rule in Atkins v. Virginia, 536 U.S. 304 (2002). There, the Supreme Court held that the Eighth Amendment’s Cruel and Unusual Punishment Clause prevents a state from executing an intellectually disabled offender, but it left the states with latitude in defining the disability. In Moore, the Court held that the Texas tribunal applied a medically outdated—and (hence?) constitutionally impermissible—standard in rejecting Moore’s claim of disability. Most of the majority opinion concerns “adaptive functioning,” which must be substantially impaired for a diagnosis of intellectual disability to be made.

However, the Court in Hall v. Florida, 572 U.S. __ (2014), allowed a state to refuse to inquire into adaptive functioning if an offender’s true IQ score is at least 70. Hall explicitly stated that the following statutory definition of intellectual disability was constitutionally acceptable:
“significantly subaverage general intellectual functioning existing concurrently with deficits in adaptive behavior and manifested during the period from conception to age 18,” where “significantly subaverage general intellectual functioning” is “performance that is two or more standard deviations from the mean score on a standardized intelligence test.”
Because IQ scores for the whole population are roughly normally distributed with a mean of approximately 100 and a standard deviation of about 15, Hall allows the state to execute offenders whose "true scores" are above 70.

In deciding whether a true score is above 70, Hall demanded that the state attend to the error of measurement. As the Moore Court, quoting from Hall, explained, "'[f]or purposes of most IQ tests,' [the] imprecision in the testing instrument 'means that an individual’s score is best understood as a range of scores on either side of the recorded score . . . within which one may say an individual’s true IQ score lies.'" For a single test with a standard error of 2.5 IQ points, it follows (for normally distributed errors) that the measured score must be greater than 75 (= 70 + two standard errors) to avoid "an unacceptable risk that persons with intellectual disability will be executed." 1/
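
As a rough illustration of this rule, the Python sketch below computes the range of plus or minus two standard errors around a measured score and checks whether its lower end reaches the cutoff of 70. The function names are hypothetical, the standard error of 2.5 is the figure assumed in the text, and treating a lower end that lands exactly on the cutoff as within range is my assumption, borrowed from the "at or below 70" language in Moore.

```python
def score_interval(measured, sem, k=2.0):
    """Return (low, high): the measured score plus or minus k standard errors."""
    return measured - k * sem, measured + k * sem


def requires_further_inquiry(measured, sem, cutoff=70.0, k=2.0):
    """True when the lower end of the interval falls at or below the cutoff."""
    low, _ = score_interval(measured, sem, k)
    return low <= cutoff


SEM = 2.5  # standard error assumed in the text, in IQ points

print(score_interval(74, SEM))            # (69.0, 79.0)
print(requires_further_inquiry(74, SEM))  # True
print(requires_further_inquiry(78, SEM))  # False: the lower end is 73
```

The two scores used here, 74 and 78, are the ones the Texas court considered in Moore.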

But what about multiple scores? In that common situation, the Hall Court seemed conflicted. Justice Kennedy opaquely opined that “[e]ven when a person has taken multiple tests, each separate score must be assessed using the SEM, and the analysis of multiple IQ scores jointly is a complicated endeavor.” Does this mean that no matter how many IQ tests have been administered and no matter how many of them lie above 70, a single score of 75 or less makes a conclusive case for “significantly subaverage general intellectual functioning”?

From a statistical perspective, a lowest-single-score rule seems very strange indeed. If I want to know whether I have a fever and I take ten measurements of my temperature (with ten thermometers), I would not say that I have a fever just because one thermometer gives a high reading. I would use an average, and the mean temperature would have greater precision (smaller standard error) than the single highest reading of the ten.
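
The thermometer intuition rests on a standard result. As a sketch, assume the n readings X_1, ..., X_n have independent errors with a common standard deviation sigma (these are my simplifying assumptions; real instruments or tests may violate them). Then:

```latex
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i ,
\qquad
\operatorname{Var}(\bar{X}) = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{\sigma^{2}}{n},
\qquad
\operatorname{SE}(\bar{X}) = \frac{\sigma}{\sqrt{n}} .
```

With ten thermometers, the standard error of the mean is sigma divided by the square root of 10, or roughly 0.32 sigma, about a third of the error attached to any single reading.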

Justice Ginsburg’s opinion in Moore seems to fly in the face of this common-sense statistical point. The Texas court focused on two test scores — "a 78 in 1973 and 74 in 1989." It pointed to factors that might have biased the latter score toward the low end, leaving the higher one as entitled to more weight. Specifically, it wrote that there was expert testimony that Moore might not have been putting much effort into answering the questions in the lower-scoring test, which was given to him in prison, and that he "also took the WAIS–R under adverse circumstances; he was on death row and facing the prospect of execution, and he had exhibited withdrawn and depressive behavior." Ex Parte Moore, 470 S.W.3d 481, 519 (Tex. Ct. Crim. App. 2015). Thus, the court concluded,
These considerations might tend to place his actual IQ in a somewhat higher portion of that 69 to 79 range. ... Considering these factors together, we find no reason to doubt that applicant's [higher] WAIS–R score accurately and fairly represented his intellectual functioning as being above the intellectually disabled range.
The Supreme Court assumed that it was necessary to consider each test in isolation and without making a clinical adjustment to the statistically determined plus-or-minus-five-point margin of error. Justice Ginsburg called the statistical range of error "clinically established." She described and condemned the Texas court's evaluation of the clinical testimony as follows:
Based on the two scores, but not on the lower portion of their ranges, the court concluded that Moore’s scores ranked “above the intellectually disabled range” (i.e., above 70). ... But the presence of other sources of imprecision in administering the test to a particular individual, cannot narrow the test-specific standard-error range. [W]e require that courts continue the inquiry and consider other evidence of intellectual disability where an individual’s IQ score, adjusted for the test’s standard error, falls within the clinically established range for intellectual-functioning deficits.
Thus, she insisted that just because "Moore’s score of 74, adjusted for the standard error of measurement, yields a range of 69 to 79" so that "the lower end ... falls at or below 70, the [Court of Criminal Appeals] had to move on to consider Moore’s adaptive functioning." (Emphasis added.)

In sum, Moore seems to say that a clinician cannot tinker with the statistical margin of error (two standard errors as constitutionalized in Hall). The dissent vigorously disagreed with this rule and maintained that the Constitution permits states to make adjustments for individual circumstances that experts agree affect performance. A statistical argument for the dissent's position would be this: Computationally, the standard error reflects the variation in performance of a population of test-takers. This population-based figure is then applied to all individuals regardless of how strongly the sources of error apply to them. IQ tests administered in prison to inmates exhibiting signs of depression may not be part of that population. Those scores might have a larger or a smaller standard error, and they are generally lower than the true score for a person taking the test in normal circumstances. In other words, the clinician is not modifying the margin of error as much as adjusting the entire estimate upward.

This analysis does not necessarily render Moore's rule legally faulty. It might be undesirable to give clinicians this latitude to adjust scores. Under the majority's approach to the Eighth Amendment, the issue becomes whether the clinical guidelines for diagnosing disability allow individualized modifications of the statistical rule. The guidelines discussed in Hall are not completely clear. The dissent reads them as requiring an expert or a court to take the usual standard error seriously in interpreting an IQ score, but permitting reasoned and reasonable departures from them.

Even if Moore forbids individual adjustments to the statistical rule of plus-or-minus two standard errors (for the general population), why allow the confidence interval for a single test score to be dispositive when multiple test scores are present? The clinical guidelines do not mandate this rule, and it is not so obvious that Moore does. Texas apparently made no effort to combine the two scores into a single point estimate with a margin of error applicable to the combined statistic. Hall claimed that combining scores from different IQ test forms was "complicated," although the literature it cited gave a simple procedure for doing so. So neither Hall nor Moore can be said to firmly establish that an appropriately averaged score is impermissible. After all, neither case presented the Court with an interval estimate for the true IQ score derived from multiple scores by an accepted statistical procedure, and the many-thermometer example given above illustrates the statistical deficiency in a rule that looks to every measured IQ score in isolation.
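
To make the idea of a combined estimate concrete, here is a deliberately simplified Python sketch. It is not the procedure from the literature Hall cited. It just averages the scores and assumes that each one has the same standard error (2.5 points, the figure implied by the plus-or-minus-five-point range) and that the measurement errors are independent. It also ignores differences between test forms, the years separating the two administrations, and any bias from testing conditions. The function name is hypothetical.

```python
import math


def combined_interval(scores, sem, k=2.0):
    """Average the scores and return (mean, low, high).

    Assumes every score has the same standard error `sem` and that the
    measurement errors are independent, so SE(mean) = sem / sqrt(n).
    """
    n = len(scores)
    mean = sum(scores) / n
    se_mean = sem / math.sqrt(n)
    return mean, mean - k * se_mean, mean + k * se_mean


# The two scores the Texas court considered; the SEM of 2.5 is an assumption.
mean, low, high = combined_interval([78, 74], sem=2.5)
print(f"mean = {mean:.1f}, interval = ({low:.2f}, {high:.2f})")
# mean = 76.0, interval = (72.46, 79.54)
```

Under these assumptions the combined interval is narrower than either single-score interval and its lower end stays above 70. Whether the assumptions hold for tests given sixteen years apart under very different conditions is exactly the kind of question a clinician would need to address.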

The single-score-too-low rule bends over backward to avoid misclassifying a disabled offender as normal. The rule might be defended on exactly that ground. But that is not the logic of Moore, which only asks what clinical guidelines for interpreting IQ scores allow. Moreover, if the real objective is to make determinations of intellectual disability as fully informed as possible, it would seem more direct just to demand the inquiry into adaptive functioning along with IQ scores in all cases. On the other hand, if true IQ scores matter as a threshold to a richer inquiry into both intellectual and adaptive functioning, then statistically sound procedures for integrating all the IQ test results ought to be followed.

Further reading: David H. Kaye, Deadly Statistics: Quantifying an "Unacceptable Risk" in Capital Punishment, 15 Law, Probability & Risk __ (2017).

  1. If the standard error were substantially less than 2.5, then the measured score would not have to be all five points above 70. The use of two standard errors also is on the high side; 1.96 standard errors provides 95% coverage. Justice Kennedy's opinion in Hall was not as clear as it should have been on these points, but this is the only interpretation consistent with the concept of confidence intervals and standard errors used in the opinion. In Brumfield v. Cain, 135 S.Ct. 2269 (2015), however, the Court wrote that after "[a]ccounting for this margin of error, Brumfield's reported IQ test result of 75 was squarely in the range of potential intellectual disability." Id. at 2278. The Court did not disclose the standard error of measurement for the test.
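
The coverage figures in this note can be checked with a few lines of Python, assuming normally distributed errors as the text does. The function name is hypothetical; it relies only on the standard library's error function.

```python
import math


def normal_coverage(k):
    """Probability that a normal error lies within k standard errors of zero."""
    return math.erf(k / math.sqrt(2))


for k in (1.96, 2.0):
    print(f"+/-{k} SE covers {normal_coverage(k):.2%}")
```

The 1.96 multiplier gives almost exactly 95% coverage, while two full standard errors give a little over 95.4%, which is the sense in which two standard errors is "on the high side."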

Friday, February 3, 2017

Connecticut Trial Court Deems PCAST Report on Footwear Mark Evidence Inapplicable and Unpersuasive

In an unpublished (but rather elaborate) opinion, a trial court in Connecticut found no merit in a motion “to preclude admission of footwear comparison evidence relative to footwear found on Wolfe Road in Warren, Connecticut and footprints found at the residence where the victim was killed.” State v. Patel, No. LLICR130143598S (Conn. Super. Ct., Dec. 28, 2016). The court did not describe the case or the footwear evidence, but its opinion responded to the claim of defendant Hiral Patel that “the scientific community has rejected the validity of the footwear comparison proposed by the state.” Judge John A. Danaher III was unimpressed by Patel's reliance on
a September 2016 report by the President's Council of Advisors on Science and Technology [stating] that ‘there are no appropriate empirical studies to support the foundational validity of footwear analysis to associate shoeprints with particular shoes based on specific identifying marks (sometimes called randomly 'randomly [sic] acquired characteristics'). Such conclusions are unsupported by any meaningful evidence or estimates of their accuracy and thus are not scientifically valid.’
The court reasoned that the state had no need to prove that the “expert testimony ... albeit scientific in nature” was based on a scientifically validated procedure because the physical comparison was “neither scientifically obscure nor instilled with 'aura of mystic infallibility' ... which merely places a jury ... in in [sic] a position to weigh the probative value of the testimony without abandoning common sense and sacrificing independent judgment to the expert's assertions.” Patel (quoting Maher v. Quest Diagnostics, Inc., 269 Conn. 154, 170-71 n.22, 847 A.2d 978 (2004)).

But the Superior Court did not stop here. Judge Danaher wrote that the President’s Council (PCAST) lacked relevant scientific expertise, and their skepticism did not alter the fact that courts previously had approved of “the ACE-V method under Daubert for footwear and fingerprint impressions.” He declared that "[t]here is no basis on which this court can conclude, as the defendant would have it, that the PCAST report constitutes 'the scientific community.'" These words might mean that the relevant scientific community disagrees with the Council that footwear-mark comparisons purporting to associate a particular shoe with a questioned impression lack adequate scientific validation. Other scientists might disagree either because they do not demand the same type or level of validation, or because they find the existing research satisfies PCAST's more demanding standards. The former is more plausible than the latter, but it is not clear which possibility the court accepted as true.

To reject the PCAST Report's negative finding, Judge Danaher relied exclusively on the testimony of “Lisa Ragaza, MSFS, CFWE, a ‘forensic science examiner 1’ ... who holds a B.S. degree from Tufts University and an M.S. degree from the University of New Haven.” What did the forensic-science examiner say to support the conclusion that PCAST erred in its determination that no adequate body of scientific research supports the accuracy of examiner judgments? To begin with,
Ms. Ragaza testified that, in her opinion, footwear comparison analysis is generally accepted in the relevant scientific community. She testified that such evidence has been admitted in 48 or 49 of the 50 states in the United States, in many European countries, and also in India and China. In fact, she testified, such analyses have been admitted in United States courts since the 1930s, although she is also aware that one such analysis was carried out in Scotland as early as 1786.
It seems odd to have forensic examiners instruct the court in the law. That the courts in these jurisdictions (not all of which even require a showing of scientific validity) admit the testimony of footwear analysts that a given shoe is the source of a mark says little about the extent to which these judgments have been subjected to scientific testing. As a committee of the National Academy of Sciences reported in 2009, “Daubert has done little to improve the use of forensic science evidence in criminal cases.” NRC Committee on Strengthening Forensic Science in the United States, Strengthening Forensic Science in the United States: A Path Forward 106 (2009). Instead, “courts often ‘affirm admissibility citing earlier decisions rather than facts established at a hearing.’” Id. at 107.

Ms. Ragazza testified that there are numerous treatises and journals, published in different parts of the world, on the topic of footwear comparison analysis. She testified that there have been studies relative to the statistical likelihood of randomly acquired characteristics appearing in various footwear.
But the existence of “treatises and journals” — including what the NAS Committee called “trade journals,” id. at 150 — does not begin to contradict PCAST’s conclusion about the dearth of studies of the accuracy of examiner judgments. PCAST commented (pp. 116-17) on one of the “studies relative to the statistical likelihood”:
a mathematical model by Stone that claims that the chance is 1 in 16,000 that two shoes would share one identifying characteristic and 1 in 683 billion that they would share three characteristics. Such claims for “identification” based on footwear analysis are breathtaking—but lack scientific foundation. ... The model by Stone is entirely theoretical: it makes many unsupported assumptions (about the frequency and statistical independence of marks) that it does not test in any way.
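The two figures in the quoted passage can be reproduced by a simple counting exercise. The sketch below rests on an assumption that is mine rather than anything stated in the passage: that a mark is treated as equally likely to fall in any of 16,000 positions on the sole, and that marks land independently of one another. The variable names are illustrative.

```python
import math

# Assumed number of equally likely positions for a single mark
positions = 16_000

# Number of distinct ways to place three marks among those positions
three_mark_arrangements = math.comb(positions, 3)

print(f"one mark:    1 in {positions:,}")
print(f"three marks: 1 in {three_mark_arrangements:,}")
```

The second number rounds to 683 billion. Whatever the details of the published model, the exercise shows how directly such figures depend on assumptions of uniform frequency and independence, which are the assumptions PCAST says went untested.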
Ms. Ragazza testified that her work is subject to peer review, including having a second trained examiner carry out a blind review of each analysis that she does. In response to the defendant's question as to whether such reviews have ever resulted in the second reviewer concluding that Ms. Ragazza had carried out an erroneous analysis, she responded that there were no such instances. Most of her work is not done in preparation for litigation. It is frequently done for investigative purposes and may be used to inculpate, but also exculpate, an individual. She indicated that the forensic laboratory carries out its analyses for both prosecutors and defense counsel.
Verification of an examiner’s conclusion by another examiner is a good thing, but it does almost nothing to establish the validity of the examination process. Making sure that two readers of tea leaves agree in their predictions does not validate tea reading (although it could offer data on measurement reliability, which is necessary for validity).
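The difference between agreement and accuracy can be shown with a toy calculation. All of the data below are hypothetical: a list of ground-truth answers and the calls of two examiners who always agree with each other.

```python
# Hypothetical ground truth and examiner calls (1 = same source, 0 = different source)
truth      = [1, 0, 1, 0, 1, 0, 1, 0]
examiner_a = [1, 1, 1, 1, 0, 0, 0, 0]
examiner_b = [1, 1, 1, 1, 0, 0, 0, 0]

def proportion_equal(xs, ys):
    """Fraction of positions at which two equal-length lists agree."""
    return sum(x == y for x, y in zip(xs, ys)) / len(xs)

agreement = proportion_equal(examiner_a, examiner_b)  # examiner vs. examiner
accuracy_a = proportion_equal(examiner_a, truth)      # examiner vs. ground truth

print(agreement, accuracy_a)
```

In this made-up data set the examiners agree on every case, yet each is right only half the time. Agreement can be measured from casework alone; accuracy cannot be measured without samples of known origin.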

Ms. Ragazza explained how footwear comparison analysis is carried out, using a protocol known as ACE-V, and employing magnifiers and/or microscopes.
Plainly, this misses the point. If tea reading were expanded to include magnifiers and microscopes, that would not make it more valid. (Actually, I believe that footwear-mark comparisons based on “randomly acquired characteristics” are a lot better than tea reading, but I still am searching for the scientific studies that let us know how much better.)

Ms. Ragazza does not agree with the PCAST report because, in her view, that report did not take into account all of the available research on the issue of footwear comparison evidence.
Maybe there is something to this complaint, but what validity studies does the PCAST report overlook? The Supporting Documentation for Department of Justice Proposed Uniform Language for Testimony and Reports for the Forensic Footwear and Tire Impression Discipline (2016) begins “The origin of the principles used in the forensic analysis of footwear and tire impression evidence dates back to when man began hunting animals.” But the issue the PCAST Report addresses is not whether a primitive hunter can distinguish between the tracks of an elephant and a tiger. It is the accuracy with which modern forensic fact hunters can identify the specific origin of a shoeprint or a tire tread impression. If Ms. Ragazza provided the court with studies of this particular issue that would produce a different conclusion about the extent of the validation research reported on in both the NRC and PCAST reports, the court did not see fit to list them in the opinion.

A footnote to the claim that "an examiner can identify a specific item of footwear/tire as the source of the footwear/tire impression" can be found in the Justice Department document mentioned above. This note (#12) lists the following publications:
  1. Cassidy, M.J. Footwear Identification. Canadian Government Publishing Centre: Ottawa, Canada, 1980, pp. 98-108; 
  2. Adair, T., et al. (2007). The Mount Bierstadt Study: An Experiment in Unique Damage Formation in Footwear. Journal of Forensic Identification 57(2): 199-205; 
  3. Banks, R., et al. Evaluation of the Random Nature of Acquired Marks on Footwear Outsoles. Research presented at Impression & Pattern Evidence Symposium, August 4, 2010, Clearwater, FL;
  4. Stone, R. (2006). Footwear Examinations: Mathematical Probabilities of Theoretical Individual Characteristics. Journal of Forensic Identification 56(4): 577-599;
  5. Wilson, H. (2012). Comparison of the Individual Characteristics in the Outsoles of Thirty-Nine Pairs of Adidas Supernova Classic Shoes. Journal of Forensic Identification 62(3): 194-203.
I wish I could say that I have read these books and papers. At the moment, I can only surmise their contents from the titles and places of publication, but I would be surprised if any of them contains an empirical study of the accuracy of footwear-mark examiners’ source attributions. (If my guess is wrong, I hope to hear about it.)

She testified that, to her knowledge, the PCAST members did not include among their membership any forensic footwear examiners.
It's true. The President's Council of Advisors on Science and Technology does not include footwear examiners. But would we say that only tea-leaf readers are able to judge whether there have been scientific studies of the validity of tea-leaf reading? That only polygraphers are capable of determining whether the polygraph is a valid lie detector? That only pathologists can ascertain whether an established histological test for cancer is accurate?

PCAST's conclusion was that no direct experiments currently establish the sensitivity and specificity of footwear-mark identification. In the absence of a single counter-example from the opinion, that conclusion seems sound. But the legal problem is whether to accept the PCAST report's premise that this information is essential to admissibility of footwear evidence under the standard for scientific expert testimony codified in Federal Rule of Evidence 702. Is it true, as a matter of law (or science), that only a large number of so-called black box studies with large samples can demonstrate the scientific validity of subjective identification methods or that the absence of precisely known error probabilities as derived from these experiments dictates exclusion? I fear that the PCAST report is too limited in its criteria for establishing the requisite scientific validity for forensic identification techniques, for there are other ways to test examiner performance and to estimate error rates. But however one comes out on such details, the need for courts to demand substantial empirical as well as theoretical studies that demonstrate the validity and quantify the risks of errors in using these methods remains paramount.
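For readers unfamiliar with the terms, the following sketch shows how sensitivity, specificity, and an upper bound on the false-positive rate could be computed from a black-box study. The counts are hypothetical and are not taken from any real study. The Wilson score interval used here is one common approximate method, not necessarily the one PCAST applied.

```python
import math

def wilson_upper(successes, n, z=1.96):
    """Upper end of the Wilson score interval for a proportion."""
    p = successes / n
    centre = p + z * z / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre + spread) / (1 + z * z / n)

# Hypothetical black-box study counts (not from any real study)
same_source_pairs, correct_identifications = 100, 92
different_source_pairs, false_identifications = 200, 3

sensitivity = correct_identifications / same_source_pairs
false_positive_rate = false_identifications / different_source_pairs
specificity = 1 - false_positive_rate
upper = wilson_upper(false_identifications, different_source_pairs)

print(sensitivity, specificity, false_positive_rate, round(upper, 3))
```

The observed false-positive rate in this invented study is 1.5%, but the upper bound is closer to 4%, which illustrates why the size of the study matters as much as the raw error count.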

Although Patel is merely one unpublished pretrial ruling with no precedential value, the case indicates that defense counsel cannot just cite the conclusions of the PCAST report and expect judges to exclude familiar types of evidence. They need to convince courts that "the reliability requirements" for scientific evidence include empirical proof that a technique actually works as advertised. Then the parties can focus on whether PCAST's assessments of the literature omit or give too little weight to studies that would warrant different conclusions. Broadbrush references to "treatises and journals" and a history of judicial acceptance should not be enough to counter PCAST's findings of important gaps in the research base of a forensic identification method.

Wednesday, January 25, 2017

Statistics for Making Sense of Forensic Genetics

The European Forensic Genetics Network of Excellence (EUROFORGEN-NoE) is a group of “16 partners from 9 countries including leading groups in European forensic genetic research.” In 2016, it approached Sense About Science — “an independent charity that challenges misrepresentation of science and evidence in public life” — to prepare and disseminate a guide to DNA evidence. Within the year, the guide, entitled Making Sense of Forensic Genetics, emerged. The 40-page document has a long list of “contributors,” who, presumably, are its authors. According to EUROFORGEN-NoE, it is “designed to introduce professional and public audiences to the use of DNA in criminal investigations; to understand what DNA can and can’t tell us about a crime, and what the current and future uses of DNA analysis in the criminal justice system might be.”

By and large, it accomplishes this goal, offering well informed comments and cautions for the general public. Some of the remarks about probabilities and statistics, however, are not as well developed as they could be. The points worth noting have more to do with clarity of expression than with any outright errors.

Statistics do not arise in a vacuum. Proper interpretation requires some understanding of how they came to be produced. Thus, Making Sense correctly observes that:
DNA evidence has a number of limitations: it might be undetectable, overlooked, or found in such minute traces as to make interpretation difficult. Its analysis is subject to error and bias. Additionally, DNA profiles can be misinterpreted, and their importance exaggerated, as illustrated by the wrongful arrest of a British man, ... . Even if DNA is detected at a crime scene, this doesn’t establish guilt. Accordingly, DNA needs to be viewed within a framework of other evidence, rather than as a standalone answer to solving crimes.
With respect to the narrow question of whether two DNA samples originate from the same individual, Making Sense asks, “So what is the chance that your DNA will match that of someone else?” An ambiguity lurks in this question. Does it refer to the probability of a matching profile somewhere in the population, or to the probability of a matching profile in a single, randomly selected individual? Apparently, the authors have the latter question in mind, for Making Sense explains that
It depends on how many locations in the DNA (loci) you look at. If a forensic scientist looked at just one locus, the probability of this matching the same marker in another individual would be relatively high (between 1 in 20 and 1 in 100). ... Since European police forces today typically analyse STRs at 16 or more loci, the probability that two full DNA profiles match by chance is miniscule — in the region of 1 in 10 with 16 zeros after it (or 1 in 100 million billion). ... Although in the UK court, the statistics are always capped at 1 in a billion.
The 1-in-a-billion cap is not seen in the United States, where laboratories toss about estimates in the quintillionths, septillionths, and so on (and on). (Could this be an instance of “America First”?) The naive reader might be forgiven for thinking that when the probability of the same match to a randomly selected individual is far less than 1 in a billion, an analyst could conclude that the recovered DNA is either from the defendant or a close relative. But Making Sense rejects this thought, insisting that “DNA doesn’t give a simple ‘yes’ or ‘no’ answer.”
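The two readings of "the chance that your DNA will match that of someone else" give very different numbers, as the following sketch suggests. It assumes that matches among unrelated people are independent events with the same probability p, and it uses a hypothetical population of 7.4 billion; both are simplifications made for illustration.

```python
import math

def prob_at_least_one_match(p, n):
    """P(at least one of n unrelated people matches), assuming independent matches with probability p."""
    # Computes 1 - (1 - p)**n in a numerically stable way for very small p
    return -math.expm1(n * math.log1p(-p))

population = 7_400_000_000  # hypothetical population size

for p in (1e-9, 1e-17):  # the UK cap and the quoted 16-locus figure
    expected_matches = p * population
    print(p, expected_matches, prob_at_least_one_match(p, population))
```

At the capped figure of 1 in a billion, at least one other matching person somewhere in a population of this size is nearly certain under these assumptions; at 1 in 100 million billion, it is extremely unlikely. The per-person probability and the somewhere-in-the-population probability should therefore not be used interchangeably.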

The explanation for its position is muddled. First, the report repeats that “with information available for all 16 markers, ... the risk of DNA retrieved from a crime scene matching someone unrelated to the true source is extremely low (less than 1 in a billion, and often many orders of magnitude lower than this).” So why is this not good enough for a “yes or no answer”? The hesitation, as expressed, is that
However, many of the DNA profiles retrieved from crime scenes aren’t full DNA profiles because they’re missing some genetic markers or there is a mixture of DNA from two or more people. So was it the suspect who left their DNA at the crime scene? The DNA evidence won’t give a ‘yes’ or ‘no’ answer: it can only ever be expressed in terms of probability.
But the conclusion that “it can only ever be expressed in terms of probability” is a non sequitur. The only thing that follows from the fact that not all crime-scene DNA samples lead to 16-locus profiles is that matches to the samples with less complete profiles are less convincing than matches to the samples with more complete profiles.

Of course, there is a sense in which all DNA evidence only gives rise to probabilities, and never to categorical conclusions. All empirical evidence only gives probable conclusions rather than certainties. Furthermore, it has been argued that forensic scientists should eschew source attributions because their expertise is limited to evaluating likelihoods — the probability of the match given that the sample came from a named individual and the probability given that it came from a different individual (or individuals). But that is not what Making Sense seems to be saying when it declares yes-or-no answers impossible. The limits on all empirical knowledge and the role of an expert witness do not produce any line between 16-locus matches and less-than-16-locus matches.

Making Sense also points out that
[T]he match probability ... must not be confused (but often is) with how likely the person is to be innocent of the crime. For example, if a DNA profile from the crime scene matches the suspect’s DNA and the probability of such a match is 1 in 100 million if the DNA came from someone else, this does not mean that the chance of the suspect being innocent is 1 in 100 million. This serious misinterpretation is known as the prosecutor’s fallacy.
Conceptually, this transposition is a “serious misinterpretation,” but whether the correct inverse probability (one that is based on a prior probability and a Bayes factor on the order of 100 million) gives a markedly different value is far from obvious. See David H. Kaye, The Interpretation of DNA Evidence: A Case Study in Probabilities, in Making Science-based Policy Decisions: Resources for the Education of Professional School Students, Nat'l Academies of Science, Engineering, and Medicine Committee on Preparing the Next Generation of Policy Makers for Science-Based Decisions ed., Washington, DC, 2016.
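How far the correct inverse probability departs from the transposed figure depends on the prior probability, as a short calculation with Bayes' rule in odds form shows. The prior probabilities below are hypothetical, and the quantity computed is the probability that the suspect is not the source of the DNA, which is narrower than innocence of the crime.

```python
def posterior_prob_not_source(prior_prob_source, bayes_factor):
    """Posterior probability that the suspect is NOT the source, via Bayes' rule in odds form."""
    prior_odds = prior_prob_source / (1.0 - prior_prob_source)
    posterior_odds = prior_odds * bayes_factor
    return 1.0 / (1.0 + posterior_odds)

bayes_factor = 1e8  # a match probability of 1 in 100 million for a non-source

for prior in (1e-3, 1e-6, 1e-8):  # hypothetical prior probabilities that the suspect is the source
    print(prior, posterior_prob_not_source(prior, bayes_factor))
```

With a prior of 1 in 1,000, the posterior probability that the suspect is not the source is about 1 in 100,000; with a prior of 1 in a million it is about 1 in 100; and with a prior of 1 in 100 million it is about one half. None of these equals 1 in 100 million, but for priors that are not tiny the posterior is still very small.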

A reasonable approach is to have analysts present the two pertinent conditional probabilities mentioned above (the “likelihoods”) to explain how strongly the profiles support one hypothesis over the other. Making Sense refers to this approach in some detail, but it suggests that it is needed only “in more complex cases, such as mixtures of two or more individuals, or when there might be contamination by DNA in the environment.” Compared to the alternative ways to explain the implications of DNA and other trace evidence, however, the approach is more widely applicable.

Monday, January 9, 2017

If You Are Going To Do a “DNA Dragnet,” Cast the Net Widely

Police in Rockingham County, North Carolina, took a circuitous path to identify the killer of a couple who were shot to death in their home in Reidsville, NC. They utilized a “DNA dragnet,” kinship analysis, ancestry analysis, and DNA phenotyping to conclude that the killer was the brother-in-law of the daughter of the slain couple. Had the initial DNA collection been slightly more complete, that effort alone would have sufficed.

The evidence that led to the man ultimately convicted of the double homicide consisted of drops of the killer's blood:
Parabon Nanolabs, The French Homicides, Jan. 4, 2017 [hereinafter Parabon]

In the early hours of 4 Feb 2012, Troy and LaDonna French were gunned down in their home in Reidsville, NC. The couple awoke to screams from their 19-year old daughter, Whitley, who had detected the presence of a male intruder in her second floor room. As they rushed from their downstairs bedroom to aid their daughter, the intruder attempted to quiet the girl with threats at knifepoint. Failing this, he released Whitley and raced down the stairs. After swapping his knife for the handgun in his pocket, he opened fire on the couple as they approached the stairwell. During his escape, the perpetrator left a few drops of his blood on the handrail, apparently the result of mishandling his knife. ...
Seth Augenstein, Parabon’s DNA Phenotyping Had Crucial Role in North Carolina Double-Murder Arrest, Conviction, Forensic Mag., Jan. 5, 2017 [hereinafter Augenstein]

A couple were gunned down by an intruder in their North Carolina home in the early hours of Feb. 4, 2012. The teenaged daughter had seen the hooded gunman, when he had briefly held a knife to her throat, but she could apparently not describe him to cops. The attacker left several drops of blood on a handrail as he fled, apparently self-inflicted from his blade.
At a press conference, Sheriff Sam Page announced that "You can run, but you can’t hide from your DNA." Danielle Battaglia, Blood on the Stairs, News & Record, Apr. 14, 2016 [hereinafter Battaglia]. But efforts to follow the DNA seemed to lead nowhere.
Parabon:
Running short of leads, investigators began collecting DNA samples from anyone thought to have been in or around the French home. "We swabbed a lot of people," says Captain Tammi Howell of the RCSO. "Early on, if there was a remote chance someone could have been connected to the crime, we asked for a swab." In the first 12 months following the crime, over 50 subjects consented to provide a DNA sample. None of the samples matched the perpetrator.
Augenstein:
"We swabbed a lot of people," said Capt. Tammi Howell, of the Rockingham County Sheriff’s Office, who led the investigation. "Early on, if there was a remote chance someone could have been connected to the crime, we asked for a swab." Those swabs produced no hits.
In particular, this screening of possible sources in the county eliminated "Whitley, her brother, and her boyfriend at the time, John Alvarez." Parabon. But police did not include Alvarez's father or his three brothers in their dragnet search, and when "[a]nalysts uploaded profiles of the blood drops and the skin fragments along with a sample from Whitley French into a database of known samples maintained by the FBI, [t]hey found no match." Battaglia. (According to Forensic Magazine, "the killer was not in any of the public databases," but law enforcement DNA databases are not public.)

There is some confusion in the accounts of what happened next.
Parabon:
The first break in the case came when familial DNA testing, performed at the University of North Texas, revealed the possibility that the perpetrator might be related to John Alvarez, Whitley's boyfriend. Because traditional DNA testing is limited in its ability to detect all but the closest relationships (e.g., parent-child), this report alone did not provide actionable information. Subsequently, scientists at the University of North Texas performed Y-chromosome STR analysis, which tests whether two male DNA samples share a common paternal lineage. This analysis, however, showed that the perpetrator did not share a Y-STR lineage with John Alvarez, seemingly eliminating John's father and brother as possible suspects.
Augenstein:
Further analysis then indicated that the daughter’s boyfriend, John Alvarez (who had given a swab), could be related to the killer. But it was only a possible relationship, since the STR did not definitively say whether the killer and the boyfriend shared ancestry.
The partial DNA matching led to a Y-STR analysis. The short-tandem repeat on the Y chromosome shows paternal links between fathers, sons and brothers, and has produced huge breakthroughs in cases like the Los Angeles serial killer Lonnie Franklin, Jr., infamously dubbed the “Grim Sleeper.” But in the Sleeper and other cases used “familial searching,” or “FS,” a painstaking and somewhat controversial process of combing large state and national databases like CODIS to find partial DNA matches eventually leading to a suspect. FS was not used in the Rockingham County case, where they had a limited pool of suspects.
Battaglia:
Investigators then decided to send the DNA samples out of state for what the warrant called “familial DNA testing,” a type of analysis that allows scientists to match DNA samples to a parent, child or sibling. According to warrants, the samples were sent to the Center for Human Identification at the University of North Texas in Denton. But they do not appear to have gone to that lab. And Rockingham County District Attorney Craig Blitzer said that although a lab did the familial DNA test, it was not North Texas. He declined to say where it was done.
The term "familial searching" has no well-established scientific meaning. As explained in David H. Kaye, The Genealogy Detectives: A Constitutional Analysis of “Familial Searching”, 51 Am. Crim. L. Rev. 109 (2013), kinship testing of possible parents, children, and siblings can be done with the usual autosomal STR loci used for criminal forensic investigation. When this technique is applied to a database (local, state, or national), it sometimes reveals that the crime-scene DNA matches no one in the database but is a near miss to someone -- a near miss in such a way as to suggest the possibility that the source of the crime-scene sample is a brother, son, or parent of the nearly-matching individual represented in the database. In other words, "familial searching" is the process of trawling a database for possible matches to people outside of the database -- "outer-directed trawling," for short.

The Rockingham case evidently involved a conventional but fruitless database search ("inner-directed trawling") followed by testing -- in Texas or somewhere else -- to ascertain whether it was plausible that a close relative of the boyfriend was the source of the blood. Based on the autosomal STRs, this seemed to be the case. However, the laboratory threw a monkey wrench into the investigation when it reported that Y-STRs in the boyfriend's DNA did not match the blood DNA. Because Y-STRs are inherited (usually unchanged) from father to son, this additional finding seemed to exclude the untested father and brothers of the boyfriend.

But the social and familial understanding of a family tree does not always correspond to a biological family tree. It is not unheard of for genetic tests for parentage to reveal unexpected cases of illegitimate children. A man and child who believe that they are father and son may be mistaken. Genetic genealogists like to call the phenomenon of misattributed paternity a Non-Paternity Event, or NPE.

Thinking that the male members of the immediate Alvarez family had to be innocent, police were stymied. They turned to Parabon Nanolabs in Reston, Virginia.
Parabon:
[For $3,500, the lab,] starting with 30 ng of DNA, ... genotype[d] over 850,000 SNPs from the sample, with an overall call rate of 98.9% [and advised the police that the blood probably came from a man with] fair or very fair skin, brown or hazel eyes, dark hair, and little evidence of freckling, ... a wide facial structure and non-protruding nose and chin, and ... admixed ancestry, a roughly 50-50 combination of European and Latino ancestry consistent with that observed in individuals with one European and one Latino parent. ... "The Snapshot ancestry analysis and phenotype predictions suggested we should not eliminate José as a suspect, despite the Y-STR results," said Detective Marshall. "The likeness of the Snapshot composite with his driver's license photograph is quite striking."

Augenstein:
From approximately 30 nanograms of DNA, the software genotyped approximately 850,000 single-nucleotide polymorphisms, or SNPs, at a call rate of 98.9 percent. In this case, the blood showed the killer to be someone with mixed ancestry – apparently someone with one European and one Latino parent. ... "The Snapshot ancestry analysis and phenotype predictions suggested we should not eliminate Jose (Jr.) as a suspect, despite the Y-STR results," said Det. Marcus Marshall, the lead investigator on the case. "The likeness of the Snapshot composite with his driver’s license photograph is quite striking."
At this time, Parabon proudly juxtaposes the "Snapshot Composite Profile and a photo of José Alvarez, Jr., taken at the time of his arrest" on its website (and shown below). One of the more intriguing (genetically associated?) similarities is the five o'clock shadow.
Snapshot™ Composite Profile for Case #3999837068, Rockingham County, NC Sheriff's Office

It also would be interesting to know how "confidence" in skin color and other phenotypes is computed. In any event, with this report, police finally obtained DNA samples by consent from the father, José Alvarez Sr., José Alvarez Jr., and Elaine Alvarez, the mother. Analysis indicated misattributed paternity -- and a conventional STR match of the DNA in the bloodstains. As a result,
Parabon:
José Alvarez Jr. was arrested on 25 Aug 2015 on two counts of capital murder. He later pled guilty to both murders and on 8 Jul 2016 was sentenced to two consecutive life sentences without the possibility of parole.
Augenstein:
Jose Alvarez, Jr., was ... arrested in August 2015 and charged with two counts of capital murder. He later pleaded guilty to killing the Frenches, and was sentenced to two consecutive life sentences without the possibility of parole in July 2016.
A final note on the twists and turns in the case is that John Alvarez's wedding to Whitley French "had been planned for months. Jose Alvarez Jr. served as a groomsman for his brother even as detectives were planning to arrest him on charges that he murdered his new sister-in-law’s parents." Battaglia.

Related posting

"We Can Predict Your Face" and Put It on a Billboard, Forensic Sci., Stat. & L., Nov. 28, 2016

Sunday, January 8, 2017

Reflections on Glass Standards: Statistical Tests and Legal Hypotheses

Statistica Applicata (Italian Journal of Applied Statistics) recently published several issues (volume 27, nos. 2 & 3) devoted to statistics in forensic science and law. They include an invited article I prepared in 2016 on the statistical logic of declaring pieces of glass "indistinguishable" in their physical properties. 1/ The article contains some of the views expressed in postings on this blog (e.g., Broken Glass: What Do the Data Show?). However, the issue is much broader than glass evidence. The article notes the potential for confusion in reporting that any kind of trace-evidence samples match (or cannot be distinguished) without also describing data on the frequency of such matches in a relevant population. I am informed that NIST's Organization of Scientific Area Committees on Forensic Science (OSAC) is preparing guidelines or standards for explaining the probative value of results obtained from ASTM-approved test methods.

The past 50 years have seen an abundance of statistical thinking on interpreting measurements of chemical and physical properties of glass fragments that might be associated with crime scenes. Yet, the most prominent standards for evaluating the degree of association between specimens of glass recovered from suspects and crime scenes have not benefitted from much of this work. Being confined to a binary match/no-match framework, they do not acknowledge the possibility of expressing the degree to which the data support competing hypotheses. And even within the limited match/no-match framework, they focus on the single step of deciding whether samples can be distinguished from one another and say little about the second stage of the matching paradigm–characterizing the probative value of a match. This article urges the extension of forensic-science standards to at least offer guidance for criminalists on the second stage of frequentist thinking. Toward that end, it clarifies some possible sources of confusion over statistical terminology such as “Type I” and “Type II” error in this area, and it argues that the legal requirement of proof beyond a reasonable doubt does not inform the significance level for tests of whether pairs of glass fragments have identical chemical or physical properties.
  1. The article is David H. Kaye, Reflections on Glass Standards: Statistical Tests and Legal Hypotheses, 27 Statistica Applicata -- Italian J. Applied Stat. 173 (2015). Despite the publication date assigned to the issue, the article, as stated above, was not written until 2016.