The New York Court of Appeals returned to the contentious issue of “probabilistic genotyping software” (PGS) in People v. Wakefield, 2022 N.Y. Slip Op. 02771, 2022 WL 1217463 (N.Y. Apr. 26, 2022). As previously discussed, in People v. Williams, 147 N.E.3d 1131 (N.Y. 2020), a slim majority of the court held that the output of a computer program should not have been admitted without a full evidentiary hearing on its general acceptance within the scientific community. The majority opinion described a confluence of considerations:
- The program had only been tested in the laboratory that developed it (“an invitation to bias,” id. at 1141);
- The only evidentiary hearing ever conducted on the program had only shown “internal validation” and formal approval by a subcommittee of a state forensic science commission that was a “narrow class of reviewers, some of whom were employed by the very agency that developed the technology,” id. at 1142;
- Given “the ‘black box’ nature of that program,” the developer's “secretive approach ... was inconsistent with quality assurance standards,” id.; and
- Submissions for hearings in other cases “suggested that the accuracy calculations of that program may be flawed,” id.
But which of these four factors were dispositive? Was it the combination of all four, or something in between, that rendered the evidence inadmissible? If the developer were to change its “secretive approach” so as to allow defense experts to study the program’s source code, would that, plus the “internal validation,” be enough to establish general scientific acceptance? Would it be sufficient for the state to refute the suggestions of flawed “accuracy calculations of the program” through testimony from its experts? Just what did the court mean when it summarized its analysis with the statement that “[i]n short, the [PGS] should be supported by those with no professional interest in its acceptance. Frye demands an objective, unbiased review”?
The opinion did not reveal how the majority might answer these questions. Of course, in holding that a hearing was necessary, the Williams majority implied that some information outside of the normal scientific literature could fill the gap created by the absence of replicated developmental validation studies from external (“objective, unbiased”) researchers. But what might that information be?
The court’s encounter with PGS last month did not answer this open question, for the court in Wakefield found that there were replicated studies from the developer of a more sophisticated computer program and other researchers. In addition, it pointed to other evaluations or uses of the program. The totality of the evidence, it reasoned, was stronger than the developer-only record in Williams and demonstrated the requisite general acceptance. But the opinion provoked one member of the court to complain of a "jarring turnabout" from "the same view unsuccessfully advocated by a minority in Williams two years ago."
This posting describes the case, the DNA evidence, and aspects of the discussions of general acceptance that struck me as interesting or puzzling.
The Crime, the Samples, and Some Misunderstood Probabilities of Exclusion
In 2010, John Wakefield strangled the occupant of an apartment with a guitar amplifier cord and made off with various items. The New York State Police laboratory analyzed samples from four areas: the front part of the collar of the victim's shirt; the rear part of the collar; the victim's forearm; and the amplifier cord. The laboratory concluded that the DNA on the collar was “consistent with at least two donors, one of which was the victim, and defendant could not be excluded as the other contributor”; that the DNA from the forearm “was consistent with DNA from the victim, as the major contributor, mixed with at least two additional donors”; and that the DNA on the cord was “a mixture of at least two donors, from which the victim could not be excluded as a possible contributor.” 2022 WL 1217463, at *1.
At this point, the court’s description of the State Police laboratory’s work becomes hard to follow. The court wrote that:
[T]he analyst did not call any alleles based on peaks on the electropherogram below [the pre-established stochastic] threshold. As a result, there was insufficient data to allow the Lab to calculate probabilities for the unknown contributors to the DNA mixtures found on the amplifier cord and the front of the shirt collar.
No alleles at all? It takes only one allele to compute a probability of exclusion, although with such a limited profile, the exclusion probability might be close to zero, meaning that the data are uninformative. In any event, for the other two samples, “[t]he Lab was able to call ... 4 ... STR loci” that enabled “the analyst, using the combined probability of inclusion method, [to opine that] the probability an unrelated individual contributed DNA to the outside rear shirt collar was 1 in 1,088” and “that the probability an unrelated individual contributed DNA ... was 1 in 422” for “the profile obtained from the victim's forearm.”
Or so the court said. As explained in Box 1, these numbers are not “the probability an unrelated individual contributed DNA.” They are estimates of the probability that a randomly selected, unrelated individual could not be excluded as a possible source. Given a large number of unrelated individuals in the region, there easily could be more than a hundred people with STR profiles compatible with the mixtures.
Box 1. The probability of inclusion is not the probability that an included individual is the contributor. It is the probability of not excluding an individual as a possible contributor. That probability is not necessarily equal to the probability that an included individual actually contributed to the sample from which he or she could not be excluded. If C stands for contributor and I for included, the probability of inclusion for a randomly selected individual who is not in fact a contributor can be written P(I given not-C). The source probability for an included individual is different. It is P(C given I). Treating the one as if it were the other is known as the transposition fallacy (or the “prosecutor’s fallacy,” though it could be called the “judges’ fallacy” as well).
We do not need any symbols to see that the two conditional probabilities are not necessarily equal. The population of Schenectady County, where the crime occurred, was about 155,000 in 2010. Let’s round down to 150,000. That ought to remove all of Wakefield’s relatives. Excluding all but 1 in 1,088 individuals would leave 138 people as possible perpetrators. Of course, some would be far more plausible suspects than others, but based on the DNA evidence alone, how can the court claim that “the probability an unrelated individual contributed DNA to the outside rear shirt collar was 1 in 1,088”? That probability cannot be determined from the DNA evidence alone. It can be computed only if we are willing to assign a “prior probability” of being the murderer to each of the unrelated individuals in Schenectady (or anywhere else).
Suppose we assume that, ab initio, everyone in the county has an equal probability of being a source of the DNA on the collar. At that point, Wakefield’s probability is quite small. It is 1/150,000. Since the DNA testing would have excluded all but some 138 people, and because Wakefield is one of them, the probability attached to him is larger. Now the probability is 1/138. But that still leaves the vast bulk of the probability with the 137 unrelated individuals. Instead of transposing, we should say that “the probability an unrelated individual contributed DNA to the outside rear shirt collar was 137 out of 138” rather than the court’s “1 in 1,088.” Of course, our assumption of equal probabilities for every unrelated individual is unrealistic, but that does not impeach the broader point that the mathematics does not make the probability of an unrelated individual the number that the court supplied.
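For readers who want the Box 1 arithmetic laid out explicitly, here is a minimal sketch in Python. It assumes, as the text does, the admittedly unrealistic uniform prior over the county's unrelated residents; the only inputs are the numbers already given above.

```python
# A minimal sketch of the Box 1 arithmetic, assuming (unrealistically, as
# noted above) a uniform prior over the unrelated individuals in the county.

population = 150_000        # Schenectady County in 2010, rounded down
inclusion_prob = 1 / 1_088  # combined probability of inclusion (CPI)

# Expected number of county residents who would NOT be excluded by the
# 4-locus profile on the outside rear shirt collar.
expected_included = round(population * inclusion_prob)
print(expected_included)  # 138

# With equal priors, each non-excluded person -- Wakefield among them --
# carries the same posterior probability of being the source.
print(1 / expected_included)                        # ~1/138 for Wakefield
print((expected_included - 1) / expected_included)  # ~137/138 for the others
```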
Cybergenetics to the Rescue
To secure a better and more complete analysis, “the electronic data from the DNA testing of the four samples at issue was then sent to Cybergenetics [for] calculating a likelihood ratio—using all of the information generated on the electropherogram, including peaks that fall below a laboratory's stochastic threshold.” Cybergenetics is a private company whose “flagship TrueAllele® technology resolves complex forensic evidence, providing accurate and objective DNA match statistics.” TrueAllele's likelihood ratios, computed with the hypothesis that the four samples contained DNA from an unrelated black individual as the alternative to the hypothesis that Wakefield’s DNA was present, were 5.88 billion for the cord, 170 quintillion for the outside rear shirt collar, 303 billion for the outside front shirt collar, and 56.1 million for the forearm.
Wakefield moved to exclude these findings. The Schenectady County Supreme Court held a pretrial evidentiary hearing “over numerous days.” People v. Wakefield, 47 Misc.3d 850, 851, 9 N.Y.S.3d 540 (2015). (New York calls its trial courts supreme courts.) Finding “that Cybergenetics TrueAllele Casework is not novel but instead is ‘generally accepted’ under the Frye standard,” \1/ Justice Michael V. Coccoma (New York calls its trial judges justices) denied the motion. 47 Misc.3d at 859. A jury convicted Wakefield of first degree murder and robbery. The Appellate Division affirmed, and seven years after the trial, so did the Court of Appeals (New York calls its most supreme court the Court of Appeals).
Changes in New York’s Highest Court
Back in Williams, the Court of Appeals judges had split 4-3 on whether New York City's home-grown PGS had attained general acceptance. The three judges led by Chief Judge Janet M. DiFiore * objected to the majority’s negative comments about PGS and propounded a narrower rationale for requiring a Frye hearing. But even if one could have confidently applied the majority reasoning in Williams to the scientific status of TrueAllele in Wakefield, the exercise in legal logic might have been futile. In the two short years since Williams, the composition of the court had changed. One concurring judge died, and the majority-opinion bloc lost half its members, including the opinion’s author, to retirements. The reconstituted court gave Chief Judge DiFiore the opportunity to write a more laudatory opinion for a new and larger majority.
Only one judge stood apart from this new majority. Having been in the majority in Williams, Judge Jenny Rivera now found herself in the position the Chief Judge had occupied there: composing a dissenting opinion with respect to the reasoning on general acceptance but concurring in the result. Drawing on Williams, Judge Rivera maintained that “the court erred in admitting the TrueAllele results but the error ... was harmless” in view of the other evidence of guilt. \6/
The Court’s Understanding of TrueAllele
The opinions are vague about the inner workings of TrueAllele. The majority opinion suggests that what is distinctive about PGS is that it cranks out a likelihood ratio. \2/ But “likelihood ratio,” for present purposes, simply denotes the probability of data given one hypothesis divided by the probability of the same data given a (simple) alternative hypothesis. It has nothing to do with the probabilistic part of TrueAllele. Indeed, TrueAllele only computes a likelihood ratio after the probability analysis is completed. It does this by dividing (i) the final posterior odds that favor one source hypothesis as compared to another by (ii) the initial prior odds. This division gives a “Bayes' factor” that states how much the data have changed the odds.
Let me try saying this another way. In effect, TrueAllele starts with prior odds based solely on the frequencies of various DNA alleles (and hence genotypes) in some population, performs successive approximations to converge on a better estimate of the odds, and divides the adjusted odds by the prior odds to yield what Cybergenetics calls “the match statistic.” If all goes well, this quotient (call it a likelihood ratio, a Bayes' factor, a match statistic, or whatever you want) reveals how powerful the DNA evidence is (which is not necessarily the same as the odds that any hypothesis is true). At least, that is what I think goes on. The court contents itself with warm and fuzzy statements such as “a probability model to assess the values of a genotype objectively” “based on mathematical computations from all the data in the electropherograms” and “separates the genotypes using the mathematical probability principle of the Markov chain Monte Carlo (MCMC) search to calculate the probability for what the different genotypes could be.” (This last clause may not be so warm and fuzzy; it begins to unpack what I simplistically called successive approximations.)
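For what it is worth, here is a toy numerical sketch of the division just described. Every number in it is invented, and the stipulated likelihoods stand in for TrueAllele's proprietary peak-height model; the sketch shows only how a Bayes' factor emerges as posterior odds divided by prior odds and, for simple hypotheses, equals the likelihood ratio.

```python
import math

# Toy sketch: a Bayes' factor ("match statistic") as posterior odds divided
# by prior odds. All numbers are invented; TrueAllele's actual genotype
# model and MCMC machinery are proprietary and not reproduced here.

# H1: the defendant's genotype is in the mixture; H2: an unrelated
# individual's is. Prior odds come solely from assumed genotype frequencies.
prior_h1, prior_h2 = 0.001, 0.999
prior_odds = prior_h1 / prior_h2

# Probabilities of the electropherogram data under each hypothesis. In a
# real PGS these would come from a probabilistic model of peak heights fit
# by MCMC; here they are simply stipulated.
p_data_h1, p_data_h2 = 0.9, 1e-6

posterior_odds = (p_data_h1 * prior_h1) / (p_data_h2 * prior_h2)
bayes_factor = posterior_odds / prior_odds
print(f"{bayes_factor:,.0f}")  # 900,000

# For simple hypotheses the quotient collapses to the likelihood ratio.
assert math.isclose(bayes_factor, p_data_h1 / p_data_h2)
```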
The Timing for General Acceptance
Wakefield is a backwards-looking case. The main question before the Court of Appeals was whether, in 2015, TrueAllele reasonably could have been deemed to have been generally accepted in the scientific community. That is what New York law requires. \3/ The Chief Judge’s analysis of the general acceptance of TrueAllele starts with the observation that “[t]he well-known Frye test applied to the admissibility of novel scientific evidence (Frye v. United States, 293 F. 1013 [D.C. Cir.1923]) is 'whether the accepted techniques, when properly performed, generate results accepted as reliable within the scientific community generally' (People v. Wesley, 83 N.Y.2d 417, 422, 611 N.Y.S.2d 97, 633 N.E.2d 451 [1994]).”
Wesley is an interesting case to cite here. One would not know from the citation or the analysis in Wakefield that in Wesley there was no opinion for a majority of the seven judges on the court. There was one opinion for three judges and another opinion for two judges concurring only in the result. The remaining two judges did not participate. The concurring opinion was written by the late Chief Judge Judith S. Kaye, the longest-serving chief judge in New York history.
Chief Judge Kaye’s concurrence is memorable for its skepticism about finding general acceptance on the basis of studies from the developer of a method. Current Chief Judge Janet DiFiore briefly summarized that discussion (as did the majority in Williams). A more complete exposition is in Box 2. Chief Judge DiFiore then suggests that the Wesley concurrence was satisfied because “[n]otwithstanding these concerns, Chief Judge Kaye ultimately agreed that, at the time the appeal was decided, ‘RFLP-based forensic analysis [was] generally accepted as reliable’ and those testing procedures were accepted as the standard methodology used in the scientific community until the advent of the PCR STR method used today.”
This presentation places an odd spin on the Wesley concurrence. The sole basis for the concurrence was that “it can fairly be said that use of DNA evidence was harmless beyond a reasonable doubt” because the DNA evidence “added nothing to the People's case.” 83 N.Y.2d at 444–45. The observations that, five years after the hearing in Wesley, it had become clear that “in principle” RFLP-VNTR testing was “fundamentally sound” and was generally accepted were clearly dicta. Chief Judge Kaye was not suggesting that because a method had become generally accepted later, its earlier admission was vindicated. The dicta on later general acceptance were intended to inform trial courts that while they were at liberty to admit RFLP-VNTR evidence without pretrial hearings on general acceptance, they still needed to probe “the adequacy of the methods used to acquire and analyze samples ... case by case.” Id. at 445.
In contrast to Wesley, which emphasized the state of the science “at the time of the Frye hearing in 1988,” 83 N.Y.2d at 425 (plurality opinion), and whether “in 1988, ... there was consensus,” id. at 439 (concurring opinion), Chief Judge DiFiore’s opinion is less precise on when general acceptance came into existence:
Box 2. The inquiry into forensic analysis of DNA in this case also demonstrates the "pitfalls of self-validation by a small group." Before bringing novel evidence to court, proponents of new techniques must subject their methods to the scrutiny of fellow scientists, unimpeded by commercial concerns.
A Frye court should be particularly cautious when — as here — "the supporting research is conducted by someone with a professional or commercial interest in the technique." DNA forensic analysis was developed in commercial laboratories under conditions of secrecy, preventing emergence of independent views. No independent academic or governmental laboratories were publishing studies concerning forensic use of DNA profiling. The Federal Bureau of Investigation did not consider use of the technique until 1989. Because no other facilities were apparently conducting research in the field, the commercial laboratory's unchallenged endorsement of the reliability of its own techniques was accepted by the hearing court as sufficient to represent acceptance of the technique by scientists generally. The sole forensic witness at the hearing in this case was Dr. Michael Baird, Director of Forensics at Lifecodes laboratory, where the samples were to be analyzed. While he assured the court of the reliability of the forensic application of DNA, virtually the sole publications on forensic use of DNA were his own or those of Dr. Jeffreys, the founder of Cellmark, one of Lifecodes' competitors. Nor had the forensic procedure been subjected to thorough peer review. ***
The opinions of two scientists, both with commercial interests in the work under consideration and both the primary developers and proponents of the technique, were insufficient to establish "general acceptance" in the scientific field. The People's effort to gain a consensus by having their own witnesses "peer review" the relevant studies in time to return to court with supporting testimony was hardly an appropriate substitute for the thoughtful exchange of ideas in an unbiased scientific community envisioned by Frye. Our colleagues' characterization of a dearth of publications on this novel technique as the equivalent of unanimous endorsement of its reliability ignores the plain reality that this technique was not yet being discussed and tested in the scientific community.
"Although the continuous probabilistic approach was not used in the majority of forensic crime laboratories at the time of the hearing, the methodology has been generally accepted in the relevant scientific community based on the empirical evidence of its validity, as demonstrated by multiple validation studies, including collaborative studies, peer-reviewed publications in scientific journals and its use in other jurisdictions. The empirical studies demonstrated TrueAllele's reliability, by deriving reproducible and accurate results from the interpretation of known DNA samples."
Presumably, and notwithstanding citations to materials appearing after 2015, \4/ she meant to write that the methodology had been generally accepted in 2015 because the indications listed were present then. (Whether the decisive time for general acceptance should be that of the trial rather than the appeal is not completely obvious. If a technique becomes generally accepted later, why should the defendant be entitled to a new trial in which the evidence that should have been excluded has become admissible anyway? The defendant's interest in the time-of-trial rule is the interest in not being convicted with the help of scientifically sound evidence (as per the general-acceptance standard based on the best current knowledge). A counter-argument is that a large pool of potential defense experts to question the application of the generally accepted method in the particular case did not exist at the time of trial because the evidence was too novel.)
Quantifying the Accuracy of PGS
Turning to the question of the state of acceptance as of 2015, the majority opinion maintains that
[T]he methodology has been generally accepted in the relevant scientific community based on the empirical evidence of its validity, as demonstrated by multiple validation studies, including collaborative studies, peer-reviewed publications in scientific journals and its use in other jurisdictions. The empirical studies demonstrated TrueAllele's reliability, by deriving reproducible and accurate results from the interpretation of known DNA samples.
Both the fact that the software was written to implement uncontroversial mathematical ideas and the published empirical evidence are important. If the software were designed to implement a mathematically invalid procedure, the game would be over before it began. But techniques such as Bayes’ rule and sampling methods for getting a representative picture of the posterior distribution only work when they are developed appropriately for a particular application. Acknowledging that these tools have been used to solve problems in many fields of science is a bit like saying that the mathematics of probability theory is undisputed. The validity of the mathematical ideas is a necessary but hardly a sufficient condition for a finding that software intended to apply the ideas functions as intended. Using a particular mathematical formula or method to describe or predict real-world phenomena is an endeavor that is subject to and in need of empirical confirmation. Because PGS models the variability in the empirical data that emerge from chemical reactions and electronic detectors, “empirical evidence ... of its accuracy” is indispensable.
Unfortunately, Wakefield is short on details from the “multiple validation studies” and “peer-reviewed publications.” What do the studies and publications reveal about the accuracy of output such as “5.88 billion times more probable” and “170 quintillion times more probable”? The Supreme Court opinion is devoid of any quantitative statement of how well the deconvoluted individual profiles and their Bayes’ factors reported by TrueAllele correspond to the presence or absence of those profiles in samples constructed with or otherwise known to contain DNA from given individuals. So is the Appellate Division opinion. So too with the Court of Appeals’ opinions. The court is persuaded that “[t]he empirical studies demonstrated TrueAllele's reliability, by deriving reproducible and accurate results from the interpretation of known DNA samples.” But how well did TrueAllele perform in the “many published and peer reviewed” validity studies?
A separate posting summarizes parts of the six studies circa 2015 that are both published and peer reviewed. The numbers in these studies suggest that within certain ranges (with regard to the quantity of DNA, the number of contributors, and the fractions from the multiple contributors), TrueAllele’s likelihood ratios discriminate quite well between samples paired with true contributors and the same samples paired with noncontributors. For example, in one experiment, LR was never greater than 1 for 600,000 simulations of false contributors to 10 two-person mixtures containing 1 nanogram of DNA—no observed false positives! Conversely, LR was never less than 1 for every true contributor to the same ten mixtures—no observed false negatives in 20 comparisons. Moreover, the program’s output behaves qualitatively as it should, generally producing smaller likelihood ratios for electrophoretic data that are more complex or more bedeviled by stochastic effects on peak heights and locations.
Such results suggest that TrueAllele’s LRs are in the ballpark. Yet, it is hard to gauge the size of the ballpark. Is a computed LR of 5.88 billion truly a probability ratio of 5.88 billion? Could the ratio be a lot less or a lot more? The validity studies do not give quantitative answers to these questions about “accuracy.” \5/
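To illustrate the distinction between discrimination and calibration, here is a simulation sketch of my own devising. The log-LR distributions are invented, not taken from any TrueAllele study; the point is only that counting errors of the kind reported in the validation papers tests whether LRs sort contributors from noncontributors, not whether a reported figure like 5.88 billion is numerically right.

```python
import random

# An invented simulation, not any published study's data: log10(LR) values
# for true contributors and for noncontributors are drawn from assumed
# distributions, and errors are tallied as in the discrimination studies.
random.seed(1)

def simulated_log10_lr(true_contributor: bool) -> float:
    # Hypothetical spreads from stochastic effects; centers chosen so the
    # two groups rarely cross LR = 1 (log10 LR = 0).
    return random.gauss(8.0 if true_contributor else -6.0, 2.0)

true_lrs = [simulated_log10_lr(True) for _ in range(1_000)]
false_lrs = [simulated_log10_lr(False) for _ in range(1_000)]

false_negatives = sum(lr < 0 for lr in true_lrs)   # true contributor, LR < 1
false_positives = sum(lr > 0 for lr in false_lrs)  # noncontributor, LR > 1
print(false_negatives, false_positives)

# Few or no observed errors show good discrimination between contributors
# and noncontributors. But this tally cannot tell us whether a reported LR
# of 5.88 billion is calibrated -- that is, whether the data really are
# 5.88 billion times more probable under one hypothesis than the other.
```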
The Developer’s Involvement
On appeal, Wakefield had to convince the court that the unchallenged studies and other indicia of general acceptance were too weak to permit a finding of general acceptance. To do so, he pointed to “the dearth of independent validation as a result of Dr. Perlin's involvement in the large majority of studies produced at the hearing.” (Indeed, Dr. Perlin is the lead author of every one of the five published validity studies and a co-author of a sixth published study that also helps show validity.)
The majority acknowledged “legitimate concern” but decided that it was overcome “by the import of the empirical evidence of reliability demonstrated here and the acceptance of the methodology by the relevant scientific community.” However, the discussion of “the import of the empirical evidence” seems somewhat garbled.
1
First, the court notes that “the FBI Quality Assurance Standards requires ‘a developmental validation for a particular technology’ be published.” That the FBI might be satisfied with a single publication from the developer of a method does not speak to what the broader scientific community regards as essential to the validation. Along with the QAS, the court cites "NIST, DNA Mixture Interpretation: A NIST Scientific Foundation Review, at 64 (June 2021 Draft report)." The page merely reports that the NIST staff were able to examine “[p]ublicly available data on DNA mixture interpretation performance ... from five sources [including] published PGS studies” and that “conducting mixture studies may be viewed as a necessity to meet published guidelines or QAS requirements ... .” That scientists and other NIST personnel who choose to review a technology will read the scientific reports of the developers of the technology does not tell us much about defendant’s claim that Cybergenetics’ involvement in the published validation studies gravely diminishes “the import of the empirical evidence.”
2
Second, the Court of Appeals maintained that “the interest of the developer was addressed at the Frye hearing in this case.” As the court described the hearing, the response to this concern was that “[a]lthough Dr. Perlin was involved in and coauthored most of the validation studies, his interest in TrueAllele was disclosed as required by the journals who published the studies and the empirical evidence of the reliability of TrueAllele was not disputed.”
These responses seem rather flaccid. Some of the articles contain conflict-of-interest statements; most do not. \7/ But the presence or absence of obvious disclaimers does not come to grips with the complaint. Defendant’s argument is not that there are hidden funding sources or financial relationships. It is that interests in the outcomes of the studies somehow may affect the results. The claim is not that validation data were fabricated or that the data analysis was faulty. As with the movement for replication and “open science,” it is a response to more subtle threats.
3
Third, the opinion asserts that “the scientific method” is “entirely consistent with” proof of validity coming from the inventors, discoverers, or commercializers (citing President's Council of Advisors on Sci. and Tech., Exec. Office of the President, Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods, at 46 (2016)). Again, however, the argument is not that only disinterested parties do and should participate in scientific dialog. It is that "[w]hile it is completely appropriate for method developers to evaluate their own methods, establishing scientific validity also requires scientific evaluation by other scientific groups that did not develop the method.” Id. at 80 (https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/PCAST/pcast_forensic_science_report_final.pdf [https://perma.cc/R76Y-7VU]).
4
That precept leads to the court’s last and most telling response to the “legitimate concern” over “the dearth of independent validation.” The Chief Judge finally wrote that “there were [not only] developer [but also] independent validation studies and laboratory internal validation studies, many published and peer-reviewed.”
But is this a fair characterization of the scientific literature as of 2015? From what I can tell, no more than five or six studies appear in peer-reviewed journals, and none are completely “independent validation studies.” The NIST report cited in Wakefield lists but a single “internal validation” study, from Virginia in 2013, apparently released in response to a Freedom of Information Act request. Although the NIST reviewers limited themselves to laboratory studies or data posted to the Internet, they concluded that “[c]urrently, there is not enough publicly available data to enable an external and independent assessment of the degree of reliability of DNA mixture interpretation practices, including the use of probabilistic genotyping software (PGS) systems.”
Of course, this “Key Takeaway #4.3” is merely part of a draft report and is not a judgment as to what conclusions on validity should be reached on the basis of the published studies and the internal ones. Nevertheless, the court overlooks this prominent “takeaway” (and others). Instead, the Chief Judge asserts that “[t]he technology was approved for use by NIST”—even though NIST is not a regulatory agency that approves technologies—and that “NIST's use of the TrueAllele system for its standard reference materials likewise demonstrates confidence within the relevant community that the system generates accurate results.”
~~~
This is not to say that the scientific literature was patently insufficient to support the court’s assessment of the general scientific acceptance of TrueAllele for interpreting the DNA data in the case. But it does raise the question of whether the court’s assertions about the large number of “independent validation studies” and internal ones that have been “published and peer-reviewed” are exaggerated.
Source Code and General Acceptance
The defense also contended that the state’s testimony and exhibits from “the Frye hearing [were] insufficient because, absent disclosure of the TrueAllele source code for examination by the scientific community, its ‘proprietary black box technology’ cannot be generally accepted as a matter of law.” This argument bears two possible interpretations. On the one hand, it could be a claim that scientists demand open-source programs—those with every line of code deposited somewhere for everyone to see—before they will consider a program suitable for data analysis or other purposes. We can call this position the open-source theory.
On the other hand, the claim might be “that disclosure of the TrueAllele source code [to the defense, perhaps with an order to protect against more widespread dissemination of trade secrets] was required to properly conduct the Frye hearing” and that without at least that much discovery of the code, scientists would not regard TrueAllele as valid. We can call this position the discovery-based theory. It implies that, in establishing general scientific acceptance in a Frye hearing, pretrial discovery of secret code is an adequate substitute for exposing the code to the possible scrutiny of the entire scientific community. \8/
The Wakefield opinions are not entirely clear about which theory they embrace or reject. Judge Rivera’s concurrence may have endorsed both theories. In addition to accentuating “the need to provide defendant with access to the source code,” she decried the absence of “objective, expert third-party access,” writing that
The court's decision was an abuse of discretion as a matter of law because it relied on validation studies by interested parties and evaluations founded on incomplete information about TrueAllele's computer-based methodology. Without defense counsel and objective, expert third-party access to and evaluation of the underlying algorithms and source code, the court could not conclude that TrueAllele's brand of probabilistic genotyping was generally accepted within the forensic science community.
The “evaluations founded on incomplete information” were from a standards developing organization, a state forensic science commission, and NIST. They were incomplete because, according to Judge Rivera, “without the source code, the agencies could not adequately evaluate the use of TrueAllele for this type of DNA mixture analysis ... .”
Focusing on the discovery-based theory, the rest of the court determined that “[d]isclosure ... was not needed in order to establish at the Frye hearing the acceptance of the methodology by the relevant scientific community.” The Chief Judge gave two, somewhat confusingly stated, reasons. The first was that Wakefield sought the source code under a rule for discovery that did not apply and then “made no further attempt to demonstrate a particularized need for the source code by motion to the court.” But it is not clear how the failure “to demonstrate a particularized need” overcomes (or even responds to) the argument that the scientific community will not accept software as validly implementing algorithms unless the source code is either open source or at least disclosed to the defense.
The Chief Judge continued:
Moreover, defendant's arguments as to why the source code had to be disclosed pay no heed to the empirical evidence in the validation studies of the reliability of the instrument or to the general acceptance of the methodology in the scientific community—the issue for the Frye hearing—and are directed more toward the foundational concern of whether the source code performed accurately and as intended (see Wesley, 83 N.Y.2d at 429, 611 N.Y.S.2d 97, 633 N.E.2d 451).
The meaning of the sentence may not be immediately apparent. The defense argument is that giving a defendant (or perhaps the scientific community generally) access to source code is a prerequisite to general acceptance of the proposition that the software correctly implements theoretically sound algorithms. If this broad proposition is false dogma, the court should simply say so. It should announce that source code need not be disclosed because there is an alternative, reasonably effective means for establishing that the software performs as it should. The first part of the sentence starts out that way, but the sentence then states that “whether the source code performed accurately and as intended” is not a matter of general acceptance at all. It is only “foundational” in the sense identified by Chief Judge Kaye in Wesley, who, as we saw (Box 2), wrote that even though RFLP-VNTR testing was generally accepted, the complete “foundation” for admitting DNA evidence entails proof that the generally accepted procedure was performed properly in the case at bar.
But regarding the argument about source code as falling outside of the Frye inquiry misapprehends the defense argument. Neither the open-source theory nor the discovery-based theory pertains to the execution of valid software. Both question the premise that validity can be generally accepted without disclosure of the program’s source code. Yet, the majority elaborates on its non-Frye "foundational" classification for the source-code argument as follows:
To the extent the testimony at the hearing reflected that the TrueAllele Casework System may generate less reliable results when analyzing more complex mixtures (see also President's Council of Advisors on Sci. and Tech., Exec. Office of the President, Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods, at 80 [2016] https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/PCAST/pcast_forensic_science_report_final.pdf [published after the Frye hearing was held]), defendant did not refine his challenge to address the general acceptance of TrueAllele on such complex mixtures or how that hypothesis would have been applicable to the particular facts of this case. As a result, it is unclear that any such objection would have been relevant to defendant's case, where the samples consisted largely of simple (two-contributor) mixtures with the victim as a known contributor (see also NIST, DNA Mixture Interpretation: A NIST Scientific Foundation Review, at 3 [June 2021 Draft report] https://nvlpubs.nist.gov/nistpubs/ir/2021/NIST.IR.8351-draft.pdf).
These citations to the PCAST and NIST reports actually undercut any suggestion that source-code secrecy does not implicate Frye. The NIST draft repeatedly states that
Forensic scientists interpret DNA mixtures with the assistance of statistical models and expert judgment. Interpretation becomes more complicated when contributors to the mixture share common alleles. Complications can also arise when random variations, also known as stochastic effects, make it more difficult to confidently interpret the resulting DNA profile.
Not all DNA mixtures present these types of challenges. We agree with the President’s Council of Advisors on Science and Technology (PCAST) that “DNA analysis of single-source samples or simple mixtures of two individuals, such as from many rape kits, is an objective method that has been established to be foundationally valid” (PCAST 2016).
NIST, DNA Mixture Interpretation: A NIST Scientific Foundation Review, at 2-3 & 11-12 (June 2021 draft) (citations omitted). To demand that “defendant ... refine his challenge to address the general acceptance of TrueAllele on ... complex mixtures or ... the particular facts of this case” is to hold that TrueAllele is generally accepted for use with “single-source samples or simple mixtures of two individuals”—even though the source code is hidden. But if science does not demand the disclosure of source code for general acceptance inside the "single" or simple "zone," then why would it demand disclosure for general acceptance outside that zone?
The court's remarks make more sense as a response to Wakefield’s different discovery argument about the need for the source code for trial purposes. This argument does not claim that disclosure of the source code is essential for general acceptance to exist. It looks to the trial rather than the pretrial Frye hearing. The thought may be that if the accuracy of the program for the “simple” cases is assured, then the need for discovery of the code to prepare for trial testimony is less compelling. The court appears to be responding that because “the samples consisted largely of simple (two-contributor) mixtures with the victim as a known contributor,” there was little need for discovery of the code in this case.
Although this rejoinder departs from the topic of what Wakefield teaches us about general acceptance, I would note that it is difficult to reconcile this characterization of the case with Chief Judge DiFiore’s own description of the samples. The court mentioned four samples. Its initial description of them indicates that the New York laboratory deemed the sample on the amplifier cord to be “at least” a three-person mixture and stated that “because of the complexity of the mixture,” the laboratory could not even compare “results generated from the amplifier cord ... to defendant's DNA profile.” 2022 WL 1217463, at *1. Because of the “stochastic threshold,” the laboratory could discern peaks at only 4 out of 15 loci for “the outside rear shirt collar” and “for the profile obtained from the victim's forearm.” Id. Presumably, the “insufficient data” on “the unknown contributors to the DNA mixtures found on the amplifier cord and the front of the shirt collar” is what led the state to call Cybergenetics for help. These samples are not instances of what PCAST called “DNA analysis of single-source samples or simple mixtures of two individuals, such as from many rape kits” or what the NIST group called “two-person mixtures involving significant quantities of DNA.” They are “more complicated” situations that arise “when contributors to the mixture share common alleles [and] when random variations, also known as stochastic effects” are present.
In sum, the deeper one looks into the Wakefield opinions, the more there is to wonder about. But whatever quirks and quiddities reside in the writing, the nearly unanimous opinion of the Court of Appeals signals that a trial court can choose to dispense with the general-acceptance inquiry for at least one PGS program—TrueAllele—for nonchallenging single samples or two-person mixtures and for samples of somewhat greater complexity as well.
NOTES
* UPDATE: On July 12, 2022, Chief Judge DiFiore announced that she will resign on August 31. See, e.g., Jimmy Vielkind & Corinne Ramey, New York’s Top Judge Resigns Amid Misconduct Proceeding: Attorney for Court of Appeals Judge Janet DiFiore Said Her Resignation Wasn’t Related to a Claim that She Improperly Attempted to Influence a Disciplinary Hearing, Wall St. J., July 12, 2022 8:31 am ET, https://www.wsj.com/articles/new-yorks-top-judge-resigns-amid-misconduct-proceeding-11657629111.
\1/ This formulation conflates the issue of novelty with the issue of general acceptance, which can change over time. See Williams, 35 N.Y.3d at 43, 147 N.E.3d at 1143.

\2/ The description begins with the remark that “The likelihood ratio in its modern form was developed by Alan Turing during World War II as a code-breaking method.” That is a possibly defective bit of intellectual history, inasmuch as Turing did not develop the likelihood ratio. To decipher messages, Turing relied on a logarithmic scale for the Bayes’ factor in two ways—as indicating the strength of evidence, and as a tool for sequential analysis. Sir Harold Jeffreys had done the former in his 1939 Theory of Probability book. The sequential analysis problem is not clearly connected to PGS. It arises when the sample size is not fixed in advance and the data are evaluated continuously as they are collected. PGS processes all the data at once.

\3/ As the court wrote in People v. Williams, 35 N.Y.3d 24, 147 N.E.3d 1131, 1139–40, 124 N.Y.S.3d 593 (N.Y. 2020), “[r]eview of a Frye determination must be based on the state of scientific knowledge and opinion at the time of the ruling (see Cornell, 22 N.Y.3d at 784-785, 986 N.Y.S.2d 389, 9 N.E.3d 884 [‘a Frye ruling on lack of general causation hinges on the scientific literature in the record before the trial court in the particular case’]).”

\4/ E.g., 2022 WL 1217463 at *7 n.10 (“TrueAllele is not an outlier in the use of the continuous probabilistic genotyping method. Other types of probabilistic genotyping software, such as STRMix, have likewise been found to be generally accepted (see e.g. United States v. Gissantaner, 990 F.3d 457, 467 (6th Cir. 2021)).”).

\5/ Cf. David H. Kaye, Theona M. Vyvial & Dennis L. Young, Validating the Probability of Paternity, 31 Transfusion 823 (1991) (comparing the empirical LR distribution for parentage using presumably true and false mother-child-father trios derived from a set of civil paternity cases to the “paternity index” (PI), a likelihood ratio computed with software applying simple genetic principles to the inheritance of HLA types, and reporting that the theoretical PI diverged from the empirical LR for PI > 80 or so).

\6/ “Gary Skuse, Ph.D., a professor of biological sciences at the Rochester Institute of Technology, testified at trial as a defense witness [and] agreed ... that defendant's DNA was present in the mixtures found on the shirt collar and amplifier cord and that it was ‘most likely’ present on the victim's forearm.”

\7/ The articles in the Journal of Forensic Sciences and Science and Justice have no such statements. The “Competing Interests” paragraph in a PloS One article advises that “I have read the journal’s policy and have the following conflicts. Mark Perlin is a shareholder, officer and employee of Cybergenetics in Pittsburgh, PA, a company that develops genetic technology for computer interpretation of DNA evidence. Cybergenetics manufactures the patented TrueAllele Casework system, and provides expert testimony about DNA case results. Kiersten Dormer and Jennifer Hornyak are current or former employees of Cybergenetics. Lisa Schiermeier-Wood and Dr. Susan Greenspoon are current employees of the Virginia Department of Forensic Science, a government laboratory that provides expert DNA testimony in criminal cases and is adopting the TrueAllele Casework system. This does not alter our adherence to all the PLOS ONE policies on sharing data and materials.”

\8/ The defense advanced another discovery theory in arguing that it could not adequately cross-examine and confront Dr. Perlin at trial unless it could access the source code. The court rejected this theory too.