Sunday, October 25, 2015

SWGDAM Guidelines on "Probabilistic Genotyping Systems" (Part 2)

What makes a "Probabilistic Genotyping System" probabilistic? That a computer program delivers a probability related to a DNA profile does not make it a PGS. After all, traditional, manual analysis of DNA data leads to probabilities. Here, I present a toy example of a single-source sample to convey a sense of the nature of probabilistic genotyping.

I do so with some trepidation. Neither the SWGDAM Guidelines nor the articles that I have located supply a simple and clear exposition of the actual workings of any modern forensic PGS. The Guidelines state that
A probabilistic genotyping system is comprised of software, or software and hardware, with analytical and statistical functions that entail complex formulae and algorithms. Particularly useful for low-level DNA samples (i.e., those in which the quantity of DNA for individuals is such that stochastic effects may be observed) and complex mixtures (i.e., multi-contributor samples, particularly those exhibiting allele sharing and/or stochastic effects), probabilistic genotyping approaches can reduce subjectivity in the analysis of DNA typing results.
That sounds great, but what do these "complex formulae and algorithms" do? Well,
probabilistic approaches provide a statistical weighting to the different genotype combinations. Probabilistic genotyping does not utilize a stochastic threshold. Instead, it incorporates a probability of alleles dropping out or in. In making use of more genotyping information when performing statistical calculations and evaluating potential DNA contributors, probabilistic genotyping enhances the ability to distinguish true contributors and noncontributors.
Moreover, "[t]he use of a likelihood ratio as a reporting statistic for probabilistic genotyping differs substantially from binary statistics such as the combined probability of exclusion."

This sounds good too, but what is "a statistical weighting," and how is a probability of exclusion, which can take any value from 0 to 1 rather than only 0 or 1, a "binary statistic"? To gain a clearer picture of what might be going on, I thought I would start with the simplest possible situation, a crime-scene sample with a single contributor, to surmise how a probabilistic analysis might operate. My analysis is something of a guess. Corrections are welcome.

Two Peaks, One Inferred Genotype, One Likelihood Ratio of 50: Not a PGS!

In "short tandem repeat" typing via capillary electrophoresis, the laboratory extracts DNA from a sample and uses the polymerase chain reaction (PCR) to make millions of copies of a short stretch of DNA between a designated starting point and a stopping point (a "locus"). These fragments vary in length among different individuals (although no particular length is unique to one person). The laboratory runs the sample fragments through a machine that measures the quantity of the fragments as a function of their length. For example, a plot of the quantity on the y-axis and the fragment length on the x-axis might show two prominent peaks, which I will call A and B, of roughly equal height rising above a noisy baseline. This AB pattern at a single locus is exactly what one would expect for DNA from an individual who inherited a fragment of length A from one parent and a fragment of length B from the other parent. Starting with roughly equal numbers of maternally and paternally inherited DNA molecules in the original sample, PCR should generate about equal quantities of the maternal and paternal length variants ("STR alleles"). These produce the two peaks in the graph (the electropherogram).

The analyst then could compute the “random match probability” or “probability of inclusion” (PI) — that is, the probability P(RAB) that a randomly selected individual would be type AB. Even if the analyst used a computer program to do the calculation, no “probabilistic genotyping” would be involved. The “genotype” AB would be regarded as known to a certainty (for the purpose of the computation), and the probability PI pertains to something else — to the chance of coincidentally finding an individual with a matching profile: PI = P(RAB). If 1 in 50 people have the profile AB, then PI = 1/50.

The evidentiary value of the inclusion can be computed as a “likelihood ratio” (LR). If the hypothesis (Hp) that the suspect, who also is type AB, is the contributor of the DNA in the sample is correct, and if the sample has plenty of undegraded DNA, the probability of the data DAB (an A and a B peak detected in the sample) is P(DAB|Hp) = 1. On the other hand, if someone unrelated to the suspect is the contributor (Hd), then P(DAB|Hd) is the probability of inclusion PI = 1/50. Thus, the evidence — the A and B peaks — is 1/PI = 50 times more probable when the suspect is the contributor than when an unrelated person is. This ratio of the probabilities of the evidence conditional on the hypotheses is the likelihood ratio. It measures the support the evidence lends to Hp as opposed to Hd. LRs greater than 1 support Hp over Hd (e.g., Kaye et al. 2011).
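The conventional calculation can be reproduced in a few lines. This is a minimal sketch in Python, assuming (as the text does) that the genotype AB is treated as known, that P(DAB|Hp) = 1, and that 1 in 50 people are type AB. The function name `conventional_lr` is hypothetical and used only for illustration; exact fractions avoid rounding.

```python
from fractions import Fraction


def conventional_lr(p_inclusion: Fraction) -> Fraction:
    """Return P(D_AB | Hp) / P(D_AB | Hd) when the genotype AB is treated as known."""
    # Hp: the AB suspect is the source of an ample, undegraded sample,
    # so the A and B peaks are certain to appear.
    p_data_given_hp = Fraction(1)
    # Hd: an unrelated person is the source; the peaks match only if
    # that person happens to be type AB, which has probability PI.
    p_data_given_hd = p_inclusion
    return p_data_given_hp / p_data_given_hd


pi_ab = Fraction(1, 50)  # assumed: 1 in 50 people are type AB, as in the text
print(conventional_lr(pi_ab))  # 50
```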

Two Peaks, Two Inferred Genotypes with Probabilities for Each Genotype: A PGS?

This much is straightforward, conventional thinking. But an AB contributor is not the only conceivable explanation for the two peaks. Maybe they reflect DNA from an AA individual (one who inherited the fragment of length A from both parents), and the B is just an artifact known as “stutter” (Brookes et al. 2012). If this possibility cannot be dismissed as wildly improbable (as it could be if, for example, the putative stutter peak were far from the A peak), then the analysis should take into account both AA and AB as possible contributor profiles.

One way to do so would be to study the detection probability P(DAB) in experiments with samples from AA and AB contributors. Suppose that a large number of such experiments showed that when the contributor is AA, the probability of detecting AB is P(DAB|CAA) = 1/10 and that when the contributor is AB, the probability is P(DAB|CAB) = 1. Sometimes, AA contributors produce AB peaks; AB contributors always do.

In a case in which the suspect is type AB, what is the evidentiary value of the two peaks A and B? The suspect is still AB, so P(DAB|Hp) is unchanged at 1. But the denominator of the LR, P(DAB|Hd), requires us to consider the probability that the contributor’s profile is AA as well as the probability that it is AB. Imagine that the laboratory receives crime-scene samples with DNA profiles that are representative of a population in which 1 in 100 people are AA and (as stated before) 1 in 50 are AB. Because only 1 in 10 DNA samples from AA contributors will appear to be AB, about 1 in 1000 samples will have the AB peaks and come from AA contributors:

$$P(C_{AA} \,\&\, D_{AB}) = P(C_{AA}) \cdot P(D_{AB} \mid C_{AA}) = \tfrac{1}{100} \cdot \tfrac{1}{10} = \tfrac{1}{1000}.$$

More samples, about 20 per 1000, will have the AB peaks and come from AB contributors:

$$P(C_{AB} \,\&\, D_{AB}) = P(C_{AB}) \cdot P(D_{AB} \mid C_{AB}) = \tfrac{1}{50} \cdot 1 = \tfrac{20}{1000}.$$

Thus, in about 20 out of 21 detections of AB peaks, the contributor is AB. (Most readers who have borne with me this far will recognize this result as a simple application of Bayes' rule for the posterior probability: P(CAB|DAB) = 20/21.)
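The Bayes' rule step can be checked with exact arithmetic. The sketch below is illustrative only: the dictionary names are hypothetical, and, like the toy example, it assumes that only AA and AB contributors can produce the A and B peaks.

```python
from fractions import Fraction

# Toy population frequencies of the two candidate contributor genotypes.
prior = {"AA": Fraction(1, 100), "AB": Fraction(1, 50)}

# P(D_AB | C_g): chance of seeing both an A and a B peak given genotype g.
# For an AA contributor the B peak would be stutter (1 time in 10).
p_peaks = {"AA": Fraction(1, 10), "AB": Fraction(1)}

# Joint probabilities P(C_g & D_AB) = P(C_g) * P(D_AB | C_g).
joint = {g: prior[g] * p_peaks[g] for g in prior}

# P(D_AB): no other genotype is assumed able to produce the AB peaks.
total = sum(joint.values())

# Bayes' rule: P(C_g | D_AB) = P(C_g & D_AB) / P(D_AB).
posterior = {g: joint[g] / total for g in joint}

for g in ("AA", "AB"):
    print(g, joint[g], posterior[g])
# AA 1/1000 1/21
# AB 1/50 20/21
```

Python's `Fraction` reduces 20/1000 to 1/50, so the joint probability for AB prints in lowest terms.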

A PGS thus could assign probabilities of P(CAA|DAB) = 1/21 and P(CAB|DAB) = 20/21 to the two possible contributor genotypes. The hypothesis Hd is that the peaks come either from an unrelated person who is AA or, as before, from an unrelated AB contributor. If the suspect is not the source and if the apparent AB profile really is AA (which has probability 1/21), Hd requires that a random, unrelated person be type AA (an event that has probability P(RAA) = 1/100). Likewise, if the suspect is not the source and the apparent AB profile really is AB (which has probability 20/21), then Hd requires that a random, unrelated person be type AB (an event that has probability P(RAB) = 1/50). Consequently, the probability of the evidence DAB given Hd is

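$$\begin{aligned} P(D_{AB} \mid H_d) &= P(R_{AA}) \cdot P(C_{AA} \mid D_{AB}) + P(R_{AB}) \cdot P(C_{AB} \mid D_{AB}) \\ &= \tfrac{1}{100} \cdot \tfrac{1}{21} + \tfrac{1}{50} \cdot \tfrac{20}{21} = \tfrac{41}{2100} \approx 0.0195. \end{aligned}$$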

This likelihood is very close to the previous denominator of 1/50 = 0.020. The resulting LR is 2100/41 ≈ 51.2, only slightly higher than the earlier value of 50.
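The weighting formula for P(DAB|Hd) can be written as a small function of the genotype weights, which makes the contrast with the categorical calculation explicit. This is a sketch that follows the formula in the text as written; `weighted_lr` and its arguments are hypothetical names, and the numerator P(DAB|Hp) = 1 is assumed, as above.

```python
from fractions import Fraction


def weighted_lr(
    weights: dict[str, Fraction], rmp: dict[str, Fraction]
) -> tuple[Fraction, Fraction]:
    """Return (P(D_AB | Hd), LR) using the toy example's weighting formula.

    weights: probability assigned to each candidate genotype given the peaks.
    rmp: probability that a random, unrelated person has that genotype.
    """
    # Denominator: sum over candidate genotypes of P(R_g) * P(C_g | D_AB).
    p_hd = sum((rmp[g] * w for g, w in weights.items()), Fraction(0))
    # Numerator P(D_AB | Hp) is 1 because the suspect is type AB.
    return p_hd, Fraction(1) / p_hd


rmp = {"AA": Fraction(1, 100), "AB": Fraction(1, 50)}

# Categorical reading: AB is treated as the genotype with certainty.
hd_cat, lr_cat = weighted_lr({"AB": Fraction(1)}, rmp)

# Probabilistic reading: weights 1/21 and 20/21 from Bayes' rule.
hd_prob, lr_prob = weighted_lr({"AA": Fraction(1, 21), "AB": Fraction(20, 21)}, rmp)

print(hd_cat, lr_cat)  # 1/50 50
print(hd_prob, lr_prob, round(float(lr_prob), 1))  # 41/2100 2100/41 51.2
```

Putting all the weight on AB recovers the conventional LR of 50; spreading it over AA and AB gives the 2100/41 of the toy PGS.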

The Probability in PGS

This toy model of a PGS used only information about peak location and mentioned only a stutter peak as a source of uncertainty in the contributor's genotype. A more sophisticated PGS would use peak heights as well and would attend to allele drop-in and drop-out and other complicating features. The most complete models dispense with the rules of thumb (“analytical thresholds,” “stochastic thresholds,” and “peak-height ratios”) that human examiners employ to decide whether a peak is high enough to count as real, what to do with it in computing a likelihood ratio, and what potential genotypes to cross off the list of possibilities when confronted with a mixture of DNA from several contributors (Kelly et al. 2014).

I do not propose to explain these matters any better than SWGDAM has. My purpose here has been to clarify just what is “probabilistic” about a PGS. The key point is not that the system produces a likelihood ratio as opposed to a probability of exclusion or inclusion. Likelihood ratios also apply to categorical inferences as to what profiles are present in a mixed sample. A PGS is distinctive because it assigns probabilities to the possible profiles and uses more information to arrive at what, one hopes, is a better likelihood ratio for the hypotheses about whether a suspect is a contributor.

References
  • C. Brookes, J.A. Bright, S. Harbison, J. Buckleton, Characterising Stutter in Forensic STR Multiplexes, 6 Forensic Sci. Int’l: Genetics 58–63 (2012)
  • David H. Kaye et al., The New Wigmore on Evidence: Expert Evidence (2d ed. 2011)
  • Hannah Kelly, Jo-Anne Bright, John S. Buckleton, James M. Curran, A Comparison of Statistical Models for the Analysis of Complex Forensic DNA Profiles, 54 Sci. & Justice 66–70 (2014)
Acknowledgement
Thanks are owed to Sandy Zabell for correcting errors in the original posting. This version was last updated 1 February 2016.

Thursday, October 22, 2015

SWGDAM Guidelines on "Probabilistic Genotyping Systems" (Part 1)

In June, the Scientific Working Group on DNA Analysis Methods (SWGDAM), approved new “Guidelines for the Validation of Probabilistic Genotyping Systems.” 1/ They begin,
Guidance is provided herein for the validation of probabilistic genotyping software used for the analysis of autosomal short tandem repeat (STR) typing results. These guidelines are not intended to be applied retroactively. It is anticipated that they will evolve with future developments in probabilistic genotyping systems.
These three sentences raise four questions. First, is the phrase “probabilistic genotyping system” (PGS) the best label? I will get to the question of what “probabilistic” means a little later, but given the perception of segments of the public and the legal community that “autosomal short tandem repeat (STR) results” are “very likely” “to reveal predispositions to diseases in the individuals being profiled as well as their siblings and offspring,” 2/ is “genotyping” the right word to use for identifying DNA variations that are not genes? A more neutral term such as “probabilistic typing systems” might be less suggestive.

Second, why do the drafters of standards and guidelines prefer stilted writing—“guidance is provided herein”—as opposed to plain English sentences such as “This document offers guidance”? I know this kind of criticism is small potatoes, but scientists are smart enough to be good writers.

Third, what are the drafters trying to say with the doubly passively voiced sentence, “These guidelines are not intended to be applied retroactively”? Who should not apply these standards retroactively? One would think that the guidelines are for laboratories, but how could a laboratory apply a recommendation retroactively? It cannot go back in time to validate software that it has been using even though neither it nor the developer had validated the software in the manner that SWGDAM now recommends. The only thing the laboratory could do to give retroactive effect to the new advice would be to use some better-validated software on data from old cases and advise prosecutors, defendants, or defense lawyers of major discrepancies. Is SWGDAM saying that looking back at past cases (for research or other purposes) would be wrong? Or merely that SWGDAM is taking no position on the desirability of undertaking such retrospective analyses? Or is this part of the guidelines written for a different audience: courts that might be asked to grant postconviction relief? But unless every PGS was adequately validated, surely courts should consider what these guidelines have to say as relevant to (but not necessarily dispositive of) whether the laboratory’s earlier report was scientifically acceptable. Most courts can be expected to appreciate the fallacy of the argument that "because the world gets wiser as it gets older, therefore it was foolish before." 3/

Fourth, why does SWGDAM anticipate that “future developments in probabilistic genotyping systems” will cause these standards to “evolve”? The principles of good software development and validation do not depend on the specific programs. Those principles may evolve whether or not PGSs improve over time. Of course, the guidelines could change if the programs become so superior that SWGDAM would reconsider its view (expressed in the next paragraph) that the only permissible use of a PGS is “to assist the DNA analyst in the interpretation of forensic DNA typing results.” Is SWGDAM envisioning that it could reverse its opinion that “Probabilistic genotyping is not intended to replace the human evaluation of the forensic DNA typing results” because of “future developments in [PGS]”? In light of current problems with human interpretations of mixtures containing minute quantities of DNA, there are observers who would welcome replacing the current protocols for interpreting these samples with valid and reliable automated expert or probabilistic systems.

Notes

1. Scientific Working Group on DNA Analysis Methods, Guidelines for the Validation of Probabilistic Genotyping Systems, June 15, 2015

2. Gary R. Skuse & Anne M. Burger, Justice as Fairness: Forensic Implications of DNA and Privacy, Champion, Apr. 2015, at 24. For a more authoritative assessment, see Henry T. Greely & David H. Kaye, A Brief of Genetics, Genomics and Forensic Science Researchers in Maryland v. King, 53 Jurimetrics J. 43 (2013).

3. Hart v. Lancashire & Yorkshire Ry. Co., 21 L.T.R. N.S. 261, 263 (1869).