Friday, October 20, 2017

"Probabilistic Genotyping," Monte Carlo Methods, and the Hydrogen Bomb

Many DNA samples found in criminal investigations contain DNA from several people. A number of computer programs seek to "deconvolute" these mixtures -- that is, to infer the several DNA profiles that are mushed together in the electrophoretic data. The better ones do so using probability theory and a simulation procedure known as a Markov Chain Monte Carlo (MCMC) method. These programs are often said to perform "probabilistic genotyping." Although both words in this name are a bit confusing, 1/ lawyers should appreciate that the inferred profiles are just possibilities, not certainties. At the same time, some may find the idea of using techniques borrowed from a gambling casino (in name at least) disturbing. Indeed, I have heard the concern that "You know, don't you, that if the program is rerun, the results can be different!"

The answer is, yes, that is the way the approximation works. Using more steps in the numerical process also could give different output, but would we expect the further computations to make much of a difference? Consider a physical system that computes the value of π. I am thinking of Buffon's Needle. In 1777, Georges-Louis Leclerc, the Count of Buffon, imagined "dropping a needle on a lined sheet of paper and determining the probability of the needle crossing one of the lines on the page." 2/ He found that the probability is directly related to π. For example, if the length of the needle and the distance between the lines are identical, one can estimate π as twice the number of drops divided by the number of hits.3/ Repeating the needle-dropping procedure the same number of times will rarely give exactly the same answer. (Note that pooling the results for two runs of the procedure is equivalent to one run with twice as many needle drops.) For a very large number of drops, however, the approximation should be pretty good.

MCMC computations are more complicated. They simulate a random walk that samples values of a random variable so as to ascertain a posterior probability distribution. The walk could get stuck for a long time in a particular region. Nevertheless, the general approach is very well established in statistics, and Monte Carlo methods are widely used throughout the sciences. 4/ Indeed, they were integral to the development of nuclear weapons. 5/ The book, Dark Sun: The Making of the Hydrogen Bomb, provides the following account:
On leave from the university, resting at home during his extended recovery [from a severe brain infection], [Stanislaw] Ulam amused himself playing solitaire. Sensitivity to patterns was part of his gift. He realized that he could estimate how a game would turn out if he laid down a few trial cards and then noted what proportion of his tries were successful, rather than attempting to work out all the possible combinations in his head. "It occurred to me then," he remembers, "that this could be equally true of all processes involving branching of events." Fission with its exponential spread of reactions was a branching process; so would the propagation of thermonuclear burning be. "At each stage of the [fission] process, there are many possibilities determining the fate of the neutron. It can scatter at one angle, change its velocity, be absorbed, or produce more neutrons by a fission of the target nucleus, and so on." Instead of trying to derive the expected outcomes of these processes with complex mathematics, Ulam saw, it should be possible to follow a few thousand individual sample particles, selecting a range for each particle's fate at each step of the way by throwing in a random number, and take the outcomes as an approximate answer—a useful estimate. This iterative process was something a computer could do. ...[W]hen he told [John] von Neumann about his solitaire discovery, the Hungarian mathematician was immediately interested in what he called a "statistical approach" that was "very well suited to a digital treatment." The two friends developed the mathematics together and named the procedure the Monte Carlo method (after the famous gaming casino in Monaco) for the element of chance it incorporated. 6/
Even without a computer in place, Los Alamos laboratory staff, including a "bevy of young women who had been hastily recruited to grind manually on electric calculators," 7/ performed preliminary calculations examining the feasibility of igniting a thermonuclear reaction. As Ulam recalled:
We started work each day for four to six hours with slide rule, pencil and paper, making frequent quantitative guesses. ... These estimates were interspersed with stepwise calculations of the behavior of the actual motions [of particles] ... The real times for the individual computational steps were short ... and the spatial subdivisions of the material assembly very small. ... The number of individual computational steps was therefore very large. We filled page upon page with calculations, much of it done by [Cornelius] Everett. In the process he almost wore out his own slide rule. ... I do not know how many man hours were spent on this problem. 8/
NOTES
  1. In forensic DNA work, probabilities also are presented to explain the probative value of the discovery of a "deterministic" DNA profile -- one that is treated as known to a certainty. See David H. Kaye, SWGDAM Guidelines on "Probabilistic Genotyping Systems" (Part 2), Forensic Sci., Stat. & L., Oct. 25, 2015. In addition, the "genotypes" in "probabilistic genotyping" do not refer to genes.
  2. Office for Mathematical, Science and Technology Education, College of Educvation, University of Illinois, Boffon's Needle: An Analysis and Simulation, https://mste.illinois.edu/activity/buffon/.
  3. Id.
  4. See, e.g., Persi Diaconis, The Markov Chain Monte Carlo Revolution, 46 Bull. Am. Math. Soc'y 179-205 (2009), https://doi.org/10.1090/S0273-097908-01238-X; Sanjib Sharma, Markov Chain Monte Carlo Methods for Bayesian Data Analysis in Astronomy, arXiv:1706.01629 [astro-ph.IM], https://doiorg/10.1146/annurev-astro-082214-122339.
  5. Roger Eckhard, Stan Ulam, John von Neumann, and the Monte Carlo Method, Los Alamos Sci., Special Issue 1987, pp. 131-41, http://permalink.lanl.gov/object/tr?what=info:lanl-repo/lareport/LA-UR-88-9068.
  6. Richard Rhodes, Dark Sun: The Making of the Hydrogen Bomb 303-04 (1995).
  7. Id. at 423 (quoting Françoise Ulam).
  8. Id.

No comments:

Post a Comment