The mathematics of coincidental matches, particularly with respect to the size of the database, is a thorny subject. However, the odds of a coincidental match seem to rise as the database grows (the so-called birthday problem). Given the strength of the tunnel vision that sets in when DNA is involved (the Lukis Anderson/Raveesh Kumra case is a recent example), I would think long and hard about the wisdom of having an all-inclusive database. The prospect of more false accusations definitely deserves thought.
In evaluating this risk, the birthday problem is not on point. In the birthday problem, the number of comparisons grows quadratically with the number of people in the room because there is no single birthday of interest: every pair of people counts. For a group of size N, there are N(N - 1)/2, or approximately N²/2, comparisons. For a fixed birthday, however, there are only N comparisons.
In casework, no one examines all possible pairs of profiles. Like a fixed birthday, a single crime-scene profile is compared to each of the N profiles in the database. Because the number of comparisons is N, the risk grows only linearly with the size of the database. The quadratic growth of the birthday problem does not occur.
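The difference between the two ways of counting can be sketched in a few lines of Python. This is a minimal illustration, not a forensic calculation: it treats each comparison as an independent trial with a fixed match probability `p`, and the function names are hypothetical. The birthday numbers (23 people, `p` = 1/365) are used only because they are familiar.

```python
import math


def pairwise_comparisons(n: int) -> int:
    """Birthday-problem setting: every pair among n people (or profiles) is compared."""
    return n * (n - 1) // 2


def fixed_target_comparisons(n: int) -> int:
    """Database trawl: one crime-scene profile is compared with each of n records."""
    return n


def prob_at_least_one_match(comparisons: int, p: float) -> float:
    """Chance of one or more coincidental matches, 1 - (1 - p)**k, treating the k
    comparisons as independent trials with match probability p (an approximation)."""
    # log1p/expm1 keep the result accurate when p is tiny, as it is for DNA profiles
    return -math.expm1(comparisons * math.log1p(-p))


if __name__ == "__main__":
    n, p = 23, 1 / 365  # 23 people; chance that two given people share a birthday
    k_pairs = pairwise_comparisons(n)
    k_fixed = fixed_target_comparisons(n)
    print(k_pairs, f"{prob_at_least_one_match(k_pairs, p):.2f}")  # 253 0.50
    print(k_fixed, f"{prob_at_least_one_match(k_fixed, p):.2f}")  # 23 0.06
```

With 23 people, comparing every pair gives 253 comparisons and roughly even odds of a shared birthday, while comparing the same 23 people with one fixed birthday gives 23 comparisons and about a 6% chance. The independence approximation slightly understates the exact birthday figure, but the quadratic-versus-linear contrast is the point.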
CODIS is expanding the number of loci in the profile, so the chance that two people (other than monozygotic twins) share the same profile will be even smaller than it is now. When the crime-scene sample yields a good profile (say, 16 or more loci), I cannot see why increasing the database size to that of the entire population would produce many matches to people whose DNA was not at the crime-scene.
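The linear growth can be made concrete with back-of-the-envelope arithmetic: the expected number of coincidental matches in one trawl is N × p, where N is the database size and p is the random-match probability for the crime-scene profile. The sketch below assumes a hypothetical population-wide database of 320 million records and hypothetical values of p; real random-match probabilities depend on the loci typed and on allele frequencies, and the calculation treats every comparison as independent.

```python
import math


def expected_coincidental_matches(database_size: int, p: float) -> float:
    # Each of the N comparisons contributes p to the expected count
    # (linearity of expectation), so the total is N * p.
    return database_size * p


def prob_any_coincidental_match(database_size: int, p: float) -> float:
    # 1 - (1 - p)**N, assuming independent comparisons; stable for tiny p.
    return -math.expm1(database_size * math.log1p(-p))


if __name__ == "__main__":
    population = 320_000_000  # hypothetical round figure for a national database
    for p in (1e-9, 1e-12, 1e-15):  # hypothetical random-match probabilities
        print(
            f"p={p:.0e}: expected={expected_coincidental_matches(population, p):.3g}, "
            f"P(any)={prob_any_coincidental_match(population, p):.3g}"
        )
```

Doubling N doubles the expected count; it does not square it. The smaller p becomes as loci are added, the closer the expected number of coincidental matches gets to zero, even for a database the size of the population.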
The record of an innocent monozygotic twin would pop up in every database trawl for a crime-scene DNA profile that actually belongs to the guilty twin. Strictly speaking, these matches are not "coincidental," because there is a deterministic explanation. For such matches -- and for truly coincidental ones -- the population-wide database flags the problem for the police. They get matches to more than one individual! They will know that they must investigate further. Thus, universality alleviates the problem of a "coincidental match" by making every such match apparent.
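A database trawl of the kind described above can be sketched as a function that returns every matching record rather than stopping at the first hit, followed by a step that flags multiple hits for further investigation. The profile encoding (a locus name mapped to a pair of allele numbers), the locus names, and the allele values are all hypothetical placeholders, not CODIS data formats.

```python
from typing import Dict, List, Tuple

Profile = Dict[str, Tuple[int, int]]  # hypothetical encoding: locus name -> allele pair


def same_at_typed_loci(scene: Profile, record: Profile) -> bool:
    # A record matches if it has the same allele pair (in either order)
    # at every locus typed in the crime-scene sample.
    return all(
        locus in record and sorted(record[locus]) == sorted(alleles)
        for locus, alleles in scene.items()
    )


def trawl(scene: Profile, database: Dict[str, Profile]) -> List[str]:
    """Return the ID of every database record matching the crime-scene profile."""
    return [rid for rid, rec in database.items() if same_at_typed_loci(scene, rec)]


def review(hits: List[str]) -> str:
    if not hits:
        return "no match"
    if len(hits) == 1:
        return f"single match: {hits[0]}"
    # More than one hit (identical twins or a true coincidence) is itself a red flag.
    return f"multiple matches: {', '.join(hits)}; investigate further"


if __name__ == "__main__":
    scene = {"locus_1": (12, 14), "locus_2": (9, 9), "locus_3": (20, 23)}
    database = {
        "twin_A": {"locus_1": (14, 12), "locus_2": (9, 9), "locus_3": (20, 23)},
        "twin_B": {"locus_1": (12, 14), "locus_2": (9, 9), "locus_3": (23, 20)},
        "unrelated": {"locus_1": (11, 14), "locus_2": (8, 9), "locus_3": (20, 23)},
    }
    print(review(trawl(scene, database)))  # multiple matches: twin_A, twin_B; investigate further
```

Returning all hits is the design choice that matters here: in a universal database both twins are on file, so the trawl cannot silently settle on one of them.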
The problem in the Kumra case, according to prosecutors, was secondary transfer of Anderson's DNA to the victim's fingernails, which (I assume) did not carry DNA from the actual killers. Likewise, if the police or anyone else plants DNA from a target in an incriminating place where the perpetrator's DNA might be found -- and the perpetrator's DNA is not there -- the database trawl will identify the target instead of the perpetrator.
If a crime-scene sample is contaminated with extraneous DNA from someone who was never at the crime-scene, in an amount sufficient to yield a clear and complete profile, the result could be a false accusation, and even a conviction, of that person, particularly if he lives in the vicinity, falls in the right age range, is physically capable of the crime, and lacks a convincing alibi.
Another limitation of DNA databases is that, at best, they can show only that an individual was at a crime-scene at some point. If police and prosecutors unreflectively equate presence with guilt and a suspect has no persuasive explanation for his presence, injustice could follow. However, this problem exists today. Arguably, a universal database might diminish it: cases of innocent presence would arise more often, leading to greater sensitivity to this limitation.
Less worrisome are errors such as mislabeling or mistyping the samples in the database. If the profile recorded under my name is not my profile but matches the one found at the crime-scene, the mistake will become apparent when I am retested after the cold hit, as is standard procedure. The confirmatory test will exclude me.
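The confirmatory step can be sketched the same way. The key design point is that the retest compares the crime-scene profile with a newly collected reference sample from the person named by the cold hit, not with the stored database record, so a labeling or typing error in the database cannot carry through. As before, the profile encoding, locus names, and allele values are hypothetical.

```python
from typing import Dict, Tuple

Profile = Dict[str, Tuple[int, int]]  # hypothetical encoding: locus name -> allele pair


def consistent(scene: Profile, reference: Profile) -> bool:
    # The fresh reference must carry the same allele pair (in either order)
    # at every locus typed in the crime-scene sample.
    return all(
        locus in reference and sorted(reference[locus]) == sorted(alleles)
        for locus, alleles in scene.items()
    )


def confirm_cold_hit(scene: Profile, fresh_reference: Profile) -> str:
    """Retest the person named by the cold hit using a newly collected sample,
    rather than trusting the stored database record."""
    return "confirmed" if consistent(scene, fresh_reference) else "excluded"


if __name__ == "__main__":
    scene = {"locus_1": (12, 14), "locus_2": (9, 9)}
    # Suppose the record filed under my name actually holds the donor's profile
    # (a labeling error), so the trawl points to me. My fresh sample differs:
    my_fresh_sample = {"locus_1": (11, 14), "locus_2": (8, 9)}
    print(confirm_cold_hit(scene, my_fresh_sample))  # excluded
```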
A population-wide database has other advantages besides those I mentioned today and yesterday -- but it also has its share of disadvantages.
References
- Persi Diaconis & Frederick Mosteller, Methods for Studying Coincidences, 84 J. Am. Stat. Ass’n 853 (1989) (discussed in Gina Kolata, 1-in-a-Trillion Coincidence, You Say? Not Really, Experts Find, N.Y. Times, Feb. 27, 1990, http://www.nytimes.com/1990/02/27/science/1-in-a-trillion-coincidence-you-say-not-really-experts-find.html?pagewanted=all&src=pm)
- David H. Kaye, Beyond Uniqueness: The Birthday Paradox, Source Attribution, and Individualization in Forensic Science Testimony, 12 Law, Probability & Risk 3 (2013)
- David H. Kaye, On the Hypothetical Population-wide Database, Forensic Science, Statistics & the Law, July 29, 2013, http://for-sci-law-now.blogspot.com/2013/07/on-hypothetical-population-wide-dna.html
- -----, Good Point, Bad Math: DNA Database Statistics Misunderstood (Again), Forensic Science, Statistics & the Law, July 26, 2013, http://for-sci-law-now.blogspot.com/2013/07/good-point-bad-math-dna-database.html