Monday, July 29, 2013

On the Hypothetical Population-wide DNA Database

The August ABA Journal landed in my mailbox. Usually, I ask reporters to check with me on the wording before quoting. Alas, I neglected to do so some weeks ago, when Mark Walsh asked me about DNA databases in the aftermath of the Supreme Court's opinion in Maryland v. King. Mr. Walsh's article quotes me as follows: "Is the point of arrest the sensible place to draw the line? I can imagine a system in which you take a sample from everyone. Newborns already have a heel prick taken for certain genetic testing. At the same time you could take a DNA sample. Not that you expect a newborn to commit a crime, but 20 years later the sample is there in the database."

Oops! I said that? I meant to say this: "Is the point of arrest the sensible place to draw the line? I can imagine a system in which you take a sample from everyone. Newborns already have a heel prick taken for certain genetic testing. Along with these genetic tests, you could obtain a DNA profile. Not that you expect a newborn to commit a crime, but 20 years later the profile is there in the database." The critical difference: "profile," not "sample."

Here is the way that Michael Smith, Ed Imwinkelried and I explained the idea (which was stimulated by remarks of Phil Reilly to the legal issues working group of the National Commission on the Future of DNA Evidence) in another ABA publication back in 2001:
Creating a national identification database all at once would be prohibitively expensive today, even if we had the laboratory capacity to do it. But DNA typing technology is advancing at a pace reminiscent of the exponential growth in computer microprocessing power that has made the “personal computer” a fixture on every desk. Soon it will be feasible to create a DNA identification record for everyone, at least prospectively. For example, it would be easy to extract identification profiles as an adjunct to the existing public health programs that for many years have screened DNA samples from almost all newborns, to identify infants with treatable genetic diseases. The identification profiles could be transmitted to a single, secure, national database. The genetic locations (“loci” is the technical term) used for those identification profiles would be strictly limited to sequences that have no implications for health or other significant physical or mental traits. Furthermore, access to the database would be limited to law enforcement personnel investigating specific crimes in which DNA trace evidence already has been found. Law enforcement agencies would not need—and should not be permitted—to handle, much less retain, the samples.

... Not only would a comprehensive database be valuable to the criminal justice system, but it also would be useful in identifying remains after natural disasters, mass accidents, and terrorist attacks. Such a database is, we believe, socially advantageous. But we would be the first to acknowledge that this belief is surely debatable, and a panoply of questions must be considered.
In the database system we envision, the information that the government is allowed to have is very different from the types that have sparked debates over medical privacy. The preponderance of the human genome consists of sequences that have no medical importance or social significance. Much of the genome is “noncoding” — these sequences are not translated into the proteins that are the machinery of cells — and most of them are not genetically “linked” to any coding sequences. Even in the coding regions, many DNA sequences merely code for traits such as gross features of fingerprints or the pattern of hair follicles in the skin that have no stigmatizing potential. Consistent with current practice, we would limit the database loci to such regions. Consequently, the genetic information included in the database would be no more invasive of privacy than an image of the ridges and whorls in a fingerprint or of the blood vessels in the retina of the eye.
[T]he system we envision keeps almost all samples out of the hands of law enforcement officials. Recall that the initial typing would be done by health workers, not police, as part of neonatal screening. No samples would be sent to law enforcement agencies — they would receive only the biometric genotypes that have no use except for identification. To the extent that additional sampling, say, of immigrants or citizens born abroad, would be necessary to cover as much of the population as possible, the sample could be destroyed as soon as the typing is complete. In fact, an instrument could be built that would extract an identifying profile and destroy the sample at the same time. Proper procedures for sampling the DNA, extracting the identifying profile, and immediately destroying the sample would protect everyone’s genetic privacy — to the extent we have any when private hospitals and HMOs keep samples of our blood and other tissue together with information far more sensitive than the random bits of DNA that identify us. The government officials maintaining the database could neither invade privacy nor enable insurers or employers to do so.
Looking back at these words 12 years later, I would change some of them as well. That the identification loci are not protein-coding is not sufficient to prove that they have no clinical predictive or diagnostic value. Moreover, the loci certainly are more informative than fingerprints or retinal patterns with respect to ascertaining parentage or siblingship. A population-wide database of profiles with the current CODIS loci would make it possible for the government to do "legitimacy testing" -- that is, to check families for children born out of wedlock. Other privacy questions would need to be addressed as well. As we wrote in 2001, "Our vision is futuristic ... incomplete and tentative. Many details remain to be worked out."
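To make the parentage point concrete, here is a minimal sketch in Python of the kind of exclusion check that a database of STR profiles would permit. It rests on a simplifying assumption: barring mutation, a child shares at least one allele with each biological parent at every autosomal locus. The locus names are real CODIS core loci, but the allele values, the three profiles, and the function name `inconsistent_loci` are invented for illustration. Actual kinship analysis relies on likelihood ratios and allows for mutation rather than counting bare exclusions.

```python
from typing import Dict, List, Tuple

# A profile maps each locus name to the pair of alleles (repeat counts) found there.
Profile = Dict[str, Tuple[float, float]]


def inconsistent_loci(child: Profile, alleged_parent: Profile) -> List[str]:
    """Return the loci at which the child shares no allele with the alleged parent.

    Hypothetical helper: an empty list means the profiles are consistent with
    parentage at every locus typed in both; it does not prove parentage.
    """
    excluded = []
    for locus, child_alleles in child.items():
        parent_alleles = alleged_parent.get(locus)
        if parent_alleles is None:
            continue  # locus not typed in the other profile, so no comparison
        if not set(child_alleles) & set(parent_alleles):
            excluded.append(locus)
    return excluded


# Invented profiles at four of the CODIS core loci.
child = {"D8S1179": (12, 13), "TH01": (6, 9.3), "vWA": (16, 17), "FGA": (21, 24)}
man_a = {"D8S1179": (13, 14), "TH01": (9.3, 9.3), "vWA": (17, 18), "FGA": (20, 21)}
man_b = {"D8S1179": (10, 11), "TH01": (7, 8), "vWA": (16, 19), "FGA": (22, 23)}

print(inconsistent_loci(child, man_a))  # no exclusions at these four loci
print(inconsistent_loci(child, man_b))  # excluded at three of the four loci
```

Nothing comparable can be done with a fingerprint or a retinal scan, which is why the analogy to those biometrics is imperfect.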


  • David H. Kaye, Michael E. Smith, and Edward J. Imwinkelried, Is a DNA Identification Database in Your Future?, Criminal Justice, Fall 2001, at 5-9, 19
  • Mark Walsh, 21st Century Fingerprinting, ABA Journal, Aug. 2013, at 16-17


  1. The mathematics of coincidental matches, particularly with respect to the size of the database, is a thorny subject. However, the odds of a coincidental match seem to rise as the database grows (the so-called birthday problem). Given the strength of the tunnel vision that sets in when DNA is involved (the Lukis Anderson/Raveesh Kumra case is a recent example), I would think long and hard about the wisdom of having an all-inclusive database.

    1. Thorny, but not intractable. See the next posting of July 30.