Sunday, June 23, 2019

The Miami Dade Bullet-matching Study Surfaces in United States v. Romero-Lobato

Last month, the US District Court for the District of Nevada rejected another challenge to firearms toolmark comparisons. The opinion in United States v. Romero-Lobato, 1/ written by Judge Larry R. Hicks, relies in part on a six-year-old study that has yet to appear in any scientific journal. 2/ The National Institute of Justice (the research-and-development arm of the Department of Justice) funded the Miami-Dade Police Department Crime Laboratory "to evaluate the repeatability and uniqueness of striations imparted by consecutively manufactured EBIS barrels with the same EBIS pattern to spent bullets as well as to determine the error rate for the identification of same gun evidence." 3/ Judge Hicks describes the 2013 study as follows:
The Miami-Dade Study was conducted in direct response to the NAS Report and was designed as a blind study to test the potential error rate for matching fired bullets to specific guns. It examined ten consecutively manufactured barrels from the same manufacturer (Glock) and bullets fired from them to determine if firearm examiners (165 in total) could accurately match the bullets to the barrel. 150 blind test examination kits were sent to forensics laboratories across the United States. The Miami-Dade Study found a potential error rate of less than 1.2% and an error rate by the participants of approximately 0.007%. The Study concluded that “a trained firearm and tool mark examiner with two years of training, regardless of experience, will correctly identify same gun evidence.”
The "NAS Report" was the work of a large committee of scientists, forensic-science practitioners, lawyers, and others assembled by the National Academy of Sciences to recommend improvements in forensic science. A federal judge and a biostatistician co-chaired the committee. In 2009, four years after Congress funded the project, the report arrived. It emphasized the need to measure the error probabilities in pattern-matching tasks and discussed what statisticians call two-by-two contingency tables for estimating the sensitivity (true-positive probability) and specificity (true-negative probability) of the classifications. However, the Miami-Dade study was not designed to measure these quantities. To understand what it did measure, let's look at some of the details in the report to NIJ as well as what the court gleaned from the report (directly or indirectly).

A Blind Study?

The study was not blind in the sense of the subjects not realizing that they were being tested. They surely knew that they were not performing normal casework when they received the unusual samples and the special questionnaire with the heading "Answer Sheet: Consecutively Rifled EBIS-2 Test Set" asking such questions as "Is your Laboratory ASCLD/Lab Accredited?" That is not a fatal flaw, but it has some bearing -- not recognized in the report's sections on "external validity" -- on generalizing from the experimental findings to case work.  4/

Volunteer Subjects?

The "150 blind examination kits" somehow went to 201 examiners, not just in the United States, but also in "4 international countries." 5/ The researchers did not consider or reveal the performance of 36 "participants [who] did not meet the two year training requirement for this study." (P. 26). How well they did in comparison to their more experienced colleagues would have been worth knowng, although it would have been hard to draw a clear concolusions since there so few errors on the test. In any event, ignoring the responses from the trainees "resulted in a data-producing sample of 165 participants." (P. 26).

These research subjects came from emails sent to "the membership list for the Association of Firearm and Tool Mark Examiners (AFTE)." (Pp. 15-16). AFTE members all "derive[] a substantial portion of [their] livelihood from the examination, identification, and evaluation of firearms and related materials and/or tool marks." (P. 15). Only 35 of the 165 volunteers were certified by AFTE (p. 30), and 20 worked at unaccredited laboratories (P. 31).

What Error Rates?

Nine of the 165 fully trained subjects (5%) made errors (treating "inconclusive" as a correct response). The usual error rates (false positives and false negatives) are not reported because of the design of the "blind examination kits." The obvious way to obtain those error rates is to ask each subject to evaluate pairs of items -- some from the same source and some from different sources (with the examiners blinded to the true source information known to the researchers). Despite the desire to respond to the NAS report, the Miami Dade Police Department Laboratory did not make "kits" consisting of such a mixture of pairs of same-source and different-source bullets.

Instead, the researchers gave each subject a single collection of ten bullets produced by firing one manufacturer's ammunition in eight of the ten barrels. (Two of these "questioned bullets," as I will call them, came from barrel 3 and two from barrel 9; none came from barrel 4.) Along with the ten questioned bullets, they gave the subjects eight pairs of what we can call "exemplar bullets." Each pair of exemplar bullets consisted of two test fires from one of eight of the ten consecutively manufactured barrels (barrels 1-3 and 5-9). The task was to associate each questioned bullet with an exemplar pair or to decide that it could not be associated with any of the eight pairs. Or, the research subjects could circle "inconclusive" on the questionnaire. Notice that almost all the questioned bullets came from the barrels that produced the exemplar bullets -- only two such barrels were not a source of an unknown -- and only one barrel that produced a questioned bullet was not represented in the exemplar set.

This complicated and unbalanced design raises several questions. After associating an unknown bullet with an exemplar pair, will an examiner seriously consider the other exemplar pairs? After eliminating a questioned bullet as originating from, say, most of the exemplar-pair barrels, would he be inclined to pick one of the few that remain? Because of the extreme overlap in the sets, on average, such strategies would pay off. Such interactions could make false eliminations less probable, and true associations more probable, than with the simpler design of a series of single questioned-to-source comparisons.

The report to NIJ does not indicate that the subjects received any instructions to prevent them from having an expectation that most of the questioned bullets would match some pair of exemplar bullets. The only instructions it mentions are on a questionnaire that reads:
Please microscopically compare the known test shots from each of the 8 barrels with the 10 questioned bullets submitted. Indicate your conclusion(s) by circling the appropriate known test fired set number designator on the same line as the alpha unknown bullet. You also have the option of Inconclusive and Elimination. ...
Yet, the report confidently asserts that "[t]he researchers utilized an 'open set' design where the participants had no expectation that all unknown tool marks should match one or more of the unknowns." (P. 28).

To be sure, the study has some value in demonstrating that this subset of subjects could perform a presumably difficult task of associating unknown bullets with exemplar ones. Moreover, whatever one thinks of this alleged proof of "uniqueness," the results imply that there are microscopic (or other) features of marks on bullets that vary with the barrel through which they traveled. But the study does not supply a good measure of examiner skill at making associations in fully "open" situations.

A 0.007% Error Rate?

As noted above, but not in the court's opinion, 5% of the examiners made some kind of error. That said, there were only 12 false-positive associations or false-negative ones (outright eliminations) out of 165 x 10 = 1,650 answers. (I am assuming that every subject completed the questionnaire for every unknown bullet.) That is an overall error proportion of 12/1650 = 0.007 = 0.7%.

The researchers computed the error rate slightly differently. They only reported the average error rate for the 165 experienced examiners. The vast majority (156) made no errors. Six made 1 error, and 3 made 2. So the average examiner's proportion of errors was [156(0) + 6(0.1) + 3(0.2)]/165 = 0.007. No difference at all.
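For readers who want to check the arithmetic, here is a minimal sketch (in Python) of the two calculations, using only the counts reported to NIJ and assuming, as discussed below, that every examiner answered all ten questions.

# The overall error proportion and the average-examiner error rate, computed from
# the reported tallies (156 examiners with no errors, 6 with one, 3 with two).
errors_per_examiner = [0] * 156 + [1] * 6 + [2] * 3   # 165 examiners, 12 errors in all
answers_each = 10                                      # assumes full completion

overall_proportion = sum(errors_per_examiner) / (answers_each * len(errors_per_examiner))
average_examiner_rate = sum(e / answers_each for e in errors_per_examiner) / len(errors_per_examiner)

print(f"{overall_proportion:.4f} {average_examiner_rate:.4f}")   # 0.0073 0.0073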

This 0.007 figure is 100 times the number the court gave. Perhaps the opinion had a typographical error -- an adscititious  percentage sign that the court missed when it reissued its opinion (to correct other typographical errors). The error rate is still small and would not affect the court's reasoning.

But the overall proportion of errors and the average-examiner error rate could diverge. The report gives the error proportions for the 9 examiners who made errors as 0.1 (6 of the examiners) and 0.2 (another 3 examiners). Apparently, all 9 of these erroneous examiners evaluated all 10 unknowns. What about the other 156 examiners? Did all of them evaluate all 10? The worst-case scenario is that every one of the 156 error-free examiners answered only one question. That would supply only 156 correct answers. Add them to the 90 answers (12 of them incorrect) from the erroneous examiners, and the overall error proportion becomes 12/246 ≈ 0.05 = 5% -- roughly seven times the average-examiner rate and some 700 times the court's number.

However, this worst-case scenario did not occur. The funding report states that "[t]here were 1,496 correct answers, 12 incorrect answers and 142 inconclusive answers." (P. 15). The sum of these numbers of answers is 1,650. Did every examiner answer every question? Apparently so. For this 100% completion rate, the report's emphasis on the examiner average (which is never larger and often smaller than the overall error proportion) is a distinction without a difference.
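Still, it is worth seeing how the two measures could have come apart had completion been spotty. The sketch below (in Python; the incomplete-answer pattern is hypothetical) contrasts the actual, fully completed design with the worst-case scenario imagined above.

# Pooled error proportion versus average-examiner error rate under two completion
# patterns: the actual 100% completion and the hypothetical worst case in which
# each error-free examiner answered only one question.
def pooled_and_average(errors, answered):
    pooled = sum(errors) / sum(answered)
    average = sum(e / a for e, a in zip(errors, answered)) / len(errors)
    return pooled, average

errors = [0] * 156 + [1] * 6 + [2] * 3        # per-examiner error counts
full   = [10] * 165                            # everyone answers all ten questions
worst  = [1] * 156 + [10] * 9                  # error-free examiners answer only one

for label, answered in (("full completion", full), ("worst case", worst)):
    pooled, average = pooled_and_average(errors, answered)
    print(f"{label}: pooled {pooled:.4f}, average {average:.4f}")
# full completion: pooled 0.0073, average 0.0073
# worst case:      pooled 0.0488, average 0.0073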

There is a further issue with the number itself. "Inconclusives" are not correct associations. If every examiner came back with "inconclusive" for every questioned bullet, the researchers hardly could report the zero error rate as validating bullet-matching. 6/ From the trial court's viewpoint, inconclusives just do not count. They do not produce testimony of false associations or of false eliminations. The sensible thing to do, in ascertaining error rates for Daubert purposes, is to toss out all "inconclusives."

Doing so here makes little difference. There were 142 inconclusive answers. (P. 15). If these were merely "not used to calculate the overall average error rates," as the report claims (p. 32), the overall error proportion was 12/(1650 - 142) = 12/1508 ≈ 0.008 -- still very small (but still difficult to interpret in terms of the parameters of accuracy for two-by-two tables).

The report to NIJ discussed another finding that, at first blush, could be relevant to the evidence in this case: "Three of these 35 AFTE certified participants reported a total of four errors, resulting in an error rate of 0.011 for AFTE Certified participants." (P. 30). Counter-intuitively, this 1% average is larger than the reported average error rate of 0.007 for all the examiners.

That the certified examiners did worse than the uncertified ones may be a fluke. The examiner-to-examiner variability reported in the study (p. 29) corresponds to a standard deviation of roughly 0.032 in the individual error rates, so the standard error of an average over only 35 certified examiners is about 0.005 -- larger than the 0.004 gap between the certified examiners' average and the overall average. Despite the observed difference in the sample data, the study does not reveal whether certified examiners generally do better or worse than uncertified ones. 7/

A Potential Error Rate?

Finally, the court's reference to "a potential error rate of less than 1.2%" deserves mention. The "potential error rate" is tricky. Potentially, the error rate of individual practitioners like the ones who volunteered for the study, with no verification step by another examiner, could be larger (or smaller). There is no sharp and certain line that can be drawn for the maximum possible error rate. (Except that it cannot exceed 100%.)

In this case, 1.2% is the upper limit of a two-sided confidence interval. The Miami Dade authors wrote that:
A 95% confidence interval for the average error rate, based on the large sample distribution of the sample average error rate, is between 0.002 and 0.012. Using a confidence interval of 95%, the error rate is no more than 0.012, or 1.2%.
A 95% confidence interval means that if there had been a large number of volunteer studies just like this one, making random draws from an unchanging population of volunteer-examiners and having these examiners perform the same task in the same way, about 95% of the many resulting confidence intervals would encompass the true value for the entire population. But the hypothetical confidence intervals would vary from one experiment to the next. We have a statistical process -- a sort of butterfly net -- that is broad enough to capture the unknown butterfly in about 95% of our swipes. The weird thing is that with each swipe, the size and center of the net change. On the Miami-Dade swipe, one end of the net stretched out to an error rate of 1.2%.
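The interval's mechanics can be reproduced from the per-examiner tallies alone. In the sketch below (in Python), the individual error rates are reconstructed from the report's counts, the normal approximation is the one the quoted passage invokes, and intervals at 90% and 99% are added to show how the upper limit moves with the chosen confidence level -- a point taken up next.

# Normal-approximation confidence intervals for the average-examiner error rate,
# built from per-examiner rates reconstructed from the report's counts.
from statistics import mean, stdev

rates = [0.0] * 156 + [0.1] * 6 + [0.2] * 3        # individual error rates
se = stdev(rates) / len(rates) ** 0.5              # standard error of the mean, ~0.0025

for label, z in (("90%", 1.645), ("95%", 1.960), ("99%", 2.576)):
    print(f"{label}: {mean(rates) - z * se:.3f} to {mean(rates) + z * se:.3f}")
# 90%: 0.003 to 0.011
# 95%: 0.002 to 0.012   <- the interval quoted above
# 99%: 0.001 to 0.014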

So the court was literally correct. There is "a potential error rate" of 1.2%. There is also a higher potential error rate that could be formulated -- just ask for 99% "confidence." Or lower -- try 90% confidence. And for every confidence interval that could be constructed by varying the confidence coefficient, there is the potential for the average error rate to exceed the upper limit. Such is the nature of a random variable. Randomness does not make the upper end of the estimate implausible. It just means that it is not "the potential error rate," but rather a clue to how large the actual rate of error for repeated experiments could be.

Contrary to the suggestion in Romero-Lobato, that statistic is not the "potential rate of error" mentioned in the Supreme Court's opinion in Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993). The opinion advises judges to "ordinarily ... consider the known or potential rate of error, see, e.g., United States v. Smith, 869 F. 2d 348, 353-354 (CA7 1989) (surveying studies of the error rate of spectrographic voice identification technique)." The idea is that along with the validity of an underlying theory, how well "a particular scientific technique" works in practice affects the admissibility of evidence generated with that technique. When the technique consists of comparing things like voice spectrograms, the accuracy with which the process yields correct results in experiments like the ones noted in Smith supplies known error rates. That is, these rates are known for the sample of comparisons in the experiment. (The value for all possible examiners' comparisons is never known.)

These experimentally determined error rates are also a "potential rate of error" for the technique as practiced in case work. The sentence in Daubert that speaks to "rate of error" continues by adding, as part of the error-rate issue, "the existence and maintenance of standards controlling the technique's operation, see United States v. Williams, 583 F. 2d 1194, 1198 (CA2 1978) (noting professional organization's standard governing spectrographic analysis)." The experimental testing of the technique shows that it can work -- potentially; controlling standards ensure that it will be applied consistently and appropriately to achieve this known potential. Thus, Daubert's reference to "potential" rates does not translate into a command to regard the upper confidence limit (which merely accounts for sampling error in the experiment) as a potential error rate for practical use.

NOTES
  1. No. 3:18-cr-00049-LRH-CBC, 2019 WL 2150938 (D. Nev. May 16, 2019).
  2. That is my impression anyway. The court cites the study as Thomas G. Fadul, Jr., et al., An Empirical Study to Improve the Scientific Foundation of Forensic Firearm and Tool Mark Identification Utilizing Consecutively Manufactured Glock EBIS Barrels with the Same EBIS Pattern (2013), available at https://www.ncjrs.gov/pdffiles1/nij/grants/244232.pdf. The references in Ronald Nichols, Firearm and Toolmark Identification: The Scientific Reliability of the Forensic Science Discipline 133 (2018) (London: Academic Press), also do not indicate a subsequent publication.
  3. P. 3. The first of the two "research hypotheses" was that "[t]rained firearm and tool mark examiners will be able to correctly identify unknown bullets to the firearms that fired them when examining bullets fired through consecutively manufactured barrels with the same EBIS pattern utilizing individual, unique and repeatable striations." (P. 13). The phrase "individual, unique and repeatable striations" begs a question or two.
  4. The researchers were comforted by the thought that "[t]he external validity strength of this research project was that all testing was conducted in a crime laboratory setting." (P. 25). As secondary sources of external validity, they noted that "[p]articipants utilized a comparison microscope," "[t]he participants were trained firearm and tool mark examiners," "[t]he training and experience of the participants strengthened the external validity," and "[t]he number of participants exceeded the minimum sample size needed to be statistically significant." Id. Of course, it is not the "sample size" that is statistically significant, but only a statistic that summarizes an aspect of the data (other than the number of observations).
  5. P. 26 ("A total of 201 examiners representing 125 crime laboratories in 41 states, the District of Columbia, and 4 international countries completed the Consecutively Rifled EBIS-2 Test Set questionnaire/answer sheet.").
  6. Indeed, some observers might argue that an "inconclusive" when there is ample information to reach a conclusion is just wrong. In this context, however, that argument is not persuasive. Certainly, "inconclusives" can be missed opportunities that should be of concern to criminalists, but they are not outright false positives or false negatives.
  7. The opinion does not state whether the examiner in the case -- "Steven Johnson, a supervising criminalist in the Forensic Science Division of the Washoe County Sheriff's Office" -- is certified or not, but it holds that he is "competent to testify" as an expert.

Tuesday, June 11, 2019

Junk DNA (Literally) in Virginia

The Washington Post reported yesterday on a motion in Alexandria Circuit Court to suppress "all evidence flowing from the warrantless search of [Jesse Bjerke's] genetic profile." 1/ Mr. Bjerke is accused of raping a 24-year-old lifeguard at gunpoint at her home after following her from the Alexandria, Va., pool where she worked. She "could describe her attacker only as a thin man she believed was 35 to 40 years old and a little over 6 feet tall." 2/ Swabs taken by a nurse contained sperm from which the Virginia Department of Forensic Sciences obtained a standard STR profile.

Apparently, the STR profile was in neither the Virginia DNA database nor the national one (NDIS). So the police turned to the Virginia bioinformatics company, Parabon Labs, which has had success with genetic genealogy searches of the publicly available genealogy database, GEDmatch. Parabon reported that
[T]he subject DNA file shares DNA with cousins related to both sides of Jesse's family tree, and the ancestral origins of the subject are equivalent to those of Jesse. These genetic connections are very compelling evidence that the subject is Jesse. The fact that Jesse was residing in Alexandria, VA at the time of the crime in 2016 fits the eyewitness description and his traits are consistent with phenotype predictions, further strengthens the confidence of this conclusion.
Recognizing the inherent limitations in genetic genealogy, Parabon added that
Unfortunately, it is always possible that the subject is another male that is not identifiable through vital records or other research means and is potentially unknown to his biological family. This could be the result if an out-of-wedlock birth, a misattributed paternity, an adoption, or an anonymous abandonment.
The motion suggests that the latter paragraph, together with the firm's boiler-plate disclaimer of warranties and the fact that the report contains hearsay, means that police lacked even probable cause to believe that the sperm came from the defendant. This view of the information that the police received is implausible, but regardless of whether "the facts contained in the Parabon report do not support probable cause," 3/ the police did not use the information either to arrest Mr. Bjerke immediately or to seek a warrant to compel him to submit to DNA sampling. Instead,
Police began following Bjerke at his home and the hospital where he worked as a nurse. They took beer bottles, soda cans and an apple core from his trash. They tracked him to a Spanish restaurant ... and, after he left, bagged the straws he had used.

The DNA could not be eliminated as a match for the sperm from the rape scene, a forensic analysis found, leading to Bjerke’s indictment and arrest in February. With [a] warrant, law enforcement again compared his DNA with the semen at the crime scene. The result: a one in 7.2 billion chance it was not his. 4/
A more precise description of the "one in 7.2 billion chance" is that if Mr. Bjerke is not the source, then an arbitrarily selected unrelated man would have that tiny a chance of having the STR profile. The probability of the STR match given the hypothesis that another man is the source is not necessarily the same as the probability of that hypothesis given the match. But given a prior probability reflecting the other evidence so far revealed about Mr. Bjerke, there would not be much difference between the conditional probability the laboratory supplied and the article's transposed one.
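To see why the transposition is mostly harmless here, consider a minimal Bayesian sketch (in Python). The random-match probability comes from the article; the prior probabilities are hypothetical, chosen only for illustration, and the calculation sets aside relatives and the possibility of laboratory error.

# Posterior probability that the suspect is NOT the source, given the STR match,
# for several hypothetical prior probabilities that he IS the source.
random_match_prob = 1 / 7.2e9          # P(match | an unrelated man is the source)

def posterior_not_source(prior_source):
    # Assumes P(match | suspect is the source) = 1 and ignores relatives and lab error.
    p_match = prior_source + (1 - prior_source) * random_match_prob
    return (1 - prior_source) * random_match_prob / p_match

for prior in (0.5, 0.01, 1e-6):
    print(f"prior {prior}: P(not source | match) = {posterior_not_source(prior):.2e}")
# Even with a prior of one in a million, the posterior probability of "not his"
# is only about 1.4e-4 -- so the newspaper's transposed phrasing is not far off here.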

Faced with such compelling evidence, Mr. Bjerke wants it excluded at trial. The motion states that
For the purposes of this motion, there are three categories of DNA testing. (1) DNA testing conducted before Jesse Bjerke was a suspect in the case; (2) DNA testing conducted without a warrant after Jesse Bjerke became a suspect in the case; and (3) DNA testing conducted with a warrant after Jesse Bjerke's arrest. This motion seeks to suppress all DNA evidence in categories two and three that relate to Jesse Bjerke.
An obstacle is the many cases -- not mentioned in the motion -- holding that shed or "abandoned" DNA is subject to warrantless collection and analysis for identifying features on the theory that the procedure is not a "search" under the Fourth Amendment. The laboratory analysis is not an invasion of Mr. Bjerke's reasonable expectation of privacy -- at least, not if we focus solely on categories (2) and (3), as the motion urges. This standard STR typing was done after the genetic genealogy investigation was completed. The STR profile (which the motion calls a "genetic profile" even though it does not characterize any genes) provides limited information about an individual. For that reason, the conclusion of the majority of courts that testing shed DNA is not a search is supportable, though not ineluctable. ("Limited" does not mean "zero.")

Indeed, most laboratory tests on or for traces from crimes are not treated as searches covered by the warrant and probable cause protections. Is it a search to have the forensic lab analyze a fingerprint from a glass left at a restaurant? Suppose a defendant tosses a coat in a garbage bin on the street, and the police retrieve it, remove glass particles, and analyze the chemical composition to see whether they match the glass from a broken window in a burglary. Did they need a warrant to study the glass particles?

The underlying issue is how much the constitution constrains the police in using trace evidence that might associate a known suspect with a crime scene or victim. When the analysis reveals little or nothing more than the fact of the association, I do not see much of an argument for requiring a warrant. That said, there is a little additional information in the usual STR profile, so there is some room for debate here.

However, this case might be even more debatable (although the defense motion does not seem to recognize it) because of category (1) -- the genetic genealogy phase of the case. The police, or rather the firm they hired to derive a genome-wide scan for the genetic genealogy, have much more information about Mr. Bjerke at their disposal. They have on the order of a million SNPs. In theory, Parabon or the police could inspect the SNP data for medical or other sensitive information on Mr. Bjerke now that he has been identified as the probable source of those sperm.

Nevertheless, I do not know why the police or the lab would want to do this, and it has always been true that once a physical DNA sample is in the possession of the police, the possibility exists for medical genetic testing using completely different loci. Testing shed DNA in that way should be considered a search. Bjerke is a step in that direction, but are we there yet?

The Post's online story has 21 comments on it. Not one supported the idea that there was a significant invasion of privacy in the investigation. These comments are a decidedly small sample that does not represent any clear population, but the complete lack of support for the argument that genetic genealogy implicates important personal privacy was striking.

NOTES
  1. Defendant's Motion to Suppress, Commonwealth v. Bjerke, No. CF19000031 (Cir. Ct., Alexandria, Va. May 20, 2019).
  2. Rachel Weiner, Alexandria Rape Suspect Challenging DNA Search Used to Crack Case, Wash. Post, June 10, 2019, at 1:16 PM.
  3. Defendant's Motion, supra note 1.
  4. Weiner, supra note 2.
ACKNOWLEDGMENT
  • Thanks to Rachel Weiner for alerting me to the case and providing a copy of the defendant's motion.

Friday, June 7, 2019

Aleatory and Epistemic Uncertainty

An article in the Royal Society's Open Science journal on "communicating uncertainty about facts, numbers and science" is noteworthy for the sheer breadth of the fields it surveys and its effort to devise a taxonomy of uncertainty for the purpose of communicating its nature or degree. The article distinguishes between "aleatory" and "epistemic" uncertainty:

[A] large literature has focused on what is frequently termed 'aleatory uncertainty' due to the fundamental indeterminacy or randomness in the world, often couched in terms of luck or chance. This generally relates to future events, which we can't know for certain. This form of uncertainty is an essential part of the assessment, communication and management of both quantifiable and unquantifiable future risks, and prominent examples include uncertain economic forecasts, climate change models and actuarial survival curves.

By contrast, our focus in this paper is uncertainties about facts, numbers and science due to limited knowledge or ignorance—so-called epistemic uncertainty. Epistemic uncertainty generally, but not always, concerns past or present phenomena that we currently don't know but could, at least in theory, know or establish.

The distinction is of interest to philosophers, psychologists, economists, and statisticians. But it is a little hard to pin down with the definition in the article. Aleatory uncertainty applies on the quantum mechanical level, but is it true that "in theory" predictions like weather and life span cannot be certain? Chaos theory shows that the lack of perfect knowledge about initial conditions of nonlinear systems makes long-term predictions very uncertain, but is it theoretically impossible to have perfect knowledge? The card drawn from a well-shuffled deck is a matter of luck, but if we knew enough about the shuffle, couldn't we know the card that is drawn? Thus, I am not so sure that the distinction is between (1) "fundamental ... randomness in the world" and (2) ignorance that could be remedied "in theory."

Could the distinction be between (1) instances of a phenomenon that has variable outcomes at the level of our existing knowledge of the world and (2) a single instance of a phenomenon that we do not regard as the outcome of a random process or that already has occurred, so that the randomness is gone? The next outcome of rolling a die (an alea in Latin) is always uncertain (unless I change the experimental setup to precisely fix the conditions of the roll), 1/ but whether the last roll produced a 1 is only uncertain to the extent that I cannot trust my vision or memory. I could reduce the latter, epistemic uncertainty by improving my system of making observations. For example, I could have several keen and truthful observers watch the toss, or I could film it and study the recording thoroughly. From this perspective, the frequency and propensity conceptions of probability concern aleatory uncertainty, and the subjective and logical conceptions traffic in both aleatory and epistemic uncertainty.

When it comes to the courtroom, epistemic uncertainty is usually in the forefront, and I may get to that example at a later date. For now, I'll just note that, regardless of whether the distinction offered above between aleatory and epistemic uncertainty is philosophically rigorous, people's attitudes toward aleatory and epistemic risk defined in this way do seem to be somewhat different. 2/

NOTES
  1. Cf. P. Diaconis, S. Holmes & R. Montgomery, Dynamical Bias in the Coin Toss, 49(2) SIAM Rev. 211-235 (2007), http://epubs.siam.org/doi/abs/10.1137/S0036144504446436?journalCode=siread
  2. Gülden Ülkümen, Craig R. Fox & B. F. Malle, Two Dimensions of Subjective Uncertainty: Clues from Natural Language, 145(10) Journal of Experimental Psychology: General 1280-1297 (2016), http://dx.doi.org/10.1037/xge0000202; Craig R. Fox & Gülden Ülkümen, Distinguishing Two Dimensions of Uncertainty, in Perspectives on Thinking, Judging, and Decision Making (W. Brun, G. Keren, G. Kirkebøen & H. Montgomery eds. 2011).

Saturday, June 1, 2019

Frye-Daubert Flip Flops in Florida

For years, the Florida Supreme Court rebuffed suggestions that it adopt the standard for scientific evidence that the U.S. Supreme Court articulated for the federal judiciary in Daubert v. Merrell Dow Pharmaceuticals. 1/ Instead, it "repeatedly reaffirmed [its] adherence to the Frye standard for admissibility of evidence." 2/

In 2013, the Florida legislature passed a statute to replace Frye with the "reliability" wording of Federal Rule of Evidence 702 -- wording intended to codify Daubert and its progeny. Some Florida courts concluded that this brought Florida into the ranks of jurisdictions that use the Daubert standard of "evidentiary reliability" based on "scientific validity." 3/ However, the Florida Supreme Court has held that only it can implement "procedural" changes to the Florida Rules of Evidence (FLRE). 4/ The legislature has the power to promulgate "substantive" rules of evidence, but it may not force "procedural" ones down the judiciary's throat.

So the Florida Bar's Code and Rules of Evidence Committee reviewed the law. By a narrow margin, it recommended leaving Frye in place. The Florida Supreme Court agreed. It declined to adopt the Daubert amendment "due to the constitutional concerns raised" by certain Committee members and commenters. 5/ "Those concerns," the court explained, "include undermining the right to a jury trial and denying access to the courts." 6/ The next year, in DeLisle v. Crane Co., 7/ the court confirmed that the legislative switch to Daubert was purely procedural. Because the court did not bless it, the law was constitutionally ineffective.

Then last month, the court flip-flopped. It adopted the "Daubert amendments" under its "exclusive rule-making authority." 8/ Although the amendment to Rule 702 did not percolate through the rules committee a second time, the court decided that its earlier reservations about switching to Daubert "appear unfounded." 9/

Indeed, the arguments that the court considered "grave" two years ago are anything but. The two standards -- general scientific acceptance (Frye) and evidentiary reliability encompassing scientific validity (Daubert) -- each seek to screen out expert evidence that is insufficiently validated to warrant its use in court in light of the danger that it will be given too much weight. One standard (Daubert) asks judges to assess directly the validity of scientific theories. The other (Frye) has them do so indirectly, by looking only for a consensus in the scientific community. This difference in the mode of analysis does not make one approach constitutional and the other unconstitutional. Daubert does not create an inherently more demanding test than Frye. 10/ It describes more criteria for answering the same underlying question -- is the proposed evidence probative enough to come in as "science" (or some other form of expertise)?

Certainly, there is room to debate the relative merits of the two approaches -- and room for different jurisdictions to go their own ways -- but the choice between Daubert and Frye (or other reasonable standards) does not pose a serious constitutional question.

NOTES
  1. 509 U.S. 579 (1993).
  2. Marsh v. Valyou, 977 So.2d 543, 547 (Fla. 2007) (holding that the "general acceptance" standard fashioned in Frye v. United States, 293 F. 1013 (D.C.Cir.1923), and expressly adopted in Florida in Bundy v. State, 471 So.2d 9, 18 (Fla.1985), and Stokes v. State, 548 So.2d 188, 195 (Fla.1989), does not even apply to "pure opinion" testimony "causally linking trauma to fibromyalgia ... based on the experts' experience and training").
  3. Perez v. Bell So. Telecommunications, Inc., 138 So.3d 492, 497 (Fla. Dist. Ct. App. 2014). The phrases "evidentiary reliability" and "scientific validity" appear in the Daubert opinion.
  4. DeLisle v. Crane Co., 258 So.3d 1219 (Fla. 2018).
  5. In re Amendments to Florida Evidence Code, 210 So.3d 1231, 1239 (Fla. 2017).
  6. Id.
  7. 258 So.3d 1219 (Fla. 2018).
  8. In re Amendments to the Florida Evidence Code, No. SC19-107, 2019 WL 2219714 (Fla. May 23, 2019). Thanks are due to Ed Imwinkelried for calling the case to my attention.
  9. Id.
  10. The Florida Supreme Court had previously written that Frye imposed a "higher standard of reliability" than the "more lenient standard" in Daubert. Brim v. State, 695 So.2d 268, 271–72 (Fla. 1997). It is tempting to ask how Daubert's "more lenient" reliability requirement could be unconstitutional when Frye's more exacting standard is constitutionally sound. I suppose one could argue that because Frye (as construed in Florida)  does not bar "pure opinion" testimony that has not been shown to be scientifically reliable, it has less of an impact on "access to the courts." However, as discussed in The New Wigmore on Evidence: Expert Evidence (2d ed. 2011),  the "pure opinion" exception to either Frye or Daubert is untenable.