Forensic Science, Statistics & the Law: 2015

Wednesday, December 30, 2015

Higher math in a Kansas case

The diagram of a car crash is drawn at a scale of 1 inch to 20 feet. The distance between two points on the diagram is 3 and 3/16 inches. How far apart are the two locations that those points represent?

You would think that an expert in the field of "accident reconstruction" could answer this question correctly with a pencil and paper or a calculator (if not in his head). But today's online New York Times hosts a re-enactment of the deposition testimony of an expert accident reconstructionist who refused to try without his "formula sheets" and computer.

Here is a small part of the transcript:

A. Three and three-sixteenths inches.
Q. And that is, when you convert that from the scale, what does that convert to?
A. Sixty-eight feet, approximately, sir.
Q. What are the numbers?
A. Three and three-sixteenths.
Q. OK, well here, run it out for me (handing the witness a pocket calculator).
A. Run it out?
Q. Yeah, calculate it for me.
A. (Working on calculator) And again, I'd do this on the computer.
Q. You can't do it, can you?
A. Not without my formulas in front of me, no sir. I can't do it from my head.
Q. You're not able to do a simple scaling problem with a calculator?
A. I don't wish to. I don't wish to make any mistakes. I use instrumentation that does it exact [sic].
Q. You can't show us, based on the numbers you just gave me, that will spit out the 68-foot distance, can you?
A. Not here today I can't, no.

This colloquy suggests an extra-credit problem: Multiply 3 and 3/16 by 20. Do you obtain 68?
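For readers who would rather not reach for formula sheets, the extra-credit problem can be worked in a few lines of Python. This is only a sketch of the arithmetic; the function name `diagram_to_feet` and the constant name are made up for illustration, and the only inputs are the scale (1 inch to 20 feet) and the measurement (3 and 3/16 inches) given in the transcript.

```python
from fractions import Fraction

# Scale stated in the post: 1 inch on the diagram represents 20 feet.
SCALE_FEET_PER_INCH = 20


def diagram_to_feet(whole_inches, numerator, denominator,
                    feet_per_inch=SCALE_FEET_PER_INCH):
    """Convert a mixed-number measurement in inches to real-world feet."""
    inches = Fraction(whole_inches) + Fraction(numerator, denominator)
    return inches * feet_per_inch


# The measurement from the deposition: 3 and 3/16 inches.
distance = diagram_to_feet(3, 3, 16)
print(distance, "=", float(distance), "feet")
```

Exact fractions are used instead of floating point so that the result is 255/4 feet, that is, 63.75 feet, with no rounding anywhere in the calculation. That is a little more than four feet short of the 68 feet in the testimony.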

Film-maker and comic writer Brett Weiner dramatized this and more of the transcript without changing a word to achieve this surreal video, Verbatim: Expert Witness. Last year, a similar film, Verbatim: What Is a Photocopier?, won the audience award for best short film at the 2014 Dallas Film Festival. There, an IT guy in Ohio struggles with yet another deeply technical issue -- the meaning of the term "photocopier."

Thursday, December 24, 2015

Flaky Academic Conferences

Paralleling the proliferation of journals of ill repute is the globalization of the marketing of academic conferences. Information on and tidbits from sellers whose incessant spam has reached me is at the blog Flaky Academic Conferences. Links to just some of these spammers follow.

⊘ BIT
BIT Life Sciences, aka BIT Congress and BIT Group Global, is "Your Think Tank." It lists conference organizers, presenters, and session chairs without their knowledge or over their objections (see The Dalian Letters).
⊘ Conference123.net
Looks like another mushy mega-conference organizer for China travel.
⊘ Conference Series LLC, conferenceseries.com
a front for OMICS with (as of 8/25/16) "1000+ Global Events Every Year across USA, Europe & Asia with support from 1000 more scientific societies and Publishes 700+ Open access journals which contains over 100000 eminent personalities, reputed scientists ... ."
⊘ DEStech Publications
runs "a leading conference for all researchers from different countries and territories to present their research results about human society and spiritual cultures of human annually"
⊘ Engineering Information Institute
Hardly limited to engineering, this group's "mission is to meet the satisfactions of our authors involved in all kinds of comprehensive conferences. ... We look forward to benefiting and establishing harmonious relationship with everybody."
⊘ Eureka Science
Nobel Laureate Ferid Murad promises conferences that "should provide eminent scientists the opportunity to present their cutting edge researches" at "important," "exciting," and, of course, "scientific events."
⊘ Global Science and Technology Forum
a group from Singapore that has been accused of "conference hijacking" and is on Beall's List as "an exploitative publisher that ... everyone should avoid."
⊘ Institute of Research Engineers and Doctors (IRED)
"IRED welcomes all the Doctors, Scientist, Engineers Professionals, Researchers, Scholars and Medical and Health, Technical Engineering Colleges and Universities to join us to exchange information and ideas; in according with our objective to facilitate this, we call upon to network with us."
⊘ International Scientific Events
Come to Hotel "Royal Castle" on the Bulgarian Black Sea Coast.
⊘ North Sea Conference and Journal
Believes forensic science reform is part of the Internet of things
⊘ OMICS
Well known as a "predatory publisher", OMICS is also in the conference business -- big time. Despite a California address, its roots are in India, and the FTC has charged it with deceptive practices.
⊘ Oxford Global Marketing, Ltd. (OGM)
With offices in Singapore and London, "We also offer bespoke event management to companies in the sector." It belts out email for the Annual Genetics in Forensics Congress.
⊘ Oxford Round Table
An American invention (starting in Kentucky) with a history that would impress a corporate reorganization lawyer.
⊘ Pace Institute of Technology and Sciences (PITS)
"Elsevier based conference is going to organizing in Andhra pradesh, India during 29th to 30th July 2016," but don't trust us about that ("disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited")
⊘ SASI Institute of Technology & Engineering (SASI)
All-of-engineering conferences from "[w]hat began as a small school in with a 9 students in a small village in West Godavari [that] has created a sensation in the field of education" ... "The very name SASI instills confidence in the minds."
⊘ Scientific Federation (SF)
an "abode for researchers"

References

John D. Bowman, Predatory Publishing, Questionable Peer Review, and Fraudulent Conferences, Am J Pharm Educ. 2014 Dec 15; 78(10): 176, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4315198/
Melanie Newman, Round Table Invitees Confused by Oxford 'Link', Times Higher Education, Mar. 12, 2009, https://www.timeshighereducation.com/news/round-table-invitees-confused-by-oxford-link/405766.article
Dana Roth & Donna Wrublewski, Conferences - spammed and ??, http://libguides.caltech.edu/content.php?pid=134993

Wednesday, December 23, 2015

Flaky Academic Journals

Barely a week goes by when I am not asked to submit an article to or become an editor of the Journal of This or That. One such missive is from Maple Xiao of the Canadian Center of Science and Education. Less than a week after I posted a 38-year-old book review on the Social Science Research Network, Ms. Xiao pounced:

I have the honor to read your paper "Book Review, The Right and the Power: The Prosecution of Watergate", and really appreciate your contributions in this area. As the editorial assistant of Journal of Management and Sustainability, I write to invite you to submit manuscripts to our journal.

Apparently, "management and sustainability" embraces a wide swath of subjects. (The book review asked whether it was proper for a former prosecutor to disclose previously private information about the subjects of the criminal investigations, and it questioned the justifications given by former Watergate Special Prosecutor Leon Jaworski for some of his decisions.)

Although the problem of bogus academic journals is widely recognized, 1/ I began listing excerpts from some of the academic journal spam that was reaching me along with observations about the purveyors and their editorial boards. When the posting grew too long, I converted it into a blog of its own, named Flaky Academic Journals. Links to just some of the accumulated information on the spamming and dubious journals follow.

⊗ Academicians' Research Center (ARC)
email from ARC Journal of Forensic Science and ARC Journal of Nursing and Healthcare
⊗ Allied Academies
"confidential" email from Journal of Forensic Genetics and Medicine and Journal of Sinusitis and Migraine
⊗ American Association for Science and Technology (AASCIT)
⊗ American Institute of Science (AIS)
⊗ American Research Institute for Policy Development (ARID)
email from Journal of Law and Criminal Justice
⊗ Annex Publishers
email for the Journal of Forensic Science and Criminology, publisher of articles on the sacred geometry of fingerprints
⊗ Apex Journal International
email soliciting for 12 journals at once, including International Research on Medical Sciences, Journal of Education Research and Behavioral Sciences, and International Law and Policy Research Journal
⊗ Austrian Scientific Publication House (ASPH)
Where are the Austrians?
⊗ Bentham Science
email from Current Drug Abuse Reviews and Neuroscience and Biomedical Engineering
⊗ Bioaccent Group
email from BOAJ Urology and Nephrology
⊗ Biomed Central
email from Skeletal Muscle
⊗ Canadian Center of Science and Education (CCSE)
email from Journal of Management and Sustainability
⊗ Center for Promoting Ideas (CPI), USA
email from American International Journal of Social Science and International Journal of Humanities and Social Sciences
⊗ Centre of Excellence for Scientific and Research Journalism (COES&RJ)
email from Journal of Social Science with a fake address in Texas
⊗ Elyns Publishing Group
email from Journal of Forensic Medicine and Legal Affairs
⊗ Gavin Publishers
email from Journal of Forensic Studies
⊗ Herald Scholarly Open Access (HSOA)
email from Journal of Forensic, Legal & Investigative Sciences and a vision "to highlight quality exploration work to the biggest possible swarm over development points secured under the field of medicine."
⊗ Internal Medicine Review
"complete rubbish" from "a completely fake medical journal that falsely claims to be based in Washington, D.C."
⊗ Institute of Research in Engineering and Technology (IRET)
email from International Journal of Emerging Trends in Electrical and Electronics (IJETEE)
⊗ Insight Medical Publishing (iMedPub)
email from Journal of Medical Toxicology and Clinical Forensic Medicine
⊗ Jacobs Publishers
"bringing science, medicine, engineering and Pharmacy to the spearhead."
email from Jacobs Journal of Forensic Science.
⊗ JSciMed Central (JSM)
email from Annals of Forensic Science and Analysis
⊗ Juniper Publishers
email from the Journal of Forensic Science and Criminal Investigation
⊗ Knowledge Enterprises, Inc. (KEI Journals)
"A publisher to avoid"
"confidential" email from Medical Research Archives
⊗ Medwin Publishers
seeking "to intellectualize the global society by providing them with the advancements"
email from Vaccines & Vaccination Open Access and comments on the International Journal of Forensic Sciences
⊗ Mehta Press
No editorial boards, but "rigorously reviewed" with "Maximum review time 15 days"
⊗ Merit Journals
"This journal opts to bring panacea"
⊗ Net Journals
email from International Research Journal of Medicine and Medical Sciences (IRJMMS) and Biochemistry and Biotechnology Research (BBR)
⊗ OMICS International
email from Intellectual Property Rights: Open Access, Journal of Civil & Legal Sciences, and Global Journal of Nursing & Forensic Studies
"We look forward for a long lasting scientific relationship."
⊗ Open Access Library (OALib) Journal
⊗ Openventio Publishers
email from Anthropology - Open Journal
⊗ Peertechz Journals
email from Forensic Science and Technology and Archives of Sports Medicine and Physiotherapy
"themed Organization setted up with 40 Peer Reviewed Medical Journals"
⊗ Progressive Science Publications (PSCIPUB)
emails from four journals at once; "+5000 active participants"
⊗ Public Science Framework
"Continued Privilege: Publishing Papers with 50% Discount" and you can submit "the extended version" of your previously published paper
⊗ Remedy Publications
email from Clinics in Oncology
⊗ Research Institute for Progression of Knowledge (RIPK)
email from International Journal of Education and Social Science and International Journal of Humanities and Social Science Review
⊗ Science Publishing Group (SciencePG)
articles include "Modification of Einstein's E = mc² to E = 1/22 mc²" and "Mathematical Proof of the Law of Karma"
⊗ Scientific Research Association (SCIREA)
"takes an opportunity to serve the scientific community, students and researchers with undefiled research works."
⊗ Scientific Research Publishing Inc. (SCIRP)
Looks like a subject-based filing system that, for a fee, stores even randomly generated papers.
⊗ Scientifica (Hindawi)
Cairo-based Hindawi Publishing claims to have "more than 30,000 internationally-recognized Editors"
email from Journal of Nucleic Acids
⊗ Scinzer Scientific Journals
"Fast track paper publication (3-10 Days)" and "Papers from your country are welcome." Only "40 USD per paper."
⊗ Scitech Central
27 journals out to save the world by being "a quantum to research"
⊗ SciTechnol
OMICS in disguise
email from Journal of Forensic Toxicology and Pharmacology
⊗ Time Journals
"It arises from a reaction to the severe restriction of knowledge distribution"
email from Time Journal of Biological Sciences
⊗ Trade Science Inc.
"wide spectrum of audience. SUBMIT MANUSCRIPTS NOW !!"
email from 18 journals

Note

Declan Butler, Investigating Journals: The Dark Side of Publishing, 495 Nature 433 (2013); John D. Bowman, Predatory Publishing, Questionable Peer Review, and Fraudulent Conferences, 78 Am. J. Pharm. Educ. 176 (2014), doi: 10.5688/ajpe7810176, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4315198/; Kevin Carey, Fake Academe, Looking Much Like the Real Thing, N.Y. Times, Dec. 30, 2016, http://www.nytimes.com/2016/12/29/upshot/fake-academe-looking-much-like-the-real-thing.html; Jocalyn Clark & Richard Smith, Firm Action Needed on Predatory Journals, 350 Brit. Med. J. h210 (2015), doi: http://dx.doi.org/10.1136/bmj.h210, http://www.bmj.com/content/350/bmj.h210; Colleen Flaherty, Librarians and Lawyers, Inside Higher Education, Feb 15, 2013, https://www.insidehighered.com/news/2013/02/15/another-publisher-accuses-librarian-libel; David Moher & Ester Moher, Stop Predatory Publishers Now: Act Collaboratively, Annals Internal Med. (2016), http://annals.org/article.aspx?articleId=2484878&guestAccessKey=a399556a-92ad-443f-855f-9d7fea36fefd; Cenyu Shen & Bo-Christer Björk, ‘Predatory’ Open Access: A Longitudinal Study of Article Volumes and Market Characteristics, 13 BMC Medicine 230 (2015), DOI: 10.1186/s12916-015-0469-2; http://bmcmedicine.biomedcentral.com/articles/10.1186/s12916-015-0469-2.

Related Postings

Flaky Academic Conferences, Dec. 24, 2015 (with updates)

Thursday, December 17, 2015

"Remarkably Accurate": The Miami-Dade Police Study of Latent Fingerprint Identification (Pt. 3)

The Department of Justice continues to communicate the sweeping view that fingerprint examiners make extremely few errors in their work. A few days ago, it issued this bulletin:

Miami-Dade Examiner Discusses a Highly Accurate Print Identification Process
In a new video, Brian Cerchiai talks about an NIJ-supported study conducted by the Miami-Dade Police Department on the accuracy of fingerprint examiners. The study found that fingerprint examiners make extremely few errors. Even when examiners did not get an independent second opinion about their decisions, they were remarkably accurate. But when decisions were verified by an independent reviewer, examiners had a 0-percent false positive, or incorrect identification, rate and a 3-percent false negative, or missed identification, rate.

A transcript of the NIJ video can be found below. The naive reader of the bulletin might think that Miami-Dade's latent print examiners do not make false identifications -- they are "remarkably accurate" in their initial judgments -- and they have a "0-percent" rate of incorrectly declaring a match in their cases. In previous postings, I suggested that this first characterization is a remarkably rosy view of the results reported in the study, but I did not address the verification phase that brought the false positive rate (of 3 or 4%) for the judgments of individual examiners down to zero.

Today, Professor Jay Koehler shared his reactions to both aspects of the Miami-Dade study on a discussion list of evidence law professors. I have not reread the study myself to verify the details of his analysis, but here is his take on the study:

Regarding the Miami-Dade fingerprint proficiency test (funded by the Department of Justice) - and DOJ’s claim that it showed a 0% false positive error rate - I urge you to be skeptical.

First, the study was not blind (examiners knew they were being tested) and the participants were volunteers. If we are serious about estimating casework error rates, these features are not acceptable.

Second, the Department of Justice’s press release indicates that the study showed examiners to be “remarkably accurate” and “found that examiners make extremely few errors.” But the press release doesn’t actually state what those remarkable error rates were.

Here they are: the false positive error rate was 4.2% (42 erroneous identifications out of 995 chances, excluding inconclusives), and the false negative error rate was 8.7% (235 erroneous exclusions out of 2,692 chances, excluding inconclusives). In case you are wondering whether the false positive errors were confined to a few incompetents, 28 of the 109 examiners who participated in the study made an erroneous identification. Also, the identification errors occurred on 21 of the 80 different latent prints used in the study.

The error rates identified in this study produce a likelihood ratio of about 22:1 for a reported fingerprint match. This means that one should believe that it is about 22 times more likely that the suspect is the source of the latent print in question than it was prior to learning of the match. Not 22 million or billion times more likely to be the source of the latent print in question. Just 22 times more likely.

But not all false positive errors are equal, and most of those reported in this study really shouldn’t count as false positive errors if we are concerned with who is the source of the fingerprint as opposed to which finger is the source of the fingerprint. The authors report that 35 of the 42 false positive errors seemed to be nothing more than “clerical errors” in which the correct matching person was selected but the wrong finger was identified. If we move those 35 minor false positives into the correct calls category, we are left with 7 major false positive errors (i.e., a person who was not the source is falsely identified as the source). This translates to a 0.7% false positive error rate (i.e., about one false positive error per 142 trials), and a likelihood ratio of 130:1. Better, but still not even close to millions or billions to one.

Third, the study provides some evidence about the value of verification for catching false positive errors, but caution is needed here as well. The 42 false positives were divided up and assigned to one of three verification conditions: a group of different examiners, a group of examiners who were led to believe that they were the 2nd verifiers, and the original examiners themselves (months later). The 0% post-verification error rate that the Department of Justice touts is an apparent reference to the performance of the first verification group only. None of the 15 false positive errors that were sent to this group of verifiers was repeated. But some of the original false positive errors were repeated by the second and third group of verifiers. The authors are silent on whether any of the 7 major false positive errors were falsely verified or not.
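The figures in Professor Koehler's message can be reproduced from the counts he gives. The sketch below is an illustration and is not taken from the study or from his message. In particular, the message does not say how the likelihood ratios were computed; the formula used here, (1 - false negative rate) / false positive rate, is an assumption that happens to reproduce the reported 22:1 and 130:1. The function and variable names are hypothetical.

```python
def rate(errors, chances):
    """Proportion of erroneous conclusions among conclusive comparisons."""
    return errors / chances


def likelihood_ratio(false_negative_rate, false_positive_rate):
    """ASSUMED formula: P(reported match | same source) divided by
    P(reported match | different source)."""
    return (1 - false_negative_rate) / false_positive_rate


# Counts quoted in the message (inconclusives excluded).
fpr_all = rate(42, 995)      # all erroneous identifications
fnr = rate(235, 2692)        # erroneous exclusions
fpr_major = rate(7, 995)     # wrong-person errors only (42 minus 35 clerical)

lr_all = likelihood_ratio(fnr, fpr_all)
lr_major = likelihood_ratio(fnr, fpr_major)

print(f"false positive rate, all errors:   {fpr_all:.1%}")
print(f"false negative rate:               {fnr:.1%}")
print(f"likelihood ratio, all errors:      {lr_all:.0f}:1")
print(f"false positive rate, major errors: {fpr_major:.1%}")
print(f"trials per major false positive:   {1 / fpr_major:.0f}")
print(f"likelihood ratio, major errors:    {lr_major:.0f}:1")
```

Note that the second likelihood ratio keeps the false negative rate unchanged when the 35 clerical errors are set aside; that too is an assumption, chosen because it matches the figure of 130:1 in the message.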

Appendix: NIJ Video Transcript: How Reliable Are Latent Fingerprint Examiners?

Forensic Science, Statistics, and the Law gratefully acknowledges the U.S. Department of Justice, Office of Justice Programs, National Institute of Justice, for allowing it to reproduce the transcript of the video How Reliable Are Latent Fingerprint Examiners? The opinions, findings, and conclusions or recommendations expressed in this video are those of the speaker and do not necessarily represent the official position or policies of the U.S. Department of Justice.

Research Conducted by the Miami-Dade Police Department.
Speaking in this video: Brian Cerchiai, CLPE, Latent Fingerprint Examiner, Miami-Dade Police Department

The goal of the research was to determine if latent finger print examiners can make and be able to make identifications, exclude properly prints not visible to the naked eye. In this case, we had these 13 volunteers leave over 2000 prints on different objects that were round, flat, smooth and we developed them with black powder and tape lifts.

We did the ACE which is analyze compare evaluate. Where we gave latent examiners - 109 latent examiners - unknown finger prints or palm prints and latents to look at and compare to three known sources. So essentially, compare this latent to one of these 30 fingers or one of these six palms.

[Slide text] 109 examiners compared the unknown latent prints to known sources. Can they match the prints correctly?
So as participants were looking at the latent list and comparing them to the subjects, we asked them if they could identify any of those three subjects as being the source of that latent print. In that case, they would call that an identification. If we asked them to exclude, we are basically asking them to tell us that none of those three standards made that latent or were not the source of that latent print.

That ACE verification (ACE-V) process works, secondly, the examiner looks at that comparison and does their own analysis comparison and gives their evaluation of that decision.

When we found that under normal conditions where one examiners made an identification and the second examiner verified that no erroneous identification got passed that second latent examiner. So it had a false positive rate of zero.

[Slide text] With verification, 0% false positive.
So when we are looking at ACE comparisons where one latent examiner looked a print and one latent examiner analyzed compared and evaluate and came up with a decision. We came up- there was a false positive rate which basically an erroneous identification where they identified the wrong source.

[Slide text] Without verification, 3% false positive.
Without verification, there was a three percent error rate for that type of identification. And we also tracked a false negative rate where given those three standards, people were erroneously excluded that source; where you’re given the source, check one of these three people and then you now eliminate that one of those latent print does not come from one of those three people, even though it did. So that would be a false negative. And that false negative rate was 7.5 percent.

[Slide text] Without verification, 7.5% false negative.
And what we did during the third part of our phase in this test was – we were testing for repeat ability and reproduce ability. We sent back answers over - after six months we sent back participants their own answers and we also gave them answers from other participants. But all those answers came back as if they were verifying somebody’s answers.

[Slide text] To test the error rate further, an independent examiner verified comparisons conducted by other examiners.
Under normal conditions we’d give them the source, latent number and basically agree, disagree or inconclusive. With a biased conditions, we’d give them the identification answer that someone identified, given the answer of a verifier. So now, it’s already been verified and now we want them to give a second verification. Having those print verified, ending out those erroneous identification to other examiners not one latent examiner under just regular verification process-not one latent examiner identified that, they caught all those errors. What actually brought the error rate – the reported error rate dropped down to zero.

[Slide text] The independent examiner caught all of the errors, dropping false positive error rate to 0%.
We maintained our regular case load, this was done in the gaps in between, after hours. The hardest part of doing this was not being dedicated researchers. That’s why it took us quite a long time to get this done. Now that it’s finally out here and we are doing things like this -- giving presentations this year. We really hope to expand on this research. The results from this study are fairly consistent with those of other studies.

[Slide text] This research project was funded by the National Institute of Justice, award no. 2010-DN-BX-K268. The goal of this research was to evaluate the reliability of the Analytics, Comparison, and Evaluation (ACE) and Analysis, Comparison, Evaluation, and Verification (ACE-V) methodologies in latent fingerprint examinations.
Produced by the Office of Justice Programs, Office of Communications. For more information contact the Office of Public Affairs at: 202-307-0703.

Friday, December 11, 2015

More on Task Relevance in Forensic Tests

Yesterday, I suggested that the National Commission on Forensic Science's views on task relevance are a significant step forward, and I elaborated on the use of conditional independence in determining which information is task relevant. The NCFS position is simple -- the examiner "should rely solely on task-relevant information when performing forensic analyses."

As the Commission explained, excluding task-irrelevant information avoids subtle but possible biases. 1/ However, what if the potentially biasing information could improve the accuracy of the analyst's conclusions? Statisticians often use biased estimators because they have greater precision -- they tend to give estimates that are closer to the true value with limited data -- even though these estimates tend to lie consistently on one side of that value. Moreover, even if the bias from the task-irrelevant information would increase the risk of an incorrect conclusion, what if it would be very costly to keep it out of the examination process? One might argue that the NCFS view is too stringent.
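The statistical point about biased estimators can be illustrated with a textbook case that is not drawn from the NCFS document: estimating the variance of a normal population from a small sample. Dividing the sum of squared deviations by n gives an estimator that is biased (it runs low on average), yet its mean squared error is smaller than that of the unbiased estimator that divides by n - 1. The simulation below is a minimal sketch; the sample size, number of trials, and seed are arbitrary choices.

```python
import random


def simulate(n=5, trials=20000, sigma=1.0, seed=0):
    """Compare two variance estimators on repeated small normal samples."""
    rng = random.Random(seed)
    true_var = sigma ** 2
    # For each estimator: [sum of estimates, sum of squared errors].
    totals = {"unbiased": [0.0, 0.0], "biased": [0.0, 0.0]}
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        sum_sq = sum((x - mean) ** 2 for x in sample)
        for name, divisor in (("unbiased", n - 1), ("biased", n)):
            estimate = sum_sq / divisor
            totals[name][0] += estimate
            totals[name][1] += (estimate - true_var) ** 2
    return {
        name: {"mean_estimate": total / trials, "mse": sq_err / trials}
        for name, (total, sq_err) in totals.items()
    }


results = simulate()
for name, summary in results.items():
    print(name, summary)
```

In runs of this kind, the biased estimator's average falls below the true variance of 1 while its mean squared error is the smaller of the two, which is the trade-off described above.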

This challenge to the simple rule of no reliance is not persuasive. First, it is rather theoretical. Reasonably cheap methods to blind analysts to biasing, task-irrelevant information are generally available. The NCFS document explains how they can work.

Second, the conclusions that are likely to be more accurate are not those that the analyst should be drawing. At least with identification evidence in the courtroom, the expert should explain the strength of the scientific evidence, leaving the conclusion as to the identity of the true source to the judge or jury to decide based on all the evidence in the case. The NCFS views document adopts this philosophy most clearly in the last sentence of the appendix, which reads "[a]ny inferences analysts might draw from the task-irrelevant information involve matters beyond their scientific expertise that are more appropriately considered by others in the justice system, such as police, prosecutors, and jurors."

But it is not just a matter of relative expertise that should limit the analyst to task-relevant information. Forensic scientists are supposed to be conveying scientific information, and if the putative scientific judgment comes from a mixture of scientific and other information, the judge or jury cannot properly evaluate its weight without knowing what is the scientific part and what is some other part.

Information contamination also makes it difficult to discern the validity of scientific tests. Consider hair-morphology evidence. I have presented the Houck-Budowle study of the correspondence between microscopic hair examinations and mitochondrial DNA tests as evidence that the former has some modest probative value (as measured by the likelihood ratio for positive associations). 2/ But inasmuch as the examiners were not blinded to task-irrelevant information, it is hard to tell from this one study how much of the probative value comes from the features of the hair and how much comes from other information that the hair examiners might have considered.

Studies of polygraphic lie detection offer another example. The technique sounds scientific, and the graphs of physiological responses look technical. But if the examiners' conclusions used in a validation study are influenced by impressions of the subject, the study does not reveal the diagnostic value of just the information in the tracings -- the impact of that information and the subjective impressions are confounded. (This problem can be avoided by computerized scoring of the data.)

As the NCFS appendix emphasizes, the task-irrelevant information "does not help the analyst draw conclusions from the physical evidence that has been designated for examination through correct application of an accepted analytic method." At the risk of oversimplifying a complex subject, the message is that forensic scientists should stick to the scientific information.

Even this precept is not a complete response to concerns about bias. What if the task-relevant information also poses a serious risk of bias? If the contribution to the scientific analysis is minor and the risk of distortion is great, should not the examiner be blinded to this concededly task-relevant information? NCFS expressed no view on this situation. Perhaps it never arises, but if it does, standard-setting organizations should deal with it.

Notes

1. The NCFS observes that "there are risks entailed in exposing examiners unnecessarily to task-irrelevant information." But if the information is truly task-irrelevant, why would it be necessary? And if such information exists, would not the same risk of biasing the analysis be present?
2. David H. Kaye, Ultracrepidarianism in Forensic Science: The Hair Evidence Debacle, 72 Wash. & Lee L. Rev. Online 227 (2015); Disentangling Two Issues in the Hair Evidence Debacle, Forensic Sci., Stat. & L., Aug. 22, 2015, http://for-sci-law.blogspot.com/2015/08/disentangling-two-issues-in-hair.html.

Thursday, December 10, 2015

Blinding Forensic Analysts to Task-irrelevant Information: A National Commission (NCFS) Speaks Out

This week, the National Commission on Forensic Science (NCFS) approved a “views document” entitled Ensuring That Forensic Analysis Is Based Upon Task-Relevant Information. 1/ If these views are translated into practice, it will be a major step forward in making sure forensic science findings are based on scientific data and not on extraneous information. The document is thus cause for celebration.

Here, I describe how the document defines task-relevance. I identify an arguable inconsistency in the Commission’s terminology and elaborate on the use of what is known in probability theory as conditional independence.

The NCFS’ views are these:

FSSPs [Forensic Science Service Providers] should rely solely on task-relevant information when performing forensic analyses.
The standards and guidelines for forensic practice being developed by the Organization of Scientific Area Committees (OSAC) should specify what types of information are task-relevant and task-irrelevant for common forensic tasks.
Forensic laboratories should take appropriate steps to avoid exposing analysts to task-irrelevant information through the use of context management procedures detailed in written policies and protocols.

The analysis and explication that follows this enumeration tries to define task-relevance both in words and in symbols involving conditional probabilities. The NCFS definition is in two parts:

(1) [I]nformation is task-relevant for analytic tasks if it is necessary for drawing conclusions: (i.) about the propositions in question, (ii.) from the physical evidence that has been designated for examination, (iii.) through the correct application of an accepted analytic method by a competent analyst.
(2) Information is task-irrelevant if it is not necessary for drawing conclusions about the propositions in question, if it assists only in drawing conclusions from something other than the physical evidence designated for examination, or if it assists only in drawing conclusions by some means other than an appropriate analytic method.

Taken literally, this formulation seems to dismiss as task-irrelevant information that could help the analyst assess the strength of the evidence yet is not necessary for drawing conclusions about the propositions. For example, suppose the proposition P in question is whether a trace sample that has both clear and ambiguous features came from a suspect. Viewing a tape of someone who looks like (and thus might be) the suspect leaving the mark is task-irrelevant under (2). No analyst needs to view the tape to compare the questioned mark to a known exemplar. The analyst can reach a conclusion of some sort without the video.

Nevertheless, viewing the tape could help the analyst doing a side-by-side comparison resolve the ambiguities in the features in the mark and thereby “assess the strength of the inferential connection between the physical evidence being examined and the propositions the analyst is evaluating.” For example, if the mark is a distorted fingerprint, observing how it was deposited might help the analyst. It seems as if the tape should be declared task-relevant, but (1) requires that it be necessary to the analysis. Strictly speaking, it is not.

Indeed, a few sentences later, the document states that information “is task-relevant if it helps the analyst assess the strength of the inferential connection between the physical evidence being examined and the propositions the analyst is evaluating.” Not everything that is helpful is necessary.

That the Commission did not really mean to require necessity also can be gleaned from the “more formal definition of task-relevance and task-irrelevance ... in the technical appendix.” For “two mutually exclusive propositions P and NP that a forensic science service provider (FSSP) is asked to evaluate,” and for E defined as “the features or characteristics of the physical evidence that has been designated for examination,”

(1) information is task-relevant if it has the potential to assist the examiner in evaluating either the conditional probability of E under P—which can be written p(E|P)—or the conditional probability of E under NP—which can be written p(E|NP);
(2) information is task-irrelevant if it has no bearing on the conditional probabilities p(E|P) or p(E|NP).

Again, necessity is not crucial: The phrase “has the potential to assist” has been substituted for “is necessary,” and “has no bearing” has replaced “is not necessary.”

Technical definitions (1) and (2) also depart from (or refine) the main definitions (1) and (2) in that the only “conclusions” that can be considered in judging task-relevance are conditional probabilities for “features” given certain propositions. These conditional probabilities often are called “likelihoods” to distinguish them from the posterior probabilities of the propositions given the features. Traditionally, analysts testified about posterior probabilities expressed qualitatively or categorically. For example, the statement P that “Jane Doe’s thumb is the source of the latent print” is a categorical conclusion meaning that the posterior probability Pr(P|E) is close to 1.

Using likelihoods could be valuable in clarifying task-relevance, but the concepts of “bearing” and “potential to assist” remain undefined. It would seem that the NCFS intends to equate task-irrelevance with conditional independence. Let I denote the information that might be task-irrelevant. E and I are conditionally independent given some proposition R if and only if (iff) Pr(E&I|R) = Pr(E|R) Pr(I|R). An equivalent definition (whenever Pr(I|R) > 0) looks to whether Pr(E|I&R) = Pr(E|R). The idea is that once R is known, knowing I brings no additional information about E.
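The equivalence of the two formulations follows from the definition of conditional probability. The short derivation below is only a sketch, and it assumes Pr(I|R) > 0, a condition the NCFS document does not state:

```latex
\begin{align*}
\Pr(E \mid I \,\&\, R)
  &= \frac{\Pr(E \,\&\, I \mid R)}{\Pr(I \mid R)}
  && \text{definition of conditional probability, } \Pr(I \mid R) > 0 \\
  &= \frac{\Pr(E \mid R)\,\Pr(I \mid R)}{\Pr(I \mid R)}
  && \text{conditional independence of } E \text{ and } I \text{ given } R \\
  &= \Pr(E \mid R).
\end{align*}
```

Reading the chain in the other direction (multiplying both sides of Pr(E|I&R) = Pr(E|R) by Pr(I|R)) recovers the product form.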

If we take conditional independence to be the NCFS’ technical definition of task-irrelevance, and we use R to stand for either P or NP, then we can rewrite the NCFS definitions as

(3) I is task-relevant iff Pr(E|I&R) ≠ Pr(E|R);
(4) I is task-irrelevant iff Pr(E|I&R) = Pr(E|R).
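To make definitions (3) and (4) concrete, here is a minimal Python sketch that checks them on a toy joint distribution of E and I given a single proposition R. The function name, the table layout, and every number are hypothetical and chosen only for illustration; none of them comes from the NCFS document.

```python
from math import isclose

def is_task_irrelevant(joint, tol=1e-9):
    """Check definition (4) on a toy table.

    `joint` maps (e, i) pairs to Pr(E=e & I=i | R) for one fixed
    proposition R. Returns True when Pr(E=e | I=i & R) equals
    Pr(E=e | R) for every e and every i with Pr(I=i | R) > 0,
    i.e. when E and I are conditionally independent given R.
    """
    e_values = {e for e, _ in joint}
    i_values = {i for _, i in joint}
    # Marginals Pr(E=e | R) and Pr(I=i | R).
    pr_e = {e: sum(joint.get((e, i), 0.0) for i in i_values) for e in e_values}
    pr_i = {i: sum(joint.get((e, i), 0.0) for e in e_values) for i in i_values}
    for i in i_values:
        if pr_i[i] == 0:
            continue  # conditioning on a zero-probability event is undefined
        for e in e_values:
            pr_e_given_i = joint.get((e, i), 0.0) / pr_i[i]
            if not isclose(pr_e_given_i, pr_e[e], abs_tol=tol):
                return False  # definition (3): I is task-relevant
    return True

# Hypothetical numbers: E is "match" or "differ", I is "report" or "no report".
independent = {
    ("match", "report"): 0.06, ("match", "no report"): 0.24,
    ("differ", "report"): 0.14, ("differ", "no report"): 0.56,
}
dependent = {
    ("match", "report"): 0.15, ("match", "no report"): 0.15,
    ("differ", "report"): 0.05, ("differ", "no report"): 0.65,
}

print(is_task_irrelevant(independent))  # True
print(is_task_irrelevant(dependent))    # False
```

Because R stands for either P or NP, a full check under this reading would run the function twice, once on the table for each proposition, and treat I as task-irrelevant only if both runs return True.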

This more precise definition is easier to write than to apply. The “physical evidence” itself — bits of soil, specimens of handwriting, latent and rolled fingerprints, and so on — is not “E” in the probability function. Instead, E is “the features or characteristics of the physical evidence.” But are these the actual features or the declared features?

I think the formal definition works better when E refers to the true features (although the judge or jury only knows what the expert thinks they are). First, let’s look at an easy case. Suppose that E refers to the DNA alleles present at each locus in the suspect’s DNA (A0) and the profile in the crime-scene DNA (A1). Thus, E = A0 & A1. P means that the suspect is the source of both DNA samples; let Q mean that someone else is the source of the crime-scene sample. I is a credible report that the suspect was near the crime scene just after the crime occurred. Finally, suppose that A0 and A1 are the same: the DNA in both samples has the same true features.

If P is true, then the samples must have the same features, so Pr(E|P) = Pr(E|I&P) = 1. If Q is true, then whether the samples have the same features also does not depend on I: if someone else left the DNA, the suspect’s propinquity does not affect the alleles that the true contributor possesses and left at the crime scene. Consequently, under (4), I is task-irrelevant, just as it should be.

Now, let’s make it more complicated. The laboratory is asked to assess whether a suspect’s “touch” DNA is present on a gun used in a killing. Several small peaks in the electropherogram are at the positions one would expect if this were the case, but they are at the limit of detectability. Some analysts would treat them as real (true peaks), but others would see them as spurious. The question is whether the analyst should be able to know the profile reported for the suspect — let’s call it r[A0] — before ascertaining the profile A1 in the crime-scene sample. Is I = r[A0] task-relevant to the determination of A1?

Some analysts might argue that I is task-relevant because knowing what is in the suspect’s DNA helps them understand what really is in the crime-scene DNA. They could say that the fact that the small peaks in the crime-scene sample are located at just the same places as their larger homologs in the suspect’s sample helps them resolve the ambiguity arising from the small peak heights. Of course, if they are thoughtful, they also will recognize that the related information I could bias them, and they might well agree that they should not be exposed to it because it does not contribute enough to the accuracy of their determinations. But are they wrong in their claim that I is task-relevant (applying the NCFS definition)?

The views document does not give an explicit answer. It concludes with the observation that

[Task-irrelevant information] might help the analyst draw conclusions about the propositions, but it does not help the analyst draw conclusions from the physical evidence that has been designated for examination through correct application of an accepted analytic method. Any inferences analysts might draw from the task-irrelevant information involve matters beyond their scientific expertise that are more appropriately considered by others in the justice system, such as police, prosecutors, and jurors.

This relative-expertise criterion, however, does not quite define task-irrelevance. The inference that a small peak in an electropherogram is the result of chemiluminescence from alleles as opposed to an artifact or background noise may be difficult to make correctly, but it is not clear that it lies more squarely within the expertise of police, prosecutors, and jurors than of DNA analysts. 2/

The formal definition can help us out here. If P is true, then regardless of what the peaks look like and irrespective of the suspect’s reported profile r[A0], the true profile of the crime-scene sample is A1 = A0. Thus, Pr(E=A1|P) = Pr(A1|P&I) = Pr(A1|P&r[A0]) = 1. Likewise, if someone else’s DNA is on the gun instead of the suspect’s, then the probability that the profiles match also is unrelated to a report of what is in the suspect’s DNA sample. Once again, the conditional-independence definition of task-irrelevance seems to work. Sometimes probability notation is purely window dressing, but the approach begun in the technical appendix might do some useful work in spotting task-irrelevant information. 3/

Notes

1/ The document should appear on the Commission’s web page http://www.justice.gov/ncfs/work-products-adopted-commission in the near future. The principal drafter of the document was Bill Thompson, who is the chair of the Human Factors Resource Committee of the NIST Organization of Scientific Area Committees (OSACs) that is developing standards for forensic science.
2/ One can argue that looking at the suspect's profile or peaks before resolving ambiguities in the crime-scene profile is not a "correct application of an accepted analytic method." However, given that the task is to ascertain and compare the two DNA samples, it seems odd to call this information about the profiles "irrelevant" as opposed to improper or not acceptable. And, if there were no standard in place rejecting this practice (as was true for a period of time), this criterion would not render the information task-irrelevant.
3/ This is so even though the likelihoods involved are not necessarily the ones that determine the probative value of the forensic analysis with respect to the two competing hypotheses P and Q. Those likelihoods are Pr(E*|P) and Pr(E*|Q). The asterisk is attached because we do not know the true features E in the samples. We have data E* on them (fallible measurements or observations of them). The probative value of E* (or, if you like, of the analysis that generates these data) is the likelihood ratio Pr(E*|P) / Pr(E*|Q). For example, even though we can speak of the likelihood ratio for the true DNA profiles (A1 & A0) for present purposes, the court’s evidence is the reported profiles: E* = r[A1 & A0].

Sunday, December 6, 2015

Hair Evidence in the “Clearly Not Exonerated” Exoneration of Mark Reid

On November 3, the fictional forensic scientist in the world’s most watched television drama was aghast that she had once performed microscopic hair comparisons. Having learned that such comparisons are entirely discredited, NCIS’s Abby Sciuto is horrified: “Can you imagine if I messed up, what that really means? It means that innocent people went to jail because of me, because of my mistakes.” As a writer for Entertainment Weekly put it, “Abby’s spinning out of control, locked in her lab and reexamining every single case she’s ever touched.” 1/

The “16 Years” episode is fiction, but real people have gone to jail for longer than that (and some could have been executed) because of mistakes by examiners. One disturbing hair-comparison case is State v. Reid. 2/ I have cited the Connecticut Supreme Court's opinion in two publications that survey different ways to testify about the implications of similarities between trace evidence and samples from known sources (such as the defendant). 3/ In doing so, I was not expressing the slightest agreement with the supreme court’s reasoning or arguing that the court was correct to hold that the trial judge properly admitted the testimony. But the case does illustrate how a careful criminalist called upon to testify could proceed in the face of gaping scientific uncertainty about the significance of similarities in the trace material and the known samples. It also illustrates the different sorts of errors that can occur in ascertaining hair morphology and drawing inferences from it as well as the different types of exonerations that can occur with the benefit of DNA testing.

I. The Criminalist Gives “Features Only” Testimony in Reid

It seems hard to deny that various physical features of hair display at least some variation within a population. But without extensive population data that might permit at least rough estimates of the relative frequencies of the features, and without employing methods that have demonstrated reliability in measuring the features of interest, it is not clear how, or even if, this information should be used in trials.

One possibility is to limit the testimony to a presentation of the observed features (and perhaps a characterization of the samples' features as similar or different, as the case may be). According to the Connecticut Supreme Court, the analyst in Reid pursued this “features only” approach:

[He] displayed an enlarged photograph of one of the defendant's hairs and one of the hairs recovered from the victim's clothing as they appeared side-by-side under the comparison microscope. [He] explained to the jurors how the hairs were similar and what particular features of the hairs were visible. He also drew a diagram of a hair on a courtroom blackboard for the jurors. The jurors were free to make their own determinations as to the weight they would accord the expert's testimony in the light of the photograph and their own powers of observation and comparison.

The trial court had held a pretrial hearing to decide whether this testimony satisfied the preliminary showing of scientific validity normally required of all suitably challenged scientific evidence. The court found that it did, but the supreme court did not rely on or discuss either the scientific validity or the general scientific acceptance of visual hair comparisons. It avoided the issue by holding that the testimony did not have to satisfy such standards — because it was not “scientific evidence” at all. Rather, the expert “testified about a subject that simply required the jurors to use their own powers of observation and comparison.”

Three years later, in 2003, the Superior Court granted a petition for a new trial. 4/ Its opinion casts doubt on the no-science theory. Elaborating on the supreme court’s description of the testimony, this court observed that at the trial, the expert “indicated that hair comparison analysis ... is generally accepted as reliable within the field of forensic science” and “that he could state, ‘to a reasonable degree of scientific certainty,’ that the pubic hairs found on the victim's clothing were microscopically similar to those pubic hair samples taken from Mark Reid.” On such a record, the supreme court’s conclusion that the usual standards for scientific evidence are beside the point is hard to swallow.

II. The Superior Court Orders a New Trial While Insisting that the DNA Evidence Does Not Exonerate Reid

In any event, at Reid’s trial

Mr. Settachatgul testified that the three hairs recovered from the victim's clothing were pubic hairs. These hairs were rootless, indicating that they were shed, not plucked; one was found on the victim's jeans, another on a sock, and another on her lower undergarment (panty). ... Based on the microscopic analysis, Mr. Settachatgul's conclusion was that the three rootless hairs recovered from M.'s clothing were Negroid pubic hairs which had similar characteristics to the pubic hairs supplied by petitioner.

As the State’s Attorney explained in her summation, “the hairs ... appeared to be the same color, both had an abundance of fuci, and both exhibited the shadow of twisting, indicative of pubic hair.” However, she acknowledged that “statistics are not done in the comparison field, [and] the only conclusion that can be drawn is similar or dissimilar characteristics, not the percentage of the population which shares those hair characteristics.” She told the jury “this is not conclusive evidence. I agree with the defense to that degree. It is not conclusive. But it is supportive of the victim's I.D.”

This support collapsed when postconviction mitochondrial DNA testing established that the three pubic hairs came from the same individual or from individuals in the same maternal lineage. Critically, Reid was not in that maternal line, while the victim, a white woman, was. This left two major possibilities: either the unknown rapist was the source of the three hairs or the victim was. The former scenario totally exculpates Reid; the latter renders the “supportive” expert testimony inconclusive.

Reid argued that the mtDNA test proved his actual innocence — that the criminalist’s determination of race had to be true, that the victim was white, and therefore the true rapist must have been some other black man who deposited three hairs on the victim’s clothes. The court did not buy this argument. And for good reason. Because the mtDNA testing showed that the victim’s mtDNA sequences matched those of the three hairs, the most plausible conclusion is that the criminalist erred in finding “Negroid pubic hairs.” The hairs probably were the victim’s rather than any rapist’s.

Thus, nothing was left to connect Reid to the rape except the victim’s identification of him. The Superior Court concluded:

This is a close, difficult case. The new mtDNA evidence merely excludes petitioner as the depositor of the unknown hairs; it clearly does not exonerate him. And, as stated, the victim, M., was certain and steadfast in her identification of Mark Reid, and the circumstances surrounding that identification support its reliability, at least when viewed absent the newly discovered mtDNA evidence.

There are reasons to question this rosy picture of the eyewitness testimony, but whatever one thinks of that identification, the Superior Court found that Reid was entitled to a new trial at which he could use the mtDNA evidence to devastate a major part of the state’s case — the hair testimony.

III. The End of the Story Leaves Questions Hanging

In the end, there was no retrial. According to the University of Michigan Law School’s National Registry of Exonerations, the state dismissed the charges “after the victim declined to participate ... . Reid, who had other felony convictions, was deported to his native Jamaica. In 2004, Reid filed a lawsuit seeking $2 million in damages from East Hartford.” The Registry does not report the outcome of that action. 5/

Whether or not all microscopic hair testimony is scientifically invalid because the comparisons have not been shown to be scientifically reliable, there is agreement that criminalists frequently have erred by using the similarity between hairs to make strong or quantified statements about the source of the trace hairs. 6/ This type of overclaiming apparently did not occur in Reid. In granting a new trial, the Superior Court emphasized that “Mr. Settachatgul was testifying only to the very ‘narrow opinion’ that the three pubic hairs recovered from the victim's clothing were similar to the samples obtained from the defendant, and, that he, Settachatgul could not say that the questioned specimens were the pubic hairs of petitioner/defendant.”

This testimony would not be judged as scientifically invalid under the FBI’s guidelines for reviewing microscopic-hair-comparison testimony. 7/ Likewise, the prosecutor’s summation was not infected with the sort of egregious overstatements, such as “There is one chance, perhaps for all we know, in 10 million that it could [be] someone else’s hair,” heard in other cases. 8/

The hair analyst may have erred in concluding that the features were similar. But mtDNA testing cannot tell us that. The small DNA molecules in the mitochondria do not relate to hair morphology. They provide a complementary — and more specific — test for identity. Sequence differences can exclude suspects when the inherently less discriminating visible features cannot. Standing alone, this limitation does not make microscopic hair analysis scientifically invalid, and it does not mean that the analyst misjudged the visual features here. But it does underscore the need to estimate the likelihoods or conditional error rates for microscopic hair comparisons. Without this information, how can anyone know what weight to give to the criminalist’s findings of similar hairs?

Finally, given the mitochondrial results for the victim and the pubic hairs in Reid, the hair analyst probably erred in concluding that the three pubic hairs were of “Negroid origin.” As noted earlier, the woman who was attacked in Reid was white, and the simplest conclusion is that the hairs on her clothes were hers, as the mtDNA sequences suggest. Furthermore, the unqualified assurance as to the racial origin of the hairs was unjustified — even if the classification turned out to be correct. Skimming a few forensic science textbooks, I can find no reference to publications in the scientific literature to support the position that hair analysts can make firm determinations of biogeographic ancestry. Caution is usually advised. 9/

Notes

1/ Sara Netzley, “16 Years,” Entertainment Weekly, http://www.ew.com/recap/ncis-season-13-episode-7/2
2/ 757 A.2d 482 (Conn. 2000). The case is noted in J.M. Taupin, Forensic Hair Morphology Comparison – A Dying Art or Junk Science?, 44 Sci. & Justice 95 (2004).
3/ David H. Kaye, David E. Bernstein & Jennifer L. Mnookin, The New Wigmore: A Treatise on Evidence: Expert Evidence (2d ed. 2011); David H. Kaye, Presenting Forensic Identification Findings: The Current Situation, in Communicating the Results of Forensic Science Examinations 12–30 (C. Neumann et al. eds. 2015) (Final Technical Report for NIST Award 70NANB12H014).
4/ Reid v. State, No. CV020818851, 2003 WL 21235422 (Ct. Super. Ct. May 14, 2003).
5/ Maurice Possley, Mark Reid, The National Registry of Exonerations.
6/ David H. Kaye, Ultracrepidarianism in Forensic Science: The Hair Evidence Debacle, 72 Wash. & Lee L. Rev. Online 227 (2015).
7/ Id.
8/ Spencer S. Hsu, Santae Tribble Cleared in 1978 Murder Based on DNA Hair Test, Dec. 14, 2012 (quoting from federal prosecutor David Stanley’s closing argument).
9/ Max M. Houck & Jay A. Siegel, Fundamentals of Forensic Science 303 (2015) (“Estimating the ethnicity or ancestry of an individual from his or her hairs is just that: an estimate.”); Richard Saferstein, Forensic Science: An Introduction 419 (2d ed. 2011) (“all of these observations are general, with many possible exceptions. The criminalist must approach the determination of race from hair with caution and a good deal of experience.”).

Acknowledgment: Thanks to Chris Fabricant for thoughts on State v. Reid and for pointing me to the full history of the case.

Thursday, November 26, 2015

Cell Phones, Brain Cancer, and Scientific Outliers Are Not the Best Reasons to Abandon Frye v. United States

Two days ago, the District of Columbia Court of Appeals (the District’s highest court) heard oral argument 1/ on whether to discard the very test that its predecessor introduced into the law of evidence in the celebrated — and castigated — case of Frye v. United States. 2/ That was 1923, and the evidence in question was a psychologist’s opinion that a systolic blood pressure test showed that James Alphonso Frye was telling the truth when he recanted his confession to a notorious murder in the District. With nary a citation to any previous case, the Court of Appeals famously wrote that

[W]hile courts will go a long way in admitting expert testimony deduced from a well-recognized scientific principle or discovery, the thing from which the deduction is made must be sufficiently established to have gained general acceptance in the particular field in which it belongs. 3/

Now it is 2015, the case is Murray v. Motorola, Inc., 4/ and the proffered evidence is expert testimony that cell phones cause (or raise the risk of) brain cancer. The methods used to form or support this opinion or related ones range from what the court calls “WOE” (the expert says, I thoroughly assessed the “weight of evidence”), to “PDM” (I considered the evidence of causation pragmatically, with the “Pragmatic Dialog Method”), to “a literature review” (I read everything I could find on the subject), to “laboratory experiments” (I conducted in vitro exposure of cells, with results that may not have been replicated), and to “experience as a toxicologist and pharmacologist” to show that “it is generally accepted to extrapolate findings from in vitro studies in human and mammalian cells to predict health effects in humans.”

The trial judge, Frederick H. Weisberg, ruled much of this testimony admissible on the theory that regardless of the extent to which the conclusions are within the mainstream of scientific thinking, the “methods” behind them were generally accepted in ascertaining carcinogenicity. He chastised the defense for “repeatedly challeng[ing] plaintiffs' experts on the ground that their conclusions and opinions are not generally accepted.” As he construed Frye, “[e]ven if 99 out of 100 scientists come out on one side of the causation inference, and only one comes out on the other, as long as the one used a ‘generally accepted methodology,’ Frye allows the lone expert to testify for one party and one of the other ninety-nine to testify for the opposing party.” Having placed himself in this box, Judge Weisberg asked the Court of Appeals to let him out, writing that “most, if not all, of Plaintiffs' experts would probably be excluded under the Rule 702/Daubert standard based on the present record” and granting the defendants' request to allow them to appeal his ruling immediately.

Defendants then convinced the Court of Appeals to jump in. Normally, the appellate court would review only the final judgment entered after a trial. In Murray, it granted an interlocutory appeal on the evidentiary ruling. Not only that, but it agreed to sit en banc, with all nine judges participating, rather than acting through a normal panel of three randomly selected judges.

The question before the en banc court is thus framed as whether to replace the jurisdiction’s venerable Frye standard with the approach sketched in Daubert v. Merrell Dow Pharmaceuticals. 5/ Daubert changes the focus of the judicial inquiry from whether a theory or technique is generally accepted to whether it is scientifically valid (see Box: What Daubert Did).

But does Frye really require Judge Weisberg to accept evidence that Daubert excludes in this case? The case, I shall argue, is not about Daubert versus Frye. It is about methodology versus conclusion. The judge's construction of Frye as sharply confined to “methodology” is what makes it impossible for him to reject as inadmissible the theory that cell phones cause brain cancers even if it is plainly not accepted among knowledgeable scientists. And that is just as much a problem under Daubert as it is under Frye. Daubert specifically states that the subject of the inquiry “is the scientific validity ... of the principles that underlie a proposed submission. The focus, of course, must be solely on principles and methodology, not on the conclusions that they generate.” 6/ Judge Weisberg decided that principles or putative methodologies like WOE, PDM, literature review, extrapolation from in vitro experiments, and experience are all generally accepted among scientists as a basis for inferring carcinogenicity. But if this is correct, and if it insulates claims of general causation from scrutiny for general acceptance under Frye, then it does the same under Daubert (as originally formulated). 7/ Surely, weighing all the relevant data, being pragmatic, studying the literature, considering experiments, and using experience is what scientists everywhere do. They do it not out of habit, but because these things tend to lead to more correct conclusions (and less criticism from colleagues) than the alternatives of not weighing all the data, being doctrinaire, ignoring the literature, and so on.

WHAT DAUBERT DID

In Daubert, the U.S. Supreme Court did not rule that Frye was antiquated or not up to the job of screening out dangerous and dubious scientific evidence. Rather, the Court reasoned that Congress, in adopting the Federal Rules of Evidence in 1975, had implicitly dropped a strict requirement of general acceptance. The Court then read Federal Rule 702 as requiring scientific evidence to be, well, “scientific,” as determined by district courts that could look to various hallmarks of scientifically warranted theories. One important criterion, the Court observed, was general acceptance. But such acceptance was no longer dispositive. It was only an indicator of the scientific validity that courts had to find in order to admit suitably challenged scientific evidence.

A majority of U.S. jurisdictions (41 according to the trial court order in Murray), either by legislation or judicial decision, follow the Daubert approach for filtering out unvalidated or invalid scientific evidence (although they still place great weight on the presence or absence of general acceptance in the relevant scientific community). At least one state, Massachusetts, still clings to Frye while embracing Daubert.

The problem with the toxic tort cases like Murray is that the line between “method” and “conclusion” is difficult to draw, and Judge Weisberg draws it in the wrong place. Although his opinion cites to (the first edition of) Wigmore on Evidence: Expert Evidence, it ignores the warning (in § 6.3.3(a)(1) of the second edition and § 5.2.3 of the first edition) that

Occasionally, however, courts define the theory or method at so high a level of abstraction that all kinds of generally applicable findings can be admitted without attending to whether the scientific community accepts them as well founded. For example, in Ibn-Tamas v. United States, [407 A.2d 626 (D.C. 1979),] the District of Columbia Court of Appeals reasoned that a psychologist's theory of the existence and development of various characteristics of battered women need not be generally accepted because an overarching, generally accepted methodology — clinical experience — was used to study the phenomenon. The problem, of course, is that such reasoning could be used to obviate heightened scrutiny for virtually any scientific development [citing, among other cases, Commonwealth v. Cifizzari, 492 N.E.2d 357, 364 (Mass. 1986) (“to admit bite mark evidence, including an expert opinion that no two people have the same bite mark, a foundation need not be laid that such identification technique has gained acceptance in the scientific community. What must be established is the reliability of the procedures involved, such as X-rays, models, and photographs.”)]. Indeed, in developing the lie-detection procedure used in Frye, Marston applied generally accepted techniques of experimental psychology to test his theory and equipment. Thus, an exclusively “high-level” interpretation of Frye is untenable. 8/

The opinion in Murray also overlooks the more extended analysis in Wigmore of why causation opinions in toxic tort cases should be considered theory rather than conclusions within the meaning of Frye. 9/ It would make no sense to ask whether psychologists generally accept the proposition that Marston correctly measured the defendant's blood pressure or correctly applied some formula or threshold that indicated deception. Such case-specific facts do not appear before any general scientific community for scrutiny. On the other hand, whether elevated blood pressure is associated with deception, how it can be measured, and whether a formula or threshold can justify concluding that the defendant is deceptive or truthful are trans-case propositions that should be part of normal scientific discourse.

The same is true of claims of carcinogenicity. Claims that cell phones can cause brain cancer at various levels of exposure are trans-case propositions that stimulate scientific dialog. The Frye test can function just as well (or as poorly) in vetting expert opinions that exposure can cause cancer as in screening a psychologist's opinion that deception can cause a detectable spike in blood pressure. In sum, denominating trans-case conclusions that have been or could be the subject of scientific investigation and controversy as "conclusions" that are beyond the reach of either Frye or Daubert is a category mistake.

There is another way to make this point. Given all the usual reasons to subject scientific evidence to stricter-than-normal scrutiny, courts in Frye jurisdictions need to consider whether it is generally accepted that the body of scientifically validated findings on which the expert relies is sufficient to justify, as scientifically reasonable, the trans-case conclusion. Thus, the Ninth Circuit Court of Appeals in Daubert originally reasoned, on the basis of Frye, that in the absence of some published, peer-reviewed epidemiological study showing a statistically significant association, the causal theories (whether they are labelled general premises or specific conclusions) of plaintiffs’ expert were inadmissible. The court determined that the body of research, namely, “the available animal and chemical studies, together with plaintiffs' expert reanalysis of epidemiological studies, provide insufficient foundation to allow admission of expert testimony to the effect that Bendectin caused plaintiffs' injuries.” 10/ It was appropriate, indeed necessary, to consider all the “available ... studies,” but under Frye, there still had to be general acceptance of the proposition that drawing an inference of causation from such studies was scientifically valid. Gussying up the inferential process as a WOE analysis (or anything else) cannot alter this requirement.

Whether or not the Court of Appeals switches to Daubert, it should correct the trial court's blanket refusal to consider whether the theory that cellphones ever cause brain cancer at relevant exposure levels is generally accepted. General acceptance may not be determinative under Daubert, but it remains important. Whether the inquiry into this factor is compelled and conclusive under Frye or inevitable and influential under Daubert, it should not be skewed by a misconception of the scope of that inquiry. In the end, the courts in Murray should realize that

the choice between the general-acceptance and the relevancy-plus standards may be less important than the copious quantities of ink that courts and commentators have spilled over the issue would indicate. [O]ne approach is not inherently more lenient than the other—the outcomes depend more on how rigorously the standards are applied than on how the form of strict scrutiny is phrased. 11/

Notes

1. Ann E. Marimow, D.C. Court Considers How To Screen Out ‘Bad Science’ in Local Trials, Wash. Post, Nov. 24, 2015.
2. 293 F. 1013 (D.C. Cir. 1923).
3. Id. at 1014.
4. No. 2001 CA 008479 B (D.C. Super. Ct.), available at http://apps.washingtonpost.com/g/page/local/dc-court-of-appeals-notice-of-appeal/1889/
5. 509 U.S. 579 (1993).
6. Id. at 594–95 (emphasis added).
7. In General Electric Co. v. Joiner, 522 U.S. 136 (1997), the Supreme Court blurred the distinction between methodology and conclusion, and Congress later amended Rule 702 to incorporate this shift. The result is that in federal courts, it is less important to draw a better line than the one in Murray and Ibn-Tamas. See David H. Kaye, David A. Bernstein, and Jennifer L. Mnookin, The New Wigmore: A Treatise on Evidence: Expert Evidence § 9.2.2 (2d ed. 2011).
8. Id. § 6.3.3(a)(1).
9. Id. § 9.2.3(b).
10. Daubert v. Merrell Dow Pharms., Inc., 951 F.2d 1128, 1131 (9th Cir. 1991).
11. Kaye et al., supra note 7, § 7.2.4(a).

Postscript: A more extensive version of these comments appears in the Bloomberg BNA Product Safety & Liability Reporter (14 Dec. 2015) and in the Expert Evidence Reporter (21 Dec. 2015).

Tuesday, November 24, 2015

Public Comment Period for Seven National Commission on Forensic Science Work Products To Close on 12/22

Public Service Announcement

The comment period for seven National Commission on Forensic Science work products will close on 12/22/15. The documents can be found, and comments can be left, at this location on Regulations.gov. The documents that are the most interesting (to me, at least) are as follows:

Directive Recommendation on the National Code of Professional Responsibility DOJ-LA-2015-0009-0002 (calls on the Attorney General to require forensic science service providers within the Department of Justice to adopt and enforce an enumerated 16-point “National Code of Professional Responsibility for Forensic Science and Forensic Medicine Service Providers”; to have someone define “steps ... to address violations”; and to “strongly urge” other groups to adopt the code)
Views Document on Establishing the Foundational Literature Within the Forensic Science Disciplines DOJ-LA-2015-0009-000 (asks for unspecified people or organizations to prepare “documentation” or “compilation” of “the literature that supports the underlying scientific foundation for each forensic discipline” “under stringent review criteria” and, apparently, for courts to rely on these compilations in responding to objections to admitting forensic science evidence)
Views Document on Using the Term Reasonable Degree of Scientific Certainty DOJ-LA-2015-0009-0008 (“legal professionals should not require that forensic discipline testimony be admitted conditioned upon the expert witness testifying that a conclusion is held to a ‘reasonable scientific certainty,’ a ‘reasonable degree of scientific certainty,’ or a ‘reasonable degree of [discipline] certainty’ [because] [s]uch terms have no scientific meaning and may mislead factfinders ... . Forensic science service providers should not endorse or promote the use of this terminology.”)
Views Document on Proficiency Testing in Forensic Science DOJ-LA-2015-0009-0007 (“As a recognized quality control tool, it is the view of the Commission that proficiency testing should ... be implemented [not only by accredited forensic science service providers, but also] by nonaccredited FSSPs in disciplines where proficiency tests are available from external organizations”)

I won’t discuss the substance of these documents here, but I can't help noting that the Commission lacks a professional copy editor. The dangling modifier in the sentence on proficiency testing is a sign of the absence of this quality control tool for writing.

Saturday, November 21, 2015

Latent Fingerprint Identification in Flux?

Two recent articles suggest that seeds of change are taking root in the field of latent fingerprint identification.

I. The Emerging Paradigm Shift in the Epistemology of Fingerprint Conclusions

In The Emerging Paradigm Shift in the Epistemology of Fingerprint Conclusions, the chief of the latent print branch of the U.S. Army Criminal Investigation Laboratory, Henry J. Swofford, writes of “a shift away from categoric conclusions having statements of absolute certainty, zero error rate, and the exclusion of all individuals to a more modest and defensible framework integrating empirical data for the evaluation and articulation of fingerprint evidence.” Mr. Swofford credits Christophe Champod and Ian Evett with initiating “a fingerprint revolution” by means of a 2001 “commentary, which at the time many considered a radical approach for evaluating, interpreting, and articulating fingerprint examination conclusions.” He describes the intense resistance this paper received in the latent print community and adds a mea culpa:

Throughout the years following the proposition of this new paradigm by Champod and Evett, the fingerprint community continued to respond with typical rhetoric citing the historical significance and longstanding acceptance by court systems, contending that the legal system is a validating authority on the science, as the basis to its reliability. Even the author of this commentary, after undergoing the traditional and widely accepted training at the time as a fingerprint practitioner, defensively responded to critiques of the discipline without fully considering, understanding, or appreciating the constructive benefits of such suggestions [citing Swofford (2012)]. Touting 100% certainty and zero error rates throughout this time, the fingerprint community largely attributed the cause of errors to be the incompetence of the individual analyst and failure to properly execute the examination methodology. Such attitudes not only stifled potential progress by limiting the ability to recognize inherent weaknesses in the system, they also held analysts to impossible standards and created a culture of blame amongst the practitioners and a false sense of perfection for the method itself.

The article by Champod and Evett is a penetrating and cogent critique of what its authors called the culture of “positivity.” They were responding to the fingerprint community’s understanding, as exemplified in guidelines from the FBI’s Technical Working Group on Friction Ridge Analysis, Study and Technology (TWGFAST), that

"Friction ridge identifications are absolute conclusions. Probable, possible, or likely identification are outside the acceptable limits of the science of friction ridge identification" (Simons 1997, p. 432).

Their thesis was that a “science of friction ridge identification” could not generate “absolute conclusions.” Being “essentially inductive,” the reasoning process was necessarily “probabilistic.” In comparing latent prints and exemplars in “an open population ... probabilistic statements are unavoidable.” (I would go further and say that even in a closed population — one in which exemplars from all the possible perpetrators have been collected — any inferences to identity are inherently probabilistic, but one source of uncertainty has been eliminated.) Although the article referred to “personal probabilities,” their analysis was not explicitly Bayesian. Although they wrote about “numerical measures of evidential weight,” they only mentioned the probability of a random match. They indicated that if “the probability that there is another person who would match the mark at issue” could be calculated, it “should be put before the court for the jury to deliberate.”
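The quantity Champod and Evett mention, the probability that some other person would match the mark, can be related to a random-match probability under simplifying assumptions. The Python sketch below is only an illustration: it assumes that each of n other people in the population matches independently with the same probability p. Neither that independence assumption nor the example numbers come from the article, and the function name is hypothetical.

```python
import math

def prob_another_match(p, n):
    """Probability that at least one of n other people matches the mark.

    Assumes (hypothetically) that every person matches independently with
    the same random-match probability p, so the result is 1 - (1 - p)**n.
    log1p and expm1 keep the arithmetic accurate when p is very small.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return -math.expm1(n * math.log1p(-p))

# Invented numbers, for illustration only.
small_town = prob_another_match(1e-9, 10**4)    # very unlikely that anyone else matches
large_region = prob_another_match(1e-9, 10**8)  # same p, much larger open population

print(f"n = 10^4: {small_town:.6f}")
print(f"n = 10^8: {large_region:.6f}")
```

With the same random-match probability, the chance that someone else would match grows with the size of the population considered, which is one way to see why conclusions in an open population cannot be absolute.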

Mr. Swofford’s recent article embraces the message of probabilism. Comparing the movement toward statistically informed probabilistic reasoning in forensic science to the development of evidence-based medicine, the article calls for “more scientifically defensible ways to evaluate and articulate fingerprint evidence [and] quantifiable, standardized criterion to support subjective, experience-based opinions, thus providing a more transparent, demonstrable, and scientifically acceptable framework to express fingerprint evidence.”

Nonetheless, the article does not clearly address how the weight or strength of the evidence should be expressed, and a new policy of the Defense Forensic Science Center (DFSC) on which he signed off is not fully consistent with the approach that Champod, Evett, and others have developed and promoted. That approach, Part II of this posting will indicate, uses the likelihood ratio or Bayes factor to express the strength of evidence. In their 2001 clarion call to the latent fingerprint community, however, Champod and Evett did not actually present the framework for “evidential weight” that they have championed both before and afterward (e.g., Evett 2015). The word “likelihood” appears but once in the article (in a quotation from a court that uses it to mean the posterior probability that a defendant is the source of a mark).

II. Fingerprint Identification: Advances Since the 2009 National Research Council Report

The second article does not have the seemingly obligatory words “paradigm shift” in its title, but it does appear in a collection of papers on “the paradigm shift for forensic science.” In a thoughtful review, Fingerprint Identification: Advances Since the 2009 National Research Council Report, Professor Christophe Champod of the Université de Lausanne efficiently summarizes and comments on the major institutional, scientific, and scholarly developments involving latent print examination during the last five or six years. For anyone who wants to know what is happening in the field and what is on the horizon, this is the article to read.

Champod observes that “[w]hat is clear from the post NRC report scholarly literature is that the days where invoking ‘uniqueness’ as the main (if not the only) supporting argument for an individualization conclusion are over.” He clearly articulates his favored substitute for conclusions of individualization:

A proper evaluation of the findings calls for an assignment of two probabilities. The ratio between these two probabilities gives all the required information that allows discriminating between the two propositions at hand and the fact finder to take a stand on the case. This approach is what is generally called the Bayesian framework. Nothing prevents its adoption for fingerprint evidence.

and

[M]y position remains unchanged: the expert should only devote his or her testimony to the strength to be attached to the forensic findings and that value is best expressed using a likelihood ratio. The questions of the relevant population—which impacts on prior probabilities—and decision thresholds are outside the expert’s province but rightly belong to the fact finder.

I might offer two qualifications. First, although presenting the likelihood ratio is fundamentally different from expressing a posterior probability (or announcing a decision that the latent print comes from the suspect’s finger), and although the Bayesian conceptualization of scientific reasoning clarifies this distinction, one need not be a Bayesian to embrace the likelihood ratio (or its logarithm) as a measure of the weight of evidence. The intuition that evidence that is more probable under one hypothesis than another lends more support to the former than the latter can be taken as a starting point. (But counter-examples and criticisms of the “law of likelihood” have been advanced. E.g., van Enk (2015); Mayo (2014).)
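To make the difference between a likelihood ratio and a posterior probability concrete, here is a minimal Python sketch. All of the probabilities and prior odds are invented for illustration; they do not come from Champod's article or from any dataset, and the function names are hypothetical.

```python
import math

def likelihood_ratio(p_if_same_source, p_if_different_source):
    """Probability of the findings under one proposition divided by
    their probability under the competing proposition."""
    return p_if_same_source / p_if_different_source

def posterior_odds(prior_odds, lr):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * lr

def odds_to_probability(odds):
    """Convert odds in favor of a proposition into a probability."""
    return odds / (1.0 + odds)

# Hypothetical assignments: the findings are 1000 times more probable
# if the mark came from the suspect than if it came from someone else.
lr = likelihood_ratio(0.8, 0.0008)
weight = math.log10(lr)  # the logarithm of the likelihood ratio

print(f"likelihood ratio ~ {lr:.0f}, log10 ~ {weight:.1f}")

# The same likelihood ratio combined with two different (invented) prior odds.
for prior in (1 / 10, 1 / 10000):
    post = odds_to_probability(posterior_odds(prior, lr))
    print(f"prior odds {prior:g} -> posterior probability {post:.3f}")
```

The likelihood ratio is the same in both runs, yet the posterior probabilities differ sharply because the prior odds differ. That is the sense in which the expert can report the strength of the findings while leaving prior probabilities and decision thresholds to the fact finder.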

Second, whether the likelihood-ratio approach to presenting results is thought to be Bayesian or to rest on a distinct "law of likelihood," what stands in the way of its widespread adoption is conservatism and the absence of data-driven conditional probabilities with which to compute likelihood ratios. To be sure, even without accepted numbers for likelihoods, the analyst who reaches a categorical conclusion should have some sense of the likelihoods that underlie the decision. As subjective and fuzzy as these estimates may be, they can be the basis for reporting the results of a latent print examination as a qualitative likelihood ratio (NIST Expert Working Group on Human Factors in Latent Print Analysis 2012). Still, a question remains: How do we know that the examiner is as good at judging these likelihoods as at coming to a categorical decision without articulating them?

Looking forward to less opaquely ascertained likelihoods, Champod presents the following vision:

I foresee the introduction in court of probability-based fingerprint evidence. This is not to say that fingerprint experts will be replaced by a statistical tool. The human will continue to outperform machines for a wide range of tasks such as assessing the features on a mark, judging its level of distortion, putting the elements into its context, communicating the findings and applying critical thinking. But statistical models will bring assistance in an assessment that is very prone to bias: probability assignment. What is aimed at here is to find an appropriate distribution of tasks between the human and the machine. The call for transparency from the NRC report will not be satisfied merely with the move towards opinions, but also require offering a systematic and case-specific measure of the probability of random association that is at stake. It is the only way to bring the fingerprint area within the ethos of good scientific practice.

References

Christophe Champod, Fingerprint Identification: Advances Since the 2009 National Research Council Report, 370 Phil. Trans. Royal Soc’y B 20140259 (2015)
Christophe Champod & Ian W. Evett, A Probabilistic Approach to Fingerprint Evidence, 51 J. Forensic Identification 101 (2001)
Ian Evett, The Logical Foundations of Forensic Science: Towards Reliable Knowledge, 370 Phil. Trans. Royal Soc'y B 20140263 (2015)
Deborah G. Mayo, Why the Law of Likelihood Is Bankrupt–as an Account of Evidence, Error Statistics Philosophy, Nov. 15, 2014
NIST Expert Working Group on Human Factors in Latent Print Analysis, Latent Print Examination and Human Factors: Improving the Practice Through a Systems Approach (David H. Kaye ed., 2012)
Allyson A. Simons, Technical Working Group on Friction Ridge Analysis, Study and Technology (TWGFAST) Proposed Guidelines, 47 J. Forensic Identification 423 (1997)
Henry J. Swofford, The Emerging Paradigm Shift in the Epistemology of Fingerprint Conclusions, 65(3) J. Forensic Identification 201 (2015)
Steven J. van Enk, Betting, Risk, and the Law of Likelihood, 2 Ergo No. 5 (2015)

Acknowledgement: Thanks to Ted Vosk for telling me about the first article discussed here.
