Monday, April 19, 2021

What is Accuracy?

The Organization of Scientific Area Committees for Forensic Science (OSAC) has an online "lexicon" that collects definitions of terms as they appear in published standards. 1/ These may or may not be the same as definitions in textbooks or other authoritative sources. 2/ They may or may not be accurate. (Yet, the drafters of OSAC standards sometimes point to the existence of a definition in the compendium as if it were a conclusive reason to perpetuate it. 3/)

Speaking of "accurate," the word "accuracy" has five overlapping definitions in OSAC's lexicon:

  • Closeness of agreement between a measured quantitiy [sic] value and a true quantity vlaue [sic] of a measurement.
  • The degree of agreement between a test result or measurement and the accepted reference value.
  • Closeness of agreement between a test result or measurement result and the true value. 1) In practice, the accepted reference value is substituted for the true value. 2) The term “accuracy,” when applied to a set of test or measurement results, involves a combination of random components and a common systematic error or bias component. 3) Accuracy refers to a combination of trueness and precision. [ISO 3534-2:2006].
  • The closeness of agreement between a test result and the accepted reference value. 1) In practice, the accepted reference value is substituted for the true value. 2) The term "accuracy," when applied to a set of test or measurement results, involves a combination of random components and a common systematic error or bias component. 3) Accuracy refers to a combination of trueness and precision.
  • Degree of conformity of a measure to a standard or true value.

Some of the definitions in the "lexicon" are designated "preferred terms." 4/ None of the definitions of "accuracy" is marked preferred.

The main difficulty with the forensic scientists' set of definitions is that "accuracy" can refer to single measurements or estimates or to a process for making measurements or estimates. The longer definitions are confusing because they do not make it plain that "a combination of trueness and precision" applies to the accuracy of the process (or a large set of measurements from the process) and not so much to the accuracy of particular measurements.

"Precision" refers to the dispersion of repeated measurements under the same conditions. A precise estimate comes from a process that generates measurements that are typically tightly clustered around some value -- without regard to whether that value is the true one. A set of precise measurements -- ones that come from a process that tends to generate similar measurements when repeated -- may be far from the true value. Such measurements.(and the system that generates them) is statistically biased; these measurements have a systematic error component.

Conversely, an imprecise estimate -- one coming from a system that tends to produce widely divergent measurements -- may be essentially identical to the true value. Most other estimates from the same system would tend to stray farther from the true value, but to say that an estimate that is spot on is not accurate sounds odd. The estimate may be unreliable (in the statistical sense of coming from a process that is highly variable), but it is practically 100% accurate (in this case). Even a generally inaccurate system may produce some accurate results.

The epistemological problem is that we should not rely on an unreliable system to ascertain the true value. For extremely imprecise point estimates, accuracy (in the sense of the absence of error and correspondence to the truth) becomes a matter of luck. It is unwise to act as if a particular measurement (or a small number of them) from an unreliable system adds much to our knowledge.

But the fact that the individual estimates provide little information is not well expressed by describing a result that is (luckily) correct as lacking accuracy. The investment analyst who said that a bitcoin will increase in value by 50% tomorrow was accurate if bitcoin's price did spike by approximately 50%. Nevertheless, this accurate prediction probably was unwarranted. Unless the analyst had a remarkable history of consistently predicting the ups and downs of bitcoin and an articulable and plausible basis for making the predictions, giving much credence to the prediction before the fact would have been unjustified.

Let's apply these elementary ideas to some forensic measurements. Suppose that analysts in a laboratory use an appropriate instrument to measure the refractive index of glass fragments. Most analysts are extremely proficient. Their measurements are both reliable (repeatability is high) and generally close to the true values. A smaller number of analysts are less proficient. Indeed, they are downright sloppy. They are not biased -- they err in both directions -- but the values they come up with are highly variable. An analyst from the proficient group obtains the value x for a particular fragment, and so does an analyst in the sloppy group.

Should we say that x is an accurate value when it comes from one of the former analysts and inaccurate when it comes from one of the latter? Some of the definitions from the standards suggest (or could be read as giving) one answer, whereas others suggest the opposite. It is far more straightforward to say that x is accurate (if it is close to the truth) in both cases.
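
The "both cases" reading can be sketched in a few lines of Python. Everything here is hypothetical: the Report class, the is_accurate helper, the true refractive index, and the tolerance that counts as "close to the truth" are all assumptions chosen for illustration. The point is only that the test for a single measurement looks at its distance from the true value and ignores which group produced it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Report:
    analyst_group: str  # "proficient" or "sloppy" (hypothetical labels)
    value: float        # reported refractive index

def is_accurate(report, true_value, tolerance):
    """Judge one report only by its distance from the truth, not by its source."""
    return abs(report.value - true_value) <= tolerance

TRUE_RI = 1.5180    # assumed true refractive index of the fragment
TOLERANCE = 0.0002  # assumed threshold for "close to the truth"
x = 1.5181          # the value both analysts happen to report

reports = [Report("proficient", x), Report("sloppy", x)]
verdicts = [is_accurate(r, TRUE_RI, TOLERANCE) for r in reports]
print(verdicts)
```

Whether the sloppy analyst's report deserves any credence is a separate question about the process, which this check deliberately does not answer.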

To be sure, precision is a component of accuracy in the long run -- the imprecise analysts will tend to have lower accuracy (and higher error) rates. Their reports do not provide a sound basis for action. They are neither trustworthy nor statistically reliable. But it invites confusion to characterize every such report -- even ones that provide perfectly or approximately true values -- as inaccurate. When speaking of particular measurements, we simply need to distinguish between those that are wrong because they are far from the truth -- inaccurate -- and those that are accurate -- close to the truth either by good fortune or because of true knowledge. Systems that use luck to get the right answers are systematically inaccurate; properly functioning systems grounded on true knowledge are systematically accurate.
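
One conventional way to express the long-run "combination of trueness and precision" is the mean squared error decomposition. This is offered as an illustration; the lexicon definitions quoted above do not state it. Assume a measurement process yields a random value X, with finite variance, for a quantity whose true value is \theta:

```latex
\mathbb{E}\!\left[(X - \theta)^2\right]
  = \underbrace{\left(\mathbb{E}[X] - \theta\right)^2}_{\text{squared bias (trueness)}}
  + \underbrace{\operatorname{Var}(X)}_{\text{variance (precision)}}
```

The left side describes the process, not any single result: an individual measurement x can still lie very close to \theta even when the variance term is large.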

NOTES

  1. "The OSAC Forensic Lexicon should be the primary resource for terminology and used when drafting and editing forensic science standards and other OSAC work products. It is continually updated with the latest work from OSAC units, as well as terms from newly published documentary standards and standards elevated to the OSAC Registry." OSAC Registry, https://lexicon.forensicosac.org/ (undated).
  2. Cf. id. ("The terms and definitions in the OSAC Lexicon come from the published literature, including documentary standards, specialized dictionaries, Scientific Working Group (SWG) documents, books, journal articles, and technical reports. When a suitable definition can’t be located in any of these sources, an OSAC unit generates new or modifies existing definitions. Gradually terms are evaluated and harmonized by the OSAC to a single term. This process results in an OSAC Preferred Term."). 
  3. E.g., Comment Adjudication, OSAC 2021-N-0001, Wildlife Forensics Method-Collection of Known DNA Samples from Domestic Mammals, Feb. 11, 2021, at cells L25 & L27 (OSAC Proposed Standard added to the Registry Apr. 6, 2021) (link to Excel spreadsheet at https://www.nist.gov/osac/public-documents).
  4. Id. They should be called "preferred definitions" for terms, and terms that are not supposed to be used in standards anymore should be called "deprecated terms," but I digress.
Last modified: 9/27/21 14:30 ET