Saturday, April 2, 2016

Sample Evidence: What’s Wrong with ASTM E2548-11 Standard Guide for Sampling Seized Drugs?

Samuel Johnson once observed that “You don't have to eat the whole ox to know that it is tough.” Or maybe he didn't say this, 1/ but the idea applies to many endeavors. One of them is testing of seized drugs. The law needs to—and generally does—recognize the value of surveys and samples in drug and many other kinds of cases. 2/ If the quantity of seized drugs is large, it is impractical and typically unnecessary to test every bit of the materials. Clear guidance on how to select samples from the population of seized matter would be helpful to courts and laboratories alike.

To accomplish this goal, the Chemistry and Instrumental Analysis Subject Area Committee of the Organization of Scientific Area Committees for Forensic Science (OSAC) has recommended the addition of ASTM International’s Standard Guide for Sampling Seized Drugs for Qualitative and Quantitative Analysis (known as ASTM E2548-11) to the National Institute of Standards and Technology (NIST) Registry of Approved Standards. Unfortunately, this "Standard Guide" is vague in its guidance, incomplete and out of date in its references, and nonstandard in its nomenclature for sampling.

The Standard does not purport to prescribe "specific sampling strategies." 3/ Instead, it instructs “[t]he laboratory ... to develop its own strategies” and “recommend[s] that ... key points be addressed.” There are only two key points. 4/ One is that “[s]tatistically selected units shall be analyzed to meet Practice E2329 if statistical inferences are to be made about the whole population.” 5/ But ASTM E2329 merely describes the kinds of analytical tests that can or should be performed on samples. It reveals nothing about how to draw samples from a population. So far, ASTM E2548 offers no guidance about sampling.

The other “key point” is that “[s]ampling may be statistical or non-statistical.” The statement is tautological (A is either X or not-X), X is never defined, and an explanatory note intensifies the ambiguity. It states that “[f]or the purpose of this guide, the use of the term statistical is meant to include the notion of an approach that is probability-based.” 6/  Does “probability-based” mean probability sampling (the subject of ASTM E105-10)? At least the latter has a well-defined meaning in sampling theory. 7/ It means that every unit in the sampling frame has a known probability of being drawn.

But even if this is what the ASTM E2548-11 Standard Guide means by “probability-based,” the phrase is not congruent with "statistical." The note indicates that even sampling that is not “probability-based” still can be considered "statistical sampling." Later parts of the Standard allow inferences to populations to be made from "statistical" samples but not from "non-statistical" ones. Using an undefined notion of "statistical" and "non-statistical" as the fundamental organizing principle departs from conventional statistical terminology and reasoning. The usual understanding of sampling differentiates between probability samples -- for which sampling error readily can be quantified -- and other forms of sampling (whether systematic or ad hoc) -- for which statistical analysis depends on the assumption that the sample is the equivalent of a probability sample.

Thus, the statistical literature on sampling commonly explains that
If the probability of selection for each unit is unknown, or cannot be calculated, the sample is called a non-probability sample. Non-probability samples are often less expensive, easier to run and don't require a frame. [¶] However, it is not possible to accurately evaluate the precision (i.e., closeness of estimates under repeated sampling of the same size) of estimates from non-probability samples since there is no control over the representativeness of the sample. 8/
In contrast, because the ASTM Standard does not focus on probability sampling as opposed to other "statistical sampling," the laboratory personnel (or the lawyer) reading the standard never learns that "it is dangerous to make inferences about the target population on the basis of a non-probability sample." 9/

Indeed, Figure 1 of ASTM E2548 introduces further confusion about "statistical sampling." In this figure, a statistical “sampling plan” is either “Hypergeometric,” “Bayesian,” or “Other probability-based.” But the sampling distribution of a statistic is not a “sampling plan” (although it could inform one). A sampling plan should specify the sample size (or a procedure for stopping the sampling if results on the sampled items up to that point make further testing unnecessary). For sampling from a finite population without replacement, the hypergeometric probability distribution applies to sample-size computations and estimates of sampling error. But how does that make the sampling plan hypergeometric? One type of “sampling plan” would be to draw a simple random sample of a size computed to have a good chance of producing a representative sample. Describing a plan for simple random sampling, stratified random sampling, or any other design as “hypergeometric,” “Bayesian,” or “other” is not helpful.
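To see how the hypergeometric distribution enters, consider a minimal sketch of the usual sample-size computation (the population size, the 90% target proportion, and the 95% confidence level are illustrative choices of mine, not figures drawn from the Standard): find the smallest n such that, if all n randomly drawn units test positive, one can be 95% confident that at least 90% of the units in the seizure contain the drug.

# Minimal sketch: sample size for sampling without replacement from a seizure
# of N units, chosen so that if ALL n sampled units test positive, one can be
# (1 - alpha) confident that at least a proportion p of the N units are positive.
# N, p, and alpha here are illustrative, not values prescribed by ASTM E2548-11.
import math
from scipy.stats import hypergeom

def hypergeometric_sample_size(N, p=0.9, alpha=0.05):
    K_bad = math.ceil(p * N) - 1                 # largest number of positives still below the target
    for n in range(1, N + 1):
        # P(all n sampled units test positive | only K_bad positives among N);
        # scipy's hypergeom takes (total units, positive units, number drawn)
        prob_all_positive = hypergeom(N, K_bad, n).pmf(n)
        if prob_all_positive <= alpha:
            return n
    return N

print(hypergeometric_sample_size(100))           # about 23 units for N = 100

The computation supplies one ingredient of a sampling plan -- a sample size -- but it says nothing about how the units are to be drawn.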

Similarly confusing is the figure’s trichotomy of “non-statistical” into the following “plans”: “Square root N,” “Management Directive,” and “Judicial Requirements.” Using the old √N + 1 rule of thumb for determining sample size may be sub-optimal, 10/ but it is “statistical” -- it uses a statistical computation to establish a sample size. So do any judicial or administrative demands to sample a fixed percentage of the population (an approach that a Standard should deprecate). No matter how one determines the sample size, if probability sampling has been conducted, statistical inferences and estimates have the same meaning.
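Indeed, a short calculation (again with illustrative numbers; the 90% target is my assumption) shows that the rule of thumb carries an implied confidence level that drifts with the size of the seizure:

# Minimal sketch: the implied confidence, under the hypergeometric model, that at
# least 90% of N units are positive when all ceil(sqrt(N)) + 1 sampled units test
# positive. The 90% target is an assumption for illustration.
import math
from scipy.stats import hypergeom

for N in (50, 100, 500, 1000):
    n = math.ceil(math.sqrt(N)) + 1              # sample size under the rule of thumb
    K_bad = math.ceil(0.9 * N) - 1               # largest number of positives below 90%
    p_all = hypergeom(N, K_bad, n).pmf(n)        # chance of an all-positive sample anyway
    print(f"N = {N:4d}   n = {n:2d}   implied confidence = {1 - p_all:.2f}")

With these illustrative numbers, the implied confidence is only roughly 70-75% for seizures of 50 or 100 units and climbs toward (and past) 95% only for much larger seizures -- one way of seeing why a fixed rule of thumb can be sub-optimal even though it remains statistical.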

Also puzzling are the assertions that “[a] population can consist of a single unit,” 11/ and that “numerous sampling plans ... are applicable to single and multiple unit populations.” 12/ If a population consists of “a single unit” (as the term is normally used), 13/ then a laboratory that tests this unit has conducted a census. The study design does not involve sampling, so there can be no sampling error.

When it comes to the issue of reporting quantities such as sampling error, the ASTM Standard is woefully inadequate. The entirety of the discussion is this:
7.1 Inferences based on use of a sampling plan and concomitant analysis shall be documented.

8.1 Sampling information shall be included in reports.
8.1.1 Statistically Selected Sample(s)—Reporting statistical inferences for a population is acceptable when testing is performed on the statistically selected units as stated in 6.1 above [that is, according to a standard that is on the NIST Registry with a disclaimer by NIST]. The language in the report must make it clear to the reader that the results are based on a sampling plan.
8.1.2 Non-Statistically Selected Sample(s)—The language in the report must make it clear to the reader that the results apply to only the tested units. For example, 2 of 100 bags were analyzed and found to contain Cocaine.
These remarks are internally problematic. For example, why would an analyst report the population size, the sample size, and the sample data for “non-statistical” samples but not for “statistical” ones?

More fundamentally, to be helpful to the forensic-science and legal communities, a standard has to consider how the results of the analyses should be presented in a report and in court. Should not the full sampling plan be stated — the mechanism for drawing samples (e.g., blinded, which the ASTM Standard calls “black box” sampling, or selecting from numbered samples by a table of random numbers, which it portrays as not “practical in all cases”); the sample size; and the kind of sampling (simple random, stratified, etc.)? It is not enough merely to state that “the results are based on a sampling plan.”
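Documenting the mechanism need not be burdensome. As a hypothetical illustration (the population size, sample size, and seed are my own, not the Standard's), a laboratory that draws a simple random sample from numbered units with a seeded random-number generator -- the modern counterpart of a printed table of random numbers -- could record the draw as simply as this:

# Minimal sketch: a documented simple random sample of numbered units.
# The population size, sample size, and seed are hypothetical.
import random

population_size = 100                  # e.g., 100 numbered bags in the seizure
sample_size = 23                       # from a documented sample-size computation
rng = random.Random(20160402)          # recorded seed, so the selection can be audited
selected_units = sorted(rng.sample(range(1, population_size + 1), sample_size))
print("Units selected for testing:", selected_units)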

When probability sampling has been employed, a sound foundation for inferences about population parameters will exist. But how should such inference be undertaken and presented? A Neyman-Pearson confidence interval? With what confidence coefficient? A frequentist test of a hypothesis? Explained how? A Bayesian conclusion such as “There is a probability of 90% that the weight of the cocaine in the shipment seized exceeds X”? The ASTM Standard seems to contemplate statements about “[t]he probability that a given percentage of the population contains the drug of interest or is positive for a given characteristic,” but it does not even mention what goes into computing a Bayesian credible interval or the like. 14/
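To make the choices concrete, suppose (hypothetically) that 23 of 100 bags were randomly selected and all tested positive. Under the hypergeometric model, a frequentist lower confidence bound and a Bayesian posterior probability -- here with a uniform prior, an assumption of mine rather than a recommendation of the Standard -- could be computed as follows:

# Minimal sketch: two summaries of sampling inference when all n sampled units
# from a seizure of N units test positive. The uniform prior and the thresholds
# are assumptions for illustration, not recommendations of ASTM E2548-11.
from math import comb

N, n = 100, 23                         # 23 of 100 bags sampled; all test positive

# 1. Exact 95% lower confidence bound on K, the number of positive units:
#    the smallest K for which an all-positive sample is not improbable (> alpha).
alpha = 0.05
lower_bound = next(K for K in range(n, N + 1) if comb(K, n) / comb(N, n) > alpha)
print(f"At least {lower_bound} of {N} units positive (95% confidence)")

# 2. Bayesian analysis with a uniform prior on K = 0, 1, ..., N:
#    posterior(K) is proportional to P(all n positive | K) = C(K, n) / C(N, n).
posterior = [comb(K, n) for K in range(N + 1)]
prob_at_least_90 = sum(posterior[90:]) / sum(posterior)
print(f"Posterior probability that at least 90 of {N} units are positive: {prob_at_least_90:.2f}")

Either statement might be reported, but each rests on choices -- the confidence coefficient, the prior, the threshold -- that a report should disclose.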

The OSAC Newsletter proudly states that "[a] standard or guideline that is posted on either Registry demonstrates that the methods it contains have been assessed to be valid by forensic practitioners, academic researchers, measurement scientists, and statisticians through a consensus development process that allows participation and comment from all relevant stakeholders." The experience with ASTM Standards 2548 and 2329 suggests that even before a proposed standard can be approved by a Scientific Area Committee, the OSAC process should provide for a written review of statistical content by a group of statisticians. 15/

Disclosure and disclaimer: I am a member of the OSAC Legal Resource Committee. The information and views presented here do not represent those of, and are not necessarily shared by, NIST, OSAC, any unit within these organizations, or any other organization or individuals.

Notes
  1. According to the anonymous webpage Apocrypha: The Samuel Johnson Sound Bite Page, the aphorism is "apocryphal because it's not found in his works, letters, or contemporary biographies about Samuel Johnson. But it is similar to something he once said about Mrs. Montague's book on Shakespeare: 'I have indeed, not read it all. But when I take up the end of a web, and find it packthread, I do not expect, by looking further, to find embroidery.'"
  2. See, e.g., David H. Kaye, David E. Bernstein & Jennifer L. Mnookin, The New Wigmore: A Treatise on Evidence: Expert Evidence (2d ed. 2011); Hans Zeisel & David H. Kaye, Prove It with Figures: Empirical Methods in Law and Litigation (1997).
  3. See ASTM E2548-11, § 4.1. 
  4. Id., § 4.2.
  5. § 4.2.2.
  6. § 4.2.1 (emphasis added).
  7. E.g., Statistics Canada, Probability Sampling, July 23, 2013:
    Probability sampling involves the selection of a sample from a population, based on the principle of randomization or chance. Probability sampling is more complex, more time-consuming and usually more costly than non-probability sampling. However, because units from the population are randomly selected and each unit's probability of inclusion can be calculated, reliable estimates can be produced along with estimates of the sampling error, and inferences can be made about the population.
  8. National Statistical Service (Australia), Basic Survey Design, http://www.nss.gov.au/nss/home.nsf/SurveyDesignDoc/B0D9A40C6B27487BCA2571AB002479FE?OpenDocument (emphasis in original).
  9. Id.
  10. See J. Muralimanohar & K. Jaianan, Determination of Effectiveness of the “Square Root of N Plus One” Rule in Lot Acceptance Sampling Using an Operating Characteristic Curve, Quality Assurance Journal, 14(1-2): 33-37, 2011.
  11. § 5.2.2.
  12. § 5.3.
  13. Laboratory and Scientific Section, United Nations Office on Drugs and Crime, Guidelines on Representative Drug Sampling 3 (2009).
  14. Cf. James M. Curran, An Introduction to Bayesian Credible Intervals for Sampling Error in DNA Profiles, Law, Probability and Risk, 4, 115−126, 2005, doi:10.1093/lpr/mgi009.
  15. Of course, no process is perfect, but early statistical review can make technical problems more apparent. Cf. Sam Kean, Whistleblower Lawsuit Puts Spotlight On FDA Technical Reviews, Science, Feb. 2, 2012.
