Friday, July 27, 2018

The ACLU’s In-Your-Face Test of Facial Recognition Software

The ACLU has reported that Amazon’s facial recognition “software incorrectly matched 28 members of Congress, identifying them as other people who have been arrested for a crime.” [1] This figure is calculated to impress the very legislators the ACLU is asking to “enact a moratorium on law enforcement use of face recognition.” All these false matches, the organization announced, create “28 more causes for concern.” Inasmuch as there are 535 members of Congress (Senators plus Representatives), the false-match rate is 28/535, or about 5%.

Or is it? The ACLU’s webpage states that
To conduct our test, we used the exact same [sic] facial recognition system that Amazon offers to the public, which anyone could use to scan for matches between images of faces. And running the entire test cost us $12.33 — less than a large pizza.

Using Rekognition, we built a face database and search tool using 25,000 publicly available arrest photos. Then we searched that database against public photos of every current member of the House and Senate. We used the default match settings that Amazon sets for Rekognition.
So there were 535 × 25,000 = 13,375,000 comparisons. With that denominator, the false-match rate is 28 in 13,375,000, or about 2 per million (0.0002%).
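The two denominators can be checked with a few lines of Python. This is only a sketch of the arithmetic: the variable names are mine, and it assumes, as the ACLU's description implies, that each member's photo was compared against every arrest photo in the database.

```python
# Figures taken from the ACLU's description of its test.
false_matches = 28
members_of_congress = 535
arrest_photos = 25_000

# Rate per member of Congress searched.
rate_per_member = false_matches / members_of_congress

# Rate per pairwise comparison, assuming every member's photo is
# compared against every arrest photo in the database.
comparisons = members_of_congress * arrest_photos
rate_per_comparison = false_matches / comparisons

print(f"comparisons:    {comparisons:,}")
print(f"per member:     {rate_per_member:.1%}")
print(f"per comparison: {rate_per_comparison * 1e6:.1f} per million")
```

The same 28 false matches look alarming or negligible depending on which denominator is chosen, which is why neither rate settles much by itself.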

But none of these figures (28, 5%, or 0.0002%) means very much, since the ACLU’s “test” used a low level of similarity to make its matches. The default setting for the classifier is 80%. Police agencies do not use this weak a threshold [2, 3]. Using a low figure like 80% ensures that there will be more false matches among so many comparisons. Amazon recommends that police who use its system raise the threshold to 95%. The ACLU apparently neglected to adjust the level (even though it would have cost less than a large pizza). Or, worse, it tried the system at the higher level and chose not to report an outcome that probably would have had fewer “causes for concern.” Either way, public discourse would benefit from more complete testing or reporting.
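A toy example shows how much the choice of threshold matters. The sketch below does not call Rekognition. It filters a hand-made list of similarity scores, of which 100.0 and 84.6859 are borrowed from Amazon's example response [5] and the rest are invented for illustration. The function name `matches_at_or_above` is hypothetical.

```python
# Similarity scores (0-100) for one probe image against several
# database images. 100.0 and 84.6859 come from Amazon's example
# response; the other values are invented for illustration.
similarity_scores = [100.0, 97.2, 91.5, 84.6859, 80.3]


def matches_at_or_above(scores, threshold):
    """Return the scores that would be reported as matches."""
    return [s for s in scores if s >= threshold]


default_matches = matches_at_or_above(similarity_scores, 80)  # default setting
strict_matches = matches_at_or_above(similarity_scores, 95)   # recommended for police

print(len(default_matches))  # 5
print(len(strict_matches))   # 2
```

With these made-up scores, raising the threshold from 80 to 95 removes three of the five reported matches. Whether the ACLU's 28 would shrink by a similar proportion cannot be known without rerunning its test.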

It also is unfortunate that Amazon and journalists [2, 3] call the threshold for matches a “confidence threshold.” The percentage is not a measure of how confident one can be in the result. It is not the probability of a true match given a classified match. It is not a probability at all. It is a similarity score on a scale of 0 to 1 (reported by the software as 0 to 100). A similarity score of 0.95, or 95%, does not even mean that the paired images are 95% similar in an intuitively obvious sense.

The software does give a “confidence value,” which sounds like a probability, but the Amazon documentation I have skimmed suggests that this quantity relates to some kind of “confidence” in the conclusion that a face (as opposed to anything else) is within the rectangle of pixels (the “bounding box”). The Developer Guide states that [4]
For each face match, the response provides a bounding box of the face, facial landmarks, pose details (pitch, roll, and yaw), quality (brightness and sharpness), and confidence value (indicating the level of confidence that the bounding box contains a face). The response also provides a similarity score, which indicates how closely the faces match.
and [5]
For each face match that was found, the response includes similarity and face metadata, as shown in the following example response [sic]:
    "FaceMatches": [
        {
            "Similarity": 100.0,
            "Face": {
                "BoundingBox": {
                    "Width": 0.6154,
                    "Top": 0.2442,
                    "Left": 0.1765,
                    "Height": 0.4692
                },
                "FaceId": "84de1c86-5059-53f2-a432-34ebb704615d",
                "Confidence": 99.9997,
                "ImageId": "d38ebf91-1a11-58fc-ba42-f978b3f32f60"
            }
        },
        {
            "Similarity": 84.6859,
            "Face": {
                "BoundingBox": {
                    "Width": 0.2044,
                    "Top": 0.2254,
                    "Left": 0.4622,
                    "Height": 0.3119
                },
                "FaceId": "6fc892c7-5739-50da-a0d7-80cc92c0ba54",
                "Confidence": 99.9981,
                "ImageId": "5d913eaf-cf7f-5e09-8c8f-cb1bdea8e6aa"
            }
        }
    ]
From a statistical standpoint, the ACLU’s finding is no surprise. Researchers encounter the false discovery problem with big data sets every day. If you make enough comparisons with a highly accurate system, a small fraction will be false alarms. Police are well advised to use facial recognition software in the same manner as automated fingerprint identification systems—not as simple, single-source classifiers, but rather as a screening tool to generate a list of potential sources. And, they can have more confidence in classified matches from comparisons in a small database of images of, say, dangerous fugitives than in a reported hit to one of thousands upon thousands of mug shots.
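The effect of database size can be illustrated with a rough calculation. The sketch below assumes, purely for illustration, that every comparison has the same, independent chance of producing a false match, and it borrows the per-comparison rate of about 2 per million computed from the ACLU's figures. The database size of 100 for a short fugitive list is hypothetical, and real systems will not satisfy the independence assumption exactly.

```python
def prob_any_false_match(rate_per_comparison, database_size):
    """Chance that one probe image falsely matches at least one
    database image, assuming independent, identical comparisons."""
    return 1 - (1 - rate_per_comparison) ** database_size


# About 2 per million, from 28 false matches in 535 x 25,000 comparisons.
rate = 28 / (535 * 25_000)

# 100 is a hypothetical short list; 25,000 is the ACLU's database size.
for size in (100, 25_000):
    p = prob_any_false_match(rate, size)
    print(f"{size:>6,} images: {p:.3%} chance of a false match per probe")
```

Under these assumptions the chance of at least one false match per probe is about 0.02% for the short list and roughly 5% for 25,000 images, which is the sense in which a hit in a small database deserves more confidence than a hit among thousands of mug shots.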

These observations do not negate the privacy concerns with applying facial recognition software to public surveillance systems. Moreover, I have not discussed the ACLU’s statistics on differences in false-positive rates by race. There are important issues of privacy and equality at stake. In addressing these issues, however, a greater degree of statistical sophistication would be in order.

  1. Jacob Snow, Amazon’s Face Recognition Falsely Matched 28 Members of Congress with Mugshots, July 26, 2018, 8:00 AM.
  2. Natasha Singer, Amazon’s Facial Recognition Wrongly Identifies 28 Lawmakers, A.C.L.U. Says, N.Y. Times, July 26, 2018.
  3. Ryan Suppe, Amazon’s Facial Recognition Tool Misidentified 28 Members of Congress in ACLU Test, USA Today, July 26, 2018.
  4. Amazon Rekognition Developer Guide: CompareFaces.
  5. Amazon Rekognition Developer Guide: SearchFaces Operation Response.
