Face Detection

The model analyzed in this card detects one or more faces within an image or a video frame, and returns a box around each face along with the location of the faces' major landmarks. The model's goal is exclusively to identify the existence and location of faces in an image. It does not attempt to discover identities or demographics.

On this page, you can learn more about how well the model performs on images with different characteristics, including face demographics, and what kinds of images you should expect the model to perform well or poorly on.

Model description

Image of a woman's face with points on it.

Input: Photo(s) or video(s)

Output: For each face detected in a photo or video, the model outputs:

  • Bounding box coordinates
  • Facial landmarks (up to 34 per face)
  • Facial orientation (roll, pan, and tilt angles)
  • Detection and landmarking confidence scores.

No identity or demographic information is detected.
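As a rough illustration of the output listed above, one detected face could be held in a record like the following. This is only a sketch: the class name `FaceDetection`, its field names, the box layout, and the 0 to 1 confidence range are assumptions made for illustration, not the schema of the public API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

MAX_LANDMARKS = 34  # from the card: up to 34 landmarks per face


@dataclass
class FaceDetection:
    """Hypothetical container for one detected face (not the API's schema)."""
    box: Tuple[float, float, float, float]  # assumed (left, top, right, bottom) in pixels
    landmarks: List[Tuple[float, float]] = field(default_factory=list)  # (x, y) points
    roll: float = 0.0   # degrees
    pan: float = 0.0    # degrees
    tilt: float = 0.0   # degrees
    detection_confidence: float = 0.0    # assumed range 0..1
    landmarking_confidence: float = 0.0  # assumed range 0..1

    def __post_init__(self) -> None:
        # Enforce the per-face landmark limit stated in the card.
        if len(self.landmarks) > MAX_LANDMARKS:
            raise ValueError(f"at most {MAX_LANDMARKS} landmarks per face")


# Example record with made-up values.
face = FaceDetection(
    box=(40.0, 30.0, 120.0, 130.0),
    landmarks=[(60.0, 70.0), (100.0, 70.0)],
    detection_confidence=0.97,
)
```

Note that the record has no identity or demographic fields, matching the model's stated scope.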

Model architecture: MobileNet CNN fine-tuned for face detection with a single shot multibox detector.

Performance

Both overall model performance and performance sliced by image and face characteristics were assessed. The slices include:

  • Derived characteristics (face size, facial orientation, and occlusion)
  • Face demographics (human-perceived gender presentation, age, and skin tone)

Overall performance is measured with Precision-Recall (PR) values and Area Under the PR Curve (PR-AUC), which are standard metrics for evaluating computer vision classifiers. Download raw performance results data here.
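To make the PR-AUC metric concrete, here is a minimal sketch that sweeps the confidence threshold over a list of detections and sums the area under the resulting precision-recall points. The function names and the step-wise area rule are assumptions for illustration; the card does not say how its PR-AUC values were computed, and tied scores are not given special treatment here.

```python
from typing import List, Tuple


def pr_points(scored: List[Tuple[float, bool]], num_true_faces: int) -> List[Tuple[float, float]]:
    """Return (recall, precision) points, sweeping the threshold downward.

    scored: (confidence, is_true_positive) for each detection.
    num_true_faces: all annotated faces, including ones the model missed.
    """
    points = []
    tp = fp = 0
    # Visit detections from most to least confident.
    for _, is_tp in sorted(scored, key=lambda d: d[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
        points.append((tp / num_true_faces, tp / (tp + fp)))
    return points


def pr_auc(points: List[Tuple[float, float]]) -> float:
    """Step-wise area: each recall increment is weighted by its precision."""
    area = 0.0
    prev_recall = 0.0
    for recall, precision in points:
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


# Made-up detections: three correct, one false positive, four annotated faces.
detections = [(0.9, True), (0.8, True), (0.7, False), (0.6, True)]
auc = pr_auc(pr_points(detections, num_true_faces=4))  # 0.6875
```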

Disaggregated performance is measured with Recall, which captures how often the model misses faces with specific characteristics. Equal recall across subgroups corresponds to the “Equality of Opportunity” fairness criterion.
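A minimal sketch of disaggregated recall, assuming each annotated face is available as a (subgroup label, was detected) pair. The function names and sample data are hypothetical. The gap between the highest and lowest subgroup recall is one simple way to see how far the results are from equal recall.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def recall_by_group(faces: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """faces: (subgroup label, was_detected) for every annotated face."""
    found: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for group, detected in faces:
        total[group] += 1
        found[group] += int(detected)
    # Recall = detected faces / annotated faces, per subgroup.
    return {group: found[group] / total[group] for group in total}


def max_recall_gap(recalls: Dict[str, float]) -> float:
    """Largest difference in recall between any two subgroups."""
    return max(recalls.values()) - min(recalls.values())


# Made-up annotations for two subgroups "a" and "b".
sample = [("a", True), ("a", True), ("a", False), ("a", True),
          ("b", True), ("b", False)]
recalls = recall_by_group(sample)  # {"a": 0.75, "b": 0.5}
gap = max_recall_gap(recalls)      # 0.25
```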

Performance evaluated on: Three research benchmarks distinct from the training set (an Open Images subset, Face Detection Dataset and Benchmark, and Labeled Faces in the Wild).

See Performance section for details.

Limitations

The following factors may degrade the model’s performance.

Image of a crowd with highlighted face.

Face Size:

Depending on image resolution, faces that are distant from the camera (a pupillary distance of < 10px) might not be detected. Not designed for estimating the size of a crowd.

Faces greater than 90% of image height or width might not be detected.
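The two size limits above can be turned into a simple pre-check. This is a sketch under assumptions: the function name and its inputs are hypothetical, and it presumes you already have an estimate of the pupillary distance and face box from some other source, such as manual annotation.

```python
from typing import List

MIN_PUPILLARY_DISTANCE_PX = 10  # from the card: < 10px might not be detected
MAX_FACE_FRACTION = 0.90        # from the card: > 90% of image height or width


def face_size_warnings(pupil_distance_px: float,
                       face_w: float, face_h: float,
                       image_w: float, image_h: float) -> List[str]:
    """Return the size-related limits from the card that this face exceeds."""
    warnings = []
    if pupil_distance_px < MIN_PUPILLARY_DISTANCE_PX:
        warnings.append("too small")
    if face_w > MAX_FACE_FRACTION * image_w or face_h > MAX_FACE_FRACTION * image_h:
        warnings.append("too large")
    return warnings


# Made-up examples on a 200x200 image.
distant = face_size_warnings(6, 20, 25, 200, 200)        # ["too small"]
close_up = face_size_warnings(80, 190, 150, 200, 200)    # ["too large"]
typical = face_size_warnings(30, 80, 100, 200, 200)      # []
```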

Image of a face with lines to indicate the face is rotated.

Facial Orientation:

Needs visible facial landmarks such as eyes, noses, and mouths to work correctly. Faces that are looking away from the camera (pan > 90°, roll > 45°, or tilt > 45°) might not be detected.
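The angle limits above can be expressed as a small check. The function name is hypothetical, and treating the limits as applying to the absolute value of each angle (so that turning left and right are handled alike) is an assumption; the card does not specify sign conventions.

```python
from typing import List

# Limits from the card, in degrees.
ORIENTATION_LIMITS = {"pan": 90.0, "roll": 45.0, "tilt": 45.0}


def orientation_warnings(pan_deg: float, roll_deg: float, tilt_deg: float) -> List[str]:
    """Return the names of the angles that exceed the card's stated limits."""
    angles = {"pan": pan_deg, "roll": roll_deg, "tilt": tilt_deg}
    # Assumption: limits are symmetric, so compare absolute values.
    return [name for name, limit in ORIENTATION_LIMITS.items()
            if abs(angles[name]) > limit]


frontal = orientation_warnings(0, 0, 0)       # []
rolled = orientation_warnings(10, -60, 0)     # ["roll"]
turned = orientation_warnings(95, 0, 50)      # ["pan", "tilt"]
```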

Image of a spotlight shining in the dark.

Lighting:

Poorly illuminated faces might not be detected.

Image of a person holding a smart phone in front of her face.

Occlusion:

Partially hidden or obstructed faces might not be detected.

Image of a blurry face.

Blur:

Blurry faces might not be detected.

Image of a busy city street with motion blur.

Motion (in video):

Rapid movement between frames might degrade performance.

Trade-offs

Models sometimes perform poorly under particular circumstances. This section describes situations in which the model may perform less than optimally, so that you can plan accordingly.

Image Resolution vs. Latency:

Latency increases proportionally with image pixel count. Processing time for a 400x400 image will be ~4x that of a 200x200 image.
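The scaling rule can be checked with a few lines of arithmetic. This sketch only compares pixel counts; the function name is hypothetical, and it gives a ratio relative to a reference image, not an absolute latency.

```python
def relative_latency(width: int, height: int, ref_width: int, ref_height: int) -> float:
    """Expected processing time relative to a reference image,
    assuming latency is proportional to pixel count as the card states."""
    return (width * height) / (ref_width * ref_height)


# 400x400 has 160,000 pixels and 200x200 has 40,000, so the ratio is 4.
ratio = relative_latency(400, 400, 200, 200)  # 4.0
```

Doubling both dimensions quadruples the pixel count, which is why downscaling large images before detection can reduce latency considerably, at the cost of making small faces smaller.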

Performance

Here you can dig into the model's performance on a selection of evaluation datasets drawn from different data sources than the training data. You can assess model performance across variables such as face size and facial orientation, as well as human-perceived skin tone, gender presentation, and age. Annotations for demographic variables were made by humans and used purely for testing; the model cannot detect them.

Summary

  • Area under the P-R curve (PR-AUC) is 0.84 (Open Images subset), 0.92 (Face Detection Dataset and Benchmark), and 0.94 (Labeled Faces in the Wild).
  • Face size, facial orientation, and degree of occlusion all have a significant impact on model performance, with the model performing least well on faces that appear large (>25% of the image area), are looking to the left or right, and/or are obstructed in some way.
  • Disparities in recall are relatively small (< 3% gap) for all human-annotated demographic variables evaluated (perceived skin tone, gender presentation, age).
  • You can further explore model performance by following steps 1-3 listed below, or download the raw data underlying all performance charts here.

P-R Curves

  1. Choose an evaluation dataset.

    • Data Source

      FACES is a face detection evaluation dataset sampled from Open Images Dataset V4 by Google researchers. Learn more about Open Images V4 Dataset.

    • Source of Labels

      • Head bounding boxes, facial landmarks, and demographic annotations provided by paid human annotators.
      • Demographic annotations (perceived age, gender presentation, and skin tone) and occlusion were assigned by human annotators, not by the people portrayed in the images or the model under evaluation.
      • Other facial characteristics, like face size and facial orientation, were derived algorithmically.
    • Data Snapshot

      94,866
      Number of images
      178,325
      Number of faces labelled
      1.9
      Average number of faces per image

  2. Select variables for analysis.

    About Face Size

    Face size represents how much of an image is taken up by a face. Face size was estimated based on the number of pixels within the human-annotated bounding box.

    Note: Performance results are shown for categories that have more than 400 instances in the evaluation dataset.

    Data Distribution: Face Size

    [Bar chart: number of faces in each face-size category (Extreme Close Up, Close Up, Selfie, Waist up, Small Group, Crowd). Axis: Number of Faces, with ticks at 0, 42,982, 85,963, and 128,945.]

    Extreme Close Up

    Face takes up > 50% of the image area.

    Close Up

    Face takes up 25-50% of the image area.

    Selfie

    Face takes up 10-25% of the image area.

    Waist up

    Face takes up 5-10% of the image area.

    Small group

    Face takes up 1-5% of the image area.

    Crowd

    Face takes up less than 1% of the image area.
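The six categories above can be sketched as a lookup on the share of the image area covered by the face's bounding box. The function name is hypothetical, and the handling of values that fall exactly on a boundary (for example, exactly 25%) is an assumption, since the listed ranges overlap at their endpoints.

```python
def face_size_category(face_area_px: float, image_area_px: float) -> str:
    """Map the face's share of the image area to the card's size categories."""
    share = face_area_px / image_area_px
    if share > 0.50:
        return "Extreme Close Up"
    if share >= 0.25:  # assumption: a boundary value goes to the larger category
        return "Close Up"
    if share >= 0.10:
        return "Selfie"
    if share >= 0.05:
        return "Waist up"
    if share >= 0.01:
        return "Small Group"
    return "Crowd"


# Made-up examples on a 100x100 image (10,000 pixels).
big = face_size_category(6000, 10000)    # "Extreme Close Up"
medium = face_size_category(1500, 10000) # "Selfie"
tiny = face_size_category(50, 10000)     # "Crowd"
```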

  3. View performance results for your selections.

    Recall for Face Size

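    [Bar chart: recall by face-size category. Recall values, in chart order: 0.93, 0.93, 0.94, 0.93, 0.91, 0.93. Category labels, in chart order: Close Up, Small Group, Selfie, Waist up, Extreme Close Up, Crowd.]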

    Recall Vs Decision Threshold Curve

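    [Line chart: recall versus decision threshold, one curve per face-size category (Close Up, Small Group, Selfie, Waist up, Extreme Close Up, Crowd). Axes: Decision Threshold (0 to 100%) and Recall (0 to 100%).]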
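A recall versus decision threshold curve can be traced by counting, at each threshold, how many annotated faces still have a matching detection. This is a minimal sketch: the function name and input format are assumptions, with each annotated face represented by the confidence of its matched detection, or `None` if the model produced no match.

```python
from typing import List, Optional


def recall_at_threshold(match_confidences: List[Optional[float]], threshold: float) -> float:
    """Fraction of annotated faces whose matched detection meets the threshold.

    match_confidences: one entry per annotated face; None means no detection
    was matched to that face at any confidence.
    """
    detected = sum(1 for c in match_confidences if c is not None and c >= threshold)
    return detected / len(match_confidences)


# Made-up data: four annotated faces, one never detected.
confidences = [0.9, 0.6, None, 0.3]
curve = [(t, recall_at_threshold(confidences, t)) for t in (0.0, 0.5, 0.95)]
# [(0.0, 0.75), (0.5, 0.5), (0.95, 0.0)]
```

Raising the threshold can only keep recall the same or lower it, which is why these curves slope downward from left to right.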

    Face Size Performance Summary

    Recall varies significantly across face sizes, with better performance on crowd and group photos and worse performance on faces that appear large in an image (extreme close-ups and close-ups).

Test your own images

See how the model works on your own image here (we will not keep a copy).

Feedback

We’d love your feedback on the information presented in this card and/or the framework we’re exploring here for model reporting. Please also share any unexpected results. Your feedback will be shared with model and service owners. Click here to provide feedback.