Face Detection
The model analyzed in this card detects one or more faces within an image or a video frame, and returns a box around each face along with the location of the faces' major landmarks. The model's goal is exclusively to identify the existence and location of faces in an image. It does not attempt to discover identities or demographics.
On this page, you can learn more about how well the model performs on images with different characteristics, including face demographics, and what kinds of images you should expect the model to perform well or poorly on.
Model description
Input: Photo(s) or video(s)
Output: For each face detected in a photo or video, the model outputs:
- Bounding box coordinates
- Facial landmarks (up to 34 per face)
- Facial orientation (roll, pan, and tilt angles)
- Detection and landmarking confidence scores.
No identity or demographic information is detected.
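As a rough illustration of the output fields listed above, a per-face result could be modelled as follows. The class and field names (`BoundingBox`, `FaceDetection`, `roll_deg`, and so on) are hypothetical and are not taken from the public API. Only the kinds of fields and the 34-landmark upper bound come from this card, and the [0, 1] range for confidence scores is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

MAX_LANDMARKS = 34  # upper bound per face stated in this card


@dataclass
class BoundingBox:
    # Pixel coordinates of the box edges (hypothetical layout).
    left: float
    top: float
    right: float
    bottom: float


@dataclass
class FaceDetection:
    # Hypothetical container for one detected face.
    box: BoundingBox
    landmarks: List[Tuple[float, float]] = field(default_factory=list)
    roll_deg: float = 0.0
    pan_deg: float = 0.0
    tilt_deg: float = 0.0
    detection_confidence: float = 0.0
    landmarking_confidence: float = 0.0

    def __post_init__(self) -> None:
        if len(self.landmarks) > MAX_LANDMARKS:
            raise ValueError("at most 34 landmarks per face")
        # Assumption: confidence scores lie in [0, 1].
        for score in (self.detection_confidence, self.landmarking_confidence):
            if not 0.0 <= score <= 1.0:
                raise ValueError("confidence score outside [0, 1]")
```

Note that the structure has no field for identity or demographics, matching the statement above.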
Model architecture: MobileNet CNN fine-tuned for face detection with a Single Shot MultiBox Detector (SSD).
View public API documentation
Performance
Overall model performance was assessed, along with performance sliced by different image and face characteristics, including:
- Derived characteristics (face size, facial orientation, and occlusion)
- Face demographics (human-perceived gender presentation, age, and skin tone)
Overall performance is measured with Precision-Recall (PR) values and Area Under the PR Curve (PR-AUC), which are standard metrics for evaluating computer vision classifiers. Download raw performance results data here.
Disaggregated performance is measured with Recall, which captures how often the model misses faces with specific characteristics. Equal recall across subgroups corresponds to the “Equality of Opportunity” fairness criterion.
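To make the disaggregated metric concrete, here is a minimal sketch of per-subgroup recall and the largest gap between subgroups. The function names and the toy data are invented for illustration; this is not the evaluation code behind this card.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def recall_by_group(faces: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """faces: (subgroup label, was this ground-truth face detected?) pairs."""
    found = defaultdict(int)
    total = defaultdict(int)
    for group, detected in faces:
        total[group] += 1
        if detected:
            found[group] += 1
    # Recall = detected ground-truth faces / all ground-truth faces, per group.
    return {group: found[group] / total[group] for group in total}


def max_recall_gap(recalls: Dict[str, float]) -> float:
    """Largest difference in recall between any two subgroups."""
    return max(recalls.values()) - min(recalls.values())


# Made-up toy data: group "A" has 9 of 10 faces detected, group "B" 8 of 10.
toy = [("A", True)] * 9 + [("A", False)] + [("B", True)] * 8 + [("B", False)] * 2
recalls = recall_by_group(toy)
gap = max_recall_gap(recalls)
```

A gap of zero would mean every subgroup's faces are missed equally often, which is the condition the fairness criterion describes.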
Performance evaluated on: Three research benchmarks distinct from the training set: a subset of Open Images, the Face Detection Dataset and Benchmark, and Labeled Faces in the Wild.
See Performance section for details.
Limitations
The following factors may degrade the model’s performance.
Face Size:
Depending on image resolution, faces that are distant from the camera (a pupillary distance of < 10px) might not be detected. Not designed for estimating the size of a crowd.
Faces greater than 90% of image height or width might not be detected.
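The two size limits above can be turned into a simple pre-check, for example when deciding whether a missed detection is expected. This is a sketch under stated assumptions: the function and its parameters are hypothetical, the eye positions and face extent are assumed to come from your own ground-truth annotations, and only the 10 px and 90% thresholds come from this card.

```python
import math
from typing import List, Tuple

MIN_PUPILLARY_DISTANCE_PX = 10.0  # threshold stated in this card
MAX_FACE_FRACTION = 0.90          # threshold stated in this card


def face_size_warnings(
    left_eye: Tuple[float, float],
    right_eye: Tuple[float, float],
    face_w: float,
    face_h: float,
    image_w: float,
    image_h: float,
) -> List[str]:
    """Return reasons why a face of this size might not be detected."""
    warnings = []
    # Pupillary distance: straight-line distance between the eyes, in pixels.
    pupillary = math.hypot(right_eye[0] - left_eye[0], right_eye[1] - left_eye[1])
    if pupillary < MIN_PUPILLARY_DISTANCE_PX:
        warnings.append("too_small")
    # Face spanning more than 90% of the image width or height.
    if face_w > MAX_FACE_FRACTION * image_w or face_h > MAX_FACE_FRACTION * image_h:
        warnings.append("too_large")
    return warnings
```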
Facial Orientation:
Needs visible facial landmarks such as eyes, noses, and mouths to work correctly. Faces that are looking away from the camera (pan > 90°, roll > 45°, or tilt > 45°) might not be detected.
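The angle limits above could be checked as in the following sketch. The function name is hypothetical, and treating the limits as symmetric (comparing absolute values, so that a pan of -100° counts the same as +100°) is an assumption that this card does not state.

```python
MAX_PAN_DEG = 90.0   # limits stated in this card
MAX_ROLL_DEG = 45.0
MAX_TILT_DEG = 45.0


def orientation_out_of_range(roll_deg: float, pan_deg: float, tilt_deg: float) -> bool:
    """True if any angle exceeds the limit beyond which faces might be missed."""
    # Assumption: limits apply to the magnitude of each angle.
    return (
        abs(pan_deg) > MAX_PAN_DEG
        or abs(roll_deg) > MAX_ROLL_DEG
        or abs(tilt_deg) > MAX_TILT_DEG
    )
```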
Lighting:
Poorly illuminated faces might not be detected.
Occlusion:
Partially hidden or obstructed faces might not be detected.
Blur:
Blurry faces might not be detected.
Trade-offs
Models sometimes exhibit performance issues under particular circumstances. This section describes situations in which the model may perform less than optimally, so that you can plan accordingly.
Image Resolution vs. Latency:
Latency increases proportionally with image pixel count. Processing time for a 400x400 image will be ~4x that of a 200x200 image.
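The arithmetic behind this can be sketched in a few lines. Both helper names are hypothetical, and the code simply assumes latency is exactly proportional to pixel count, as stated above, while ignoring any fixed overhead.

```python
import math


def relative_latency(width: int, height: int, ref_width: int, ref_height: int) -> float:
    """Expected latency of a width x height image relative to a reference image."""
    return (width * height) / (ref_width * ref_height)


def side_scale_for_speedup(speedup: float) -> float:
    """Factor to apply to both sides of an image to cut latency by `speedup`."""
    # Pixel count scales with the square of the side length.
    return 1.0 / math.sqrt(speedup)


ratio = relative_latency(400, 400, 200, 200)  # 160000 / 40000 = 4.0
half_latency_scale = side_scale_for_speedup(2.0)  # about 0.707
```

Under this assumption, halving latency requires shrinking each side to roughly 71% of its length, not 50%. Downscaling also shrinks faces, so it trades against the face-size limitation described earlier.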
Performance
Here you can dig into the model's performance on a selection of evaluation datasets drawn from different data sources than the training data. You can assess model performance across variables such as face size and facial orientation, as well as human-perceived skin tone, gender presentation, and age. Annotations for demographic variables were made by humans and used purely for testing; the model cannot detect them.
Summary
- Area under the P-R curve (PR-AUC) is 0.84 (Open Images subset), 0.92 (Face Detection Dataset and Benchmark), and 0.94 (Labeled Faces in the Wild).
- Face size, facial orientation, and degree of occlusion all have a significant impact on model performance. The model performs least well on faces that appear large (>25% of the image area), are looking to the left or right, or are obstructed in some way.
- Disparities in recall are relatively small (< 3% gap) for all human-annotated demographic variables evaluated (perceived skin tone, gender presentation, age).
- You can further explore model performance by following steps 1-3 listed below, or download the raw data underlying all performance charts here.
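For readers unfamiliar with PR-AUC, the sketch below shows one common way to compute it from a ranked list of detections. This card does not say how the area was integrated. The step-wise sum used here (often called average precision) is an assumption, and the function names and toy data are invented for illustration.

```python
from typing import List, Tuple


def pr_curve(
    detections: List[Tuple[float, bool]], num_ground_truth: int
) -> List[Tuple[float, float]]:
    """detections: (confidence score, matches a ground-truth face?) pairs.

    Returns (precision, recall) after each detection, highest score first.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    points = []
    for _score, is_match in ranked:
        if is_match:
            tp += 1
        else:
            fp += 1
        points.append((tp / (tp + fp), tp / num_ground_truth))
    return points


def pr_auc(points: List[Tuple[float, float]]) -> float:
    """Step-wise area under the PR curve: sum of precision * recall increase."""
    area = 0.0
    prev_recall = 0.0
    for precision, recall in points:
        area += precision * (recall - prev_recall)
        prev_recall = recall
    return area


# Made-up toy data: four detections scored against four ground-truth faces.
toy_detections = [(0.9, True), (0.8, True), (0.7, False), (0.6, True)]
toy_auc = pr_auc(pr_curve(toy_detections, num_ground_truth=4))
```

One ground-truth face is never detected in the toy data, so recall tops out at 0.75 and the area cannot reach 1.0 even though most detections are correct.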
P-R Curves
1. Choose an evaluation dataset.
2. Select variables for analysis.
3. View performance results for your selections.
Test your own images
See how the model works on your own image here (we will not keep a copy).
Feedback
We’d love your feedback on the information presented in this card and/or the framework we’re exploring here for model reporting. Please also share any unexpected results. Your feedback will be shared with model and service owners. Click here to provide feedback.