Object Detection
The model analyzed in this card detects one or more physical objects within an image, from apparel and animals to tools and vehicles, and returns a bounding box, label, and description for each detected object.
On this page, you can learn more about how the model performs on different classes of objects, and what kinds of images you should expect the model to perform well or poorly on.
Model Description

Input: Photo(s) or video(s)
Output: The model can detect 550+ different object classes. For each object detected in a photo or video, the model outputs:
- Object bounding box coordinates
- Knowledge graph ID (“MID”)
- Label description
- Confidence score
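The per-object output above can be modeled as a small record type. The sketch below is illustrative only, not the actual API response class; the `Detection` container and `confident` helper are hypothetical names, and the normalized box layout is an assumption for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Detection:
    """Hypothetical container mirroring the fields returned per detected object."""
    mid: str    # Knowledge Graph ID, e.g. "/m/01yrx"
    label: str  # human-readable label description, e.g. "Cat"
    score: float  # confidence score in [0, 1]
    box: Tuple[float, float, float, float]  # assumed normalized (x_min, y_min, x_max, y_max)

def confident(detections: List[Detection], threshold: float = 0.5) -> List[Detection]:
    """Keep only detections at or above the confidence threshold."""
    return [d for d in detections if d.score >= threshold]
```

In practice you would populate such records from the API response and filter by confidence before acting on the boxes.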
Model architecture: Single-shot detector (SSD) model with a ResNet-101 backbone and a feature pyramid network (FPN) for multi-scale feature maps.
View public API documentation
Performance
Performance is evaluated for specific object classes recognized by the model (e.g. shirt, muffin) and for categories of objects (e.g. apparel, food).
Two performance metrics are reported:
- Average Precision (AP)
- Recall at 60% Precision
Performance is evaluated on two datasets distinct from the training set:
- Open Images Validation set, which contains ~40k images and 600 object classes, of which the model can recognize 518.
- An internal Google dataset of ~5,000 images of consumer products, containing 210 object classes, all of which the model can recognize.
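Recall at 60% Precision can be read off a precision-recall curve: it is the best recall the model achieves at any operating point whose precision is at least 0.60. A minimal sketch, assuming the curve is available as (precision, recall) pairs from a threshold sweep (the function name is an illustrative choice, not part of the card's evaluation code):

```python
from typing import Iterable, Tuple

def recall_at_precision(pr_points: Iterable[Tuple[float, float]],
                        min_precision: float = 0.60) -> float:
    """Largest recall achievable at or above the target precision.

    pr_points: (precision, recall) pairs recorded over a sweep of
    confidence thresholds. Returns 0.0 if no point meets the target.
    """
    candidates = [recall for precision, recall in pr_points
                  if precision >= min_precision]
    return max(candidates) if candidates else 0.0
```

Average Precision (AP) summarizes the same curve differently, as the area under it, so the two metrics can rank classes differently.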
Limitations
The following factors may degrade the model’s performance.
Object size:
Objects must occupy at least 1% of the image area to be detected.
“Things” vs “stuff”:
The model was designed to detect discrete objects with clearly discernible shapes (“things”), not groups of overlapping objects or background clutter (“stuff”).
Lighting:
Poor or harsh, high-contrast illumination (e.g. nighttime, back-lit, side-lit) may degrade model performance.
Occlusion or clutter:
Partially obstructed or truncated objects may not be detected; for example, a shirt underneath a jacket, or an object with less than 25% of its area visible in the image.
Camera positioning and lens type:
Camera angle and positioning (e.g. oblique angles, long-distance), and lens type (e.g. fisheye) may impact model performance.
Blur or noise:
Blurry objects, rapid movement between frames, or encoding/decoding noise may degrade model performance.
Image resolution:
A minimum image resolution of 300,000 pixels (e.g. 640×480) is recommended.
Object type:
Model accuracy varies across different object types (see Performance section).
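Two of the limitations above, minimum object size (~1% of image area) and minimum image resolution (300K pixels), are easy to pre-check before sending an image to the model. A minimal sketch, with illustrative function names and an assumed normalized box format (these helpers are not part of the API):

```python
from typing import Tuple

def image_large_enough(width: int, height: int, min_pixels: int = 300_000) -> bool:
    """Check the recommended minimum resolution of 300K pixels."""
    return width * height >= min_pixels

def box_area_fraction(box: Tuple[float, float, float, float]) -> float:
    """Fraction of the image covered by a normalized (x_min, y_min, x_max, y_max) box.

    Detections covering less than ~0.01 (1% of image area) fall below the
    model's documented size limit and may be unreliable.
    """
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)
```

Such checks cannot guarantee detection, but they flag inputs that the documented limitations say are likely to fail.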
Performance
Here you can learn more about the model's performance on two evaluation datasets drawn from different sources than the training data. You can assess model performance across 500+ different object classes and two different performance metrics: Average Precision (“AP”) and Recall at 60% Precision (“Recall@60%”).
Summary
- Aggregate performance varies across the two evaluation datasets (mAP of 0.43 and Recall@60% Precision of 0.42 on Open Images Validation set; vs. mAP of 0.34 and Recall@60% Precision of 0.36 on the Google Internal test set).
- Performance varies across object classes. For example, based on the Open Images Validation set, the model exhibited higher performance (>70% AP and Recall@60%) for object categories like cars and animals, and lower performance (<30% AP and Recall@60%) for toys, food, and certain appliances.
- The level of granularity within a category significantly affects performance: a model asked to differentiate several different types of food tends to perform worse on standard metrics than a model simply asked to identify all food as a single category.
- P-R curves were generated by recording Precision and Recall values for all decision thresholds between 0.2 and 1.0. The curves appear truncated because no P-R values were generated for thresholds below 0.2.
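The threshold-sweep procedure described above can be sketched as follows: for each confidence threshold, keep the predictions that score at or above it, then compute precision over the kept predictions and recall over all ground-truth objects. The function name and input shapes are illustrative assumptions, not the card's actual evaluation code:

```python
from typing import Iterable, List, Tuple

def pr_curve(scored_matches: Iterable[Tuple[float, bool]],
             thresholds: Iterable[float],
             num_gt: int) -> List[Tuple[float, float, float]]:
    """Compute (threshold, precision, recall) points for a sweep.

    scored_matches: (confidence, is_true_positive) per predicted box.
    num_gt: total number of ground-truth objects.
    """
    points = []
    scored = list(scored_matches)
    for t in sorted(thresholds):
        kept = [match for score, match in scored if score >= t]
        if not kept:
            continue  # no surviving predictions: precision undefined, skip point
        tp = sum(kept)
        precision = tp / len(kept)
        recall = tp / num_gt
        points.append((t, precision, recall))
    return points
```

Sweeping only thresholds from 0.2 to 1.0, as the card describes, is what leaves the plotted curves truncated on the high-recall end.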
P-R Curves
Choose an evaluation dataset and a performance metric to view the corresponding P-R curves.
Test your own images
See how the model works on your own image here (we will not keep a copy).
Feedback
We’d love your feedback on the information presented in this card and/or the framework we’re exploring here for model reporting. Please also share any unexpected results. Your feedback will be forwarded to model and service owners. Click here to provide feedback.