Patent attributes
A computing system is configured to train an object classifier. Monocular image data and ground-truth data are received for a scene. Geometric context is determined, including a three-dimensional camera position relative to a fixed plane. Regions of interest (RoIs) and a set of potential occluders are identified within the image data. For each potential occluder, an occlusion zone is projected onto the fixed plane in three dimensions. A set of occluded RoIs on the fixed plane is generated for each occlusion zone. Each occluded RoI is projected back to the image data in two dimensions. The classifier is trained by minimizing a loss function generated by inputting information regarding the RoIs and the occluded RoIs into the classifier, and by minimizing location errors of each RoI and each occluded RoI of the set on the fixed plane based on the ground-truth data. The trained classifier is then output for object detection.
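The geometric steps above (image to fixed plane, occluded RoIs on the plane, plane back to image) can be illustrated with a minimal sketch. It is not the claimed method. It assumes a pinhole camera with zero pitch and roll at a known height above a flat ground plane, models each occlusion zone as points sampled along the ground ray directly behind the occluder's foot point, and assumes a fixed object size of 0.6 m by 1.7 m. The names `Camera`, `pixel_to_ground`, `ground_to_box`, and `occluded_rois` are hypothetical and do not come from the source.

```python
import math
from dataclasses import dataclass


@dataclass
class Camera:
    # All fields are illustrative assumptions, not values from the source.
    f: float       # focal length in pixels
    cx: float      # principal point, x (pixels)
    cy: float      # principal point, y (pixels); also the horizon row at zero pitch
    height: float  # camera height above the ground plane (metres)


def pixel_to_ground(cam, u, v):
    """Back-project an image point onto the ground plane.

    Returns (x, z): lateral offset and forward distance in metres.
    """
    dv = v - cam.cy
    if dv <= 0:
        raise ValueError("pixel is at or above the horizon; its ray never hits the ground")
    z = cam.f * cam.height / dv
    x = (u - cam.cx) * z / cam.f
    return x, z


def ground_to_box(cam, x, z, obj_w, obj_h):
    """Project an upright obj_w x obj_h object standing at (x, z) to a 2D box
    (u_min, v_min, u_max, v_max) in pixels."""
    u = cam.cx + cam.f * x / z
    v_bottom = cam.cy + cam.f * cam.height / z
    v_top = cam.cy + cam.f * (cam.height - obj_h) / z
    half_w = cam.f * obj_w / (2 * z)
    return (u - half_w, v_top, u + half_w, v_bottom)


def occluded_rois(cam, occluder_box, obj_w=0.6, obj_h=1.7, step=1.0, count=3):
    """Sample ground positions behind an occluder and project them back to 2D.

    The occlusion zone is approximated by `count` points spaced `step` metres
    apart along the ground ray from the camera through the occluder's foot point.
    """
    u1, v1, u2, v2 = occluder_box
    # The bottom-centre of the occluder box is taken as its contact point with the ground.
    x0, z0 = pixel_to_ground(cam, (u1 + u2) / 2, v2)
    r = math.hypot(x0, z0)
    dx, dz = x0 / r, z0 / r  # unit direction pointing away from the camera
    rois = []
    for k in range(1, count + 1):
        x, z = x0 + k * step * dx, z0 + k * step * dz
        rois.append(((x, z), ground_to_box(cam, x, z, obj_w, obj_h)))
    return rois


if __name__ == "__main__":
    cam = Camera(f=1000.0, cx=640.0, cy=360.0, height=1.5)
    for ground_xz, box in occluded_rois(cam, (580.0, 320.0, 700.0, 660.0)):
        print(ground_xz, box)
```

Each returned pair holds a ground-plane position and its 2D box, so a training loop could feed the boxes to the classifier and compare the ground-plane positions against ground-truth locations, in the spirit of the location-error term described above. A real system would replace the zero-pitch assumption with a calibrated camera pose.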