Patent attributes
The disclosure relates to technology for object detection in which a vision system receives training datasets that include a set of two-dimensional (2D) images of an object captured from multiple views. A set of three-dimensional (3D) models is reconstructed from the set of 2D images based on salient points of the object selected during reconstruction, generating one or more salient 3D models of the object, each an aggregation of the salient points of the object across the set of 3D models. A set of training 2D-3D correspondence data is generated between the set of 2D images in a first training dataset of the training datasets and the salient 3D model of the object generated using the first training dataset. A deep neural network is then trained for object detection and segmentation using the set of training 2D-3D correspondence data generated from the first training dataset.
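As a rough illustration only, the Python sketch below shows one way the described pipeline could be wired together: salient 3D points aggregated during reconstruction are projected back into each 2D view to form 2D-3D correspondence pairs, which then supervise a small neural network. The function and class names (project_points, build_correspondences, CorrespondenceNet) and the choice of a simple CNN regressor are assumptions made for this sketch, not the disclosed implementation; the reconstruction step itself (e.g., structure-from-motion) is elided.

```python
# Hypothetical sketch of the training-data pipeline; not the disclosed implementation.
import numpy as np
import torch
import torch.nn as nn

def project_points(points_3d, K, R, t):
    """Project Nx3 salient 3D points into a view with intrinsics K and pose (R, t)."""
    cam = R @ points_3d.T + t.reshape(3, 1)   # world -> camera coordinates (3xN)
    uv = K @ cam                              # camera -> image plane (homogeneous)
    uv = uv[:2] / uv[2]                       # perspective divide
    return uv.T                               # Nx2 pixel coordinates

def build_correspondences(images, poses, K, salient_points_3d):
    """Pair each 2D image with projections of the salient 3D model (2D-3D correspondences)."""
    dataset = []
    for img, (R, t) in zip(images, poses):
        uv = project_points(salient_points_3d, K, R, t)
        dataset.append((img, uv, salient_points_3d))  # (2D view, 2D keypoints, 3D targets)
    return dataset

class CorrespondenceNet(nn.Module):
    """Minimal CNN that regresses the 3D coordinates of the salient points from one image."""
    def __init__(self, num_points):
        super().__init__()
        self.num_points = num_points
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, num_points * 3)

    def forward(self, x):
        return self.head(self.backbone(x)).view(-1, self.num_points, 3)

def train(dataset, num_points, epochs=10):
    """Train the network on the generated 2D-3D correspondence data."""
    net = CorrespondenceNet(num_points)
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for img, _uv, pts_3d in dataset:
            x = torch.from_numpy(img).float().permute(2, 0, 1).unsqueeze(0)
            target = torch.from_numpy(pts_3d).float().unsqueeze(0)
            opt.zero_grad()
            loss = loss_fn(net(x), target)
            loss.backward()
            opt.step()
    return net
```

In this sketch the supervision signal is the set of 3D salient-point coordinates paired with each 2D view; a full system would also use the projected 2D keypoints for segmentation-style losses, which is omitted here for brevity.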