Patent attributes
An image processing system identifies objects within images or video segments. To identify an object within an image, the system identifies one or more regions of the image that contain an object. In some examples, a tracklet is used to track an object through a plurality of image frames within a video segment, allowing more than one image frame to be used in object detection and thereby increasing detection accuracy. Various embodiments utilize a deep-learning-based object detection framework and a similar-object search framework that models the correlations present between various object categories. The system determines a category for each detected object using a hierarchical tree of categories to learn the visual similarities between various object categories. The hierarchical tree is estimated by analyzing the errors of an object detector that does not use any correlation between the object categories.
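One way to picture how a category tree might be estimated from detector errors is to treat the confusion counts of a correlation-free detector as a similarity signal and merge the most frequently confused categories first. The sketch below is illustrative only: the abstract does not specify a clustering method, so the symmetrized confusion score, the average-linkage merging, the function names `confusion_similarity` and `build_hierarchy`, and the confusion counts are all assumptions made for the example.

```python
from itertools import combinations


def confusion_similarity(confusion):
    """Turn a confusion-count matrix into symmetric pairwise similarities.

    confusion[i][j] is the number of class-i objects the baseline
    detector labeled as class j (hypothetical input format).
    """
    n = len(confusion)
    sim = {}
    for i, j in combinations(range(n), 2):
        row_i = sum(confusion[i]) or 1  # guard against empty rows
        row_j = sum(confusion[j]) or 1
        # Rate at which i is mistaken for j, plus the reverse rate.
        sim[(i, j)] = confusion[i][j] / row_i + confusion[j][i] / row_j
    return sim


def build_hierarchy(labels, confusion):
    """Greedy average-linkage merging; returns the tree as nested tuples."""
    sim = confusion_similarity(confusion)
    # Each cluster is (subtree, indices of the categories it contains).
    clusters = [(labels[i], (i,)) for i in range(len(labels))]

    def linkage(a, b):
        pairs = [(min(i, j), max(i, j)) for i in a[1] for j in b[1]]
        return sum(sim[p] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two clusters the detector confuses most often.
        a, b = max(combinations(clusters, 2), key=lambda ab: linkage(*ab))
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(((a[0], b[0]), a[1] + b[1]))
    return clusters[0][0]


# Made-up confusion counts for four categories (rows: true class).
labels = ["cat", "dog", "car", "truck"]
confusion = [
    [80, 15, 3, 2],
    [18, 78, 2, 2],
    [1, 2, 85, 12],
    [2, 1, 14, 83],
]
tree = build_hierarchy(labels, confusion)
print(tree)  # (('cat', 'dog'), ('car', 'truck'))
```

In this toy input the detector confuses cats with dogs and cars with trucks far more often than it confuses animals with vehicles, so the visually similar categories end up as siblings in the resulting tree.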