Patent attributes
Devices and techniques are generally described for weakly-supervised object segmentation in image data. In various examples, a first frame of image data may be received. The first frame may include a first bounding box surrounding a first set of pixels, wherein a first subset of pixels of the first set of pixels represents a first object of a first class and wherein a second subset of pixels of the first set of pixels represents background image data. A cross-entropy loss may be determined for the first set of pixels. In some examples, a spatial attention map may be determined for the first set of pixels. In further examples, parameters of a convolutional neural network may be determined by modulating the cross-entropy loss for the first set of pixels using the spatial attention map. The convolutional neural network may be used to generate a segmentation map.
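The modulation step described above can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes that every pixel inside the bounding box is weakly labeled with the box's class, that the spatial attention map holds per-pixel weights in [0, 1], and that "modulating" means weighting each pixel's cross-entropy by its attention value and normalizing by the total attention. The function names, the `eps` parameter, and the normalization are hypothetical choices made for illustration.

```python
import numpy as np


def pixelwise_cross_entropy(logits, labels):
    """Per-pixel cross-entropy.

    logits: float array of shape (H, W, C), unnormalized class scores.
    labels: int array of shape (H, W), class index per pixel.
    Returns a float array of shape (H, W).
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability of the labeled class at each pixel.
    picked = np.take_along_axis(log_probs, labels[..., None], axis=-1)
    return -picked[..., 0]


def attention_modulated_loss(logits, box_labels, attention, eps=1e-8):
    """Attention-weighted mean of the per-pixel cross-entropy (assumed form).

    box_labels: weak labels taken from the bounding box, shape (H, W).
    attention: per-pixel weights in [0, 1], shape (H, W). Low values
        down-weight pixels that are likely background inside the box.
    """
    ce = pixelwise_cross_entropy(logits, box_labels)
    return (attention * ce).sum() / (attention.sum() + eps)


# Hypothetical 2x2 crop inside a bounding box, two classes
# (0 = background, 1 = object). All pixels carry the weak label 1.
logits = np.zeros((2, 2, 2))
logits[0, 0] = [5.0, 0.0]  # this pixel looks strongly like background
box_labels = np.ones((2, 2), dtype=int)

uniform = np.ones((2, 2))
focused = np.array([[0.0, 1.0], [1.0, 1.0]])  # ignore the background-like pixel

loss_uniform = attention_modulated_loss(logits, box_labels, uniform)
loss_focused = attention_modulated_loss(logits, box_labels, focused)
```

In this sketch, lowering the attention on the background-like pixel reduces its contribution to the loss, so mislabeled background pixels inside the box would pull less on the network parameters during training.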