Segmentation first breaks the images into segments or regions, with the segments of the region having text or symbols. The segmented image is separately applied to two different CNN-based models. Each model produces text boxes where potential text might exist. Then, a selective NMS algorithm is applied to the output of each model to produce a final group of text regions. These text regions are analyzed and actions are taken.