Computer Vision

Computing field for recognizing information from images and videos

Computer vision is a field of computer science concerned with building applications that extract relevant information from visual imagery by training with known models. It covers camera imaging geometry, image formation, depth perception, feature detection, matching, motion estimation and tracking, and classification.

Computer vision is a multidisciplinary field that could be called a subfield of artificial intelligence and machine learning. As a result, it borrows and reuses techniques from a range of disparate engineering and computer science fields. Computer vision methods and systems are highly application dependent. Some systems stand alone to solve specific measurement or detection problems. Others are subsystems of a larger design, working within the larger system for control of mechanical actuators, planning, information databases, and man-machine interfaces. The specific implementation of a computer vision subsystem also depends on its intended functionality.

Early experiments in computer vision started in the 1950s. By the 1970s, computer vision was being used commercially to distinguish between typed and handwritten text. Since then, the applications for computer vision have grown.

Applications of computer vision


3D modeling

Computer vision can be used for 3D modeling of objects or environments, including medical image analysis or topographical analysis.

Automotive safety

Computer vision is being used in automotive safety systems, for example to detect driver drowsiness or to prevent possible collisions.

Autonomous vehicles

Autonomous vehicles, including cars, trucks, submersibles, drones, robots, and other land-based or unmanned vehicles, can use computer vision to operate fully autonomously. In these systems, computer vision can be used for navigation, environment mapping, obstacle warning, and the detection of task-specific events.

Facial recognition

Computer vision can be used to recognize faces in security systems, for surveillance and tracking, or in biometric systems based on facial recognition.

Gaming and controls

Computer vision systems can be used in virtual reality and game control systems, letting people play games in a more immersive way and without hand-held controls.

Machine vision

Computer vision systems in manufacturing processes, often called machine vision, can be used to automatically inspect products for quality control. Machine vision is also used for robotic systems on automated manufacturing lines and, in agricultural processes, to remove undesirable foodstuffs.


Medicine

In medicine, computer vision can be used for image processing to extract data and help diagnose patients. This includes detecting tumors and arteriosclerosis and measuring organs and blood flow. Computer vision can also support medical research by providing new information, for example by enhancing images to reduce the influence of noise.


Military

Military applications of computer vision include the detection of enemy soldiers or vehicles. More advanced missile guidance systems use computer vision to select targets from locally acquired image data. Computer vision systems are also being tested to process large amounts of data and reduce battlefield complexity.

Motion capture

In visual effects creation for cinema, broadcast, or video games, computer vision can aid motion capture systems and camera tracking.

Optical character recognition

Optical character recognition uses computer vision to recognize typed or handwritten text. It can also be used to extract character-based information for indexing databases of text, images, and image sequences.


Retail

Computer vision systems can be used to automate retail checkout by identifying the shopper and the items in their cart as they leave the store and charging them automatically; such systems are being piloted in Amazon Go stores.


Surveillance

Computer vision systems can be used in surveillance for the detection of people, objects, or out-of-place objects in a given space.

Tactile feedback

Small sensors using computer vision have been used in tactile feedback systems that let robotic hands sense when they are gripping or touching specific surfaces.

Computer vision recognition

In computer vision, image processing, and machine vision, recognition is the task of determining whether image data contains a specific object, feature, or activity. Recognition tasks include image classification or identification, object localization, and object detection.

The best algorithms for recognition tasks are based on convolutional neural networks. Even these algorithms struggle with objects that are small or thin, such as an ant on a flower stem or a person holding a narrow pen, and they have trouble with images that have been distorted by filters.
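The core operations such networks stack can be sketched in plain Python. This is a minimal illustration, not a trained model: the kernel in the usage lines is a hand-picked horizontal-gradient filter (an assumption for the example), whereas a real CNN learns its kernel weights from data and uses many channels and layers. The function names `conv2d`, `relu`, and `max_pool2x2` are illustrative.

```python
def conv2d(image, kernel):
    """Valid (unpadded) 2D cross-correlation, the operation CNN layers call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Element-wise max(0, x) nonlinearity."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """2x2 max pooling with stride 2 (a trailing odd row or column is dropped)."""
    return [
        [
            max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
            for j in range(0, len(fmap[0]) - 1, 2)
        ]
        for i in range(0, len(fmap) - 1, 2)
    ]


# Usage: a vertical dark-to-bright edge and a hand-picked (hypothetical) edge kernel.
image = [[0, 0, 1, 1, 1] for _ in range(4)]
kernel = [[-1, 1], [-1, 1]]
features = max_pool2x2(relu(conv2d(image, kernel)))
```

The convolution responds only where brightness increases from left to right, and pooling keeps the strongest response in each 2x2 block while halving the resolution.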

Specialized recognition tasks


2D code reading

This task works to read 2D codes such as Data Matrix or QR codes.

Content-based image retrieval

This task asks a computer vision system to find all images in a larger set of images which contain specific content. The content can be specified in different ways, such as in terms of a target image or in terms of high-level search criteria.
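One simple way to specify content "in terms of a target image" is to compare global intensity histograms. The sketch below assumes grayscale images stored as lists of rows with values from 0 to 255, a coarse 4-bin histogram, and histogram intersection as the similarity score. These choices are illustrative assumptions; practical systems usually rely on richer color, texture, or learned features.

```python
def gray_histogram(image, bins=4, max_value=256):
    """Normalized intensity histogram of a grayscale image (list of rows)."""
    counts = [0] * bins
    for row in image:
        for v in row:
            counts[v * bins // max_value] += 1
    total = sum(counts)
    return [c / total for c in counts]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: the overlapping mass of two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def retrieve(query, database, top_k=2):
    """Return the names of the top_k database images most similar to the query image."""
    q = gray_histogram(query)
    scores = {name: histogram_intersection(q, gray_histogram(img)) for name, img in database.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Usage with tiny hypothetical 2x2 images.
database = {
    "dark": [[10, 20], [30, 40]],
    "bright": [[220, 230], [240, 250]],
    "mixed": [[10, 20], [240, 250]],
}
query = [[15, 25], [35, 45]]
results = retrieve(query, database)
```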

Facial recognition

This task works to recognize specific faces in images, and can be used for biometric systems.

Optical character recognition

This task works to identify characters in images of printed or handwritten text, usually to encode text in a format more amenable to editing or indexing.

Pose estimation

This task works to estimate the position or orientation of a specific object relative to the camera. For example, pose estimation can help a robot arm retrieve objects from an assembly line or pick parts from a bin.
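For the planar case, the position and orientation of an object can be recovered from matched points by least squares. The sketch below is a simplified 2D version (a 2D Procrustes fit). It assumes that the object's model points and their observed image positions are already matched one to one. Full pose estimation relative to a camera works in 3D and must also account for the camera projection.

```python
import math


def estimate_pose_2d(model_pts, observed_pts):
    """Least-squares rotation angle and translation mapping model points onto observed points."""
    n = len(model_pts)
    # Centroids of both point sets.
    mx = sum(p[0] for p in model_pts) / n
    my = sum(p[1] for p in model_pts) / n
    ox = sum(q[0] for q in observed_pts) / n
    oy = sum(q[1] for q in observed_pts) / n
    # Accumulate dot and cross products of the centered pairs.
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(model_pts, observed_pts):
        px, py, qx, qy = px - mx, py - my, qx - ox, qy - oy
        dot += px * qx + py * qy
        cross += px * qy - py * qx
    theta = math.atan2(cross, dot)  # optimal rotation angle
    c, s = math.cos(theta), math.sin(theta)
    # Translation moves the rotated model centroid onto the observed centroid.
    tx = ox - (c * mx - s * my)
    ty = oy - (s * mx + c * my)
    return theta, (tx, ty)


# Usage: the model is rotated 90 degrees and shifted by (5, 3) in the observation.
model = [(0, 0), (2, 0), (0, 1)]
observed = [(5, 3), (5, 5), (4, 3)]
theta, (tx, ty) = estimate_pose_2d(model, observed)
```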

Shape recognition technology

This task works to distinguish objects by their shape; it is often used to distinguish human beings from other objects.

Image classification

In image classification, a computer vision model is tasked with sorting images into distinct categories. The model is trained on labelled examples of each class so that the learning algorithm can learn the visual appearance of each class.
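A nearest-centroid classifier is a very small instance of "training with examples of each class." Training averages the feature vectors of each class, and a new image receives the label of the closest average. In the sketch below, the features are assumed to be flattened pixel intensities of tiny images, and the class names are hypothetical. Modern systems learn features with convolutional neural networks instead.

```python
def train_centroids(examples):
    """examples: list of (feature_vector, label). Returns label -> mean feature vector."""
    sums, counts = {}, {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(features, centroids):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((f - c) ** 2 for f, c in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Usage: flattened 2x2 images labelled with hypothetical classes.
examples = [
    ([0, 0, 0, 0], "dark"),
    ([10, 10, 10, 10], "dark"),
    ([250, 240, 255, 245], "bright"),
    ([230, 250, 240, 250], "bright"),
]
centroids = train_centroids(examples)
label = classify([20, 15, 5, 10], centroids)
```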

Image classification is used to automate tasks. These include labelling an image with tags, locating an object in an image, or serving as part of a larger system, such as one guiding an autonomous car. Image classification is also used in surveillance systems to detect threats, camera occlusions, and emergency situations. It has been used in facial recognition systems for biometrics, including iris recognition, and in robotics for automated systems.

Object localization

In object localization, a computer vision system is trained to locate objects within images, outputting bounding boxes and labels for individual objects. Typically a single dominant object is detected and enclosed in a bounding box. Classification is then applied to the image content inside the box, as distinct from everything outside it. Localization can be used to find cars within an image for autonomous driving models, or in surveillance to identify specified objects in images.
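Once the pixels of the dominant object are known, its bounding box is simply the extent of those pixels. The sketch below assumes that a binary foreground mask is already available and that the label comes from a separate classifier (not shown). `localize` and the returned dictionary layout are hypothetical names used for illustration. Learned localizers instead regress the box coordinates directly.

```python
def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the nonzero pixels in a binary mask, or None."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not ys:
        return None
    xs = [x for row in mask for x, v in enumerate(row) if v]
    return (min(xs), min(ys), max(xs), max(ys))


def localize(mask, label):
    """Pair the object's box with a label supplied by a (hypothetical) classifier."""
    box = bounding_box(mask)
    return None if box is None else {"label": label, "box": box}


# Usage: a small mask whose foreground object occupies rows 1-2, columns 1-2.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
result = localize(mask, "car")
```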

Object detection

In object detection, a computer vision system processes image data to find a specific condition. Examples include the detection of possibly abnormal cells or tissues in medical images and the detection of a vehicle in an automatic road toll system. Detection based on simple, fast computations is sometimes used to find smaller regions of interesting image data, which can then be analyzed by more computationally demanding techniques to produce a correct interpretation.
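The "simple and fast computations first" idea can be sketched as a two-stage cascade. Stage one uses an integral image (summed-area table), so the mean brightness of any square window costs four lookups. Only windows that pass this cheap test reach a costlier second stage. The brightness threshold and the `expensive_check` callback are hypothetical placeholders for whatever detector an actual system would run.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x]
    return ii


def window_sum(ii, x, y, size):
    """Sum of the size x size window with top-left corner (x, y), in constant time."""
    return ii[y + size][x + size] - ii[y][x + size] - ii[y + size][x] + ii[y][x]


def cascade_detect(img, size, min_mean, expensive_check):
    """Return top-left corners of windows that pass both the cheap and the expensive stage."""
    ii = integral_image(img)
    hits = []
    for y in range(len(img) - size + 1):
        for x in range(len(img[0]) - size + 1):
            # Stage 1: cheap mean-brightness test rejects most windows.
            if window_sum(ii, x, y, size) / (size * size) < min_mean:
                continue
            # Stage 2: costlier analysis runs only on the surviving candidates.
            window = [row[x:x + size] for row in img[y:y + size]]
            if expensive_check(window):
                hits.append((x, y))
    return hits


# Usage: a bright 2x2 blob; the stand-in second stage requires every pixel to be nonzero.
img = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]
hits = cascade_detect(img, 2, 5, lambda w: all(v > 0 for row in w for v in row))
```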

Newer object detection models include the You Only Look Once (YOLO) model. It uses a single neural network, trained end to end, that takes a photograph as input and predicts bounding boxes and a class label for each bounding box. Another is the Fast R-CNN (Fast Region-based Convolutional Neural Network) model, which improved the speed of both training and detection. Its successor, Faster R-CNN, is designed to both propose and refine region proposals as part of the training process. These improvements reduce the number of region proposals and accelerate the test-time operation of the model.
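Detectors such as YOLO and the R-CNN family typically predict several overlapping boxes for the same object. A common post-processing step is non-maximum suppression (NMS) based on intersection over union (IoU). The sketch below is a generic greedy NMS, not code taken from either model. Boxes are assumed to be `(x1, y1, x2, y2)` corner coordinates, and the 0.5 threshold is a typical but arbitrary choice.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """detections: list of (box, score, label). Greedily keep the highest-scoring box
    and drop lower-scoring boxes of the same label that overlap it too much."""
    kept = []
    for box, score, label in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(k[2] != label or iou(box, k[0]) < iou_threshold for k in kept):
            kept.append((box, score, label))
    return kept


# Usage: two near-duplicate car boxes, one separate car, and an overlapping person.
detections = [
    ((0, 0, 10, 10), 0.9, "car"),
    ((1, 1, 10, 10), 0.8, "car"),
    ((20, 20, 30, 30), 0.7, "car"),
    ((2, 2, 8, 8), 0.6, "person"),
]
kept = non_max_suppression(detections)
```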

Object tracking

Object tracking refers to the use of computer vision systems to follow a specific object or multiple objects of interest in a given scene. It has traditionally been used in video and real-world interactions, where observations are made following an initial object detection. Object tracking is useful in autonomous vehicle systems for understanding the direction in which objects around the car are moving. Object tracking methods can be divided by their observation model into generative and discriminative methods.

The generative method uses a generative model to describe the characteristics of the target and searches for the object by minimizing the reconstruction error.
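Here is a deliberately simple stand-in for a generative tracker. The appearance model is a fixed template, the reconstruction error is the sum of squared differences (SSD), and the new position is the one near the previous position with the smallest error. Real generative trackers use richer models, for example appearance subspaces that are updated over time. The search radius and the list-of-rows frame format are assumptions.

```python
def ssd(patch, template):
    """Sum of squared differences: how badly the template 'reconstructs' the patch."""
    return sum((p - t) ** 2 for prow, trow in zip(patch, template) for p, t in zip(prow, trow))


def track_step(frame, template, prev_xy, search_radius=2):
    """Return the top-left position near prev_xy whose patch best matches the template."""
    th, tw = len(template), len(template[0])
    px, py = prev_xy
    best_xy, best_err = prev_xy, float("inf")
    # Only search a small neighbourhood around the previous position.
    for y in range(max(0, py - search_radius), min(len(frame) - th, py + search_radius) + 1):
        for x in range(max(0, px - search_radius), min(len(frame[0]) - tw, px + search_radius) + 1):
            patch = [row[x:x + tw] for row in frame[y:y + th]]
            err = ssd(patch, template)
            if err < best_err:
                best_xy, best_err = (x, y), err
    return best_xy


# Usage: the object (a 2x2 bright block) moved from (2, 2) to (3, 2).
frame = [[0] * 6 for _ in range(6)]
for yy in (2, 3):
    for xx in (3, 4):
        frame[yy][xx] = 9
template = [[9, 9], [9, 9]]
new_xy = track_step(frame, template, (2, 2))
```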

The discriminative method distinguishes between the object and the background. This method is also referred to as tracking-by-detection. To achieve tracking-by-detection, the method uses deep learning to recognize the wanted object among candidates. It does so through two basic network models: stacked autoencoders (SAE) and convolutional neural networks (CNN).

Object tracking tasks



Egomotion

This task works to determine the 3D rigid motion of the camera from an image sequence.

Optical flow

This task works to determine, for each point in an image, how that point is moving relative to the image plane, or its apparent motion.
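The Lucas-Kanade method is a classic way to estimate this apparent motion. It assumes brightness constancy, Ix*u + Iy*v + It = 0, where Ix, Iy, and It are the horizontal, vertical, and temporal intensity derivatives. It then solves for a single flow vector (u, v) over a small window by least squares. The sketch below is a minimal single-window version. It assumes small motion, grayscale frames stored as lists of rows, central-difference gradients, and a window that stays at least one pixel away from the image border.

```python
def lucas_kanade(frame1, frame2, x0, y0, x1, y1):
    """Least-squares flow (u, v) for the window x0..x1, y0..y1 (inclusive)."""
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            ix = (frame1[y][x + 1] - frame1[y][x - 1]) / 2.0  # horizontal gradient
            iy = (frame1[y + 1][x] - frame1[y - 1][x]) / 2.0  # vertical gradient
            it = frame2[y][x] - frame1[y][x]  # temporal difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-9:
        raise ValueError("gradients in the window are degenerate (aperture problem)")
    # Solve [[sxx, sxy], [sxy, syy]] @ [u, v] = -[sxt, syt] with the 2x2 inverse.
    u = (-sxt * syy + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


# Usage: a smooth synthetic pattern shifted half a pixel to the right.
size = 21
frame1 = [[float(x * x + y * y) for x in range(size)] for y in range(size)]
frame2 = [[(x - 0.5) ** 2 + y * y for x in range(size)] for y in range(size)]
u, v = lucas_kanade(frame1, frame2, 5, 5, 15, 15)
```

Because the linearization ignores higher-order terms, the estimate is close to, but not exactly, the true shift of (0.5, 0).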


Tracking

This task works to follow the movements of a set of interest points or objects in the image sequence.


Segmentation

The process of segmentation in computer vision divides images into pixel groupings, which can in turn be labelled and classified. In semantic segmentation in particular, the model works to understand the semantic role of each pixel in the image. It recognizes whether the pixels describe a car, a bike, a person, or a pole, and delineates the boundaries of each object. Unlike image classification, which assigns one label to a whole image, segmentation produces dense pixel-wise predictions.
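Given per-class score maps, such as those a segmentation network outputs, the dense pixel-wise prediction is simply the highest-scoring class at each pixel. The sketch below assumes the scores have already been computed (the network itself is not shown) and uses hypothetical class names.

```python
def segment(score_maps, class_names):
    """score_maps[c][y][x] is the score of class c at pixel (x, y); return a per-pixel label map."""
    h, w = len(score_maps[0]), len(score_maps[0][0])
    return [
        [class_names[max(range(len(class_names)), key=lambda c: score_maps[c][y][x])] for x in range(w)]
        for y in range(h)
    ]


# Usage: two hypothetical classes scored over a 2x2 image.
road = [[0.9, 0.2], [0.8, 0.1]]
car = [[0.1, 0.8], [0.2, 0.9]]
label_map = segment([road, car], ["road", "car"])
```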

Visual representation of segmentation layers.

Most segmentation is done with fully convolutional networks (FCNs), which make dense predictions without fully connected layers. This allows segmentation maps to be generated for images of any size, and faster than with other approaches. FCNs also downsample and then upsample within the network, because operating at the original image resolution throughout would be inefficient. Downsampling is commonly done with strided convolution, and upsampling with transposed convolution.
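The downsampling and upsampling steps can be illustrated in one dimension. The sketch below assumes a single channel, no padding, and fixed example weights, whereas an FCN learns these weights. It shows that a stride-2 convolution shrinks the signal, and that a stride-2 transposed convolution restores the original length. The transposed convolution's output length is (n - 1) * stride + kernel_size.

```python
def strided_conv1d(x, kernel, stride):
    """Unpadded 1D convolution (cross-correlation) with the given stride."""
    k = len(kernel)
    return [
        sum(x[i * stride + j] * kernel[j] for j in range(k))
        for i in range((len(x) - k) // stride + 1)
    ]


def transposed_conv1d(y, kernel, stride):
    """Scatter each input value, scaled by the kernel, into a longer output."""
    k = len(kernel)
    out = [0.0] * ((len(y) - 1) * stride + k)
    for i, v in enumerate(y):
        for j in range(k):
            out[i * stride + j] += v * kernel[j]
    return out


# Usage: halve the resolution with averaging, then upsample back by repetition.
signal = [1, 2, 3, 4, 5, 6, 7, 8]
down = strided_conv1d(signal, [0.5, 0.5], 2)
up = transposed_conv1d(down, [1.0, 1.0], 2)
```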

Instance segmentation

Instance segmentation segments the individual instances of each class, such as labelling five cars with five different colors. It works similarly to semantic segmentation. In image classification, by contrast, there is generally an image with a single object as the focus, and the task is to say what the image is. In instance segmentation, a computer vision system locates the objects in the image with bounding boxes, classifies the different objects, identifies their boundaries, and understands their relations to each other.
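Suppose a model has already produced a semantic mask for one class. The simplest way to split that mask into instances is connected-component labeling: each separate blob gets its own id, the analogue of giving five cars five colors. This classical sketch is a stand-in and does not reflect how learned instance segmentation models work. It would also merge instances that touch. It assumes a binary mask stored as lists of rows and 4-connectivity.

```python
from collections import deque


def label_instances(mask):
    """Give each 4-connected region of foreground pixels its own integer id (0 = background)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_id = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and not labels[sy][sx]:
                # Start a new instance and flood-fill it with breadth-first search.
                next_id += 1
                labels[sy][sx] = next_id
                queue = deque([(sx, sy)])
                while queue:
                    x, y = queue.popleft()
                    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                        if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = next_id
                            queue.append((nx, ny))
    return labels, next_id


# Usage: a "car" mask containing three separate blobs.
mask = [
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 1],
]
labels, count = label_instances(mask)
```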

Scene reconstruction

Given one or more images of a scene, or a video, scene reconstruction aims to compute a 3D model of the scene. In the simplest case, the model can be a set of 3D points; other methods produce complete 3D surface models. The advent of 3D imaging that does not require motion or scanning, along with related image processing algorithms, has enabled advances in scene reconstruction. Using scene reconstruction, a digital version of real-world objects can be developed.
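In the simplest case, a set of 3D points, depth can come from a rectified stereo pair. With focal length f (in pixels), baseline B, and disparity d, depth is Z = f * B / d, and each pixel is back-projected through a pinhole camera model. The sketch below assumes such a rectified setup with a known principal point (cx, cy). Computing the disparity map itself (stereo matching) is not shown.

```python
def disparity_to_points(disparity, focal_px, baseline, cx, cy):
    """Convert a rectified-stereo disparity map into a list of 3D points (X, Y, Z)."""
    points = []
    for v, row in enumerate(disparity):
        for u, d in enumerate(row):
            if d <= 0:
                continue  # no valid stereo match for this pixel
            z = focal_px * baseline / d  # depth from disparity
            # Back-project pixel (u, v) through the pinhole model.
            points.append(((u - cx) * z / focal_px, (v - cy) * z / focal_px, z))
    return points


# Usage: a tiny hypothetical disparity map with one invalid pixel.
points = disparity_to_points([[10, 0], [20, 5]], focal_px=100.0, baseline=0.5, cx=0.0, cy=0.0)
```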

Image restoration

Within computer vision, image restoration is a family of inverse problems for obtaining a high-quality image from a corrupted input image. The corruption may occur during image capture (from signal noise or lens blur), during post-processing (from file compression), or from photography in non-ideal conditions (such as haze or motion blur). Computer vision systems restore images by analyzing the image data in terms of local image structures, such as lines or edges, and controlling the filtering based on these structures. This removes noise better than simpler approaches.
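Structure-controlled filtering can be sketched with a simple sigma-style filter. Each pixel is averaged only with neighbors whose values lie within a threshold of its own, so smoothing happens within regions but not across strong edges. The threshold, the 3 x 3 window, and the function name are illustrative assumptions. This is one elementary example, not the method of any particular restoration system.

```python
def sigma_filter(img, threshold, radius=1):
    """Average each pixel with only those neighbours whose values are within threshold of it."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            center = img[y][x]
            # Neighbours across a strong edge differ by more than threshold and are excluded.
            vals = [
                img[ny][nx]
                for ny in range(max(0, y - radius), min(h, y + radius + 1))
                for nx in range(max(0, x - radius), min(w, x + radius + 1))
                if abs(img[ny][nx] - center) <= threshold
            ]
            out[y][x] = sum(vals) / len(vals)  # vals always contains the center pixel
    return out


# Usage: a sharp step edge is preserved, while a small bump in a flat area is smoothed.
step = [[0, 0, 100, 100] for _ in range(3)]
noisy = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
smoothed = sigma_filter(noisy, 10)
```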



Text is available under the Creative Commons Attribution-ShareAlike 4.0; additional terms apply. By using this site, you agree to our Terms & Conditions.