Vision system types used in warehousing and distribution environments
There are three types or primary applications of vision systems used in warehousing and distribution environments. They are inspection and mapping, pick-and-place without deep-learning, and pick-and-place with deep-learning. All types of vision systems include three main elements: an input (camera), a processor (computer and program), and an output (robot). All types may use similar cameras and robots. The program is the difference.
Inspection and Mapping
Vision systems for inspection are used in a variety of industrial robot applications, providing outputs of “pass/fail”, “present/not present”, or a measurement value. The result dictates the next step in a process. An example is using a vision system in a manufacturing cell to check for quantity present, color, or other pre-defined attributes (e.g., 3 red, 1 yellow, 2 blue). The results are communicated to an external processing system that takes a prescribed set of pre-determined actions.
Mapping systems are less frequently used but are similar to inspection systems, in that vision maps do not directly translate into machine action. An example is vision-navigation-based mobile robots (e.g., Seegrid). The map is created and stored in a database. The desired routes are pre-calculated. When the robot is driving through the system along pre-programmed paths, the vision system provides the ability to determine the robot X-Y position on a known map. An external routing algorithm provides instructions to the robot (continue forward, turn left, etc.) using the known map and the live camera feed.
Inspection and mapping systems can be very sophisticated, including the routing algorithms that guide the mobile robots, but they do not require deep learning or artificial intelligence.
Pick-and-Place, No Learning
Pick-and-place vision systems are deployed on most robotic cells installed today. A typical application is pick and place in manufacturing environments with limited variables. For example, pick up part A, B, or C from a defined zone and place them in a defined zone. These systems can differentiate between objects and the background based on simple features such as: shape, size, and color. The cameras direct the motion of the robot through closed-loop feedback, enabling the robots to operate very quickly and accurately, within their prescribed parameters.
These systems do not have a “learning loop” that enables the system to be smarter today than the day it was programmed. They are pre-programmed for a fixed set of objects and instructions. While these systems are “smart”, they do not add intelligence or learning over time.
By way of comparison, this would be like owning a self-driving automobile that could only drive on known roads and in weather and traffic conditions that had been pre-programmed. The car could speed up and slow down, change lanes, stop at the lights… but if a new road is built, the car would not be able to drive on it. Would it be awesome technology? For sure. Would it have its limitations? Yes.
Deep Learning (a.k.a. Artificial Intelligence)
The most sophisticated vision systems employ “deep learning”. These systems are often described with sensational terms such as “artificial intelligence”. Complicating things further, many non-learning systems are marketed as if they have intelligent (learning) capability, leading to confusion. Deep-learning systems are a type or subset of “artificial intelligence”.
Deep-learning engineers use a small set of objects as the learning base and teach the computer program (algorithm) to recognize a broad array of objects based on the characteristics of a small sample. For example, if you can recognize a few types of stop signs, you can apply that knowledge to recognize many types of stop signs.
The deep-learning program learns features that are independent of the objects, so that it can generalize over a wide spectrum of objects. For example, through such a program, robots can recognize the edge of an object no matter the exposure of the camera or the lighting conditions.
Deep-learning systems do not rely on a single variable, such as color, because something as simple as an exposure change or lighting would ruin the result. Color may be one of the variables, but additional more abstract variables are used for object recognition in the deep-learning program.
By way of comparison, these deep-learning systems used for robotic picking applications are like driving a Tesla in full autonomous mode. Park anywhere and navigate from A to B, using the best travel route in (most) any weather condition, on all road types.