Overview
Robot visual servoing integrates machine vision and robot control. It is a nonlinear, strongly coupled system involving image processing, robot kinematics and dynamics, and control theory. Improvements in camera hardware cost-effectiveness, computing speed, and related theory have made visual servoing increasingly viable for practical applications, and its technical challenges remain active research topics.
Machine Vision Definition
Machine vision, as defined by the Society of Manufacturing Engineers (SME) machine vision division and the Robotic Industries Association (RIA) automation vision division, is the automatic acquisition and processing of images of real objects using optical devices and non-contact sensors to obtain required information or to control robot motion.
As a bionic system analogous to the human visual system, machine vision broadly covers acquisition and processing of information through optical devices. This includes visible and non-visible spectra, and may involve obtaining internal object information not directly observable by human vision.
Visual Servoing Definition
Visual servoing refers to the complete process from visual signal acquisition and processing to robot control. Early work in the 1960s performed vision and motion in an open-loop manner, where the vision system provided pose information once and then ceased to participate. The term "visual servo" was proposed in 1979 to emphasize the end-to-end integration of visual processing and control, beyond simple extraction of feedback signals from images.
History
Research on robot vision began in the 1960s. Initially the vision system provided the target pose once and did not participate continuously. By the 1970s and 1980s, as computing and imaging hardware advanced, visual servoing developed into a distinct interdisciplinary area spanning robotics, control, and image processing. Since the 1980s, both theory and applications have progressed substantially, and visual servoing remains an active topic in academic research and at conferences.
Classification of Visual Servoing Systems
Visual servoing systems can be classified in several ways:
- By number of cameras: Monocular, stereo, and multi-camera systems. A monocular system yields only a 2D image and cannot directly obtain depth. Multi-camera systems provide richer information from multiple viewpoints but increase image processing load and make system stability harder to guarantee. Stereo systems are commonly used.
- By camera placement: Eye-in-hand and eye-to-hand (or fixed camera). Eye-in-hand systems can achieve precise control in theory but are sensitive to calibration and robot motion errors. Fixed camera systems are less sensitive to robot kinematic errors, but typically provide lower pose estimation accuracy under similar conditions.
- By control reference frame: Position-based visual servoing (PBVS) and image-based visual servoing (IBVS). PBVS reconstructs the target's pose relative to the camera and robot from the processed image, which requires calibrated camera, target, and robot models; calibration accuracy therefore directly affects control accuracy. PBVS converts the desired pose change into joint angle commands sent to the joint controllers. IBVS, by contrast, defines the control error directly in image feature space, avoiding explicit 3D pose reconstruction.
- By control architecture: Systems with closed-loop joint controllers can be divided into dynamic look-and-move systems and direct visual servoing. In a dynamic look-and-move system, joint feedback stabilizes the robot's inner loop while the vision system computes camera velocity or position increments that are sent to the joint controllers. In direct visual servoing, the vision system generates joint control inputs directly.
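The PBVS scheme described above can be sketched as a simple proportional law that drives the translation and orientation errors between the current and desired camera poses to zero. This is a minimal illustration, not an implementation from the source; the function names, the gain `lam`, and the choice of axis-angle orientation error are assumptions.

```python
import numpy as np

def rotation_log(R):
    """Axis-angle vector (angle * unit axis) of a rotation matrix,
    via the standard matrix-logarithm formula."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if np.isclose(theta, 0.0):
        return np.zeros(3)
    axis = (1.0 / (2.0 * np.sin(theta))) * np.array([
        R[2, 1] - R[1, 2],
        R[0, 2] - R[2, 0],
        R[1, 0] - R[0, 1],
    ])
    return theta * axis

def pbvs_velocity(t_cur, R_cur, t_des, R_des, lam=0.5):
    """Proportional PBVS law (illustrative): returns a 6-vector
    [linear velocity; angular velocity] that reduces the pose error."""
    v = -lam * (t_cur - t_des)                  # translation error feedback
    omega = -lam * rotation_log(R_cur @ R_des.T)  # orientation error feedback
    return np.concatenate([v, omega])
```

The resulting Cartesian velocity command would then be mapped through the robot Jacobian to joint commands by the inner joint-control loop, as in the dynamic look-and-move structure.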
In IBVS, the key problem is establishing the image Jacobian (also called the interaction matrix), which relates the velocity of the camera or end-effector to the resulting rate of change of the image features. Because images are 2D projections, computing the image Jacobian generally requires an estimate of target depth, which remains a difficult problem in computer vision. Methods for obtaining the Jacobian include analytic derivation, calibration, online estimation, and learning approaches such as neural networks.
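For a single normalized image point at depth Z, the interaction matrix has a well-known closed form, and a basic IBVS law commands the camera velocity v = -λ L⁺ e, where L⁺ is the pseudo-inverse of the stacked matrix and e the feature error. A minimal sketch (function names and the gain are illustrative; depths are assumed known):

```python
import numpy as np

def interaction_matrix(x, y, Z):
    """Classic 2x6 interaction matrix of a normalized image point (x, y)
    at depth Z, mapping camera velocity [vx vy vz wx wy wz] to (xdot, ydot)."""
    return np.array([
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x],
    ])

def ibvs_velocity(features, desired, depths, lam=0.5):
    """Proportional IBVS law: stack the per-point interaction matrices,
    then return v = -lam * pinv(L) @ e for the feature error e."""
    L = np.vstack([interaction_matrix(x, y, Z)
                   for (x, y), Z in zip(features, depths)])
    e = (np.asarray(features) - np.asarray(desired)).ravel()
    return -lam * np.linalg.pinv(L) @ e
```

Note how every row of the stacked matrix depends on the depth Z, which is exactly why depth estimation is central to IBVS: with monocular images Z must be estimated or approximated (e.g., by its value at the desired pose).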
Main Challenges
- Image processing methods: both theoretical formulation and real-time computation are major challenges for visual servoing.
- Modeling the relationship between image features and robot joint motion after image processing is difficult.
- Many current control methods cannot guarantee global stability over large operating ranges, so research on robust control approaches is still necessary.
Research Directions
Future research directions for visual servoing include:
- Fast, robust feature extraction in real-world environments. Given the large data volume of images and advances in programmable devices, hardware implementations of common algorithms may accelerate processing and advance this area.
- Developing dedicated theory and software platforms tailored to robot vision systems. Purpose-built platforms could reduce development effort and allow hardware acceleration of vision processing to improve system performance.
- Applying artificial intelligence methods. Neural networks have been used, but many AI methods remain underutilized. Overreliance on mathematical modeling leads to high computational loads that limit system responsiveness; AI approaches might reduce computation while meeting real-time requirements.
- Incorporating active vision techniques. Active vision enables the system to selectively acquire features according to rules, helping solve problems that are otherwise difficult.
- Sensor fusion with non-visual sensors. Combining vision with other sensors can provide complementary information, but introduces challenges in information fusion and redundancy management.
Conclusion
Visual servoing has advanced significantly in recent years, and practical robot vision systems are increasingly deployed in China and abroad. Many technical challenges are expected to see progress in ongoing research. Visual servoing will continue to play a prominent role in robotics and is likely to see expanded use in industrial applications.