PC/BC-DIM Network-Based Hybrid Saliency Visual Perception Model for Humanoid Robots
Keywords:
Predictive Coding Biased Competition Divisive Input Modulation network (PC/BC-DIM); hybrid visual saliency; bottom-up model; top-down model; Laplacian of Gaussian
Abstract
Research on visual attention models and saliency detection for humanoid robots has grown rapidly in recent years. A hybrid visual attention model for human-robot interaction can be created by combining top-down and bottom-up visual saliency detection methods. However, due to their high computational cost and complexity, most hybrid visual saliency models are not computationally viable for real-world deployment on humanoid robots. The primary flaw in most visual attention models is that, while they can identify the salient object in natural images with a simple background, they struggle in images with a cluttered or textured background. When an image contains several salient objects, most global contrast-based techniques do not yield effective results, and hybrid models built on local and global contrast tend to predict background regions as salient. This study presents a hybrid stereo saliency model that effectively identifies salient objects against simple, cluttered, and textured backgrounds. The proposed model is well suited to implementation on humanoid robots because of its additional benefits: simplicity, robustness, and CPU-based execution. The proposed saliency detection model can detect multiple salient objects and computes saliency maps using a Divisive Input Modulation (DIM) neural network with predictive coding (PC) and biased competition (BC). To reduce the complexity of scene analysis, preprocessing is performed using double-opponent color, intensity, and orientation features in the hybrid saliency model. A Laplacian of Gaussian (LoG) filter plays a crucial role in processing the intensity and orientation features, and the top-down factor further enhances the saliency of salient regions. The preprocessed feature maps are then passed through the PC/BC-DIM network to compute the saliency map. The stereo visual attention model performs preprocessing and saliency map computation separately for each eye and uses depth information as a cue for stereo saliency detection. Finally, the binocular saliency maps are combined using a disparity map calculation to extract the stereo saliency map. The mean absolute error (MAE) score was 0.22 for the monocular hybrid saliency model and 0.375 for the stereo saliency model. Both the monocular and binocular models are computationally efficient and cost-effective for implementation on humanoid robots.
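For readers unfamiliar with the network, the PC/BC-DIM model is conventionally defined by a pair of iterative update rules. The formulation below is Spratling's standard version, which the present model is assumed to follow; the symbols are generic and not taken verbatim from this paper:

\[
\mathbf{e} = \mathbf{x} \oslash \left(\epsilon_2 + \mathbf{V}\mathbf{y}\right), \qquad
\mathbf{y} \leftarrow \left(\epsilon_1 + \mathbf{y}\right) \otimes \mathbf{W}\mathbf{e}
\]

Here \(\mathbf{x}\) is the input vector (in this context, the preprocessed feature maps), \(\mathbf{e}\) the error-node activations, \(\mathbf{y}\) the prediction-node activations whose steady-state values form the saliency response, \(\mathbf{W}\) the feedforward weights, \(\mathbf{V}\) the feedback weights (typically \(\mathbf{W}^{\mathsf{T}}\) with normalized rows), \(\oslash\) and \(\otimes\) element-wise division and multiplication, and \(\epsilon_1, \epsilon_2\) small constants that prevent division by zero.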
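As a rough illustration of the LoG-based preprocessing described above, the minimal sketch below band-pass filters an intensity channel before it would be fed to the PC/BC-DIM network. It is not the authors' implementation; the function name, the use of scipy.ndimage.gaussian_laplace, and the sigma value are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of LoG preprocessing of the intensity feature.
import numpy as np
from scipy.ndimage import gaussian_laplace

def preprocess_intensity(image_rgb: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    """Return a normalized LoG-filtered intensity map in [0, 1]."""
    intensity = image_rgb.mean(axis=2)                   # simple intensity channel
    log_response = -gaussian_laplace(intensity, sigma)   # centre-surround (LoG) response
    log_response -= log_response.min()
    return log_response / (log_response.max() + 1e-8)    # normalize, avoid divide-by-zero
```

In the full model, analogous maps for the double-opponent color and orientation features would be computed in the same way, stacked, and supplied to the PC/BC-DIM network as its input.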